MIME

From Wikipedia, the free encyclopedia.

Multipurpose Internet Mail Extensions (MIME) is an Internet Standard for the format of e-mail. Virtually all Internet e-mail is transmitted via SMTP in MIME format. Internet e-mail is so closely associated with the SMTP and MIME standards that it is sometimes called SMTP/MIME e-mail.

Introduction

The basic Internet e-mail transmission protocol, SMTP, supports only 7-bit ASCII characters (see also 8BITMIME). This effectively limits Internet e-mail to messages which, when transmitted, include only the characters used for the English language. MIME defines mechanisms for sending other kinds of information in e-mail, including text in languages other than English using character encodings other than ASCII as well as 8-bit binary content such as files containing images, sounds, movies, and computer programs. MIME is also a fundamental component of communication protocols such as HTTP, which requires that data be transmitted in the context of e-mail-like messages, even though the data may not actually be e-mail.

Mapping messages into and out of MIME format is typically done automatically by an e-mail client or by mail servers when sending or receiving Internet (SMTP/MIME) e-mail.

The basic format of Internet e-mail is defined in RFC 2822, which is an updated version of RFC 822. These standards specify the familiar formats for text e-mail headers and body and rules pertaining to commonly used header fields such as "To:", "Subject:", "From:", and "Date:". MIME defines a collection of e-mail headers for specifying additional attributes of a message including content type, and defines a set of transfer encodings which can be used to represent 8-bit binary data using characters from the 7-bit ASCII character set. MIME also specifies rules for encoding non-ASCII characters in e-mail message headers, such as "Subject:", allowing these header fields to contain non-English characters.
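As a rough illustration of how these pieces fit together, the following sketch uses Python's standard `email` package to build a message with a non-ASCII subject and body. The addresses and text are hypothetical, and the transfer encoding the library picks depends on its policy, so the sketch only inspects it rather than assuming a value.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@example.com"   # hypothetical addresses
msg["To"] = "bob@example.com"
msg["Subject"] = "¡Hola, señor!"    # non-ASCII header value
msg.set_content("Adiós y gracias.\n")  # non-ASCII text body

# MIME headers added by the library on top of the basic RFC 2822 fields.
mime_version = msg["MIME-Version"]
content_type = msg.get_content_type()
charset = msg.get_content_charset()
transfer_encoding = str(msg["Content-Transfer-Encoding"])

# Serialized form: the Subject value is written as an encoded-word.
wire = msg.as_bytes()
print(wire.decode("utf-8", errors="replace"))
```

Printing the serialized message shows the "To:", "From:" and "Subject:" fields alongside the MIME-specific headers.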

MIME is extensible. Its definition includes a method to register new content types and other MIME attribute values.

One of the explicit goals of the MIME definition was to not require changes to pre-existing e-mail servers, and to allow plain text e-mail to function in both directions with pre-existing clients. This goal is achieved by allowing all MIME message attributes to be optional, with default values making a non-MIME message likely to be interpreted correctly by a MIME-capable client. In addition, a simple MIME text message is likely to be interpreted correctly by a non-MIME client although it has e-mail headers the non-MIME client won't know how to interpret.
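The default values can be observed with a MIME-aware parser. This sketch, which assumes Python's standard `email` package and a made-up message, parses a message that has no MIME headers at all and shows the parser falling back to text/plain. The "us-ascii" fallback is supplied by the caller here, not read from the message.

```python
from email import message_from_string

# A pre-MIME style message: no MIME-Version and no Content-Type header.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Hello\n"
    "\n"
    "Plain old text.\n"
)

msg = message_from_string(raw)

has_mime_version = msg["MIME-Version"] is not None
content_type = msg.get_content_type()          # default when the header is absent
charset = msg.get_content_charset("us-ascii")  # fallback value passed by the caller
body = msg.get_payload()

print(has_mime_version, content_type, charset)
```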

MIME headers

MIME-Version

The presence of this header indicates the message is MIME-formatted. The value is typically "1.0" so this header appears as

  MIME-Version: 1.0

Content-Type

This header indicates the type and subtype of the message content, for example

  Content-type: text/plain

The combination of type and subtype is generally called a MIME type, although in modern applications, Internet media type is the favored term, indicating its applicability outside of MIME messages. A large number of file formats have registered MIME types. Any text type has an additional charset parameter that can be included to indicate the character encoding. A very large number of character encodings have registered MIME charset names.
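A short sketch of reading the type, subtype, and charset parameter from a Content-Type value, assuming Python's standard `email` package. The header value is an illustrative example, not taken from a real message.

```python
from email.message import Message

msg = Message()
msg["Content-Type"] = 'text/plain; charset="ISO-8859-1"'

full_type = msg.get_content_type()      # type/subtype, lowercased
main_type = msg.get_content_maintype()  # the part before the slash
sub_type = msg.get_content_subtype()    # the part after the slash
charset = msg.get_content_charset()     # charset parameter, lowercased

print(full_type, main_type, sub_type, charset)
```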

Although originally defined for MIME e-mail, the Content-Type header and the MIME type registry are reused in other Internet protocols such as HTTP.

Through the use of the multipart type, MIME allows messages to have parts arranged in a tree structure where the leaf nodes are any non-multipart content type and the non-leaf nodes are any of a variety of multipart types. This mechanism supports:

  • simple text messages using text/plain (the default value for "Content-type:")
  • text plus attachments (multipart/mixed with a text/plain part and other non-text parts). A MIME message including an attached file generally indicates the file's original name with the "Content-disposition:" header, so the type of file is indicated both by the MIME content-type and the (usually OS-specific) filename extension.
  • reply with original attached (multipart/mixed with a text/plain part and the original message as a message/rfc822 part)
  • alternative content, such as a message sent in both plain text and another format such as HTML (multipart/alternative with the same content in text/plain and text/html forms)
  • many other message constructs
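The tree structure can be made concrete with a small sketch, again assuming Python's standard `email` package. It combines two of the constructs listed above (alternative content plus an attachment) and prints the resulting tree. The `describe` helper, the file name `data.bin`, and the message content are hypothetical.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Report"
msg.set_content("Plain-text version.")                        # text/plain leaf
msg.add_alternative("<p>HTML version.</p>", subtype="html")   # text/html leaf
msg.add_attachment(
    b"\x00\x01\x02",
    maintype="application",
    subtype="octet-stream",
    filename="data.bin",
)

def describe(part, depth=0):
    """Return one indented line per node of the MIME tree."""
    lines = ["  " * depth + part.get_content_type()]
    if part.is_multipart():
        for child in part.iter_parts():
            lines.extend(describe(child, depth + 1))
    return lines

tree = describe(msg)
filenames = [p.get_filename() for p in msg.walk() if p.get_filename()]
print("\n".join(tree))
```

The non-leaf nodes are multipart types and the leaves carry the actual content; the attachment's original name travels in its Content-Disposition header.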

Content-Transfer-Encoding

MIME (RFC 2045) defines a set of methods for representing binary data in ASCII text format. The "Content-Transfer-Encoding:" header indicates which method has been used. The RFC and the IANA's list of transfer encodings define the following values, which are not case sensitive:

  • Suitable for use with normal SMTP:
    • 7bit - up to 998 octets per line, each in the code range [1..127]\{CR, LF} (CR and LF appear only as the CR+LF line ending). This is the default value.
    • quoted-printable - used for text data consisting primarily of US-ASCII characters but also containing byte values outside that range.
    • base64 - used for arbitrary binary data
  • Suitable for use with SMTP servers that support the 8BITMIME transport SMTP extension:
    • 8bit - up to 998 octets per line ending with CR+LF; octets must be in the code range [1..255]\{CR, LF}.
  • Not suitable for use with SMTP:
    • binary - any sequence of octets. Not usable with SMTP mail.
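The constraints in the list above can be expressed as a small check. The function below is a hypothetical helper written for illustration, not part of any library; it returns the most restrictive of "7bit", "8bit", or "binary" that a raw body already satisfies.

```python
def narrowest_transfer_encoding(data: bytes) -> str:
    """Classify raw body bytes as "7bit", "8bit", or "binary"."""
    has_high_octet = False
    # Lines end with CR+LF, so split on that pair before inspecting octets.
    for line in data.split(b"\r\n"):
        if len(line) > 998:
            return "binary"      # line too long for 7bit or 8bit
        for octet in line:
            if octet in (0, 10, 13):
                return "binary"  # NUL, or a CR/LF outside a CR+LF pair
            if octet > 127:
                has_high_octet = True
    return "8bit" if has_high_octet else "7bit"

print(narrowest_transfer_encoding(b"Hello\r\nworld\r\n"))
print(narrowest_transfer_encoding("señor\r\n".encode("utf-8")))
print(narrowest_transfer_encoding(b"\x00\x01\x02"))
```

Data classified as "binary" would need base64 (or quoted-printable) before being sent over SMTP.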

No encoding is defined that is explicitly designed for sending arbitrary binary data through 8BITMIME transports, so base64 or quoted-printable must sometimes still be used.
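To illustrate the two encodings that produce 7-bit-safe output, this sketch uses Python's standard `base64` and `quopri` modules. The sample data is made up; the point is only that both encodings yield ASCII and round-trip back to the original bytes.

```python
import base64
import quopri

# Mostly-ASCII text: quoted-printable escapes only the non-ASCII bytes.
text = "¡Hola, señor!".encode("utf-8")
qp = quopri.encodestring(text)
qp_roundtrip = quopri.decodestring(qp)

# Arbitrary binary data: base64 maps every 3 octets to 4 ASCII characters.
blob = bytes(range(256))
b64 = base64.encodebytes(blob)      # wrapped into short lines
b64_roundtrip = base64.decodebytes(b64)

print(qp.decode("ascii"))
print(b64.decode("ascii"))
```

Quoted-printable keeps mostly-ASCII text readable, while base64 has a fixed overhead of about one third regardless of content.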

Encoded-Word

Since RFC 2822 message header names and values are always ASCII characters, values that contain non-ASCII data must use the MIME encoded-word syntax (RFC 2047) instead of a literal string. This syntax uses a string of ASCII characters indicating both the original character encoding (the "charset") and the content-transfer-encoding used to map the bytes of the charset into ASCII characters.

The form is: "=?charset?encoding?encoded text?=".

  • charset is often utf-8, but may be any character set registered with IANA. iso-2022-jp is common in Japan. iso-8859-1 and more recently iso-8859-15 are common in Europe.
  • encoding can be either "Q" denoting an encoding similar to quoted-printable, or "B" denoting base64 encoding.
  • encoded text is the quoted-printable or base64-encoded text.

For example,

Subject: =?utf-8?Q?=C2=A1Hola,=20se=C3=B1or!?=

is interpreted as "Subject: ¡Hola, señor!".
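The decoding of the example can be reproduced with Python's standard `email.header` module. The "B" variant below is built by hand for comparison and is an illustration rather than something taken from a real message.

```python
import base64
from email.header import decode_header, make_header

# "Q" form, copied from the example above.
q_word = "=?utf-8?Q?=C2=A1Hola,=20se=C3=B1or!?="
q_pairs = decode_header(q_word)            # list of (bytes, charset) pairs
q_text = str(make_header(q_pairs))

# "B" form of the same text, assembled manually from its base64 encoding.
payload = base64.b64encode("¡Hola, señor!".encode("utf-8")).decode("ascii")
b_word = "=?utf-8?B?" + payload + "?="
b_text = str(make_header(decode_header(b_word)))

print(q_text)
print(b_text)
```

Both forms decode to the same string; only the mapping from charset bytes to ASCII characters differs.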

The encoded-word format is not used for header names, such as Subject:. These header names are always in English in the raw message. When viewing a message with a non-English e-mail client, the header names are translated by the client.

Multipart Example

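A MIME multipart message contains a boundary in the "Content-type:" header. This boundary, which must not occur in any of the parts, is placed between the parts and at the beginning and end of the body of the message. Each delimiter line consists of two hyphens followed by the boundary, and the final delimiter has two further hyphens appended, as follows: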

Content-type: multipart/mixed; boundary="frontier"
MIME-version: 1.0

--frontier
Content-type: text/plain

This is the body of the message.
--frontier
Content-type: application/octet-stream
Content-transfer-encoding: base64
  
gajwO4+n2Fy4FV3V7zD9awd7uG8/TITP/vIocxXnnf/5mjgQjcipBUL1b3uyLwAVtBLOP4nV
LdIAhSzlZnyLAF8na0n7g6OSeej7aqIl3NIXCfxDsPsY6NQjSvV77j4hWEjlF/aglS6ghfju
FgRr+OX8QZMI1OmR4rUJUS7xgoknalqj3HJvaOpeb3CFlNI9VGZYz6H6zuQBOWZzNB8glwpC
--frontier--
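A message of this shape can be split back into its parts with Python's standard `email` package. The sketch below uses a shortened variant of the message above: the base64 payload is a short stand-in generated in the code, not the payload from the example.

```python
import base64
from email import message_from_string

# Stand-in attachment payload (hypothetical), base64-encoded for transport.
encoded = base64.b64encode(b"Hello, MIME!").decode("ascii")

raw = (
    'Content-type: multipart/mixed; boundary="frontier"\n'
    "MIME-version: 1.0\n"
    "\n"
    "--frontier\n"
    "Content-type: text/plain\n"
    "\n"
    "This is the body of the message.\n"
    "--frontier\n"
    "Content-type: application/octet-stream\n"
    "Content-transfer-encoding: base64\n"
    "\n"
    + encoded + "\n"
    "--frontier--\n"
)

msg = message_from_string(raw)
boundary = msg.get_boundary()
text_part, binary_part = msg.get_payload()   # one Message object per part

text_body = text_part.get_payload()
binary_body = binary_part.get_payload(decode=True)  # undo the base64 encoding

print(boundary, [p.get_content_type() for p in msg.get_payload()])
```

Passing `decode=True` applies the part's Content-Transfer-Encoding in reverse, so the attachment comes back as the original bytes.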
