Uuencoding

From Wikipedia, the free encyclopedia


Uuencoding is a form of binary-to-text encoding that originated in the Unix program uuencode, which encodes binary data for transmission over the uucp mail system. The name "uuencoding" is derived from "Unix-to-Unix encoding". Because uucp converted characters between different computers' character sets, uuencode represented the data using only common characters that were unlikely to be "translated" in transit, since such translation would corrupt the file. The program uudecode reverses the effect of uuencode, recreating the original binary file exactly. uuencode and uudecode became popular for sending binary files by e-mail and for posting them to Usenet newsgroups. Uuencoding has now been largely replaced by MIME and yEnc; with MIME, files that might once have been uuencoded are transferred with base64 encoding.

Encoded format

A file in uuencoded format starts with a header line of the form:

begin <mode> <file>

where <mode> is the file's Unix read/write/execute permissions as three octal digits, and <file> is the name to be used when recreating the binary data. The file ends with two trailer lines:

`
end

(The grave accent indicates a line that encodes zero bytes; see below.)

Lines between the header and trailer encode data.
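
The header format described above can be read back with a few lines of Python. This is only a minimal sketch: the function name parse_header is hypothetical, and it assumes the fields are separated by single spaces, so that everything after the mode is taken as the file name.

```python
def parse_header(line: str) -> tuple[int, str]:
    """Split a 'begin <mode> <file>' line into (mode, file name)."""
    # Split on the first two spaces only, so a file name may contain spaces.
    keyword, mode_text, name = line.rstrip("\r\n").split(" ", 2)
    if keyword != "begin":
        raise ValueError("not a uuencode header line")
    # <mode> is written as octal digits, e.g. "644".
    return int(mode_text, 8), name


mode, name = parse_header("begin 644 cat.txt\n")
print(f"{mode:o}", name)
```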

Each data line starts with a character indicating the number of data bytes encoded on that line and ends with a newline character. All data lines, except perhaps the last, encode 45 bytes of data. The corresponding encoded length value is 'M' (see below), so most lines begin with 'M'.

After the length character, a data line contains groups of four characters, each of which encodes three bytes of data. If the number of data bytes for a line is not divisible by three, one or two additional zero bytes are appended to the input data before encoding, so the encoding always consists of whole groups of four characters. Those padding bytes are not included in the count at the beginning of the last line.

A data line's byte count is encoded by adding 32 and using the corresponding ASCII character, except that a byte count of zero is encoded as grave accent ("`", code 96).

(In ASCII, the first thirty-two characters are unprintable control characters, used to control data transmission, and could be modified or deleted in transit. The next ninety-five characters, at code 32 and above, are all printable. Since the byte count is in the range 0-45, adding 32 converts it into a printable character. The ASCII code for 'M' is 77, or exactly 45 + 32. For a zero-length line, adding 32 to 0 gives 32, corresponding to a space character. This character was also problematic for data transmission, so the grave accent ("`", code 96) is used instead. Subtracting 32 from 96 gives 64, a value whose lower six bits are all 0, so a decoder that keeps only the lower six bits still reads a count of zero.)
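
The byte-count rule can be written down directly. The sketch below is illustrative only; the function names encode_count and decode_count are hypothetical, and the decoder masks to six bits so that the grave accent and the space both read as zero.

```python
def encode_count(n: int) -> str:
    """Return the length character for a data line holding n bytes."""
    if not 0 <= n <= 45:
        raise ValueError("a data line encodes between 0 and 45 bytes")
    # Zero would map to a space (code 32), so the grave accent is used instead.
    return "`" if n == 0 else chr(n + 32)


def decode_count(ch: str) -> int:
    """Recover the byte count: subtract 32 and keep the lower six bits."""
    return (ord(ch) - 32) & 0x3F


print(encode_count(45), encode_count(3), encode_count(0))
```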

Each group of three bytes is encoded into four characters. The bytes are concatenated into a 24-bit value in big-endian order. (The first byte becomes the most significant 8 bits of the value.) The 24-bit value is then split into four groups of six bits each, also in big-endian order. (The most significant six bits become the first group.) Each group of six bits is then encoded into a character using the same calculation as for byte counts. (Since the range of values is from 0 to 63, when 32 is added the ASCII characters will lie in the range 32 (space) to 32 + 63 = 95 (underscore).) ASCII characters greater than 95 may also be used; however, only the six right-most bits are relevant.
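
As a rough illustration of the three-bytes-to-four-characters step, the following Python sketch packs the bytes into a 24-bit value and splits it into six-bit groups. The name encode_group is hypothetical, and the sketch uses the plain "add 32" rule, so a zero group comes out as a space rather than a grave accent.

```python
def encode_group(three: bytes) -> str:
    """Encode exactly three bytes as four uuencode characters."""
    if len(three) != 3:
        raise ValueError("expected exactly three bytes")
    # First byte becomes the most significant 8 bits of the 24-bit value.
    value = int.from_bytes(three, "big")
    # Most significant six bits form the first group.
    sextets = [(value >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(chr(s + 32) for s in sextets)


print(encode_group(b"Cat"))
```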

Sometimes each data line has extra dummy characters (often the grave accent (ASCII 96)) added to avoid problems with mailers that strip trailing spaces. These characters are ignored by uudecode. The grave accent can also be used in place of a space character.
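
A tolerant decoder for a single data line might look like the sketch below. It is not the actual uudecode implementation; the name decode_line is hypothetical. It assumes that missing trailing characters were stripped spaces, ignores any extra characters after the last needed group, and keeps only the lower six bits of each character so that a grave accent decodes the same as a space.

```python
def decode_line(line: str) -> bytes:
    """Decode one uuencoded data line into the bytes it carries."""
    line = line.rstrip("\r\n")
    if not line:
        return b""
    count = (ord(line[0]) - 32) & 0x3F      # number of data bytes on the line
    groups = -(-count // 3)                 # ceiling division: 4 chars per 3 bytes
    # Drop extra dummy characters; restore spaces a mailer may have stripped.
    body = line[1:1 + 4 * groups].ljust(4 * groups)
    out = bytearray()
    for i in range(0, len(body), 4):
        value = 0
        for ch in body[i:i + 4]:
            value = (value << 6) | ((ord(ch) - 32) & 0x3F)
        out += value.to_bytes(3, "big")
    return bytes(out[:count])               # discard the padding bytes


print(decode_line("#0V%T"))
```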

As a complete file, the uuencoded output for (the ASCII bytes representing the string) Cat would be

begin 644 cat.txt
#0V%T
`
end

The begin line is a standard uuencode header; the '#' (code 35 = 3 + 32) indicates that its line encodes three bytes; the last two lines appear at the end of all uuencoded files.
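
The complete file above can be reproduced with Python's standard binascii module, whose b2a_uu function encodes one data line of at most 45 bytes, including the length character and the newline. The wrapper below is a sketch; its name, uuencode, and its default mode are assumptions made for illustration.

```python
import binascii


def uuencode(data: bytes, name: str, mode: int = 0o644) -> str:
    """Return data as a complete uuencoded file (header, data lines, trailer)."""
    parts = [f"begin {mode:o} {name}\n"]
    # Every data line except perhaps the last encodes 45 bytes.
    for i in range(0, len(data), 45):
        parts.append(binascii.b2a_uu(data[i:i + 45]).decode("ascii"))
    # Trailer: a zero-length line (grave accent) followed by "end".
    parts.append("`\nend\n")
    return "".join(parts)


print(uuencode(b"Cat", "cat.txt"), end="")
```

By default b2a_uu writes zero-valued groups as spaces; the grave accent is emitted here only for the zero-length trailer line.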

Sample uuencoding

The encoding process is demonstrated by this table, which shows the derivation of the above encoding for "Cat".

Original characters        C          a          t
Original ASCII, decimal    67         97         116
ASCII, binary              01000011   01100001   01110100
New decimal values         16       54       5        52
+32                        48       86       37       84
Uuencoded characters       0        V        %        T
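
Each row of the table can be recomputed step by step. The short script below is only a check of the arithmetic for "Cat", with variable names chosen for illustration.

```python
text = b"Cat"

# Row 2: ASCII codes of the original characters.
codes = list(text)
# Row 3: the same bytes as one 24-bit string.
bits = "".join(format(byte, "08b") for byte in text)
# Row 4: the 24 bits regrouped into four six-bit values.
sextets = [int(bits[i:i + 6], 2) for i in range(0, 24, 6)]
# Row 5: add 32 to move each value into the printable range.
shifted = [s + 32 for s in sextets]
# Row 6: the resulting characters.
encoded = "".join(chr(c) for c in shifted)

for label, row in [("decimal", codes), ("binary", bits), ("six-bit", sextets),
                   ("+32", shifted), ("encoded", encoded)]:
    print(f"{label:8} {row}")
```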

Uuencode table

The following table lists the subset of ASCII characters used by uuencode and the six-bit value that each one represents (in octal).

six bits        six bits        six bits        six bits
code     char   code     char   code     char   code     char
00       SP     20       0      40       @      60       P
01       !      21       1      41       A      61       Q
02       "      22       2      42       B      62       R
03       #      23       3      43       C      63       S
04       $      24       4      44       D      64       T
05       %      25       5      45       E      65       U
06       &      26       6      46       F      66       V
07       '      27       7      47       G      67       W
10       (      30       8      50       H      70       X
11       )      31       9      51       I      71       Y
12       *      32       :      52       J      72       Z
13       +      33       ;      53       K      73       [
14       ,      34       <      54       L      74       \
15       -      35       =      55       M      75       ]
16       .      36       >      56       N      76       ^
17       /      37       ?      57       O      77       _
                                                00       `
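
The table follows directly from the "add 32" rule, so it can be regenerated rather than memorised. In this sketch the name TABLE is hypothetical and the layout (four columns of sixteen rows) simply mirrors the table above.

```python
# Map each six-bit value (0-63) to its uuencode character.
TABLE = {value: chr(value + 32) for value in range(64)}

for row in range(16):
    cells = []
    for col in range(4):
        value = row + 16 * col
        char = "SP" if value == 0 else TABLE[value]
        cells.append(f"{value:02o} {char:<2}")   # code shown in octal
    print("   ".join(cells))
```

The grave accent (code 96) is not produced by this rule; it is accepted as an alternative to the space because its lower six bits are also zero.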

POSIX Base64 coding

Despite its limited range of characters, uuencoded data is sometimes mangled on passage through certain old computers. The worst offenders are computers using non-ASCII character sets such as EBCDIC. One attempt to fix the problem was the Xxencode format, which used only alphanumeric characters and the plus and minus symbols. More common today is the Base64 format; it can also be generated by the uuencode program. The header is changed to

begin-base64 <mode> <file>

the trailer becomes

====

and lines between are encoded with characters chosen from

ABCDEFGHIJKLMNOP
QRSTUVWXYZabcdef
ghijklmnopqrstuv
wxyz0123456789+/
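
The framing described in this section can be sketched with Python's standard base64 module. The function name uuencode_base64 is hypothetical, and the choice of 45 input bytes per line is an assumption carried over from the traditional format, not something stated above; it must be a multiple of three so that "=" padding can appear only on the last data line.

```python
import base64


def uuencode_base64(data: bytes, name: str, mode: int = 0o644,
                    bytes_per_line: int = 45) -> str:
    """Wrap base64-encoded data in begin-base64 / ==== framing."""
    lines = [f"begin-base64 {mode:o} {name}"]
    for i in range(0, len(data), bytes_per_line):
        chunk = data[i:i + bytes_per_line]
        lines.append(base64.b64encode(chunk).decode("ascii"))
    lines.append("====")
    return "\n".join(lines) + "\n"


print(uuencode_base64(b"Cat", "cat.txt"), end="")
```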

Trivia

Microsoft's e-mail program Outlook Express once erroneously accepted "begin  <filename>" as the start of a uuencoded attachment (i.e., it did not require the octal-encoded Unix-style permissions). Especially on Usenet, where MIME is seldom used[citation needed] and plain text is preferred, some people would embed "begin" followed by two spaces in their messages in order to hide the rest of the message from Outlook Express users (e.g., they configured their news client to start quotations with the line "begin  quote from xxx").[1]

References

This article was originally based on material from the Free On-line Dictionary of Computing, which is licensed under the GFDL.

External links

  • GNU sharutils - The Free Software Foundation's sharutils bundle includes uuencode, uudecode, and others.
  • UUDeview - open-source program to encode/decode Base64, BinHex, uuencode, xxencode, etc. for Unix/Windows/DOS
  • UUENCODE-UUDECODE - open-source program to encode/decode created by Clem "Grandad" Dye
  • StUU - Open Source fast UUDecoder for Macintosh by Stuart Cheshire