CS 240 ASCII FAQ

If you had to assign an 8-bit number to every key on your keyboard, how would you do it?

Computers manipulate all information using bits. There is not really a designation of letters or pictures or sounds inside of a computer; just bits. We interpret them in a manner that best fits the circumstance. If we want to represent letters, we simply choose a numeric representation for each letter and deal with it.
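
In C, for example, a char variable is just a small integer, so the very same byte can be printed either as a character or as its numeric code. A minimal sketch of that idea:

    #include <stdio.h>

    int main(void) {
        char c = 'A';
        /* The same 8 bits, interpreted two different ways. */
        printf("as a character: %c\n", c);   /* prints: A  */
        printf("as a number:    %d\n", c);   /* prints: 65 */
        return 0;
    }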

Many years ago, when people were still programming computers with toggle switches, there were many ways of assigning numbers to letters since everyone came up with their own encoding. For instance, you could assign the value 1 to 'A', 2 to 'B' and so on. Since computers were not generally connected to each other in those days, it didn't really matter which encoding you chose. Then a few things happened that made it necessary to really think about the encoding.

First, it was necessary to order the encoding so that characters could be sorted sanely. For example, if you were sorting the strings "Aardvark" and "Zebra", you'd want "Aardvark" to have a lower value so that it would be sorted properly. This might seem pretty obvious, but what about cases like "Zoo" and "mailbox"? Which should come first, uppercase "Z" or lowercase "m"? How about digits in strings like "3com" and "Motel6"? What about spaces? What about punctuation characters?
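
Once an encoding is fixed, these questions answer themselves, because string comparison simply compares the numeric codes character by character. Here is a minimal C sketch using the standard strcmp; the codes cited in the comments are the ASCII values from the table below:

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        /* strcmp compares strings byte by byte using their numeric codes,
           so the "alphabetical" order it produces is really encoding order. */
        printf("%d\n", strcmp("Aardvark", "Zebra"));  /* negative: 'A' (65) < 'Z' (90)  */
        printf("%d\n", strcmp("Zoo", "mailbox"));     /* negative: 'Z' (90) < 'm' (109) */
        printf("%d\n", strcmp("3com", "Motel6"));     /* negative: '3' (51) < 'M' (77)  */
        return 0;
    }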

Second, when a peripheral was connected to a computer, you'd expect it to work without a lot of psychological debugging. If you used an encoding of 1 for 'A', 2 for 'B', and so on, you'd want a printer connected to your computer to print an 'A' when you sent it a 1. What if every computer were different? You'd have to re-wire your printer every time you hooked it up to a new computer.

Finally, as people started connecting computers to each other or sharing data with other computers via magnetic media, it became really important to have the same encodings for characters.

ASCII invented

In 1968, the American National Standards Institute (ANSI) came out with its recommendation X3.4, which defined an ordered assignment of characters to numbers. This was also recognized internationally as ISO 646. Among other oddities, it specified that certain punctuation characters have lower numeric values than digits, digits have lower values than letters, and uppercase letters have lower values than lowercase letters. The encoding looks like this:

code  character            code  character
0 NUL 64 @
1 SOH 65 A
2 STX 66 B
3 ETX 67 C
4 EOT 68 D
5 ENQ 69 E
6 ACK 70 F
7 BEL '\a' 71 G
8 BS '\b' 72 H
9 HT '\t' 73 I
10 LF '\n' 74 J
11 VT '\v' 75 K
12 FF '\f' 76 L
13 CR '\r' 77 M
14 SO 78 N
15 SI 79 O
16 DLE 80 P
17 DC1 81 Q
18 DC2 82 R
19 DC3 83 S
20 DC4 84 T
21 NAK 85 U
22 SYN 86 V
23 ETB 87 W
24 CAN 88 X
25 EM 89 Y
26 SUB 90 Z
27 ESC 91 [
28 FS 92 \ '\\'
29 GS 93 ]
30 RS 94 ^
31 US 95 _
32 SPACE 96 `
33 ! 97 a
34 " 98 b
35 # 99 c
36 $ 100 d
37 % 101 e
38 & 102 f
39 ' 103 g
40 ( 104 h
41 ) 105 i
42 * 106 j
43 + 107 k
44 , 108 l
45 - 109 m
46 . 110 n
47 / 111 o
48 0 112 p
49 1 113 q
50 2 114 r
51 3 115 s
52 4 116 t
53 5 117 u
54 6 118 v
55 7 119 w
56 8 120 x
57 9 121 y
58 : 122 z
59 ; 123 {
60 < 124 |
61 = 125 }
62 > 126 ~
63 ? 127 DEL

Note in the table above that the codes 0 through 31, as well as 127, have special names like "BEL" and "DEL" instead of symbols. Those are unprintable control characters. For instance, BEL (7) is the "bell" character that causes the terminal to beep, and DEL (127) is the delete character.
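
Two handy consequences of this layout: each lowercase letter is exactly 32 above its uppercase partner, and the digits '0' through '9' are contiguous starting at 48. A minimal C sketch of both tricks (the standard <ctype.h> functions do the same jobs more portably):

    #include <stdio.h>

    int main(void) {
        /* 'a' - 'A' == 32, so changing case is just adding or
           subtracting 32. */
        char upper = 'G';
        char lower = upper + ('a' - 'A');
        printf("%c -> %c\n", upper, lower);          /* G -> g */

        /* The digits '0'..'9' occupy codes 48..57, so subtracting '0'
           turns a digit character into its numeric value. */
        char digit = '7';
        int value = digit - '0';
        printf("'%c' has value %d\n", digit, value); /* '7' has value 7 */
        return 0;
    }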

To probe further

Here's an interesting web page that has more information on this subject.