CS 240 ASCII FAQ

If you had to assign an 8-bit number to every key on your keyboard, how would you do it?

Computers manipulate all information using bits. There is not really a designation of letters or pictures or sounds inside of a computer; just bits. We interpret them in a manner that best fits the circumstance. If we want to represent letters, we simply choose a numeric representation for each letter and deal with it.
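
In C, for example, a char variable is just a small integer, so the very same byte can be printed either as a character or as its numeric code. A minimal sketch of that idea:

    #include <stdio.h>

    int main(void) {
        char c = 'A';
        /* The same 8 bits, interpreted two different ways. */
        printf("as a character: %c\n", c);   /* prints: A  */
        printf("as a number:    %d\n", c);   /* prints: 65 */
        return 0;
    }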

Many years ago, when people were still programming computers with toggle switches, there were many ways of assigning numbers to letters since everyone came up with their own encoding. For instance, you could assign the value 1 to 'A', 2 to 'B' and so on. Since computers were not generally connected to each other in those days, it didn't really matter which encoding you chose. Then a few things happened that made it necessary to really think about the encoding.

First, it was necessary to order the encoding so that characters could be sorted sanely. For example, if you were sorting the strings "Aardvark" and "Zebra", you'd want "Aardvark" to have a lower value so that it would be sorted properly. This might seem pretty obvious, but what about cases like "Zoo" and "mailbox"? Which should come first, uppercase "Z" or lowercase "m"? How about digits in strings like "3com" and "Motel6"? What about spaces? What about punctuation characters?
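
Once an encoding is fixed, these questions answer themselves, because string comparison simply compares the numeric codes character by character. Here is a minimal C sketch using the standard strcmp; the codes cited in the comments are the ASCII values from the table below:

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        /* strcmp compares strings byte by byte using their numeric codes,
           so the "alphabetical" order it produces is really encoding order. */
        printf("%d\n", strcmp("Aardvark", "Zebra"));  /* negative: 'A' (65) < 'Z' (90)  */
        printf("%d\n", strcmp("Zoo", "mailbox"));     /* negative: 'Z' (90) < 'm' (109) */
        printf("%d\n", strcmp("3com", "Motel6"));     /* negative: '3' (51) < 'M' (77)  */
        return 0;
    }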

Second, when a peripheral was connected to a computer, you'd expect it to work without a lot of psychological debugging. If you used an encoding of 1 for 'A', 2 for 'B', and so on, you'd want a printer connected to your computer to print an 'A' when you sent it a 1. What if every computer were different? You'd have to re-wire your printer every time you hooked it up to a new computer.

Finally, as people started connecting computers to each other or sharing data with other computers via magnetic media, it became really important to have the same encodings for characters.

ASCII invented

In 1968, the American National Standards Institute (ANSI) came out with its recommendation X3.4, which defined an ordered assignment of characters to numbers. This was also recognized internationally as ISO 646. Among other oddities, it specified that certain punctuation characters have lower numeric values than digits, digits have lower values than letters, and uppercase letters have lower values than lowercase letters. The encoding looks like this:

code  character            code  character
0 NUL 64 @
1 SOH 65 A
2 STX 66 B
3 ETX 67 C
4 EOT 68 D
5 ENQ 69 E
6 ACK 70 F
7 BEL '\a' 71 G
8 BS '\b' 72 H
9 HT '\t' 73 I
10 LF '\n' 74 J
11 VT '\v' 75 K
12 FF '\f' 76 L
13 CR '\r' 77 M
14 SO 78 N
15 SI 79 O
16 DLE 80 P
17 DC1 81 Q
18 DC2 82 R
19 DC3 83 S
20 DC4 84 T
21 NAK 85 U
22 SYN 86 V
23 ETB 87 W
24 CAN 88 X
25 EM 89 Y
26 SUB 90 Z
27 ESC 91 [
28 FS 92 \ '\\'
29 GS 93 ]
30 RS 94 ^
31 US 95 _
32 SPACE 96 `
33 ! 97 a
34 " 98 b
35 # 99 c
36 $ 100 d
37 % 101 e
38 & 102 f
39 ' 103 g
40 ( 104 h
41 ) 105 i
42 * 106 j
43 + 107 k
44 , 108 l
45 - 109 m
46 . 110 n
47 / 111 o
48 0 112 p
49 1 113 q
50 2 114 r
51 3 115 s
52 4 116 t
53 5 117 u
54 6 118 v
55 7 119 w
56 8 120 x
57 9 121 y
58 : 122 z
59 ; 123 {
60 < 124 |
61 = 125 }
62 > 126 ~
63 ? 127 DEL

Note in the table above that the codes 0 through 31, as well as 127, have special names like "BEL" and "DEL" instead of symbols. Those are unprintable control characters. For instance, BEL (7) is the "bell" character that causes the terminal to beep, and DEL (127) is the delete character.
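
Two handy consequences of this layout: each lowercase letter is exactly 32 above its uppercase partner, and the digits '0' through '9' are contiguous starting at 48. A minimal C sketch of both tricks (the standard <ctype.h> functions do the same jobs more portably):

    #include <stdio.h>

    int main(void) {
        /* 'a' - 'A' == 32, so changing case is just adding or
           subtracting 32. */
        char upper = 'G';
        char lower = upper + ('a' - 'A');
        printf("%c -> %c\n", upper, lower);          /* G -> g */

        /* The digits '0'..'9' occupy codes 48..57, so subtracting '0'
           turns a digit character into its numeric value. */
        char digit = '7';
        int value = digit - '0';
        printf("'%c' has value %d\n", digit, value); /* '7' has value 7 */
        return 0;
    }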

To probe further

Here's an interesting web page that has more information on this subject.