The Bit

In the study of information theory, it's often helpful to reduce complex representations of information to their lowest common denominator. That would be the bit--the smallest amount of information that can be represented. A bit is a single value that is either one or zero. This can be used to represent many things:

    one     or  zero
    on      or  off
    yes     or  no
    true    or  false
    present or  absent
    success or  failure
    up      or  down
    left    or  right
    onions  or  hold-the-onions

Obviously, a particular bit of information only has meaning in the context we put it in. If we're talking about a bit of information, we have to agree on what it represents in advance.

Bits can be combined to represent more complicated systems. For instance, if we wanted to represent a four-value direction system, we might do it like this:

    00:   up
    01:   down
    10:   left
    11:   right

Here, we're saying that the up condition is present only if the two bits are both zero. The left condition is present only if the first bit is one and the second bit is zero. We had to agree on what these bits meant in advance. If I hadn't told you what '01' was, you wouldn't have known that it meant 'down'.
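
In C, one way to write such an agreement down is with an enumeration. Here's a minimal sketch (the type and constant names are made up for this example):

    #include <stdio.h>

    /* The two-bit direction agreement from the table above, written
       as an enumeration. */
    enum direction { UP = 0, DOWN = 1, LEFT = 2, RIGHT = 3 };

    int main(void)
    {
      enum direction d = LEFT;

      /* Pick the two bits apart to show the agreed-upon encoding. */
      printf("LEFT is stored as the bits %d%d\n", (d >> 1) & 1, d & 1);
      return 0;
    }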

As we use more bits, we can represent more complicated information. Specifically, if we have N bits, we can always represent 2^N distinct values. If we have one bit, we can represent two values (e.g. on and off). In the example above, we have two bits, so we can represent 2^2 = 4 values. Some groupings of bits are so common that they're worth looking at individually:

The Nybble

A nybble is defined as a group of four bits. Using one nybble, we can represent 2^4 = 16 values. For instance, if we wanted to represent an arithmetic quantity, we might agree on the meaning of the bits in the following manner:

    Bits:   Value:
    =====   ======
    0000:       0
    0001:       1
    0010:       2
    0011:       3
    0100:       4
    0101:       5
    0110:       6
    0111:       7
    1000:       8
    1001:       9
    1010:      10
    1011:      11
    1100:      12
    1101:      13
    1110:      14
    1111:      15

Notice that there's some mathematical simplification possible here. If we let the rightmost bit represent 1, the next bit represent 2, the next bit represent 4 and the leftmost bit represent 8, we see that the value represented by the group is simply the sum of the values of the bits that are set to one. For instance, 1101 would mean 8+4+1 = 13.

We can generalize this by saying that the arithmetic value of a group of bits is always the sum of the values of its individual bits, where the bit in position N contributes 2^N if it is set. The rightmost bit is given the position number zero. The value 13 above is represented by:

    1*2^3 + 1*2^2 + 0*2^1 + 1*2^0
  = 1*8  + 1*4  + 0*2  + 1*1
  = 13
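
We can check this rule with a few lines of C. The sketch below (the function name is just for illustration) walks through a value's bits from position zero upward, adding the place value of every bit that is set:

    #include <stdio.h>

    /* Add up 2^N for every bit N that is set in 'bits'. */
    unsigned int value_of_bits(unsigned int bits)
    {
      unsigned int sum = 0;
      unsigned int place = 1;            /* 2^0, then 2^1, 2^2, ... */

      while (bits != 0) {
        if (bits & 1)                    /* is the bit in this place set? */
          sum += place;
        place *= 2;
        bits >>= 1;                      /* move on to the next bit */
      }
      return sum;
    }

    int main(void)
    {
      printf("%u\n", value_of_bits(0xD));  /* 1101 -> prints 13 */
      return 0;
    }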

Another common use of a nybble is to show a 4-bit value using one digit in the hexadecimal number system. In this system, there are 16 distinct numerals that go from 0 to 9 and then 'A' to 'F' like this:

    Binary  Decimal  Hexadecimal
    Value:  Value:      Value:
    ======  =======  ===========
    0000:       0          0
    0001:       1          1
    0010:       2          2
    0011:       3          3
    0100:       4          4
    0101:       5          5
    0110:       6          6
    0111:       7          7
    1000:       8          8
    1001:       9          9
    1010:      10          A
    1011:      11          B
    1100:      12          C
    1101:      13          D
    1110:      14          E
    1111:      15          F

This number system would be easy to understand if we all had 16 fingers on our hands. Mathematically, hexadecimal is really just the base-16 numbering system. We'll often represent this using parentheses around a number followed by a subscript 16, written here as _16. For instance, the decimal value (12)_10 is equivalent to the hexadecimal value (C)_16. Just keep in mind that this system is useful for representing a 4-bit group with only one numeral. This will be more important as we look at groups of bits that consist of multiple groups of 4 bits.
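
Because each hexadecimal numeral stands for exactly one nybble, converting a 4-bit value into its numeral can be a simple string lookup. A minimal sketch (the function name is made up for this example):

    /* Map a 4-bit value (0 through 15) to its hexadecimal numeral.
       hex_digit(12) returns 'C', for instance. */
    char hex_digit(unsigned int nybble)
    {
      return "0123456789ABCDEF"[nybble & 0xF];
    }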

Bytes

The near-universal grouping of bits is the byte--a group of 8 bits. We can use a byte to represent 2^8 = 256 values. Often, this is an arithmetic value ranging from 0 up to 255.

A byte is also the traditional unit for representing a single ASCII character. The ASCII representation is something that nearly everyone in the world has agreed on.

A byte can always be expressed as two nybbles using the hexadecimal number representation.

Let's look at the three different meanings of a single byte 01101010:

    (01101010)_2 = 2^6 + 2^5 + 2^3 + 2^1
                 = (64)_10 + (32)_10 + (8)_10 + (2)_10
                 = (106)_10

    (01101010)_2 = 0110 1010 = (6A)_16

    ...and if you'd care to look it up, you'd find that the decimal value
    106 corresponds to the ASCII character 'j'.

Notice that it's a LOT easier to convert from binary to hexadecimal than it is to convert to decimal.
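
We can ask printf to show all three interpretations of the same byte, which makes a handy sanity check:

    #include <stdio.h>

    int main(void)
    {
      unsigned char byte = 0x6A;           /* the bits 01101010 */

      printf("decimal:     %d\n", byte);   /* prints 106 */
      printf("hexadecimal: %X\n", byte);   /* prints 6A  */
      printf("character:   %c\n", byte);   /* prints j   */
      return 0;
    }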

Larger Values: Integers

What if we want to represent even larger arithmetic values?

We can simply extend the number of bits that we use to be able to represent larger and larger quantities! There are some generally accepted groupings of 16, 32 and 64 bits that can represent 2^16 = 65536, 2^32 = 4294967296 and 2^64 = 18446744073709551616 values respectively.

You usually only want to use as many bits as you absolutely need, so we should put these numbers in context. 2^32 is about 4.3 billion--roughly half the human population of Earth. 2^64 is roughly the number of atoms in 6 mg of gold. (Doesn't seem so big any more, does it?)

What about negative numbers?

We can represent negative and positive values with bits fairly easily. As always, we simply need to agree on the representation. You might think an easy way to do this would be to use one of the bits to indicate whether the integer is positive or negative, and that would be a good first idea. Early computer designers did exactly that (a scheme known as sign-magnitude) to represent signed integers.

Not too much time went by before the computer designers realized that it was easier for the computer's arithmetic logic to represent signed integers using two's complement notation. There are a lot of ways to explain how and why this is done; you'll definitely learn about it in a digital logic design class. For now, we'll just say that two's complement notation means that the highest-order bit carries a negative value. For instance, if we let a byte represent an integer in two's complement notation, the bit values would look like this:

    Bit:       7          6        5        4        3       2       1       0
    Value:    -2^7        2^6      2^5      2^4      2^3     2^2     2^1     2^0
    Decimal:  (-128)_10   (64)_10  (32)_10  (16)_10  (8)_10  (4)_10  (2)_10  (1)_10

So the binary value (10101001)_2 = (A9)_16 would have a signed decimal value of -128 + 32 + 8 + 1 = (-87)_10. If we had interpreted it as an unsigned value, it would have been 128 + 32 + 8 + 1 = (169)_10. As always, we must agree on what the bits are supposed to represent before we try to interpret them.
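
We can watch this in C by storing the same bit pattern in an unsigned and a signed variable. (Strictly speaking, what a signed char does with a pattern whose high bit is set is implementation-defined, but on the two's complement machines we use, it behaves as described above.)

    #include <stdio.h>

    int main(void)
    {
      unsigned char u = 0xA9;               /* 10101001 taken as unsigned */
      signed char   s = (signed char)0xA9;  /* the same bits, taken as signed */

      printf("unsigned: %d\n", u);   /* prints 169 */
      printf("signed:   %d\n", s);   /* prints -87 on our machines */
      return 0;
    }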

Incidentally, the maximum positive value in a signed representation is always about half of what could be represented in an unsigned representation of the same size: 2^(N-1) - 1 rather than 2^N - 1 for N bits. That's because the most significant bit is now the negative placeholder.

What does this have to do with C?

C has several built-in types that can be used to represent arithmetic values. Their (typical) sizes are shown below:

    char       1 byte    8 bits
    short      2 bytes   16 bits
    int        4 bytes   32 bits
    long       4 bytes   32 bits
    long long  8 bytes   64 bits

They can be either signed or unsigned depending on what we're trying to represent. For instance, if you defined a variable like:

    signed int X;

you could assign a value ranging from -2147483648 up to 2147483647. Similarly, a signed char could hold a value ranging from -128 to 127 and a signed short could hold a value ranging from -32768 to 32767.
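
Rather than memorizing these ranges, we can ask for them through the macros in the standard header <limits.h>:

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
      /* Print the range of each signed type on this machine. */
      printf("signed char:  %d to %d\n", SCHAR_MIN, SCHAR_MAX);
      printf("signed short: %d to %d\n", SHRT_MIN,  SHRT_MAX);
      printf("signed int:   %d to %d\n", INT_MIN,   INT_MAX);
      return 0;
    }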

We can experiment with the bits in an integer by using C's hexadecimal notation together with printf's format specifiers, like this:

    #include <stdio.h>

    void try_this(void)
    {
      /* 0xABCD is 43981, which doesn't fit in a signed 16-bit short,
         so the value actually stored is implementation-defined
         (typically -21555 on our two's complement machines). */
      signed short X = 0xABCD;

      printf("The value of hexadecimal ABCD is %d.\n", X);
      /* Cast to unsigned short so the sign bit isn't extended when
         X is promoted for printing. */
      printf("The hexadecimal value can be printed as: 0x%04X\n",
             (unsigned short)X);
    }

One more pitfall...

Although everyone in the world agrees on the order in which bits go into bytes (e.g. the rightmost bit is number 0), not everyone agrees on the order of bytes in integers. In particular, the two types of machines we use store the bytes in opposite orders. For instance, if we assigned a hexadecimal value to an integer:

   unsigned int X = 0x1234ABCD;

and wrote it to a file, the bytes in the file would be (12)_16, (34)_16, (AB)_16, (CD)_16 if we wrote them using a Sun (or Macintosh or HP) system but would be (CD)_16, (AB)_16, (34)_16, (12)_16 if we used an Intel, AMD or Alpha system. This order of bytes in an integer is called the endian-ness of an architecture: machines like the Sun that store the most significant byte first are called big-endian, while machines like the Intel that store the least significant byte first are called little-endian. We'll have to keep this in mind whenever we deal directly with the bits inside of a multi-byte integer. Other than that, we don't particularly have to pay close attention to it.
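
If you'd like to see which kind of machine you're on, one quick sketch is to store a known value in an integer and then examine its lowest-addressed byte through a character pointer:

    #include <stdio.h>

    int main(void)
    {
      unsigned int X = 0x1234ABCD;
      unsigned char *p = (unsigned char *)&X;

      /* p[0] is the byte of X stored at the lowest address. */
      if (p[0] == 0xCD)
        printf("little-endian (Intel style)\n");
      else
        printf("big-endian (Sun style)\n");
      return 0;
    }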