Chapter 1 Introduction

Key concepts:

  • representing programs as data
  • program equivalence and program evaluation
  • converting an interpreter into a compiler by recording each operation it performs
  • expressing things using more basic things

1.1 Compilers and Interpreters

What is a compiler? You have most certainly used one.

A compiler translates programs from one language to another. Usually from one that is good for humans to one that is good for machines. Hopefully, making efficient use of hardware resources (optimize: latency, throughput, energy).

An interpreter executes programs in a given language.

The difference is not 100% obvious. Compilers can be chained. At the end, there is always an interpreter, possibly implemented as circuit gates in hardware. In modern hardware platforms, even the machine language is an abstraction, and CPUs contain sophisticated just-in-time compilers.

Key: an interpreter or compiler is just another program, that treats other programs as data.

1.2 Programs as Data

The same program can be represented in different ways. Source code: unstructured list of characters. Machine code: list of machine instructions. Internally: trees or graphs.

A compiler needs to switch between these representations, and pick internal formats for various tasks that precisely capture the necessary information, without additional cruft.

Internal languages are mostly based on trees or graphs.

How to choose? Depending on purpose. Typical for a compiler: pick “the most natural” as a starting point.

  • Unstructured list of characters: source code
    "1+2*3"
  • Trees or graphs: internal representation
    Plus(1, Times(2, 3))
  • List of machine instructions
    movq  $2, %rax
    imulq $3, %rax
    addq  $1, %rax

What makes data a program is a function that assigns meaning – an interpreter.

1.3 Program Evaluation and Program Equivalence

Two fundamental notions. Analogy to number representations: there is only one number three, but it can be written as “three”, 3, III, 11. These representations are useful for different purposes. Likewise, the same computer program can be represented in different ways. Translating from one representation into another is what compilers do.

Program evaluation assigns meaning to a given representation (=language). It defines what is the result of executing a program. We write eval_A[p] to denote the result of evaluation program p, given in language A.

Program equivalence: programs p in language A and q in language B are equivalent if executing them produces equivalent outcomes (E_A[p] == E_B[q]).

Program equivalence is the key property a compiler needs to preserve. Source program and target program should do the same thing.

Note that equivalence of outcomes is a somewhat fuzzy notion. In general this is an abstraction of the exact physical behavior. If we want to be totally rigorous, we can define precisely what potential outcomes are and what equivalence means. In particular: side effects like input and output, interacting with the environment.

Note/Example: In general we want compilers to produce code that runs faster. Hence, the timing behavior will be different. But for example in security-critical code timing information might be a side channel that can be exploited for attacks. So in this case, we might want a compiler to preserve timing information. If we want to make this precise, we need to provide evaluation models of the source and target languages that include timing information in their outcomes and then prove formally or at least experimentally that the compiler preserves this information.

Another practical question is whether a compiler is allowed to optimize away an infinite loop. This would change a program that runs forever into one that stops.

1.4 History of Compilers

Early computers executed fixed computations, or computations had to be configured externally. The idea of stored-program computers was central to enable treating code as data (Zuse, von Neumann, Turing). First assembler, first compilers. Who compiled the first compiler? Bootstrap idea.