Chapter 6 Error Handling

Error handling and other UX improvements

  • parser
    • find more errors (e.g. literals: number out of bounds)
    • source pos info in tokenizer
      • better error messages
      • semicolon inference
      • comments
    • error recovery
  • semantic analysis
    • check for undefined names
    • type mismatch (later!)

A key aspect of the user experience of a compiler is dealing with errors.

Sometimes the compiler just can’t translate a program (parsing fails, codegen fails due to unbound variable). In such cases we want the compiler to report more information that will help programmers to fix the error.

The purpose of a compiler is also to check for (certain classes of) errors in source program, so that those don’t manifest at runtime.

6.1 Improving the Parser

We have been focusing on how to accept valid programs. What can be improved when we need to reject invalid programs?

  • Report more information. Such as: line number, give some hints
  • Try to recover and continue parsing.
  • Find more errors, e.g., overflow of number literals

6.1.1 Integer Literals Out of Bounds

For multi-digit numbers

6.1.2 Source Position Information

Extend token interface:

case class Position(line: Int, col: Int)
case class SourceInfo(file: String, gap: Position, start: Position, end: Position)

abstract class Token {
  var pos: SourceInfo = _
}

Extend scanner logic:

class Scanner(in: Reader[Char]) extends Reader[Token] {
  def pos = in.pos // obtain from character input
  var line = 0
  var lineStarts = scala.collection.mutable.ArrayBuffer(0)
  def column = pos - lineStarts(line)
  ...
  def skipWhiteSpace() = {
    while (in.hasNext(isWhiteSpace)) { // skip white space
      if (in.peek == '\n') {
        lineStarts += pos + 1
        line += 1
      }
      in.next()
    }
  }
  def getToken(): Token = {
    val gap = Position(line,column)
    skipWhiteSpace()
    val start = Position(line,column)
    val tok = getRawToken()
    val end = Position(line,column)
    tok.pos = SourceInfo(fileName, gap, start, end)
    tok
  }
}

Todo: what about parser logic for positioned trees?

6.1.3 Semicolon inference

Right now we can parse

val x = 3+4; x*2

But not

val x = 3+4
x*2

What can we do? Accept newlines in places of semicolons.

def skippedNewline =
  in.peek.pos.start.line != in.peek.pos.gap.line

def follows(c: Char) =
  if (in.hasNext(_ == Delim(c))) { in.next(); true }
  else if (c == ';' && skippedNewline) true // <-- new
  else false

We want to be careful though. Consider:

val x = 3+4
-x*2

We would not want to parse this as val x = 3+4-x*2

So we’ll also prevent skipping line breaks in certain places:

def binop(min: Int): Exp = {
  var res = atom()
  while (in.hasNext(isInfixOp(min)) && !skippedNewline) { // <-- new
    val op = name()
    val nextMin = prec(op) + assoc(op) // + 1 for left assoc
    val rhs = binop(nextMin)
    res = Prim(op, List(res, rhs))
  }
  res
}

6.1.4 Comments

As part of skipping whitespace. Another character of lookahead.

def isCommentStart(c1: Char, c2: Char) = c1 == '/' && c2 == '/'

def skipWhiteSpace() = {
  while (in.hasNext(isWhiteSpace) || in.hasNext2(isCommentStart)) { // skip white space
    if (in.peek == '/') {
      in.next()
      while (in.peek != '\n' && in.peek != in.eof)
        in.next()
    }
    ...
  }
}

Extended reader interface:

var peek  = getToken()
var peek1 = getToken() // <-- new
def hasNext: Boolean = peek != EOF
def hasNext(f: Token => Boolean) = f(peek)
def hasNext2(f: (Token,Token) => Boolean) = f(peek,peek1) // <-- new
def next() = try peek finally { peek = peek1; peek1 = getToken(); }

Todo: nested comments /* ... */

6.1.5 Better Error Messages

Implementation:

def input: String = ...

def getLine(i: Int) = {
  val start = lineStarts(i)
  val end = input.indexOf('\n',start) // we may not have seen the end yet
  if (end < 0) input.substring(start) else input.substring(start,end)
}
def showSource(p: SourceInfo) = {
  val width = if (p.end.line == p.start.line) (p.end.col - p.start.col) else 0
  val line1 = getLine(p.start.line)
  val line2 = " "*p.start.col + "^"*(width min 1)
  line1 + '\n' + line2
}

def getRawToken(): Token = {
  if (in.hasNext(isAlpha)) {
    ...
  } else if (...) {
    ...
  } else {
    abort(s"Unexpected character: ${in.peek} at $line,$column\n"+showSource(Position(line,column,line,column,line,column+1)))
  }
}

Result: nice error messages (example)

Todo:error reporter interface

Todo:example

6.1.6 Error Recovery

Skip until sync points. In particular: next statement or closing paren/brace. Key: track indentation in scanner. Skip to closing brace, which should be indented at least as much as current line.

Todo

6.1.7 Indentation-Sensitive Syntax

Pioneered by Python, adopted (to an extent) in Haskell and F#. Has been proposed for Scala, too. Motivation here: if we rely on indentation to determine where braces should go we might just skip the braces altogether.

Todo

6.2 Semantic Analysis

Let’s consider other error cases. What happens if we refer to a variable that hasn’t been defined? Example:

Prim("+", List(Var("x"), Lit(2))) // x + 2, without enclosing val x = ...

Both the interpreter and the compiler fail with an exception.

It’s a bit annoying that the compiler will fail in such a way during code generation. It would be better to have a nice error message.

Of course we could insert proper checking logic into the code generation phase, but as we add more things to check for and as the code generation logic becomes more complex it’s increasingly easy to miss important error cases. Thus it makes sense to separate concerns and extract these checks into a separate phase.

This way, the code generation logic can assume that expressions are all well formed and we can have a policy to treat exceptions in later phases of the compiler as compiler bugs.

6.2.1 Analysis Phase in Code

def check(e: Exp)(implicit env: Set[String]): Unit = e match {
  case Lit(x) =>
  case Var(x) =>
    if (!env.contains(x))
      error(s"Unbound variable $x")
  case Prim(op, xs) =>
    if (!isOperator(op, xs.length))
      error(s"undefined operator $op", exp.pos)
    xs.foreach(check)
  case Let(x,rhs,body) =>
    check(rhs)
    check(body)(env + x)
}

Note: this will be the basis for type checking. Some people would argue that undefined variables are a (context-sensitive) syntactic well-formedness property, though I tend to view name resolution as an aspect of semantics

6.2.2 Soundness Property

Ideally the compiler should report proper error messages for all programs that may cause the corresponding interpreter to fail at runtime.

This is a lofty goal (unless we trivially reject all programs), and not practically feasible for a full language, so one typically accepts certain relatively well-defined errors at runtime. Example: divide by zero. Think about what would be necessary to prevent this kind of error statically in a full language like Scala.

Still, the more errors we can prevent statically the better.

6.2.3 Example

val x = z; ++z
Let("x", Ref("z"), Unary("++", Ref("z")))
  • 3 errors
  • 2 duplicates
  • Finding one error does not require the algorithm to stop

6.3 Lecture Material

6.3.1 Current Grammar (extended from last lecture)

<num>   ::= [0-9]+
<ident> ::= (a-zA-Z)[a-zA-Z0-9]*
<op>    ::= ['+' | '-' | '*' | '/']+
<atom>  ::= <num>
          | <ident>
          | '('<simp>')'
<simp>  ::= <atom>[<op><atom>]*
<exp>   ::= <simp>
          |'val' <ident> '=' <simp>';' <exp>

6.3.2 Quiz

What kind of error(s)?

val x = z; val 1 = 2; x
val x = 1++3; val y = x & 1
val x = 1; val x = 3; x + x
Answer:

 - Syntax error: identifier expected got 1.
   Semantic error: undefined identifier 'z'
 - Syntax error: unexpected character '&'.
   Semantic error: undefined operator '++'
 - We need more information to conclude. No errors based on previous
   lecture's interpreter. We could choose to forbid redefining variables.

6.3.3 Error Handling

  • The parser handles the syntax errors.
Error: identifier expected.
1:16: val x = z; val 1 = 2; x
                     ^
  • The semantic analyzer handles the semantic errors.
Error: undefined reference to z.
1:9: val x = z; val y = 2; x
             ^
1 error found
  • After these two phases, any problem is a compiler bug!

6.3.4 Syntax Error Handling

What to do when we find a syntax error?

Different solutions:

  • Report the error and exit
  • Try to recover and continue

For the project we are going to use the first method. But there are some algorithms which exist for error repair.

6.3.5 Syntax Error Repair

val 1 = 2; 4

After ‘val’ we expect an identifier and find a number. We can raise an error, then create a ‘dummy’ identifier and continue parsing.

Let("error$1", Lit(2), Lit(4))

Other algorithms exist: Burke-Fisher error repair is one of them.

6.4 Where Are We?

We looked a little bit deeper in error handling. We made distinction between syntax errors and semantic errors. We also saw how to make our compiler more human friendly.