Syntax

Care should be taken, when using colons and semicolons in the same sentence, that the reader understands how far the force of each sign carries. —Robert Graves and Alan Hodge

Keywords

array   class   else      for    method   override   ref      to      var
break   def     enum      if     new      record     repeat   type    while
const   do      extends   loop   of       return     then     until

Reserved identifiers

boolean   false   nil
char      int     true

Operators

||   <    <=   +   -   {   }   ;    ,
&&   >    >=   *   /   (   )   :    .
!    ==   !=   ^   %   [   ]   :=   =

Comments

A comment is an arbitrary character sequence opened by /* and closed by */. Comments can be nested and can extend over more than one line.
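
Because comments nest, a scanner cannot stop at the first */ it sees; it has to track nesting depth. A minimal sketch in Python, where the function name skip_comment and its error handling are assumptions made for illustration and not part of the language definition:

```python
def skip_comment(text: str, pos: int) -> int:
    """Return the index just past the comment that starts at text[pos].

    Nesting is tracked with a depth counter, so an inner "/* ... */"
    pair does not close the outer comment.
    """
    if not text.startswith("/*", pos):
        raise ValueError("no comment starts at this position")
    depth = 0
    i = pos
    while i < len(text):
        if text.startswith("/*", i):
            depth += 1          # entering a (possibly nested) comment
            i += 2
        elif text.startswith("*/", i):
            depth -= 1          # leaving one nesting level
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    raise ValueError("unterminated comment")
```

For example, applied to "/* a /* b */ c */ x" at position 0, the function consumes everything up to and including the second */.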

Conventions for syntax

We use the following notation for defining syntax:

    X Y      X followed by Y
    X|Y      X or Y
    [X]      X or empty
    {X}      A possibly empty sequence of X's
    X&Y      X or Y or X Y

"Followed by" has greater binding power than | or &; parentheses are used to override this precedence rule. Non-terminals begin with an upper-case letter. Terminals are either keywords or quoted operators. The symbols Id, Number, TextLiteral, and CharLiteral are defined in the token grammar. Each production is terminated by a period.

Compilation unit productions

Compilation = {Decl} [ Block ].

Block       = "{" {Decl} { Stmt } "}".
Decl = const ConstDecl ";"
     | type TypeDecl ";"
     | var VariableDecl ";"
     | def Id Signature ( Block | ";" ).

ConstDecl      = Id [":" Type] "=" ConstExpr.
TypeDecl       = Id "=" Type.
VariableDecl   = IdList (":" Type & ":=" Expr).
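
The & in VariableDecl abbreviates three alternatives: a type alone, an initializer alone, or a type followed by an initializer. The following Python sketch spells out that expansion; the helper name expand_and is hypothetical, and the sample declarations in the final comment are illustrations derived from the grammar, not taken from the original text.

```python
def expand_and(x: str, y: str) -> list[str]:
    """Alternatives abbreviated by X&Y: X alone, Y alone, or X followed by Y."""
    return [x, y, f"{x} {y}"]


# Tail of VariableDecl = IdList (":" Type & ":=" Expr).
for tail in expand_and('":" Type', '":=" Expr'):
    print("IdList", tail)

# Illustrative declarations, one per alternative:
#   var x : int;     var x := 0;     var x : int := 0;
```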

Signature      = "(" Formals ")" [":" Type].
Formals        = [ Formal {";" Formal} [";"] ].
Formal         = [var] IdList ":" Type.

Statement productions

Stmt = AssignSt | Block | CallSt | BreakSt | ForSt 
     | IfSt | LoopSt | RepeatSt | ReturnSt | WhileSt.

AssignSt = Expr ":=" Expr ";".
CallSt   = Expr "(" [Actual {"," Actual}] ")" ";".
BreakSt  = break ";".
ForSt    = for Id ":=" Expr to Expr do Stmt.
IfSt     = if Expr then Stmt [ else Stmt ].
LoopSt   = loop Stmt.
RepeatSt = repeat Stmt until Expr ";".
ReturnSt = return [Expr] ";".
WhileSt  = while Expr do Stmt.

Actual   = Type | Expr .

Type productions

Type = TypeName | ArrayType | EnumType | RecordType | ObjectType | RefType.

ArrayType     = array [ "[" Expr "]" ] of Type.
EnumType      = enum "{" [ IdList ] "}".
RecordType    = record "{" Fields "}".
ObjectType    = class [ extends Type ] "{" Members "}".
RefType       = ref Type.

Fields    = [ Field {";" Field} [ ";" ] ].
Field     = IdList ":" Type.
Members   = [ Member {";" Member} [ ";" ] ].
Member    = Field | Method | Override.
Method    = Id Signature [":=" ConstExpr].
Override  = Id ":=" ConstExpr .

Expression productions

ConstExpr = Expr.

Expr = E1 {"||" E1}.
  E1 = E2 {"&&" E2}.
  E2 = {"!"} E3.
  E3 = E4 {Relop E4}.
  E4 = E5 {Addop E5}.
  E5 = E6 {Mulop E6}.
  E6 = {"+" | "-"} E7.
  E7 = E8 {Selector}.
  E8 = Id | Number | CharLiteral | TextLiteral | "(" Expr ")" | new Type.

Relop =  "==" | "!=" | "<"  | "<=" | ">" | ">=".
Addop =  "+" | "-".
Mulop =  "*" | "/" | "%".
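
The layering of Expr through E8 fixes operator precedence: || binds loosest, then &&, then !, then the relational operators, then the additive and multiplicative operators, with unary + and - binding tightest. The recursive-descent sketch below, in Python, mirrors one function per level. It is an illustration under assumptions: it covers only identifiers, decimal numbers, and parentheses, omits Selector and new, and the names tokenize, Parser, and parse as well as the tuple-shaped tree are not part of the language definition.

```python
import re

# Token pattern for the expression subset handled by this sketch only.
TOKEN = re.compile(
    r"\s*(\|\||&&|==|!=|<=|>=|[-+*/%<>!()]|[A-Za-z][A-Za-z0-9_]*|[0-9]+)"
)
RELOPS = {"==", "!=", "<", "<=", ">", ">="}


def tokenize(src):
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        match = TOKEN.match(src, pos)
        if match is None:
            raise SyntaxError(f"unexpected input near offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.toks, self.i = tokens, 0

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of expression")
        self.i += 1
        return tok

    def binary(self, operand, ops):
        """Parse operand {op operand}, folding the results to the left."""
        node = operand()
        while self.peek() in ops:
            op = self.take()
            node = (op, node, operand())
        return node

    def expr(self):                      # Expr = E1 {"||" E1}.
        return self.binary(self.e1, {"||"})

    def e1(self):                        # E1 = E2 {"&&" E2}.
        return self.binary(self.e2, {"&&"})

    def e2(self):                        # E2 = {"!"} E3.
        if self.peek() == "!":
            self.take()
            return ("!", self.e2())
        return self.e3()

    def e3(self):                        # E3 = E4 {Relop E4}.
        return self.binary(self.e4, RELOPS)

    def e4(self):                        # E4 = E5 {Addop E5}.
        return self.binary(self.e5, {"+", "-"})

    def e5(self):                        # E5 = E6 {Mulop E6}.
        return self.binary(self.e6, {"*", "/", "%"})

    def e6(self):                        # E6 = {"+" | "-"} E7.
        if self.peek() in {"+", "-"}:
            op = self.take()
            return (op, self.e6())
        return self.e8()                 # E7's selectors are omitted here

    def e8(self):                        # E8 subset: Id | Number | "(" Expr ")"
        tok = self.take()
        if tok == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok[0].isalnum():
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(src):
    parser = Parser(tokenize(src))
    node = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return node
```

With this sketch, "!a < b" parses as the negation of the comparison, because E2 sits above E3, while "-a * b" negates only a, because E6 sits below E5.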

Selector = "^"  |  "." Id  |  "[" Expr "]"
         | "(" [ Actual {"," Actual} ] ")".

Miscellaneous productions

IdList      =  Id {"," Id}.
TypeName    =  Id.

Token productions

To read a token, first skip all blanks, tabs, newlines, carriage returns, vertical tabs, form feeds, comments, and pragmas. Then read the longest sequence of characters that forms an operator, an Id, or a Literal.
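
For operators, the longest-match rule means that ":=" is read as one token and not as ":" followed by "=". A minimal Python sketch, assuming the operator list given earlier and a hypothetical helper named read_operator:

```python
# The operators listed in the Operators section; none is longer than two characters.
OPERATORS = frozenset(
    "|| < <= + - { } ; , && > >= * / ( ) : . ! == != ^ % [ ] := =".split()
)


def read_operator(text: str, pos: int):
    """Return the longest operator starting at text[pos], or None if there is none."""
    for length in (2, 1):                # try the longer candidate first
        candidate = text[pos:pos + length]
        if candidate in OPERATORS:
            return candidate
    return None
```

Trying two-character candidates before one-character ones is what implements "longest"; whitespace and comments are assumed to have been skipped already.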

An Id is a case-significant sequence of letters, digits, and underscores that begins with a letter. An Id is a keyword if it appears in the list of keywords, a reserved identifier if it appears in the list of reserved identifiers, and an ordinary identifier otherwise.
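
The three-way classification can be sketched as a pair of set lookups. The Python below copies the two lists from the sections above; the function name classify and the returned labels are assumptions made for illustration.

```python
KEYWORDS = frozenset(
    "array class else for method override ref to var "
    "break def enum if new record repeat type while "
    "const do extends loop of return then until".split()
)
RESERVED = frozenset("boolean false nil char int true".split())


def classify(ident: str) -> str:
    """Classify an Id; the comparison is case-significant, as the text requires."""
    if ident in KEYWORDS:
        return "keyword"
    if ident in RESERVED:
        return "reserved identifier"
    return "ordinary identifier"
```

Since Ids are case-significant, "while" is a keyword but "While" is an ordinary identifier.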

In the following grammar, terminals are characters surrounded by double-quotes and the terminal "\"" represents double-quote itself.

Id = Letter {Letter | Digit | "_"}.

Literal = Number | CharLiteral | TextLiteral.

CharLiteral = "'"  (PrintingChar | Escape | "\"") "'".

TextLiteral = "\"" {PrintingChar | Escape | "'"} "\"".

Escape = "\" "n"   | "\" "t"     | "\" "r"     | "\" "f"
       | "\" "\"   | "\" "'"     | "\" "\""
       | "\" OctalDigit OctalDigit OctalDigit.
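
The Escape production can be decoded with a small lookup table plus a three-digit octal case. The following Python sketch assumes the conventional meanings of the letter escapes (newline, tab, carriage return, form feed), which this section does not state explicitly; decode_escape is a hypothetical helper.

```python
# Assumed meanings of the single-character escapes.
SIMPLE_ESCAPES = {
    "n": "\n", "t": "\t", "r": "\r", "f": "\f",
    "\\": "\\", "'": "'", '"': '"',
}


def decode_escape(text: str, pos: int):
    """Decode the Escape starting at text[pos]; return (character, next position)."""
    if text[pos:pos + 1] != "\\":
        raise ValueError("an escape must start with a backslash")
    following = text[pos + 1:pos + 2]
    if following in SIMPLE_ESCAPES:
        return SIMPLE_ESCAPES[following], pos + 2
    digits = text[pos + 1:pos + 4]
    if len(digits) == 3 and all(d in "01234567" for d in digits):
        # Exactly three octal digits give the character code.
        return chr(int(digits, 8)), pos + 4
    raise ValueError("malformed escape")
```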

Number = Digit {Digit}
       | Digit {Digit} "_" HexDigit {HexDigit}.
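
The grammar gives only the shape of a Number. The sketch below additionally assumes that, in the second form, the digits before the underscore name a base in decimal and the digits after it are written in that base; this reading is consistent with the range 8_240..8_377 used for ExtendedChar, but it is an assumption, as is the 2..16 limit on the base. The name number_value is hypothetical.

```python
import re

# Shape of a Number as given by the grammar: digits, optionally "_" and hex digits.
NUMBER = re.compile(r"([0-9]+)(?:_([0-9A-Fa-f]+))?")


def number_value(lexeme: str) -> int:
    """Return the value of a Number lexeme under the based-literal assumption."""
    match = NUMBER.fullmatch(lexeme)
    if match is None:
        raise ValueError(f"not a Number: {lexeme!r}")
    first, digits = match.groups()
    if digits is None:
        return int(first)                # plain decimal form
    base = int(first)
    if not 2 <= base <= 16:              # assumed limit, not stated in the grammar
        raise ValueError(f"unsupported base {base}")
    return int(digits, base)             # raises ValueError if a digit exceeds the base
```

Under these assumptions, 8_377 and 16_FF both denote 255.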

PrintingChar = Letter | Digit | OtherChar.

HexDigit = Digit | "A" | "B" | "C" | "D" | "E" | "F"
                 | "a" | "b" | "c" | "d" | "e" | "f".

Digit = "0" | "1" | ... | "9".

OctalDigit = "0" | "1" | ... | "7".

Letter = "A"  | "B"  | ... | "Z"  | "a"  | "b"  | ... | "z".

OtherChar = " " | "!" | "#" | "$" | "%" | "&" | "(" | ")"
          | "*" | "+" | "," | "-" | "." | "/" | ":" | ";"
          | "<" | "=" | ">" | "?" | "@" | "[" | "]" | "^"
          | "_" | "`" | "{" | "|" | "}" | "~"
          | ExtendedChar.

ExtendedChar = any char with ISO-Latin-1 code in [8_240..8_377].
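
Reading 8_240 and 8_377 as octal values, the ExtendedChar range is decimal 160 through 255. A short Python check, relying on the fact that ISO-Latin-1 codes coincide with the first 256 Unicode code points; the function name is hypothetical.

```python
def is_extended_char(ch: str) -> bool:
    """True if ch is one character with a code in octal 240..377 (decimal 160..255)."""
    return len(ch) == 1 and 0o240 <= ord(ch) <= 0o377
```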