SML Style Guide
As with any large programming exercise, common sense dictates that
programs should be written to facilitate ease of comprehension on the
part of the reader. This means that variable names should be
descriptive, indentation should reflect program structure, and code
should be generously documented. There are a number of syntactic
conventions used in writing SML code; those unfamiliar with these
conventions should use the Ullman book or Harper's notes for
clarification.
Listed below are some stylistic rules and suggestions for SML that might help
you in writing and maintaining code developed for the project.
General Style
80 Column Limit. No line of code may have more than
80 columns.
No Tab Characters. Do not use the tab character
(0x09). Instead, use spaces to control indenting. This is because the width of a
tab is not uniform across all computers, and what looks good on your machine
may look terrible on mine, especially if you have mixed spaces and tabs.
Comments
Comments Go Above the Code They Reference, as in the
following example:
(* Sums a list of integers. *)
val sum = foldl (op +) 0
Avoid Useless Comments. Avoid comments that merely repeat
the code they reference or state the obvious. Comments should state the invariants, the non-obvious, or
any references that have more information about the code.
Avoid Over-commenting. Very many or very long comments
in the code body are more distracting than helpful. Long comments
should appear at the top of a file
to explain the overall design of the code or refer to any
sources that have more information about the algorithms or data
structures. All other comments in the file should be as short as
possible. A good place for a comment is just before a function declaration.
Judicious choice of variable names can help minimize the need for comments.
Line Breaks. Empty lines should only be included
between value declarations within a struct block, especially
between function declarations. It is not necessary to put empty
lines between other declarations unless you are separating the different
types of declarations (such as structures, types, exceptions and
values). Unless function declarations within a let block
are long, there should be no empty lines within a let block. There
should never be an empty line within an expression.
Multi-line Commenting. When comments are
printed on paper, the reader lacks the advantage of color highlighting
performed by an editor such as Emacs. Multiline comments can be
distinguished from code by preceding each line of the comment with a * similar to the following:
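(* This is one of those rare but long comments
 * that need to span multiple lines because
 * the code is unusually complex and requires
 * extra explanation. *)
fun complicatedFunction () = ...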
Naming and Declarations
Naming Conventions. The best way to tell at a glance something about the type of a variable
is to use the standard SML naming conventions.
The following are the preferred rules that are followed by the SML basis and SML/NJ libraries:
Token | SML Naming Convention | Example
Variables | Symbolic or initial lower case. Use embedded caps for multiword names. | getItem
Constructors | Initial upper case. Use embedded caps for multiword names. Historic exceptions are nil, true, and false. Rarely are symbolic names like :: used. | Node, EmptyQueue
Types | All lower case. Use underscores for multiword names. | priority_queue
Signatures | All upper case. Use underscores for multiword names. | PRIORITY_QUEUE
Structures | Initial upper case. Use embedded caps for multiword names. | PriorityQueue
Functors | Same as structure convention, except Fn completes the name. | PriorityQueueFn
These conventions are not enforced by the compiler, though
violations of the variable/constructor conventions ought to cause warning
messages because of the danger of a constructor turning into a variable when
it is misspelled.
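To make the danger of a misspelled constructor concrete, here is a minimal sketch. The datatype color and both functions are hypothetical names chosen for illustration. Because Gren is not a constructor, the compiler treats it as a fresh variable that matches every value, and the code compiles without complaint.

```sml
datatype color = Red | Green | Blue

(* Intended: true only for Green. The misspelled Gren is not a
 * constructor, so it is a variable pattern that matches every color. *)
fun isGreenBuggy (c : color) : bool =
  (case c of
       Gren => true)

(* Correct spelling: Green is a constructor, so the other colors
 * fall through to the wildcard. *)
fun isGreen (c : color) : bool =
  (case c of
       Green => true
     | _ => false)

val buggyOnRed = isGreenBuggy Red   (* the bug: Red is reported as green *)
val fixedOnRed = isGreen Red
```

Following the capitalization conventions makes this kind of slip easier to spot when reading the code.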
Use Meaningful Names. Another way of conveying information is to use meaningful variable names that
reflect their intended use. Choose words or combinations of words
describing the value. Variable names may be one letter in short let
blocks. Functions used in a fold, filter,
or map are often bound to the name f. Here is an example for short
variable names:
let
  val d = Date.fromTimeLocal(Time.now())
  val m = Date.minute d
  val s = Date.second d
  fun f n = (n mod 3) = 0
in
  List.filter f [m,s]
end
Type Annotations. Top-level functions and values
should always be declared with types. Consider the difference between the following:
fun foo x = x+1
fun foo(x:int):int = x+1
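One practical reason for annotations is that the arithmetic operators are overloaded and default to int when nothing else constrains them. The sketch below uses hypothetical function names to show how an annotation both selects the intended type and documents it.

```sml
(* Without an annotation the overloaded * defaults to int,
 * so this function has type int -> int. *)
fun square x = x * x

(* The annotation selects the real version of * and states the intent. *)
fun squareReal (x : real) : real = x * x

val nine = square 3
val quarter = squareReal 0.5
```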
Avoid Global Mutable Variables. Mutable values should be local to closures and almost never declared as a structure's
value. Global mutable values cause many problems.
First, it is difficult to ensure that the mutable value is in the proper
state, since it might have been modified outside the
function or by a previous execution of the algorithm. This is
especially problematic with concurrent threads. Second, and more
importantly, having global mutable values makes it more likely that your
code is nonreentrant. Without proper knowledge of the ramifications,
declaring global mutable values can extend beyond bad style to incorrect
code.
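As a minimal sketch of keeping mutable state local to a closure, consider a counter. The name makeCounter is hypothetical. Each call allocates a fresh ref cell that only the returned function can reach.

```sml
(* Each call to makeCounter allocates its own ref cell, so counters
 * cannot interfere with one another or with other code. *)
fun makeCounter () : unit -> int =
  let
    val count = ref 0
  in
    fn () => (count := !count + 1; !count)
  end

val next = makeCounter ()
val other = makeCounter ()
val a = next ()
val b = next ()
val c = other ()   (* independent of next *)
```

Here a and b are 1 and 2, while c is 1 because other has its own state.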
When to Rename Variables. You should rarely need
to rename values; in fact, doing so is a sure way to obfuscate code.
Renaming a value should be backed up with a very good reason. One instance
where renaming a variable is common and encouraged is when aliasing structures.
In these cases, other structures used by functions within the current
structure are aliased to one or two letter variables at the top of the struct
block. This serves two purposes: it shortens the name of the structure and
it documents the structures you use. Here is an example:
struct
  structure H = HashTable
  structure T = TextIO
  structure A = Array
  ...
end
Order of Declarations in a Structure. When
declaring elements in a structure, you should first alias the structures you intend
to use, followed by the types, followed by exceptions, and lastly list all
the value declarations for the structure. Here is an example:
struct
  structure L = List
  type foo = unit
  exception InternalError
  fun first list = L.nth(list,0)
end
Every declaration within the structure should be indented the same
amount.
Structures, Signatures, and Functors
Every structure should have a named signature, and all signatures associated
with a given module should be found in a separate *.sig file.
Use opaque signatures to hide details about the underlying structure implementation.
For example,
signature STACK =
sig
  type 'a t
  val create : int * 'a -> 'a t
  val push : 'a * 'a t -> unit
  val pop : 'a t -> 'a
end
structure Stack :> STACK =
struct
  type 'a t = 'a array
  val top = ref ~1
  fun create(n,i) = Array.array(n,i)
  fun push(n,s) = (top := !top + 1; Array.update(s,!top,n))
  fun pop(s) = ...
end
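Note that the structure above keeps top at the structure level, so every stack shares it. As a complementary sketch (not the guide's own code), here is a purely functional stack under an opaque signature. The names FSTACK and FStack are assumptions made for this illustration. Clients cannot see that the representation is a list.

```sml
signature FSTACK =
sig
  type 'a t
  val empty : 'a t
  val push : 'a * 'a t -> 'a t
  val pop : 'a t -> ('a * 'a t) option
end

structure FStack :> FSTACK =
struct
  (* Hidden representation: clients cannot tell this is a list. *)
  type 'a t = 'a list
  val empty = []
  fun push (x, s) = x :: s
  fun pop s =
    (case s of
         [] => NONE
       | x :: rest => SOME (x, rest))
end

val s = FStack.push (2, FStack.push (1, FStack.empty))
val top = (case FStack.pop s of
               SOME (x, _) => x
             | NONE => 0)
```

Because the ascription is opaque, an expression such as List.length s is rejected by the type checker, which leaves the implementation free to change.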
Use functors to share implementations and to provide improved modularity and
type-checking:
functor Memory (structure Stack: STACK): MEMORY =
struct
  structure Stack = Stack
  type t = ...
  fun gc(s : 'a Stack.t) = ...
end
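For a complete, runnable illustration of the same idea, here is a small sketch. The signature ORDERED, the functor MaxFn, and the structure IntOrd are hypothetical names. The functor body is type-checked once against ORDERED and can then be applied to any matching structure.

```sml
signature ORDERED =
sig
  type t
  val compare : t * t -> order
end

(* Written once against ORDERED; works for any structure that matches. *)
functor MaxFn (structure O : ORDERED) =
struct
  fun max (a : O.t, b : O.t) : O.t =
    (case O.compare (a, b) of
         LESS => b
       | _ => a)
end

(* Transparent ascription keeps t = int visible to clients. *)
structure IntOrd : ORDERED =
struct
  type t = int
  val compare = Int.compare
end

structure IntMax = MaxFn (structure O = IntOrd)
val seven = IntMax.max (3, 7)
```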
Indenting
Indent by two or three spaces. Be consistent.
Long expressions can be broken up and the parts
aligned, as in the second example. Either is acceptable.
val x = "Long line..."^
  "Another long line."
val x = "Long line..."^
        "Another long line."
Case expressions should be indented as follows:
case expr of
  pat1 => ...
| pat2 => ...
If expressions should be indented according to one of
the following schemes:
if exp1 then exp2              if exp1 then
else if exp3 then exp4           exp2
else if exp5 then exp6         else exp3
else exp8
if exp1 then exp2 else exp3    if exp1 then exp2
                               else exp3
Comments should be indented to
the level of the line of code that follows the comment.
Parentheses
Over Parenthesizing. Parentheses have
many semantic purposes in ML, including constructing tuples, grouping
sequences of side-effect expressions, forcing a non-default parse of an
expression, and grouping structures for functor arguments.
Their usage is very different from C or Java. Avoid
unnecessary
parentheses when their presence makes your code harder to understand.
Case expressions. Wrap case
expressions with parentheses.
This avoids a common error involving
nested case expressions. If the case
expression is already wrapped by a let...in...end
block, you can drop the parentheses.
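The nested-case error mentioned above can be sketched as follows. The function describe is a hypothetical example. A case expression extends as far to the right as possible, so without the inner parentheses the final branch would attach to the inner case instead of the outer one.

```sml
(* The parentheses end the inner case. Without them, the final
 * NONE branch would be parsed as a third branch of the inner case
 * rather than as the second branch of the outer one. *)
fun describe (x : int option, y : int option) : string =
  (case x of
       SOME _ => (case y of
                      SOME _ => "both"
                    | NONE => "x only")
     | NONE => "no x")

val d1 = describe (SOME 1, NONE)
val d2 = describe (NONE, SOME 2)
```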
Alternative Block Styles. Blocks of code such as let...in...end,
struct...end, and sig...end should be
indented as follows. There are several alternative styles to choose
from.
fun foo bar =            fun foo bar =            fun foo bar = let
  let                      let val p = 4            val p = 4
    val p = 4                  val q = 38           val q = 38
    val q = 38             in                     in
  in                         bar * (p + q)          bar * (p + q)
    bar * (p + q)          end                    end
  end
Pattern Matching
No Incomplete Pattern Matches. Incomplete pattern
matches are flagged with compiler warnings, which are tantamount to errors
for grading purposes. Thus, if your program exhibits this behavior, the problem
will get no points.
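As a small sketch of an exhaustive match, consider the hypothetical datatype shape below. Every constructor has a branch. Removing one (say, Rect) would produce a nonexhaustive-match warning at compile time and a Match exception at run time for the missing case.

```sml
datatype shape = Circle of real | Square of real | Rect of real * real

(* One branch per constructor, so the match is exhaustive. *)
fun area (s : shape) : real =
  (case s of
       Circle r => Math.pi * r * r
     | Square w => w * w
     | Rect (w, h) => w * h)

val four = area (Square 2.0)
val six = area (Rect (2.0, 3.0))
```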
Pattern Match in the Function Arguments When
Possible. Tuples, records and datatypes can be deconstructed using pattern
matching. If you simply deconstruct the function argument before you
do anything useful, it is better to pattern match in the function argument.
Consider these examples:
Bad
fun f arg1 arg2 = let
  val x = #1 arg1
  val y = #2 arg1
  val z = #1 arg2
in
  ...
end
Good
fun f (x,y) (z,_) = ...
Bad
fun f arg1 = let
  val x = #foo arg1
  val y = #bar arg1
  val baz = #baz arg1
in
  ...
end
Good
fun f {foo=x, bar=y, baz} = ...
Function Arguments Should Not Use Values for Patterns.
You should only deconstruct values with variable names and/or wildcards in
function arguments. If you want to pattern match against a specific
value, use a case expression or an if expression. We
include this rule because there are too many errors that can occur when you
don't do this exactly right. Thus of the following two examples, you
should use the latter:
fun fact 0 = 1
  | fact n = n * fact(n-1)
fun fact n =
  if n=0 then 1
  else n * fact(n-1)
Avoid Unnecessary Projections. Prefer pattern
matching to projections in function arguments and
value declarations. Using projections is okay as long as it
is infrequent and the meaning is clearly understood from the context.
The above rule shows how to pattern-match in the function arguments.
Here is an example for pattern matching with value declarations.
Bad
let
  val v = someFunction()
  val x = #1 v
  val y = #2 v
in
  x+y
end
Good
let
  val (x,y) = someFunction()
in
  x+y
end
Combine nested case Expressions. Rather than nest case expressions, you can combine
them by pattern matching against a tuple, provided the tests in the case
expressions are independent. Here is an example:
Bad
let
  val d = Date.fromTimeLocal(Time.now())
in
  case Date.month d of
    Date.Jan => (case Date.day d of
                   1 => print "Happy New Year"
                 | _ => ())
  | Date.Jul => (case Date.day d of
                   4 => print "Happy Independence Day"
                 | _ => ())
  | Date.Oct => (case Date.day d of
                   10 => print "Happy Metric Day"
                 | _ => ())
  | _ => ()
end
Good
let
  val d = Date.fromTimeLocal(Time.now())
in
  case (Date.month d, Date.day d) of
    (Date.Jan, 1) => print "Happy New Year"
  | (Date.Jul, 4) => print "Happy Independence Day"
  | (Date.Oct, 10) => print "Happy Metric Day"
  | _ => ()
end
Avoid the use of valOf, hd, or tl.
The functions valOf, hd, and tl are used to
deconstruct option types and list types. However, they raise exceptions on
certain inputs. You should avoid these functions altogether. It
is usually easy to achieve the same effect with pattern matching. If you cannot
manage to avoid them, you should handle any exceptions that
they might raise.
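A minimal sketch of replacing these functions with pattern matching follows. Both function names are hypothetical. The empty and NONE cases are handled explicitly, so no exception can be raised.

```sml
(* hd and tl raise Empty on []; this version cannot fail. *)
fun sumFirstTwo (xs : int list) : int =
  (case xs of
       x :: y :: _ => x + y
     | [x] => x
     | [] => 0)

(* valOf raises Option on NONE; matching makes the default explicit. *)
fun lengthOrZero (s : string option) : int =
  (case s of
       SOME str => String.size str
     | NONE => 0)
```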
Factoring
Avoid breaking expressions over multiple lines. If a
tuple consists of more than two or three elements, you should consider using
a record instead of a tuple. Records have the advantage of placing
each name on a separate line and still looking good. Constructing a
tuple over multiple lines makes for ugly code. Other
expressions that take up multiple lines should be done with a lot of
thought. The best way to transform code that constructs expressions
over multiple lines to something that has good style is to factor the code
using a let expression. Consider the following:
Bad
fun euclid (m:int,n:int) : (int * int * int) =
  if n=0
    then (1, 0, m)
  else (#2 (euclid (n, m mod n)), #1 (euclid (n, m mod n)) - (m div n) *
        #2 (euclid (n, m mod n)), #3 (euclid (n, m mod n)))
Better
fun euclid (m:int,n:int) : (int * int * int) =
  if n=0
    then (1, 0, m)
  else (#2 (euclid (n, m mod n)),
        #1 (euclid (n, m mod n)) - (m div n) * #2 (euclid (n, m mod n)),
        #3 (euclid (n, m mod n)))
Best
fun euclid (m:int,n:int) : (int * int * int) =
  if n=0
    then (1, 0, m)
  else let
    val q = m div n
    val r = m mod n
    val (u,v,g) = euclid (n,r)
  in
    (v, u-(q*v), g)
  end
Do not factor unnecessarily.
Bad
let
  val x = TextIO.inputLine TextIO.stdIn
in
  case x of
    ...
end
Good
case TextIO.inputLine TextIO.stdIn of
  ...
Bad (provided y is not a large expression):
let val x = y*y in x+z end
Good
y*y + z
Verbosity
Don't Rewrite Library Functions. The basis library and
the SML/NJ library have a great number of functions and data structures --
use them! Often students will recode List.filter, List.map,
and similar functions. A more subtle case is reimplementing the
fold functions: a function that recursively walks down a
list can usually be expressed with List.foldl or List.foldr.
Other data structures often have a folding function; use them whenever they
are available.
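As a sketch of the difference, the three hypothetical functions below all count the even numbers in a list. The first recodes the traversal by hand. The other two rely on the basis library.

```sml
(* Hand-written recursion that duplicates what a fold already does. *)
fun countEvensByHand (xs : int list) : int =
  (case xs of
       [] => 0
     | x :: rest =>
         (if x mod 2 = 0 then 1 else 0) + countEvensByHand rest)

(* The same computation with List.foldl; the accumulator is the count. *)
fun countEvens (xs : int list) : int =
  List.foldl (fn (x, acc) => if x mod 2 = 0 then acc + 1 else acc) 0 xs

(* Or by composing existing library functions. *)
fun countEvens' (xs : int list) : int =
  List.length (List.filter (fn x => x mod 2 = 0) xs)
```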
Misusing if Expressions. Remember that
the type of the condition in an if expression is bool. In
general, the type of an if expression is 'a, but in the
case that the type is bool, you should not be using if at
all. Consider the following:
Bad | Good
if e then true else false | e
if e then false else true | not e
if beta then beta else false | beta
if not e then x else y | if e then y else x
if x then true else y | x orelse y
if x then y else false | x andalso y
if x then false else y | not x andalso y
if x then y else true | not x orelse y
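The two-variable rewrites in the table can be checked exhaustively, since there are only four input combinations. The sketch below uses a hypothetical helper, agree, that compares two boolean functions on every pair of inputs.

```sml
val bools = [true, false]

(* True when f and g agree on every pair of boolean inputs. *)
fun agree (f : bool * bool -> bool, g : bool * bool -> bool) : bool =
  List.all (fn x => List.all (fn y => f (x, y) = g (x, y)) bools) bools

val orelseOk  = agree (fn (x, y) => if x then true else y,
                       fn (x, y) => x orelse y)
val andalsoOk = agree (fn (x, y) => if x then y else false,
                       fn (x, y) => x andalso y)
val notAndOk  = agree (fn (x, y) => if x then false else y,
                       fn (x, y) => not x andalso y)
val notOrOk   = agree (fn (x, y) => if x then y else true,
                       fn (x, y) => not x orelse y)
```

All four values evaluate to true.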
Misusing case Expressions. The case
expression is misused in two common situations. First, case
should never be used in place of an if expression (that's why if
exists). Note the following:
case e of
  true => x
| false => y
if e then x else y
The latter is much better. Another situation where if
expressions are preferred over case expressions is as follows:
case e of
  c => x
| _ => y
if e=c then x else y
The latter is definitely better. The other misuse is using case
when pattern matching with a val declaration is enough. Consider
the following:
val x = case expr of (y,z) => y
val (x,_) = expr
The latter is better.
Other Common Misuses. Here are some other
common mistakes to watch out for:
Bad | Good
l::nil | [l]
l::[] | [l]
length + 0 | length
length * 1 | length
big exp * same big exp | let val x = big exp in x*x end
if x then f a b c1 else f a b c2 | f a b (if x then c1 else c2)
String.compare(x,y)=EQUAL | x=y
String.compare(x,y)=LESS | x<y
String.compare(x,y)=GREATER | x>y
Int.compare(x,y)=EQUAL | x=y
Int.compare(x,y)=LESS | x<y
Int.compare(x,y)=GREATER | x>y
Int.sign(x)=~1 | x<0
Int.sign(x)=0 | x=0
Int.sign(x)=1 | x>0
Don't Rewrap Functions. When passing a function as an argument to another function, don't rewrap the function
unnecessarily. Here's an example:
List.map (fn x => Math.sqrt x) [1.0, 4.0, 9.0, 16.0]
List.map Math.sqrt [1.0, 4.0, 9.0, 16.0]
The latter is better. Another case for rewrapping a function is often
associated with infix binary operators. To prevent rewrapping the binary
operator, use the op keyword as in the following example:
foldl (fn (x,y) => x + y) 0
foldl (op +) 0
The latter is better.
Don't Needlessly Nest let Expressions.
Multiple declarations may occur in the first block of a let...in...end
expression. The bindings are performed sequentially, so you may use a
name bound earlier in the same block. Consider the following:
let
  val x = 42
in
  let
    val y = x + 101
  in
    x + y
  end
end
let
  val x = 42
  val y = x + 101
in
  x + y
end
The latter is better.
Avoid Computing Values Twice. If you compute a
value twice, you're wasting CPU time and making your program ugly. The
best way to avoid computing values twice is to create a let
expression and bind the computed value to a variable name. This has the
added benefit of letting you document the purpose of the value with a name.
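A minimal sketch of this advice, using two hypothetical functions that compute the mean of a list of reals: the first computes the list length twice, while the second binds it once to a descriptive name.

```sml
(* Computes the list length twice. *)
fun meanTwice (xs : real list) : real =
  if List.length xs = 0 then 0.0
  else List.foldl (op +) 0.0 xs / Real.fromInt (List.length xs)

(* Binds the length once and gives it a descriptive name. *)
fun mean (xs : real list) : real =
  let
    val count = List.length xs
  in
    if count = 0 then 0.0
    else List.foldl (op +) 0.0 xs / Real.fromInt count
  end
```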