Coding Standard

Benjamin Kowarsch edited this page Jun 29, 2017 · 41 revisions

Naming Convention

Module Identifiers

Module identifiers always start with a capital letter.

DEFINITION MODULE Foobar;

Multi-word module identifiers are generally in Pascal case.

DEFINITION MODULE FileSystem;

In some cases a module identifier that coincides with a common acronym may be all-uppercase. However, as a general rule of thumb, the definition and declaration of all-uppercase identifiers should preferably be avoided.

DEFINITION MODULE ASCII;

The identifier of a module that follows the module-as-a-type paradigm should be a singular noun, describing the type that the module provides.

DEFINITION MODULE String;

(* interned string library *)

TYPE String; (* OPAQUE *)

The identifier of a module that follows the module-as-a-manager paradigm should generally also be a singular noun, describing the management service the module provides.

DEFINITION MODULE Console;

(* console I/O *)

PROCEDURE Write ( ch : CHAR );

However, in some cases it may be preferable for a module that follows the module-as-a-manager paradigm to use an identifier that is a plural noun, if that better describes its service.

DEFINITION MODULE CompilerOptions;

(* compiler option management *)

The use of single-letter module identifiers is strictly prohibited.

Constant Identifiers

Constant identifiers always start with a capital letter.

CONST Separator = '/';

Multi-word constant identifiers are generally in Pascal case.

CONST BufferSize = 4096;

In some cases a constant identifier that coincides with a common acronym may be all-uppercase. However, as a general rule of thumb, the definition and declaration of all-uppercase identifiers should preferably be avoided.

CONST NUL = CHR(0); (* ASCII NUL *)

The enumerated values of enumeration types are constants and follow the naming convention for constants.

TYPE Status = ( Success, Failure );

The use of single-letter constant identifiers is strictly prohibited.

Type Identifiers

Type identifiers always start with a capital letter.

TYPE Key = CARDINAL;

Multi-word type identifiers are generally in Pascal case.

TYPE BufferDescriptor = RECORD
  size : CARDINAL;
  buffer : ARRAY [0..BufferSize-1] OF CHAR
END; (* BufferDescriptor *)

In some cases a type identifier that coincides with a common acronym may be all-uppercase. However, as a general rule of thumb, the definition and declaration of all-uppercase identifiers should preferably be avoided.

TYPE AST; (* Abstract Syntax Tree *)

The identifier of the primary type provided by a module that follows the module-as-a-type paradigm should match the module identifier verbatim. For unqualified use, the module should also provide an alias type whose identifier is derived from the module identifier by appending the capital letter T.

DEFINITION MODULE String;

(* interned string library *)

TYPE String; (* OPAQUE *)

TYPE StringT = String; (* for unqualified use *)

The use of single-letter identifiers for public types is strictly prohibited.

Variable Identifiers

Variable identifiers always start with a lowercase letter.

VAR ch : CHAR;

Multi-word variable identifiers are always in camel case.

VAR
  sourceIndex, targetIndex : CARDINAL;

The use of single-letter variable identifiers should generally be avoided.

Procedure Identifiers

Procedure identifiers always start with a capital letter.

PROCEDURE Write ( ch : CHAR );

Multi-word procedure identifiers are always in Pascal case.

PROCEDURE WriteString ( string : String );

The use of single-letter identifiers for public procedures is strictly prohibited.

Function Identifiers

Function identifiers generally start with a lowercase letter.

PROCEDURE length ( string : String ) : CARDINAL;

Multi-word function identifiers are generally in camel case.

PROCEDURE isQuotable ( ch : CHAR ) : BOOLEAN;

In rare cases a well known function is always denoted in all-uppercase in the literature. In this case it may be preferable to use the all-uppercase denoter as function identifier. However, as a general rule of thumb, the definition and declaration of all-uppercase identifiers should preferably be avoided.

PROCEDURE FIRST ( p : Production ) : TokenSet;
(* returns the FIRST set for production p *)

The use of single-letter identifiers for public functions is strictly prohibited.

Formal Parameter Identifiers

Formal parameter identifiers always start with a lowercase letter.

PROCEDURE Sub ( VAR difference : CARDINAL; minuend, subtrahend : CARDINAL );

Multi-word formal parameter identifiers are always in camel case.

PROCEDURE CopySlice
  ( VAR targetArray : ARRAY OF CHAR;
    sourceArray : ARRAY OF CHAR;
    startIndex, endIndex : CARDINAL );

The use of single-letter formal parameter identifiers should generally be avoided, but is permissible in math functions where n and m denote unsigned integers, i and j denote signed integers and r denotes a real number.

PROCEDURE log2 ( n : CARDINAL ) : CARDINAL;

Formatting

Indentation and Line Wrap

We use two-space indentation and a hard line-wrap at column 80.

WHILE NOT Source.eof(source) AND
  ((next = ASCII.SPACE) OR (next = ASCII.TAB) OR (next = ASCII.NEWLINE)) DO

Vertical Spacing

We make very generous use of empty lines to create a visual separation between code sections that constitute a task or logical unit.

Two empty lines are used after the import section, to separate it visually from the definition/declaration section.

FROM Source IMPORT SourceT;


(* Dependency Graph Type *)

TYPE DepGraph; (* OPAQUE *)

Likewise, two empty lines are used after each definition or declaration section.

(* initial value for incremental hash *)

CONST InitialValue = 0;


(* incremental hash function *)

PROCEDURE valueForNextChar ( hash : Key; ch : CHAR ) : Key;

Two empty lines are also used after each function or procedure definition or declaration.

(* Print newline to the console. *)

PROCEDURE WriteLn;


(* Print a boolean value to the console. *)

PROCEDURE WriteBool ( value : BOOLEAN );

A single empty line is used between a function or procedure header and the local declaration section.

PROCEDURE InitTable;

VAR
  index : CARDINAL;

A single empty line is used between the header and body of a function or procedure declaration.

PROCEDURE WriteLn;

BEGIN
  Terminal.WriteLn
END WriteLn;

A single empty line is also used between the last local declaration section and body of a function or procedure declaration.

PROCEDURE InitTable;

VAR
  index : CARDINAL;

BEGIN
  ...

Horizontal Spacing

As a general rule, there should be exactly one space between any two tokens with the following exceptions:

  • no space before a comma , or semicolon ;
  • no space after an opening bracket [
  • no space before a closing bracket ]
  • no space before and after the opening parenthesis of a function or procedure call
  • no space before the closing parenthesis of a function or procedure call
  • no space within a designator other than after commas and before and after operators

Comma Spacing

IF matchesArraySlice(string, array, start, end) THEN

Semicolon Spacing

WriteString(str); WriteLn; 

Bracket Spacing

value := arrayA[indexA] + arrayB[indexB];

Designator Spacing

ch := string^.intern^[index];

Module Header Formatting

Sample definition module formatting.

DEFINITION MODULE DefMod;

(* one-line synopsis for this module *)

(* import section *)


(* definitions *)


END DefMod.

Sample implementation module formatting.

IMPLEMENTATION MODULE ImpMod;

(* one-line synopsis for this module *)

(* import section *)


(* private constant, type and variable declarations *)


(* public function and procedure declarations *)


(* ************************************************************************ *
 * Private Operations                                                       *
 * ************************************************************************ *)

(* private function and procedure declarations *)


END ImpMod.

Function/Procedure Header Formatting

PROCEDURE Subtract ( VAR minuend : INTEGER; subtrahend : INTEGER ) : INTEGER;

If the formal parameter list of a function or procedure header exceeds 79 columns, a hard-wrap is used immediately after the function or procedure identifier.

PROCEDURE SubtractWithCarry
  ( VAR minuend : INTEGER; subtrahend : INTEGER; carry : BOOLEAN ) : INTEGER;

If the formal parameter list exceeds 79 columns even after wrapping and indenting behind the function or procedure identifier, a hard wrap is used after each formal parameter. The return type remains on the same line as the last formal parameter.

PROCEDURE matchesArraySlice
  ( string : String;
    VAR (* CONST *) array : ARRAY OF CHAR;
    start, end : CARDINAL ) : BOOLEAN;

IF Statement Formatting

An IF statement should always be hard-wrapped after THEN and before ELSIF, ELSE and END. Further, there should always be a comment (* IF *) following the END and ideally there should be a brief comment following an ELSE.

  IF matchToken(Token.Module) THEN
    lookahead := Lexer.consumeSym(lexer)
  ELSE (* resync *)
    lookahead := skipToMatchTokenOrToken(Token.StdIdent, Token.Semicolon)
  END; (* IF *)

CASE Statement Formatting

A CASE statement should always be hard-wrapped after OF and before each case label, ELSE and END. Further, there should always be a comment (* CASE *) following the END and ideally there should be a brief comment following an ELSE.

  CASE token OF
    Token.DEFINITION :
      ...
  | Token.IMPLEMENTATION :
      ...
  ELSE (* we should never get here *)
    HALT
  END; (* CASE *)

LOOP Statement Formatting

A LOOP statement should always be hard-wrapped after LOOP and before END. Further, there should always be a comment (* LOOP *) following the END.

  LOOP
    ...
  END; (* LOOP *)

WHILE Statement Formatting

A WHILE statement should always be hard-wrapped after DO and before END. Further, there should always be a comment (* WHILE *) following the END.

  WHILE NOT eof(source) DO
    ...
  END; (* WHILE *)

REPEAT Statement Formatting

A REPEAT statement should always be hard-wrapped after REPEAT and before UNTIL.

  REPEAT
    ...
  UNTIL ch = ASCII.NUL;

FOR Statement Formatting

A FOR statement should always be hard-wrapped after DO and before END. Further, there should always be a comment (* FOR *) following the END.

  FOR index := start TO end DO
    ...
  END; (* FOR *)

Commenting

We distinguish four kinds of comments:

  • dialect tags for syntax highlighting
  • module and section headlines
  • specification headings
  • code annotations

Dialect Tags for Syntax Highlighting

Some websites, syntax highlighting utilities and syntax-aware editors support Modula-2 dialect tag comments. Every Modula-2 source file should carry a dialect tag at its very beginning. We use the tag (*!m2pim*) by default and the tag (*!m2iso*) for ISO Modula-2 versions of shim libraries and of libraries that use ISO's CAST() function. Note that no spaces are permitted within dialect tag comments.
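
As a minimal sketch, a tagged definition module might begin as shown below. The module Greeting and its procedure are hypothetical and serve only as illustration. Placing the tag on the first line directly ahead of the module header is an assumption about what "the very beginning" means in practice.

```modula2
(*!m2pim*) DEFINITION MODULE Greeting;

(* Hypothetical module, for illustration only *)

(* Operations *)

PROCEDURE Write;


END Greeting.
```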

Module and Section Headlines

Each module should have a headline comment following the module header.

DEFINITION MODULE Pathname;

(* Pathname Parser Interface *)

Ideally, definition and declaration sections should have headline comments.

(* Operations *)

PROCEDURE Write ( ch : CHAR );

Specification Headings

Except in simple modules with few and simple operations whose purpose and semantics are self-evident, each function and procedure should be preceded by a specification comment. A specification comment consists of the following parts, in this order: a leading separator line composed of '-'; a heading with the name of the function or procedure followed by its parameter list without formal types; another separator line composed of '-'; a description of the operation the function or procedure provides; and a trailing separator line composed of '-'.

(* ---------------------------------------------------------------------------
 * procedure WriteStr(s)
 * ---------------------------------------------------------------------------
 * Prints the given string to the console.
 * ------------------------------------------------------------------------ *)

PROCEDURE WriteStr ( s : StringT );

The specification heading of a function or procedure whose semantics are sufficiently complex to increase the chance for programmer error should list the pre-, post- and error conditions for the function or procedure.

(* ---------------------------------------------------------------------------
 * procedure matchDigitSeq(source)
 *  matches the input in source to a base-2 digit sequence
 * ---------------------------------------------------------------------------
 * EBNF
 *
 * matchDigitSeq :=
 *   Digit+ ( DigitSep Digit+ )*
 *   ;
 *
 * pre-conditions:
 *  (1) source is the current input source and it must not be NIL.
 *  (2) lookahead of source is a base-2 digit.
 *
 * post-conditions:
 *  (1) lookahead of source is the character immediately following the last
 *      digit of the literal whose first digit was the lookahead of source
 *      upon entry into the procedure.
 *
 * error-conditions:
 *  (1) illegal character encountered
 *       TO DO
 * ------------------------------------------------------------------------ *)

PROCEDURE matchDigitSeq ( source : SourceT ) : CHAR;

Code Annotation

Code annotations are used to clarify what may not be self-evident and to make it easier for a human reader to navigate the source code. There are several frequently encountered scenarios where we use code annotations.

Opaque Type Definitions

In classic Modula-2 an opaque type definition is not self-evident. An annotation comment (* OPAQUE *) should always be added to state that the type definition defines an opaque type.

TYPE String; (* OPAQUE *)

Formal VAR Parameters Not Intended To Be Mutable

In classic Modula-2, a parameter can only be passed by reference using the prefix VAR, which permits the procedure or function to which the parameter is passed to modify the original. When passing larger data structures such as arrays to a function or procedure, we may want to pass by reference but we may not wish the passed-in value to be modified. In the absence of a formal CONST parameter in classic Modula-2, we add an annotation (* CONST *) after VAR in the formal parameter list, both in the definition and in the implementation module, in order to remind the reader or maintainer of the code that the function or procedure should not write to this parameter.

PROCEDURE WriteString ( VAR (* CONST *) str : ARRAY OF CHAR );

Synopsis For Statement Sequence

The longer a statement sequence, the more mental load is incurred by a human reader to associate the code with a mental model in the reader's mind. It becomes a classic case of not being able to see the forest for the trees.

To reduce mental load for the human reader, we make generous use of short annotation comments that provide a brief synopsis for a group of statements that follow the comment and logically belong together.

PROCEDURE pointerType ( VAR astNode : AstT ) : SymbolT;

VAR
  baseType : AstT;
  lookahead : SymbolT;
  
BEGIN
  PARSER_DEBUG_INFO("pointerType");
  
  (* POINTER *)
  lookahead := Lexer.consumeSym(lexer);
  
  (* TO *)
  IF matchToken(Token.To) THEN
    lookahead := Lexer.consumeSym(lexer)
  ELSE (* resync *)
    lookahead := skipToMatchSetOrSet(FIRST(Qualident), FOLLOW(PointerType))
  END; (* IF *)
  
  (* typeIdent *)
  IF matchSet(FIRST(Qualident)) THEN
    lookahead := qualident(baseType)
  ELSE (* resync *)
    lookahead := skipToMatchSet(FOLLOW(PointerType))
  END; (* IF *)
    
  (* build AST node and pass it back in astNode *)
  astNode := AST.NewNode(AstNodeType.PointerType, baseType);
  
  RETURN lookahead
END pointerType;

Facilities To Avoid

Control Codes

The only control code permitted in source text is newline.

!!! Please set your editor to insert SPACEs for TABs !!!

Synonym Symbols

The use of symbols ~, & and <> is strictly prohibited.

  • use NOT instead of ~
  • use AND instead of &
  • use # instead of <>
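
The substitutions above can be sketched in a small program module. The module SynonymDemo and its variables are hypothetical. The prohibited spelling appears only inside a comment, for comparison.

```modula2
(*!m2pim*) MODULE SynonymDemo;

(* Hypothetical demo of the permitted operator spellings *)

VAR
  found, done : BOOLEAN;
  count : CARDINAL;

BEGIN
  found := FALSE;
  done := FALSE;
  count := 3;

  (* prohibited spelling: IF ~found & (count <> 0) THEN *)
  IF NOT found AND (count # 0) THEN
    done := TRUE
  END (* IF *)
END SynonymDemo.
```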

Octal Literals

The use of octal literals is strictly prohibited.

The built-in CHR() function should be used with a decimal argument instead, ideally with a comment that states the value in 0u prefixed base-16 notation.

CONST DEL = CHR(127); (* 0u7F *)

Base-16 Literals

The use of base-16 literals should be avoided.

Decimal equivalents should be used instead, ideally with a comment that states the value in 0x prefixed base-16 notation.

CONST MaxWeight = 64*1024; (* 0x10000 *)

Local Modules

The use of local modules is strictly prohibited.

EXPORT lists

The use of EXPORT is strictly prohibited. It is only permissible in definition modules in the long outdated PIM2 dialect, which we do not and will not support, and in local modules, which we do not permit.

Public Global Variables

Wirth's "Programming in Modula-2" recommends that public global variables should only be exported for read-only access or they should be omitted altogether. Unfortunately though, no Modula-2 compiler to date has ever followed this recommendation. Wirth did not even follow it in his own compilers.

We strictly follow this recommendation. Since we have no means to restrict the export of global variables to read-only access, we strictly forbid the use of global variable declarations in definition modules altogether.

Instead of exporting a global variable, export an accessor function that returns its value.

DEFINITION MODULE OSParams;

TYPE NewlineMode = ( LineFeed, Return, LineFeedAndReturn );

PROCEDURE newline : NewlineMode;

Place the global variable within the implementation module where it is hidden from public access.

IMPLEMENTATION MODULE OSParams;

VAR newlineValue : NewlineMode;

PROCEDURE newline : NewlineMode;

BEGIN
  RETURN newlineValue
END newline;

Reading and Writing a FOR Loop Control Variable

PIM specifies neither whether it is permissible to access a FOR loop control variable outside of the FOR loop's body, nor whether it is permissible to write to the variable within the body, nor what the value of the variable ought to be when the loop has terminated. ISO Modula-2 clarified this by forbidding write access within the loop body and by stipulating that the value must not be relied upon after the loop has terminated.

We follow the ISO semantics. Thus, a FOR loop control variable must not be written to within the loop body and its value must not be read after the loop has terminated.
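
One way to comply is to copy any value that is needed later into a separate variable inside the loop body. The following is a minimal sketch. The module ForLoopDemo and all of its identifiers are hypothetical.

```modula2
(*!m2pim*) MODULE ForLoopDemo;

(* Hypothetical demo: locate the first blank in a buffer *)

CONST BufferSize = 8;


VAR
  buffer : ARRAY [0..BufferSize-1] OF CHAR;
  index, blankIndex : CARDINAL;
  found : BOOLEAN;

BEGIN
  (* fill the buffer; index is only read within the loop body *)
  FOR index := 0 TO BufferSize-1 DO
    buffer[index] := "x"
  END; (* FOR *)
  buffer[3] := " ";

  (* record the position in blankIndex instead of reading index later *)
  found := FALSE;
  blankIndex := 0;
  FOR index := 0 TO BufferSize-1 DO
    IF NOT found AND (buffer[index] = " ") THEN
      found := TRUE;
      blankIndex := index
    END (* IF *)
  END (* FOR *)

  (* from here on, use found and blankIndex, never index *)
END ForLoopDemo.
```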

WITH statement

The WITH statement should be avoided.
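
A straightforward alternative is to spell out each field designator in full. The sketch below uses a hypothetical module WithDemo and record type Position. The discouraged form appears only inside a comment.

```modula2
(*!m2pim*) MODULE WithDemo;

(* Hypothetical demo: field access without WITH *)

TYPE Position = RECORD
  line, column : CARDINAL
END; (* Position *)


VAR
  pos : Position;

BEGIN
  (* discouraged:
     WITH pos DO
       line := 1;
       column := 1
     END *)

  (* preferred: fully qualified field designators *)
  pos.line := 1;
  pos.column := 1
END WithDemo.
```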

Semicolon as Statement Terminator

Within definition and declaration sections, all dialects of Modula-2 treat semicolon ; as a terminator. However, within statement sequences and statements, the semicolon is considered a separator, not a terminator. Depending on the dialect, some compilers treat semicolon as an empty statement. Nevertheless, we strictly use the semicolon within statement sequences and statements as a separator only, never as a terminator.

PROCEDURE PrintSwitch ( switch : BOOLEAN ); (* ';' at end of declaration *)

BEGIN
  WriteString("switch is "); (* ';' within statement sequence *)
  IF switch = TRUE THEN
    WriteString("on") (* no ';' at end of statement sequence *)
  ELSE
    WriteString("off") (* no ';' at end of statement sequence *)
  END; (* IF *)
  WriteString("."); (* ';' within statement sequence *)
  WriteLn (* no ';' at end of statement sequence *)
END PrintSwitch; (* ';' at end of declaration *)

Unary Minus before Multi-Term Expressions

PIM is ambiguous about the semantics of the unary minus operator. It is not specified whether a unary minus before a multi-term expression applies to the first term or to the entire expression. When we asked Wirth, he stated that he thought it should be obvious from the grammar that the latter is intended. However, the implementors of the ACK Modula-2 compiler took a strictly mathematical interpretation and applied the unary minus only to the first term.

As a result of this ambiguity, we strictly forbid the use of the unary minus before a multi-term expression. Instead, parentheses must be used to avoid any ambiguity.

single := - term; (* unary minus before single-term expression *)

multiple1 := - (term1 - term2); (* unary minus applied to entire multi-term expression *)

multiple2 := (- term1) - term2; (* unary minus applied to first term of multi-term expression *)

Operators DIV and MOD on Possibly Negative Operands

The use of DIV and MOD with possibly negative operands is not permitted.

There are several different definitions for integer division, all of which produce different results when at least one of the operands is negative. Different editions of PIM have used different definitions and there are differences between some PIM editions and ISO. This makes it impossible to write portable code across different Modula-2 implementations when using DIV and MOD where one of the operands may be negative.

Libraries IntMath and LongIntMath provide replacement functions that operate consistently across dialects and implementations.

  • use functions ediv() and emod() for Euclidean integer division.
  • use functions fdiv() and fmod() for floored integer division.
  • use functions tdiv() and tmod() for truncated integer division.
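
To illustrate how the three definitions differ, the sketch below tabulates the mathematically expected quotients and remainders for a few operand pairs and shows a possible call. The function signatures are an assumption: this page names the functions but does not state their parameter lists, so ( n, m : INTEGER ) : INTEGER is only a guess and should be checked against the IntMath definition module. The module DivDemo is hypothetical.

```modula2
(*!m2pim*) MODULE DivDemo;

(* Hypothetical demo; the IntMath signatures are assumed to be
   ( n, m : INTEGER ) : INTEGER, which this page does not confirm. *)

FROM IntMath IMPORT fdiv, fmod;


VAR
  quotient, remainder : INTEGER;

BEGIN
  (* Expected results by definition, where n = quotient * m + remainder:

       n   m  | Euclidean | floored  | truncated
              |   q    r  |   q    r |   q    r
     ---------+-----------+----------+-----------
      -7   2  |  -4    1  |  -4    1 |  -3   -1
       7  -2  |  -3    1  |  -4   -1 |  -3    1
      -7  -2  |   4    1  |   3   -1 |   3   -1  *)

  (* floored division of -7 by 2 *)
  quotient := fdiv(-7, 2);
  remainder := fmod(-7, 2)
END DivDemo.
```

The Euclidean remainder is never negative, the floored remainder takes the sign of the divisor, and the truncated remainder takes the sign of the dividend.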

Procedures CAP, INC and DEC

The use of procedures CAP, INC and DEC is not permitted.

  • use function Char.toUpper() or procedure Char.ToUpper() instead of CAP
  • use value := value + 1 instead of INC
  • use value := value - 1 instead of DEC

Procedures NEW and DISPOSE

PIM does not mandate the provision of procedures NEW and DISPOSE. Their availability cannot be taken for granted. As a result they must be strictly avoided. Procedures ALLOCATE and DEALLOCATE provided by module Storage should be used instead.
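
A minimal sketch of the replacement follows. The module AllocDemo and the types Node and NodePtr are hypothetical. TSIZE is imported from SYSTEM, as in PIM. Whether ALLOCATE yields NIL on failure depends on the implementation of Storage, so the NIL check is a precaution, not a guarantee.

```modula2
(*!m2pim*) MODULE AllocDemo;

(* Hypothetical demo: ALLOCATE and DEALLOCATE in place of NEW and DISPOSE *)

FROM SYSTEM IMPORT TSIZE;
FROM Storage IMPORT ALLOCATE, DEALLOCATE;


TYPE Node = RECORD
  value : CARDINAL
END; (* Node *)

TYPE NodePtr = POINTER TO Node;


VAR
  node : NodePtr;

BEGIN
  (* instead of NEW(node) *)
  ALLOCATE(node, TSIZE(Node));

  IF node # NIL THEN
    node^.value := 42;

    (* instead of DISPOSE(node) *)
    DEALLOCATE(node, TSIZE(Node))
  END (* IF *)
END AllocDemo.
```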

Code Requiring a Specific Type Size

Ideally, code that requires a specific size for any given type should be avoided. However, when a module relies on a specific size for a given type, the module must be provided in three versions: one for the type being 16 bits wide, one for the type being 32 bits wide and one for the type being 64 bits wide.

The filenames for the versions of such a library are to be suffixed as follows:

  • the 16-bit version with .16bit.def and .16bit.mod
  • the 32-bit version with .32bit.def and .32bit.mod
  • the 64-bit version with .64bit.def and .64bit.mod

ADDRESS Arithmetic

Arithmetic on values of type ADDRESS is strictly prohibited.

Improperly Named Type Conversion Functions

The use of type conversion functions is strictly limited to CHR(), ORD() and VAL().

Seriously, in classic Modula-2 TRUNC() is a safe type transfer from REAL to INTEGER and FLOAT() is the opposite safe type transfer from INTEGER to REAL, but INTEGER() and REAL() are unsafe type transfers.

In other words, the dangerous operations that should be avoided have the most intuitive names, while the safe operations that should be preferred have awkward, unintuitive names. This is one of the worst design flaws in the history of computer science!!!
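
The three permitted conversion functions can be sketched as follows. The module ConvDemo and its variables are hypothetical.

```modula2
(*!m2pim*) MODULE ConvDemo;

(* Hypothetical demo of the permitted conversion functions *)

VAR
  ch : CHAR;
  code : CARDINAL;
  int : INTEGER;

BEGIN
  ch := "A";

  (* CHAR to CARDINAL *)
  code := ORD(ch);

  (* CARDINAL to CHAR *)
  ch := CHR(code + 1);

  (* CARDINAL to INTEGER *)
  int := VAL(INTEGER, code)
END ConvDemo.
```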

Unsafe Type Transfer, aka Casting

Modula-2 distinguishes between type conversion and type casting. The term conversion is used for safe type transfers provided by built-in functions CHR(), ORD() and VAL() where the compiler guarantees the correctness of the result. The term casting is used for unsafe type transfers where the compiler does not guarantee the correctness of the result. Ideally, the latter operation is to be avoided.

Nevertheless, a module that does use casting must be provided in two versions, one for PIM Modula-2 with PIM casting syntax, and one for ISO Modula-2 using ISO's CAST() function.

The filenames for the ISO version of such a library are to be suffixed as follows:

  • definition modules with .iso.def
  • implementation modules with .iso.mod
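
As an illustration, the same cast might look as follows in the two versions. The module CastDemo is hypothetical and is shown as a program module for brevity. The sketch assumes that CARDINAL and BITSET occupy the same amount of storage on the target.

PIM version:

```modula2
(*!m2pim*) MODULE CastDemo;

(* Hypothetical demo: PIM casting syntax *)

VAR
  value : CARDINAL;
  bits : BITSET;

BEGIN
  value := 5;

  (* type identifier used as a type transfer function *)
  bits := BITSET(value)
END CastDemo.
```

ISO version:

```modula2
(*!m2iso*) MODULE CastDemo;

(* Hypothetical demo: ISO casting syntax *)

FROM SYSTEM IMPORT CAST;


VAR
  value : CARDINAL;
  bits : BITSET;

BEGIN
  value := 5;

  (* CAST takes the target type as its first argument *)
  bits := CAST(BITSET, value)
END CastDemo.
```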

Coroutines

The use of coroutines is strictly prohibited.

Language Extensions

The use of language extensions is strictly prohibited. This also includes pragmas.

ISO Modula-2 Specific Syntax

With the exception of the aforementioned CAST() function and its usage rules, the use of any other ISO Modula-2 specific syntax is strictly prohibited.

Libraries Not Part of M2BSK

Any use of libraries other than the Storage module that ships with PIM and ISO Modula-2 compilers is strictly prohibited. Only the libraries that are part of the M2BSK project may be used.

Furthermore, shim libraries Terminal and FileSystem should not be used directly by contributed code. Libraries Console and SimpleFileIO should be used instead.