Alex Angelopoulos (aka at mvps dot org)
| ! Comment | 
The exclamation mark can be used to denote that a line is a comment. This symbol is currently only allowed at the very beginning of a line.
| ! This is a comment | 
| " ParameterName " = Value | 
| Part | Description | 
|---|---|
| ParameterName | A string containing the name of the parameter | 
| Value | A string containing the new value of the parameter | 
The GOLD Builder has a series of parameters that describe the form and function of your grammar. They are as follows:
| Parameter Name | Type | Description | 
|---|---|---|
| Name | Optional | The name of the grammar. | 
| Version | Optional | The version of the grammar. This can contain any alphanumeric string. | 
| Author | Optional | The grammar's author. | 
| About | Optional | A short description of the grammar. | 
| Case Sensitive | Optional | Whether the grammar is considered to be case sensitive. When this parameter is set to 'True', the GOLD Builder will construct case sensitive tokenizer tables (DFA). In other words, if your language contains a terminal 'if', the text 'IF', 'If', and 'iF' will cause a syntax error. This parameter defaults to 'False'. | 
| Auto Whitespace | Optional | In the previous version of the GOLD Parser, the whitespace terminal was always created when omitted in the grammar. Unfortunately, not all grammars make use of whitespace. This parameter is set to 'True' by default, but can be changed to 'False'. When 'False', the system will not automatically create a whitespace terminal unless it is manually defined. | 
| Start Symbol | Required | The starting symbol in the grammar. When LALR parse tables are constructed by the GOLD Builder, an "accepted" grammar will reduce to this nonterminal. | 
| "Name"    = My Programming Language | 
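As a rough illustration of the comment and parameter forms described above, the following Python sketch skips lines that begin with an exclamation mark and collects parameter declarations into a dictionary. The function name `read_parameters` and the regular expression are assumptions made for this example; they do not describe how the GOLD Builder itself reads a grammar.

```python
import re

# Hypothetical helper, not part of GOLD: matches  "ParameterName" = Value
PARAM_LINE = re.compile(r'^\s*"\s*([^"]*?)\s*"\s*=\s*(.*?)\s*$')

def read_parameters(grammar_text):
    """Collect parameter declarations, ignoring '!' comment lines."""
    params = {}
    for line in grammar_text.splitlines():
        if line.startswith("!"):   # comments are only allowed at the start of a line
            continue
        match = PARAM_LINE.match(line)
        if match:
            params[match.group(1)] = match.group(2)
    return params

example = '''! This is a comment
"Name"    = My Programming Language
"Case Sensitive" = False
"Start Symbol" = <Statement>
'''
print(read_parameters(example))
```

Values are kept as plain strings here; interpreting 'True' and 'False' or checking that the required Start Symbol is present would be a separate step.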
| < RuleName > | ::= | [ Symbols ] | 
| [ | | | Symbols ] ... | 
| Part | Description | 
|---|---|
| RuleName | A string specifying the name of the nonterminal the rule derives. | 
| Symbols | A list of 0 or more terminals and nonterminals. | 
Typically, rules in a grammar are declared using BNF (Backus-Naur Form) statements. This notation consists of a series of 0 or more symbols, where nonterminals are delimited by the angle brackets '<' and '>' and terminals are delimited by single quotes or not delimited at all.
For instance, the following declares the common if-statement.
| <Statement> | ::= | if <Expression> then <Statements> end if | 
The symbols 'if', 'then', 'end', and 'if' are terminals and <Expression> and <Statements> are nonterminals.
If you are declaring a series of rules that derive the same nonterminal (i.e. different versions of a rule), you can use a single pipe character '|' in the place of the rule's name and the "::=" symbol. The following declares a series of 3 different rules that define a 'Statement'. In this example, the shortcut notation is used to simplify the declaration.
| <Statement> | ::= | if <Expression> then <Statements> end if | 
| | | while <Expression> do <Statements> end while | 
| | | for Id = <Range> loop <Statements> end for | 
This is equivalent to:
| <Statement> | ::= | if <Expression> then <Statements> end if | 
| <Statement> | ::= | while <Expression> do <Statements> end while | 
| <Statement> | ::= | for Id = <Range> loop <Statements> end for | 
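The equivalence between the shortcut notation and the fully written rules can be sketched in a few lines of Python. The function `expand_alternatives` is a hypothetical helper written for this illustration; it assumes one rule or alternative per line and does not handle a '|' or "::=" that appears inside single quotes.

```python
def expand_alternatives(lines):
    """Rewrite '| symbols' continuation lines as full '<Name> ::= symbols' rules."""
    expanded = []
    current_name = None
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if "::=" in text:
            # A full rule: remember the nonterminal for later alternatives.
            name, symbols = text.split("::=", 1)
            current_name = name.strip()
            expanded.append(f"{current_name} ::= {symbols.strip()}")
        elif text.startswith("|") and current_name is not None:
            # A shortcut line: reuse the most recent rule name.
            expanded.append(f"{current_name} ::= {text[1:].strip()}")
        else:
            raise ValueError(f"unexpected line: {line!r}")
    return expanded

shorthand = [
    "<Statement> ::= if <Expression> then <Statements> end if",
    "             | while <Expression> do <Statements> end while",
    "             | for Id = <Range> loop <Statements> end for",
]
for rule in expand_alternatives(shorthand):
    print(rule)
```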
| Note: | When text is read by the Builder, all characters delimited by single quotes are analyzed as literal strings. In other words, any text delimited by single quotes is considered to be exactly as printed. This allows you to specify characters that would normally be limited by the notation. For instance, when defining a rule, angle brackets are used to delimit nonterminals. By typing '<' and '>', you can specify these two characters without worrying about the system misinterpreting them. A single quote character can be specified by typing two single quotes ''. | 
There is also an "Enhanced" BNF format which incorporates special notation for optional symbols (either terminals or nonterminals). At this time, the GOLD Builder only uses the original format. The final build of version 1.0 might incorporate the enhanced format, but this is not yet determined.
The following two rules define a comma delimited list of Identifiers. The use of single quotes to delimit the actual comma is not required.
| <List> | ::= | Identifier ',' <List> | 
| | | Identifier | 
Operator precedence is an important aspect of most programming languages. The following rules define the common arithmetic operators.
| <Expression> | ::= | Identifier '+' <Expression> | 
| | | Identifier '-' <Expression> | 
| | | <Mult Exp> | 
| <Mult Exp> | ::= | Identifier '*' <Mult Exp> | 
| | | Identifier '/' <Mult Exp> | 
| | | Identifier | 
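To see how the layering of these rules gives '*' and '/' a higher precedence than '+' and '-', the rules can be followed by hand in a small recursive-descent parser. This Python sketch is only a stand-in for illustration: the GOLD Builder constructs LALR parse tables rather than code like this, and the function names and the tuple-based tree are assumptions. The tokenizer silently skips characters it does not recognize.

```python
import re

def tokenize(text):
    # Identifiers are a letter followed by letters or digits; operators are single characters.
    return re.findall(r"[A-Za-z][A-Za-z0-9]*|[-+*/]", text)

def is_identifier(token):
    return token[0].isalpha()

def parse_expression(tokens, pos=0):
    """<Expression> ::= Identifier '+' <Expression> | Identifier '-' <Expression> | <Mult Exp>"""
    if (pos + 1 < len(tokens) and is_identifier(tokens[pos])
            and tokens[pos + 1] in ("+", "-")):
        right, end = parse_expression(tokens, pos + 2)
        return (tokens[pos + 1], tokens[pos], right), end
    return parse_mult_exp(tokens, pos)

def parse_mult_exp(tokens, pos):
    """<Mult Exp> ::= Identifier '*' <Mult Exp> | Identifier '/' <Mult Exp> | Identifier"""
    if pos >= len(tokens) or not is_identifier(tokens[pos]):
        raise SyntaxError(f"identifier expected at token {pos}")
    if pos + 1 < len(tokens) and tokens[pos + 1] in ("*", "/"):
        right, end = parse_mult_exp(tokens, pos + 2)
        return (tokens[pos + 1], tokens[pos], right), end
    return tokens[pos], pos + 1

def parse(text):
    tokens = tokenize(text)
    tree, end = parse_expression(tokens)
    if end != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[end]!r}")
    return tree

print(parse("a + b * c"))
```

In the resulting tree the multiplication sits below the addition, so it is grouped first. Taken literally, the rules allow only a single Identifier to the left of '+' or '-', so this sketch rejects input such as `a * b + c`.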
| { SetName } = SetExpression | 
| Part | Description | 
|---|---|
| SetName | A string specifying the name of the set being declared. | 
| SetExpression | An arithmetic expression containing one or more sets. | 
Literal sets of characters are delimited using the square brackets '[' and ']' and pre-defined sets are delimited by the braces '{' and '}'. For instance, the text "[abcde]" denotes a set of characters consisting of the first five letters of the alphabet; while the text "{abc}" refers to a set named "abc".
Sets can then be declared by adding and subtracting previously declared sets and literal sets. The GOLD Builder provides a collection of pre-defined sets that contain characters often used to define terminals.
| Note: | When text is read by the Builder, all characters delimited by single quotes are analyzed as literal strings. In other words, any text delimited by single quotes is considered to be exactly as printed. This allows you to specify characters that would normally be limited by the notation. For instance, when defining a rule, angle brackets are used to delimit nonterminals. By typing '<' and '>', you can specify these two characters without worrying about the system misinterpreting them. A single quote character can be specified by typing a double single quote ''. | 
| Declaration | Resulting Set | 
|---|---|
| {Bracket} = [']'] | ] | 
| {Quote} = [''] | ' | 
| {Vowels} = [aeiou] | aeiou | 
| {Vowels 2} = {Vowels} + [y] | aeiouy | 
| {Set 1} = [abc] | abc | 
| {Set 2} = {Set 1} + [12] - [c] | ab12 | 
| {Set 3} = {Set 2} + [0123456789] | ab0123456789 | 
The following declares a set named "Hex Char" containing the characters that are valid in a hexadecimal number.
| {Hex Char} = {Digit} + [ABCDEF] | 
The following declares a set containing the characters that can be placed inside a normal "string". In this case, the double quote is the delimiting character (which it is in most programming languages).
| {String Char} = {Printable} - ["] | 
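The set declarations above can be mimicked with Python's built-in `set` type. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, only two pre-defined sets are supplied, and operators are applied strictly from left to right, which is consistent with the result shown for {Set 2} (ab12 rather than abc12).

```python
import re

# Stand-ins for two pre-defined sets, using the ranges given in this documentation.
PREDEFINED = {
    "Digit": set("0123456789"),
    "Printable": {chr(n) for n in range(32, 128)} | {chr(160)},
}

# A token is {Name}, a [literal set] (which may contain quoted text), or an operator.
TOKEN = re.compile(r"\{([^}]*)\}|\[((?:'[^']*'|[^\]])*)\]|([+-])")

def literal_chars(body):
    """Return the characters of a literal set, honoring single-quoted text."""
    chars, i = set(), 0
    while i < len(body):
        if body[i] == "'":
            close = body.index("'", i + 1)
            quoted = body[i + 1:close]
            chars.update(quoted if quoted else "'")   # '' stands for one single quote
            i = close + 1
        else:
            chars.add(body[i])
            i += 1
    return chars

def evaluate(expression, sets):
    """Apply + (union) and - (difference) from left to right."""
    result, op = set(), "+"
    for match in TOKEN.finditer(expression):
        name, literal, operator = match.groups()
        if operator:
            op = operator
            continue
        operand = sets[name.strip()] if name is not None else literal_chars(literal)
        result = result | operand if op == "+" else result - operand
    return result

def declare(line, sets):
    name, expression = line.split("=", 1)
    sets[name.strip()[1:-1]] = evaluate(expression, sets)

sets = dict(PREDEFINED)
for line in [
    "{Bracket} = [']']",
    "{Quote} = ['']",
    "{Vowels} = [aeiou]",
    "{Vowels 2} = {Vowels} + [y]",
    "{Set 1} = [abc]",
    "{Set 2} = {Set 1} + [12] - [c]",
    "{Set 3} = {Set 2} + [0123456789]",
    "{Hex Char} = {Digit} + [ABCDEF]",
    '{String Char} = {Printable} - ["]',
]:
    declare(line, sets)

print("".join(sorted(sets["Set 2"])))
```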
| TerminalName = RegularExpression | 
| Part | Description | 
|---|---|
| TerminalName | A string of characters specifying the name of the terminal being declared. | 
| RegularExpression | A regular expression defining the pattern of the terminal. | 
The notation is rather simple, yet versatile enough to express any terminal needed. Basically, regular expressions consist of a series of characters that define the pattern of the terminal.
Literal sets of characters are delimited using the square brackets '[' and ']' and defined sets are delimited by the braces '{' and '}'. For instance, the text "[abcde]" denotes a set of characters consisting of the first five letters of the alphabet; while the text "{abc}" refers to a set named "abc". Neither of these is part of the "pure" notation for regular expressions, but both are widely used in other parser generators such as Lex/Yacc.
Sub-expressions are delimited by normal parentheses '(' and ')'. The pipe character '|' is used to denote alternate expressions.
Either a set, a sub expression, or a single character can be followed by any of the following three symbols:
| * | Kleene Closure. This symbol denotes 0 or more of the specified character(s) | 
| + | One or more. This symbol denotes 1 or more of the specified character(s) | 
| ? | Optional. This symbol denotes 0 or 1 of the specified character(s) | 
For example, the regular expression ab* translates to "an a followed by zero or more b's" and [abc]+ translates to "a series of one or more a's, b's or c's".
| Note: | When text is read by the Builder, all characters delimited by single quotes are analyzed as literal strings. In other words, any text delimited by single quotes is considered to be exactly as printed. This allows you to specify characters that would normally be limited by the notation. For instance, when defining a rule, angle brackets are used to delimit nonterminals. By typing '<' and '>', you can specify these two characters without worrying about the system misinterpreting them. A single quote character can be specified by typing a double single quote ''. In the case of regular expressions, single quotes allow you to specify the following characters: ? * + ( ) { } [ ] | 
| Declaration | Valid strings | 
|---|---|
| Example1 = abc* | ab, abc, abcc, abccc, abcccc, ... | 
| Example2 = ab?c | abc, ac | 
| Example3 = a|b|c | a, b, c | 
| Example4 = a[12]*b | ab, a1b, a2b, a12b, a21b, a22b, a111b, ... | 
| Example5 = '*'+ | *, **, ***, ****, ... | 
| Example6 = {Letter}+ | cat, dog, Sacramento, ... | 
| Identifier = {Letter}{AlphaNumeric}* | e4, Param4b, Color2, temp, ... | 
| ListFunction = c[ad]+r | car, cdr, caar, cadr, cdar, cddr, caaar, ... | 
| ListFunction = c(a|d)+r | The same as the above using a different, yet equivalent, regular expression. | 
| NewLine = {CR}{LF}|{CR} | Windows and DOS use {CR}{LF} for newlines, while classic Mac OS uses {CR} alone. This definition will detect both. UNIX uses {LF} alone, which this definition does not match. | 
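Because the notation is close to that of other regular expression tools, the declarations in the table can be tried out with Python's `re` module. The translations below are hand-written assumptions for this sketch: defined sets such as {Letter} are spelled out as character classes, the quoted '*' becomes an escaped asterisk, and {CR} and {LF} become `\r` and `\n`. The helper `matches` tests one complete string at a time, which is simpler than what a tokenizer does.

```python
import re

# Hand-translated equivalents of the declarations in the table above.
EQUIVALENTS = {
    "Example1": r"abc*",
    "Example2": r"ab?c",
    "Example3": r"a|b|c",
    "Example4": r"a[12]*b",
    "Example5": r"\*+",                       # '*'+
    "Example6": r"[A-Za-z]+",                 # {Letter}+
    "Identifier": r"[A-Za-z][A-Za-z0-9]*",    # {Letter}{AlphaNumeric}*
    "ListFunction": r"c[ad]+r",
    "NewLine": r"\r\n|\r",                    # {CR}{LF}|{CR}
}

def matches(terminal, text):
    """Return True if the whole text is a valid string for the named terminal."""
    return re.fullmatch(EQUIVALENTS[terminal], text) is not None

for terminal, text in [("Example1", "abccc"), ("Example2", "ac"),
                       ("Identifier", "Param4b"), ("ListFunction", "cadr")]:
    print(terminal, repr(text), matches(terminal, text))
```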
The Whitespace terminal is used by the GOLD Parser to represent information that can be ignored by the parsing engine. Normally this is defined as {Whitespace}+.
In addition, there are three Comment terminals that are used to define block and line comments.
The GOLD Builder has a collection of useful pre-defined sets at your disposal. These include the sets that are often used for defining terminals as well as characters not accessible via the keyboard. This documentation also includes a Pre-Defined Character Set Chart.
| Set Name | Characters | 
|---|---|
| {HT} | Horizontal Tab character (#09). | 
| {LF} | Line Feed character (#10). | 
| {VT} | Vertical Tab character (#11). This character is rarely used. | 
| {FF} | Form Feed character (#12). This character is also known as "New Page". | 
| {CR} | Carriage Return character (#13). | 
| {Space} | Space character (#32). Technically, this set is not needed since a "space" can be expressed by using single quotes: ' '. The set was added to allow the developer to more explicitly indicate the character and add readability. | 
| {NBSP} | No-Break Space character (#160). The No-Break Space character is used to represent a space where a line break is not allowed. It is often used in source code for indentation. | 
| Set Name | Characters | 
|---|---|
| {Digit} | 0123456789 | 
| {Letter} | abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ | 
| {AlphaNumeric} | This set includes all the characters in {Letter} and {Digit} | 
| {Printable} | This set includes all standard characters that can be printed onscreen. This includes the characters from #32 to #127 and #160 (No-Break Space). The No-Break Space character was included since it is often used in source code. | 
| {Whitespace} | This set includes all characters that are normally considered whitespace and ignored by the parser. The set consists of the Space, Horizontal Tab, Line Feed, Vertical Tab, Form Feed, Carriage Return and No-Break Space. | 
Please see the Pre-Defined Character Set Chart for pictures of these characters.
| Set Name | Characters | 
|---|---|
| {#n} | Using this notation, you can specify characters normally not accessible via the keyboard. In this version of the GOLD Builder, n can be any number between 0 and 255. For instance, {#169} specifies the copyright character ©. | 
| {Letter Extended} | This set includes all the letters which are part of the extended Unicode character set. | 
| {Printable Extended} | This set includes all the printable characters above #127. Although rarely used in programming languages, they could be used, for instance, as valid characters in a string literal. | 
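For readers who find it easier to think in code, the pre-defined sets in the tables above can be written out as Python sets. The constant names and the `char_set` helper are assumptions made for this sketch; the character codes are the ones listed in the tables, and the two Extended sets are left out because their exact contents are not listed here.

```python
# Single-character sets, using the character codes from the first table.
HT, LF, VT, FF, CR, SPACE, NBSP = (chr(n) for n in (9, 10, 11, 12, 13, 32, 160))

DIGIT = set("0123456789")
LETTER = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
ALPHANUMERIC = LETTER | DIGIT
PRINTABLE = {chr(n) for n in range(32, 128)} | {NBSP}   # #32 to #127 plus #160
WHITESPACE = {SPACE, HT, LF, VT, FF, CR, NBSP}

def char_set(n):
    """Hypothetical equivalent of {#n}: a set holding the single character with code n."""
    if not 0 <= n <= 255:
        raise ValueError("n must be between 0 and 255")
    return {chr(n)}

print(sorted(WHITESPACE & PRINTABLE) == sorted({SPACE, NBSP}))
print(char_set(169))
```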
One of the key principles in programming languages is the ability to incorporate comments and other documentation directly to the source code. Whether it is FORTRAN, COBOL or C++, the ability exists, but in varying forms.
Essentially, there are three different types of comment terminals used in programming languages: one that tells the compiler to ignore the remaining text in the current line of code, and two that denote the start and end of a multi-line comment.
To accommodate the intricacies of comments, the GOLD Parser Builder provides for this special class of terminals.
| Comment Start | The Comment Start terminal defines the symbol used to begin a block comment. When the tokenizer engine reads this symbol from the source text, it will increment an internal counter and ignore all other tokens until the matching Comment End token is encountered. Because of this counter, block comments can be nested. | 
| Comment End | The Comment End terminal defines the symbol that will denote the end of a block comment. | 
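| Comment Line | The Comment Line terminal defines the symbol that begins a line comment. Unlike with the Comment Start and Comment End terminals, when the tokenizer reads this symbol it will simply discard the rest of the line. | 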
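The counter-based handling of block comments can be sketched as follows. This Python example is a simplification under stated assumptions: it works on characters rather than tokens, the C-style symbols are merely default arguments chosen for illustration, and the function name `strip_comments` is hypothetical. It does not give string literals any special treatment.

```python
def strip_comments(source, start="/*", end="*/", line="//"):
    """Remove comments from source, using a counter so block comments can nest."""
    output = []
    depth = 0   # number of block comments currently open
    i = 0
    while i < len(source):
        if source.startswith(start, i):
            depth += 1                      # Comment Start: increment the counter
            i += len(start)
        elif depth > 0 and source.startswith(end, i):
            depth -= 1                      # Comment End: decrement the counter
            i += len(end)
        elif depth == 0 and source.startswith(line, i):
            # Comment Line: discard the rest of the current line
            newline = source.find("\n", i)
            i = len(source) if newline == -1 else newline
        else:
            if depth == 0:
                output.append(source[i])    # keep text that is outside any comment
            i += 1
    return "".join(output)

print(repr(strip_comments("a /* x /* y */ z */ b")))
print(repr(strip_comments("x = 1 // note\ny = 2")))
```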
This documentation contains an example on how to use the comment terminals in a grammar.
Below is a comparison of comment terminals in several common programming languages. Blank fields denote that the programming language lacks a terminal of that type. For instance, Visual Basic does not provide block comments.
| Programming Language | Line Comment | Block Comment Start | Block Comment End | 
|---|---|---|---|
| BASIC | REM | | | 
| C (Original) | | /* | */ | 
| C (ANSI) | // | /* | */ | 
| C++ | // | /* | */ | 
| COBOL | * | | | 
| LISP | ; | | | 
| FORTRAN 90 | ! | | | 
| Java | // | /* | */ | 
| Pascal | | { or (* | } or *) | 
| Prolog | % | /* | */ | 
| SQL | -- | /* | */ | 
| Visual Basic | ' (Single quote) or Rem | | | 
In practically all programming languages, the parser recognizes (and usually ignores) the spaces, new lines, and other meaningless characters that exist between tokens. For instance, in the code
| If  Done Then | 
the fact that there are two spaces between the 'If' and 'Done', a new line after 'Then', and multiple spaces before 'Counter' is irrelevant.
From the parser's point of view (in particular, that of the Deterministic Finite Automaton it uses), these whitespace characters are recognized as a special terminal which can be discarded. In GOLD, this terminal is simply called the Whitespace terminal and can be defined to whatever is needed. If the Whitespace terminal is not defined explicitly in the grammar outline, it will be implicitly declared as one or more of the characters in the pre-defined Whitespace set: {Whitespace}+.
Normally, you would not need to worry about the Whitespace terminal unless you are designing a language where the end of a line is significant. This is the case with Visual Basic, BASIC and many, many others. The proper declaration can be seen in an example.
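The difference between the default Whitespace terminal and a language in which the end of a line is significant can be sketched with a toy tokenizer. Everything in this Python example is an assumption made for illustration: the pattern names, the `NewLine` token label, and the choice to remove only CR and LF from the ignored characters. It is not the declaration used in the GOLD example grammars, and the sample source text is invented.

```python
import re

DEFAULT_WS = r"[ \t\n\x0b\x0c\r\xa0]+"   # the characters listed for {Whitespace}, one or more
INLINE_WS = r"[ \t\x0b\x0c\xa0]+"        # assumption: the same set without CR and LF
NEWLINE = r"\r\n|\r|\n"
TEXT = r"[^ \t\n\x0b\x0c\r\xa0]+"        # any run of non-whitespace characters

def tokenize(source, newline_significant=False):
    """Split source into tokens, discarding whatever counts as whitespace."""
    ws = INLINE_WS if newline_significant else DEFAULT_WS
    pattern = re.compile(f"(?P<ws>{ws})|(?P<newline>{NEWLINE})|(?P<text>{TEXT})")
    tokens = []
    for match in pattern.finditer(source):
        if match.lastgroup == "ws":
            continue   # the whitespace terminal is recognized and then discarded
        tokens.append("NewLine" if match.lastgroup == "newline" else match.group())
    return tokens

sample = "If  Done Then\n     Counter = Counter + 1"
print(tokenize(sample))
print(tokenize(sample, newline_significant=True))
```

With the default definition the line break disappears along with the extra spaces; with the narrower definition it survives as a token that grammar rules could refer to.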