Module Type_parse_tree
This module contains the abstract syntax tree for the types in our input .typedc files, along with their reference concrete syntax. A .typedc file is simply a list of definitions.
The parser itself is defined in Type_parser.
For each type in the AST, we will describe in mathematical block the corresponding grammar as it should appear in the text file:
- terminals (i.e. symbols and names that appear verbatim in the
.typedcfile), will use\texttt{\color{green}green typewriter font} all other fonts are non-terminals. They are usually defined somewhere else in this file.
- We use
\text{``}\mathrm{ident}\text{''}as a non-terminal that matches C-like identifiers (combination of upper case and lowercase letter, underscores_and digits, that does not start with a digit.) - Similarly,
\text{``}\mathrm{fun\_ident}\text{''}is a non-terminal for function identifiers. It is an identifier that may also contain@characters.
- We use
- Terminals or non-terminals that end in
?(not to be confused with the terminal\texttt{\color{green}?}) are optional and may be omitted.
The parser also support C style inline // ... \n and block /* ... */ comments, and accepts any number of whitespace characters between terminals.
This page is meant as a technical reference, for a gentler introduction to the typing language, see Types tutorial.
module Symbol : sig ... endUnary operators:
\begin{array}{rcll}
\sim & \triangleq & \texttt{\color{green}!} & \text{TODO logical or bitwise negation?} \\
& | & \texttt{\color{green}-} & \text{unary minus}
\end{array}
TODO: it seems unop is currently not part of the parser
Binary operators:
\begin{array}{rcll}
\diamond & \triangleq & \texttt{\color{green}+} \;\;|\;\;\texttt{\color{green}-} \;\;|\;\;\texttt{\color{green}*} \;\;|\;\;\texttt{\color{green}/} \;\;|\;\;\texttt{\color{green}\%} & \text{arithmetic operators} \\
& | & \texttt{\color{green}\&} \;\;|\;\; \texttt{\color{green}|} \;\;|\;\; \texttt{\color{green}\textasciicircum} \;\;|\;\; \texttt{\color{green}<<} \;\;|\;\; \texttt{\color{green}>>} & \text{bitwise operators} \\
& | & \texttt{\color{green}==} \;\;|\;\; \texttt{\color{green}!=} \;\;|\;\; \texttt{\color{green}>=} \;\;|\;\; \texttt{\color{green}>} \;\;|\;\; \texttt{\color{green}<=} \;\;|\;\; \texttt{\color{green}<} & \text{comparison operators} \\
& | & \texttt{\color{green}\&\&} \;\;|\;\; \texttt{\color{green}|} \texttt{\color{green}|} & \text{logical and, logical or}
\end{array}
Type expressions \mathrm{exp} are given by:
\begin{array}{rcll}
\mathrm{exp} & \triangleq & \texttt{\color{green}self} & \text{self keyword} \\
& | & z & \text{numeric constant }z \in \mathbb{Z} \\
& | & \alpha & \text{symbolic variable} \\
& | & \sim \mathrm{exp} & \text{unary operation: TODO (this is not implemented)} \\
& | & \mathrm{exp} \diamond \mathrm{exp} & \text{binary operation} \\
& | & \texttt{\color{green}(} \mathrm{exp} \texttt{\color{green})} & \text{parenthesized expression}
\end{array}
To avoid any ambiguity, the parser only allows only obvious precedences:
- multiplicative operators (
*,/and%) bind more strongly than additive operators (+and-); - both bind less strongly than binary predicates (
==,<, ...); - binary predicates bind less strongly than
&&and||; - multiplicative and additive operators are left-associative.
- we report an error otherwise.
We don't allow chaining binary predicates, e.g. a < b == c, a < b > c, or a || b && c is forbidden.
This technique is inspired by Per Vognsen, who pointed it out in https://www.scattered-thoughts.net/writing/better-operator-precedence/
Pointer annotations:
\begin{array}{rcll}
\mathrm{ptr\_annot} & \triangleq & \texttt{\color{green}*} & \text{Unknown: may or may not be null} \\
& | & \texttt{\color{green}+} & \text{Non-null} \\
& | & \texttt{\color{green}?} & \text{null or non null}
\end{array}
Note that, while in C pointers may implicitly represent pointer to arrays, in Codex that is not the case. A pointer points to a single value, to have a pointer to an array, you must explicitly use \tau[n]* or its variants.
TODO clarify semantics of the star, especially relative to ?
Think of this like a string, e.g. TypeNameStruct "hello" correspons to "struct hello". We do this because of the different namespaces that exist in C.
\begin{array}{rcll}
\mathrm{type\_name} & \triangleq & \texttt{\color{green}struct}\ \mathrm{ident} & \text{struct name} \\
& | & \texttt{\color{green}union}\ \mathrm{ident} & \text{union name} \\
& | & \texttt{\color{green}enum}\ \mathrm{ident} & \text{enum name, TODO this is not implemented in the parser} \\
& | & \mathrm{ident} & \text{custom name (introduced by }\texttt{\color{green}type}\text{ or }\texttt{\color{green}region}\text{)}
\end{array}
Basic type names int, char, long, float, int32_t, uint8_t... are predefined and can be used as in C. Their size depends on the selected architecture (see Parse_ctypes.init). We also have an integer type and explicitly sized word1, word2, word4 and word8 types.
Codex types:
\begin{array}{rcll}
\tau & \triangleq & \mathrm{type\_name} & \text{named type} \\
& | & \mathrm{type\_name}\ \texttt{\color{green}(}\ \mathrm{exp}_1\texttt{\color{green},}\ \cdots\texttt{\color{green},}\ \mathrm{exp}_n\ \texttt{\color{green})} & \text{applied type (parameter instantiation)} \\
& | & \tau\; \texttt{\color{green}?} \;\;|\;\; \tau\; \texttt{\color{green}+} \;\;|\;\; \tau\; \texttt{\color{green}*} & \text{pointer types, see }\mathrm{ptr\_annot} \\
& | & \tau{\color{green}\texttt [ }\mathrm{expr}{\color{green}\texttt ] } & \text{array type} \\
& | & \texttt{\color{green}struct \{} \tau_1\ \mathrm{ident}_1\texttt{\color{green};}\; \cdots\texttt{\color{green};}\; \tau_n\ \mathrm{ident}_n\texttt{\color{green};}\ \texttt{\color{green}\}} & \text{struct type, trailing semi-colon required} \\
& | & \texttt{\color{green}union \{} \tau_1\ \mathrm{ident}_1\texttt{\color{green};}\; \cdots\texttt{\color{green};}\; \tau_n\ \mathrm{ident}_n\texttt{\color{green};}\ \texttt{\color{green}\}} & \text{union type, trailing semi-colon required} \\
& | & \tau \ \texttt{\color{green}with}\ \mathrm{exp} & \text{refinement type: type with a constraint predicate} \\
& | & \texttt{\color{green}∃}\;\mathrm{ident} \ \texttt{\color{green}:}\ \tau\ \texttt{\color{green}.}\ \tau \;\;|\;\; \texttt{\color{green}\textbackslash exists}\;\mathrm{ident}\ \texttt{\color{green}:}\ \tau\ \texttt{\color{green}.}\ \tau & \text{existential type, using unicode ∃ (U+2203) or latex command}\\
& | & {\color{green}\texttt [ } \tau_1\texttt{\color{green},}\ \cdots \texttt{\color{green},}\ \tau_n {\color{green}\texttt ] }\ \texttt{\color{green}->}\ \tau_{\text r} \;\;|\;\;
\tau_{\text r}\ \texttt{\color{green}<-}\ \texttt{\color{green}(} \tau_1\texttt{\color{green},}\ \cdots \texttt{\color{green},}\ \tau_n \texttt{\color{green})} & \text{function type} \\
& | & \tau_{\text r}\ \texttt{\color{green}(}\mathrm{ptr\_annot} \texttt{\color{green})}\ \texttt{\color{green}(} \tau_1\texttt{\color{green},}\ \cdots \texttt{\color{green},}\ \tau_n \texttt{\color{green})} & \text{C-like function pointer type} \\
& | & {\color{green}\texttt ( }\tau {\color{green}\texttt ) } & \text{parenthesized type}
\end{array}
type t = | Type of typ| Constr of constr_def| FunDef of {inline : bool;pure : bool;funtyp : typ;
}| Global of typ
Type definitions, these can appear as statements in .typedc files.
\begin{array}{rcll}
\mathrm{fun\_param} & \triangleq & \texttt{\color{green}void} & \text{void parameter} \\
& | & \tau \; \mathrm{ident} & \text{named parameter}
\end{array}
\begin{array}{rcll}
\mathrm{fun} & \triangleq & \tau \; \mathrm{fun\_ident}\texttt{\color{green}(}\mathrm{fun\_param}_1\texttt{\color{green},}\ \cdots\texttt{\color{green},}\ \mathrm{fun\_param}_n \texttt{\color{green})} & \text{Function declaration} \\
& | & \texttt{\color{green}∃}\;\mathrm{ident} \ \texttt{\color{green}:}\ \tau\ \texttt{\color{green}.}\ \mathrm{fun} \;\;|\;\; \texttt{\color{green}\textbackslash exists}\;\mathrm{ident}\ \texttt{\color{green}:}\ \tau\ \texttt{\color{green}.}\ \mathrm{fun} & \text{Existential function declaration}
\end{array}
\begin{array}{rcll}
\mathrm{def} & \triangleq & \texttt{\color{green}type}\ \mathrm{type\_name}\ \texttt{\color{green}=}\ \tau\texttt{\color{green};} & \text{type definition} \\
& | & \texttt{\color{green}type}\ \mathrm{type\_name}\ \texttt{\color{green}(}\ \mathrm{ident}_1\texttt{\color{green},}\ \cdots\texttt{\color{green},}\ \mathrm{ident}_n\ \texttt{\color{green})}\ \texttt{\color{green}=}\ \tau\texttt{\color{green};} & \text{parameterized type definition} \\
& | & \texttt{\color{green}region}\ \mathrm{type\_name}\ \texttt{\color{green}=}\ \tau\texttt{\color{green};} & \text{region definition} \\
& | & \texttt{\color{green}region}\ \mathrm{type\_name}\ \texttt{\color{green}(}\ \mathrm{ident}_1\texttt{\color{green},}\ \cdots\texttt{\color{green},}\ \mathrm{ident}_n\ \texttt{\color{green})}\ \texttt{\color{green}=}\ \tau\texttt{\color{green};} & \text{parameterized region definition} \\
& | & \texttt{\color{green}inline}?\ \texttt{\color{green}pure}?\ \mathrm{fun}\texttt{\color{green};} & \text{function declaration, inline and pure are optional} \\
& | & \texttt{\color{green}struct}\ \mathrm{ident}\ \texttt{\color{green}\{}\ \tau_1\ \mathrm{ident}_1\texttt{\color{green};}\; \cdots\texttt{\color{green};}\; \tau_n\ \mathrm{ident}_n\texttt{\color{green};}\ \texttt{\color{green}\};} & \text{C struct declaration, trailing semi-colon required} \\
& | & \texttt{\color{green}union}\ \mathrm{ident}\ \texttt{\color{green}\{}\ \tau_1\ \mathrm{ident}_1\texttt{\color{green};}\; \cdots\texttt{\color{green};}\; \tau_n\ \mathrm{ident}_n\texttt{\color{green};}\ \texttt{\color{green}\};} & \text{C union declaration, trailing semi-colon required} \\
& | & \tau\ \mathrm{ident} \texttt{\color{green};} & \text{global type declaration} \\
\end{array}
module Pretty : sig ... end