Module Type_parse_tree

This module contains the abstract syntax tree for the types in our input .typedc files, along with their reference concrete syntax. A .typedc file is simply a list of definitions.

The parser itself is defined in Type_parser.

For each type in the AST, we will describe in mathematical block the corresponding grammar as it should appear in the text file:

  • terminals (i.e. symbols and names that appear verbatim in the .typedc file), will use \texttt{\color{green}green typewriter font}
  • all other fonts are non-terminals. They are usually defined somewhere else in this file.

    • We use \text{``}\mathrm{ident}\text{''} as a non-terminal that matches C-like identifiers (combination of upper case and lowercase letter, underscores _ and digits, that does not start with a digit.)
    • Similarly, \text{``}\mathrm{fun\_ident}\text{''} is a non-terminal for function identifiers. It is an identifier that may also contain @ characters.
  • Terminals or non-terminals that end in ? (not to be confused with the terminal \texttt{\color{green}?}) are optional and may be omitted.

The parser also support C style inline // ... \n and block /* ... */ comments, and accepts any number of whitespace characters between terminals.

This page is meant as a technical reference, for a gentler introduction to the typing language, see Types tutorial.

module Symbol : sig ... end
type unop =
  1. | Not
  2. | UMinus

Unary operators:

    \begin{array}{rcll}
      \sim & \triangleq & \texttt{\color{green}!} & \text{TODO logical or bitwise negation?} \\
           & |          & \texttt{\color{green}-} & \text{unary minus}
    \end{array}
  

TODO: it seems unop is currently not part of the parser

type binop =
  1. | Plus
  2. | Minus
  3. | Mult
  4. | Div
  5. | Mod
  6. | Bitwise_and
  7. | Bitwise_or
  8. | Bitwise_xor
  9. | LShift
  10. | RShift
  11. | Eq
  12. | Diff
  13. | Ge
  14. | Gt
  15. | Le
  16. | Lt
  17. | Logical_and
  18. | Logical_or

Binary operators:

    \begin{array}{rcll}
      \diamond & \triangleq & \texttt{\color{green}+} \;\;|\;\;\texttt{\color{green}-} \;\;|\;\;\texttt{\color{green}*} \;\;|\;\;\texttt{\color{green}/} \;\;|\;\;\texttt{\color{green}\%} & \text{arithmetic operators} \\
               & | & \texttt{\color{green}\&} \;\;|\;\; \texttt{\color{green}|} \;\;|\;\; \texttt{\color{green}\textasciicircum} \;\;|\;\; \texttt{\color{green}<<}  \;\;|\;\; \texttt{\color{green}>>}  & \text{bitwise operators} \\
               & | & \texttt{\color{green}==} \;\;|\;\; \texttt{\color{green}!=} \;\;|\;\; \texttt{\color{green}>=} \;\;|\;\; \texttt{\color{green}>} \;\;|\;\; \texttt{\color{green}<=} \;\;|\;\; \texttt{\color{green}<} & \text{comparison operators} \\
               & | & \texttt{\color{green}\&\&} \;\;|\;\; \texttt{\color{green}|} \texttt{\color{green}|} & \text{logical and, logical or}
    \end{array}
  
type expr =
  1. | Self
  2. | Cst of Z.t
  3. | Var of Symbol.t
  4. | Unary of unop * expr
  5. | Binary of binop * expr * expr

Type expressions \mathrm{exp} are given by:

    \begin{array}{rcll}
      \mathrm{exp} & \triangleq & \texttt{\color{green}self}    & \text{self keyword} \\
               & |          & z                & \text{numeric constant }z \in \mathbb{Z} \\
               & |          & \alpha            & \text{symbolic variable} \\
               & |          & \sim \mathrm{exp}           & \text{unary operation: TODO (this is not implemented)} \\
               & |          & \mathrm{exp} \diamond \mathrm{exp} & \text{binary operation} \\
               & |          & \texttt{\color{green}(} \mathrm{exp} \texttt{\color{green})} & \text{parenthesized expression}
    \end{array}
  

To avoid any ambiguity, the parser only allows only obvious precedences:

  • multiplicative operators (*, / and %) bind more strongly than additive operators (+ and -);
  • both bind less strongly than binary predicates (==, <, ...);
  • binary predicates bind less strongly than && and ||;
  • multiplicative and additive operators are left-associative.
  • we report an error otherwise.

We don't allow chaining binary predicates, e.g. a < b == c, a < b > c, or a || b && c is forbidden.

This technique is inspired by Per Vognsen, who pointed it out in https://www.scattered-thoughts.net/writing/better-operator-precedence/

type ptr_annot =
  1. | Maybe_null
  2. | Non_null
  3. | Null_or_non_null

Pointer annotations:

    \begin{array}{rcll}
      \mathrm{ptr\_annot} & \triangleq & \texttt{\color{green}*}    & \text{Unknown: may or may not be null} \\
                      & |          & \texttt{\color{green}+}    & \text{Non-null} \\
                      & |          & \texttt{\color{green}?}    & \text{null or non null}
    \end{array}
    

Note that, while in C pointers may implicitly represent pointer to arrays, in Codex that is not the case. A pointer points to a single value, to have a pointer to an array, you must explicitly use \tau[n]* or its variants.

TODO clarify semantics of the star, especially relative to ?

type type_name =
  1. | TypeNameStruct of string
  2. | TypeNameUnion of string
  3. | TypeNameEnum of string
  4. | TypeName of string

Think of this like a string, e.g. TypeNameStruct "hello" correspons to "struct hello". We do this because of the different namespaces that exist in C.

    \begin{array}{rcll}
      \mathrm{type\_name} & \triangleq & \texttt{\color{green}struct}\ \mathrm{ident}                    & \text{struct name} \\
                      & |          & \texttt{\color{green}union}\ \mathrm{ident}                    & \text{union name} \\
                      & |          & \texttt{\color{green}enum}\ \mathrm{ident}                    & \text{enum name, TODO this is not implemented in the parser} \\
                      & |          & \mathrm{ident} & \text{custom name (introduced by }\texttt{\color{green}type}\text{ or }\texttt{\color{green}region}\text{)}
    \end{array}
    

Basic type names int, char, long, float, int32_t, uint8_t... are predefined and can be used as in C. Their size depends on the selected architecture (see Parse_ctypes.init). We also have an integer type and explicitly sized word1, word2, word4 and word8 types.

type typ =
  1. | Name of type_name
  2. | Applied of constr * expr list
  3. | Pointer of typ * ptr_annot
  4. | Array of typ * expr
  5. | Struct of (string * typ) list
  6. | Union of (string * typ) list
  7. | Constraint of typ * expr
  8. | Exists of Symbol.t * typ * typ
  9. | Function of typ * typ list

Codex types:

    \begin{array}{rcll}
      \tau & \triangleq & \mathrm{type\_name}                    & \text{named type} \\
           & |          & \mathrm{type\_name}\ \texttt{\color{green}(}\ \mathrm{exp}_1\texttt{\color{green},}\ \cdots\texttt{\color{green},}\ \mathrm{exp}_n\ \texttt{\color{green})} & \text{applied type (parameter instantiation)} \\
           & |          & \tau\; \texttt{\color{green}?} \;\;|\;\; \tau\; \texttt{\color{green}+} \;\;|\;\; \tau\; \texttt{\color{green}*}       & \text{pointer types, see }\mathrm{ptr\_annot} \\
           & |          & \tau{\color{green}\texttt [ }\mathrm{expr}{\color{green}\texttt ] }  & \text{array type} \\
           & |          & \texttt{\color{green}struct \{} \tau_1\ \mathrm{ident}_1\texttt{\color{green};}\; \cdots\texttt{\color{green};}\; \tau_n\ \mathrm{ident}_n\texttt{\color{green};}\ \texttt{\color{green}\}} & \text{struct type, trailing semi-colon required} \\
           & |          & \texttt{\color{green}union \{} \tau_1\ \mathrm{ident}_1\texttt{\color{green};}\; \cdots\texttt{\color{green};}\; \tau_n\ \mathrm{ident}_n\texttt{\color{green};}\ \texttt{\color{green}\}} & \text{union type, trailing semi-colon required} \\
           & |          & \tau \ \texttt{\color{green}with}\ \mathrm{exp} & \text{refinement type: type with a constraint predicate} \\
           & |          & \texttt{\color{green}∃}\;\mathrm{ident} \ \texttt{\color{green}:}\ \tau\ \texttt{\color{green}.}\ \tau \;\;|\;\; \texttt{\color{green}\textbackslash exists}\;\mathrm{ident}\ \texttt{\color{green}:}\ \tau\ \texttt{\color{green}.}\ \tau & \text{existential type, using unicode ∃ (U+2203) or latex command}\\
           & |          & {\color{green}\texttt [ } \tau_1\texttt{\color{green},}\ \cdots \texttt{\color{green},}\ \tau_n {\color{green}\texttt ] }\ \texttt{\color{green}->}\ \tau_{\text r} \;\;|\;\;
           \tau_{\text r}\ \texttt{\color{green}<-}\ \texttt{\color{green}(} \tau_1\texttt{\color{green},}\ \cdots \texttt{\color{green},}\ \tau_n \texttt{\color{green})}  & \text{function type} \\
           & |          & \tau_{\text r}\ \texttt{\color{green}(}\mathrm{ptr\_annot} \texttt{\color{green})}\ \texttt{\color{green}(} \tau_1\texttt{\color{green},}\ \cdots \texttt{\color{green},}\ \tau_n \texttt{\color{green})} & \text{C-like function pointer type} \\
           & |          & {\color{green}\texttt ( }\tau {\color{green}\texttt ) } & \text{parenthesized type}
    \end{array}
    
and constr =
  1. | LambdaAlias of type_name
and constr_def =
  1. | Lambda of string list * typ
type t =
  1. | Type of typ
  2. | Constr of constr_def
  3. | FunDef of {
    1. inline : bool;
    2. pure : bool;
    3. funtyp : typ;
    }
    (*

    funtyp must be a function type: Function, Exists(_,_,Function _) or deeper existentials

    *)
  4. | Global of typ
type definition =
  1. | TypeDefinition of type_name * t
  2. | RegionDefinition of type_name * t
  3. | GlobalDefinition of string * t

Type definitions, these can appear as statements in .typedc files.

    \begin{array}{rcll}
      \mathrm{fun\_param} & \triangleq & \texttt{\color{green}void}    & \text{void parameter} \\
                      & |          & \tau \; \mathrm{ident} & \text{named parameter}
    \end{array}
    
    \begin{array}{rcll}
      \mathrm{fun} & \triangleq & \tau \; \mathrm{fun\_ident}\texttt{\color{green}(}\mathrm{fun\_param}_1\texttt{\color{green},}\ \cdots\texttt{\color{green},}\ \mathrm{fun\_param}_n \texttt{\color{green})}    & \text{Function declaration} \\
               & |          & \texttt{\color{green}∃}\;\mathrm{ident} \ \texttt{\color{green}:}\ \tau\ \texttt{\color{green}.}\ \mathrm{fun} \;\;|\;\; \texttt{\color{green}\textbackslash exists}\;\mathrm{ident}\ \texttt{\color{green}:}\ \tau\ \texttt{\color{green}.}\ \mathrm{fun} & \text{Existential function declaration}
    \end{array}
    
    \begin{array}{rcll}
      \mathrm{def} & \triangleq & \texttt{\color{green}type}\ \mathrm{type\_name}\ \texttt{\color{green}=}\ \tau\texttt{\color{green};} & \text{type definition} \\
               & | & \texttt{\color{green}type}\ \mathrm{type\_name}\ \texttt{\color{green}(}\ \mathrm{ident}_1\texttt{\color{green},}\ \cdots\texttt{\color{green},}\ \mathrm{ident}_n\ \texttt{\color{green})}\ \texttt{\color{green}=}\ \tau\texttt{\color{green};} & \text{parameterized type definition} \\
               & | & \texttt{\color{green}region}\ \mathrm{type\_name}\ \texttt{\color{green}=}\ \tau\texttt{\color{green};} & \text{region definition} \\
               & | & \texttt{\color{green}region}\ \mathrm{type\_name}\ \texttt{\color{green}(}\ \mathrm{ident}_1\texttt{\color{green},}\ \cdots\texttt{\color{green},}\ \mathrm{ident}_n\ \texttt{\color{green})}\ \texttt{\color{green}=}\ \tau\texttt{\color{green};} & \text{parameterized region definition} \\
               & | & \texttt{\color{green}inline}?\ \texttt{\color{green}pure}?\ \mathrm{fun}\texttt{\color{green};} & \text{function declaration, inline and pure are optional} \\
               & | & \texttt{\color{green}struct}\ \mathrm{ident}\ \texttt{\color{green}\{}\ \tau_1\ \mathrm{ident}_1\texttt{\color{green};}\; \cdots\texttt{\color{green};}\; \tau_n\ \mathrm{ident}_n\texttt{\color{green};}\ \texttt{\color{green}\};} & \text{C struct declaration, trailing semi-colon required} \\
               & | & \texttt{\color{green}union}\ \mathrm{ident}\ \texttt{\color{green}\{}\ \tau_1\ \mathrm{ident}_1\texttt{\color{green};}\; \cdots\texttt{\color{green};}\; \tau_n\ \mathrm{ident}_n\texttt{\color{green};}\ \texttt{\color{green}\};} & \text{C union declaration, trailing semi-colon required} \\
               & | & \tau\ \mathrm{ident} \texttt{\color{green};} & \text{global type declaration} \\
    \end{array}
    
module Pretty : sig ... end