The Codex static analysis library
Codex is a library of reusable components that are needed to implement a new static analyzer. The main component of any static analysis based on abstract interpretation are:
Single_value_abstractionwhich abstract unbounded integers\mathbb Z, fixed sized bitvectors (of any arbitrary size), and booleans (Lattices.Quadrivalent);- Abstract
Domains, that perform the basic analysis operations. In the case of non-relational domains, domains are built usingSingle_value_abstraction. - A
Fixpointengine, that iterates analysis operations until everything was computed; - A frontend, that parses the source code and drive the analysis by calling the
Fixpointengine and translating instructions and expressions toDomainsoperations. This depends on the language to be analyzed, so this is not provided by the Codex library; but see Analyzers for analyzers that use the library.
The Codex repository is decomposed in several sub-parts (that are in different directories), whose dependency graph is as follows:
Analyzers
Each analyzer must set up its stack of abstract domain, then translate the analysis of instructions and expressions in calls to abstract domains. See for instance:
- Frama-C/Codex: C code analysis implemented as a Codex-based Frama-C plugin. There, check Codex_register which builds the stack of domain (that you can also observe dynamically using
Tracelog), and C2Codex which interprets the C AST using theOperatorimplemented inside the abstractDomains. - BINSEC/Codex: Binary code analysis implemented as a Codex-based BINSEC plugin. There, the Dba2Codex module which instantiates a stack of
Domains, and handles the translation form BINSEC's DBA intermediate language to CodexOperator. - the While tutorial builds a complete analyzer for a simple while language.
Note that using Tracelog you can observe the recursive calls to the different domains, i.e. trace how the analysis of a source expression is handled by Codex.
Core static analysis modules
Operator: Definition of the concrete semantics of base objects like booleans, integers, bitvectors; and utilities around them (like function symbols).Lattices: Abstraction of a set; lattices representing abstraction of a single value the main way to exchange information in Codex.Single_value_abstraction: Combination ofLatticesabstracting a single value byOperator.Terms: is the symbolic terms that is the target of our SSA-translation by abstract interpretation.Types: AST and parser used by the type-based analysis.Domains: The SSA/value-based domains, as well as the memory domains.
Optional static analysis modules
Fixpoint: A library that helps perform fixpoint computation.Smtbackend: Operates a translation of Terms to an SMT formula.
Utilities
Codex_config: Contains knobs allowing to change the behavior of some abstract domains. We generally avoid doing this, but it is useful for benchmarking.TracelogThe logger. Instantiate it on top of your module, then useLog.debug(fun p -> p ...)where p behaves likeFormat.printf.PatriciaTree: Now a stand-alone submodule, but works specially well for abstract interpretation purposes.Codex_logis the old, deprecated, logger.
Tutorials
Extending Codex to a simple while language
We present a tutorial on a simple imperative while language, and demonstrates how to statically analyze programs written in it using Codex, a modular abstract interpretation library. This also serves as a nice introduction to various codex components (Lattices, Single_value_abstraction, Domains...). It is mostly meant for developers who wish to use and extend Codex.
- Chapter 1: Contains the syntax which is defined with variables, arithmetic, boolean expressions, and control structures (e.g.,
ifandwhile). It also provides a concrete interpreter for thewhilelanguage; - Chapter 2: Describes
LatticesandSingle_value_abstractionwith simpler implementations; - Chapter 3: Contains simple interval abstract domain which is then used to run the analysis on simple while programs;
- Chapter 4: Shows how one can use Codex building blocks to recreate the interval domain from chapter 3.
Using the Codex type system for C and binary analysis
This tutorial provides a user guide for the static analysis based on the dependent nominal type system presented in the OOPLSA 2024 paper and implemented using Codex. It covers all the steps necessary to use our tool to check if a C or machine code program is exempt of spatial memory safety errors, such as null pointer dereferences, buffer overflows, or type confusion errors. It covers in particular:
- How to run the analysis on a C program, how to configure the C analysis, and how to inspect the results.
- How to run the analysis on a binary executable, how to configure the machine code analysis, and how to inspect the results.
- How to specify the types used in a C programs to refine the results of the analysis (which is generally a necessary step to obtain memory safety).
