#
Patricia Tree API - `PatriciaTree`

Association maps from key to values, and sets, implemented with Patricia Trees, allowing fast merge operations by making use of physical equality between subtrees; and custom implementation of tree nodes (allowing normal maps, hash-consed maps, weak key or value maps, sets, custom maps, etc.)

This is similar to OCaml's Map, except that:

- The required signature for keys is different, in that we require each key to be mapped to a unique integer identifier.
The implementation uses Patricia Tree, as described in Okasaki and Gill's 1998 paper

*Fast mergeable integer maps*, i.e. it is a space-efficient prefix trie over the big-endian representation of the key's integer identifier.Example of a 5-bit patricia tree containing five numbers: 0

`0b0000`

, 1`0b0001`

, 5`0b0101`

and 7`0b0111`

and -8`0b1111`

:Branch (prefix=0b?___) / \ Branch Leaf(-8) (prefix=0b0?__) 0b1111 / \ Branch Branch (prefix=0b000?) (prefix=0b01?_) | | | | Leaf(0) Leaf(1) Leaf(5) Leaf(7) 0b0000 0b0001 0b0101 0b0111

The main benefit of Patricia Tree is that their representation is stable (contrary to maps, inserting nodes in any order will return the same shape), which allows different versions of a map to share more subtrees in memory, and the operations over two maps to benefit from this sharing. The functions in this library attempt to maximally preserve sharing and benefit from sharing, allowing very important improvements in complexity and running time when combining maps or sets is a frequent operation.

- Finally, the implementation is more customizable, allowing notably (key,value) pairs or different types to be in the same map, or to choose the memory representation of the nodes of the tree.
- Some operations like
`pop_unsigned_minimum`

and`pop_unsigned_maximum`

make our Set suitable as priority queue (but remember that each element in the queue must map to a distinct integer, and that using the unsigned order means elements with negative priority are seen as greater than elements with positive ones).

Note on complexity: in the following, n represents the size of the map when there is one (and `|map1|`

is the number of elements in `map1`

). The term log(n) correspond to the maximum height of the tree, which is log(n) if we assume an even distribution of numbers in the map (e.g. random distribution, or integers chosen contiguously using a counter). The worst-case height is O(min(n,64)) which is actually constant, but not really informative; log(n) corresponds to the real complexity in usual distributions.

All integers comparisons in this library are done according to their **unsigned representation**. This is the same as signed comparison for same sign integers, but all negative integers are greater than the positives. This means `-1`

is the greatest possible number, and `0`

is the smallest.

```
# unsigned_lt 2 (-1);;
- : bool = true
# unsigned_lt max_int min_int;;
- : bool = true
# unsigned_lt 3 2;;
- : bool = false
# unsigned_lt 2 3;;
- : bool = true
# unsigned_lt (-2) (-3);;
- : bool = false
# unsigned_lt (-4) (-3);;
- : bool = true
# unsigned_lt 0 0;;
- : bool = false
```

Using this unsigned order helps avoid a bug described in *QuickChecking Patricia Trees* by Jan Mitgaard.

Private type used to represent prefix stored in nodes. These are integers with all bits after branching bit (included) set to zero

## Nodes

`module type NODE = sig ... end`

This module explains how a node is stored in memory, with functions to create and view nodes.

`module type NODE_WITH_ID = sig ... end`

Associate a unique number to each node, so they can be used as keys in sets or maps.

`module type HASH_CONSED_NODE = sig ... end`

Hash-consed nodes also associate a unique number to each node, Unlike `NODE_WITH_ID`

, they also check before instanciating the node whether a similar node already exists. This results in slightly slower constructors (they perform an extra hash-table lookup), but allows for constant time equality and comparison.

## Map signatures

### Base map

`module type BASE_MAP = sig ... end`

Base map signature: a generic `'b map`

storing bindings of `'a key`

to `('a,'b) values`

. All maps and set are a variation of this type, sometimes with a simplified interface.

### Heterogeneous maps and sets

Maps and sets with generic keys `'a key`

and values `('a,'b) value`

`module type HETEROGENEOUS_MAP = sig ... end`

This is the same as `MAP`

, but with simple type `key`

being replaced by type constructor `'a key`

and `'b value`

being replaced by `('a,'b) value`

.

`module type HETEROGENEOUS_SET = sig ... end`

A set containing different keys, very similar to `SET`

, but with simple type `elt`

being replaced by type constructor `'a elt`

.

### Homogeneous maps and sets

Same as above, but simple interfaces for non-generic keys. These are also close to the standard library's interface for sets and maps.

`module type SET = sig ... end`

Signature for sets implemented using Patricia trees. Most of this interface should be shared with `Stdlib.Set.S`

.

The typechecker struggles with forall quantification on values if they don't depend on the first parameter, this wrapping allows our code to pass typechecking by forbidding overly eager simplification. Since the type is unboxed, it doesn't introduce any performance overhead.

This is due to a bug in the typechecker, more info on the OCaml discourse post.

`module type MAP_WITH_VALUE = sig ... end`

The signature for maps with a single type for keys and values, a `'a map`

binds `key`

to `'a value`

. This is slightly more generic than `MAP`

, which just binds to `'a`

. It is used for maps that need to restrict their value type, namely Hash-consed maps and sets.

`module type MAP = MAP_WITH_VALUE with type 'a value = 'a`

The signature for maps with a single type for keys and values, a `'a map`

binds `key`

to `'a`

. Most of this interface should be shared with `Stdlib.Map.S`

.

## Keys

Keys are the functor arguments used to build the maps.

`module type KEY = sig ... end`

The signature of homogeneous keys (non-generic, unparameterized keys).

To have heterogeneous keys, we must define a polymorphic equality function. Like in the homogeneous case, it should have the requirement that `(to_int a) = (to_int b) ==> polyeq a b = Eq`

.

`module type HETEROGENEOUS_KEY = sig ... end`

The signature of heterogeneous keys.

## Values

`module type VALUE = sig ... end`

Module type used for specifying custom homogeneous value types in `MakeCustomMap`

. For most purposes, use the provided `Value`

implementation. It sets `'a t = 'a`

, which is the desired effect (maps can map to any value). This is the case in `MakeMap`

. However, for maps like Hash-consed maps and sets, it can be useful to restrict the type of values in order to implement `hash`

and `polyeq`

functions on values. See the `HASHED_VALUE`

module type for more details.

`module type HETEROGENEOUS_VALUE = sig ... end`

The module type of values, which can be heterogeneous. This can be used to specify how the type of the value depends on that of the key. If the value doesn't depend on the key type, you can use the provided default implementations `HomogeneousValue`

and `WrappedHomogeneousValue`

.

`module HomogeneousValue : HETEROGENEOUS_VALUE with type ('a, 'map) t = 'map`

Default implementation of `HETEROGENEOUS_VALUE`

, to use when the type of the value in a heterogeneous map does not depend on the type of the key, only on the type of the map.

```
module WrappedHomogeneousValue :
HETEROGENEOUS_VALUE with type ('a, 'map) t = ('a, 'map) snd
```

Same as `HomogeneousValue`

, but uses a wrapper (unboxed) type instead of direct equality. This avoids a problem in the typechecker with overly eager simplification of aliases. More info on the OCaml discourse post.

`module type HASHED_VALUE = sig ... end`

`VALUE`

parameter for Hash-consed maps and sets, as hash-consing requires hashing and comparing values.

`module type HETEROGENEOUS_HASHED_VALUE = sig ... end`

In order to build Hash-consed maps and sets, we need to be able to hash and compare values.

`module HashedValue : HASHED_VALUE with type 'a t = 'a`

Generic implementation of `HASHED_VALUE`

. Uses `Hashtbl.hash`

for hashing and physical equality for equality. Note that this may lead to maps of different types having the same identifier (`MakeHashconsedMap.to_int`

), see the documentation of `HASHED_VALUE.polyeq`

for details on this.

```
module HeterogeneousHashedValue :
HETEROGENEOUS_HASHED_VALUE with type ('k, 'm) t = 'm
```

Generic implementation of `HETEROGENEOUS_HASHED_VALUE`

. Uses `Hashtbl.hash`

for hashing and physical equality for equality. Note that this may lead to maps of different types having the same identifier (`MakeHashconsedHeterogeneousMap.to_int`

), see the documentation of `HASHED_VALUE.polyeq`

for details on this.

## Functors

This section presents the functors which can be used to build patricia tree maps and sets.

### Homogeneous maps and sets

These are homogeneous maps and set, their keys/elements are a single non-generic type, just like the standard library's `Map`

and `Set`

modules.

### Heterogeneous maps and sets

Heterogeneous maps are `'map map`

, which store bindings of `'key key`

to `('key, 'map) value`

, where `'key key`

is a GADT, as we must be able to compare keys of different types together.

Similarly, heterogeneous sets store sets of `'key key`

.

```
module MakeHeterogeneousSet
(Key : HETEROGENEOUS_KEY) :
HETEROGENEOUS_SET with type 'a elt = 'a Key.t
```

A set containing different keys, very similar to `SET`

, but with simple type `elt`

being replaced by type constructor `'a elt`

.

```
module MakeHeterogeneousMap
(Key : HETEROGENEOUS_KEY)
(Value : HETEROGENEOUS_VALUE) :
HETEROGENEOUS_MAP
with type 'a key = 'a Key.t
and type ('k, 'm) value = ('k, 'm) Value.t
```

This is the same as `MAP`

, but with simple type `key`

being replaced by type constructor `'a key`

and `'b value`

being replaced by `('a,'b) value`

.

### Maps and sets with custom nodes

We can also customize the representation and creation of nodes, to gain space or time.

Possibitities include having weak key and/or values, hash-consing, giving unique number to nodes or keeping them in sync with the disk, lazy evaluation and/or caching, adding size information for constant time `cardinal`

functions, etc.

See Some implementations of NODE for the provided implementations of `NODE`

, or create your own.

```
module MakeCustomMap
(Key : KEY)
(Value : VALUE)
(Node :
NODE
with type 'a key = Key.t
and type ('key, 'map) value = ('key, 'map Value.t) snd) :
MAP_WITH_VALUE
with type key = Key.t
and type 'm value = 'm Value.t
and type 'm t = 'm Node.t
```

Create a homogeneous map with a custom `NODE`

. Also allows customizing the map values

```
module MakeCustomSet
(Key : KEY)
(Node : NODE with type 'a key = Key.t and type ('key, 'map) value = unit) :
SET with type elt = Key.t and type 'a BaseMap.t = 'a Node.t
```

Create a homogeneous set with a custom `NODE`

.

```
module MakeCustomHeterogeneousMap
(Key : HETEROGENEOUS_KEY)
(Value : HETEROGENEOUS_VALUE)
(Node :
NODE
with type 'a key = 'a Key.t
and type ('key, 'map) value = ('key, 'map) Value.t) :
HETEROGENEOUS_MAP
with type 'a key = 'a Key.t
and type ('k, 'm) value = ('k, 'm) Value.t
and type 'm t = 'm Node.t
```

Create an heterogeneous map with a custom `NODE`

.

```
module MakeCustomHeterogeneousSet
(Key : HETEROGENEOUS_KEY)
(NODE : NODE with type 'a key = 'a Key.t and type ('key, 'map) value = unit) :
HETEROGENEOUS_SET
with type 'a elt = 'a Key.t
and type 'a BaseMap.t = 'a NODE.t
```

Create an heterogeneous set with a custom `NODE`

.

### Hash-consed maps and sets

Hash-consed maps and sets uniquely number each of their nodes. Upon creation, they check whether a similar node has been created before, if so they return it, else they return a new node with a new number. With this unique numbering:

`equal`

and`compare`

become constant time operations;- two maps with the same bindings (where keys are compared by
`KEY.to_int`

and values by`HASHED_VALUE.polyeq`

) will always be physically equal; - functions that benefit from sharing, like
`BASE_MAP.idempotent_union`

and`BASE_MAP.idempotent_inter`

will see improved performance; - constructors are slightly slower, as they now require a hash-table lookup;
- memory usage is increased: nodes store their tags inside themselves, and a global hash-table of all built nodes must be maintained;
- hash-consed maps assume their values are immutable;
**WARNING:**when using physical equality as`HASHED_VALUE.polyeq`

, some**maps of different types may be given the same identifier**. See the end of the documentation of`HASHED_VALUE.polyeq`

for details. Note that this is the case in the default implementations`HashedValue`

and`HeterogeneousHashedValue`

.

All hash-consing functors are **generative**, since each functor call will create a new hash-table to store the created nodes. Calling a functor twice with same arguments will lead to two numbering systems for identifiers, and thus the types should not be considered compatible.

`module MakeHashconsedMap (Key : KEY) (Value : HASHED_VALUE) () : sig ... end`

Hash-consed version of `MAP`

. See Hash-consed maps and sets for the differences between hash-consed and non hash-consed maps.

`module MakeHashconsedSet (Key : KEY) () : sig ... end`

Hash-consed version of `SET`

. See Hash-consed maps and sets for the differences between hash-consed and non hash-consed sets.

```
module MakeHashconsedHeterogeneousSet
(Key : HETEROGENEOUS_KEY)
() :
sig ... end
```

Hash-consed version of `HETEROGENEOUS_SET`

. See Hash-consed maps and sets for the differences between hash-consed and non hash-consed sets.

```
module MakeHashconsedHeterogeneousMap
(Key : HETEROGENEOUS_KEY)
(Value : HETEROGENEOUS_HASHED_VALUE)
() :
sig ... end
```

Hash-consed version of `HETEROGENEOUS_MAP`

. See Hash-consed maps and sets for the differences between hash-consed and non hash-consed maps.

## Some implementations of NODE

We provide a few different implementations of `NODE`

, they can be used with the `MakeCustomMap`

, `MakeCustomSet`

, `MakeCustomHeterogeneousMap`

and `MakeCustomHeterogeneousSet`

functors.

### Basic nodes

```
module SimpleNode
(Key : sig ... end)
(Value : HETEROGENEOUS_VALUE) :
NODE
with type 'a key = 'a Key.t
and type ('key, 'map) value = ('key, 'map) Value.t
```

This module is such that `'map t = 'map view`

. This is the node used in `MakeHeterogeneousMap`

and `MakeMap`

.

```
module NodeWithId
(Key : sig ... end)
(Value : HETEROGENEOUS_VALUE) :
NODE_WITH_ID
with type 'a key = 'a Key.t
and type ('key, 'map) value = ('key, 'map) Value.t
```

Here, nodes also contain a unique id, e.g. so that they can be used as keys of maps or hash-tables.

```
module SetNode
(Key : sig ... end) :
NODE with type 'a key = 'a Key.t and type ('key, 'map) value = unit
```

An optimized representation for sets, i.e. maps to unit: we do not store a reference to unit (note that you can further optimize when you know the representation of the key). This is the node used in `MakeHeterogeneousSet`

and `MakeSet`

.

### Weak nodes

```
module WeakNode
(Key : sig ... end)
(Value : HETEROGENEOUS_VALUE) :
NODE
with type 'a key = 'a Key.t
and type ('key, 'map) value = ('key, 'map) Value.t
```

NODE used to implement weak key hashes (the key-binding pair is an Ephemeron, the reference to the key is weak, and if the key is garbage collected, the binding disappears from the map

### Hashconsed nodes

```
module HashconsedNode
(Key : HETEROGENEOUS_KEY)
(Value : HETEROGENEOUS_HASHED_VALUE)
() :
HASH_CONSED_NODE
with type 'a key = 'a Key.t
and type ('key, 'map) value = ('key, 'map) Value.t
```

Gives a unique number to each node like `NodeWithId`

, but also performs hash-consing. So two maps with the same bindings will always be physically equal. See Hash-consed maps and sets for more details on this.

```
module HashconsedSetNode
(Key : HETEROGENEOUS_KEY)
() :
HASH_CONSED_NODE
with type 'a key = 'a Key.t
and type ('key, 'map) value = unit
```

Both a `HashconsedNode`

and a `SetNode`

.