This is documentation for version v0.9.0 of patricia-tree, but the latest version is v0.10.0.
Click here to redirect to the latest version.

Patricia Tree API - PatriciaTree

Association maps from key to values, and sets, implemented with Patricia Trees, allowing fast merge operations by making use of physical equality between subtrees; and custom implementation of tree nodes (allowing normal maps, hash-consed maps, weak key or value maps, sets, custom maps, etc.)

This is similar to OCaml's Map, except that:

  • The required signature for keys is different, in that we require each key to be mapped to a unique integer identifier.
  • The implementation uses Patricia Tree, as described in Oksasaki and Gill's 1998 paper "Fast mergeable integer maps", i.e. it is a space-efficient prefix trie over the big-endian representation of the key's integer identifier.

The main benefit of Patricia Tree is that their representation is stable (contrary to maps, inserting nodes in any order will return the same shape), which allows different versions of a map to share more subtrees in memory, and the operations over two maps to benefit from this sharing. The functions in this library attempt to maximally preserve sharing and benefit from sharing, allowing very important improvements in complexity and running time when combining maps or sets is a frequent operation.

  • Finally, the implementation is more customizable, allowing notably (key,value) pairs or different types to be in the same map, or to choose the memory representation of the nodes of the tree.
  • Some operations like pop_minimum and pop_maximum make our Set suitable as priority queue (but remember that each element in the queue must map to a distinct integer).

Note on complexity: in the following, n represents the size of the map when there is one (and |map1| is the number of elements in map1). The term log(n) correspond to the maximum height of the tree, which is log(n) if we assume an even distribution of numbers in the map (e.g. random distribution, or integers chosen contiguously using a counter). The worst-case height is O(max(n,64)) which is actually constant, but not really informative; log(n) corresponds to the real complexity in usual distributions.

type intkey
type mask

Nodes

module type NODE = sig ... end

This module explains how a node is stored in memory, with functions to create and view nodes.

module type NODE_WITH_ID = sig ... end

Associate a unique number to each node.

Map signatures

Base map

module type BASE_MAP = sig ... end

Base map signature: a generic 'b map storing bindings of 'a key to ('a,'b) values. All maps and set are a variation of this type, sometimes with a simplified interface:

Heterogeneous maps and sets

Maps and sets with generic keys 'a key and values ('a,'b) value

module type HETEROGENEOUS_MAP = sig ... end

This is the same as MAP, but with simple type key being replaced by type constructor 'a key and 'b value being replaced by ('a,'b) value.

module type HETEROGENEOUS_SET = sig ... end

A set containing different keys, very similar to SET, but with simple type elt being replaced by type constructor 'a elt.

Homogeneous maps and sets

Same as above, but simple interfaces for non-generic keys

module type SET = sig ... end

Signature for sets implemented using Patricia trees. Most of this interface should be shared with Stdlib.Set.S.

type (_, 'b) snd =
  1. | Snd of 'b

The typechecker struggles with forall quantification on values if they don't depend on the first parameter, this wrapping allows our code to pass typechecking by forbidding overly eager simplification.

This is due to a bug in the typechecker, more info on the OCaml discourse post.

module type MAP = sig ... end

The signature for maps with a single type for keys and values. Most of this interface should be shared with Stdlib.Set.S.

Keys

Keys are the functor arguments used to build the maps.

module type KEY = sig ... end

The signature of keys when they are all of the same type.

type (_, _) cmp =
  1. | Eq : ('a, 'a) cmp
  2. | Diff : ('a, 'b) cmp

To have heterogeneous keys, we must define a polymorphic equality function. Like in the homogeneous case, it should have the requirement that (to_int a) = (to_int b) ==> polyeq a b = Eq.

module type HETEROGENEOUS_KEY = sig ... end

The signature of heterogeneous keys.

module type VALUE = sig ... end

The moodule type of values, which can be heterogeneous.

module HomogeneousValue : VALUE with type ('a, 'map) t = 'map

To use when the type of the value is the same (but the keys can still be heterogeneous).

module WrappedHomogeneousValue : VALUE with type ('a, 'map) t = ('a, 'map) snd

Functors

Homogeneous maps and sets

module MakeMap (Key : KEY) : MAP with type key = Key.t
module MakeSet (Key : KEY) : SET with type elt = Key.t

Heterogeneous maps and sets

A set containing different keys, very similar to SET, but with simple type elt being replaced by type constructor 'a elt.

module MakeHeterogeneousMap (Key : HETEROGENEOUS_KEY) (Value : VALUE) : HETEROGENEOUS_MAP with type 'a key = 'a Key.t and type ('k, 'm) value = ('k, 'm) Value.t

This is the same as MAP, but with simple type key being replaced by type constructor 'a key and 'b value being replaced by ('a,'b) value.

Maps with custom representation of Nodes

We can also customize the representation and creation of nodes, to gain space or time.

Possibitities include having weak key and/or values, hash-consing, giving unique number to nodes or keeping them in sync with the disk, lazy evaluation and/or caching, etc.

module MakeCustom (Key : KEY) (NODE : NODE with type 'a key = Key.t and type ('key, 'map) value = ('key, 'map) snd) : MAP with type key = Key.t and type 'm t = 'm NODE.t

Create a Homogeneous Map with a custom NODE.

module MakeCustomHeterogeneous (Key : HETEROGENEOUS_KEY) (Value : VALUE) (NODE : NODE with type 'a key = 'a Key.t and type ('key, 'map) value = ('key, 'map) Value.t) : HETEROGENEOUS_MAP with type 'a key = 'a Key.t and type ('k, 'm) value = ('k, 'm) Value.t and type 'm t = 'm NODE.t

Create an Heterogeneous map with a custom NODE.

Some implementations of NODE

module SimpleNode (Key : sig ... end) (Value : VALUE) : NODE with type 'a key = 'a Key.t and type ('key, 'map) value = ('key, 'map) Value.t

This module is such that 'map t = 'map view.

module NodeWithId (Key : sig ... end) (Value : VALUE) : NODE_WITH_ID with type 'a key = 'a Key.t and type ('key, 'map) value = ('key, 'map) Value.t

Here, nodes also contain a unique id, e.g. so that they can be used as keys of maps or hashtables.

module SetNode (Key : sig ... end) : NODE with type 'a key = 'a Key.t and type ('key, 'map) value = unit

An optimized representation for sets, i.e. maps to unit: we do not store a reference to unit (note that you can further optimize when you know the representation of the key).

module WeakNode (Key : sig ... end) (Value : VALUE) : NODE with type 'a key = 'a Key.t and type ('key, 'map) value = ('key, 'map) Value.t

NODE used to implement weak key hashes (the key-binding pair is an Ephemeron, the reference to the key is weak, and if the key is garbage collected, the binding disappears from the map

module WeakSetNode (Key : sig ... end) : NODE with type 'a key = 'a Key.t and type ('key, 'map) value = unit

Both a WeakNode and a SetNode, useful to implement Weak sets.