Module Stats

Compute various statistical indicators for a collection of values

type exhaustive = |
type compact = |
type ('a, 'kind) t

Stats for a collection of type 'a, either int or float.

If 'kind is exhaustive, the collection stores all values and can compute the median, q1 and q3. Computing these requires sorting. The sort is done only once and then saved, but adding new values requires re-sorting.

Otherwise, when 'kind is compact, this is a constant-size aggregate. It does not store all the values, only the minimum required to compute some stats (min, max, sum, sum of squares). It can't compute the median or quartiles.
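The two empty types above act as phantom markers on t. The following is a minimal self-contained sketch of that technique, not the module's actual implementation: the Sketch module, its record fields, and its restriction to floats are assumptions made for illustration.

```ocaml
module Sketch : sig
  type exhaustive = |
  type compact = |
  type 'kind t

  val exhaustive_of_list : float list -> exhaustive t
  val make_compact : exhaustive t -> compact t
  val sum : 'kind t -> float
  val median : exhaustive t -> float
end = struct
  type exhaustive = |
  type compact = |

  (* ['kind] is a phantom parameter: it never appears in the record. *)
  type 'kind t = { size : int; sum : float; values : float list }

  let exhaustive_of_list l =
    { size = List.length l; sum = List.fold_left ( +. ) 0. l; values = l }

  (* Drop the stored values, keeping only the aggregate. *)
  let make_compact s = { size = s.size; sum = s.sum; values = [] }

  let sum s = s.sum

  let median s =
    if s.size = 0 then invalid_arg "median: empty collection";
    let sorted = Array.of_list (List.sort compare s.values) in
    let mid = s.size / 2 in
    if s.size mod 2 = 1 then sorted.(mid)
    else (sorted.(mid - 1) +. sorted.(mid)) /. 2.
end

let all = Sketch.exhaustive_of_list [ 3.; 1.; 2. ]
let m = Sketch.median all
let small = Sketch.make_compact all
let s = Sketch.sum small
(* [Sketch.median small] does not type-check: [small] is a [compact t]. *)
```

Because the signature only exposes median on exhaustive t, misuse is caught at compile time rather than at run time.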

Constructors

val make_float : (int, 'kind) t -> (float, 'kind) t

Convert an int collection to a float collection

val make_compact : ('a, exhaustive) t -> ('a, compact) t

Convert an exhaustive collection into a compact representation

Compact values

While empty constructors are provided, accessing the stats of an empty collection raises CollectionTooShort. List and array constructors are linear in the size of the given list/array.

val compact_int_empty : (int, compact) t
val compact_float_empty : (float, compact) t
val compact_int_singleton : int -> (int, compact) t
val compact_float_singleton : float -> (float, compact) t
val compact_of_int_list : int list -> (int, compact) t
val compact_of_float_list : float list -> (float, compact) t
val compact_of_int_array : int array -> (int, compact) t
val compact_of_float_array : float array -> (float, compact) t

Exhaustive values

val exhaustive_int_empty : (int, exhaustive) t
val exhaustive_float_empty : (float, exhaustive) t
val exhaustive_int_singleton : int -> (int, exhaustive) t
val exhaustive_float_singleton : float -> (float, exhaustive) t
val exhaustive_of_int_list : int list -> (int, exhaustive) t
val exhaustive_of_float_list : float list -> (float, exhaustive) t
val exhaustive_of_int_array : int array -> (int, exhaustive) t
val exhaustive_of_float_array : float array -> (float, exhaustive) t

Adding values

val add_value : ('a, 'kind) t -> 'a -> ('a, 'kind) t

Add a new value to the collection. Constant time operation.

val concat : ('a, 'kind) t -> ('a, 'kind) t -> ('a, 'kind) t

Concatenate both collections. Constant time operation.

val add_list : ('a, 'kind) t -> 'a list -> ('a, 'kind) t

Add all values in the list. Equivalent to List.fold_left add_value. Linear in the size of the list for compact values, O(n log n) (in the size of list + collection) for exhaustive values.

val add_array : ('a, 'kind) t -> 'a array -> ('a, 'kind) t

Add all values in the array. Equivalent to Array.fold_left add_value. Linear in the size of the array for compact values, O(n log n) (in the size of array + collection) for exhaustive values.
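To illustrate why adding a value or concatenating can be constant time for a compact aggregate, here is a hedged sketch of one possible representation. The agg record, its field names, and the use of max_int/min_int as sentinels for the empty case are assumptions, not the module's internals.

```ocaml
(* Hypothetical compact aggregate over ints. *)
type agg = { size : int; sum : int; sum_squares : int; lo : int; hi : int }

(* Sentinels make [min]/[max] behave as identities on the empty aggregate. *)
let empty = { size = 0; sum = 0; sum_squares = 0; lo = max_int; hi = min_int }

(* Each field is updated with a single arithmetic operation. *)
let add_value a x =
  { size = a.size + 1;
    sum = a.sum + x;
    sum_squares = a.sum_squares + (x * x);
    lo = min a.lo x;
    hi = max a.hi x }

(* Merging two aggregates is field-wise, independent of their sizes. *)
let concat a b =
  { size = a.size + b.size;
    sum = a.sum + b.sum;
    sum_squares = a.sum_squares + b.sum_squares;
    lo = min a.lo b.lo;
    hi = max a.hi b.hi }

(* Adding a list is a left fold of [add_value], hence linear in the list. *)
let add_list a l = List.fold_left add_value a l
```

This sketch uses native ints and ignores overflow. An exhaustive collection would additionally have to keep the values themselves.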

Accessors

All accessors are constant time operations, except median, q1 and q3, which need to sort the collection. Sorting is only done once and then saved, so getting the q1 after computing the median is constant time.

val size : ('a, 'kind) t -> int

The size of the collection, i.e. the number of elements

exception CollectionTooShort

Exception raised when attempting to access the stats (sum, min, average, ...) of an empty collection (whose size is 0), or attempting to access q1 or q3 of a collection whose size is smaller than 4.
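Callers that may hold an empty collection can turn the exception into an option. The sketch below shows the pattern on a plain list: the exception is redeclared locally, and average and average_opt are hypothetical stand-ins rather than functions of this module.

```ocaml
(* Local stand-in for the module's exception. *)
exception CollectionTooShort

(* Hypothetical stand-in for an accessor: fails on an empty input. *)
let average = function
  | [] -> raise CollectionTooShort
  | l -> List.fold_left ( +. ) 0. l /. float_of_int (List.length l)

(* Convert the exception into [None] so callers can pattern-match. *)
let average_opt l = try Some (average l) with CollectionTooShort -> None
```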

val sum : ('a, 'kind) t -> 'a

Sum of all items in the collection: \sum_i x_i

val sum_squares : ('a, 'kind) t -> 'a

The sum of the squares of the collection: \sum_i x_i^2. May raise Z.Overflow.

val min : ('a, 'kind) t -> 'a

The minimal element

val max : ('a, 'kind) t -> 'a

The maximal element

val range : ('a, 'kind) t -> 'a

The range, i.e. max - min.

val average : ('a, 'kind) t -> float

The average/mean value: i.e. sum collection / size collection.

val variance : ('a, 'kind) t -> float

The variance, i.e. the sum of the squared differences from the average: \sum_i (x_i - \mu)^2

val standard_deviation : ('a, 'kind) t -> float

The square root of the variance.
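The sum of squared differences can be recovered from the sum and the sum of squares alone, which is consistent with compact collections supporting variance. The identity below is standard algebra, with n the size of the collection and \mu the average. Whether the module computes the variance this way is an assumption.

```latex
\sum_i (x_i - \mu)^2
  = \sum_i x_i^2 - 2\mu \sum_i x_i + n\mu^2
  = \sum_i x_i^2 - n\mu^2
  = \sum_i x_i^2 - \frac{\left(\sum_i x_i\right)^2}{n},
\qquad \mu = \frac{1}{n}\sum_i x_i
```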

val median : ('a, exhaustive) t -> float

The median, or 2nd quartile

val q1 : ('a, exhaustive) t -> float

The first quartile, requires size >= 4

val q3 : ('a, exhaustive) t -> float

The third quartile, requires size >= 4

Export values

Export the list/array of values, sorted in increasing order. If the collection is not yet sorted, these functions sort it (O(n log n)). Otherwise they copy it (O(n)).

val to_list : ('a, exhaustive) t -> 'a list
val to_array : ('a, exhaustive) t -> 'a array

Pretty printers

Both of these take an extra unit parameter to mark the end of the optional arguments.

val pp_percent : ?justify:bool -> ?precision:int -> unit -> Stdlib.Format.formatter -> (int * int) -> unit

pp_percent () fmt (num, denom) prints the ratio num / denom as a percentage, including a final "%" symbol. Rounds fractions, so "20.99%" is printed as "21.0%" when precision is 1.

  • parameter justify

    (default: false), when true, add spaces left of the number so that they all take the same space (print " 20.0%" instead of "20.0%")

  • parameter precision

    (default: 1) number of digits to print. 0 -> "20%" | 1 -> "20.0%" | 2 -> "20.00%", etc...
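The behaviour described above can be approximated with the standard Format module. This is a sketch only: pp_percent_sketch is a hypothetical stand-in, and the padding width used for justify (enough room for "100" plus the decimals) is an assumption.

```ocaml
(* Hypothetical re-implementation, for illustration only. *)
let pp_percent_sketch ?(justify = false) ?(precision = 1) () fmt (num, denom) =
  let pct = 100. *. float_of_int num /. float_of_int denom in
  if justify then
    (* Room for "100", plus '.' and the decimals when precision > 0. *)
    let width = if precision = 0 then 3 else 4 + precision in
    Format.fprintf fmt "%*.*f%%" width precision pct
  else Format.fprintf fmt "%.*f%%" precision pct

(* The trailing unit argument lets the optional parameters be omitted. *)
let rounded = Format.asprintf "%a" (pp_percent_sketch ()) (2099, 10000)
let padded = Format.asprintf "%a" (pp_percent_sketch ~justify:true ()) (1, 5)
let coarse = Format.asprintf "%a" (pp_percent_sketch ~precision:0 ()) (1, 5)
```

Applying the printer to () first yields a function of the shape expected by the %a conversion.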

val unit_prefixes : string list

Standard SI unit prefix list: ""; "k"; "M"; "G"; "T"; "P"; "E"; "Z"; "Y"; "R"; "Q".

val pp_with_unit : ?justify:bool -> ?unit_prefixes:string list -> ?separator:string -> ?base:int -> unit -> Stdlib.Format.formatter -> int -> unit

pp_with_unit () fmt nb prints the number nb with at most three digits using the specified unit prefixes. For example:

  • pp_with_unit () fmt 123 -> "123"
  • pp_with_unit () fmt 12345 -> "12.3k"
  • pp_with_unit () fmt 123456789 -> "123M"
  • parameter justify

    (default: false), left-pad so all numbers have the same widths

  • parameter unit_prefixes

    (default: unit_prefixes) the prefix letters, advancing by one prefix for each factor of base

  • parameter separator

    printed between number and unit, default is empty string

  • parameter base

    (default: 1000), the scale between unit increments, 1000 or 1024
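One way to obtain the three-digit scaling described above is to divide by the base until the value fits, advancing one prefix per division. The sketch below is a hypothetical stand-in (pp_with_unit_sketch and si_prefixes are illustrative names). It omits justify, and its rounding behaviour is an assumption.

```ocaml
let si_prefixes = [ ""; "k"; "M"; "G"; "T"; "P"; "E"; "Z"; "Y"; "R"; "Q" ]

(* Hypothetical re-implementation, for illustration only. *)
let pp_with_unit_sketch ?(unit_prefixes = si_prefixes) ?(separator = "")
    ?(base = 1000) () fmt nb =
  let b = float_of_int base in
  (* Divide by [base] while a larger prefix is available. *)
  let rec scale v = function
    | _ :: (_ :: _ as rest) when v >= b -> scale (v /. b) rest
    | prefix :: _ -> (v, prefix)
    | [] -> (v, "")
  in
  let v, prefix = scale (float_of_int nb) unit_prefixes in
  (* Choose the decimals so that about three digits are printed.
     Rounding may still yield four digits near a boundary (e.g. 999_999). *)
  let decimals =
    if nb < base || v >= 100. then 0 else if v >= 10. then 1 else 2
  in
  Format.fprintf fmt "%.*f%s%s" decimals v separator prefix

let show n = Format.asprintf "%a" (pp_with_unit_sketch ()) n
```

Under these assumptions, show reproduces the three examples given above for 123, 12345 and 123456789.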

Multi-session loggers

module StatLogger (S : sig ... end) () : sig ... end

Save stats between multiple codex runs. Each logger saves a mapping string -> stat across runs.