Statistics for Common Lisp provides two complementary subsystems for
statistical computing, modeled on the separation found in Julia's Statistics.jl
and OnlineStats.jl.

STATISTICS/BATCH is the everyday interface, analogous to R's built-in
stats package or Python's NumPy/SciPy summary functions.  Pass any
Common Lisp sequence (a vector or a list) to MEAN, VARIANCE,
STANDARD-DEVIATION, QUANTILE, MEDIAN, or MODE.  Weighted variants
accept Julia-style weight objects (FREQUENCY-WEIGHTS,
ANALYTIC-WEIGHTS, PROBABILITY-WEIGHTS) that carry the appropriate bias
correction automatically.  Higher-level summaries FIVENUM and SCALE
work analogously to R's fivenum() and scale().
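
As a rough illustration of what "bias correction" means for frequency
weights, the sketch below computes a weighted mean and a corrected
variance by hand.  It is not the library's implementation: the names
SKETCH-WEIGHTED-MEAN and SKETCH-FREQUENCY-VARIANCE are hypothetical, and
the divisor (sum of weights minus one) is an assumption borrowed from
the convention Julia's StatsBase uses for frequency weights.

```lisp
;;; Hypothetical sketch; these functions are not part of STATISTICS/BATCH.

(defun sketch-weighted-mean (xs ws)
  "Weighted mean of sequence XS with weights WS (same length)."
  (/ (reduce #'+ (map 'list #'* xs ws))
     (reduce #'+ ws)))

(defun sketch-frequency-variance (xs ws)
  "Variance of XS where each weight in WS counts repeated observations.
Assumes the divisor (sum of weights - 1), so the total must exceed 1."
  (let* ((total (reduce #'+ ws))
         (mu (sketch-weighted-mean xs ws))
         ;; Weighted sum of squared deviations from the weighted mean.
         (ss (reduce #'+ (map 'list
                              (lambda (x w) (* w (expt (- x mu) 2)))
                              xs ws))))
    (/ ss (- total 1))))

;; Weights (2 1 1) on (1 2 3) stand for the expanded data (1 1 2 3).
(sketch-weighted-mean '(1 2 3) #(2 1 1))       ; => 7/4
(sketch-frequency-variance '(1 2 3) #(2 1 1))  ; => 11/12
```

Both results match the unweighted mean and sample variance of the
expanded data (1 1 2 3), which is the property a frequency-weight
correction is meant to preserve.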

STATISTICS/STREAMING is for data that cannot or should not be held in memory
all at once: streaming feeds, distributed computation, or incremental updates.
Rather than calling a function on a complete dataset, you create an accumulator,
feed observations into it one at a time with ADD, and extract results at any
point.  Accumulators for multiple independent partitions can be combined exactly
with POOL, enabling map-reduce style parallelism with no approximation error.
This mirrors the workflow of Julia's OnlineStats.jl.
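
The accumulate-then-pool idea can be sketched in a few lines of portable
Common Lisp.  This is a minimal illustration under stated assumptions,
not the library's code: MOMENTS, MOMENTS-ADD, MOMENTS-POOL, and
MOMENTS-VARIANCE are hypothetical stand-ins for whatever accumulator
type, ADD, and POOL the system actually provides.  It uses Welford's
update for single observations and the pairwise merge formula of Chan
et al. for combining partitions.

```lisp
;;; Hypothetical sketch; these names are illustrative, not the library's API.

(defstruct (moments (:constructor make-moments ()))
  (n 0 :type integer)             ; observations seen so far
  (mean 0d0 :type double-float)   ; running mean
  (m2 0d0 :type double-float))    ; running sum of squared deviations

(defun moments-add (acc x)
  "Fold observation X into ACC with Welford's update; return ACC."
  (let* ((x (coerce x 'double-float))
         (delta (- x (moments-mean acc))))
    (incf (moments-n acc))
    (incf (moments-mean acc) (/ delta (moments-n acc)))
    ;; Uses the already-updated mean, as Welford's method requires.
    (incf (moments-m2 acc) (* delta (- x (moments-mean acc))))
    acc))

(defun moments-pool (a b)
  "Return a fresh accumulator summarizing the data of both A and B."
  (let ((na (moments-n a))
        (nb (moments-n b)))
    (cond ((zerop na) (copy-moments b))
          ((zerop nb) (copy-moments a))
          (t (let ((n (+ na nb))
                   (delta (- (moments-mean b) (moments-mean a)))
                   (out (make-moments)))
               (setf (moments-n out) n
                     (moments-mean out)
                     (+ (moments-mean a) (* delta (/ nb n)))
                     (moments-m2 out)
                     (+ (moments-m2 a) (moments-m2 b)
                        (* delta delta (/ (* na nb) n))))
               out)))))

(defun moments-variance (acc)
  "Sample variance (divisor n - 1); needs at least two observations."
  (/ (moments-m2 acc) (1- (moments-n acc))))

;; Two partitions processed independently, then pooled.
(let ((left (make-moments))
      (right (make-moments)))
  (dolist (x '(1 2 3 4)) (moments-add left x))
  (dolist (x '(5 6 7 8)) (moments-add right x))
  (let ((all (moments-pool left right)))
    (list (moments-mean all) (moments-variance all))))
;; => (4.5d0 6.0d0), the same as a single pass over 1..8
```

The pooled result is algebraically identical to a single pass over all
the data; in floating point the two can still differ by ordinary
rounding.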

Both subsystems share a common generic function interface defined in the
STAT-GENERICS system, so MEAN on a sequence and MEAN on an accumulator are the same
generic function dispatching to different implementations.  Users loading the
umbrella STATISTICS system get both subsystems and can move between them freely:
exploratory work on in-memory data uses the batch subsystem, production pipelines
over streams use the streaming subsystem, and the same function names work throughout.
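
The dispatch described above is ordinary CLOS.  The following sketch
shows the pattern in isolation; SKETCH-MEAN and RUNNING-SUM are
hypothetical names chosen so as not to imply anything about the real
definitions in STAT-GENERICS or either subsystem.

```lisp
;;; Hypothetical sketch of one generic function with two kinds of method.

(defstruct running-sum
  (count 0)    ; observations accumulated
  (total 0))   ; their sum

(defgeneric sketch-mean (object)
  (:documentation "Arithmetic mean of a data set or of an accumulator."))

;; Batch style: the whole data set is available as a list or vector.
(defmethod sketch-mean ((data sequence))
  (/ (reduce #'+ data) (length data)))

;; Streaming style: only the running summary is available.
(defmethod sketch-mean ((acc running-sum))
  (/ (running-sum-total acc) (running-sum-count acc)))

(sketch-mean '(1 2 3 4))                             ; => 5/2
(sketch-mean #(1 2 3 4))                             ; => 5/2
(sketch-mean (make-running-sum :count 4 :total 10))  ; => 5/2
```

Because callers only ever name the generic function, code written
against in-memory sequences needs no renaming when it is later pointed
at an accumulator.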