Statistics for Common Lisp provides two complementary subsystems for
statistical computing, modeled on the separation found in Julia's Statistics.jl
and OnlineStats.jl.

STATISTICS/BATCH is the everyday interface, analogous to R's built-in
stats package or Python's NumPy/SciPy summary functions. Pass any
Common Lisp sequence (vector or list) to MEAN, VARIANCE,
STANDARD-DEVIATION, QUANTILE, MEDIAN, or MODE. Weighted variants
accept Julia-style weight objects (FREQUENCY-WEIGHTS,
ANALYTIC-WEIGHTS, PROBABILITY-WEIGHTS) that carry the appropriate bias
correction automatically. Higher-level summaries FIVENUM and SCALE
work analogously to R's fivenum() and scale().
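
As an illustration of what a bias correction carried by a weight object amounts to, here is a minimal self-contained sketch for the frequency-weight case only. The function names SKETCH-VARIANCE and SKETCH-FREQUENCY-WEIGHTED-VARIANCE are hypothetical and are not this library's API. The sketch assumes integer repeat counts as weights, so the corrected denominator is the sum of the weights minus one.

```lisp
;; Hypothetical helpers for illustration only; not the library's API.

(defun sketch-variance (xs)
  "Sample variance of a list or vector XS with the usual n - 1 denominator."
  (let ((m (/ (reduce #'+ xs) (length xs))))
    (/ (reduce #'+ (map 'list (lambda (x) (expt (- x m) 2)) xs))
       (1- (length xs)))))

(defun sketch-frequency-weighted-variance (xs ws)
  "Variance of XS where WS holds integer repeat counts for each element.
The denominator is (sum of weights) - 1, matching the expanded data."
  (let* ((total (reduce #'+ ws))
         (m (/ (reduce #'+ (map 'list #'* xs ws)) total)))
    (/ (reduce #'+ (map 'list
                        (lambda (x w) (* w (expt (- x m) 2)))
                        xs ws))
       (1- total))))

;; Weights #(2 1 1) on #(1 2 3) describe the expanded data (1 1 2 3).
(sketch-frequency-weighted-variance #(1 2 3) #(2 1 1)) ; => 11/12
(sketch-variance '(1 1 2 3))                           ; => 11/12
```

The two calls agree because a frequency weight is just a repeat count. Analytic and probability weights use different denominators and are not covered by this sketch.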

STATISTICS/STREAMING is for data that cannot or should not be held in memory
all at once: streaming feeds, distributed computation, or incremental updates.
Rather than calling a function on a complete dataset, you create an accumulator,
feed observations into it one at a time with ADD, and extract results at any
point. Accumulators for multiple independent partitions can be combined exactly
with POOL, enabling map-reduce style parallelism with no approximation error.
This mirrors the workflow of Julia's OnlineStats.jl.
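
The accumulate-then-combine workflow can be sketched in a few lines. This is an assumption-laden illustration, not the library's implementation: the MOMENTS structure and the functions SKETCH-ADD, SKETCH-POOL, and SKETCH-ACC-VARIANCE are hypothetical names. The sketch uses Welford's one-pass update for adding an observation and the standard pairwise formula for merging two partial results.

```lisp
;; Hypothetical accumulator for illustration only; not the library's API.

(defstruct moments
  (n 0)      ; number of observations seen so far
  (mean 0)   ; running mean
  (m2 0))    ; running sum of squared deviations from the mean

(defun sketch-add (acc x)
  "Fold one observation X into ACC using Welford's update. Returns ACC."
  (let* ((n (incf (moments-n acc)))
         (delta (- x (moments-mean acc))))
    (incf (moments-mean acc) (/ delta n))
    (incf (moments-m2 acc) (* delta (- x (moments-mean acc))))
    acc))

(defun sketch-pool (a b)
  "Return a fresh accumulator equivalent to having seen the data of A and B."
  (let ((n (+ (moments-n a) (moments-n b))))
    (if (zerop n)
        (make-moments)
        (let ((delta (- (moments-mean b) (moments-mean a))))
          (make-moments
           :n n
           :mean (+ (moments-mean a) (/ (* delta (moments-n b)) n))
           :m2 (+ (moments-m2 a) (moments-m2 b)
                  (/ (* delta delta (moments-n a) (moments-n b)) n)))))))

(defun sketch-acc-variance (acc)
  "Sample variance (n - 1 denominator) held in ACC."
  (/ (moments-m2 acc) (1- (moments-n acc))))

;; Two partitions processed independently, then combined.
(let* ((left   (reduce #'sketch-add '(1 2 3) :initial-value (make-moments)))
       (right  (reduce #'sketch-add '(4 5 6) :initial-value (make-moments)))
       (pooled (sketch-pool left right)))
  (list (moments-mean pooled) (sketch-acc-variance pooled))) ; => (7/2 7/2)
```

With integer or rational inputs the pooled result matches a single pass over all six values exactly. With floating-point inputs the two can differ by rounding.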

Both subsystems share a common generic function interface defined in the
STAT-GENERICS system, so MEAN on a sequence and MEAN on an accumulator are the same
generic function dispatching to different implementations. Users loading the
umbrella STATISTICS system get both subsystems and can move between them freely:
exploratory work on in-memory data uses STATISTICS/BATCH, production pipelines
over streams use STATISTICS/STREAMING, and the same function names work throughout.
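
The shared-interface idea can be sketched with plain CLOS. The names below (DEMO-MEAN, DEMO-ADD, RUNNING-MEAN) are hypothetical stand-ins chosen to avoid implying anything about the real signatures in STAT-GENERICS. The sketch only shows one generic function with a method for sequences and another for an accumulator object.

```lisp
;; Hypothetical names for illustration only; not the library's API.

(defgeneric demo-mean (source)
  (:documentation "Mean of SOURCE, whether a sequence or an accumulator."))

(defgeneric demo-add (accumulator x)
  (:documentation "Fold observation X into ACCUMULATOR and return it."))

;; Batch method: any list or vector.
(defmethod demo-mean ((source sequence))
  (/ (reduce #'+ source) (length source)))

;; Streaming side: a tiny accumulator that tracks a count and a total.
(defclass running-mean ()
  ((n     :initform 0 :accessor acc-n)
   (total :initform 0 :accessor acc-total)))

(defmethod demo-add ((accumulator running-mean) x)
  (incf (acc-n accumulator))
  (incf (acc-total accumulator) x)
  accumulator)

(defmethod demo-mean ((source running-mean))
  (/ (acc-total source) (acc-n source)))

;; The same function name works on both kinds of input.
(demo-mean '(1 2 3 4))                      ; => 5/2
(demo-mean #(1 2 3 4))                      ; => 5/2
(let ((acc (make-instance 'running-mean)))
  (dolist (x '(1 2 3 4))
    (demo-add acc x))
  (demo-mean acc))                          ; => 5/2
```

Calling code written against DEMO-MEAN does not need to know which kind of object it received, which is the property the shared interface is meant to provide.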