API Reference

Statistics utilities for streaming numpy data.

This package provides several utilities for computing statistics of univariate or multivariate data from samples.

Unlike the corresponding numpy functions (numpy.mean, numpy.var, numpy.cov, etc.), this package is designed to work with a stream of mini-batches of samples instead of the full dataset at once. This is particularly useful for very large datasets that cannot be stored completely in memory.
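To illustrate the idea of working on mini-batches, here is a minimal sketch in plain numpy. It does not use statstream itself; the names running_mean and mini_batches are hypothetical helpers introduced only for this illustration, and the package's own implementation may differ.

```python
import numpy as np


def running_mean(batches):
    """Mean over all samples, seeing only one mini-batch at a time.

    Hypothetical helper for illustration, not part of statstream.
    Assumes each batch has shape (batch_size, n_features) and that at
    least one non-empty batch is provided.
    """
    total = None
    count = 0
    for batch in batches:
        batch = np.asarray(batch, dtype=float)
        batch_sum = batch.sum(axis=0)
        total = batch_sum if total is None else total + batch_sum
        count += batch.shape[0]
    return total / count


def mini_batches(data, batch_size):
    """Yield consecutive slices of data (stands in for a real data stream)."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


data = np.arange(12.0).reshape(6, 2)
print(running_mean(mini_batches(data, batch_size=4)))  # [5. 6.]
```

Only the running sum and the sample count are kept between batches, so memory use does not grow with the size of the dataset.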

The package is organized into two modules.

  • statstream.exact contains exact streaming versions of the corresponding numpy functions. This includes simple statistics like mean, variance, minimum, and maximum.

  • statstream.approximate contains approximate functions for more complex statistics that cannot be computed exactly from streaming data in a single pass. This includes, for example, low-rank factorizations of covariance matrices.

All functions in this package are named after the corresponding non-streaming numpy functions (if they exist), with the added prefix streaming_. For brevity, aliases with the shorter prefix s_ are also provided.

What follows is the API reference. It mostly lists the functions and their options and is intended for quick lookup.

If you would like a more hands-on introduction, have a look at our Examples.

statstream.exact

Exact statistics for streaming data.

The statstream.exact module provides functions for statistics that can be exactly computed from streamed data.

This includes, for example, mean, variance, minimum, and maximum.

Below is a list of all exact functions in the module. They have aliases making them directly available from statstream.

streaming_cov(X[, steps])

Covariance matrix of a streaming dataset.

streaming_max(X[, steps])

Maximum of a streaming dataset.

streaming_mean(X[, steps])

Mean of a streaming dataset.

streaming_mean_and_cov(X[, steps])

Mean and covariance matrix of a streaming dataset.

streaming_mean_and_std(X[, steps])

Mean and standard deviation of a streaming dataset.

streaming_mean_and_var(X[, steps])

Mean and variance of a streaming dataset.

streaming_min(X[, steps])

Minimum of a streaming dataset.

streaming_std(X[, steps])

Standard deviation of a streaming dataset.

streaming_var(X[, steps])

Variance of a streaming dataset.
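To show why statistics such as mean and variance can be computed exactly from a stream, here is a minimal sketch in plain numpy. It merges per-batch statistics with the standard pairwise update for mean and sum of squared deviations. The function name running_mean_and_var is a hypothetical helper for this illustration; it is an assumption that the package works along similar lines, and its actual implementation and normalization may differ.

```python
import numpy as np


def running_mean_and_var(batches):
    """Exact mean and population variance (ddof=0) from mini-batches.

    Hypothetical helper for illustration, not part of statstream.
    Assumes each batch has shape (batch_size, n_features).
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # sum of squared deviations from the current mean
    for batch in batches:
        batch = np.asarray(batch, dtype=float)
        n_b = batch.shape[0]
        if n_b == 0:
            continue  # an empty batch carries no information
        mean_b = batch.mean(axis=0)
        m2_b = ((batch - mean_b) ** 2).sum(axis=0)
        delta = mean_b - mean
        total = count + n_b
        # Merge the statistics of the new batch into the running ones.
        mean = mean + delta * n_b / total
        m2 = m2 + m2_b + delta ** 2 * count * n_b / total
        count = total
    return mean, m2 / count


rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 3))
mean, var = running_mean_and_var(np.array_split(data, 7))
print(np.allclose(mean, data.mean(axis=0)), np.allclose(var, data.var(axis=0)))
```

The result agrees with numpy.mean and numpy.var on the full array up to floating-point rounding, regardless of how the data is split into batches.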

statstream.approximate

Approximate statistics for streaming data.

The statstream.approximate module provides functions for statistics that cannot be exactly computed from streamed data.

This includes, for example, low-rank factorizations of covariance matrices.

Below is a list of all approximate functions in the module. They have aliases making them directly available from statstream.

streaming_low_rank_autocorrelation(X, rank)

Low rank factorization of the sample autocorrelation matrix of a streaming dataset.

streaming_low_rank_cov(X, rank[, steps, ...])

Low rank factorization of the covariance matrix of a streaming dataset.

streaming_mean_and_low_rank_cov(X, rank[, ...])

Mean and a low rank factorization of the covariance matrix of a streaming dataset.
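To give a feeling for how a low-rank factorization can be obtained in a single pass over the data, here is a minimal sketch in plain numpy based on a Nyström-type random sketch of the uncentered second-moment (autocorrelation) matrix. This is one possible technique chosen for illustration; it is an assumption, not a description of the algorithm statstream actually uses. The function name nystrom_low_rank_second_moment and its parameters are hypothetical.

```python
import numpy as np


def nystrom_low_rank_second_moment(batches, rank, seed=0):
    """Single-pass low-rank factor L with L @ L.T ~ (1/n) * X.T @ X.

    Hypothetical helper for illustration, not part of statstream.
    Assumes each batch has shape (batch_size, n_features) and that at
    least one non-empty batch is provided.
    """
    rng = np.random.default_rng(seed)
    omega = None   # random test matrix, shape (n_features, rank)
    sketch = None  # accumulates X.T @ X @ omega without forming X.T @ X
    count = 0
    for batch in batches:
        batch = np.asarray(batch, dtype=float)
        if omega is None:
            omega = rng.standard_normal((batch.shape[1], rank))
            sketch = np.zeros((batch.shape[1], rank))
        sketch += batch.T @ (batch @ omega)
        count += batch.shape[0]
    sketch /= count  # sketch = A @ omega, with A the second-moment matrix

    # Nystrom approximation: A ~ sketch @ pinv(omega.T @ sketch) @ sketch.T
    core = omega.T @ sketch
    core = (core + core.T) / 2  # symmetrize against rounding errors
    evals, evecs = np.linalg.eigh(core)
    keep = evals > 1e-10 * evals.max()  # drop numerically zero directions
    return sketch @ evecs[:, keep] / np.sqrt(evals[keep])


# Data whose second-moment matrix has exact rank 3.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 3)) @ rng.standard_normal((3, 8))
L = nystrom_low_rank_second_moment(np.array_split(X, 10), rank=3)
print(L.shape, np.allclose(L @ L.T, X.T @ X / len(X), atol=1e-6))
```

In this sketch the full n_features by n_features matrix is never stored; only an n_features by rank array is kept between batches. For data that is not exactly low rank, the result is an approximation whose quality depends on the chosen rank.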