statstream.exact.streaming_std

statstream.exact.streaming_std(X, steps=None)

Standard deviation of a streaming dataset.

Computes the standard deviation of a dataset from a stream of batches of samples. The data has to be provided by an iterator yielding batches of samples. Either a number of steps can be specified, or the iterator is assumed to be emptied in a finite number of steps. In the first case only the given number of batches is extracted from the iterator and used for the standard deviation calculation, even if the iterator could yield more data.

Samples are given along the first axis. The standard deviation has the shape of the remaining axes, e.g. batches of shape [batch_size, d1, ..., dN] will produce a standard deviation of shape [d1, ..., dN].
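The shape rule can be checked against an in-memory reference. The snippet below is a minimal sketch that uses plain NumPy rather than this library, assuming the streamed result describes the same quantity as the standard deviation taken over all samples concatenated along the first axis. The batch sizes and values are made up for illustration.

```python
import numpy as np

# Two hypothetical batches with different batch sizes but the same
# trailing shape (3, 2).
batches = [np.ones((4, 3, 2)), np.zeros((6, 3, 2))]

# In-memory reference: stack all samples along the first axis and
# reduce over that axis only.
reference = np.concatenate(batches, axis=0).std(axis=0)

print(reference.shape)  # (3, 2)
```

The batch axis is reduced away, so only the per-sample shape remains.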

This function consumes the iterator, so a finite iterator will be empty after the call unless steps is smaller than the number of batches it can yield.
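The consumption behaviour can be illustrated without the library. The following is a minimal sketch using only the standard library; take_batches is a hypothetical helper that mimics how a steps limit might be applied, not the function's actual implementation.

```python
from itertools import islice


def take_batches(batches, steps=None):
    """Hypothetical helper: pull at most `steps` batches from an iterator."""
    # islice with None as the stop value runs until the iterator is empty.
    return list(islice(batches, steps))


batches = iter([[1, 2], [3, 4], [5, 6]])
used = take_batches(batches, steps=2)   # consumes the first two batches only
remaining = list(batches)               # the third batch is still available

exhausted = iter([[1, 2], [3, 4]])
take_batches(exhausted)                 # steps=None drains the iterator
leftover = list(exhausted)              # nothing is left afterwards
```

Passing a re-iterable container such as a list does not have this effect on the container itself, since each iteration over it starts from the beginning.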

Parameters:
X : iterable

An iterator yielding batches of samples.

steps : int, optional

The number of batches to use from the iterator (all available batches are used if set to None). The default is None.

Returns:
array

The standard deviation of the seen data samples.

See also

streaming_mean_and_std

get mean and standard deviation in a single pass.

Notes

Computing the standard deviation necessarily includes computing the mean, so there is no computational benefit in using streaming_std over streaming_mean_and_std. This function does nothing more than take the square root of the result of streaming_var.

The streamed variances are calculated as described in [1].
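For readers unfamiliar with [1], the batch-wise update can be sketched as follows. This is an illustrative stand-in, not the library's code: the name streaming_std_sketch is hypothetical, and normalizing by the sample count n (rather than n - 1) is an assumption, since the text above does not state which convention is used.

```python
from itertools import islice

import numpy as np


def streaming_std_sketch(batches, steps=None):
    """Hypothetical sketch of a batch-wise standard deviation."""
    n = 0      # number of samples seen so far
    mean = 0.0  # running mean over the first axis
    m2 = 0.0    # running sum of squared deviations from the mean
    for batch in islice(batches, steps):
        batch = np.asarray(batch, dtype=float)
        n_b = batch.shape[0]
        if n_b == 0:
            continue  # an empty batch carries no information
        mean_b = batch.mean(axis=0)
        m2_b = ((batch - mean_b) ** 2).sum(axis=0)
        # Combine the running statistics with the batch statistics
        # using the pairwise update from Chan, Golub & LeVeque.
        delta = mean_b - mean
        total = n + n_b
        mean = mean + delta * n_b / total
        m2 = m2 + m2_b + delta ** 2 * n * n_b / total
        n = total
    if n == 0:
        raise ValueError("no samples were provided")
    # Assumption: population normalization (divide by n).
    return np.sqrt(m2 / n)


rng = np.random.default_rng(0)
data = rng.normal(size=(50, 3, 2))
result = streaming_std_sketch(iter(np.array_split(data, 7)))
```

Each batch is summarized by its own count, mean, and sum of squared deviations, so only one batch needs to be held in memory at a time, and the mean falls out of the same pass.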

References

[1] Tony F. Chan, Gene H. Golub, and Randall J. LeVeque, “Updating formulae and a pairwise algorithm for computing sample variances”, 1979.