statstream: Statistics for Streaming Data¶
Release v22.2.0.dev (What's new?).
statstream is a lightweight Python package providing data analysis and statistics utilities for streaming data.
Its main goal is to provide single-pass variants of conventional numpy
data analysis and statistics functionality for streaming data that is
either generated on the fly or too large to be handled at once. Data can be
streamed in chunks called mini-batches, which makes statstream
well suited for use with machine learning and deep learning
packages like keras, tensorflow, or pytorch.
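To make the single-pass idea concrete, here is a minimal sketch in plain numpy of how a mean and variance can be accumulated over mini-batches without ever holding the full data set in memory. It is not statstream's own API: the function name `streaming_mean_var` and its signature are hypothetical, chosen only for illustration, and the merge step follows the pairwise update described by Chan, Golub, and LeVeque (listed under Further Reading).

```python
import numpy as np


def streaming_mean_var(batches):
    """Single-pass mean and population variance over an iterable of mini-batches.

    Hypothetical helper for illustration only; not part of statstream.
    Each batch has shape (batch_size, ...); statistics are taken over axis 0.
    """
    count = 0    # number of samples seen so far
    mean = None  # running mean
    m2 = None    # running sum of squared deviations from the running mean

    for batch in batches:
        batch = np.asarray(batch, dtype=np.float64)
        n_b = batch.shape[0]
        if n_b == 0:
            continue

        # Statistics of the current mini-batch alone.
        mean_b = batch.mean(axis=0)
        m2_b = ((batch - mean_b) ** 2).sum(axis=0)

        if count == 0:
            count, mean, m2 = n_b, mean_b, m2_b
            continue

        # Merge the batch statistics into the running statistics.
        total = count + n_b
        delta = mean_b - mean
        mean = mean + delta * (n_b / total)
        m2 = m2 + m2_b + delta ** 2 * (count * n_b / total)
        count = total

    if count == 0:
        raise ValueError("no data was streamed")
    return mean, m2 / count


# Stream 1000 samples in mini-batches of 64 and compare with numpy.
rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 3))
batches = (data[i:i + 64] for i in range(0, len(data), 64))
mean, var = streaming_mean_var(batches)
```

Only three running quantities are kept between batches, so memory use does not depend on the length of the stream. The result should agree with `data.mean(axis=0)` and `data.var(axis=0)` up to floating-point error.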
Getting Started¶
statstream is a Python-only package hosted on PyPI.
The recommended installation method is pip-installing
into a virtual environment.
$ pip install statstream
The next three steps should get you up and running in no time:
The Overview section shows a simple example of statstream in action and introduces its core ideas.
The Examples section gives a comprehensive tour of statstream’s features. After reading it, you will know about the advanced features and how to use them.
The API Reference is a quick way to look up the details of all features and their options.
If at any point you get confused by some terminology, please check out our Glossary.
Project Information¶
statstream is released under the MIT license,
its documentation lives at Read the Docs,
the code on GitHub,
and the latest release can be found on PyPI.
It’s tested on Python 2.7 and 3.5+.
If you’d like to contribute to statstream, you’re most welcome.
We have written a short guide to help you get started!
Further Reading¶
Additional information on the algorithmic aspects of statstream can be found in the following works:
Tony F. Chan, Gene H. Golub, and Randall J. LeVeque, “Updating formulae and a pairwise algorithm for computing sample variances”, 1979
Radim Řehůřek, “Scalability of Semantic Analysis in Natural Language Processing”, 2011