What’s in the Box?
QuantileFlow provides three main algorithms for quantile estimation:
DDSketch
DDSketch (Distributed Distribution Sketch) is a quantile approximation algorithm with the following properties:
Relative Error Guarantee: Configurable bound on the relative error of every quantile estimate
Mergeable: Sketches can be combined for distributed processing
Storage Options:
ContiguousStorage: Efficient array-based storage for limited bucket ranges
SparseStorage: Hash-based storage for wider bucket ranges
Mapping Schemes:
LogarithmicMapping: The canonical implementation with provable error guarantees
LinearInterpolationMapping: Faster approximation using linear interpolation
CubicInterpolationMapping: Memory-efficient approximation using cubic interpolation
Bucket Management:
FIXED: Fixed maximum number of buckets
UNLIMITED: No limit on number of buckets
DYNAMIC: Dynamic limit based on log(n)
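The relative error guarantee and mergeability described above can be illustrated with a minimal sketch of logarithmic bucketing. This is not QuantileFlow's actual API: the class name TinyDDSketch and its methods are hypothetical, it handles only positive values, and it uses an unbounded hash-based store (comparable in spirit to the sparse storage and unlimited bucket options listed above).

```python
import math
from collections import Counter


class TinyDDSketch:
    """Illustrative log-bucketed quantile sketch (hypothetical, positive values only)."""

    def __init__(self, relative_accuracy=0.01):
        self.alpha = relative_accuracy
        # gamma is chosen so that one representative value per bucket
        # is within a factor (1 +/- alpha) of every value in that bucket.
        self.gamma = (1 + relative_accuracy) / (1 - relative_accuracy)
        self.log_gamma = math.log(self.gamma)
        self.buckets = Counter()  # bucket index -> count
        self.count = 0

    def insert(self, value):
        if value <= 0:
            raise ValueError("this sketch only handles positive values")
        # Bucket i covers the interval (gamma**(i-1), gamma**i].
        index = math.ceil(math.log(value) / self.log_gamma)
        self.buckets[index] += 1
        self.count += 1

    def quantile(self, q):
        if self.count == 0:
            raise ValueError("empty sketch")
        rank = q * (self.count - 1)
        seen = 0
        for index in sorted(self.buckets):
            seen += self.buckets[index]
            if seen > rank:
                # Representative value for bucket `index`.
                return 2 * self.gamma ** index / (self.gamma + 1)
        raise ValueError("q must be between 0 and 1")

    def merge(self, other):
        if self.gamma != other.gamma:
            raise ValueError("sketches must share the same relative accuracy")
        # Counter.update adds the counts of matching bucket indices.
        self.buckets.update(other.buckets)
        self.count += other.count
```

Because bucket boundaries depend only on the relative accuracy, two sketches built with the same setting can be merged by adding bucket counts, and the result matches a sketch built over the combined data.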
MomentSketch
MomentSketch is a moment-based approach to quantile estimation:
Compact Representation: Stores only a fixed number of moments regardless of data size
Maximum Entropy Optimization: Reconstructs the underlying distribution from the stored moments by solving a maximum entropy problem
Mergeable: Supports combining sketches from different data sources
Compression Support: Optional arcsinh transformation for handling values that span a wide range
Summary Statistics: Provides comprehensive statistics including min, max, quartiles, mean, and count
Visualization: Built-in support for plotting estimated distributions
Serialization: Supports converting sketches to/from dictionaries for storage and transmission
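The compact representation, merging, compression, and dictionary serialization listed above can be sketched as follows. This is an assumption-laden illustration, not QuantileFlow's API: TinyMomentSketch and all of its method names are hypothetical, and the maximum entropy step that turns moments into quantile estimates is deliberately left out.

```python
import math


class TinyMomentSketch:
    """Illustrative moment store (hypothetical; omits the maximum entropy solver)."""

    def __init__(self, num_moments=5, compress=False):
        self.num_moments = num_moments
        self.compress = compress  # apply arcsinh before accumulating
        self.count = 0
        self.min = math.inf
        self.max = -math.inf
        # power_sums[i] holds the sum of t**(i+1) over all inserted values t.
        self.power_sums = [0.0] * num_moments

    def insert(self, value):
        t = math.asinh(value) if self.compress else value
        self.count += 1
        self.min = min(self.min, value)
        self.max = max(self.max, value)
        power = 1.0
        for i in range(self.num_moments):
            power *= t
            self.power_sums[i] += power

    def merge(self, other):
        if (self.num_moments, self.compress) != (other.num_moments, other.compress):
            raise ValueError("sketches must share the same configuration")
        self.count += other.count
        self.min = min(self.min, other.min)
        self.max = max(self.max, other.max)
        self.power_sums = [a + b for a, b in zip(self.power_sums, other.power_sums)]

    def moments(self):
        """Raw moments of the stored (possibly arcsinh-transformed) values."""
        return [s / self.count for s in self.power_sums]

    def to_dict(self):
        return {
            "num_moments": self.num_moments,
            "compress": self.compress,
            "count": self.count,
            "min": self.min,
            "max": self.max,
            "power_sums": list(self.power_sums),
        }

    @classmethod
    def from_dict(cls, state):
        sketch = cls(state["num_moments"], state["compress"])
        sketch.count = state["count"]
        sketch.min = state["min"]
        sketch.max = state["max"]
        sketch.power_sums = list(state["power_sums"])
        return sketch
```

The state is a handful of numbers no matter how many values are inserted, which is why merging reduces to element-wise addition.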
HDRHistogram
HDRHistogram (High Dynamic Range Histogram) is a logarithmic-bucketed histogram implementation:
Wide Value Range: Efficiently tracks values across multiple orders of magnitude
Configurable Precision: Adjustable number of buckets for different accuracy needs
Memory Efficient: Uses logarithmic bucketing to minimize memory usage
Summary Statistics: Provides comprehensive statistics including min, max, quartiles, and count
Visualization: Built-in support for plotting distributions with logarithmic scales
Serialization: Supports converting histograms to/from dictionaries for storage and transmission
Value Range Control: Configurable minimum and maximum trackable values
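To make the logarithmic bucketing, configurable bucket count, and value range control above concrete, here is a minimal sketch under stated assumptions. TinyLogHistogram, its parameters, and the choice to clamp out-of-range values into the edge buckets are all hypothetical and are not taken from QuantileFlow's HDRHistogram.

```python
import math


class TinyLogHistogram:
    """Illustrative log-bucketed histogram (hypothetical, not the library class)."""

    def __init__(self, min_value=1.0, max_value=1e6, num_buckets=60):
        if not 0 < min_value < max_value:
            raise ValueError("require 0 < min_value < max_value")
        self.min_value = min_value
        self.max_value = max_value
        self.num_buckets = num_buckets
        self.log_min = math.log(min_value)
        # Buckets per unit of log(value); every bucket spans the same ratio.
        self.scale = num_buckets / (math.log(max_value) - self.log_min)
        self.counts = [0] * num_buckets
        self.total = 0

    def bucket_index(self, value):
        # Assumption: values outside the trackable range are clamped.
        clamped = min(max(value, self.min_value), self.max_value)
        index = int((math.log(clamped) - self.log_min) * self.scale)
        return min(index, self.num_buckets - 1)

    def record(self, value):
        self.counts[self.bucket_index(value)] += 1
        self.total += 1

    def bucket_bounds(self, index):
        low = math.exp(self.log_min + index / self.scale)
        high = math.exp(self.log_min + (index + 1) / self.scale)
        return low, high

    def quantile(self, q):
        if self.total == 0:
            raise ValueError("empty histogram")
        rank = q * (self.total - 1)
        seen = 0
        for index, count in enumerate(self.counts):
            seen += count
            if seen > rank:
                low, high = self.bucket_bounds(index)
                return math.sqrt(low * high)  # geometric midpoint of the bucket
        raise ValueError("q must be between 0 and 1")
```

With the defaults shown, six orders of magnitude are covered by 60 counters, so each bucket spans a ratio of about 1.26. Raising num_buckets narrows that ratio at the cost of more memory.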
All three algorithms summarize a stream in a small, bounded amount of memory while keeping quantile estimates accurate, making them suitable for streaming applications where storing and sorting every value would require excessive memory.