HDRHistogram
HDRHistogram (High Dynamic Range Histogram) is a logarithmic-bucketed histogram implementation designed for efficient tracking of values across multiple orders of magnitude.
Overview
HDRHistogram offers the following features:
Wide Value Range: Efficiently tracks values across multiple orders of magnitude
Configurable Precision: Adjustable number of buckets for different accuracy needs
Memory Efficient: Uses logarithmic bucketing to minimize memory usage
Summary Statistics: Provides comprehensive statistics including min, max, quartiles, and count
Visualization: Built-in support for plotting distributions with logarithmic scales
Serialization: Supports converting histograms to/from dictionaries for storage and transmission
Value Range Control: Configurable minimum and maximum trackable values
Main Class
- class QuantileFlow.HDRHistogram(num_buckets: int = 8, min_value: float = 1.0, max_value: float = inf)[source]
Bases:
objectHigh Dynamic Range Histogram implementation for quantile approximation.
Provides efficient tracking of values across a wide range using logarithmic bucketing, with methods for inserting values, computing quantiles, and generating summary statistics.
- __init__(num_buckets: int = 8, min_value: float = 1.0, max_value: float = inf)[source]
Initialize HDR Histogram.
- Parameters:
num_buckets – Number of logarithmic buckets (default 8)
min_value – Minimum trackable value (default 1.0)
max_value – Maximum trackable value (default infinity)
- classmethod from_dict(data: Dict) HDRHistogram[source]
Create a histogram from a dictionary.
- Parameters:
data – Dictionary representation of a histogram.
- Returns:
New HDRHistogram instance.
- insert(value: int | float) None[source]
Insert a single value into the histogram.
- Parameters:
value – The value to insert.
- insert_batch(values: List[float] | ndarray) None[source]
Insert multiple values into the histogram.
- Parameters:
values – Array or list of values to insert.
- interquartile_range() float[source]
Get the interquartile range (IQR).
- Returns:
Estimated IQR (difference between 75th and 25th percentiles).
- median() float[source]
Get the median value (50th percentile).
- Returns:
Estimated median value.
- percentile(p: float) float[source]
Get the p-th percentile value.
- Parameters:
p – Percentile between 0 and 100 (e.g., 75 for 75th percentile).
- Returns:
Estimated value at the requested percentile.
- plot_distribution(figsize=(10, 6))[source]
Plot the estimated probability distribution.
- Parameters:
figsize – Figure size (width, height) in inches.
- Returns:
Matplotlib figure object.
- quantile(fraction: float) float[source]
Get the value at a given quantile.
- Parameters:
fraction – Quantile fraction between 0 and 1 (e.g., 0.5 for median).
- Returns:
Estimated value at the requested quantile.
- quantiles(fractions: List[float]) List[float][source]
Get values at multiple quantiles.
- Parameters:
fractions – List of quantile fractions between 0 and 1.
- Returns:
List of estimated values at the requested quantiles.
- summary_statistics() Dict[str, float][source]
Get summary statistics.
- Returns:
Dictionary containing min, q1, median, q3, max, and count.
- to_dict() Dict[source]
Convert histogram to a dictionary for serialization.
- Returns:
Dictionary representation of the histogram.
Mathematical Background
HDRHistogram works by maintaining a fixed number of logarithmic buckets to efficiently track values across a wide range. The key aspects of the implementation are:
Logarithmic Bucketing: Values are assigned to buckets based on their logarithm, allowing efficient tracking of values across multiple orders of magnitude
Value Range Control: Configurable minimum and maximum values ensure memory efficiency
Linear Interpolation: Within each bucket, linear interpolation is used to estimate quantiles
Bucket Calculation
The bucket index for a value \(x\) is calculated as:
This ensures that:
Values in the same order of magnitude are grouped together
The number of buckets grows logarithmically with the value range
Memory usage remains constant regardless of the number of values inserted
Quantile Estimation
Quantiles are estimated by:
Finding the bucket containing the target quantile
Using linear interpolation within the bucket to estimate the exact value
For a target quantile \(q\) and total count \(N\), the estimated value is:
Where:
\(\text{lower_bound}\) and \(\text{upper_bound}\) are the bucket boundaries
\(\text{cumulative_count}\) is the count of values in previous buckets
\(\text{bucket_count}\) is the count of values in the current bucket
Performance Characteristics
Memory Usage: O(num_buckets)
Insertion Time: O(1) per value
Query Time: O(num_buckets) for quantile queries
Merge Time: O(num_buckets) for merging histograms
Use Cases
HDRHistogram is particularly useful for:
Tracking metrics with wide value ranges (e.g., response times, request sizes)
Monitoring systems with multiple orders of magnitude in measurements
Applications requiring configurable precision and value range control
Distributed systems where histograms need to be merged
Real-time monitoring and alerting systems
Usage Examples
For basic usage, see the Getting Started guide.
For more advanced examples, see the Examples page.