HDRHistogram

HDRHistogram (High Dynamic Range Histogram) is a logarithmic-bucketed histogram implementation designed for efficient tracking of values across multiple orders of magnitude.

Overview

HDRHistogram offers the following features:

  • Wide Value Range: Efficiently tracks values across multiple orders of magnitude

  • Configurable Precision: Adjustable number of buckets for different accuracy needs

  • Memory Efficient: Uses logarithmic bucketing to minimize memory usage

  • Summary Statistics: Provides comprehensive statistics including min, max, quartiles, and count

  • Visualization: Built-in support for plotting distributions with logarithmic scales

  • Serialization: Supports converting histograms to/from dictionaries for storage and transmission

  • Value Range Control: Configurable minimum and maximum trackable values

Main Class

class QuantileFlow.HDRHistogram(num_buckets: int = 8, min_value: float = 1.0, max_value: float = inf)[source]

Bases: object

High Dynamic Range Histogram implementation for quantile approximation.

Provides efficient tracking of values across a wide range using logarithmic bucketing, with methods for inserting values, computing quantiles, and generating summary statistics.

__init__(num_buckets: int = 8, min_value: float = 1.0, max_value: float = inf)[source]

Initialize HDR Histogram.

Parameters:
  • num_buckets – Number of logarithmic buckets (default 8)

  • min_value – Minimum trackable value (default 1.0)

  • max_value – Maximum trackable value (default infinity)

classmethod from_dict(data: Dict) HDRHistogram[source]

Create a histogram from a dictionary.

Parameters:

data – Dictionary representation of a histogram.

Returns:

New HDRHistogram instance.

insert(value: int | float) None[source]

Insert a single value into the histogram.

Parameters:

value – The value to insert.

insert_batch(values: List[float] | ndarray) None[source]

Insert multiple values into the histogram.

Parameters:

values – Array or list of values to insert.

interquartile_range() float[source]

Get the interquartile range (IQR).

Returns:

Estimated IQR (difference between 75th and 25th percentiles).

median() float[source]

Get the median value (50th percentile).

Returns:

Estimated median value.

percentile(p: float) float[source]

Get the p-th percentile value.

Parameters:

p – Percentile between 0 and 100 (e.g., 75 for 75th percentile).

Returns:

Estimated value at the requested percentile.

plot_distribution(figsize=(10, 6))[source]

Plot the estimated probability distribution.

Parameters:

figsize – Figure size (width, height) in inches.

Returns:

Matplotlib figure object.

quantile(fraction: float) float[source]

Get the value at a given quantile.

Parameters:

fraction – Quantile fraction between 0 and 1 (e.g., 0.5 for median).

Returns:

Estimated value at the requested quantile.

quantiles(fractions: List[float]) List[float][source]

Get values at multiple quantiles.

Parameters:

fractions – List of quantile fractions between 0 and 1.

Returns:

List of estimated values at the requested quantiles.

summary_statistics() Dict[str, float][source]

Get summary statistics.

Returns:

Dictionary containing min, q1, median, q3, max, and count.

to_dict() Dict[source]

Convert histogram to a dictionary for serialization.

Returns:

Dictionary representation of the histogram.

Mathematical Background

HDRHistogram works by maintaining a fixed number of logarithmic buckets to efficiently track values across a wide range. The key aspects of the implementation are:

  1. Logarithmic Bucketing: Values are assigned to buckets based on their logarithm, allowing efficient tracking of values across multiple orders of magnitude

  2. Value Range Control: Configurable minimum and maximum values ensure memory efficiency

  3. Linear Interpolation: Within each bucket, linear interpolation is used to estimate quantiles

Bucket Calculation

The bucket index for a value \(x\) is calculated as:

\[ \text{bucket_index} = \lfloor \log_2(x) \rfloor \]

This ensures that:

  • Values in the same order of magnitude are grouped together

  • The number of buckets grows logarithmically with the value range

  • Memory usage remains constant regardless of the number of values inserted

Quantile Estimation

Quantiles are estimated by:

  1. Finding the bucket containing the target quantile

  2. Using linear interpolation within the bucket to estimate the exact value

For a target quantile \(q\) and total count \(N\), the estimated value is:

\[ \text{value} = \text{lower_bound} + (\text{upper_bound} - \text{lower_bound}) \times \frac{qN - \text{cumulative_count}}{\text{bucket_count}} \]

Where:

  • \(\text{lower_bound}\) and \(\text{upper_bound}\) are the bucket boundaries

  • \(\text{cumulative_count}\) is the count of values in previous buckets

  • \(\text{bucket_count}\) is the count of values in the current bucket

Performance Characteristics

  • Memory Usage: O(num_buckets)

  • Insertion Time: O(1) per value

  • Query Time: O(num_buckets) for quantile queries

  • Merge Time: O(num_buckets) for merging histograms

Use Cases

HDRHistogram is particularly useful for:

  • Tracking metrics with wide value ranges (e.g., response times, request sizes)

  • Monitoring systems with multiple orders of magnitude in measurements

  • Applications requiring configurable precision and value range control

  • Distributed systems where histograms need to be merged

  • Real-time monitoring and alerting systems

Usage Examples

For basic usage, see the Getting Started guide.

For more advanced examples, see the Examples page.