DDSketch
The DDSketch (Distributed and Deterministic Sketch) is a quantile sketch algorithm that provides accuracy guarantees for quantile estimation.
Overview
DDSketch offers the following features:
Relative Error Guarantee: Estimates quantiles with a relative error that is at most the configured accuracy parameter
Mergeable: Multiple sketches can be combined without loss of accuracy
Memory Efficient: Uses buckets to approximate the distribution, requiring less memory than storing all data points
Fast Operations: Provides O(1) time complexity for adding values and querying quantiles
Main Class
- class QuantileFlow.DDSketch(relative_accuracy: float, mapping_type: Literal['logarithmic', 'lin_interpol', 'cub_interpol'] = 'logarithmic', max_buckets: int = 2048, bucket_strategy: BucketManagementStrategy = BucketManagementStrategy.FIXED, cont_neg: bool = True)[source]
Bases:
objectDDSketch implementation for quantile approximation with relative-error guarantees.
This implementation supports different mapping schemes and storage types for optimal performance in different scenarios. It can handle both positive and negative values, and provides configurable bucket management strategies.
- Reference:
“DDSketch: A Fast and Fully-Mergeable Quantile Sketch with Relative-Error Guarantees” by Charles Masson, Jee E. Rim and Homin K. Lee
- __init__(relative_accuracy: float, mapping_type: Literal['logarithmic', 'lin_interpol', 'cub_interpol'] = 'logarithmic', max_buckets: int = 2048, bucket_strategy: BucketManagementStrategy = BucketManagementStrategy.FIXED, cont_neg: bool = True)[source]
Initialize DDSketch.
- Parameters:
relative_accuracy – The relative accuracy guarantee (alpha). Must be between 0 and 1.
mapping_type – The type of mapping scheme to use: - ‘logarithmic’: Basic logarithmic mapping - ‘lin_interpol’: Linear interpolation mapping - ‘cub_interpol’: Cubic interpolation mapping
max_buckets – Maximum number of buckets per store (default 2048). If cont_neg is True, each store will have max_buckets buckets.
bucket_strategy – Strategy for managing bucket count. If FIXED, uses ContiguousStorage, otherwise uses SparseStorage.
cont_neg – Whether to handle negative values (default True).
- Raises:
ValueError – If relative_accuracy is not between 0 and 1.
- delete(value: int | float) None[source]
Delete a value from the sketch.
- Parameters:
value – The value to delete.
- Raises:
ValueError – If value is negative and cont_neg is False.
- insert(value: int | float) None[source]
Insert a value into the sketch.
- Parameters:
value – The value to insert.
- Raises:
ValueError – If value is negative and cont_neg is False.
- merge(other: DDSketch) None[source]
Merge another DDSketch into this one.
- Parameters:
other – Another DDSketch instance to merge with this one.
- Raises:
ValueError – If the sketches are incompatible.
- quantile(q: float) float[source]
Compute the approximate quantile.
- Parameters:
q – The desired quantile (between 0 and 1).
- Returns:
The approximate value at the specified quantile.
- Raises:
ValueError – If q is not between 0 and 1 or if the sketch is empty.
Mapping Types
DDSketch supports different mapping strategies that translate input values to bucket indices:
Logarithmic Mapping
The canonical implementation with provable error guarantees:
Linear Interpolation Mapping
An approximation using linear interpolation for faster computation:
Cubic Interpolation Mapping
An approximation using cubic interpolation for better memory efficiency:
Storage Types
DDSketch provides two storage implementations for different use cases:
Contiguous Storage
An array-based storage optimized for limited bucket ranges:
Sparse Storage
A hash-based storage for handling wider bucket ranges:
Bucket Management
Usage Examples
For basic usage, see the Getting Started guide.
For more advanced examples, see the Examples page.