DDSketch

The DDSketch (Distributed and Deterministic Sketch) is a quantile sketch algorithm that provides accuracy guarantees for quantile estimation.

Overview

DDSketch offers the following features:

  • Relative Error Guarantee: Estimates quantiles with a relative error that is at most the configured accuracy parameter

  • Mergeable: Multiple sketches can be combined without loss of accuracy

  • Memory Efficient: Uses buckets to approximate the distribution, requiring less memory than storing all data points

  • Fast Operations: Provides O(1) time complexity for adding values and querying quantiles

Main Class

class QuantileFlow.DDSketch(relative_accuracy: float, mapping_type: Literal['logarithmic', 'lin_interpol', 'cub_interpol'] = 'logarithmic', max_buckets: int = 2048, bucket_strategy: BucketManagementStrategy = BucketManagementStrategy.FIXED, cont_neg: bool = True)[source]

Bases: object

DDSketch implementation for quantile approximation with relative-error guarantees.

This implementation supports different mapping schemes and storage types for optimal performance in different scenarios. It can handle both positive and negative values, and provides configurable bucket management strategies.

Reference:

“DDSketch: A Fast and Fully-Mergeable Quantile Sketch with Relative-Error Guarantees” by Charles Masson, Jee E. Rim and Homin K. Lee

__init__(relative_accuracy: float, mapping_type: Literal['logarithmic', 'lin_interpol', 'cub_interpol'] = 'logarithmic', max_buckets: int = 2048, bucket_strategy: BucketManagementStrategy = BucketManagementStrategy.FIXED, cont_neg: bool = True)[source]

Initialize DDSketch.

Parameters:
  • relative_accuracy – The relative accuracy guarantee (alpha). Must be between 0 and 1.

  • mapping_type – The type of mapping scheme to use: - ‘logarithmic’: Basic logarithmic mapping - ‘lin_interpol’: Linear interpolation mapping - ‘cub_interpol’: Cubic interpolation mapping

  • max_buckets – Maximum number of buckets per store (default 2048). If cont_neg is True, each store will have max_buckets buckets.

  • bucket_strategy – Strategy for managing bucket count. If FIXED, uses ContiguousStorage, otherwise uses SparseStorage.

  • cont_neg – Whether to handle negative values (default True).

Raises:

ValueError – If relative_accuracy is not between 0 and 1.

delete(value: int | float) None[source]

Delete a value from the sketch.

Parameters:

value – The value to delete.

Raises:

ValueError – If value is negative and cont_neg is False.

insert(value: int | float) None[source]

Insert a value into the sketch.

Parameters:

value – The value to insert.

Raises:

ValueError – If value is negative and cont_neg is False.

merge(other: DDSketch) None[source]

Merge another DDSketch into this one.

Parameters:

other – Another DDSketch instance to merge with this one.

Raises:

ValueError – If the sketches are incompatible.

quantile(q: float) float[source]

Compute the approximate quantile.

Parameters:

q – The desired quantile (between 0 and 1).

Returns:

The approximate value at the specified quantile.

Raises:

ValueError – If q is not between 0 and 1 or if the sketch is empty.

Mapping Types

DDSketch supports different mapping strategies that translate input values to bucket indices:

Logarithmic Mapping

The canonical implementation with provable error guarantees:

Linear Interpolation Mapping

An approximation using linear interpolation for faster computation:

Cubic Interpolation Mapping

An approximation using cubic interpolation for better memory efficiency:

Storage Types

DDSketch provides two storage implementations for different use cases:

Contiguous Storage

An array-based storage optimized for limited bucket ranges:

Sparse Storage

A hash-based storage for handling wider bucket ranges:

Bucket Management

Usage Examples

For basic usage, see the Getting Started guide.

For more advanced examples, see the Examples page.