API Reference
This section documents the public classes and methods of the QuantileFlow package.
Core Components
DDSketch
- class QuantileFlow.DDSketch(relative_accuracy: float, mapping_type: Literal['logarithmic', 'lin_interpol', 'cub_interpol'] = 'logarithmic', max_buckets: int = 2048, bucket_strategy: BucketManagementStrategy = BucketManagementStrategy.FIXED, cont_neg: bool = True)
Bases: object
DDSketch implementation for quantile approximation with relative-error guarantees.
This implementation supports different mapping schemes and storage types for optimal performance in different scenarios. It can handle both positive and negative values, and provides configurable bucket management strategies.
- Reference:
“DDSketch: A Fast and Fully-Mergeable Quantile Sketch with Relative-Error Guarantees” by Charles Masson, Jee E. Rim and Homin K. Lee
- __init__(relative_accuracy: float, mapping_type: Literal['logarithmic', 'lin_interpol', 'cub_interpol'] = 'logarithmic', max_buckets: int = 2048, bucket_strategy: BucketManagementStrategy = BucketManagementStrategy.FIXED, cont_neg: bool = True)
Initialize DDSketch.
- Parameters:
relative_accuracy – The relative accuracy guarantee (alpha). Must be between 0 and 1.
mapping_type – The type of mapping scheme to use:
  - 'logarithmic': Basic logarithmic mapping
  - 'lin_interpol': Linear interpolation mapping
  - 'cub_interpol': Cubic interpolation mapping
max_buckets – Maximum number of buckets per store (default 2048). If cont_neg is True, the limit applies to each store separately.
bucket_strategy – Strategy for managing bucket count. If FIXED, uses ContiguousStorage, otherwise uses SparseStorage.
cont_neg – Whether to handle negative values (default True).
- Raises:
ValueError – If relative_accuracy is not between 0 and 1.
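The relative-accuracy guarantee can be illustrated with a standalone logarithmic mapping, following the scheme described in the DDSketch paper. This is a minimal sketch, not QuantileFlow's own mapping code: the function names `log_bucket_index` and `bucket_representative` are hypothetical, and it assumes the open interval 0 < relative_accuracy < 1 and positive inputs only.

```python
import math


def log_bucket_index(value: float, relative_accuracy: float) -> int:
    """Map a positive value to a bucket index (hypothetical helper)."""
    if not 0 < relative_accuracy < 1:
        raise ValueError("relative_accuracy must be between 0 and 1")
    if value <= 0:
        raise ValueError("this illustration only maps positive values")
    # gamma is the ratio between consecutive bucket boundaries.
    gamma = (1 + relative_accuracy) / (1 - relative_accuracy)
    # Bucket i covers the interval (gamma**(i - 1), gamma**i].
    return math.ceil(math.log(value) / math.log(gamma))


def bucket_representative(index: int, relative_accuracy: float) -> float:
    """Return the value reported for every member of a bucket."""
    gamma = (1 + relative_accuracy) / (1 - relative_accuracy)
    # This point is within a factor (1 +/- alpha) of every value in the bucket.
    return 2 * gamma ** index / (gamma + 1)


alpha = 0.01
for x in (0.5, 1.0, 42.0, 1e6):
    estimate = bucket_representative(log_bucket_index(x, alpha), alpha)
    relative_error = abs(estimate - x) / x
    print(f"{x:>10} -> {estimate:.6g} (relative error {relative_error:.4f})")
```

Each printed relative error should stay at or below alpha, which is what the relative_accuracy parameter promises for quantile estimates.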
- delete(value: int | float) -> None
Delete a value from the sketch.
- Parameters:
value – The value to delete.
- Raises:
ValueError – If value is negative and cont_neg is False.
- insert(value: int | float) -> None
Insert a value into the sketch.
- Parameters:
value – The value to insert.
- Raises:
ValueError – If value is negative and cont_neg is False.
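The insert and delete behaviour described above can be sketched with a toy class that keeps one bucket store per sign. Everything here is an assumption made for illustration: `ToySignedSketch` is hypothetical, it uses plain counters rather than QuantileFlow's storage classes, and treating a delete of an absent value as a no-op is a choice of this sketch, not documented library behaviour.

```python
import math
from collections import Counter


class ToySignedSketch:
    """Hypothetical stand-in for illustration; not QuantileFlow's DDSketch."""

    def __init__(self, relative_accuracy: float, cont_neg: bool = True):
        if not 0 < relative_accuracy < 1:
            raise ValueError("relative_accuracy must be between 0 and 1")
        self.gamma = (1 + relative_accuracy) / (1 - relative_accuracy)
        self.cont_neg = cont_neg
        self.positive = Counter()  # bucket index -> count, for values > 0
        self.negative = Counter()  # bucket index -> count, for values < 0
        self.zero_count = 0        # zero has no logarithm, so count it apart

    def _locate(self, value: float):
        """Return (store, bucket index) for a non-zero value."""
        if value < 0 and not self.cont_neg:
            raise ValueError("negative values require cont_neg=True")
        store = self.negative if value < 0 else self.positive
        index = math.ceil(math.log(abs(value)) / math.log(self.gamma))
        return store, index

    def insert(self, value: float) -> None:
        if value == 0:
            self.zero_count += 1
            return
        store, index = self._locate(value)
        store[index] += 1

    def delete(self, value: float) -> None:
        if value == 0:
            self.zero_count = max(0, self.zero_count - 1)
            return
        store, index = self._locate(value)
        if store[index] > 0:       # absent buckets read as 0 in a Counter
            store[index] -= 1
            if store[index] == 0:
                del store[index]   # drop empty buckets to keep the store small


sketch = ToySignedSketch(relative_accuracy=0.01)
for v in (3.0, 3.0, -3.0):
    sketch.insert(v)
sketch.delete(3.0)
print(sum(sketch.positive.values()), sum(sketch.negative.values()))
```

Negative values are mapped by magnitude into their own store, which is one common way to extend a logarithmic mapping to both signs.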
MomentSketch
- class QuantileFlow.MomentSketch(num_moments: int = 20, compress_values: bool = False)
Bases: object
MomentSketch implementation for quantile approximation using the moment-based approach.
This implementation uses power sums, Chebyshev moment conversion, and maximum entropy optimization to estimate the probability distribution of data and compute quantiles. It supports merging sketches from distributed sources and provides accurate quantile estimates with a compact representation.
- Reference:
“Moment-Based Quantile Sketches for Efficient High Cardinality Aggregation Queries” by Edward Gan, Jialin Ding, Kai Sheng Tai, Vatsal Sharan and Peter Bailis
- __init__(num_moments: int = 20, compress_values: bool = False)
Initialize MomentSketch.
- Parameters:
num_moments – Number of moments to track (default 20). Higher values increase accuracy at the cost of computation.
compress_values – Whether to compress values using arcsinh transformation (default False). Useful for handling widely distributed data with extreme values.
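To make the two parameters concrete, the following is a minimal sketch of how power sums can be accumulated, with an optional arcsinh transform applied first. The helper `power_sums` is hypothetical, and the convention that the tracked sums start at order 0 (the count) is an assumption; QuantileFlow may index its moments differently.

```python
import math
from typing import Iterable, List


def power_sums(values: Iterable[float], num_moments: int,
               compress_values: bool = False) -> List[float]:
    """Accumulate sum(x**k) for k = 0 .. num_moments - 1 (hypothetical helper)."""
    sums = [0.0] * num_moments
    for v in values:
        # arcsinh behaves like a signed logarithm for large |v|,
        # so extreme values no longer dominate the higher-order sums.
        x = math.asinh(v) if compress_values else float(v)
        term = 1.0
        for k in range(num_moments):
            sums[k] += term
            term *= x
    return sums


print(power_sums([1, 2, 3], num_moments=3))   # [3.0, 6.0, 14.0]
print(math.asinh(1e6))                        # roughly 14.5, versus 1e6 uncompressed
```

The sums have a fixed size regardless of how many values are inserted, which is what keeps the representation compact.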
- classmethod from_dict(data: Dict) -> MomentSketch
Create a sketch from a dictionary.
- Parameters:
data – Dictionary representation of a sketch.
- Returns:
New MomentSketch instance.
- insert(value: int | float) -> None
Insert a single value into the sketch.
- Parameters:
value – The value to insert.
- insert_batch(values: List[float] | ndarray) -> None
Insert multiple values into the sketch.
- Parameters:
values – Array or list of values to insert.
- interquartile_range() -> float
Get the interquartile range (IQR).
- Returns:
Estimated IQR (difference between 75th and 25th percentiles).
- merge(other: MomentSketch) -> None
Merge another MomentSketch into this one.
- Parameters:
other – Another MomentSketch instance to merge.
- Raises:
ValueError – If the sketches are incompatible (different compression settings).
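Merging works for moment-based sketches because power sums are additive: the sums of two datasets add up to the sums of their union. The sketch below shows that idea with a hypothetical `ToyMoments` class and `summarize` helper; it mirrors the documented in-place merge and the compression check, but it is not QuantileFlow's implementation.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ToyMoments:
    """Hypothetical container of power sums, for illustration only."""
    sums: List[float]
    compress_values: bool = False

    def merge(self, other: "ToyMoments") -> None:
        # Sums of raw values and sums of arcsinh-transformed values
        # describe different quantities, so they cannot be combined.
        if self.compress_values != other.compress_values:
            raise ValueError("sketches use different compression settings")
        if len(self.sums) != len(other.sums):
            raise ValueError("sketches track a different number of moments")
        self.sums = [a + b for a, b in zip(self.sums, other.sums)]


def summarize(values: Iterable[float], num_moments: int = 3,
              compress_values: bool = False) -> ToyMoments:
    """Build power sums of order 0 .. num_moments - 1 for the given values."""
    sums = [0.0] * num_moments
    for v in values:
        x = math.asinh(v) if compress_values else float(v)
        for k in range(num_moments):
            sums[k] += x ** k
    return ToyMoments(sums, compress_values)


left = summarize([1.0, 2.0])
right = summarize([3.0])
left.merge(right)
print(left.sums == summarize([1.0, 2.0, 3.0]).sums)   # True
```

Because the merge is a plain element-wise addition, partial sketches built on separate workers can be combined in any order.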
- percentile(p: float) -> float
Get the p-th percentile value.
- Parameters:
p – Percentile between 0 and 100 (e.g., 75 for 75th percentile).
- Returns:
Estimated value at the requested percentile.
- Raises:
ValueError – If p is not between 0 and 100.
- plot_distribution(figsize=(10, 6))
Plot the estimated probability distribution.
- Parameters:
figsize – Figure size (width, height) in inches.
- Returns:
Matplotlib figure object.
- quantile(fraction: float) -> float
Get the value at a given quantile.
- Parameters:
fraction – Quantile fraction between 0 and 1 (e.g., 0.5 for median).
- Returns:
Estimated value at the requested quantile.
- Raises:
ValueError – If fraction is not between 0 and 1.
- quantiles(fractions: List[float]) -> List[float]
Get values at multiple quantiles.
- Parameters:
fractions – List of quantile fractions between 0 and 1.
- Returns:
List of estimated values at the requested quantiles.
- Raises:
ValueError – If any fraction is not between 0 and 1.
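The query methods above use two scales: percentile takes a value from 0 to 100, while quantile and quantiles take fractions from 0 to 1. The sketch below shows how the two scales and the interquartile range relate, assuming they differ only by a factor of 100. All names are hypothetical, and `exact_quantile` is an exact nearest-rank lookup on sorted data standing in for a sketch's estimate.

```python
from typing import Callable, List


def exact_quantile(sorted_data: List[float]) -> Callable[[float], float]:
    """Return a quantile function over sorted data (stand-in for an estimator)."""
    def quantile(fraction: float) -> float:
        if not 0 <= fraction <= 1:
            raise ValueError("fraction must be between 0 and 1")
        # Nearest-rank lookup: pick the element closest to the requested rank.
        return sorted_data[round(fraction * (len(sorted_data) - 1))]
    return quantile


def percentile_from_quantile(quantile: Callable[[float], float], p: float) -> float:
    """Convert a 0-100 percentile into a 0-1 fraction and look it up."""
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    return quantile(p / 100.0)


def iqr_from_quantile(quantile: Callable[[float], float]) -> float:
    """Interquartile range: 75th percentile minus 25th percentile."""
    return quantile(0.75) - quantile(0.25)


q = exact_quantile(list(range(101)))          # the integers 0 .. 100
print(q(0.5))                                 # 50
print(percentile_from_quantile(q, 75))        # 75
print([q(f) for f in (0.25, 0.5, 0.75)])      # [25, 50, 75]
print(iqr_from_quantile(q))                   # 50
```

A sketch-based estimator would return approximate values in place of these exact ones, but the argument ranges and the validation errors follow the same pattern.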
Mapping Classes
These classes implement different mapping strategies for DDSketch:
Storage Classes
These classes implement different storage strategies for DDSketch: