org.elasticsearch.search.aggregations.metrics.AbstractHyperLogLogPlusPlus

org.elasticsearch.search.aggregations.metrics.HyperLogLogPlusPlus

All Implemented Interfaces:: Closeable, AutoCloseable, org.elasticsearch.core.Releasable

public final class HyperLogLogPlusPlus extends AbstractHyperLogLogPlusPlus

Hyperloglog++ counter, implemented based on pseudo code from http://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf and its appendix https://docs.google.com/document/d/1gyjfMHy43U9OWBXxfaeG-3MjGzejW1dlpyMwEYAAWEI/view?fullscreen This implementation is different from the original implementation in that it uses a hash table instead of a sorted list for linear counting. Although this requires more space and makes hyperloglog (which is less accurate) used sooner, this is also considerably faster. Trying to understand what this class does without having read the paper is considered adventurous. The HyperLogLogPlusPlus contains two algorithms, one for linear counting and the HyperLogLog algorithm. Initially hashes added to the data structure are processed using the linear counting until a threshold defined by the precision is reached where the data is replayed to the HyperLogLog algorithm and then this is used. It supports storing several HyperLogLogPlusPlus structures which are identified by a bucket number.

Field Summary

Fields

Modifier and Type

Field

Description

static int

DEFAULT_PRECISION

static int

MAX_PRECISION

static int

MIN_PRECISION

protected int

p

Fields inherited from class org.elasticsearch.search.aggregations.metrics.AbstractHyperLogLogPlusPlus
HYPERLOGLOG, LINEAR_COUNTING
Constructor Summary

Constructors

Constructor

Description

HyperLogLogPlusPlus(int precision, BigArrays bigArrays, long initialBucketCount)
Method Summary

Modifier and Type

Method

Description

protected void

addRunLen(long bucketOrd, int register, int runLen)

long

cardinality(long bucketOrd)

Returns the current computed cardinality

void

close()

void

collect(long bucket, long hash)

Collect a value in the given bucket

protected boolean

getAlgorithm(long bucketOrd)

Algorithm used in the given bucket

protected AbstractHyperLogLog.RunLenIterator

getHyperLogLog(long bucketOrd)

Get HyperLogLog algorithm

protected AbstractLinearCounting.HashesIterator

getLinearCounting(long bucketOrd)

Get linear counting algorithm

long

maxOrd()

Get the number of data structures

static long

memoryUsage(int precision)

Return the expected per-bucket memory usage for the given precision.

void

merge(long thisBucket, AbstractHyperLogLogPlusPlus other, long otherBucket)

int

precision()

Precision of the algorithm

static int

precisionFromThreshold(long count)

Compute the required precision so that count distinct entries would be counted with linear counting.

Methods inherited from class org.elasticsearch.search.aggregations.metrics.AbstractHyperLogLogPlusPlus
clone, equals, hashCode, readFrom, writeTo

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- DEFAULT_PRECISION
  
  public static final int DEFAULT_PRECISION
  
  See Also:
  
  Constant Field Values
- MIN_PRECISION
  
  public static final int MIN_PRECISION
  
  See Also:
  
  Constant Field Values
- MAX_PRECISION
  
  public static final int MAX_PRECISION
  
  See Also:
  
  Constant Field Values
- p
  
  protected final int p
Constructor Details
- HyperLogLogPlusPlus
  
  public HyperLogLogPlusPlus(int precision, BigArrays bigArrays, long initialBucketCount)
Method Details
- precisionFromThreshold
  
  public static int precisionFromThreshold(long count)
  
  Compute the required precision so that count distinct entries would be counted with linear counting.
- memoryUsage
  
  public static long memoryUsage(int precision)
  
  Return the expected per-bucket memory usage for the given precision.
- maxOrd
  
  public long maxOrd()
  
  Description copied from class: AbstractHyperLogLogPlusPlus
  
  Get the number of data structures
  
  Specified by:
  
  maxOrd in class AbstractHyperLogLogPlusPlus
- cardinality
  
  public long cardinality(long bucketOrd)
  
  Returns the current computed cardinality
- getAlgorithm
  
  protected boolean getAlgorithm(long bucketOrd)
  
  Description copied from class: AbstractHyperLogLogPlusPlus
  
  Algorithm used in the given bucket
  
  Specified by:
  
  getAlgorithm in class AbstractHyperLogLogPlusPlus
- getLinearCounting
  
  protected AbstractLinearCounting.HashesIterator getLinearCounting(long bucketOrd)
  
  Description copied from class: AbstractHyperLogLogPlusPlus
  
  Get linear counting algorithm
  
  Specified by:
  
  getLinearCounting in class AbstractHyperLogLogPlusPlus
- getHyperLogLog
  
  protected AbstractHyperLogLog.RunLenIterator getHyperLogLog(long bucketOrd)
  
  Description copied from class: AbstractHyperLogLogPlusPlus
  
  Get HyperLogLog algorithm
  
  Specified by:
  
  getHyperLogLog in class AbstractHyperLogLogPlusPlus
- collect
  
  public void collect(long bucket, long hash)
  
  Description copied from class: AbstractHyperLogLogPlusPlus
  
  Collect a value in the given bucket
  
  Specified by:
  
  collect in class AbstractHyperLogLogPlusPlus
- close
  
  public void close()
- addRunLen
  
  protected void addRunLen(long bucketOrd, int register, int runLen)
- merge
  
  public void merge(long thisBucket, AbstractHyperLogLogPlusPlus other, long otherBucket)
- precision
  
  public int precision()
  
  Precision of the algorithm

Class HyperLogLogPlusPlus

Field Summary

Fields inherited from class org.elasticsearch.search.aggregations.metrics.AbstractHyperLogLogPlusPlus

Constructor Summary

Method Summary

Methods inherited from class org.elasticsearch.search.aggregations.metrics.AbstractHyperLogLogPlusPlus

Methods inherited from class java.lang.Object

Field Details

DEFAULT_PRECISION

MIN_PRECISION

MAX_PRECISION

p

Constructor Details

HyperLogLogPlusPlus

Method Details

precisionFromThreshold

memoryUsage

maxOrd

cardinality

getAlgorithm

getLinearCounting

getHyperLogLog

collect

close

addRunLen

merge

precision