Class HyperLogLogPlusPlus
java.lang.Object
org.elasticsearch.search.aggregations.metrics.AbstractHyperLogLogPlusPlus
org.elasticsearch.search.aggregations.metrics.HyperLogLogPlusPlus
- All Implemented Interfaces:
Closeable
,AutoCloseable
,org.elasticsearch.core.Releasable
Hyperloglog++ counter, implemented based on pseudo code from
http://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf and its appendix
https://docs.google.com/document/d/1gyjfMHy43U9OWBXxfaeG-3MjGzejW1dlpyMwEYAAWEI/view?fullscreen
This implementation is different from the original implementation in that it uses a hash table instead of a sorted list for linear
counting. Although this requires more space and makes hyperloglog (which is less accurate) used sooner, this is also considerably faster.
Trying to understand what this class does without having read the paper is considered adventurous.
The HyperLogLogPlusPlus contains two algorithms, one for linear counting and the HyperLogLog algorithm. Initially hashes added to the
data structure are processed using the linear counting until a threshold defined by the precision is reached where the data is replayed
to the HyperLogLog algorithm and then this is used.
It supports storing several HyperLogLogPlusPlus structures which are identified by a bucket number.
-
Field Summary
Modifier and TypeFieldDescriptionstatic int
static int
static int
protected int
Fields inherited from class org.elasticsearch.search.aggregations.metrics.AbstractHyperLogLogPlusPlus
HYPERLOGLOG, LINEAR_COUNTING
-
Constructor Summary
ConstructorDescriptionHyperLogLogPlusPlus(int precision, BigArrays bigArrays, long initialBucketCount)
-
Method Summary
Modifier and TypeMethodDescriptionprotected void
addRunLen(long bucketOrd, int register, int runLen)
long
cardinality(long bucketOrd)
Returns the current computed cardinalityvoid
close()
void
collect(long bucket, long hash)
Collect a value in the given bucketprotected boolean
getAlgorithm(long bucketOrd)
Algorithm used in the given bucketprotected AbstractHyperLogLog.RunLenIterator
getHyperLogLog(long bucketOrd)
Get HyperLogLog algorithmprotected AbstractLinearCounting.HashesIterator
getLinearCounting(long bucketOrd)
Get linear counting algorithmlong
maxOrd()
Get the number of data structuresstatic long
memoryUsage(int precision)
Return the expected per-bucket memory usage for the given precision.void
merge(long thisBucket, AbstractHyperLogLogPlusPlus other, long otherBucket)
int
Precision of the algorithmstatic int
precisionFromThreshold(long count)
Compute the required precision so thatcount
distinct entries would be counted with linear counting.
-
Field Details
-
DEFAULT_PRECISION
public static final int DEFAULT_PRECISION- See Also:
- Constant Field Values
-
MIN_PRECISION
public static final int MIN_PRECISION- See Also:
- Constant Field Values
-
MAX_PRECISION
public static final int MAX_PRECISION- See Also:
- Constant Field Values
-
p
protected final int p
-
-
Constructor Details
-
HyperLogLogPlusPlus
-
-
Method Details
-
precisionFromThreshold
public static int precisionFromThreshold(long count)Compute the required precision so thatcount
distinct entries would be counted with linear counting. -
memoryUsage
public static long memoryUsage(int precision)Return the expected per-bucket memory usage for the given precision. -
maxOrd
public long maxOrd()Description copied from class:AbstractHyperLogLogPlusPlus
Get the number of data structures- Specified by:
maxOrd
in classAbstractHyperLogLogPlusPlus
-
cardinality
public long cardinality(long bucketOrd)Returns the current computed cardinality -
getAlgorithm
protected boolean getAlgorithm(long bucketOrd)Description copied from class:AbstractHyperLogLogPlusPlus
Algorithm used in the given bucket- Specified by:
getAlgorithm
in classAbstractHyperLogLogPlusPlus
-
getLinearCounting
Description copied from class:AbstractHyperLogLogPlusPlus
Get linear counting algorithm- Specified by:
getLinearCounting
in classAbstractHyperLogLogPlusPlus
-
getHyperLogLog
Description copied from class:AbstractHyperLogLogPlusPlus
Get HyperLogLog algorithm- Specified by:
getHyperLogLog
in classAbstractHyperLogLogPlusPlus
-
collect
public void collect(long bucket, long hash)Description copied from class:AbstractHyperLogLogPlusPlus
Collect a value in the given bucket- Specified by:
collect
in classAbstractHyperLogLogPlusPlus
-
close
public void close() -
addRunLen
protected void addRunLen(long bucketOrd, int register, int runLen) -
merge
-
precision
public int precision()Precision of the algorithm
-