Class BucketedSort
- All Implemented Interfaces:
Closeable,AutoCloseable,org.elasticsearch.core.Releasable
- Direct Known Subclasses:
BucketedSort.ForDoubles,BucketedSort.ForFloats,BucketedSort.ForLongs
- They can have many, many buckets so this implementation backs to
BigArraysso it doesn't need to allocate any objects per bucket and the circuit breaker in BigArrays will automatically track memory usage and abort execution if it grows too large. - Its fairly common for a bucket to be collected but not returned so these implementations delay as much work as possible until collection
Every bucket is in one of two states: "gathering" or min/max "heap". While
"gathering" the next empty slot is stored in the "root" offset of the
bucket and collecting a value is just adding it in the next slot bumping
the tracking value at the root. So collecting values is O(1).
Extracting the results in sorted order is O(n * log n) because,
well, sorting is O(n * log n). When a bucket has collected
bucketSize entries it is converted into a min "heap" in
O(n) time. Or into max heap, if order is ascending.
Once a "heap", collecting a document is the heap-standard O(log n)
worst case. Critically, it is a very fast O(1) to check if a value
is competitive at all which, so long as buckets aren't hit in reverse
order, they mostly won't be. Extracting results in sorted order is still
O(n * log n).
When we first collect a bucket we make sure that we've allocated enough slots to hold all sort values for the entire bucket. In other words: the storage is "dense" and we don't try to save space when storing partially filled buckets.
We actually *oversize* the allocations
(like BigArrays.overSize(long)) to get amortized linear number
of allocations and to play well with our paged arrays.
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionstatic interfaceCallbacks for storing extra data along with competitive sorts.static classSuperclass for implementations of BucketedSort fordoublekeys.static classSuperclass for implementations of BucketedSort forfloatkeys.static classSuperclass for implementations of BucketedSort forlongkeys.classPerforms the actual collection against a LeafReaderContext.static interfaceUsed withgetValues(long, ResultBuilder)to build results from the sorting operation. -
Field Summary
FieldsModifier and TypeFieldDescriptionprotected BigArraysprotected BucketedSort.ExtraDatastatic BucketedSort.ExtraDataAn implementation of BucketedSort.ExtraData that does nothing. -
Constructor Summary
ConstructorsModifierConstructorDescriptionprotectedBucketedSort(BigArrays bigArrays, SortOrder order, DocValueFormat format, int bucketSize, BucketedSort.ExtraData extra) -
Method Summary
Modifier and TypeMethodDescriptionprotected abstract booleanbetterThan(long lhs, long rhs)trueif the entry at indexlhsis "better" than the entry atrhs.voidclose()protected StringReturn a fairly human readable representation of the array backing the sort.abstract BucketedSort.LeafforLeaf(org.apache.lucene.index.LeafReaderContext ctx)Get the BucketedSort.Leaf implementation that'll do that actual collecting.intThe number of values to store per bucket.The format to use when presenting the values.protected abstract intgetNextGatherOffset(long rootIndex)Get the next index that should be "gathered" for a bucket rooted atrootIndex.getOrder()The order of the sort.protected abstract SortValuegetValue(long index)Get the value at an index.getValues(long bucket)Get the values for a bucket if it has been collected.<T extends Comparable<T>>
List<T>getValues(long bucket, BucketedSort.ResultBuilder<T> builder)Get the values for a bucket if it has been collected.protected abstract voidgrowValues(long minSize)Grow the BigArray backing this sort to account for new buckets.booleaninHeapMode(long bucket)Is this bucket a min heaptrueor in gathering modefalse?protected voidInitialize the gather offsets after setting up values.abstract booleanDoes this sort need scores? Most don't, but sorting on_scoredoes.protected abstract voidsetNextGatherOffset(long rootIndex, int offset)Set the next index that should be "gathered" for a bucket rooted atrootIndex.protected abstract voidswap(long lhs, long rhs)Swap the data at two indices.protected abstract BigArrayvalues()The BigArray backing this sort.
-
Field Details
-
NOOP_EXTRA_DATA
An implementation of BucketedSort.ExtraData that does nothing. -
bigArrays
-
extra
-
-
Constructor Details
-
BucketedSort
protected BucketedSort(BigArrays bigArrays, SortOrder order, DocValueFormat format, int bucketSize, BucketedSort.ExtraData extra)
-
-
Method Details
-
getOrder
The order of the sort. -
getFormat
The format to use when presenting the values. -
getBucketSize
public int getBucketSize()The number of values to store per bucket. -
getValues
public final <T extends Comparable<T>> List<T> getValues(long bucket, BucketedSort.ResultBuilder<T> builder) throws IOExceptionGet the values for a bucket if it has been collected. If it hasn't then returns an empty list.- Parameters:
builder- builds results. SeeBucketedSort.ExtraDatafor how to store data along side the sort for this to extract.- Throws:
IOException
-
getValues
Get the values for a bucket if it has been collected. If it hasn't then returns an empty array.- Throws:
IOException
-
inHeapMode
public boolean inHeapMode(long bucket)Is this bucket a min heaptrueor in gathering modefalse? -
forLeaf
public abstract BucketedSort.Leaf forLeaf(org.apache.lucene.index.LeafReaderContext ctx) throws IOExceptionGet the BucketedSort.Leaf implementation that'll do that actual collecting.- Throws:
IOException- most implementations need to perform IO to prepare for each leaf
-
needsScores
public abstract boolean needsScores()Does this sort need scores? Most don't, but sorting on_scoredoes. -
values
The BigArray backing this sort. -
growValues
protected abstract void growValues(long minSize)Grow the BigArray backing this sort to account for new buckets. This will only be called if the array is too small. -
getNextGatherOffset
protected abstract int getNextGatherOffset(long rootIndex)Get the next index that should be "gathered" for a bucket rooted atrootIndex. -
setNextGatherOffset
protected abstract void setNextGatherOffset(long rootIndex, int offset)Set the next index that should be "gathered" for a bucket rooted atrootIndex. -
getValue
Get the value at an index. -
betterThan
protected abstract boolean betterThan(long lhs, long rhs)trueif the entry at indexlhsis "better" than the entry atrhs. "Better" in this means "lower" forSortOrder.ASCand "higher" forSortOrder.DESC. -
swap
protected abstract void swap(long lhs, long rhs)Swap the data at two indices. -
debugFormat
Return a fairly human readable representation of the array backing the sort.This is intentionally not a
Object.toString()implementation because it'll be quite slow. -
initGatherOffsets
protected final void initGatherOffsets()Initialize the gather offsets after setting up values. Subclasses should call this once, after setting up theirvalues(). -
close
public final void close()- Specified by:
closein interfaceAutoCloseable- Specified by:
closein interfaceCloseable- Specified by:
closein interfaceorg.elasticsearch.core.Releasable
-