public final class DelimitedTermFrequencyTokenFilter extends TokenFilter
To use this TokenFilter, the field must be indexed with IndexOptions.DOCS_AND_FREQS but no positions or offsets.
For example, if the delimiter is '|', then for the string "foo|5", "foo" is the token and "5" is a term frequency. If there is no delimiter, the TokenFilter does not modify the term frequency.
Note: make sure your Tokenizer doesn't split on the delimiter, or this won't work.
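The splitting rule described above can be illustrated without any Lucene dependency. The following is a minimal sketch in plain Java, not the filter's actual implementation: the class `DelimitedTermFrequencySketch`, the holder `TermAndFreq`, and the method `split` are hypothetical names introduced only for this illustration. It assumes that the split happens at the first occurrence of the delimiter and that a token without a delimiter keeps a term frequency of 1.

```java
// Illustrative sketch only; these names are hypothetical and not part of Lucene.
final class DelimitedTermFrequencySketch {

    /** Hypothetical holder for the result of splitting one raw token. */
    static final class TermAndFreq {
        final String term;
        final int freq;

        TermAndFreq(String term, int freq) {
            this.term = term;
            this.freq = freq;
        }
    }

    /**
     * Splits raw at the first occurrence of delimiter. The text before it is
     * the term; the text after it is parsed as an integer term frequency.
     * Assumption: without a delimiter the frequency stays at 1.
     */
    static TermAndFreq split(String raw, char delimiter) {
        int idx = raw.indexOf(delimiter);
        if (idx < 0) {
            // No delimiter: the term is unchanged and the frequency is not modified.
            return new TermAndFreq(raw, 1);
        }
        String term = raw.substring(0, idx);
        // Integer.parseInt throws NumberFormatException if the suffix is not an integer.
        int freq = Integer.parseInt(raw.substring(idx + 1));
        return new TermAndFreq(term, freq);
    }

    public static void main(String[] args) {
        TermAndFreq a = split("foo|5", '|');
        System.out.println(a.term + " " + a.freq); // prints "foo 5"

        TermAndFreq b = split("bar", '|');
        System.out.println(b.term + " " + b.freq); // prints "bar 1"
    }
}
```

In a real analysis chain the input would come from a Tokenizer that leaves the delimiter inside the token, for example one that splits on whitespace only.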
Nested classes inherited from class AttributeSource: AttributeSource.State

| Modifier and Type | Field and Description |
|---|---|
| static char | DEFAULT_DELIMITER |

Fields inherited from class TokenFilter: input
Fields inherited from class TokenStream: DEFAULT_TOKEN_ATTRIBUTE_FACTORY

| Constructor and Description |
|---|
| DelimitedTermFrequencyTokenFilter(TokenStream input) |
| DelimitedTermFrequencyTokenFilter(TokenStream input, char delimiter) |

| Modifier and Type | Method and Description |
|---|---|
| boolean | incrementToken() |

Methods inherited from class TokenFilter: close, end, reset
Methods inherited from class AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString

public static final char DEFAULT_DELIMITER
public DelimitedTermFrequencyTokenFilter(TokenStream input)
public DelimitedTermFrequencyTokenFilter(TokenStream input, char delimiter)
public boolean incrementToken()
throws IOException
Specified by: incrementToken in class TokenStream
Throws: IOException

Copyright © 2000-2021 Apache Software Foundation. All Rights Reserved.