public final class PatternCaptureGroupTokenFilter extends TokenFilter
For example, a pattern like:
"(https?://([a-zA-Z\-_0-9.]+))"
when matched against the string "http://www.foo.com/index" would return the tokens "http://www.foo.com" and "www.foo.com".
If none of the patterns match, or if preserveOriginal is true, the original token will be preserved.
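The capture-group behaviour described above can be reproduced with plain `java.util.regex`. The following is a minimal sketch, not the filter's implementation: the class name `CaptureGroupSketch` and the helper `captureGroups` are hypothetical, and no Lucene classes are involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: uses java.util.regex directly, not the Lucene filter.
final class CaptureGroupSketch {

    // Returns the text of every non-empty capture group for every match of the pattern.
    static List<String> captureGroups(Pattern pattern, String token) {
        List<String> out = new ArrayList<>();
        Matcher m = pattern.matcher(token);
        while (m.find()) {
            // Group 0 is the whole match; the capture groups start at 1.
            for (int g = 1; g <= m.groupCount(); g++) {
                String text = m.group(g);
                if (text != null && !text.isEmpty()) {
                    out.add(text);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Pattern url = Pattern.compile("(https?://([a-zA-Z\\-_0-9.]+))");
        // Prints [http://www.foo.com, www.foo.com]
        System.out.println(captureGroups(url, "http://www.foo.com/index"));
    }
}
```

The outer group yields the scheme plus host and the nested group yields the host alone, which is why one pattern produces two tokens.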
Each pattern is matched as often as it can be, so the pattern
"(...)", when matched against "abcdefghi" would
produce ["abc","def","ghi"]
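Repeated matching corresponds to calling `Matcher.find()` in a loop. A small sketch with plain `java.util.regex` (the class `RepeatedMatchSketch` and the method `allMatches` are hypothetical names used only for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: shows repeated matching with java.util.regex.
final class RepeatedMatchSketch {

    // Collects group 1 of every successive, non-overlapping match.
    static List<String> allMatches(Pattern pattern, String token) {
        List<String> out = new ArrayList<>();
        Matcher m = pattern.matcher(token);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [abc, def, ghi]
        System.out.println(allMatches(Pattern.compile("(...)"), "abcdefghi"));
    }
}
```

Each call to `find()` resumes where the previous match ended, so a trailing remainder shorter than three characters produces no match.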
A camelCaseFilter could be written as:
"([A-Z]{2,})",
"(?<![A-Z])([A-Z][a-z]+)",
"(?:^|\\b|(?<=[0-9_])|(?<=[A-Z]{2}))([a-z]+)",
"([0-9]+)"
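Applied to the token "camelCaseFilter", these patterns would return "camel", "Case" and "Filter"; if preserveOriginal is true, the original token
"camelCaseFilter" would also be returned.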
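To see what the four camel-case patterns capture, they can be run with plain `java.util.regex`. This is a sketch under assumptions rather than the filter's actual code: `CamelCaseSketch` and `split` are hypothetical names, and ordering the captures by start offset and skipping a capture that duplicates the preserved original are choices made for this illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: mimics the documented behaviour with java.util.regex.
final class CamelCaseSketch {

    static final Pattern[] CAMEL_CASE = {
        Pattern.compile("([A-Z]{2,})"),
        Pattern.compile("(?<![A-Z])([A-Z][a-z]+)"),
        Pattern.compile("(?:^|\\b|(?<=[0-9_])|(?<=[A-Z]{2}))([a-z]+)"),
        Pattern.compile("([0-9]+)")
    };

    static List<String> split(String token, boolean preserveOriginal) {
        // Record the [start, end) span of group 1 for every match of every pattern.
        List<int[]> spans = new ArrayList<>();
        for (Pattern p : CAMEL_CASE) {
            Matcher m = p.matcher(token);
            while (m.find()) {
                if (m.end(1) > m.start(1)) {
                    spans.add(new int[] {m.start(1), m.end(1)});
                }
            }
        }
        // Assumption for this sketch: emit captures in order of start offset.
        spans.sort(Comparator.comparingInt((int[] s) -> s[0]));

        List<String> out = new ArrayList<>();
        boolean keepOriginal = preserveOriginal || spans.isEmpty();
        if (keepOriginal) {
            out.add(token);
        }
        for (int[] s : spans) {
            boolean wholeToken = s[0] == 0 && s[1] == token.length();
            if (keepOriginal && wholeToken) {
                continue; // avoid emitting the original token twice
            }
            out.add(token.substring(s[0], s[1]));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(split("camelCaseFilter", false)); // [camel, Case, Filter]
        System.out.println(split("camelCaseFilter", true));  // [camelCaseFilter, camel, Case, Filter]
    }
}
```

For "camelCaseFilter" only the second and third patterns contribute: the second captures "Case" and "Filter", the third captures the leading "camel".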
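Nested classes inherited from class AttributeSource: AttributeSource.State
Fields inherited from class TokenFilter: input
Fields inherited from class TokenStream: DEFAULT_TOKEN_ATTRIBUTE_FACTORY

| Constructor and Description |
|---|
| PatternCaptureGroupTokenFilter(TokenStream input, boolean preserveOriginal, Pattern... patterns) |
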
| Modifier and Type | Method and Description |
|---|---|
| boolean | incrementToken() |
| void | reset() |

Methods inherited from class TokenFilter: close, end
Methods inherited from class AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString

public PatternCaptureGroupTokenFilter(TokenStream input, boolean preserveOriginal, Pattern... patterns)
Parameters:
input - the input TokenStream
preserveOriginal - set to true to return the original token even if one of the patterns matches
patterns - an array of Pattern objects to match against each token

public boolean incrementToken() throws IOException
Specified by: incrementToken in class TokenStream
Throws: IOException

public void reset() throws IOException
Overrides: reset in class TokenFilter
Throws: IOException

Copyright © 2000-2021 Apache Software Foundation. All Rights Reserved.