public class PatternTypingFilterFactory extends TokenFilterFactory implements ResourceLoaderAware
<fieldType name="text_taf" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="com.example.PatternTypingFilter" patternFile="patterns.txt"/>
    <filter class="solr.TokenAnalyzerFilter" asType="text_en" preserveType="true"/>
    <filter class="solr.TypeAsSynonymFilterFactory" prefix="__TAS__"
            ignore="word,&lt;ALPHANUM&gt;,&lt;NUM&gt;,&lt;SOUTHEAST_ASIAN&gt;,&lt;IDEOGRAPHIC&gt;,&lt;HIRAGANA&gt;,&lt;KATAKANA&gt;,&lt;HANGUL&gt;,&lt;EMOJI&gt;"/>
  </analyzer>
</fieldType>
Note that a configuration such as the one above may interfere with multi-word synonyms.

The patterns file has the format:

    (flags) (pattern) ::: (replacement)

For example, to set the first 2 flag bits on an original token matching 401k or 401(k), and to add a type of 'legal2_401_k' whenever either one is encountered, one would use:

    3 (\d+)\(?([a-z])\)? ::: legal2_$1_$2

The number indicating the flag bits to set must not have leading spaces, must be followed by a single space, and must be 0 if no flags should be set. The flags number should not contain commas or a decimal point. Lines whose first character is # are ignored as comments. Producing a synonym textually identical to the original term is not supported.

| Modifier and Type | Field and Description |
|---|---|
| static String | NAME: SPI name |
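The patterns-file format from the class description can be illustrated with plain `java.util.regex`. The following is a minimal sketch, not Lucene's actual implementation: the class `PatternLineSketch`, the `Rule` holder, and the methods `parse` and `typeFor` are hypothetical names introduced here. It also assumes that a rule applies only when the whole token matches the pattern, which the description above does not state explicitly.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative parser for one patterns-file line; hypothetical, not the Lucene code. */
public class PatternLineSketch {

    /** Hypothetical holder for the three parts of a rule line. */
    static final class Rule {
        final int flags;          // bit mask of flags to set on the matching token
        final Pattern pattern;    // regular expression tested against the token text
        final String replacement; // type template; $1, $2, ... refer to capture groups

        Rule(int flags, Pattern pattern, String replacement) {
            this.flags = flags;
            this.pattern = pattern;
            this.replacement = replacement;
        }

        /** Returns the type name for a token, or null if the token does not match.
         *  Assumption: the whole token must match the pattern. */
        String typeFor(String token) {
            Matcher m = pattern.matcher(token);
            return m.matches() ? m.replaceFirst(replacement) : null;
        }
    }

    /** Parses "(flags) (pattern) ::: (replacement)"; returns null for comment lines. */
    static Rule parse(String line) {
        if (line.startsWith("#")) {
            return null; // first character is '#': comment line
        }
        int firstSpace = line.indexOf(' ');  // flags end at the first single space
        int separator = line.indexOf(" ::: ");
        if (firstSpace < 1 || separator <= firstSpace) {
            throw new IllegalArgumentException("Malformed rule line: " + line);
        }
        int flags = Integer.parseInt(line.substring(0, firstSpace));
        String regex = line.substring(firstSpace + 1, separator);
        String replacement = line.substring(separator + " ::: ".length());
        return new Rule(flags, Pattern.compile(regex), replacement);
    }

    public static void main(String[] args) {
        Rule rule = parse("3 (\\d+)\\(?([a-z])\\)? ::: legal2_$1_$2");
        System.out.println(rule.typeFor("401k"));              // legal2_401_k
        System.out.println(rule.typeFor("401(k)"));            // legal2_401_k
        System.out.println(Integer.toBinaryString(rule.flags)); // 11 (first two bits set)
    }
}
```

Both spellings map to the same type name because the parentheses are optional in the pattern and only the two capture groups feed the replacement template. The flags value 3 is binary 11, which is what "the first 2 flag bits" refers to.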
Fields inherited from AbstractAnalysisFactory: LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion

| Constructor and Description |
|---|
| PatternTypingFilterFactory(Map<String,String> args): Creates a new PatternTypingFilterFactory |
| Modifier and Type | Method and Description |
|---|---|
| TokenStream | create(TokenStream input): Transform the specified input TokenStream |
| void | inform(ResourceLoader loader): Initializes this component with the provided ResourceLoader (used for loading classes, files, etc). |
Methods inherited from TokenFilterFactory: availableTokenFilters, findSPIName, forName, lookupClass, normalize, reloadTokenFilters

Methods inherited from AbstractAnalysisFactory: get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames

public static final String NAME
SPI name

public void inform(ResourceLoader loader) throws IOException
Description copied from interface ResourceLoaderAware: Initializes this component with the provided ResourceLoader (used for loading classes, files, etc).
Specified by: inform in interface ResourceLoaderAware
Throws: IOException

public TokenStream create(TokenStream input)
Description copied from class TokenFilterFactory: Transform the specified input TokenStream
Specified by: create in class TokenFilterFactory

Copyright © 2000-2021 Apache Software Foundation. All Rights Reserved.