Problem description
I have a standard Lucene app that searches an index. My index contains a lot of French terms, and I'd like to use the ASCIIFoldingFilter.
I've done a lot of searching, but I have no idea how to use it. The constructor takes a TokenStream object. Do I call the method on the analyzer that retrieves a TokenStream when you send it a field? Then what do I do? Can someone point me to an example where a TokenFilter is being used? Thanks.
Recommended answer
Token filters, like the ASCIIFoldingFilter, are at their base TokenStreams, so they are what the Analyzer returns, mainly through the following method:
public abstract TokenStream tokenStream(String fieldName, Reader reader);
As you have noticed, the filters take a TokenStream as input. They act like wrappers or, more precisely, like decorators around their input: they enhance the behavior of the contained TokenStream, performing both their own operation and the operation of the contained input.
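To make the decorator idea concrete, here is a minimal sketch in plain Java that does not use Lucene at all. The SimpleTokenStream interface and the WhitespaceSource, LowerCaseStep and FoldingStep classes are hypothetical names invented for this illustration; they only mimic the way a tokenizer is wrapped by successive filters. The folding step strips combining accents via java.text.Normalizer, which is a simplification: the real ASCIIFoldingFilter covers more cases (for example, it also maps ligatures such as "œ" to "oe").

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class DecoratorChainSketch {

    /** Hypothetical stand-in for a token stream: next() returns a token, or null when exhausted. */
    interface SimpleTokenStream {
        String next();
    }

    /** Source of the chain (the role a Tokenizer plays): splits text on whitespace. */
    static class WhitespaceSource implements SimpleTokenStream {
        private final List<String> tokens = new ArrayList<>();
        private int position = 0;

        WhitespaceSource(String text) {
            for (String part : text.split("\\s+")) {
                if (!part.isEmpty()) {
                    tokens.add(part);
                }
            }
        }

        @Override
        public String next() {
            return position < tokens.size() ? tokens.get(position++) : null;
        }
    }

    /** Decorator: lowercases whatever the wrapped stream produces. */
    static class LowerCaseStep implements SimpleTokenStream {
        private final SimpleTokenStream input;

        LowerCaseStep(SimpleTokenStream input) {
            this.input = input;
        }

        @Override
        public String next() {
            String token = input.next();
            return token == null ? null : token.toLowerCase(Locale.ROOT);
        }
    }

    /** Decorator: decomposes each token (NFD) and removes the combining marks. */
    static class FoldingStep implements SimpleTokenStream {
        private final SimpleTokenStream input;

        FoldingStep(SimpleTokenStream input) {
            this.input = input;
        }

        @Override
        public String next() {
            String token = input.next();
            if (token == null) {
                return null;
            }
            return Normalizer.normalize(token, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
        }
    }

    /** Drains a stream into a list so the result is easy to inspect. */
    static List<String> collect(SimpleTokenStream stream) {
        List<String> out = new ArrayList<>();
        for (String t = stream.next(); t != null; t = stream.next()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        // Each step wraps the previous one, just like filters wrap a tokenizer.
        SimpleTokenStream chain = new WhitespaceSource("Caf\u00e9 cr\u00e8me br\u00fbl\u00e9e");
        chain = new LowerCaseStep(chain);
        chain = new FoldingStep(chain);
        System.out.println(collect(chain)); // prints [cafe, creme, brulee]
    }
}
```

Each wrapper only knows about the stream it contains, so steps can be added, removed or reordered without touching the others.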
You can find an explanation here. It does not refer directly to the ASCIIFoldingFilter, but the same principle applies. Basically, you create a custom Analyzer with something like this in it (stripped-down example):
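// Imports assume the Lucene 2.9/3.x package layout; adjust them for your version.
import java.io.Reader;
import org.apache.lucene.analysis.ASCIIFoldingFilter;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

public class CustomAnalyzer extends Analyzer {
    // other content omitted
    // ...
    public TokenStream tokenStream(String fieldName, Reader reader) {
        // Start with a tokenizer, then wrap it in one filter after another.
        TokenStream result = new StandardTokenizer(reader);
        result = new StandardFilter(result);
        result = new LowerCaseFilter(result);
        // etc etc ...
        result = new StopFilter(result, yourSetOfStopWords);
        // Fold accented characters to ASCII equivalents where they exist.
        result = new ASCIIFoldingFilter(result);
        return result;
    }
    // ...
}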
Both TokenFilter and Tokenizer are subclasses of TokenStream.
Remember also that you must use the same custom analyzer for both indexing and searching, or you might get incorrect results from your queries.
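A small, Lucene-free sketch can show why the analyzer has to match on both sides. Everything here is hypothetical and simplified for illustration: analyze stands in for the whole analysis chain (lowercasing plus accent stripping via java.text.Normalizer), and the index is just a map from analyzed terms to document ids. Because the index only contains folded terms, a query term that skips the same folding finds nothing.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class AnalyzerMismatchSketch {

    /** Simplified stand-in for an analysis chain: lowercase, then strip combining accents. */
    static String analyze(String term) {
        String lower = term.toLowerCase(Locale.ROOT);
        return Normalizer.normalize(lower, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    /** Toy inverted index: analyzed term -> ids of the documents that contain it. */
    static Map<String, Set<Integer>> buildIndex(List<String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String raw : docs.get(id).split("\\s+")) {
                if (raw.isEmpty()) {
                    continue;
                }
                index.computeIfAbsent(analyze(raw), k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    /** Exact-match lookup of a term exactly as given (no analysis applied here). */
    static Set<Integer> lookup(Map<String, Set<Integer>> index, String term) {
        return index.getOrDefault(term, Collections.<Integer>emptySet());
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "Le caf\u00e9 est ferm\u00e9",
                "Un th\u00e9 glac\u00e9");
        Map<String, Set<Integer>> index = buildIndex(docs);

        // Query term not analyzed: the accented form was never stored in the index.
        System.out.println(lookup(index, "caf\u00e9"));          // prints []

        // Query term analyzed the same way as the documents: it matches document 0.
        System.out.println(lookup(index, analyze("caf\u00e9"))); // prints [0]
    }
}
```

In a real application the same effect comes from passing the same analyzer instance (or an equivalent one) to both the index writer and the query parser.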