Using Lucene Analyzer without an index - is my approach reasonable?

Problem description

My objective is to leverage some of Lucene's many tokenizers and filters to transform input text, but without the creation of any indexes.

For example, given this (contrived) input string...

" Someone's - [texté] goes here, foo ."

...and a Lucene analyzer like this...

Analyzer analyzer = CustomAnalyzer.builder()
        .withTokenizer("icu")
        .addTokenFilter("lowercase")
        .addTokenFilter("icuFolding")
        .build();

I want to get the following output:

someone's texte goes here foo
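For comparison, the same chain can be wired by hand by subclassing Analyzer. This is a minimal sketch assuming Lucene 8.x with lucene-core and lucene-analyzers-icu on the classpath; the class name IcuFoldingAnalyzer and the helper terms() are hypothetical. It should behave like the CustomAnalyzer above, on the assumption that the no-argument ICUTokenizer uses the same defaults as the "icu" factory.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.icu.ICUFoldingFilter;
import org.apache.lucene.analysis.icu.segmentation.ICUTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/** Hypothetical hand-wired version of the chain: icu -> lowercase -> icuFolding. */
public class IcuFoldingAnalyzer extends Analyzer {

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer source = new ICUTokenizer();            // "icu": Unicode word segmentation
        TokenStream result = new LowerCaseFilter(source);  // "lowercase"
        result = new ICUFoldingFilter(result);             // "icuFolding": case, accent and width folding
        return new TokenStreamComponents(source, result);
    }

    /** Hypothetical helper: runs the analyzer over text and copies out each term. */
    public static List<String> terms(Analyzer analyzer, String text) throws IOException {
        List<String> out = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream("myField", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                out.add(term.toString()); // copy, because the attribute object is reused
            }
            ts.end();
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        try (Analyzer analyzer = new IcuFoldingAnalyzer()) {
            System.out.println(terms(analyzer, " Someone's - [text\u00e9] goes here, foo ."));
        }
    }
}
```

ICUFoldingFilter already applies Unicode case folding, so the LowerCaseFilter step is kept here only to mirror the question's chain.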

The below Java method does what I want.

But is there a better (i.e. more typical and/or more concise) way for me to do this?

I am specifically thinking about the way I have used TokenStream and CharTermAttribute, since I have never used them like this before. Feels clunky.

Here is the code:

Lucene 8.3.0 imports (plus the two java.io classes the method also uses):

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.custom.CustomAnalyzer;

My method:

private String transform(String input) throws IOException {

    Analyzer analyzer = CustomAnalyzer.builder()
            .withTokenizer("icu")
            .addTokenFilter("lowercase")
            .addTokenFilter("icuFolding")
            .build();

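    // CustomAnalyzer builds the same chain for any field name, so "myField" is just a placeholder.
    TokenStream ts = analyzer.tokenStream("myField", new StringReader(input));
    // One attribute instance is reused; its contents change on every incrementToken().
    CharTermAttribute charTermAtt = ts.addAttribute(CharTermAttribute.class);

    StringBuilder sb = new StringBuilder();
    try {
        ts.reset();                      // mandatory before the first incrementToken()
        while (ts.incrementToken()) {
            sb.append(charTermAtt.toString()).append(" ");
        }
        ts.end();                        // finalizes end-of-stream state (e.g. final offset)
    } finally {
        ts.close();                      // releases the stream so the analyzer can reuse it
    }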
    return sb.toString().trim();
}
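One possible tidy-up of the method, sketched under the assumption of Lucene 8.x with lucene-core, lucene-analyzers-common and lucene-analyzers-icu on the classpath: build the analyzer once instead of on every call, let try-with-resources close the TokenStream, use the tokenStream(String, String) overload instead of a StringReader, and join terms with a StringJoiner so no trim() is needed. The class name TextTransformer is hypothetical.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.StringJoiner;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/** Hypothetical wrapper: builds the analyzer once and reuses it for every transform() call. */
public class TextTransformer implements Closeable {

    private final Analyzer analyzer;

    public TextTransformer() throws IOException {
        analyzer = CustomAnalyzer.builder()
                .withTokenizer("icu")
                .addTokenFilter("lowercase")
                .addTokenFilter("icuFolding")
                .build();
    }

    public String transform(String input) throws IOException {
        StringJoiner joined = new StringJoiner(" "); // no trailing space, so no trim() needed
        // The String overload of tokenStream() avoids wrapping the input in a StringReader,
        // and try-with-resources replaces the explicit finally { ts.close(); }.
        try (TokenStream ts = analyzer.tokenStream("myField", input)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                joined.add(term.toString()); // copy, because the attribute object is reused
            }
            ts.end();
        }
        return joined.toString();
    }

    @Override
    public void close() {
        analyzer.close();
    }

    public static void main(String[] args) throws IOException {
        try (TextTransformer transformer = new TextTransformer()) {
            System.out.println(transformer.transform(" Someone's - [text\u00e9] goes here, foo ."));
        }
    }
}
```

Lucene's Analyzer keeps reusable token-stream components per thread, so a single instance can generally be shared across calls; the sketch also closes it explicitly when done, which the original method never does for the analyzer it builds on each call.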
