Lucene is a library for text search. In its own words: [ Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java ]. Here I am referring specifically to the Lucene-core library (jar), because many extensions built around Lucene are also available.
Lucene has two basic functions: creating an index and searching an index. The main classes involved are shown in the figure below:
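When creating an index, you break the data to be indexed into Fields, group them into Documents, and write those to the index with an IndexWriter. An index consists of several files, so you must specify a directory to hold it, and searches are later run against that same Directory.

When searching the index, an IndexSearcher searches the Fields named by a Query across the indexed Documents and returns the Documents that satisfy the Query.

Below is a Lucene demo, a slightly modified version of an existing example. It uses Lucene 8.0.0, which requires JDK 8 or later, and manages dependencies with Maven: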
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class Demo2 {

    public static void main(String[] args) throws IOException, ParseException {
        // create analyzer and directory
        StandardAnalyzer analyzer = new StandardAnalyzer();
        Path path = Paths.get("F:/lucene-demo-index");
        Directory index = FSDirectory.open(path);

        // indexing
        // 1 create index-writer (OpenMode.CREATE overwrites any existing index in the directory)
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        config.setOpenMode(OpenMode.CREATE);
        IndexWriter writer = new IndexWriter(index, config);
        // 2 write index
        addDoc(writer, "Lucene in Action", "193398817");
        addDoc(writer, "Lucene for Dummies", "55320055Z");
        addDoc(writer, "Managing Gigabytes", "55063554A");
        addDoc(writer, "The Art of Computer Science", "9900333X");
        writer.close(); // commits the added documents and releases the write lock

        // search
        // 1 create query: look for the term "lucene" in the "title" field
        String queryStr = "lucene";
        Query q = new QueryParser("title", analyzer).parse(queryStr);
        System.out.println("query: " + q.toString());
        int hitsPerPage = 10;
        // 2 create index-searcher over the same directory
        IndexReader reader = DirectoryReader.open(index);
        IndexSearcher searcher = new IndexSearcher(reader);
        // 3 do search, keeping at most hitsPerPage top-scoring hits
        TopDocs docs = searcher.search(q, hitsPerPage);
        ScoreDoc[] hits = docs.scoreDocs;

        // display results
        System.out.println("found " + hits.length + " results");
        for (ScoreDoc hit : hits) {
            int docId = hit.doc;
            Document doc = searcher.doc(docId);
            System.out.println(doc.get("title") + " - " + doc.get("isbn"));
        }

        // release the reader and the directory
        reader.close();
        index.close();
    }

    private static void addDoc(IndexWriter writer, String title, String isbn) throws IOException {
        Document doc = new Document();
        // TextField: analyzed (tokenized) and stored, so it supports full-text search
        doc.add(new TextField("title", title, Field.Store.YES));
        // StringField: indexed as a single untokenized term and stored; suits identifiers such as ISBNs
        doc.add(new StringField("isbn", isbn, Field.Store.YES));
        writer.addDocument(doc);
    }
}
The Maven dependencies are:
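<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>8.0.0</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queryparser</artifactId>
    <version>8.0.0</version>
</dependency>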
The program prints:
query: title:lucene
found 2 results
Lucene in Action - 193398817
Lucene for Dummies - 55320055Z
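The query in the program is "title:lucene". It finds the Documents whose title Field contains the term "lucene". Both "Lucene in Action" and "Lucene for Dummies" match despite their capitalized titles. StandardAnalyzer lowercases tokens at indexing time, and QueryParser runs the query text through the same analyzer, so both sides use the term "lucene".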
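To make the indexing and searching flow more concrete, here is a toy inverted index in plain Java. It is a conceptual sketch only, not Lucene's implementation or API. The class and its methods (ToyInvertedIndex, analyze, addDoc, search) are hypothetical names. The analyzer is a simplified stand-in for StandardAnalyzer that splits on non-alphanumeric characters and lowercases. "title" is tokenized like a TextField. "isbn" is indexed verbatim as one term, like a StringField. Lucene also scores and ranks hits, and this toy does not.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Toy illustration only: NOT Lucene's implementation or API (all names are hypothetical).
// It mimics the demo in miniature: analyze fields, build an inverted index
// (field -> term -> doc ids), then answer a single-term query such as title:lucene.
public class ToyInvertedIndex {

    // field name -> (term -> sorted doc ids); a stand-in for Lucene's postings lists
    private final Map<String, Map<String, TreeSet<Integer>>> postings = new HashMap<>();
    // stored field values so hits can be displayed (analogous to Field.Store.YES)
    private final List<Map<String, String>> storedFields = new ArrayList<>();

    // Simplified stand-in for StandardAnalyzer: split on runs of non-letter/non-digit
    // characters and lowercase each token (the real analyzer uses Unicode word-break rules).
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Adds one document and returns its id. "title" is analyzed (TextField-like),
    // while "isbn" is indexed as a single untokenized term (StringField-like).
    int addDoc(String title, String isbn) {
        int docId = storedFields.size();
        Map<String, String> fields = new HashMap<>();
        fields.put("title", title);
        fields.put("isbn", isbn);
        storedFields.add(fields);
        for (String term : analyze(title)) {
            index("title", term, docId);
        }
        index("isbn", isbn, docId);
        return docId;
    }

    private void index(String field, String term, int docId) {
        postings.computeIfAbsent(field, f -> new HashMap<>())
                .computeIfAbsent(term, t -> new TreeSet<>())
                .add(docId);
    }

    // Single-term lookup "field:term"; returns matching doc ids in ascending order.
    List<Integer> search(String field, String term) {
        List<Integer> result = new ArrayList<>();
        Map<String, TreeSet<Integer>> byTerm = postings.get(field);
        if (byTerm != null && byTerm.containsKey(term)) {
            result.addAll(byTerm.get(term));
        }
        return result;
    }

    String stored(int docId, String field) {
        return storedFields.get(docId).get(field);
    }

    public static void main(String[] args) {
        ToyInvertedIndex idx = new ToyInvertedIndex();
        idx.addDoc("Lucene in Action", "193398817");
        idx.addDoc("Lucene for Dummies", "55320055Z");
        idx.addDoc("Managing Gigabytes", "55063554A");
        idx.addDoc("The Art of Computer Science", "9900333X");

        // Toy counterpart of the demo's query title:lucene
        for (int id : idx.search("title", "lucene")) {
            System.out.println(idx.stored(id, "title") + " - " + idx.stored(id, "isbn"));
        }
        // prints:
        // Lucene in Action - 193398817
        // Lucene for Dummies - 55320055Z
    }
}
```

The sketch shows why the field type matters. A lookup on "isbn" must match the stored value exactly: "55320055Z" finds document 1, but "55320055z" finds nothing, because that field was never lowercased. A lookup on "title" works on lowercased tokens, which is why "lucene" matches both titles.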
Reprinted from: http://ujlsi.baihongyu.com/