<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->
<title>Apache Lucene API</title>
<p>Apache Lucene is a high-performance, full-featured text search engine library.
Here is a simple example of how to use Lucene for indexing and searching (using JUnit
to check that the results are what we expect):</p>
<!-- code comes from org.apache.lucene.TestDemo: -->
<pre class="prettyprint">
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);

// Store the index in memory:
Directory directory = new RAMDirectory();
// To store an index on disk, use this instead:
//Directory directory = FSDirectory.open(new File("/tmp/testindex"));
IndexWriter iwriter = new IndexWriter(directory, analyzer, true,
                                      new IndexWriter.MaxFieldLength(25000));
Document doc = new Document();
String text = "This is the text to be indexed.";
doc.add(new Field("fieldname", text, Field.Store.YES,
                  Field.Index.ANALYZED));
iwriter.addDocument(doc);
// Close the writer so the added document is committed and visible to readers:
iwriter.close();

// Now search the index:
IndexReader ireader = IndexReader.open(directory); // read-only=true
IndexSearcher isearcher = new IndexSearcher(ireader);
// Parse a simple query that searches for "text":
QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "fieldname", analyzer);
Query query = parser.parse("text");
ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
assertEquals(1, hits.length);
// Iterate through the results:
for (int i = 0; i &lt; hits.length; i++) {
  Document hitDoc = isearcher.doc(hits[i].doc);
  assertEquals("This is the text to be indexed.", hitDoc.get("fieldname"));
}
// Release the searcher, reader and directory:
isearcher.close();
ireader.close();
directory.close();</pre>
<p>The Lucene API is divided into several packages:</p>

<ul>
<li>
<b><a href="org/apache/lucene/analysis/package-summary.html">org.apache.lucene.analysis</a></b>
defines an abstract <a href="org/apache/lucene/analysis/Analyzer.html">Analyzer</a>
API for converting text from a <a href="http://java.sun.com/products/jdk/1.2/docs/api/java/io/Reader.html">java.io.Reader</a>
into a <a href="org/apache/lucene/analysis/TokenStream.html">TokenStream</a>,
an enumeration of token <a href="org/apache/lucene/util/Attribute.html">Attribute</a>s.
A TokenStream can be composed by applying <a href="org/apache/lucene/analysis/TokenFilter.html">TokenFilter</a>s
to the output of a <a href="org/apache/lucene/analysis/Tokenizer.html">Tokenizer</a>.
Tokenizers and TokenFilters are strung together and applied with an <a href="org/apache/lucene/analysis/Analyzer.html">Analyzer</a>.
A handful of Analyzer implementations are provided, including <a href="org/apache/lucene/analysis/StopAnalyzer.html">StopAnalyzer</a>
and the grammar-based <a href="org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a>.</li>

<li>
<b><a href="org/apache/lucene/document/package-summary.html">org.apache.lucene.document</a></b>
provides a simple <a href="org/apache/lucene/document/Document.html">Document</a>
class. A Document is simply a set of named <a href="org/apache/lucene/document/Field.html">Field</a>s,
whose values may be strings or instances of <a href="http://java.sun.com/products/jdk/1.2/docs/api/java/io/Reader.html">java.io.Reader</a>.</li>

<li>
<b><a href="org/apache/lucene/index/package-summary.html">org.apache.lucene.index</a></b>
provides two primary classes: <a href="org/apache/lucene/index/IndexWriter.html">IndexWriter</a>,
which creates indices and adds documents to them; and <a href="org/apache/lucene/index/IndexReader.html">IndexReader</a>,
which accesses the data in the index.</li>

<li>
<b><a href="org/apache/lucene/search/package-summary.html">org.apache.lucene.search</a></b>
provides data structures to represent queries (e.g. <a href="org/apache/lucene/search/TermQuery.html">TermQuery</a>
for individual words, <a href="org/apache/lucene/search/PhraseQuery.html">PhraseQuery</a>
for phrases, and <a href="org/apache/lucene/search/BooleanQuery.html">BooleanQuery</a>
for boolean combinations of queries) and the abstract <a href="org/apache/lucene/search/Searcher.html">Searcher</a>,
which turns queries into <a href="org/apache/lucene/search/TopDocs.html">TopDocs</a>.
<a href="org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a>
implements search over a single IndexReader.</li>

<li>
<b><a href="org/apache/lucene/queryParser/package-summary.html">org.apache.lucene.queryParser</a></b>
uses <a href="http://javacc.dev.java.net">JavaCC</a> to implement a
<a href="org/apache/lucene/queryParser/QueryParser.html">QueryParser</a>.</li>

<li>
<b><a href="org/apache/lucene/store/package-summary.html">org.apache.lucene.store</a></b>
defines an abstract class for storing persistent data, the <a href="org/apache/lucene/store/Directory.html">Directory</a>,
which is a collection of named files written by an <a href="org/apache/lucene/store/IndexOutput.html">IndexOutput</a>
and read by an <a href="org/apache/lucene/store/IndexInput.html">IndexInput</a>.
Multiple implementations are provided, including <a href="org/apache/lucene/store/FSDirectory.html">FSDirectory</a>,
which uses a file system directory to store files, and <a href="org/apache/lucene/store/RAMDirectory.html">RAMDirectory</a>,
which implements files as memory-resident data structures.</li>

<li>
<b><a href="org/apache/lucene/util/package-summary.html">org.apache.lucene.util</a></b>
contains a few handy data structures and utility classes, e.g. <a href="org/apache/lucene/util/BitVector.html">BitVector</a>
and <a href="org/apache/lucene/util/PriorityQueue.html">PriorityQueue</a>.</li>
</ul>
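<p>To make the analysis chain concrete, here is a minimal sketch that prints the tokens an
Analyzer produces for a string. It assumes the same 3.x-era API as the example at the top of
this page; the class name <tt>PrintTokens</tt> is hypothetical, and <tt>TermAttribute</tt> is
replaced by <tt>CharTermAttribute</tt> in later releases, so adjust it to the version you use.</p>
<pre class="prettyprint">
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical helper class, for illustration only.
public class PrintTokens {
  public static void main(String[] args) throws Exception {
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
    // The field name "fieldname" is arbitrary here.
    TokenStream stream = analyzer.tokenStream("fieldname",
        new StringReader("This is the text to be indexed."));
    // Ask the stream for the attribute that carries each token's text.
    TermAttribute term = stream.addAttribute(TermAttribute.class);
    stream.reset();
    // Advance token by token; the attribute instance is updated in place.
    while (stream.incrementToken()) {
      System.out.println(term.term());
    }
    stream.end();
    stream.close();
  }
}</pre>
<p>Printing the tokens this way is a quick check of what an Analyzer actually puts into the
index, which helps when a query unexpectedly fails to match.</p>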
To use Lucene, an application should:
<ol>
<li>
Create <a href="org/apache/lucene/document/Document.html">Document</a>s by
adding
<a href="org/apache/lucene/document/Field.html">Field</a>s;</li>

<li>
Create an <a href="org/apache/lucene/index/IndexWriter.html">IndexWriter</a>
and add documents to it with <a href="org/apache/lucene/index/IndexWriter.html#addDocument(org.apache.lucene.document.Document)">addDocument()</a>;</li>

<li>
Call <a href="org/apache/lucene/queryParser/QueryParser.html#parse(java.lang.String)">QueryParser.parse()</a>
to build a query from a string; and</li>

<li>
Create an <a href="org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a>
and pass the query to its <a href="org/apache/lucene/search/Searcher.html#search(org.apache.lucene.search.Query)">search()</a>
method.</li>
</ol>
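<p>As an illustration of the first step, the sketch below builds a Document with one
string-valued Field and one Reader-valued Field. It assumes the 3.x-era <tt>Field</tt>
constructors; the class name <tt>BuildDocument</tt>, the field names <tt>path</tt> and
<tt>contents</tt>, and the sample text are assumptions made for this example.</p>
<pre class="prettyprint">
import java.io.StringReader;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// Hypothetical helper class, for illustration only.
public class BuildDocument {
  public static void main(String[] args) {
    Document doc = new Document();
    // A short identifier: stored so it can be shown with search results,
    // and indexed as a single term without analysis.
    doc.add(new Field("path", "rec.food.recipes/soups/clam-chowder",
        Field.Store.YES, Field.Index.NOT_ANALYZED));
    // Body text supplied through a Reader: analyzed and indexed, but not stored.
    doc.add(new Field("contents",
        new StringReader("Manhattan clam chowder recipe text")));
    // Only stored fields can be read back from the Document.
    System.out.println(doc.get("path"));
  }
}</pre>
<p>Passing large text through a Reader avoids holding it in memory as a String, at the cost of
not being able to retrieve the original text from the index.</p>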
Some simple code examples that do this are:
<ul>
<li>
<a href="http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/lucene/contrib/demo/src/java/org/apache/lucene/demo/IndexFiles.java">IndexFiles.java</a> creates an
index for all the files contained in a directory.</li>

<li>
<a href="http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/lucene/demo/src/java/org/apache/lucene/demo/SearchFiles.java">SearchFiles.java</a> prompts for
queries and searches an index.</li>
</ul>
To demonstrate these, try something like:
<blockquote><tt>&gt; <b>java -cp lucene.jar:lucene-demo.jar:lucene-analyzers-common.jar org.apache.lucene.demo.IndexFiles rec.food.recipes/soups</b></tt>
<br><tt>adding rec.food.recipes/soups/abalone-chowder</tt>
<br><tt> </tt>[ ... ]

<p><tt>&gt; <b>java -cp lucene.jar:lucene-demo.jar:lucene-analyzers-common.jar org.apache.lucene.demo.SearchFiles</b></tt>
<br><tt>Query: <b>chowder</b></tt>
<br><tt>Searching for: chowder</tt>
<br><tt>34 total matching documents</tt>
<br><tt>1. rec.food.recipes/soups/spam-chowder</tt>
<br><tt> </tt>[ ... thirty-four documents contain the word "chowder" ... ]

<p><tt>Query: <b>"clam chowder" AND Manhattan</b></tt>
<br><tt>Searching for: +"clam chowder" +manhattan</tt>
<br><tt>2 total matching documents</tt>
<br><tt>1. rec.food.recipes/soups/clam-chowder</tt>
<br><tt> </tt>[ ... two documents contain the phrase "clam chowder"
and the word "manhattan" ... ]
<br> [ Note: "+" and "-" are canonical, but "AND", "OR"
and "NOT" may be used. ]</blockquote>
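<p>The last query in the session above can also be built directly from the query classes in
<tt>org.apache.lucene.search</tt> instead of being parsed from a string. The following is a
sketch under stated assumptions: the class name <tt>BuildQuery</tt> is hypothetical, and the
field name <tt>contents</tt> is assumed here rather than taken from this page.</p>
<pre class="prettyprint">
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.TermQuery;

// Hypothetical helper class, for illustration only.
public class BuildQuery {
  public static void main(String[] args) {
    String field = "contents"; // assumed field name

    // The phrase "clam chowder": two terms that must be adjacent.
    PhraseQuery phrase = new PhraseQuery();
    phrase.add(new Term(field, "clam"));
    phrase.add(new Term(field, "chowder"));

    // Both clauses are required, which is what "+" means in the query syntax.
    BooleanQuery query = new BooleanQuery();
    query.add(phrase, BooleanClause.Occur.MUST);
    query.add(new TermQuery(new Term(field, "manhattan")), BooleanClause.Occur.MUST);

    // Print the query in query-syntax form, relative to the default field.
    System.out.println(query.toString(field));
  }
}</pre>
<p>Queries built this way bypass the Analyzer, so the terms must already be in their indexed
form. That is why <tt>manhattan</tt> is written in lowercase here, matching the parsed form shown
in the session.</p>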