1 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
4 <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
5 <meta content="Apache Forrest" name="Generator">
6 <meta name="Forrest-version" content="0.8">
7 <meta name="Forrest-skin-name" content="pelt">
8 <title>PyLucene Features</title>
9 <link type="text/css" href="../skin/basic.css" rel="stylesheet">
10 <link media="screen" type="text/css" href="../skin/screen.css" rel="stylesheet">
11 <link media="print" type="text/css" href="../skin/print.css" rel="stylesheet">
12 <link type="text/css" href="../skin/profile.css" rel="stylesheet">
13 <script src="../skin/getBlank.js" language="javascript" type="text/javascript"></script><script src="../skin/getMenu.js" language="javascript" type="text/javascript"></script><script src="../skin/fontsize.js" language="javascript" type="text/javascript"></script>
14 <link rel="shortcut icon" href="../">
16 <body onload="init()">
17 <script type="text/javascript">ndeSetTextSize();</script>
22 <div class="breadtrail">
23 <a href="http://www.apache.org/">apache</a> > <a href="http://lucene.apache.org/">lucene</a><script src="../skin/breadcrumbs.js" language="JavaScript" type="text/javascript"></script>
32 <div class="grouplogo">
33 <a href="http://lucene.apache.org/"><img class="logoImage" alt="Lucene" src="../images/lucene_green_150.gif" title="Lucene Description"></a>
41 <div class="projectlogoA1">
42 <a href="http://lucene.apache.org/pylucene/"><img class="logoImage" alt="PyLucene" src="../images/project.png" title="PyLucene Description"></a>
52 <a class="selected" href="../index.html">PyLucene</a>
55 <a class="unselected" href="../jcc/index.html">JCC</a>
64 <div id="publishedStrip">
68 <div id="level2tabs"></div>
72 <script type="text/javascript"><!--
73 document.write("Last Published: " + document.lastModified);
79 <div class="breadtrail">
90 <div onclick="SwitchMenu('menu_1.1', '../skin/')" id="menu_1.1Title" class="menutitle">About</div>
91 <div id="menu_1.1" class="menuitemgroup">
92 <div class="menuitem">
93 <a href="../index.html" title="Welcome to PyLucene">Index</a>
96 <div onclick="SwitchMenu('menu_selected_1.2', '../skin/')" id="menu_selected_1.2Title" class="menutitle" style="background-image: url('../skin/images/chapter_open.gif');">Documentation</div>
97 <div id="menu_selected_1.2" class="selectedmenuitemgroup" style="display: block;">
98 <div class="menuitem">
99 <a href="../documentation/install.html">Installation</a>
101 <div class="menupage">
102 <div class="menupagetitle">Features</div>
105 <div onclick="SwitchMenu('menu_1.3', '../skin/')" id="menu_1.3Title" class="menutitle">Resources</div>
106 <div id="menu_1.3" class="menuitemgroup">
107 <div class="menuitem">
108 <a href="http://www.apache.org/dyn/closer.cgi/lucene/pylucene/">Releases</a>
110 <div class="menuitem">
111 <a href="../resources/version_control.html">Source Code</a>
113 <div class="menuitem">
114 <a href="../resources/mailing_lists.html">Mailing Lists</a>
116 <div class="menuitem">
117 <a href="http://issues.apache.org/jira/browse/PyLucene">Issue Tracking</a>
120 <div id="credit"></div>
121 <div id="roundbottom">
122 <img style="display: none" class="corner" height="15" width="15" alt="" src="../skin/images/rc-b-l-15-1body-2menu-3menu.png"></div>
126 <div id="credit2"></div>
135 <div title="Portable Document Format" class="pdflink">
136 <a class="dida" href="readme.pdf"><img alt="PDF -icon" src="../skin/images/pdfdoc.gif" class="skin"><br>
139 <h1>PyLucene Features</h1>
140 <div id="minitoc-area">
143 <a href="#install">Installing PyLucene</a>
146 <a href="#api">API documentation</a>
149 <a href="#samples">Samples</a>
152 <a href="#threading">Threading support with attachCurrentThread</a>
155 <a href="#exceptions">Exception handling with lucene.JavaError</a>
158 <a href="#arrays">Handling Java arrays</a>
161 <a href="#differences">Differences between the Java Lucene and PyLucene APIs</a>
164 <a href="#python">Pythonic extensions to the Java Lucene APIs</a>
167 <a href="#extensions">Extending Java Lucene classes from Python</a>
174 <div class="warning">
175 <div class="label">Warning</div>
176 <div class="content">
177 Before calling any PyLucene API that requires the Java VM, start it by
calling <span class="codefrag">initVM(classpath, ...)</span>. More about this function
is available <a href="../jcc/documentation/readme.html">here</a>.
183 <a name="N10017"></a><a name="install"></a>
184 <h2 class="boxed">Installing PyLucene</h2>
185 <div class="section">
187 PyLucene is a Python extension built with
188 <a href="../jcc/index.html">JCC</a>.
191 To build PyLucene, JCC needs to be built first. Sources for JCC are
192 included with the PyLucene sources. Instructions for building and
193 installing JCC are <a href="../jcc/documentation/install.html">here</a>.
Instructions for building PyLucene
197 are <a href="../documentation/install.html">here</a>.
201 <a name="N10033"></a><a name="api"></a>
202 <h2 class="boxed">API documentation</h2>
203 <div class="section">
PyLucene closely tracks Java Lucene releases. It intends to
support the entire Lucene API.
209 PyLucene also includes a number of Lucene contrib packages: the
210 Snowball analyzer and stemmers, the highlighter package, analyzers
for languages other than English, regular expression queries,
212 specialized queries such as 'more like this' and more.
215 This document only covers the pythonic extensions to Lucene offered
216 by PyLucene as well as some differences between the Java and Python
217 APIs. For the documentation on Java Lucene APIs,
218 see <a href="http://lucene.apache.org/java/docs/api/index.html">here</a>.
221 To help with debugging and to support some Lucene APIs, PyLucene also
222 exposes some Java runtime APIs.
224 <a name="N10049"></a><a name="samples"></a>
225 <h3 class="boxed">Samples</h3>
227 The best way to learn PyLucene is to look at the many samples
228 included with the PyLucene source release or on the web at:
234 <a href="http://svn.apache.org/viewcvs.cgi/lucene/pylucene/trunk/samples">http://svn.apache.org/viewcvs.cgi/lucene/pylucene/trunk/samples</a>
240 <a href="http://svn.apache.org/viewcvs.cgi/lucene/pylucene/trunk/samples/LuceneInAction">http://svn.apache.org/viewcvs.cgi/lucene/pylucene/trunk/samples/LuceneInAction</a>
246 A large number of samples are shipped with PyLucene. Most notably,
247 all the samples published in
248 the <a href="http://www.manning.com/hatcher2"><em>Lucene in
249 Action</em></a> book that did not depend on a third party Java
250 library for which there was no obvious Python equivalent were
251 ported to Python and PyLucene.
255 <em>Lucene in Action</em> is a great companion to learning
256 Lucene. Having all the samples available in Python should make it
257 even easier for Python developers.
261 <em>Lucene in Action</em> was written by Erik Hatcher and Otis
262 Gospodnetic, both part of the Java Lucene development team, and is
available from <a href="http://www.manning.com/hatcher2">Manning Publications</a>.
266 <a name="N1007C"></a><a name="threading"></a>
267 <h3 class="boxed">Threading support with attachCurrentThread</h3>
269 Before PyLucene APIs can be used from a thread other than the main
270 thread that was not created by the Java Runtime, the
271 <span class="codefrag">attachCurrentThread()</span> method must be called on the
272 <span class="codefrag">JCCEnv</span> object returned by the <span class="codefrag">initVM()</span>
273 or <span class="codefrag">getVMEnv()</span> functions.
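<p>The attach step can be sketched as follows. This is a minimal illustration, not part of PyLucene: the helper name <span class="codefrag">run_attached</span> is hypothetical, and <span class="codefrag">vm_env</span> is assumed to be the <span class="codefrag">JCCEnv</span> object returned by <span class="codefrag">initVM()</span> or <span class="codefrag">getVMEnv()</span>.</p>

```python
import threading


def run_attached(vm_env, target, *args):
    """Start a Python thread that attaches itself to the Java VM first.

    vm_env is assumed to be the JCCEnv object returned by initVM() or
    getVMEnv(); run_attached itself is a hypothetical helper name.
    """
    def runner():
        # The attach call must run inside the new thread, before any
        # PyLucene API is used from it.
        vm_env.attachCurrentThread()
        target(*args)

    thread = threading.Thread(target=runner)
    thread.start()
    return thread
```

<p>With PyLucene installed, a call could look like <span class="codefrag">run_attached(lucene.getVMEnv(), search, query).join()</span>, where <span class="codefrag">search</span> and <span class="codefrag">query</span> are your own function and argument.</p>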
275 <a name="N10092"></a><a name="exceptions"></a>
276 <h3 class="boxed">Exception handling with lucene.JavaError</h3>
278 Java exceptions are caught at the language barrier and reported to
279 Python by raising a JavaError instance whose args tuple contains the
280 actual Java Exception instance.
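<p>A small sketch of this pattern follows. The helper names are hypothetical, and the code assumes that the Java exception is the first item of the <span class="codefrag">args</span> tuple. The error class is passed in as a parameter so that the sketch can be read without PyLucene installed; in real use it would be <span class="codefrag">lucene.JavaError</span>.</p>

```python
def java_exception_from(error):
    """Return the Java exception carried by a JavaError-like error.

    Assumes the Java exception is the first item of the args tuple.
    """
    return error.args[0]


def call_reporting_java_errors(func, java_error_type, *args):
    """Call func(*args) and return (result, java_exception).

    Exactly one of the two returned values is None. java_error_type is
    expected to be lucene.JavaError in real use.
    """
    try:
        return func(*args), None
    except java_error_type as error:
        return None, java_exception_from(error)
```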
282 <a name="N1009C"></a><a name="arrays"></a>
283 <h3 class="boxed">Handling Java arrays</h3>
285 Java arrays are returned to Python in a <span class="codefrag">JArray</span>
286 wrapper instance that implements the Python sequence protocol. It
is possible to change array elements but not to change the array size.
291 A few Lucene APIs take array arguments and expect values to be
292 returned in them. To call such an API and be able to retrieve the
array values after the call, a Java array needs to be instantiated first.
295 For example, accessing termDocs:
<pre class="code">
termDocs = reader.termDocs(Term("isbn", isbn))
docs = JArray('int')(1)   # allocate an int[1] array
freq = JArray('int')(1)   # allocate an int[1] array
if termDocs.read(docs, freq) == 1:
    bits.set(docs[0])     # access the array's first element
</pre>
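<p>The same idea extends to reading in batches. The sketch below is illustrative only: <span class="codefrag">read_all_doc_ids</span> and <span class="codefrag">new_int_array</span> are hypothetical names, and <span class="codefrag">new_int_array</span> is assumed to be <span class="codefrag">JArray('int')</span> in real use. It also assumes that <span class="codefrag">read()</span> returns the number of entries it filled in and returns zero once there is nothing left to read.</p>

```python
def read_all_doc_ids(term_docs, new_int_array, batch_size=32):
    """Collect every document number from a TermDocs-like object.

    new_int_array is assumed to be JArray('int') in real use, so that
    new_int_array(n) allocates an int[n] that the Java side can fill in.
    """
    docs = new_int_array(batch_size)
    freqs = new_int_array(batch_size)
    doc_ids = []
    while True:
        # read() fills both arrays and returns how many entries it wrote.
        count = term_docs.read(docs, freqs)
        if count == 0:
            break
        # Only the first `count` slots hold fresh values.
        doc_ids.extend(docs[i] for i in range(count))
    return doc_ids
```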
In addition to <span class="codefrag">'int'</span>, the <span class="codefrag">JArray</span>
306 function accepts <span class="codefrag">'object'</span>, <span class="codefrag">'string'</span>,
307 <span class="codefrag">'bool'</span>, <span class="codefrag">'byte'</span>, <span class="codefrag">'char'</span>,
308 <span class="codefrag">'double'</span>, <span class="codefrag">'float'</span>, <span class="codefrag">'long'</span>
309 and <span class="codefrag">'short'</span> to create an array of the corresponding
310 type. The <span class="codefrag">JArray('object')</span> constructor takes a second
311 argument denoting the class of the object elements. This argument
312 is optional and defaults to Object.
315 To convert a char array to a Python string use a
316 <span class="codefrag">''.join(array)</span> construct.
319 Instead of an integer denoting the size of the desired Java array,
320 a sequence of objects of the expected element type may be passed
321 in to the array constructor.<br>
<pre class="code">
# creating a Java array of double from the [1.5, 2.5] list
JArray('double')([1.5, 2.5])
</pre>
329 All methods that expect an array also accept a sequence of Python
330 objects of the expected element type. If no values are expected
from the array arguments after the call, it is therefore not necessary
332 to instantiate a Java array to make such calls.
335 See <a href="../jcc/documentation/readme.html">JCC</a> for more
336 information about handling arrays.
338 <a name="N100F2"></a><a name="differences"></a>
339 <h3 class="boxed">Differences between the Java Lucene and PyLucene APIs</h3>
343 The PyLucene API exposes all Java Lucene classes in a flat namespace
344 in the PyLucene module. For example, the Java import
345 statement <span class="codefrag">import
346 org.apache.lucene.index.IndexReader;</span> corresponds to the
Python import statement <span class="codefrag">from lucene import IndexReader</span>.
353 Downcasting is a common operation in Java but not a concept in
Python. Because the wrapper objects implement exactly the
APIs of the declared type of the wrapped object, all classes
implement two class methods called <span class="codefrag">instance_</span> and <span class="codefrag">cast_</span> that
verify and cast an instance respectively.
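<p>A short sketch of how the two class methods can be combined. The helper name <span class="codefrag">narrow</span> is hypothetical; <span class="codefrag">wrapper_class</span> stands for any PyLucene class, such as <span class="codefrag">Field</span>.</p>

```python
def narrow(wrapper_class, obj):
    """Return obj re-wrapped as wrapper_class, or None if it is not one.

    wrapper_class is any class exposing the instance_ and cast_ class
    methods; narrow is a hypothetical helper name.
    """
    if wrapper_class.instance_(obj):     # verify, like Java's instanceof
        return wrapper_class.cast_(obj)  # cast, like Java's (Type) obj
    return None
```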
361 <a name="N10108"></a><a name="python"></a>
362 <h3 class="boxed">Pythonic extensions to the Java Lucene APIs</h3>
364 Java is a very verbose language. Python, on the other hand, offers
365 many syntactically attractive constructs for iteration, property
access, etc. As the Java Lucene samples from the <em>Lucene in
367 Action</em> book were ported to Python, PyLucene received a number
368 of pythonic extensions listed here:
373 Iterating search hits is a very common operation. Hits instances
374 are iterable in Python. Two values are returned for each
375 iteration, the zero-based number of the document in the Hits
376 instance and the document instance itself.<br>
The Java loop:
<pre class="code">
for (int i = 0; i &lt; hits.length(); i++) {
    Document doc = hits.doc(i);
    System.out.println(hits.score(i) + " : " + doc.get("title"));
}
</pre>
384 can be written in Python:
<pre class="code">
for hit in hits:
    hit = Hit.cast_(hit)
    print hit.getScore(), ':', hit.getDocument()['title']
</pre>
If <span class="codefrag">hits.iterator()</span>'s <span class="codefrag">next()</span> method were declared to return
<span class="codefrag">Hit</span> instead of <span class="codefrag">Object</span>, the above
<span class="codefrag">cast_()</span> call would be unnecessary.<br>
The same Java loop can also be written:
<pre class="code">
for i in xrange(len(hits)):
    print hits.score(i), ':', hits[i]['title']
</pre>
Hits instances partially implement the Python 'sequence' protocol.<br>
404 The Java expressions:
409 are better written in Python:
418 Document instances have fields whose values can be accessed
419 through the mapping protocol.<br>
424 is better written in Python:
432 Document instances can be iterated over for their fields.<br>
<pre class="code">
Enumeration fields = doc.getFields();
while (fields.hasMoreElements()) {
    Field field = (Field) fields.nextElement();
}
</pre>
441 is better written in Python:
<pre class="code">
for field in doc.getFields():
    field = Field.cast_(field)
</pre>
447 Once JCC heeds Java 1.5 type parameters and once Java Lucene
makes use of them, such casting should become unnecessary.
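<p>The sequence and mapping protocols described above can be combined in ordinary Python code. The helper below is a hypothetical sketch: it relies only on <span class="codefrag">len(hits)</span>, <span class="codefrag">hits[i]</span> and <span class="codefrag">doc[name]</span>, so it works on any objects that support those operations.</p>

```python
def field_values(hits, field_name):
    """List one field's value for every hit, in the order hits are stored.

    Uses only len(hits), hits[i] and doc[field_name], i.e. the sequence
    and mapping protocols; field_values is a hypothetical helper name.
    """
    return [hits[i][field_name] for i in range(len(hits))]
```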
452 <a name="N10158"></a><a name="extensions"></a>
453 <h3 class="boxed">Extending Java Lucene classes from Python</h3>
455 Many areas of the Lucene API expect the programmer to provide
456 their own implementation or specialization of a feature where
457 the default is inappropriate. For example, text analyzers and
458 tokenizers are an area where many parameters and environmental
or cultural factors call for customization.
462 PyLucene enables this by providing Java extension points listed
463 below that serve as proxies for Java to call back into the
464 Python implementations of these customizations.
These extension points are simple Java classes for which JCC
generates the native C++ implementations. It is easy to add
more such extension classes into the 'java' directory of the
470 PyLucene source tree.
473 To learn more about this topic, please refer to the JCC
474 <a href="../jcc/documentation/readme.html">documentation</a>.
477 Please refer to the classes in the 'java' tree for currently
478 available extension points. Examples of uses of these extension
points are to be found in PyLucene's unit tests and <em>Lucene
in Action</em> <a href="http://svn.apache.org/viewcvs.cgi/lucene/pylucene/trunk/samples/LuceneInAction">samples</a>.
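<p>As a rough sketch of what such an extension can look like, the code below subclasses an analyzer extension point from Python. It rests on assumptions not stated on this page: that a <span class="codefrag">PythonAnalyzer</span> extension point is available in your PyLucene release, and that <span class="codefrag">LowerCaseTokenizer</span> can be constructed from a single reader argument, which depends on the Lucene version. The factory function name is hypothetical; check the 'java' tree of your release before relying on it.</p>

```python
def make_lowercase_analyzer_class():
    """Build an Analyzer subclass implemented in Python.

    The imports are done lazily so that this definition can be loaded
    without PyLucene. Assumes the PythonAnalyzer extension point exists
    and that LowerCaseTokenizer takes a single reader argument.
    """
    from lucene import PythonAnalyzer, LowerCaseTokenizer

    class LowerCaseAnalyzer(PythonAnalyzer):
        def tokenStream(self, fieldName, reader):
            # Called from Java through the proxy class generated by JCC.
            return LowerCaseTokenizer(reader)

    return LowerCaseAnalyzer
```

<p>The Java VM must be started with <span class="codefrag">initVM()</span> before the factory is called and before the resulting class is instantiated.</p>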
489 <div class="clearboth"> </div>
495 <div class="lastmodified">
496 <script type="text/javascript"><!--
497 document.write("Last Published: " + document.lastModified);
500 <div class="copyright">
502 2009-2011 <a href="http://www.apache.org/licenses/">The Apache Software Foundation.</a>