3 For more information on past and future Lucene versions, please see:
4 http://s.apache.org/luceneversions
6 ======================= Lucene 3.4.0 =======================
10 * LUCENE-3251: Directory#copy failed to close target output if opening the
11 source stream failed. (Simon Willnauer)
13 * LUCENE-3255: If segments_N file is all zeros (due to file
14 corruption), don't read that to mean the index is empty. (Gregory
15 Tarr, Mark Harwood, Simon Willnauer, Mike McCandless)
17 * LUCENE-3254: Fixed minor bug in how deletes were written to disk,
18 causing the file to sometimes be larger than it needed to be. (Mike
21 * LUCENE-3224: Fixed a bug where CheckIndex would incorrectly report a
22 corrupt index if a term with docfreq >= 16 was indexed more than once
23 at the same position. (Robert Muir)
25 * LUCENE-3334: If Java7 is detected, IOUtils.closeSafely() will log
26 suppressed exceptions in the original exception, so stack trace
27 will contain them. (Uwe Schindler)
29 * LUCENE-3339: Fixed deadlock case when multiple threads use the new
30 block-add (IndexWriter.add/updateDocuments) methods. (Robert Muir,
33 * LUCENE-3340: Fixed case where IndexWriter was not flushing at
34 exactly maxBufferedDeleteTerms (Mike McCandless)
36 * LUCENE-3358, LUCENE-3361: StandardTokenizer and UAX29URLEmailTokenizer
37 wrongly discarded combining marks attached to Han or Hiragana characters.
38 This is fixed if you supply Version >= 3.4. If you supply a previous
39 Lucene version, you get the old buggy behavior for backwards compatibility.
40 (Trejkaz, Robert Muir)
42 * LUCENE-3368: IndexWriter commits segments without applying their buffered
43 deletes when flushing concurrently. (Simon Willnauer, Mike McCandless)
45 * LUCENE-3365: Create or Append mode was determined before obtaining the
46 write lock, which could cause IndexWriter to overwrite an existing index.
47 (Geoff Cooney via Simon Willnauer)
49 * LUCENE-3380: Fixed a bug where FileSwitchDirectory's listAll() would wrongly
50 throw NoSuchDirectoryException when all files written so far have been
51 written to one directory, but the other still has not yet been created on the
52 filesystem. (Robert Muir)
54 * LUCENE-3402: term vectors disappeared from the index if optimize() was called
55 following addIndexes(). (Shai Erera)
57 * LUCENE-3409: IndexWriter.deleteAll was failing to close pooled NRT
58 SegmentReaders, leading to unused files accumulating in the
59 Directory. (tal steier via Mike McCandless)
61 * LUCENE-3390: Added SortField.setMissingValue(v) to enable well defined
62 sorting behavior for documents that do not include the given field.
63 (Gilad Barkai via Doron Cohen)
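The idea behind a missing value in sorting can be illustrated without Lucene itself. The following is a minimal sketch in plain Java, assuming a hypothetical DocSketch type with an optional numeric field; it is not the SortField implementation, only an illustration of how substituting a chosen value for absent fields gives a well-defined order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a document: an id plus an optional numeric field.
final class DocSketch {
    final String id;
    final Integer price; // null means the document has no "price" field

    DocSketch(String id, Integer price) {
        this.id = id;
        this.price = price;
    }
}

final class MissingValueSortSketch {
    // Sorts ascending by price, substituting missingValue for documents
    // that lack the field, and returns the ids in sorted order.
    static List<String> sortedIds(List<DocSketch> docs, final int missingValue) {
        List<DocSketch> copy = new ArrayList<DocSketch>(docs);
        copy.sort(Comparator.comparingInt(
                (DocSketch d) -> d.price == null ? missingValue : d.price));
        List<String> ids = new ArrayList<String>();
        for (DocSketch d : copy) {
            ids.add(d.id);
        }
        return ids;
    }
}
```

Passing Integer.MAX_VALUE as the missing value pushes documents without the field to the end of an ascending sort; Integer.MIN_VALUE pulls them to the front.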
65 * LUCENE-3418: Lucene was failing to fsync index files on commit,
66 meaning an operating system or hardware crash, or power loss, could
67 easily corrupt the index. (Mark Miller, Robert Muir, Mike
72 * LUCENE-3290: Added FieldInvertState.numUniqueTerms
73 (Mike McCandless, Robert Muir)
75 * LUCENE-3280: Add FixedBitSet, which is like OpenBitSet but is not elastic
76 (it does not grow on demand if you set/get/clear too-large indices). (Mike
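To make the "not elastic" distinction concrete, here is a minimal sketch of a fixed-capacity bit set in plain Java. The class name FixedBitsSketch and the explicit bounds check are assumptions made for illustration; the actual FixedBitSet may report out-of-range indices differently.

```java
// Hypothetical fixed-capacity bit set: the backing array is sized once
// and never grows, unlike an elastic bit set.
final class FixedBitsSketch {
    private final long[] words;
    private final int numBits;

    FixedBitsSketch(int numBits) {
        this.numBits = numBits;
        this.words = new long[(numBits + 63) >>> 6]; // 64 bits per word
    }

    // Rejects indices outside [0, numBits) instead of growing the array.
    private void check(int index) {
        if (index < 0 || index >= numBits) {
            throw new IndexOutOfBoundsException("index=" + index + " numBits=" + numBits);
        }
    }

    void set(int index) {
        check(index);
        words[index >> 6] |= 1L << (index & 63);
    }

    void clear(int index) {
        check(index);
        words[index >> 6] &= ~(1L << (index & 63));
    }

    boolean get(int index) {
        check(index);
        return (words[index >> 6] & (1L << (index & 63))) != 0;
    }
}
```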
79 * LUCENE-2048: Added the ability to omit positions but still index
80 term frequencies. You can now control what is indexed into
81 the postings via AbstractField.setIndexOptions:
82 DOCS_ONLY: only documents are indexed: term frequencies and positions are omitted
83 DOCS_AND_FREQS: only documents and term frequencies are indexed: positions are omitted
84 DOCS_AND_FREQS_AND_POSITIONS: full postings: documents, frequencies, and positions
85 AbstractField.setOmitTermFrequenciesAndPositions is deprecated,
86 you should use DOCS_ONLY instead. (Robert Muir)
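The three levels listed above can be summarized as a small enum. This is a sketch in plain Java with a hypothetical name (PostingsDetailSketch), not the Lucene type; it only records which postings data each level keeps, and notes that position-dependent matching (such as phrase matching) needs the most detailed level.

```java
// Hypothetical summary of the three postings levels named in the entry above.
enum PostingsDetailSketch {
    DOCS_ONLY(false, false),
    DOCS_AND_FREQS(true, false),
    DOCS_AND_FREQS_AND_POSITIONS(true, true);

    final boolean storesFreqs;
    final boolean storesPositions;

    PostingsDetailSketch(boolean storesFreqs, boolean storesPositions) {
        this.storesFreqs = storesFreqs;
        this.storesPositions = storesPositions;
    }

    // Matching that depends on term order or proximity needs positions.
    boolean supportsPositionalMatching() {
        return storesPositions;
    }
}
```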
88 * LUCENE-3097: Added a new grouping collector that can be used to retrieve the most relevant
89 documents per group. This can be useful in situations when one wants to compute grouping
90 based facets / statistics on the complete query result. (Martijn van Groningen)
94 * LUCENE-3289: When building an FST you can now tune how aggressively
95 the FST should try to share common suffixes. Typically you can
96 greatly reduce RAM required during building, and CPU consumed, at
97 the cost of a somewhat larger FST. (Mike McCandless)
101 * LUCENE-3327: Fix AIOOBE when TestFSTs is run with
102 -Dtests.verbose=true (James Dyer via Mike McCandless)
106 * LUCENE-3406: Add ant target 'package-local-src-tgz' to Lucene and Solr
107 to package sources from the local working copy.
108 (Seung-Yeoul Yang via Steve Rowe)
111 ======================= Lucene 3.3.0 =======================
113 Changes in backwards compatibility policy
115 * LUCENE-3140: IndexOutput.copyBytes now takes a DataInput (superclass
116 of IndexInput) as its first argument. (Robert Muir, Dawid Weiss,
119 * LUCENE-3191: FieldComparator.value now returns an Object not
120 Comparable; FieldDoc.fields also changed from Comparable[] to
121 Object[] (Uwe Schindler, Mike McCandless)
123 * LUCENE-3208: Made deprecated methods Query.weight(Searcher) and
124 Searcher.createWeight() final to prevent override. If you have
125 overridden one of these methods, cut over to the non-deprecated
126 implementation. (Uwe Schindler, Robert Muir, Yonik Seeley)
128 * LUCENE-3238: Made MultiTermQuery.rewrite() final, to prevent
129 problems (such as not properly setting rewrite methods, or
130 not working correctly with things like SpanMultiTermQueryWrapper).
131 To rewrite to a simpler form, instead return a simpler enum
132 from getEnum(IndexReader). For example, to rewrite to a single term,
133 return a SingleTermEnum. (ludovic Boutros, Uwe Schindler, Robert Muir)
135 Changes in runtime behavior
137 * LUCENE-2834: the hash used to compute the lock file name when the
138 lock file is not stored in the index has changed. This means you
139 will see a different lucene-XXX-write.lock in your lock directory.
140 (Robert Muir, Uwe Schindler, Mike McCandless)
142 * LUCENE-3146: IndexReader.setNorm throws IllegalStateException if the field
143 does not store norms. (Shai Erera, Mike McCandless)
145 * LUCENE-3198: On Linux, if the JRE is 64 bit and supports unmapping,
146 FSDirectory.open now defaults to MMapDirectory instead of
147 NIOFSDirectory since MMapDirectory gives better performance. (Mike
150 * LUCENE-3200: MMapDirectory now uses chunk sizes that are powers of 2.
151 When setting the chunk size, it is rounded down to the next possible
152 value. The new default value for 64 bit platforms is 2^30 (1 GiB),
153 for 32 bit platforms it stays unchanged at 2^28 (256 MiB).
154 Internally, MMapDirectory now only uses one dedicated final IndexInput
155 implementation supporting multiple chunks, which makes Hotspot's life
156 easier. (Uwe Schindler, Robert Muir, Mike McCandless)
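The rounding rule described above is simple arithmetic. The sketch below, in plain Java with hypothetical names, rounds a requested chunk size down to a power of two and shows one reason such sizes are convenient: a file position can be split into a chunk index and an offset with a shift and a mask. That the real implementation addresses chunks this way is an assumption here.

```java
// Hypothetical helpers illustrating power-of-two chunk sizes.
final class ChunkSizeSketch {
    // Rounds a positive requested size down to the nearest power of two.
    static int roundDownToPowerOfTwo(int requested) {
        if (requested <= 0) {
            throw new IllegalArgumentException("chunk size must be positive");
        }
        return Integer.highestOneBit(requested);
    }

    // Exponent p such that 2^p is the rounded-down chunk size.
    static int chunkSizePower(int requested) {
        return 31 - Integer.numberOfLeadingZeros(roundDownToPowerOfTwo(requested));
    }

    // Index of the chunk that contains the given file position.
    static int chunkIndex(long position, int power) {
        return (int) (position >>> power);
    }

    // Offset of the given file position inside its chunk.
    static long offsetInChunk(long position, int power) {
        return position & ((1L << power) - 1);
    }
}
```

For example, a requested size of 1000 bytes becomes 512, while 2^30 is already a power of two and stays unchanged.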
160 * LUCENE-3147,LUCENE-3152: Fixed open file handles leaks in many places in the
161 code. Now MockDirectoryWrapper (in test-framework) tracks all open files,
162 including locks, and fails if the test fails to release all of them.
163 (Mike McCandless, Robert Muir, Shai Erera, Simon Willnauer)
165 * LUCENE-3102: CachingCollector.replay was failing to call setScorer
166 per-segment (Martijn van Groningen via Mike McCandless)
168 * LUCENE-3183: Fix rare corner case where seeking to empty term
169 (field="", term="") with terms index interval 1 could hit
170 ArrayIndexOutOfBoundsException (selckin, Robert Muir, Mike
173 * LUCENE-3208: IndexSearcher had its own private similarity field
174 and corresponding get/setter overriding Searcher's implementation. If you
175 set a different Similarity instance on IndexSearcher, methods implemented
176 in the superclass Searcher were not using it, leading to strange bugs.
177 (Uwe Schindler, Robert Muir)
179 * LUCENE-3197: Fix core merge policies to not over-merge during
180 background optimize when documents are still being deleted
181 concurrently with the optimize (Mike McCandless)
183 * LUCENE-3222: The RAM accounting for buffered delete terms was
184 failing to measure the space required to hold the term's field and
185 text character data. (Mike McCandless)
187 * LUCENE-3238: Fixed bug where using WildcardQuery("prefix*") inside
188 of a SpanMultiTermQueryWrapper rewrote incorrectly and returned
189 an error instead. (ludovic Boutros, Uwe Schindler, Robert Muir)
193 * LUCENE-3208: Renamed protected IndexSearcher.createWeight() to expert
194 public method IndexSearcher.createNormalizedWeight() as this better describes
195 what this method does. The old method is still there for backwards
196 compatibility. Query.weight() was deprecated and simply delegates to
197 IndexSearcher. Both deprecated methods will be removed in Lucene 4.0.
198 (Uwe Schindler, Robert Muir, Yonik Seeley)
200 * LUCENE-3197: MergePolicy.findMergesForOptimize now takes
201 Map<SegmentInfo,Boolean> instead of Set<SegmentInfo> as the second
202 argument, so the merge policy knows which segments were originally
203 present vs produced by an optimizing merge (Mike McCandless)
207 * LUCENE-1736: DateTools.java general improvements.
208 (David Smiley via Steve Rowe)
212 * LUCENE-3140: Added experimental FST implementation to Lucene.
213 (Robert Muir, Dawid Weiss, Mike McCandless)
215 * LUCENE-3193: A new TwoPhaseCommitTool allows running a 2-phase commit
216 algorithm over objects that implement the new TwoPhaseCommit interface (such
217 as IndexWriter). (Shai Erera)
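A two-phase commit over several objects can be sketched as follows. This is plain Java with hypothetical types (ParticipantSketch, TwoPhaseSketch, RecordingParticipantSketch), not the TwoPhaseCommitTool API: every participant is first asked to prepare, and only if all succeed are they committed; any failure triggers a rollback of all participants.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical participant contract for the sketch.
interface ParticipantSketch {
    void prepareCommit() throws Exception;
    void commit() throws Exception;
    void rollback();
}

final class TwoPhaseSketch {
    // Returns true if every participant committed, false if all were rolled back.
    static boolean execute(List<? extends ParticipantSketch> participants) {
        try {
            // Phase 1: every participant must prepare successfully.
            for (ParticipantSketch p : participants) {
                p.prepareCommit();
            }
            // Phase 2: only reached when all prepares succeeded.
            for (ParticipantSketch p : participants) {
                p.commit();
            }
            return true;
        } catch (Exception e) {
            // Best-effort rollback of everything on any failure.
            for (ParticipantSketch p : participants) {
                p.rollback();
            }
            return false;
        }
    }
}

// Simple participant that records the calls it receives.
final class RecordingParticipantSketch implements ParticipantSketch {
    final List<String> log = new ArrayList<String>();
    private final boolean failOnPrepare;

    RecordingParticipantSketch(boolean failOnPrepare) {
        this.failOnPrepare = failOnPrepare;
    }

    public void prepareCommit() throws Exception {
        log.add("prepare");
        if (failOnPrepare) {
            throw new Exception("prepare failed");
        }
    }

    public void commit() {
        log.add("commit");
    }

    public void rollback() {
        log.add("rollback");
    }
}
```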
219 * LUCENE-3191: Added TopDocs.merge, to facilitate merging results from
220 different shards (Uwe Schindler, Mike McCandless)
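Merging results from shards amounts to a k-way merge of lists that are each already sorted. The sketch below is plain Java with hypothetical names and does not use the TopDocs API; it assumes each shard's scores are sorted in descending order and breaks ties in favor of the lower shard index, which is a choice made for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class ShardMergeSketch {
    // shardScores[s] holds the scores of shard s, sorted descending.
    // Returns the global top n hits as {shard, indexInShard} pairs.
    static List<int[]> mergeTopN(final float[][] shardScores, int n) {
        // Orders cursors by score (best first), then by shard index.
        Comparator<int[]> bestFirst = (a, b) -> {
            int cmp = Float.compare(shardScores[b[0]][b[1]], shardScores[a[0]][a[1]]);
            return cmp != 0 ? cmp : Integer.compare(a[0], b[0]);
        };
        PriorityQueue<int[]> queue =
                new PriorityQueue<int[]>(Math.max(1, shardScores.length), bestFirst);
        // Seed the queue with the best hit of every non-empty shard.
        for (int s = 0; s < shardScores.length; s++) {
            if (shardScores[s].length > 0) {
                queue.add(new int[] {s, 0});
            }
        }
        List<int[]> merged = new ArrayList<int[]>();
        while (merged.size() < n && !queue.isEmpty()) {
            int[] top = queue.poll();
            merged.add(top);
            // Advance the cursor of the shard that just produced a hit.
            int next = top[1] + 1;
            if (next < shardScores[top[0]].length) {
                queue.add(new int[] {top[0], next});
            }
        }
        return merged;
    }
}
```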
222 * LUCENE-3179: Added OpenBitSet.prevSetBit (Paul Elschot via Mike McCandless)
224 * LUCENE-3210: Made TieredMergePolicy more aggressive in reclaiming
225 segments with deletions; added new methods
226 set/getReclaimDeletesWeight to control this. (Mike McCandless)
230 * LUCENE-1344: Create OSGi bundle using dev-tools/maven.
231 (Nicolas Lalevée, Luca Stancapiano via ryan)
233 * LUCENE-3204: The maven-ant-tasks jar is now included in the source tree;
234 users of the generate-maven-artifacts target no longer have to manually
235 place this jar in the Ant classpath. NOTE: when Ant looks for the
236 maven-ant-tasks jar, it looks first in its pre-existing classpath, so
237 any copies it finds will be used instead of the copy included in the
238 Lucene/Solr source tree. For this reason, it is recommended to remove
239 any copies of the maven-ant-tasks jar in the Ant classpath, e.g. under
240 ~/.ant/lib/ or under the Ant installation's lib/ directory. (Steve Rowe)
243 ======================= Lucene 3.2.0 =======================
245 Changes in backwards compatibility policy
247 * LUCENE-2953: PriorityQueue's internal heap was made private, as subclassing
248 with generics can lead to ClassCastException. For advanced use (e.g. in Solr)
249 a method getHeapArray() was added to retrieve the internal heap array as a
250 non-generic Object[]. (Uwe Schindler, Yonik Seeley)
252 * LUCENE-1076: IndexWriter.setInfoStream now throws IOException
253 (Mike McCandless, Shai Erera)
255 * LUCENE-3084: MergePolicy.OneMerge.segments was changed from
256 SegmentInfos to a List<SegmentInfo>. SegmentInfos itself was changed
257 to no longer extend Vector<SegmentInfo> (to update code that is using
258 Vector-API, use the new asList() and asSet() methods returning unmodifiable
259 collections; modifying SegmentInfos is now only possible through
260 the explicitly declared methods). IndexWriter.segString() now takes
261 Iterable<SegmentInfo> instead of List<SegmentInfo>. A simple recompile
262 should fix this. MergePolicy and SegmentInfos are internal/experimental
263 APIs not covered by the strict backwards compatibility policy.
264 (Uwe Schindler, Mike McCandless)
266 Changes in runtime behavior
268 * LUCENE-3065: When a NumericField is retrieved from a Document loaded
269 from IndexReader (or IndexSearcher), it will now come back as
270 NumericField not as a Field with a string-ified version of the
271 numeric value you had indexed. Note that this only applies for
272 newly-indexed Documents; older indices will still return Field
273 with the string-ified numeric value. If you call Document.get(),
274 the value still comes back as a String, but Document.getFieldable()
275 returns NumericField instances. (Uwe Schindler, Ryan McKinley,
278 * LUCENE-1076: Changed the default merge policy from
279 LogByteSizeMergePolicy to TieredMergePolicy, as of Version.LUCENE_32
280 (passed to IndexWriterConfig), which is able to merge non-contiguous
281 segments. This means docIDs no longer necessarily stay "in order"
282 during indexing. If this is a problem then you can use either of
283 the LogMergePolicy impls. (Mike McCandless)
287 * LUCENE-3082: Added index upgrade tool oal.index.IndexUpgrader
288 that can upgrade all segments to the most recent supported index
289 format without fully optimizing. (Uwe Schindler, Mike McCandless)
291 * LUCENE-1076: Added TieredMergePolicy which is able to merge non-contiguous
292 segments, which means docIDs no longer necessarily stay "in order".
293 (Mike McCandless, Shai Erera)
295 * LUCENE-3071: Added ReversePathHierarchyTokenizer; added skip parameter to
296 PathHierarchyTokenizer (Olivier Favre via ryan)
298 * LUCENE-1421, LUCENE-3102: Added CachingCollector, which allows you to cache
299 document IDs and scores encountered during the search, and "replay" them to
300 another Collector. (Mike McCandless, Shai Erera)
302 * LUCENE-3112: Added experimental IndexWriter.add/updateDocuments,
303 enabling a block of documents to be indexed, atomically, with
304 guaranteed sequential docIDs. (Mike McCandless)
308 * LUCENE-3061: IndexWriter's getNextMerge() and merge(OneMerge) are now public
309 (though @lucene.experimental), allowing for custom MergeScheduler
310 implementations. (Shai Erera)
312 * LUCENE-3065: Document.getField() was deprecated, as it throws
313 ClassCastException when loading lazy fields or NumericFields.
314 (Uwe Schindler, Ryan McKinley, Mike McCandless)
316 * LUCENE-2027: Directory.touchFile is deprecated and will be removed
317 in 4.0. (Mike McCandless)
321 * LUCENE-2990: ArrayUtil/CollectionUtil.*Sort() methods now exit early
322 on empty or one-element lists/arrays. (Uwe Schindler)
324 * LUCENE-2897: Apply deleted terms while flushing a segment. We still
325 buffer deleted terms to later apply to past segments. (Mike McCandless)
327 * LUCENE-3126: IndexWriter.addIndexes copies incoming segments into CFS if they
328 aren't already and MergePolicy allows that. (Shai Erera)
332 * LUCENE-2996: addIndexes(IndexReader) did not flush before adding the new
333 indexes, causing existing deletions to be applied on the incoming indexes as
334 well. (Shai Erera, Mike McCandless)
336 * LUCENE-3024: Index with more than 2.1B terms was hitting AIOOBE when
337 seeking TermEnum (eg used by Solr's faceting) (Tom Burton-West, Mike
340 * LUCENE-3042: When a filter or consumer added Attributes to a TokenStream
341 chain after it was already (partly) consumed [or clearAttributes(),
342 captureState(), cloneAttributes(),... was called by the Tokenizer],
343 the Tokenizer calling clearAttributes() or capturing state after addition
344 may not do this on the newly added Attribute. This bug affected only
345 very special use cases of the TokenStream API; most users would not
346 have noticed it. (Uwe Schindler, Robert Muir)
348 * LUCENE-3054: PhraseQuery can in some cases stack overflow in
349 SorterTemplate.quickSort(). This fix also adds an optimization to
350 PhraseQuery, as a term with lower doc freq will also have fewer positions.
351 (Uwe Schindler, Robert Muir, Otis Gospodnetic)
353 * LUCENE-3068: sloppy phrase query failed to match valid documents when multiple
354 query terms had same position in the query. (Doron Cohen)
356 * LUCENE-3012: Lucene writes the header now for separate norm files (*.sNNN)
361 * LUCENE-3006: Building javadocs will fail on warnings by default.
362 Override with -Dfailonjavadocwarning=false (sarowe, gsingers)
364 * LUCENE-3128: "ant eclipse" creates a .project file for easier Eclipse
365 integration (unless one already exists). (Daniel Serodio via Shai Erera)
369 * LUCENE-3002: Added 'tests.iter.min' to control 'tests.iter' by allowing iteration to
370 stop once at least 'tests.iter.min' iterations ran and a failure occurred.
371 (Shai Erera, Chris Hostetter)
373 ======================= Lucene 3.1.0 =======================
375 Changes in backwards compatibility policy
377 * LUCENE-2719: Changed API of internal utility class
378 org.apache.lucene.util.SorterTemplate to support faster quickSort using
379 pivot values and also merge sort and insertion sort. If you have used
380 this class, you have to implement two more methods for handling pivots.
381 (Uwe Schindler, Robert Muir, Mike McCandless)
383 * LUCENE-1923: Renamed SegmentInfo & SegmentInfos segString method to
384 toString. These are advanced APIs and subject to change suddenly.
385 (Tim Smith via Mike McCandless)
387 * LUCENE-2190: Removed deprecated customScore() and customExplain()
388 methods from experimental CustomScoreQuery. (Uwe Schindler)
390 * LUCENE-2286: Enabled DefaultSimilarity.setDiscountOverlaps by default.
391 This means that terms with a position increment gap of zero do not
392 affect the norms calculation by default. (Robert Muir)
394 * LUCENE-2320: MergePolicy.writer is now of type SetOnce, which allows setting
395 the IndexWriter for a MergePolicy exactly once. You can change references to
396 'writer' from <code>writer.doXYZ()</code> to <code>writer.get().doXYZ()</code>
397 (it is also advisable to add an <code>assert writer != null;</code> before you
398 access the wrapped IndexWriter.)
400 In addition, MergePolicy only exposes a default constructor, and the one that
401 took IndexWriter as argument has been removed from all MergePolicy extensions.
402 (Shai Erera via Mike McCandless)
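The set-exactly-once behavior described above can be sketched in a few lines of plain Java. The class name SetOnceSketch and the use of IllegalStateException are assumptions made for this example; the actual SetOnce class may signal a second assignment differently.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical holder whose value can be assigned exactly once.
final class SetOnceSketch<T> {
    private volatile T value;
    private final AtomicBoolean alreadySet = new AtomicBoolean(false);

    // Stores the value on the first call; any later call fails.
    void set(T newValue) {
        if (!alreadySet.compareAndSet(false, true)) {
            throw new IllegalStateException("value was already set");
        }
        value = newValue;
    }

    // Returns the stored value, or null if set() has not been called yet.
    T get() {
        return value;
    }
}
```

Because get() can return null before set() has been called, a null check before using the wrapped object, as the entry advises, is a reasonable precaution.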
404 * LUCENE-2328: SimpleFSDirectory.SimpleFSIndexInput is moved to
405 FSDirectory.FSIndexInput. Anyone extending this class will have to
406 fix their code on upgrading. (Earwin Burrfoot via Mike McCandless)
408 * LUCENE-2302: The new interface for term attributes, CharTermAttribute,
409 now implements CharSequence. This requires the toString() methods of
410 CharTermAttribute, deprecated TermAttribute, and Token to return only
411 the term text and no other attribute contents. LUCENE-2374 implements
412 an attribute reflection API to no longer rely on toString() for attribute
413 inspection. (Uwe Schindler, Robert Muir)
415 * LUCENE-2372, LUCENE-2389: StandardAnalyzer, KeywordAnalyzer,
416 PerFieldAnalyzerWrapper, WhitespaceTokenizer are now final. Also removed
417 the now obsolete and deprecated Analyzer.setOverridesTokenStreamMethod().
418 Analyzer and TokenStream base classes now have an assertion in their ctor,
419 that checks subclasses to be final or at least have final implementations
420 of incrementToken(), tokenStream(), and reusableTokenStream().
421 (Uwe Schindler, Robert Muir)
423 * LUCENE-2316: Directory.fileLength contract was clarified - it returns the
424 actual file's length if the file exists, and throws FileNotFoundException
425 otherwise. Returning length=0 for a non-existent file is no longer allowed. If
426 you relied on that, make sure to catch the exception. (Shai Erera)
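The clarified contract can be illustrated with ordinary files. The sketch below is plain Java with hypothetical names and is not a Directory implementation. It relies on the fact that java.io.File.length() returns 0 for a missing file, so the existence check has to be explicit, and it shows how a caller that relied on length 0 might catch the exception instead.

```java
import java.io.File;
import java.io.FileNotFoundException;

final class FileLengthSketch {
    // Returns the file's length, or throws if the file does not exist.
    static long fileLength(File file) throws FileNotFoundException {
        // File.length() alone would silently return 0 for a missing file.
        if (!file.exists()) {
            throw new FileNotFoundException(file.getPath());
        }
        return file.length();
    }

    // Caller that still wants 0 for a missing file: catch the exception.
    static long lengthOrZero(File file) {
        try {
            return fileLength(file);
        } catch (FileNotFoundException e) {
            return 0L;
        }
    }
}
```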
428 * LUCENE-2386: IndexWriter no longer performs an empty commit upon new index
429 creation. Previously, if you passed an empty Directory and set OpenMode to
430 CREATE*, IndexWriter would make a first empty commit. If you need that
431 behavior you can call writer.commit()/close() immediately after you create it.
432 (Shai Erera, Mike McCandless)
434 * LUCENE-2733: Removed public constructors of utility classes with only static
435 methods to prevent instantiation. (Uwe Schindler)
437 * LUCENE-2602: The default (LogByteSizeMergePolicy) merge policy now
438 takes deletions into account by default. You can disable this by
439 calling setCalibrateSizeByDeletes(false) on the merge policy. (Mike
442 * LUCENE-2529, LUCENE-2668: Position increment gap and offset gap of empty
443 values in multi-valued fields have been changed for some cases in the index.
444 If you index empty fields and use position/offset information on those
445 fields, reindexing is recommended. (David Smiley, Koji Sekiguchi)
447 * LUCENE-2804: Directory.setLockFactory now declares throwing an IOException.
448 (Shai Erera, Robert Muir)
450 * LUCENE-2837: Added deprecations noting that in 4.0, Searcher and
451 Searchable are collapsed into IndexSearcher; contrib/remote and
452 MultiSearcher have been removed. (Mike McCandless)
454 * LUCENE-2854: Deprecated SimilarityDelegator and
455 Similarity.lengthNorm; the latter is now final, forcing any custom
456 Similarity impls to cut over to the more general computeNorm (Robert
457 Muir, Mike McCandless)
459 * LUCENE-2869: Deprecated Query.getSimilarity: instead of using
460 "runtime" subclassing/delegation, subclass the Weight instead.
463 * LUCENE-2674: A new idfExplain method was added to Similarity, that
464 accepts an incoming docFreq. If you subclass Similarity, make sure
465 you also override this method on upgrade. (Robert Muir, Mike
468 Changes in runtime behavior
470 * LUCENE-1923: Made IndexReader.toString() produce something
471 meaningful (Tim Smith via Mike McCandless)
473 * LUCENE-2179: CharArraySet.clear() is now functional.
474 (Robert Muir, Uwe Schindler)
476 * LUCENE-2455: IndexWriter.addIndexes no longer optimizes the target index
477 before it adds the new ones. Also, the existing segments are not merged and so
478 the index will not end up with a single segment (unless it was empty before).
479 In addition, addIndexesNoOptimize was renamed to addIndexes and no longer
480 invokes a merge on the incoming and target segments, but instead copies the
481 segments to the target index. You can call maybeMerge or optimize after this
482 method completes, if you need to.
484 In addition, Directory.copyTo* were removed in favor of copy which takes the
485 target Directory, source and target files as arguments, and copies the source
486 file to the target Directory under the target file name. (Shai Erera)
488 * LUCENE-2663: IndexWriter no longer forcefully clears any existing
489 locks when create=true. This was a holdover from when
490 SimpleFSLockFactory was the default locking implementation, and,
491 even then it was dangerous since it could mask bugs in IndexWriter's
492 usage, allowing applications to accidentally open two writers on the
493 same directory. (Mike McCandless)
495 * LUCENE-2701: maxMergeMBForOptimize and maxMergeDocs constraints set on
496 LogMergePolicy now affect optimize() as well (as opposed to only regular
497 merges). This means that you can run optimize() and too large segments won't
498 be merged. (Shai Erera)
500 * LUCENE-2753: IndexReader and DirectoryReader .listCommits() now return a List,
501 guaranteeing the commits are sorted from oldest to latest. (Shai Erera)
503 * LUCENE-2785: TopScoreDocCollector, TopFieldCollector and
504 the IndexSearcher search methods that take an int nDocs will now
505 throw IllegalArgumentException if nDocs is 0. Instead, you should
506 use the newly added TotalHitCountCollector. (Mike McCandless)
508 * LUCENE-2790: LogMergePolicy.useCompoundFile's logic now factors in noCFSRatio
509 to determine whether the passed in segment should be compound.
510 (Shai Erera, Earwin Burrfoot)
512 * LUCENE-2805: IndexWriter now increments the index version on every change to
513 the index instead of for every commit. Committing or closing the IndexWriter
514 without any changes to the index will not cause any index version increment.
515 (Simon Willnauer, Mike McCandless)
517 * LUCENE-2650, LUCENE-2825: The behavior of FSDirectory.open has changed. On 64-bit
518 Windows and Solaris systems that support unmapping, FSDirectory.open returns
519 MMapDirectory. Additionally the behavior of MMapDirectory has been
520 changed to enable unmapping by default if supported by the JRE.
521 (Mike McCandless, Uwe Schindler, Robert Muir)
523 * LUCENE-2829: Improve the performance of "primary key" lookup use
524 case (running a TermQuery that matches one document) on a
525 multi-segment index. (Robert Muir, Mike McCandless)
527 * LUCENE-2010: Segments with 100% deleted documents are now removed on
528 IndexReader or IndexWriter commit. (Uwe Schindler, Mike McCandless)
530 * LUCENE-2960: Allow some changes to IndexWriterConfig to take effect
531 "live" (after an IW is instantiated), via
532 IndexWriter.getConfig().setXXX(...) (Shay Banon, Mike McCandless)
536 * LUCENE-2076: Rename FSDirectory.getFile -> getDirectory. (George
537 Aroush via Mike McCandless)
539 * LUCENE-1260: Change norm encode (float->byte) and decode
540 (byte->float) to be instance methods not static methods. This way a
541 custom Similarity can alter how norms are encoded, though they must
542 still be encoded as a single byte (Johan Kindgren via Mike
545 * LUCENE-2103: NoLockFactory should have a private constructor;
546 until Lucene 4.0 the default one will be deprecated.
547 (Shai Erera via Uwe Schindler)
549 * LUCENE-2177: Deprecate the Field ctors that take byte[] and Store.
550 Since the removal of compressed fields, Store can only be YES, so
551 it's not necessary to specify. (Erik Hatcher via Mike McCandless)
553 * LUCENE-2200: Several final classes had non-overriding protected
554 members. These were converted to private and unused protected
555 constructors removed. (Steven Rowe via Robert Muir)
557 * LUCENE-2240: SimpleAnalyzer and WhitespaceAnalyzer now have
558 Version ctors. (Simon Willnauer via Uwe Schindler)
560 * LUCENE-2259: Add IndexWriter.deleteUnusedFiles, to attempt removing
561 unused files. This is only useful on Windows, which prevents
562 deletion of open files. IndexWriter will eventually remove these
563 files itself; this method just lets you do so when you know the
564 files are no longer open by IndexReaders. (luocanrao via Mike
567 * LUCENE-2282: IndexFileNames is exposed as a public class allowing for easier
568 use by external code. In addition it offers a matchExtension method which
569 callers can use to query whether a certain file matches a certain extension.
570 (Shai Erera via Mike McCandless)
572 * LUCENE-124: Add a TopTermsBoostOnlyBooleanQueryRewrite to MultiTermQuery.
573 This rewrite method is similar to TopTermsScoringBooleanQueryRewrite, but
574 only scores terms by their boost values. For example, this can be used
575 with FuzzyQuery to ensure that exact matches are always scored higher,
576 because only the boost will be used in scoring. (Robert Muir)
578 * LUCENE-2015: Add a static method foldToASCII to ASCIIFoldingFilter to
579 expose its folding logic. (Cédrik Lime via Robert Muir)
581 * LUCENE-2294: IndexWriter constructors have been deprecated in favor of a
582 single ctor which accepts IndexWriterConfig and a Directory. You can set all
583 the parameters related to IndexWriter on IndexWriterConfig. The different
584 setter/getter methods were deprecated as well. One should call
585 writer.getConfig().getXYZ() to query for a parameter XYZ.
586 Additionally, the setter/getter related to MergePolicy were deprecated as
587 well. One should interact with the MergePolicy directly.
588 (Shai Erera via Mike McCandless)
590 * LUCENE-2320: IndexWriter's MergePolicy configuration was moved to
591 IndexWriterConfig and the respective methods on IndexWriter were deprecated.
592 (Shai Erera via Mike McCandless)
594 * LUCENE-2328: Directory now keeps track itself of the files that are written
595 but not yet fsynced. The old Directory.sync(String file) method is deprecated
596 and replaced with Directory.sync(Collection<String> files). Take a look at
597 FSDirectory to see a sample of what such tracking might look like, if needed
598 in your custom Directories. (Earwin Burrfoot via Mike McCandless)
600 * LUCENE-2302: Deprecated TermAttribute and replaced by a new
601 CharTermAttribute. The change is backwards compatible, so
602 mixed new/old TokenStreams all work on the same char[] buffer
603 independent of which interface they use. CharTermAttribute
604 has shorter method names and implements CharSequence and
605 Appendable. This allows usage like Java's StringBuilder in
606 addition to direct char[] access. Also terms can directly be
607 used in places where CharSequence is allowed (e.g. regular
609 (Uwe Schindler, Robert Muir)
611 * LUCENE-2402: IndexWriter.deleteUnusedFiles now deletes unreferenced commit
612 points too. If you use an IndexDeletionPolicy which holds onto index commits
613 (such as SnapshotDeletionPolicy), you can call this method to remove those
614 commit points when they are not needed anymore (instead of waiting for the
615 next commit). (Shai Erera)
617 * LUCENE-2481: SnapshotDeletionPolicy.snapshot() and release() were replaced
618 with equivalent ones that take a String (id) as argument. You can pass
619 whatever ID you want, as long as you use the same one when calling both.
622 * LUCENE-2356: Add IndexWriterConfig.set/getReaderTermIndexDivisor, to
623 set what IndexWriter passes for termsIndexDivisor to the readers it
624 opens internally when applying deletions or creating a near-real-time
625 reader. (Earwin Burrfoot via Mike McCandless)
627 * LUCENE-2167,LUCENE-2699,LUCENE-2763,LUCENE-2847: StandardTokenizer/Analyzer
628 in common/standard/ now implement the Word Break rules from the Unicode 6.0.0
629 Text Segmentation algorithm (UAX#29), covering the full range of Unicode code
630 points, including values from U+FFFF to U+10FFFF
632 ClassicTokenizer/Analyzer retains the old (pre-Lucene 3.1) StandardTokenizer/
633 Analyzer implementation and behavior. Only the Unicode Basic Multilingual
634 Plane (code points from U+0000 to U+FFFF) is covered.
636 UAX29URLEmailTokenizer tokenizes URLs and E-mail addresses according to the
637 relevant RFCs, in addition to implementing the UAX#29 Word Break rules.
638 (Steven Rowe, Robert Muir, Uwe Schindler)
640 * LUCENE-2778: RAMDirectory now exposes newRAMFile(), which subclasses can
641 override to return a different RAMFile implementation. (Shai Erera)
643 * LUCENE-2785: Added TotalHitCountCollector whose sole purpose is to
644 count the number of hits matching the query. (Mike McCandless)
646 * LUCENE-2846: Deprecated IndexReader.setNorm(int, String, float). This method
647 is only syntactic sugar for setNorm(int, String, byte), but using the global
648 Similarity.getDefault().encodeNormValue(). Use the byte-based method instead
649 to ensure that the norm is encoded with your Similarity.
650 (Robert Muir, Mike McCandless)
652 * LUCENE-2374: Added Attribute reflection API: It's now possible to inspect the
653 contents of AttributeImpl and AttributeSource using a well-defined API.
654 This is e.g. used by Solr's AnalysisRequestHandlers to display all attributes
656 There are also some backwards incompatible changes in toString() output,
657 as LUCENE-2302 introduced the CharSequence interface to CharTermAttribute
658 leading to changed toString() return values. The new API allows you to get a
659 string representation in a well-defined way using a new method
660 reflectAsString(). For backwards compatibility reasons, when toString()
661 was implemented by implementation subclasses, the default implementation of
662 AttributeImpl.reflectWith() uses toString()'s output instead to report the
663 Attribute's properties. Otherwise, reflectWith() uses Java's reflection
664 (like toString() did before) to get the attribute properties.
665 In addition, the mandatory equals() and hashCode() are no longer required
666 for AttributeImpls, but can still be provided (if needed).
669 * LUCENE-2691: Deprecate IndexWriter.getReader in favor of
670 IndexReader.open(IndexWriter) (Grant Ingersoll, Mike McCandless)
672 * LUCENE-2876: Deprecated Scorer.getSimilarity(). If your Scorer uses a Similarity,
673 it should keep it itself. Fixed Scorers to pass their parent Weight, so that
674 Scorer.visitSubScorers (LUCENE-2590) will work correctly.
675 (Robert Muir, Doron Cohen)
677 * LUCENE-2900: When opening a near-real-time (NRT) reader
678 (IndexReader.re/open(IndexWriter)) you can now specify whether
679 deletes should be applied. Applying deletes can be costly, and some
680 expert use cases can handle seeing deleted documents returned. The
681 deletes remain buffered so that the next time you open an NRT reader
682 and pass true, all deletes will be applied. (Mike McCandless)
684 * LUCENE-1253: LengthFilter (and Solr's KeepWordTokenFilter) now
685 require up front specification of enablePositionIncrement. Together with
686 StopFilter they have a common base class (FilteringTokenFilter) that handles
687 the position increments automatically. Implementors only need to override an
688 accept() method that filters tokens. (Uwe Schindler, Robert Muir)
692 * LUCENE-2249: ParallelMultiSearcher should shut down thread pool on
693 close. (Martin Traverso via Uwe Schindler)
695 * LUCENE-2273: FieldCacheImpl.getCacheEntries() used WeakHashMap
696 incorrectly and led to ConcurrentModificationException.
697 (Uwe Schindler, Robert Muir)
699 * LUCENE-2328: Index files fsync tracking moved from
700 IndexWriter/IndexReader to Directory, and it no longer leaks memory.
701 (Earwin Burrfoot via Mike McCandless)
703 * LUCENE-2074: Reduce buffer size of lexer back to default on reset.
704 (Ruben Laguna, Shai Erera via Uwe Schindler)
706 * LUCENE-2496: Don't throw NPE if IndexWriter is opened with CREATE on
707 a prior (corrupt) index missing its segments_N file. (Mike
710 * LUCENE-2458: QueryParser no longer automatically forms phrase queries,
711 assuming whitespace tokenization. Previously all CJK queries, for example,
712 would be turned into phrase queries. The old behavior is preserved with
713 the matchVersion parameter for previous versions. Additionally, you can
714 explicitly enable the old behavior with setAutoGeneratePhraseQueries(true)
717 * LUCENE-2537: FSDirectory.copy() implementation was unsafe and could result in
718 OOM if a large file was copied. (Shai Erera)
720 * LUCENE-2580: MultiPhraseQuery throws AIOOBE if number of positions
721 exceeds number of terms at one position (Jayendra Patil via Mike McCandless)
723 * LUCENE-2617: Optional clauses of a BooleanQuery were not factored
724 into coord if the scorer for that segment returned null. This
725 can cause the same document to score differently depending on
726 what segment it resides in. (yonik)
728 * LUCENE-2272: Fix explain in PayloadNearQuery and also fix scoring issue (Peter Keegan via Grant Ingersoll)
730 * LUCENE-2732: Fix charset problems in XML loading in
731 HyphenationCompoundWordTokenFilter. (Uwe Schindler)
733 * LUCENE-2802: NRT DirectoryReader returned incorrect values from
734 getVersion, isOptimized, getCommitUserData, getIndexCommit and isCurrent due
735 to a mutable reference to the IndexWriter's SegmentInfos.
736 (Simon Willnauer, Earwin Burrfoot)
738 * LUCENE-2852: Fixed corner case in RAMInputStream that would hit a
739 false EOF after seeking to EOF then seeking back to same block you
740 were just in and then calling readBytes (Robert Muir, Mike McCandless)
742 * LUCENE-2860: Fixed SegmentInfo.sizeInBytes to factor includeDocStores when it
743 decides whether to return the cached computed size or not. (Shai Erera)
745 * LUCENE-2584: SegmentInfo.files() could hit ConcurrentModificationException if
746 called by multiple threads. (Alexander Kanarsky via Shai Erera)
748 * LUCENE-2809: Fixed IndexWriter.numDocs to take into account
749 applied but not yet flushed deletes. (Mike McCandless)
751 * LUCENE-2879: MultiPhraseQuery previously calculated its phrase IDF by summing
752 internally, it now calls Similarity.idfExplain(Collection, IndexSearcher).
755 * LUCENE-2693: RAM used by IndexWriter was slightly incorrectly computed.
756 (Jason Rutherglen via Shai Erera)
758 * LUCENE-1846: DateTools now uses the US locale everywhere, so DateTools.round()
759 is safe also in strange locales. (Uwe Schindler)
761 * LUCENE-2891: IndexWriterConfig did not accept -1 in setReaderTermIndexDivisor,
762 which can be used to prevent loading the terms index into memory. (Shai Erera)
764 * LUCENE-2937: Encoding a float into a byte (e.g. encoding field norms during
765 indexing) had an underflow detection bug that caused floatToByte(f)==0 where
766 f was greater than 0, but slightly less than byteToFloat(1). This meant that
767 certain very small field norms (index_boost * length_norm) could have
768 been rounded down to 0 instead of being rounded up to the smallest
769 positive number. (yonik)
771 * LUCENE-2936: PhraseQuery score explanations were not correctly
772 identifying matches vs non-matches. (hossman)
774 * LUCENE-2975: A hotspot bug corrupts IndexInput#readVInt()/readVLong() if
775 the underlying readByte() is inlined (which happens e.g. in MMapDirectory).
776 The loop was unrolled, which makes the hotspot bug disappear.
777 (Uwe Schindler, Robert Muir, Mike McCandless)
781 * LUCENE-2128: Parallelized fetching document frequencies during weight
782 creation. (Israel Tsadok, Simon Willnauer via Uwe Schindler)
784 * LUCENE-2069: Added Unicode 4 support to CharArraySet. Due to the switch
785 to Java 5, supplementary characters are now lowercased correctly if the
786 set is created as case insensitive.
787 CharArraySet now requires a Version argument to preserve
788 backwards compatibility. If Version < 3.1 is passed to the constructor,
789 CharArraySet yields the old behavior. (Simon Willnauer)
791 * LUCENE-2069: Added Unicode 4 support to LowerCaseFilter. Due to the switch
792 to Java 5, supplementary characters are now lowercased correctly.
793 LowerCaseFilter now requires a Version argument to preserve
794 backwards compatibility. If Version < 3.1 is passed to the constructor,
795 LowerCaseFilter yields the old behavior. (Simon Willnauer, Robert Muir)
797 * LUCENE-2034: Added ReusableAnalyzerBase, an abstract subclass of Analyzer
798 that makes it easier to reuse TokenStreams correctly. This issue also added
799 StopwordAnalyzerBase, which improves consistency of all Analyzers that use
800 stopwords, and implemented many analyzers in contrib with it.
801 (Simon Willnauer via Robert Muir)
803 * LUCENE-2198, LUCENE-2901: Support protected words in stemming TokenFilters using a
804 new KeywordAttribute. (Simon Willnauer, Drew Farris via Uwe Schindler)
806 * LUCENE-2183, LUCENE-2240, LUCENE-2241: Added Unicode 4 support
807 to CharTokenizer and its subclasses. CharTokenizer now has new
808 int-API which is conditionally preferred to the old char-API depending
809 on the provided Version. Version < 3.1 will use the char-API.
810 (Simon Willnauer via Uwe Schindler)
812 * LUCENE-2247: Added a CharArrayMap<V> for performance improvements
813 in some stemmers and synonym filters. (Uwe Schindler)
815 * LUCENE-2320: Added SetOnce which wraps an object and allows it to be set
816 exactly once. (Shai Erera via Mike McCandless)
818 * LUCENE-2314: Added AttributeSource.copyTo(AttributeSource) that
819 allows using cloneAttributes() and this method as a replacement
820 for captureState()/restoreState(), if the state itself
821 needs to be inspected/modified. (Uwe Schindler)
823 * LUCENE-2293: Expose control over max number of threads that
824 IndexWriter will allow to run concurrently while indexing
825 documents (previously this was hardwired to 5), using
826 IndexWriterConfig.setMaxThreadStates. (Mike McCandless)
828 * LUCENE-2297: Enable turning on reader pooling inside IndexWriter
829 even when getReader (near-real-time reader) is not in use, through
830 IndexWriterConfig.enable/disableReaderPooling. (Mike McCandless)
832 * LUCENE-2331: Add NoMergePolicy which never returns any merges to execute. In
833 addition, add NoMergeScheduler which never executes any merges. These two are
834 convenient classes in case you want to disable segment merges by IndexWriter
835 without tweaking a particular MergePolicy's parameters, such as mergeFactor.
836 MergeScheduler's methods are now public. (Shai Erera via Mike McCandless)
838 * LUCENE-2339: Deprecate static method Directory.copy in favor of
839 Directory.copyTo, and use nio's FileChannel.transferTo when copying
840 files between FSDirectory instances. (Earwin Burrfoot via Mike
843 * LUCENE-2074: Make StandardTokenizer fit for Unicode 4.0, if the
844 matchVersion parameter is Version.LUCENE_31. (Uwe Schindler)
846 * LUCENE-2385: Moved NoDeletionPolicy from benchmark to core. NoDeletionPolicy
847 can be used to prevent commits from ever getting deleted from the index.
850 * LUCENE-1585: IndexWriter now accepts a PayloadProcessorProvider which can
851 return a DirPayloadProcessor for a given Directory, which returns a
852 PayloadProcessor for a given Term. The PayloadProcessor will be used to
853 process the payloads of the segments as they are merged (e.g. if one wants to
854 rewrite payloads of external indexes as they are added, or of local ones).
855 (Shai Erera, Michael Busch, Mike McCandless)
857 * LUCENE-2440: Add support for custom ExecutorService in
858 ParallelMultiSearcher (Edward Drapkin via Mike McCandless)
860 * LUCENE-2295: Added a LimitTokenCountAnalyzer / LimitTokenCountFilter
861 to wrap any other Analyzer and provide the same functionality as
862 MaxFieldLength provided on IndexWriter. This patch also fixes a bug
863 in the offset calculation in CharTokenizer. (Uwe Schindler, Shai Erera)
865 * LUCENE-2526: Don't throw NPE from MultiPhraseQuery.toString when
866 it's empty. (Ross Woolf via Mike McCandless)
868 * LUCENE-2559: Added SegmentReader.reopen methods (John Wang via Mike
871 * LUCENE-2590: Added Scorer.visitSubScorers, and Scorer.freq. Along
872 with a custom Collector these experimental methods make it possible
873 to gather the hit-count per sub-clause and per document while a
874 search is running. (Simon Willnauer, Mike McCandless)
876 * LUCENE-2636: Added MultiCollector which allows running the search with several
877 Collectors. (Shai Erera)
879 * LUCENE-2754, LUCENE-2757: Added a wrapper around MultiTermQueries
880 to add span support: SpanMultiTermQueryWrapper<Q extends MultiTermQuery>.
881 Using this wrapper it's easy to add fuzzy/wildcard to e.g. a SpanNearQuery.
882 (Robert Muir, Uwe Schindler)
884 * LUCENE-2838: ConstantScoreQuery now directly supports wrapping a Query
885 instance for stripping off scores. The use of a QueryWrapperFilter
886 is no longer needed and is discouraged for that use case. Directly wrapping
887 Query improves performance, as out-of-order collection is now supported.
890 * LUCENE-2864: Add getMaxTermFrequency (maximum within-document TF) to
891 FieldInvertState so that it can be used in Similarity.computeNorm.
894 * LUCENE-2720: Segments now record the code version which created them.
895 (Shai Erera, Mike McCandless, Uwe Schindler)
897 * LUCENE-2474: Added expert ReaderFinishedListener API to
898 IndexReader, to allow apps that maintain external per-segment caches
899 to evict entries when a segment is finished. (Shay Banon, Yonik
900 Seeley, Mike McCandless)
902 * LUCENE-2911: The new StandardTokenizer, UAX29URLEmailTokenizer, and
903 the ICUTokenizer in contrib now all tag types with a consistent set
904 of token types (defined in StandardTokenizer). Tokens in the major
905 CJK types are explicitly marked to allow for custom downstream handling:
906 <IDEOGRAPHIC>, <HANGUL>, <KATAKANA>, and <HIRAGANA>.
907 (Robert Muir, Steven Rowe)
909 * LUCENE-2913: Add missing getters to Numeric* classes. (Uwe Schindler)
911 * LUCENE-1810: Added FieldSelectorResult.LATENT to not cache lazy loaded fields
912 (Tim Smith, Grant Ingersoll)
914 * LUCENE-2692: Added several new SpanQuery classes for positional checking
915 (match is in a range, payload is a specific value) (Grant Ingersoll)
919 * LUCENE-2494: Use CompletionService in ParallelMultiSearcher instead of
920 simple polling for results. (Edward Drapkin, Simon Willnauer)
922 * LUCENE-2075: Terms dict cache is now shared across threads instead
923 of being stored separately in thread local storage. Also fixed
924 terms dict so that the cache is used when seeking the thread local
925 term enum, which will be important for MultiTermQuery impls that do
926 lots of seeking (Mike McCandless, Uwe Schindler, Robert Muir, Yonik
929 * LUCENE-2136: If the multi reader (DirectoryReader or MultiReader)
930 only has a single sub-reader, delegate all enum requests to it.
931 This avoids the overhead of using a PQ unnecessarily. (Mike
934 * LUCENE-2137: Switch to AtomicInteger for some ref counting (Earwin
935 Burrfoot via Mike McCandless)
937 * LUCENE-2123, LUCENE-2261: Move FuzzyQuery rewrite to separate RewriteMode
938 into MultiTermQuery. The number of fuzzy expansions can be specified with
939 the maxExpansions parameter to FuzzyQuery.
940 (Uwe Schindler, Robert Muir, Mike McCandless)
942 * LUCENE-2164: ConcurrentMergeScheduler has more control over merge
943 threads. First, it gives smaller merges higher thread priority than
944 larger ones. Second, a new set/getMaxMergeCount setting will pause
945 the larger merges to allow smaller ones to finish. The defaults for
946 these settings are now dynamic, depending on the number of CPU cores as
947 reported by Runtime.getRuntime().availableProcessors() (Mike
950 * LUCENE-2169: Improved CharArraySet.copy(), if source set is
951 also a CharArraySet. (Simon Willnauer via Uwe Schindler)
953 * LUCENE-2084: Change IndexableBinaryStringTools to work on byte[] and char[]
954 directly, instead of Byte/CharBuffers, and modify CollationKeyFilter to
955 take advantage of this for faster performance.
956 (Steven Rowe, Uwe Schindler, Robert Muir)
958 * LUCENE-2188: Add a utility class for tracking deprecated overridden
959 methods in non-final subclasses.
960 (Uwe Schindler, Robert Muir)
962 * LUCENE-2195: Speedup CharArraySet if set is empty.
963 (Simon Willnauer via Robert Muir)
965 * LUCENE-2285: Code cleanup. (Shai Erera via Uwe Schindler)
967 * LUCENE-2303: Remove code duplication in Token class by subclassing
968 TermAttributeImpl, move DEFAULT_TYPE constant to TypeInterface, improve
969 null-handling for TypeAttribute. (Uwe Schindler)
971 * LUCENE-2329: Switch TermsHash* from using a PostingList object per unique
972 term to parallel arrays, indexed by termID. This reduces garbage collection
973 overhead significantly, which results in great indexing performance wins
974 when the available JVM heap space is low. This will become even more
975 important when the DocumentsWriter RAM buffer is searchable in the future,
976 because then it will make sense to make the RAM buffers as large as
977 possible. (Mike McCandless, Michael Busch)
979 * LUCENE-2380: The terms field cache methods (getTerms,
980 getTermsIndex), which replace the older String equivalents
981 (getStrings, getStringIndex), consume quite a bit less RAM in most
982 cases. (Mike McCandless)
984 * LUCENE-2410: ~20% speedup on exact (slop=0) PhraseQuery matching.
987 * LUCENE-2531: Fix issue when sorting by a String field that was
988 causing too many fallbacks to compare-by-value (instead of by-ord).
991 * LUCENE-2574: IndexInput exposes copyBytes(IndexOutput, long) to allow for
992 efficient copying by sub-classes. Optimized copy is implemented for RAM and FS
993 streams. (Shai Erera)
995 * LUCENE-2719: Improved TermsHashPerField's sorting to use a better
996 quick sort algorithm that does not dereference the pivot element on
997 every compare call. Also replaced lots of sorting code in Lucene
998 by the improved SorterTemplate class.
999 (Uwe Schindler, Robert Muir, Mike McCandless)
1001 * LUCENE-2760: Optimize SpanFirstQuery and SpanPositionRangeQuery.
1004 * LUCENE-2770: Make SegmentMerger always work on atomic subreaders,
1005 even when IndexWriter.addIndexes(IndexReader...) is used with
1006 DirectoryReaders or other MultiReaders. This saves lots of memory
1007 during merge of norms. (Uwe Schindler, Mike McCandless)
1009 * LUCENE-2824: Optimize BufferedIndexInput to do less bounds checks.
1012 * LUCENE-2010: Segments with 100% deleted documents are now removed on
1013 IndexReader or IndexWriter commit. (Uwe Schindler, Mike McCandless)
1015 * LUCENE-1472: Removed synchronization from static DateTools methods
1016 by using a ThreadLocal. Also converted DateTools.Resolution to a
1017 Java 5 enum (this should not break backwards compatibility). (Uwe Schindler)
1021 * LUCENE-2124: Moved the JDK-based collation support from contrib/collation
1022 into core, and moved the ICU-based collation support into contrib/icu.
1025 * LUCENE-2326: Removed SVN checkouts for backwards tests. The backwards
1026 branch is now included in the svn repository using "svn copy"
1027 after release. (Uwe Schindler)
1029 * LUCENE-2074: Regenerating StandardTokenizerImpl files now needs
1030 JFlex 1.5 (currently only available on SVN). (Uwe Schindler)
1032 * LUCENE-1709: Tests are now parallelized by default (except for benchmark). You
1033 can force them to run sequentially by passing -Drunsequential=1 on the command
1034 line. The number of threads that are spawned per CPU defaults to '1'. If you
1035 wish to change that, you can run the tests with -DthreadsPerProcessor=[num].
1036 (Robert Muir, Shai Erera, Peter Kofler)
1038 * LUCENE-2516: Backwards tests are now compiled against released lucene-core.jar
1039 from tarball of previous version. Backwards tests are now packaged together
1040 with src distribution. (Uwe Schindler)
1042 * LUCENE-2611: Added Ant target to install IntelliJ IDEA configuration:
1043 "ant idea". See http://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ
1046 * LUCENE-2657: Switch from using Maven POM templates to full POMs when
1047 generating Maven artifacts (Steven Rowe)
1049 * LUCENE-2609: Added jar-test-framework Ant target which packages Lucene's
1050 tests' framework classes. (Drew Farris, Grant Ingersoll, Shai Erera,
1055 * LUCENE-2037: Allow JUnit4 tests in our environment (Erick Erickson
1056 via Mike McCandless)
1058 * LUCENE-1844: Speed up the unit tests (Mark Miller, Erick Erickson,
1061 * LUCENE-2065: Use Java 5 generics throughout our unit tests. (Kay
1062 Kay via Mike McCandless)
1064 * LUCENE-2155: Fix time and zone dependent localization test failures
1065 in queryparser tests. (Uwe Schindler, Chris Male, Robert Muir)
1067 * LUCENE-2170: Fix thread starvation problems. (Uwe Schindler)
1069 * LUCENE-2248, LUCENE-2251, LUCENE-2285: Refactor tests to not use
1070 Version.LUCENE_CURRENT, but instead use a global static value
1071 from LuceneTestCase(J4), that contains the release version.
1072 (Uwe Schindler, Simon Willnauer, Shai Erera)
1074 * LUCENE-2313, LUCENE-2322: Add VERBOSE to LuceneTestCase(J4) to control
1075 verbosity of tests. If VERBOSE==false (default) tests should not print
1076 anything other than errors to System.(out|err). The setting can be
1077 changed with -Dtests.verbose=true on test invocation.
1078 (Shai Erera, Paul Elschot, Uwe Schindler)
1080 * LUCENE-2318: Remove inconsistent system property code for retrieving
1081 temp and data directories inside test cases. It is now centralized in
1082 LuceneTestCase(J4). Also changed lots of tests to use
1083 getClass().getResourceAsStream() to retrieve test data. Tests needing
1084 access to "real" files from the test folder itself, can use
1085 LuceneTestCase(J4).getDataFile(). (Uwe Schindler)
1087 * LUCENE-2398, LUCENE-2611: Improve tests to work better from IDEs such
1088 as Eclipse and IntelliJ.
1089 (Paolo Castagna, Steven Rowe via Robert Muir)
1091 * LUCENE-2804: add newFSDirectory to LuceneTestCase to create a FSDirectory at
1092 random. (Shai Erera, Robert Muir)
1096 * LUCENE-2579: Fix oal.search's package.html description of abstract
1097 methods. (Santiago M. Mola via Mike McCandless)
1099 * LUCENE-2625: Add a note to IndexReader.termDocs() with additional verbiage
1100 that the TermEnum must be seeked since it is unpositioned.
1101 (Adriano Crestani via Robert Muir)
1103 * LUCENE-2894: Use google-code-prettify for syntax highlighting in javadoc.
1104 (Shinichiro Abe, Koji Sekiguchi)
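The underflow described in the LUCENE-2937 entry above can be illustrated with a small self-contained sketch. It assumes a byte encoding with 3 mantissa bits and a zero exponent of 15; that layout, the class name NormUnderflowSketch, and the method names are assumptions made for illustration, and this is not the actual Lucene source. The only difference between the two encoders is whether the underflow check uses < or <=.

```java
public class NormUnderflowSketch {
    // Assumed layout (not taken from the entry): 3 mantissa bits, zero exponent 15.
    private static final int MANTISSA_SHIFT = 24 - 3;
    private static final int ZERO_POINT = (63 - 15) << 3;

    // Decode a byte back into a float under the assumed layout.
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << MANTISSA_SHIFT;
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Hypothetical buggy encoder: a value whose truncated bits equal ZERO_POINT
    // falls through to the subtraction and becomes byte 0.
    static byte encodeBuggy(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> MANTISSA_SHIFT;
        if (small < ZERO_POINT) {
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= ZERO_POINT + 0x100) {
            return (byte) -1; // overflow: clamp to the largest code
        }
        return (byte) (small - ZERO_POINT);
    }

    // Hypothetical fixed encoder: <= rounds every small positive value up to byte 1.
    static byte encodeFixed(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> MANTISSA_SHIFT;
        if (small <= ZERO_POINT) {
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= ZERO_POINT + 0x100) {
            return (byte) -1;
        }
        return (byte) (small - ZERO_POINT);
    }

    public static void main(String[] args) {
        // Largest float strictly below decode(1): positive, but underflows in the buggy encoder.
        float justBelow = Float.intBitsToFloat(Float.floatToRawIntBits(decode((byte) 1)) - 1);
        System.out.println("buggy: " + encodeBuggy(justBelow));
        System.out.println("fixed: " + encodeFixed(justBelow));
    }
}
```

In this sketch a positive value just below decode(1) encodes to 0 with the buggy check and to 1 with the fixed check, which matches the rounding behavior the entry describes.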
1106 ================== Release 2.9.4 / 3.0.3 ====================
1108 Changes in runtime behavior
1110 * LUCENE-2689: NativeFSLockFactory no longer attempts to acquire a
1111 test lock just before the real lock is acquired. (Surinder Pal
1112 Singh Bindra via Mike McCandless)
1114 * LUCENE-2762: Fixed bug in IndexWriter causing it to hold open file
1115 handles against deleted files when compound-file was enabled (the
1116 default) and readers are pooled. As a result of this the peak
1117 worst-case free disk space required during optimize is now 3X the
1118 index size, when compound file is enabled (else 2X). (Mike
1121 * LUCENE-2773: LogMergePolicy accepts a double noCFSRatio (default =
1122 0.1), which means any time a merged segment is greater than 10% of
1123 the index size, it will be left in non-compound format even if
1124 compound format is on. This change was made to reduce peak
1125 transient disk usage during optimize which increased due to
1126 LUCENE-2762. (Mike McCandless)
1130 * LUCENE-2142 (correct fix): FieldCacheImpl.getStringIndex no longer
1131 throws an exception when term count exceeds doc count.
1132 (Mike McCandless, Uwe Schindler)
1134 * LUCENE-2513: when opening writable IndexReader on a not-current
1135 commit, do not overwrite "future" commits. (Mike McCandless)
1137 * LUCENE-2536: IndexWriter.rollback was failing to properly rollback
1138 buffered deletions against segments that were flushed (Mark Harwood
1139 via Mike McCandless)
1141 * LUCENE-2541: Fixed NumericRangeQuery that returned incorrect results
1142 with endpoints near Long.MIN_VALUE and Long.MAX_VALUE:
1143 NumericUtils.splitRange() overflowed, if
1144 - the range contained a LOWER bound
1145 that was greater than (Long.MAX_VALUE - (1L << precisionStep))
1146 - the range contained an UPPER bound
1147 that was less than (Long.MIN_VALUE + (1L << precisionStep))
1148 With standard precision steps around 4, this had no effect on
1149 most queries, only those that met the above conditions.
1150 Queries with large precision steps failed more easily. Queries with
1151 precision step >=64 were not affected. Also 32 bit data types int
1152 and float were not affected.
1153 (Yonik Seeley, Uwe Schindler)
1155 * LUCENE-2593: Fixed certain rare cases where a disk full could lead
1156 to a corrupted index (Robert Muir, Mike McCandless)
1158 * LUCENE-2620: Fixed a bug in WildcardQuery where too many asterisks
1159 would result in unbearably slow performance. (Nick Barkas via Robert Muir)
1161 * LUCENE-2627: Fixed bug in MMapDirectory chunking when a file is an
1162 exact multiple of the chunk size. (Robert Muir)
1164 * LUCENE-2634: isCurrent on an NRT reader was failing to return false
1165 if the writer had just committed (Nikolay Zamosenchuk via Mike McCandless)
1167 * LUCENE-2650: Added extra safety to MMapIndexInput clones to prevent accessing
1168 an unmapped buffer if the input is closed (Mike McCandless, Uwe Schindler, Robert Muir)
1170 * LUCENE-2384: Reset zzBuffer in StandardTokenizerImpl when lexer is reset.
1171 (Ruben Laguna via Uwe Schindler, sub-issue of LUCENE-2074)
1173 * LUCENE-2658: Exceptions while processing term vectors enabled for multiple
1174 fields could lead to invalid ArrayIndexOutOfBoundsExceptions.
1175 (Robert Muir, Mike McCandless)
1177 * LUCENE-2235: Implement missing PerFieldAnalyzerWrapper.getOffsetGap().
1178 (Javier Godoy via Uwe Schindler)
1180 * LUCENE-2328: Fixed memory leak in how IndexWriter/Reader tracked
1181 already sync'd files. (Earwin Burrfoot via Mike McCandless)
1183 * LUCENE-2549: Fix TimeLimitingCollector#TimeExceededException to record
1184 the absolute docid. (Uwe Schindler)
1186 * LUCENE-2533: fix FileSwitchDirectory.listAll to not return dups when
1187 primary & secondary dirs share the same underlying directory.
1188 (Michael McCandless)
1190 * LUCENE-2365: IndexWriter.newestSegment (used normally for testing)
1191 is fixed to return null if there are no segments. (Karthick
1192 Sankarachary via Mike McCandless)
1194 * LUCENE-2730: Fix two rare deadlock cases in IndexWriter (Mike McCandless)
1196 * LUCENE-2744: CheckIndex was stating total number of fields,
1197 not the number that have norms enabled, on the "test: field
1198 norms..." output. (Mark Kristensson via Mike McCandless)
1200 * LUCENE-2759: Fixed two near-real-time cases where doc store files
1201 may be opened for read even though they are still open for write.
1204 * LUCENE-2618: Fix rare thread safety issue whereby
1205 IndexWriter.optimize could sometimes return even though the index
1206 wasn't fully optimized (Mike McCandless)
1208 * LUCENE-2767: Fix thread safety issue in addIndexes(IndexReader[])
1209 that could potentially result in index corruption. (Mike
1212 * LUCENE-2762: Fixed bug in IndexWriter causing it to hold open file
1213 handles against deleted files when compound-file was enabled (the
1214 default) and readers are pooled. As a result of this the peak
1215 worst-case free disk space required during optimize is now 3X the
1216 index size, when compound file is enabled (else 2X). (Mike
1219 * LUCENE-2216: OpenBitSet.hashCode returned different hash codes for
1220 sets that only differed by trailing zeros. (Dawid Weiss, yonik)
1222 * LUCENE-2782: Fix rare potential thread hazard with
1223 IndexWriter.commit (Mike McCandless)
1227 * LUCENE-2773: LogMergePolicy accepts a double noCFSRatio (default =
1228 0.1), which means any time a merged segment is greater than 10% of
1229 the index size, it will be left in non-compound format even if
1230 compound format is on. This change was made to reduce peak
1231 transient disk usage during optimize which increased due to
1232 LUCENE-2762. (Mike McCandless)
1236 * LUCENE-2556: Improve memory usage after cloning TermAttribute.
1237 (Adriano Crestani via Uwe Schindler)
1239 * LUCENE-2098: Improve the performance of BaseCharFilter, especially for
1240 large documents. (Robin Wojciki, Koji Sekiguchi, Robert Muir)
1244 * LUCENE-2675 (2.9.4 only): Add support for Lucene 3.0 stored field files
1245 also in 2.9. The file format did not change, only the version number was
1246 upgraded to mark segments that have no compression. FieldsWriter still only
1247 writes 2.9 segments as they could contain compressed fields. This cross-version
1248 index format compatibility is provided here solely because Lucene 2.9 and 3.0
1249 have the same bugfix level, features, and the same index format with this slight
1250 compression difference. In general, Lucene does not support reading newer
1251 indexes with older library versions. (Uwe Schindler)
1255 * LUCENE-2239: Documented limitations in NIOFSDirectory and MMapDirectory due to
1256 Java NIO behavior when a Thread is interrupted while blocking on IO.
1257 (Simon Willnauer, Robert Muir)
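The two overflow conditions listed in the LUCENE-2541 entry above are ordinary two's-complement wraparound in long arithmetic. The following is a minimal sketch of those conditions only; it does not reproduce NumericUtils.splitRange(), and the class and helper names are hypothetical.

```java
public class SplitRangeOverflowSketch {
    // True if lowerBound + (1L << precisionStep) would wrap past Long.MAX_VALUE.
    // Intended for 0 <= precisionStep < 63.
    static boolean lowerBoundOverflows(long lowerBound, int precisionStep) {
        return lowerBound > Long.MAX_VALUE - (1L << precisionStep);
    }

    // True if upperBound - (1L << precisionStep) would wrap past Long.MIN_VALUE.
    // Intended for 0 <= precisionStep < 63.
    static boolean upperBoundOverflows(long upperBound, int precisionStep) {
        return upperBound < Long.MIN_VALUE + (1L << precisionStep);
    }

    public static void main(String[] args) {
        int precisionStep = 4;
        long lower = Long.MAX_VALUE - 3;

        // Unchecked addition silently wraps to a negative number.
        long wrapped = lower + (1L << precisionStep);
        System.out.println("wrapped below lower bound: " + (wrapped < lower));

        // The checks flag the endpoints that sit within one step of the limits.
        System.out.println("lower overflows: " + lowerBoundOverflows(lower, precisionStep));
        System.out.println("upper overflows: " + upperBoundOverflows(Long.MIN_VALUE + 3, precisionStep));
        System.out.println("typical bound overflows: " + lowerBoundOverflows(0L, precisionStep));
    }
}
```

With a precision step of 4 the window near each limit is only 16 values wide, which is consistent with the entry's note that most queries were unaffected.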
1259 ================== Release 2.9.3 / 3.0.2 ====================
1261 Changes in backwards compatibility policy
1263 * LUCENE-2135: Added FieldCache.purge(IndexReader) method to the
1264 interface. Anyone implementing FieldCache externally will need to
1265 fix their code to implement this, on upgrading. (Mike McCandless)
1267 Changes in runtime behavior
1269 * LUCENE-2421: NativeFSLockFactory does not throw LockReleaseFailedException if
1270 it cannot delete the lock file, since obtaining the lock does not fail if the
1271 file is there. (Shai Erera)
1273 * LUCENE-2060 (2.9.3 only): Changed ConcurrentMergeScheduler's default for
1274 maxNumThreads from 3 to 1, because in practice we get the most gains
1275 from running a single merge in the background. More than one
1276 concurrent merge causes a lot of thrashing (though it's possible on
1277 SSD storage that there would be net gains). (Jason Rutherglen, Mike
1282 * LUCENE-2046 (2.9.3 only): IndexReader should not see the index as changed, after
1283 IndexWriter.prepareCommit has been called but before
1284 IndexWriter.commit is called. (Peter Keegan via Mike McCandless)
1286 * LUCENE-2119: Don't throw NegativeArraySizeException if you pass
1287 Integer.MAX_VALUE as nDocs to IndexSearcher search methods. (Paul
1288 Taylor via Mike McCandless)
1290 * LUCENE-2142: FieldCacheImpl.getStringIndex no longer throws an
1291 exception when term count exceeds doc count. (Mike McCandless)
1293 * LUCENE-2104: NativeFSLock.release() would silently fail if the lock is held by
1294 another thread/process. (Shai Erera via Uwe Schindler)
1296 * LUCENE-2283: Use shared memory pool for term vector and stored
1297 fields buffers. This memory will be reclaimed if needed according to
1298 the configured RAM Buffer Size for the IndexWriter. This also fixes
1299 potentially excessive memory usage when many threads are indexing a
1300 mix of small and large documents. (Tim Smith via Mike McCandless)
1302 * LUCENE-2300: If IndexWriter is pooling readers (because NRT reader
1303 has been obtained), and addIndexes* is run, do not pool the
1304 readers from the external directory. This is harmless (NRT reader is
1305 correct), but a waste of resources. (Mike McCandless)
1307 * LUCENE-2422: Don't reuse byte[] in IndexInput/Output -- it gains
1308 little performance, and ties up possibly large amounts of memory
1309 for apps that index large docs. (Ross Woolf via Mike McCandless)
1311 * LUCENE-2387: Don't hang onto Fieldables from the last doc indexed,
1312 in IndexWriter, nor the Reader in Tokenizer after close is
1313 called. (Ruben Laguna, Uwe Schindler, Mike McCandless)
1315 * LUCENE-2417: IndexCommit did not implement hashCode() and equals()
1316 consistently. Now they both take Directory and version into consideration. In
1317 addition, all of IndexCommit's methods which threw
1318 UnsupportedOperationException are now abstract. (Shai Erera)
1320 * LUCENE-2467: Fixed memory leaks in IndexWriter when large documents
1321 are indexed. (Mike McCandless)
1323 * LUCENE-2473: Clicking on the "More Results" link in the luceneweb.war
1324 demo resulted in ArrayIndexOutOfBoundsException.
1325 (Sami Siren via Robert Muir)
1327 * LUCENE-2476: If any exception is hit init'ing IW, release the write
1328 lock (previously we only released on IOException). (Tamas Cservenak
1329 via Mike McCandless)
1331 * LUCENE-2478: Fix CachingWrapperFilter to not throw NPE when
1332 Filter.getDocIdSet() returns null. (Uwe Schindler, Daniel Noll)
1334 * LUCENE-2468: Allow specifying how new deletions should be handled in
1335 CachingWrapperFilter and CachingSpanFilter. By default, new
1336 deletions are ignored in CachingWrapperFilter, since typically this
1337 filter is AND'd with a query that correctly takes new deletions into
1338 account. This should be a performance gain (higher cache hit rate)
1339 in apps that reopen readers, or use near-real-time reader
1340 (IndexWriter.getReader()), but may introduce invalid search results
1341 (allowing deleted docs to be returned) for certain cases, so a new
1342 expert ctor was added to CachingWrapperFilter to enforce deletions
1343 at a performance cost. CachingSpanFilter by default recaches if
1344 there are new deletions (Shay Banon via Mike McCandless)
1346 * LUCENE-2299: If you open an NRT reader while addIndexes* is running,
1347 it may miss some segments (Earwin Burrfoot via Mike McCandless)
1349 * LUCENE-2397: Don't throw NPE from SnapshotDeletionPolicy.snapshot if
1350 there are no commits yet (Shai Erera)
1352 * LUCENE-2424: Fix FieldDoc.toString to actually return its fields
1353 (Stephen Green via Mike McCandless)
1355 * LUCENE-2311: Always pass a "fully loaded" (terms index & doc stores)
1356 SegmentReader to IndexWriter's mergedSegmentWarmer (if set), so
1357 that warming is free to do whatever it needs to. (Earwin Burrfoot
1358 via Mike McCandless)
1360 * LUCENE-3029: Fix corner case when MultiPhraseQuery is used with zero
1361 position-increment tokens that would sometimes assign different
1362 scores to identical docs. (Mike McCandless)
1364 * LUCENE-2486: Fixed intermittent FileNotFoundException on doc store
1365 files when a mergedSegmentWarmer is set on IndexWriter. (Mike
1368 * LUCENE-2130: Fix performance issue when FuzzyQuery runs on a
1369 multi-segment index (Michael McCandless)
1373 * LUCENE-2281: added doBeforeFlush to IndexWriter to allow extensions to perform
1374 operations before flush starts. Also exposed doAfterFlush as protected instead
1375 of package-private. (Shai Erera via Mike McCandless)
1377 * LUCENE-2356: Add IndexWriter.set/getReaderTermsIndexDivisor, to set
1378 what IndexWriter passes for termsIndexDivisor to the readers it
1379 opens internally when applying deletions or creating a
1380 near-real-time reader. (Earwin Burrfoot via Mike McCandless)
1384 * LUCENE-2494 (3.0.2 only): Use CompletionService in ParallelMultiSearcher
1385 instead of simple polling for results. (Edward Drapkin, Simon Willnauer)
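The difference between polling futures and using a CompletionService can be sketched with the JDK alone. The class and method names below are hypothetical and this is not the ParallelMultiSearcher code; each Callable stands in for one sub-search returning a hit count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class CompletionServiceSketch {
    // Runs every sub-search on a pool and sums the hit counts in completion
    // order, blocking on take() instead of polling each Future in turn.
    static int sumHits(List<Callable<Integer>> subSearches) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, subSearches.size()));
        try {
            CompletionService<Integer> cs = new ExecutorCompletionService<>(pool);
            for (Callable<Integer> task : subSearches) {
                cs.submit(task);
            }
            int total = 0;
            for (int i = 0; i < subSearches.size(); i++) {
                total += cs.take().get(); // waits for whichever task finishes next
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Callable<Integer>> tasks = new ArrayList<>();
        tasks.add(() -> 3);
        tasks.add(() -> 5);
        tasks.add(() -> 7);
        System.out.println(sumHits(tasks)); // prints 15
    }
}
```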
1387 * LUCENE-2135: On IndexReader.close, forcefully evict any entries from
1388 the FieldCache rather than waiting for the WeakHashMap to release
1389 the reference (Mike McCandless)
1391 * LUCENE-2161: Improve concurrency of IndexReader, especially in the
1392 context of near real-time readers. (Mike McCandless)
1394 * LUCENE-2360: Small speedup to recycling of reused per-doc RAM in
1395 IndexWriter (Robert Muir, Mike McCandless)
1399 * LUCENE-2488 (2.9.3 only): Support build with JDK 1.4 and exclude Java 1.5
1400 contrib modules on request (pass '-Dforce.jdk14.build=true') when
1401 compiling/testing/packaging. This marks the benchmark contrib also
1402 as Java 1.5, as it depends on fast-vector-highlighter. (Uwe Schindler)
1404 ================== Release 2.9.2 / 3.0.1 ====================
1406 Changes in backwards compatibility policy
1408 * LUCENE-2123 (3.0.1 only): Removed the protected inner class ScoreTerm
1409 from FuzzyQuery. The change was needed because the comparator of this
1410 class had to be changed in an incompatible way. The class was never
1411 intended to be public. (Uwe Schindler, Mike McCandless)
1415 * LUCENE-2092: BooleanQuery was ignoring disableCoord in its hashCode
1416 and equals methods, causing bad things to happen when caching
1417 BooleanQueries. (Chris Hostetter, Mike McCandless)
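Why a flag missing from equals/hashCode breaks caching can be shown with a small stand-in class. FlaggedQuery is hypothetical and only mimics a query that carries a disableCoord flag; it is not BooleanQuery itself.

```java
import java.util.HashMap;
import java.util.Map;

final class CacheKeySketch {
    // Hypothetical stand-in for a query whose scoring depends on a flag.
    static final class FlaggedQuery {
        final String text;
        final boolean disableCoord;

        FlaggedQuery(String text, boolean disableCoord) {
            this.text = text;
            this.disableCoord = disableCoord;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof FlaggedQuery)) return false;
            FlaggedQuery other = (FlaggedQuery) o;
            // The flag must take part in equality, otherwise two queries that
            // score differently would share one cache entry.
            return text.equals(other.text) && disableCoord == other.disableCoord;
        }

        @Override
        public int hashCode() {
            return 31 * text.hashCode() + (disableCoord ? 1 : 0);
        }
    }

    static int distinctCacheEntries() {
        Map<FlaggedQuery, String> cache = new HashMap<>();
        cache.put(new FlaggedQuery("a b", false), "scored with coord");
        cache.put(new FlaggedQuery("a b", true), "scored without coord");
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCacheEntries()); // prints 2
    }
}
```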
1419 * LUCENE-2095: Fixed a bug where, when two threads call IndexWriter.commit()
1420 at the same time, commit could return control back to one of the
1421 threads before all changes were actually committed.
1422 (Sanne Grinovero via Mike McCandless)
1424 * LUCENE-2132 (3.0.1 only): Fix the demo result.jsp to use QueryParser
1425 with a Version argument. (Brian Li via Robert Muir)
1427 * LUCENE-2166: Don't incorrectly keep warning about the same immense
1428 term, when IndexWriter.infoStream is on. (Mike McCandless)
1430 * LUCENE-2158: At high indexing rates, NRT reader could temporarily
1431 lose deletions. (Mike McCandless)
1433 * LUCENE-2182: DEFAULT_ATTRIBUTE_FACTORY was failing to load
1434 implementation class when interface was loaded by a different
1435 class loader. (Uwe Schindler, reported on java-user by Ahmed El-dawy)
1437 * LUCENE-2257: Increase max number of unique terms in one segment to
1438 termIndexInterval (default 128) * ~2.1 billion = ~274 billion.
1439 (Tom Burton-West via Mike McCandless)
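The arithmetic behind the ~274 billion figure can be checked directly. This sketch assumes that "~2.1 billion" refers to Integer.MAX_VALUE; the class and method names are illustrative.

```java
final class MaxTermsSketch {
    // Widen to long before multiplying so the product does not overflow int.
    static long maxUniqueTerms(int termIndexInterval) {
        return (long) termIndexInterval * Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(maxUniqueTerms(128)); // prints 274877906816
    }
}
```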
1441 * LUCENE-2260: Fixed AttributeSource to not hold a strong
1442 reference to the Attribute/AttributeImpl classes which prevents
1443 unloading of custom attributes loaded by other classloaders
1444 (e.g. in Solr plugins). (Uwe Schindler)
1446 * LUCENE-1941: Fix Min/MaxPayloadFunction returning 0 when
1447 only one payload is present. (Erik Hatcher, Mike McCandless
1450 * LUCENE-2270: Queries consisting of all zero-boost clauses
1451 (for example, text:foo^0) sorted incorrectly and produced
1452 invalid docids. (yonik)
1456 * LUCENE-1609 (3.0.1 only): Restore IndexReader.getTermInfosIndexDivisor
1457 (it was accidentally removed in 3.0.0) (Mike McCandless)
1459 * LUCENE-1972 (3.0.1 only): Restore SortField.getComparatorSource
1460 (it was accidentally removed in 3.0.0) (John Wang via Uwe Schindler)
1462 * LUCENE-2190: Added a new class CustomScoreProvider to function package
1463 that can be subclassed to provide custom scoring to CustomScoreQuery.
1464 The methods in CustomScoreQuery that did this before were deprecated
1465 and replaced by a method getCustomScoreProvider(IndexReader) that
1466 returns a custom score implementation using the above class. The change
1467 is necessary with per-segment searching, as CustomScoreQuery is
1468 a stateless class (like all other Queries) and does not know about
1469 the currently searched segment. This API works similarly to Filter's
1470 getDocIdSet(IndexReader). (Paul chez Jamespot via Mike McCandless,
1473 * LUCENE-2080: Deprecate Version.LUCENE_CURRENT, as using this constant
1474 will cause backwards compatibility problems when upgrading Lucene. See
1475 the Version javadocs for additional information.
1480 * LUCENE-2086: When resolving deleted terms, do so in term sort order
1481 for better performance (Bogdan Ghidireac via Mike McCandless)
1483 * LUCENE-2123 (partly, 3.0.1 only): Fixes a slowdown / memory issue
1484 added by LUCENE-504. (Uwe Schindler, Robert Muir, Mike McCandless)
1486 * LUCENE-2258: Remove unneeded synchronization in FuzzyTermEnum.
1487 (Uwe Schindler, Robert Muir)
1491 * LUCENE-2114: Change TestFilteredSearch to test on multi-segment
1492 index as well. (Simon Willnauer via Mike McCandless)
1494 * LUCENE-2211: Improves BaseTokenStreamTestCase to use a fake attribute
1495 that checks if clearAttributes() was called correctly.
1496 (Uwe Schindler, Robert Muir)
1498 * LUCENE-2207, LUCENE-2219: Improve BaseTokenStreamTestCase to check if
1499 end() is implemented correctly. (Koji Sekiguchi, Robert Muir)
1503 * LUCENE-2114: Improve javadocs of Filter to call out that the
1504 provided reader is per-segment (Simon Willnauer via Mike
1507 ======================= Release 3.0.0 =======================
1509 Changes in backwards compatibility policy
1511 * LUCENE-1979: Change return type of SnapshotDeletionPolicy#snapshot()
1512 from IndexCommitPoint to IndexCommit. Code that uses this method
1513 needs to be recompiled against Lucene 3.0 in order to work. The
1514 previously deprecated IndexCommitPoint is also removed.
1517 * o.a.l.Lock.isLocked() is now allowed to throw an IOException.
1520 * LUCENE-2030: CachingWrapperFilter and CachingSpanFilter now hide
1521 the internal cache implementation for thread safety; previously it was
1522 declared protected. (Peter Lenahan, Uwe Schindler, Simon Willnauer)
1524 * LUCENE-2053: If you call Thread.interrupt() on a thread inside
1525 Lucene, Lucene will do its best to interrupt the thread. However,
1526 instead of throwing InterruptedException (which is a checked
1527 exception), you'll get an oal.util.ThreadInterruptedException (an
1528 unchecked exception, subclassing RuntimeException). The interrupt
1529 status on the thread is cleared when this exception is thrown.
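The pattern described here, converting the checked InterruptedException into an unchecked one, might look roughly like the following. UncheckedInterruptedException is a hypothetical stand-in for oal.util.ThreadInterruptedException, and Thread.sleep stands in for any blocking call.

```java
final class InterruptSketch {
    // Hypothetical unchecked wrapper, standing in for ThreadInterruptedException.
    static final class UncheckedInterruptedException extends RuntimeException {
        UncheckedInterruptedException(InterruptedException cause) {
            super(cause);
        }
    }

    static void sleepUnchecked(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            // Thread.sleep clears the interrupt status before throwing, and the
            // wrapper does not restore it.
            throw new UncheckedInterruptedException(ie);
        }
    }

    // Returns true if the interrupt surfaced as the unchecked exception and
    // the thread's interrupt status was cleared afterwards.
    static boolean interruptedWhileSleeping() {
        Thread.currentThread().interrupt();
        try {
            sleepUnchecked(1000);
            return false;
        } catch (UncheckedInterruptedException e) {
            return !Thread.currentThread().isInterrupted();
        }
    }

    public static void main(String[] args) {
        System.out.println(interruptedWhileSleeping()); // prints true
    }
}
```

Callers that still need the interrupt status can call Thread.currentThread().interrupt() again in their own catch block.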
1532 * LUCENE-2052: Some methods in Lucene core were changed to accept
1533 Java 5 varargs. This is not a backwards compatibility problem as
1534 long as you do not try to override such a method. We left common
1535 overridden methods unchanged and added varargs to constructors,
1536 static, or final methods (MultiSearcher,...). (Uwe Schindler)
1538 * LUCENE-1558: IndexReader.open(Directory) now opens a readOnly=true
1539 reader, and new IndexSearcher(Directory) does the same. Note that
1540 this is a change in the default from 2.9, when these methods were
1541 previously deprecated. (Mike McCandless)
1543 * LUCENE-1753: Make not yet final TokenStreams final to enforce
1544 decorator pattern. (Uwe Schindler)
1546 Changes in runtime behavior
1548 * LUCENE-1677: Remove the system property to set SegmentReader class
1549 implementation. (Uwe Schindler)
1551 * LUCENE-1960: As a consequence of the removal of Field.Store.COMPRESS,
1552 support for this type of fields was removed. Lucene 3.0 is still able
1553 to read indexes with compressed fields, but as soon as merges occur
1554 or the index is optimized, all compressed fields are decompressed
1555 and converted to Field.Store.YES. Because of this, indexes with
1556 compressed fields can suddenly get larger. Also, the first merge with
1557 decompression cannot be done in raw mode and is therefore slower.
1558 This change has no effect for code that uses such old indexes,
1559 they behave as before (fields are automatically decompressed
1560 during read). Indexes converted to Lucene 3.0 format cannot be read
1561 anymore with previous versions.
1562 It is recommended to optimize your indexes after upgrading to convert
1563 to the new format and decompress all fields.
1564 If you want compressed fields, you can use CompressionTools, which
1565 creates a compressed byte[] to be added as a binary stored field. This
1566 cannot be done automatically, as you also have to decompress such
1567 fields when reading. You have to reindex to do that.
1568 (Michael Busch, Uwe Schindler)
1570 * LUCENE-2060: Changed ConcurrentMergeScheduler's default for
1571 maxNumThreads from 3 to 1, because in practice we get the most
1572 gains from running a single merge in the background. More than one
1573 concurrent merge causes a lot of thrashing (though it's possible on
1574 SSD storage that there would be net gains). (Jason Rutherglen,
1579 * LUCENE-1257, LUCENE-1984, LUCENE-1985, LUCENE-2057, LUCENE-1833, LUCENE-2012,
1580 LUCENE-1998: Port to Java 1.5:
1582 - Add generics to public and internal APIs (see below).
1583 - Replace new Integer(int), new Double(double),... by static valueOf() calls.
1584 - Replace for-loops with Iterator by foreach loops.
1585 - Replace StringBuffer with StringBuilder.
1586 - Replace o.a.l.util.Parameter by Java 5 enums (see below).
1587 - Add @Override annotations.
1588 (Uwe Schindler, Robert Muir, Karl Wettin, Paul Elschot, Kay Kay, Shai Erera,
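Several of the Java 5 idioms listed above fit in a few lines. The class below is purely illustrative and not taken from the Lucene sources.

```java
import java.util.Arrays;
import java.util.List;

final class Java5IdiomsSketch {
    // Integer.valueOf may reuse cached instances, unlike new Integer(int).
    static List<Integer> boxed() {
        return Arrays.asList(Integer.valueOf(1), Integer.valueOf(2), Integer.valueOf(3));
    }

    static String join(List<Integer> values) {
        StringBuilder sb = new StringBuilder(); // unsynchronized StringBuffer replacement
        for (Integer v : values) {              // foreach instead of an explicit Iterator
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(v);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(boxed())); // prints 1,2,3
    }
}
```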
1591 * Generify Lucene API:
1593 - TokenStream/AttributeSource: Now addAttribute()/getAttribute() return an
1594 instance of the requested attribute interface and no cast needed anymore
1596 - NumericRangeQuery, NumericRangeFilter, and FieldCacheRangeFilter
1597 now have Integer, Long, Float, Double as type param (LUCENE-1857).
1598 - Document.getFields() returns List<Fieldable>.
1599 - Query.extractTerms(Set<Term>)
1600 - CharArraySet and stop word sets in core/contrib
1601 - PriorityQueue (LUCENE-1935)
1603 - DisjunctionMaxQuery (LUCENE-1984)
1604 - MultiTermQueryWrapperFilter
1605 - CloseableThreadLocal
1607 - o.a.l.util.cache package
1608 - lots of internal APIs of IndexWriter
1609 (Uwe Schindler, Michael Busch, Kay Kay, Robert Muir, Adriano Crestani)
1611 * LUCENE-1944, LUCENE-1856, LUCENE-1957, LUCENE-1960, LUCENE-1961,
1612 LUCENE-1968, LUCENE-1970, LUCENE-1946, LUCENE-1971, LUCENE-1975,
1613 LUCENE-1972, LUCENE-1978, LUCENE-944, LUCENE-1979, LUCENE-1973, LUCENE-2011:
1614 Remove deprecated methods/constructors/classes:
1616 - Remove all String/File directory paths in IndexReader /
1617 IndexSearcher / IndexWriter.
1618 - Remove FSDirectory.getDirectory()
1619 - Make FSDirectory abstract.
1620 - Remove Field.Store.COMPRESS (see above).
1621 - Remove Filter.bits(IndexReader) method and make
1622 Filter.getDocIdSet(IndexReader) abstract.
1623 - Remove old DocIdSetIterator methods and make the new ones abstract.
1624 - Remove some methods in PriorityQueue.
1625 - Remove old TokenStream API and backwards compatibility layer.
1626 - Remove RangeQuery, RangeFilter and ConstantScoreRangeQuery.
1627 - Remove SpanQuery.getTerms().
1628 - Remove ExtendedFieldCache, custom and auto caches, SortField.AUTO.
1629 - Remove old-style custom sort.
1630 - Remove legacy search setting in SortField.
1631 - Remove Hits and all references from core and contrib.
1632 - Remove HitCollector and its TopDocs support implementations.
1633 - Remove term field and accessors in MultiTermQuery
1634 (and fix Highlighter).
1635 - Remove deprecated methods in BooleanQuery.
1636 - Remove deprecated methods in Similarity.
1637 - Remove BoostingTermQuery.
1638 - Remove MultiValueSource.
1639 - Remove Scorer.explain(int).
1640 ...and some other minor ones (Uwe Schindler, Michael Busch, Mark Miller)
1642 * LUCENE-1925: Make IndexSearcher's subReaders and docStarts members
1643 protected; add expert ctor to directly specify reader, subReaders
1644 and docStarts. (John Wang, Tim Smith via Mike McCandless)
1646 * LUCENE-1945: All public classes that have a close() method now
1647 also implement java.io.Closeable (IndexReader, IndexWriter, Directory,...).
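One practical effect of a shared java.io.Closeable interface is that generic cleanup helpers become possible. The helper and the FakeResource class below are hypothetical; the sketch only shows the idea.

```java
import java.io.Closeable;
import java.io.IOException;

final class CloseableSketch {
    // Hypothetical resource used only to make the sketch runnable.
    static final class FakeResource implements Closeable {
        boolean closed;

        @Override
        public void close() {
            closed = true;
        }
    }

    // Closes every resource and returns how many closed without an IOException.
    static int closeAll(Closeable... resources) {
        int closed = 0;
        for (Closeable c : resources) {
            try {
                c.close();
                closed++;
            } catch (IOException e) {
                // keep closing the remaining resources
            }
        }
        return closed;
    }

    public static void main(String[] args) {
        System.out.println(closeAll(new FakeResource(), new FakeResource())); // prints 2
    }
}
```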
1650 * LUCENE-1998: Change all Parameter instances to Java 5 enums. This
1651 is not a backwards break, only a change of the super class. Parameter
1652 was deprecated and will be removed in a later version.
1653 (DM Smith, Uwe Schindler)
1657 * LUCENE-1951: When the text provided to WildcardQuery has no wildcard
1658 characters (ie matches a single term), don't lose the boost and
1659 rewrite method settings. Also, rewrite to PrefixQuery if the
1660 wildcard has the form "foo*", for slightly faster performance. (Robert
1661 Muir via Mike McCandless)
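The three cases in this entry (no wildcard, trailing "*" only, anything else) can be told apart with a simple scan. This is a simplified sketch with hypothetical names; it ignores escaping and is not the WildcardQuery rewrite code.

```java
final class WildcardShapeSketch {
    enum Shape { SINGLE_TERM, PREFIX, GENERAL }

    static Shape classify(String pattern) {
        int firstStar = pattern.indexOf('*');
        int firstQuestion = pattern.indexOf('?');
        if (firstStar == -1 && firstQuestion == -1) {
            return Shape.SINGLE_TERM;          // matches exactly one term
        }
        if (firstQuestion == -1 && firstStar == pattern.length() - 1) {
            return Shape.PREFIX;               // "foo*": the only wildcard is a trailing '*'
        }
        return Shape.GENERAL;                  // needs full wildcard matching
    }

    public static void main(String[] args) {
        System.out.println(classify("foo"));   // prints SINGLE_TERM
        System.out.println(classify("foo*"));  // prints PREFIX
        System.out.println(classify("f?o*"));  // prints GENERAL
    }
}
```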
1663 * LUCENE-2013: SpanRegexQuery does not work with QueryScorer.
1664 (Benjamin Keil via Mark Miller)
1666 * LUCENE-2088: addAttribute() should only accept interfaces that
1667 extend Attribute. (Shai Erera, Uwe Schindler)
1669 * LUCENE-2045: Fix silly FileNotFoundException hit if you enable
1670 infoStream on IndexWriter and then add an empty document and commit
1671 (Shai Erera via Mike McCandless)
1673 * LUCENE-2046: IndexReader should not see the index as changed, after
1674 IndexWriter.prepareCommit has been called but before
1675 IndexWriter.commit is called. (Peter Keegan via Mike McCandless)
1679 * LUCENE-1933: Provide a convenience AttributeFactory that creates a
1680 Token instance for all basic attributes. (Uwe Schindler)
1682 * LUCENE-2041: Parallelize the rest of ParallelMultiSearcher. Lots of
1683 code refactoring and Java 5 concurrent support in MultiSearcher.
1684 (Joey Surls, Simon Willnauer via Uwe Schindler)
1686 * LUCENE-2051: Add CharArraySet.copy() as a simple method to copy
1687 any Set<?> to a CharArraySet that is optimized, if Set<?> is already
1688 a CharArraySet. (Simon Willnauer)
1692 * LUCENE-1183: Optimize Levenshtein Distance computation in
1693 FuzzyQuery. (Cédrik Lime via Mike McCandless)
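For reference, a textbook two-row Levenshtein distance is sketched below. It is not the optimized FuzzyQuery code from this issue, only the baseline computation that such an optimization targets.

```java
final class LevenshteinSketch {
    // Classic dynamic-programming edit distance, keeping only two rows.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // prints 3
    }
}
```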
1695 * LUCENE-2006: Optimization of FieldDocSortedHitQueue to always
1696 use Comparable<?> interface. (Uwe Schindler, Mark Miller)
1698 * LUCENE-2087: Remove recursion in NumericRangeTermEnum.
1703 * LUCENE-486: Remove test->demo dependencies. (Michael Busch)
1705 * LUCENE-2024: Raise build requirements to Java 1.5 and ANT 1.7.0
1706 (Uwe Schindler, Mike McCandless)
1708 ======================= Release 2.9.1 =======================
1710 Changes in backwards compatibility policy
1712 * LUCENE-2002: Add required Version matchVersion argument when
1713 constructing QueryParser or MultiFieldQueryParser and default (as
1714 of 2.9) enablePositionIncrements to true to match
1715 StandardAnalyzer's 2.9 default (Uwe Schindler, Mike McCandless)
1719 * LUCENE-1974: Fixed nasty bug in BooleanQuery (when it used
1720 BooleanScorer for scoring), whereby some matching documents fail to
1721 be collected. (Fulin Tang via Mike McCandless)
1723 * LUCENE-1124: Make sure FuzzyQuery always matches the precise term.
1724 (stefatwork@gmail.com via Mike McCandless)
1726 * LUCENE-1976: Fix IndexReader.isCurrent() to return the right thing
1727 when the reader is a near real-time reader. (Jake Mannix via Mike
1730 * LUCENE-1986: Fix NPE when scoring PayloadNearQuery (Peter Keegan,
1731 Mark Miller via Mike McCandless)
1733 * LUCENE-1992: Fix thread hazard if a merge is committing just as an
1734 exception occurs during sync (Uwe Schindler, Mike McCandless)
1736 * LUCENE-1995: Note in javadocs that IndexWriter.setRAMBufferSizeMB
1737 cannot exceed 2048 MB, and throw IllegalArgumentException if it
1738 does. (Aaron McKee, Yonik Seeley, Mike McCandless)
1740 * LUCENE-2004: Fix Constants.LUCENE_MAIN_VERSION to not be inlined
1741 by client code. (Uwe Schindler)
1743 * LUCENE-2016: Replace illegal U+FFFF character with the replacement
1744 char (U+FFFD) during indexing, to prevent silent index corruption.
1745 (Peter Keegan, Mike McCandless)
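The substitution itself is a single character replacement. The sketch below uses a hypothetical helper name and plain strings; it does not reproduce where in the indexing chain Lucene applies the replacement.

```java
final class IllegalCharSketch {
    // Replaces every U+FFFF with the Unicode replacement character U+FFFD.
    static String sanitize(String term) {
        return term.replace('\uFFFF', '\uFFFD');
    }

    public static void main(String[] args) {
        String cleaned = sanitize("a\uFFFFb");
        System.out.println(cleaned.indexOf('\uFFFF')); // prints -1
    }
}
```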
1749 * Un-deprecate search(Weight weight, Filter filter, int n) from
1750 Searchable interface (deprecated by accident). (Uwe Schindler)
1752 * Un-deprecate o.a.l.util.Version constants. (Mike McCandless)
1754 * LUCENE-1987: Un-deprecate some ctors of Token, as they will not
1755 be removed in 3.0 and are still useful. Also add some missing
1756 o.a.l.util.Version constants for enabling invalid acronym
1757 settings in StandardAnalyzer to be compatible with the coming
1758 Lucene 3.0. (Uwe Schindler)
1760 * LUCENE-1973: Un-deprecate IndexSearcher.setDefaultFieldSortScoring,
1761 to allow controlling per-IndexSearcher whether scores are computed
1762 when sorting by field. (Uwe Schindler, Mike McCandless)
1764 * LUCENE-2043: Make IndexReader.commit(Map<String,String>) public.
1769 * LUCENE-1955: Fix Hits deprecation notice to point users in right
1770 direction. (Mike McCandless, Mark Miller)
1772 * Fix javadoc about score tracking done by search methods in Searcher
1773 and IndexSearcher. (Mike McCandless)
1775 * LUCENE-2008: Javadoc improvements for TokenStream/Tokenizer/Token
1776 (Luke Nezda via Mike McCandless)
1778 ======================= Release 2.9.0 =======================
1780 Changes in backwards compatibility policy
1782 * LUCENE-1575: Searchable.search(Weight, Filter, int, Sort) no
1783 longer computes a document score for each hit by default. If
1784 document score tracking is still needed, you can call
1785 IndexSearcher.setDefaultFieldSortScoring(true, true) to enable
1786 both per-hit and maxScore tracking; however, this is deprecated
1787 and will be removed in 3.0.
1789 Alternatively, use Searchable.search(Weight, Filter, Collector)
1790 and pass in a TopFieldCollector instance, using the following code
1794 TopFieldCollector tfc = TopFieldCollector.create(sort, numHits, fillFields,
1795 true /* trackDocScores */,
1796 true /* trackMaxScore */,
1797 false /* docsInOrder */);
1798 searcher.search(query, tfc);
1799 TopDocs results = tfc.topDocs();
1802 Note that your Sort object cannot use SortField.AUTO when you
1803 directly instantiate TopFieldCollector.
1805 Also, the method search(Weight, Filter, Collector) was added to
1806 the Searchable interface and the Searcher abstract class to
1807 replace the deprecated HitCollector versions. If you either
1808 implement Searchable or extend Searcher, you should change your
1809 code to implement this method. If you already extend
1810 IndexSearcher, no further changes are needed to use Collector.
1812 Finally, the values Float.NaN and Float.NEGATIVE_INFINITY are not
1813 valid scores. Lucene uses these values internally in certain
1814 places, so if you have hits with such scores, it will cause
1815 problems. (Shai Erera via Mike McCandless)
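A short illustration of why Float.NaN in particular is troublesome as a score: every ordered comparison against NaN is false, so a NaN score can neither win nor lose a "better than" test. The helper names are hypothetical and this is not Lucene's collector code.

```java
final class InvalidScoreSketch {
    // Rejects the two values named in the note above.
    static boolean isValidScore(float score) {
        return !Float.isNaN(score) && score != Float.NEGATIVE_INFINITY;
    }

    // The kind of comparison a top-N queue typically relies on.
    static boolean beats(float candidate, float current) {
        return candidate > current;
    }

    public static void main(String[] args) {
        System.out.println(beats(Float.NaN, 0.5f)); // prints false
        System.out.println(beats(0.5f, Float.NaN)); // prints false
        System.out.println(isValidScore(Float.NaN)); // prints false
    }
}
```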
1817 * LUCENE-1687: All methods and parsers from the interface ExtendedFieldCache
1818 have been moved into FieldCache. ExtendedFieldCache is now deprecated and
1819 contains only a few declarations for binary backwards compatibility.
1820 ExtendedFieldCache will be removed in version 3.0. Users of FieldCache and
1821 ExtendedFieldCache will be able to plug in Lucene 2.9 without recompilation.
1822 The auto cache (FieldCache.getAuto) is now deprecated. Due to the merge of
1823 ExtendedFieldCache and FieldCache, FieldCache can now additionally return
1824 long[] and double[] arrays in addition to int[] and float[] and StringIndex.
1826 The interface changes are only notable for users implementing the interfaces,
1827 which is unlikely to have been done, because there is no way to change
1828 Lucene's FieldCache implementation. (Grant Ingersoll, Uwe Schindler)
1830 * LUCENE-1630, LUCENE-1771: Weight, previously an interface, is now an abstract
1831 class. Some of the method signatures have changed, but it should be fairly
1832 easy to see what adjustments must be made to existing code to sync up
1833 with the new API. You can find more detail in the API Changes section.
1835 Going forward Searchable will be kept for convenience only and may
1836 be changed between minor releases without any deprecation
1837 process. It is not recommended that you implement it, but rather extend
1839 (Shai Erera, Chris Hostetter, Martin Ruckli, Mark Miller via Mike McCandless)
1841 * LUCENE-1422, LUCENE-1693: The new Attribute based TokenStream API (see below)
1842 has some backwards breaks in rare cases. We did our best to make the
1843 transition as easy as possible and you are not likely to run into any problems.
1844 If your tokenizers still implement next(Token) or next(), the calls are
1845 automatically wrapped. The indexer and query parser use the new API
1846 (eg use incrementToken() calls). All core TokenStreams are implemented using
1847 the new API. You can mix old and new API style TokenFilters/TokenStream.
1848 Problems only occur when you have done the following:
1849 You have overridden next(Token) or next() in one of the non-abstract core
1850 TokenStreams/-Filters. These classes should normally be final, but some
1851 of them are not. In this case, next(Token)/next() would never be called.
1852 To fail early with a hard compile/runtime error, the next(Token)/next()
1853 methods in these TokenStreams/-Filters were made final in this release.
1854 (Michael Busch, Uwe Schindler)
1856 * LUCENE-1763: MergePolicy now requires an IndexWriter instance to
1857 be passed upon instantiation. As a result, IndexWriter was removed
1858 as a method argument from all MergePolicy methods. (Shai Erera via
1861 * LUCENE-1748: LUCENE-1001 introduced PayloadSpans, but this was a back
1862 compat break and caused custom SpanQuery implementations to fail at runtime
1863 in a variety of ways. This issue attempts to remedy things by causing
1864 a compile time break on custom SpanQuery implementations and removing
1865 the PayloadSpans class, with its functionality now moved to Spans. To
1866 help in alleviating future back compat pain, Spans has been changed from
1867 an interface to an abstract class.
1868 (Hugh Cayless, Mark Miller)
1870 * LUCENE-1808: Query.createWeight has been changed from protected to
1871 public. This will be a back compat break if you have overridden this
1872 method - but you are likely already affected by the LUCENE-1693 (make Weight
1873 abstract rather than an interface) back compat break if you have overridden
1874 Query.createWeight, so we have taken the opportunity to make this change.
1875 (Tim Smith, Shai Erera via Mark Miller)
1877 * LUCENE-1708 - IndexReader.document() no longer checks if the document is
1878 deleted. You can call IndexReader.isDeleted(n) prior to calling document(n).
1879 (Shai Erera via Mike McCandless)
1882 Changes in runtime behavior
1884 * LUCENE-1424: QueryParser now by default uses constant score auto
1885 rewriting when it generates a WildcardQuery and PrefixQuery (it
1886 already does so for TermRangeQuery, as well). Call
1887 setMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE)
1888 to revert to slower BooleanQuery rewriting method. (Mark Miller via Mike
1891 * LUCENE-1575: As of 2.9, the core collectors as well as
1892 IndexSearcher's search methods that return top N results, no
1893 longer filter documents with scores <= 0.0. If you rely on this
1894 functionality you can use PositiveScoresOnlyCollector like this:
1897 TopDocsCollector tdc = TopScoreDocCollector.create(10, false /* docsScoredInOrder */);
1898 Collector c = new PositiveScoresOnlyCollector(tdc);
1899 searcher.search(query, c);
1900 TopDocs hits = tdc.topDocs();
1904 * LUCENE-1604: IndexReader.norms(String field) is now allowed to
1905 return null if the field has no norms, as long as you've
1906 previously called IndexReader.setDisableFakeNorms(true). This
1907 setting now defaults to false (to preserve the fake norms back
1908 compatible behavior) but in 3.0 will be hardwired to true. (Shon
1909 Vella via Mike McCandless).
1911 * LUCENE-1624: If you open IndexWriter with create=true and
1912 autoCommit=false on an existing index, IndexWriter no longer
1913 writes an empty commit when it's created. (Paul Taylor via Mike
1916 * LUCENE-1593: When you call Sort() or Sort.setSort(String field,
1917 boolean reverse), the resulting SortField array no longer ends
1918 with SortField.FIELD_DOC (it was unnecessary as Lucene breaks ties
1919 internally by docID). (Shai Erera via Michael McCandless)
1921 * LUCENE-1542: When the first token(s) have 0 position increment,
1922 IndexWriter used to incorrectly record the position as -1, if no
1923 payload is present, or Integer.MAX_VALUE if a payload is present.
1924 This causes positional queries to fail to match. The bug is now
1925 fixed, but if your app relies on the buggy behavior then you must
1926 call IndexWriter.setAllowMinus1Position(). That API is deprecated
1927 so you must fix your application, and rebuild your index, to not
1928 rely on this behavior by the 3.0 release of Lucene. (Jonathan
1929 Mamou, Mark Miller via Mike McCandless)
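How a leading zero position increment could produce position -1 can be modelled in a few lines. This is a deliberately simplified model with hypothetical names, assuming positions are accumulated from a starting value of -1; it is not the IndexWriter code, and it does not model the Integer.MAX_VALUE payload case.

```java
import java.util.Arrays;

final class PositionSketch {
    // Accumulates token positions from position increments. With
    // allowMinus1 == true the old behavior is kept; otherwise a negative
    // position is clamped to 0.
    static int[] positions(int[] increments, boolean allowMinus1) {
        int[] out = new int[increments.length];
        int pos = -1;
        for (int i = 0; i < increments.length; i++) {
            pos += increments[i];
            if (pos < 0 && !allowMinus1) {
                pos = 0;
            }
            out[i] = pos;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] increments = {0, 1, 1};
        System.out.println(Arrays.toString(positions(increments, true)));  // prints [-1, 0, 1]
        System.out.println(Arrays.toString(positions(increments, false))); // prints [0, 1, 2]
    }
}
```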
1932 * LUCENE-1715: Finalizers have been removed from the 4 core classes
1933 that still had them, since they will cause GC to take longer, thus
1934 tying up memory for longer, and at best they mask buggy app code.
1935 DirectoryReader (returned from IndexReader.open) & IndexWriter
1936 previously released the write lock during finalize.
1937 SimpleFSDirectory.FSIndexInput closed the descriptor in its
1938 finalizer, and NativeFSLock released the lock. It's possible
1939 applications will be affected by this, but only if the application
1940 is failing to close reader/writers. (Brian Groose via Mike
1943 * LUCENE-1717: Fixed IndexWriter to account for RAM usage of
1944 buffered deletions. (Mike McCandless)
1946 * LUCENE-1727: Ensure that fields are stored & retrieved in the
1947 exact order in which they were added to the document. This was
1948 true in all Lucene releases before 2.3, but was broken in 2.3 and
1949 2.4, and is now fixed in 2.9. (Mike McCandless)
1951 * LUCENE-1678: The addition of Analyzer.reusableTokenStream
1952 accidentally broke back compatibility of external analyzers that
1953 subclassed core analyzers that implemented tokenStream but not
1954 reusableTokenStream. This is now fixed, such that if
1955 reusableTokenStream is invoked on such a subclass, that method
1956 will forcefully fallback to tokenStream. (Mike McCandless)
1958 * LUCENE-1801: Token.clear() and Token.clearNoTermBuffer() now also clear
1959 startOffset, endOffset and type. This is not likely to affect any
1960 Tokenizer chains, as Tokenizers normally always set these three values.
1961 This change was made to conform to the new AttributeImpl.clear() and
1962 AttributeSource.clearAttributes(), so that clearing works identically whether Token
1963 is used as the single AttributeImpl or the 6 separate AttributeImpls are used. (Uwe Schindler, Michael Busch)
1965 * LUCENE-1483: When searching over multiple segments, a new Scorer is now created
1966 for each segment. Searching has been telescoped out a level and IndexSearcher now
1967 operates much like MultiSearcher does. The Weight is created only once for the top
1968 level Searcher, but each Scorer is passed a per-segment IndexReader. This will
1969 result in doc ids in the Scorer being internal to the per-segment IndexReader. It
1970 has always been outside of the API to count on a given IndexReader to contain every
1971 doc id in the index - and if you have been ignoring MultiSearcher in your custom code
1972 and counting on this fact, you will find your code no longer works correctly. If a
1973 custom Scorer implementation uses any caches/filters that rely on being based on the
1974 top level IndexReader, it will need to be updated to correctly use contextless
1975 caches/filters eg you can't count on the IndexReader to contain any given doc id or
1976 all of the doc ids. (Mark Miller, Mike McCandless)
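The relationship between segment-local and index-wide doc ids can be sketched with a plain array of segment start offsets. The names below are hypothetical; the array plays the role of the per-segment doc starts that a multi-segment searcher keeps.

```java
final class DocBaseSketch {
    // docStarts[i] is the index-wide doc id of the first document in segment i.
    static int toGlobal(int[] docStarts, int segment, int localDoc) {
        return docStarts[segment] + localDoc;
    }

    // Finds the segment whose range contains the given index-wide doc id.
    static int segmentOf(int[] docStarts, int globalDoc) {
        int segment = 0;
        for (int i = 0; i < docStarts.length; i++) {
            if (docStarts[i] <= globalDoc) {
                segment = i;
            }
        }
        return segment;
    }

    public static void main(String[] args) {
        int[] docStarts = {0, 100, 250};
        System.out.println(toGlobal(docStarts, 1, 5)); // prints 105
        System.out.println(segmentOf(docStarts, 249)); // prints 1
    }
}
```

A cache keyed on the top-level reader would be indexed by the index-wide id, while a Scorer now sees only the segment-local id, which is why such caches need to be rebuilt per segment or offset by the segment's start.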
1978 * LUCENE-1846: DateTools now uses the US locale to format the numbers in its
1979 date/time strings instead of the default locale. For most locales there will
1980 be no change in the index format, as DateFormatSymbols is using ASCII digits.
1981 The usage of the US locale is important to guarantee correct ordering of
1982 generated terms. (Uwe Schindler)
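The effect of pinning the locale can be reproduced with SimpleDateFormat. The pattern and helper below are assumptions for illustration, not the DateTools implementation; with Locale.US the digits are always ASCII, so the resulting strings sort in chronological order.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class DateLocaleSketch {
    // Formats a date as a sortable string using the digits of the given locale.
    static String format(Date date, Locale locale) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmss", locale);
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        return fmt.format(date);
    }

    public static void main(String[] args) {
        System.out.println(format(new Date(0L), Locale.US)); // prints 19700101000000
    }
}
```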
1984 * LUCENE-1860: MultiTermQuery now defaults to
1985 CONSTANT_SCORE_AUTO_REWRITE_DEFAULT rewrite method (previously it
1986 was SCORING_BOOLEAN_QUERY_REWRITE). This means that PrefixQuery
1987 and WildcardQuery will now produce constant score for all matching
1988 docs, equal to the boost of the query. (Mike McCandless)
1992 * LUCENE-1419: Add expert API to set custom indexing chain. This API is
1993 package-protected for now, so we don't have to officially support it.
1994 Yet, it will give us the possibility to try out different consumers
1995 in the chain. (Michael Busch)
1997 * LUCENE-1427: DocIdSet.iterator() is now allowed to throw
1998 IOException. (Paul Elschot, Mike McCandless)
2000 * LUCENE-1422, LUCENE-1693: New TokenStream API that uses a new class called
2001 AttributeSource instead of the Token class, which is now a utility class that
2002 holds common Token attributes. All attributes that the Token class had have
2003 been moved into separate classes: TermAttribute, OffsetAttribute,
2004 PositionIncrementAttribute, PayloadAttribute, TypeAttribute and FlagsAttribute.
2005 The new API is much more flexible; it allows combining the Attributes
2006 arbitrarily and also defining custom Attributes. The new API has the same
2007 performance as the old next(Token) approach. For conformance with this new
2008 API Tee-/SinkTokenizer was deprecated and replaced by a new TeeSinkTokenFilter.
2009 (Michael Busch, Uwe Schindler; additional contributions and bug fixes by
2010 Daniel Shane, Doron Cohen)
2012 * LUCENE-1467: Add nextDoc() and next(int) methods to OpenBitSetIterator.
2013 These methods can be used to avoid additional calls to doc().
2016 * LUCENE-1468: Deprecate Directory.list(), which sometimes (in
2017 FSDirectory) filters out files that don't look like index files, in
2018 favor of new Directory.listAll(), which does no filtering. Also,
2019 listAll() will never return null; instead, it throws an IOException
2020 (or subclass). Specifically, FSDirectory.listAll() will throw the
2021 newly added NoSuchDirectoryException if the directory does not
2022 exist. (Marcel Reutegger, Mike McCandless)
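The "never return null" contract can be sketched on top of java.io.File, whose list() method returns null when the path is not a directory or cannot be read. The helper is hypothetical, and FileNotFoundException stands in for NoSuchDirectoryException.

```java
import java.io.File;
import java.io.FileNotFoundException;

final class ListAllSketch {
    // Returns the directory entries, or throws instead of returning null.
    static String[] listAll(File dir) throws FileNotFoundException {
        String[] names = dir.list();
        if (names == null) {
            throw new FileNotFoundException("directory '" + dir + "' does not exist or cannot be listed");
        }
        return names;
    }

    public static void main(String[] args) {
        try {
            listAll(new File("no-such-directory-for-sketch"));
        } catch (FileNotFoundException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```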
2024 * LUCENE-1546: Add IndexReader.flush(Map commitUserData), allowing
2025 you to record an opaque commitUserData (maps String -> String) into
2026 the commit written by IndexReader. This matches IndexWriter's
2027 commit methods. (Jason Rutherglen via Mike McCandless)
2029 * LUCENE-652: Added org.apache.lucene.document.CompressionTools, to
2030 enable compressing & decompressing binary content, external to
2031 Lucene's indexing. Deprecated Field.Store.COMPRESS.
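Compressing content outside the index amounts to a deflate/inflate round trip on a byte[]. The sketch below uses only java.util.zip with hypothetical helper names; it illustrates the idea and does not reproduce the CompressionTools API.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class CompressSketch {
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] decompress(byte[] input) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or invalid input");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException(e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] original = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa".getBytes();
        byte[] packed = compress(original);
        System.out.println(new String(decompress(packed)).equals(new String(original))); // prints true
    }
}
```

The compressed bytes would then be stored as an ordinary binary field, and the application has to decompress them itself after loading the document.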
2033 * LUCENE-1561: Renamed Field.omitTf to Field.omitTermFreqAndPositions
2034 (Otis Gospodnetic via Mike McCandless)
2036 * LUCENE-1500: Added new InvalidTokenOffsetsException to Highlighter methods
2037 to denote issues when offsets in TokenStream tokens exceed the length of the
2038 provided text. (Mark Harwood)
2040 * LUCENE-1575, LUCENE-1483: HitCollector is now deprecated in favor of
2041 a new Collector abstract class. For easy migration, people can use
2042 HitCollectorWrapper which translates (wraps) HitCollector into
2043 Collector. Note that this class is also deprecated and will be
2044 removed when HitCollector is removed. Also TimeLimitedCollector
2045 is deprecated in favor of the new TimeLimitingCollector which
2046 extends Collector. (Shai Erera, Mark Miller, Mike McCandless)
2048 * LUCENE-1592: The method TermEnum.skipTo() was deprecated, because
2049 it is used nowhere in core/contrib and there is only a very inefficient
2050 default implementation available. If you want to position a TermEnum
2051 to another Term, create a new one using IndexReader.terms(Term).
2054 * LUCENE-1621: MultiTermQuery.getTerm() has been deprecated as it does
2055 not make sense for all subclasses of MultiTermQuery. Check individual
2056 subclasses to see if they support getTerm(). (Mark Miller)
2058 * LUCENE-1636: Make TokenFilter.input final so it's set only
2059 once. (Wouter Heijke, Uwe Schindler via Mike McCandless).
2061 * LUCENE-1658, LUCENE-1451: Renamed FSDirectory to SimpleFSDirectory
2062 (but left an FSDirectory base class). Added an FSDirectory.open
2063 static method to pick a good default FSDirectory implementation
2064 given the OS. FSDirectories should now be instantiated using
2065 FSDirectory.open or with public constructors rather than
2066 FSDirectory.getDirectory(), which has been deprecated.
2067 (Michael McCandless, Uwe Schindler, yonik)
2069 * LUCENE-1665: Deprecate SortField.AUTO, to be removed in 3.0.
2070 Instead, when sorting by field, the application should explicitly
2071 state the type of the field. (Mike McCandless)
2073 * LUCENE-1660: StopFilter, StandardAnalyzer, StopAnalyzer now
2074 require up front specification of enablePositionIncrement (Mike
2077 * LUCENE-1614: DocIdSetIterator's next() and skipTo() were deprecated in favor
2078 of the new nextDoc() and advance(). The new methods return the doc Id they
2079 landed on, saving an extra call to doc() in most cases.
2080 For easy migration of the code, you can change the calls to next() to
2081 nextDoc() != DocIdSetIterator.NO_MORE_DOCS and similarly for skipTo().
2082 However it is advised that you take advantage of the returned doc ID and not
2083 call doc() following those two.
2084 Also, doc() was deprecated in favor of docID(). docID() should return -1 or
2085 NO_MORE_DOCS if nextDoc/advance were not called yet, or NO_MORE_DOCS if the
2086 iterator is exhausted. Otherwise it should return the current doc ID.
2087 (Shai Erera via Mike McCandless)
2089 * LUCENE-1672: All ctors/opens and other methods using String/File to
2090 specify the directory in IndexReader, IndexWriter, and IndexSearcher
2091 were deprecated. You should instantiate the Directory manually first
2092 and then pass it to these classes (LUCENE-1451, LUCENE-1658).
2095 * LUCENE-1407: Move RemoteSearchable, RemoteCachingWrapperFilter out
2096 of Lucene's core into new contrib/remote package. Searchable no
2097 longer extends java.rmi.Remote (Simon Willnauer via Mike
2100 * LUCENE-1677: The global property
2101 org.apache.lucene.SegmentReader.class, and
2102 ReadOnlySegmentReader.class are now deprecated, to be removed in
2103 3.0. src/gcj/* has been removed. (Earwin Burrfoot via Mike
2106 * LUCENE-1673: Deprecated NumberTools in favour of the new
2107 NumericRangeQuery and its new indexing format for numeric or
2108 date values. (Uwe Schindler)
2110 * LUCENE-1630, LUCENE-1771: Weight is now an abstract class, and adds
2111 a scorer(IndexReader, boolean /* scoreDocsInOrder */, boolean /*
2112 topScorer */) method instead of scorer(IndexReader). IndexSearcher uses
2113 this method to obtain a scorer matching the capabilities of the Collector
2114 with respect to the ordering of docIDs. Some Scorers (like BooleanScorer) are much more
2115 efficient if out-of-order document scoring is allowed by a Collector.
2116 Collector must now implement acceptsDocsOutOfOrder. If you write a
2117 Collector which does not care about doc ID ordering, it is recommended
2118 that you return true. Weight has a scoresDocsOutOfOrder method, which by
2119 default returns false. If you create a Weight which will score documents
2120 out of order if requested, you should override that method to return true.
2121 BooleanQuery's setAllowDocsOutOfOrder and getAllowDocsOutOfOrder have been
2122 deprecated as they are not needed anymore. BooleanQuery will now score docs
2123 out of order when used with a Collector that can accept docs out of order.
2124 Finally, Weight#explain now takes a sub-reader and sub-docID, rather than
2125 a top level reader and docID.
2126 (Shai Erera, Chris Hostetter, Martin Ruckli, Mark Miller via Mike McCandless)
2128 * LUCENE-1466, LUCENE-1906: Added CharFilter and MappingCharFilter, which allow
2129 chaining & mapping of characters before tokenizers run. CharStream (subclass of
2130 Reader) is the base class for custom java.io.Readers that support offset
2131 correction. Tokenizers got an additional method correctOffset() that is passed
2132 down to the underlying CharStream if input is a subclass of CharStream/-Filter.
2133 (Koji Sekiguchi via Mike McCandless, Uwe Schindler)
2135 * LUCENE-1703: Add IndexWriter.waitForMerges. (Tim Smith via Mike
2138 * LUCENE-1625: CheckIndex's programmatic API now returns separate
2139 classes detailing the status of each component in the index, and
2140 includes more detailed status than previously. (Tim Smith via
2143 * LUCENE-1713: Deprecated RangeQuery and RangeFilter and renamed to
2144 TermRangeQuery and TermRangeFilter. TermRangeQuery is in constant
2145 score auto rewrite mode by default. The new classes also have new
2146 ctors taking field and term ranges as Strings (see also
2147 LUCENE-1424). (Uwe Schindler)
2149 * LUCENE-1609: The termInfosIndexDivisor must now be specified
2150 up-front when opening the IndexReader. Attempts to call
2151 IndexReader.setTermInfosIndexDivisor will hit an
2152 UnsupportedOperationException. This was done to enable removal of
2153 all synchronization in TermInfosReader, which previously could
2154 cause threads to pile up in certain cases. (Dan Rosher via Mike
2157 * LUCENE-1688: Deprecate static final String stop word array in
2158 StopAnalyzer and replace it with an immutable implementation of
2159 CharArraySet. (Simon Willnauer via Mark Miller)
2161 * LUCENE-1742: SegmentInfos, SegmentInfo and SegmentReader have been
2162 made public as expert, experimental APIs. These APIs may suddenly
2163 change from release to release (Jason Rutherglen via Mike
2166 * LUCENE-1754: QueryWeight.scorer() can return null if no documents
2167 are going to be matched by the query. Similarly,
2168 Filter.getDocIdSet() can return null if no documents are going to
2169 be accepted by the Filter. Note that these methods 'can' return null;
2170 they don't have to, and may instead return a Scorer that matches no
2171 documents or a DocIdSet that rejects all documents. This is already the
2172 behavior of some QueryWeight/Filter implementations, and is
2173 documented here just for emphasis. (Shai Erera via Mike
2176 * LUCENE-1705: Added IndexWriter.deleteAllDocuments. (Tim Smith via
2179 * LUCENE-1460: Changed TokenStreams/TokenFilters in contrib to
2180 use the new TokenStream API. (Robert Muir, Michael Busch)
2182 * LUCENE-1748: LUCENE-1001 introduced PayloadSpans, but this was a back
2183 compat break and caused custom SpanQuery implementations to fail at runtime
2184 in a variety of ways. This issue attempts to remedy things by causing
2185 a compile time break on custom SpanQuery implementations and removing
2186 the PayloadSpans class, with its functionality now moved to Spans. To
2187 help in alleviating future back compat pain, Spans has been changed from
2188 an interface to an abstract class.
2189 (Hugh Cayless, Mark Miller)
2191 * LUCENE-1808: Query.createWeight has been changed from protected to
2192 public. (Tim Smith, Shai Erera via Mark Miller)
2194 * LUCENE-1826: Add constructors that take AttributeSource and
2195 AttributeFactory to all Tokenizer implementations.
2198 * LUCENE-1847: Similarity#idf for both a Term and Term Collection have
2199 been deprecated. New versions that return an IDFExplanation have been
2200 added. (Yasoja Seneviratne, Mike McCandless, Mark Miller)
2202 * LUCENE-1877: Made NativeFSLockFactory the default for
2203 the new FSDirectory API (open(), FSDirectory subclass ctors).
2204 All FSDirectory system properties were deprecated and all lock
2205 implementations use no lock prefix if the locks are stored inside
2206 the index directory. Because the deprecated String/File ctors of
2207 IndexWriter and IndexReader (LUCENE-1672) and FSDirectory.getDirectory()
2208 still use the old SimpleFSLockFactory while the new API uses
2209 NativeFSLockFactory, we strongly recommend not mixing the deprecated
2210 and new API. (Uwe Schindler, Mike McCandless)
2212 * LUCENE-1911: Added a new method isCacheable() to DocIdSet. This method
2213 should return true, if the underlying implementation does not use disk
2214 I/O and is fast enough to be directly cached by CachingWrapperFilter.
2215 OpenBitSet, SortedVIntList, and DocIdBitSet are such candidates.
2216 The default implementation of the abstract DocIdSet class returns false.
2217 In this case, CachingWrapperFilter copies the DocIdSetIterator into an
2218 OpenBitSet for caching. (Uwe Schindler, Thomas Becker)
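The LUCENE-1614 migration described in this section can be illustrated with a
small, self-contained sketch. It does not use Lucene itself: SortedArrayIterator
is a hypothetical stand-in that follows the nextDoc()/advance()/docID() contract
described in that entry, and the value chosen for its NO_MORE_DOCS sentinel is an
assumption of the sketch.

```java
public class DocIdIterationSketch {

    /** Hypothetical stand-in (not a Lucene class) following the contract described in LUCENE-1614. */
    static final class SortedArrayIterator {
        /** Sentinel meaning "iterator exhausted"; the value is an assumption of this sketch. */
        static final int NO_MORE_DOCS = Integer.MAX_VALUE;

        private final int[] docs;   // doc IDs in ascending order
        private int pos = -1;
        private int current = -1;   // -1 until nextDoc()/advance() is first called

        SortedArrayIterator(int[] sortedDocs) {
            this.docs = sortedDocs;
        }

        /** Returns the current doc ID without moving the iterator. */
        int docID() {
            return current;
        }

        /** Moves to the next doc and returns it, so no follow-up doc() call is needed. */
        int nextDoc() {
            if (current == NO_MORE_DOCS) {
                return current;
            }
            pos++;
            current = pos < docs.length ? docs[pos] : NO_MORE_DOCS;
            return current;
        }

        /** Moves forward to the first doc ID >= target and returns it. */
        int advance(int target) {
            int doc;
            do {
                doc = nextDoc();
            } while (doc < target);
            return doc;
        }
    }

    /** Old style: while (it.next()) { use(it.doc()); }  New style: test the returned doc ID. */
    static int sumDocs(int[] sortedDocs) {
        SortedArrayIterator it = new SortedArrayIterator(sortedDocs);
        int total = 0;
        int doc;
        while ((doc = it.nextDoc()) != SortedArrayIterator.NO_MORE_DOCS) {
            total += doc;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumDocs(new int[] {1, 3, 7})); // prints 11
    }
}
```

The loop in sumDocs() uses the returned doc ID directly, which is the pattern
the entry recommends over calling doc() after each step.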
2222 * LUCENE-1415: MultiPhraseQuery has incorrect hashCode() and equals()
2223 implementation - Leads to Solr Cache misses.
2224 (Todd Feak, Mark Miller via yonik)
2226 * LUCENE-1327: Fix TermSpans#skipTo() to behave as specified in javadocs
2227 of Terms#skipTo(). (Michael Busch)
2229 * LUCENE-1573: Do not ignore InterruptedException (caused by
2230 Thread.interrupt()) nor enter deadlock/spin loop. Now, an interrupt
2231 will cause a RuntimeException to be thrown. In 3.0 we will change
2232 public APIs to throw InterruptedException. (Jeremy Volkman via
2235 * LUCENE-1590: Fixed stored-only Field instances so they do not change the
2236 value of omitNorms, omitTermFreqAndPositions in FieldInfo; when you
2237 retrieve such fields they will now have omitNorms=true and
2238 omitTermFreqAndPositions=false (though these values are unused).
2239 (Uwe Schindler via Mike McCandless)
2241 * LUCENE-1587: RangeQuery#equals() could consider a RangeQuery
2242 without a collator equal to one with a collator.
2243 (Mark Platvoet via Mark Miller)
2245 * LUCENE-1600: Don't call String.intern unnecessarily in some cases
2246 when loading documents from the index. (P Eger via Mike
2249 * LUCENE-1611: Fix case where OutOfMemoryError in IndexWriter
2250 could cause "infinite merging" to happen. (Christiaan Fluit via
2253 * LUCENE-1623: Properly handle back-compatibility of 2.3.x indexes that
2254 contain field names with non-ascii characters. (Mike Streeton via
2257 * LUCENE-1593: MultiSearcher and ParallelMultiSearcher did not break ties (in
2258 sort) by doc Id in a consistent manner (i.e., if Sort.FIELD_DOC was used vs.
2259 when it wasn't). (Shai Erera via Michael McCandless)
2261 * LUCENE-1647: Fix case where IndexReader.undeleteAll would cause
2262 the segment's deletion count to be incorrect. (Mike McCandless)
2264 * LUCENE-1542: When the first token(s) have 0 position increment,
2265 IndexWriter used to incorrectly record the position as -1, if no
2266 payload is present, or Integer.MAX_VALUE if a payload is present.
2267 This caused positional queries to fail to match. The bug is now
2268 fixed, but if your app relies on the buggy behavior then you must
2269 call IndexWriter.setAllowMinus1Position(). That API is deprecated
2270 so you must fix your application, and rebuild your index, to not
2271 rely on this behavior by the 3.0 release of Lucene. (Jonathan
2272 Mamou, Mark Miller via Mike McCandless)
2274 * LUCENE-1658: Fixed MMapDirectory to correctly throw IOExceptions
2275 on EOF, removed numeric overflow possibilities and added support
2276 for a hack to unmap the buffers on closing IndexInput.
2279 * LUCENE-1681: Fix infinite loop caused by a call to DocValues methods
2280 getMinValue, getMaxValue, getAverageValue. (Simon Willnauer via Mark Miller)
2282 * LUCENE-1599: Add clone support for SpanQuerys. SpanRegexQuery counts
2283 on this functionality and does not work correctly without it.
2284 (Billow Gao, Mark Miller)
2286 * LUCENE-1718: Fix termInfosIndexDivisor to carry over to reopened
2287 readers (Mike McCandless)
2289 * LUCENE-1583: SpanOrQuery skipTo() doesn't always move forwards as Spans
2290 documentation indicates it should. (Moti Nisenson via Mark Miller)
2292 * LUCENE-1566: Sun JVM Bug
2293 http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6478546 causes
2294 invalid OutOfMemoryError when reading too many bytes at once from
2295 a file on 32bit JVMs that have a large maximum heap size. This
2296 fix adds set/getReadChunkSize to FSDirectory so that large reads
2297 are broken into chunks, to work around this JVM bug. On 32bit
2298 JVMs the default chunk size is 100 MB; on 64bit JVMs, which don't
2299 show the bug, the default is Integer.MAX_VALUE. (Simon Willnauer
2300 via Mike McCandless)
2302 * LUCENE-1448: Added TokenStream.end() to perform end-of-stream
2303 operations (i.e., to return the end offset of the tokenization).
2304 This is important when multiple fields with the same name are added
2305 to a document, to ensure offsets recorded in term vectors for all
2306 of the instances are correct.
2307 (Mike McCandless, Mark Miller, Michael Busch)
2309 * LUCENE-1805: CloseableThreadLocal did not allow a null Object in get(),
2310 although it does allow it in set(Object). Fix get() to not assert the object
2311 is not null. (Shai Erera via Mike McCandless)
2313 * LUCENE-1801: Changed all Tokenizers or TokenStreams (in core/contrib)
2314 that are the source of Tokens to always call
2315 AttributeSource.clearAttributes() first. (Uwe Schindler)
2317 * LUCENE-1819: MatchAllDocsQuery.toString(field) should produce output
2318 that is parsable by the QueryParser. (John Wang, Mark Miller)
2320 * LUCENE-1836: Fix localization bug in the new query parser and add
2321 new LocalizedTestCase as base class for localization junit tests.
2322 (Robert Muir, Uwe Schindler via Michael Busch)
2324 * LUCENE-1847: PhraseQuery/TermQuery/SpanQuery use IndexReader specific stats
2325 in their Weight#explain methods - these stats should be corpus wide.
2326 (Yasoja Seneviratne, Mike McCandless, Mark Miller)
2328 * LUCENE-1885: Fix the bug that NativeFSLock.isLocked() did not work,
2329 if the lock was obtained by another NativeFSLock(Factory) instance.
2330 Because of this IndexReader.isLocked() and IndexWriter.isLocked() did
2331 not work correctly. (Uwe Schindler)
2333 * LUCENE-1899: Fix O(N^2) CPU cost when setting docIDs in order in an
2334 OpenBitSet, due to an inefficiency in how the underlying storage is
2335 reallocated. (Nadav Har'El via Mike McCandless)
2337 * LUCENE-1918: Fixed cases where a ParallelReader would
2338 generate exceptions on being passed to
2339 IndexWriter.addIndexes(IndexReader[]). First case was when the
2340 ParallelReader was empty. Second case was when the ParallelReader
2341 used to contain documents with TermVectors, but all such documents
2342 have been deleted. (Christian Kohlschütter via Mike McCandless)
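The chunked-read workaround described in LUCENE-1566 can be sketched in
isolation. This is not the FSDirectory code: readInChunks and countReads are
hypothetical helpers, and the sketch reads from a plain java.io.InputStream
rather than an index file. It only shows the idea of capping each read request
at a configurable chunk size.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ChunkedReadSketch {

    /**
     * Fills dest from the stream, never requesting more than chunkSize bytes
     * in a single read call. Returns the number of read calls that were made.
     */
    static int readInChunks(InputStream in, byte[] dest, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int offset = 0;
        int reads = 0;
        while (offset < dest.length) {
            int toRead = Math.min(chunkSize, dest.length - offset);
            int n = in.read(dest, offset, toRead);
            if (n < 0) {
                throw new EOFException("stream ended after " + offset + " bytes");
            }
            offset += n;
            reads++;
        }
        return reads;
    }

    /** Convenience wrapper for in-memory data; wraps checked I/O errors. */
    static int countReads(byte[] src, byte[] dest, int chunkSize) {
        try {
            return readInChunks(new ByteArrayInputStream(src), dest, chunkSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] src = new byte[10];
        for (int i = 0; i < src.length; i++) {
            src[i] = (byte) i;
        }
        byte[] dest = new byte[10];
        System.out.println(countReads(src, dest, 4)); // prints 3 (reads of 4, 4 and 2 bytes)
    }
}
```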
2346 * LUCENE-1411: Added expert API to open an IndexWriter on a prior
2347 commit, obtained from IndexReader.listCommits. This makes it
2348 possible to rollback changes to an index even after you've closed
2349 the IndexWriter that made the changes, assuming you are using an
2350 IndexDeletionPolicy that keeps past commits around. This is useful
2351 when building transactional support on top of Lucene. (Mike
2354 * LUCENE-1382: Add an optional arbitrary Map (String -> String)
2355 "commitUserData" to IndexWriter.commit(), which is stored in the
2356 segments file and is then retrievable via
2357 IndexReader.getCommitUserData instance and static methods.
2358 (Shalin Shekhar Mangar via Mike McCandless)
2360 * LUCENE-1420: Similarity now has a computeNorm method that allows
2361 custom Similarity classes to override how norm is computed. It's
2362 provided a FieldInvertState instance that contains details from
2363 inverting the field. The default impl is boost *
2364 lengthNorm(numTerms), to be backwards compatible. Also added
2365 {set/get}DiscountOverlaps to DefaultSimilarity, to control whether
2366 overlapping tokens (tokens with 0 position increment) should be
2367 counted in lengthNorm. (Andrzej Bialecki via Mike McCandless)
2369 * LUCENE-1424: Moved constant score query rewrite capability into
2370 MultiTermQuery, allowing TermRangeQuery, PrefixQuery and WildcardQuery
2371 to switch between constant-score rewriting or BooleanQuery
2372 expansion rewriting via a new setRewriteMethod method.
2373 Deprecated ConstantScoreRangeQuery (Mark Miller via Mike
2376 * LUCENE-1461: Added FieldCacheRangeFilter, a RangeFilter for
2377 single-term fields that uses FieldCache to compute the filter. If
2378 your documents all have a single term for a given field, and you
2379 need to create many RangeFilters with varying lower/upper bounds,
2380 then this is likely a much faster way to create the filters than
2381 RangeFilter. FieldCacheRangeFilter allows ranges on all data types
2382 that FieldCache supports (term ranges, byte, short, int, long, float, double).
2383 However, it comes at the expense of added RAM consumption and slower
2384 first-time usage due to populating the FieldCache. It also does not
2385 support collation (Tim Sturge, Matt Ericson via Mike McCandless and
2388 * LUCENE-1296: add protected method CachingWrapperFilter.docIdSetToCache
2389 to allow subclasses to choose which DocIdSet implementation to use
2390 (Paul Elschot via Mike McCandless)
2392 * LUCENE-1390: Added ASCIIFoldingFilter, a Filter that converts
2393 alphabetic, numeric, and symbolic Unicode characters which are not in
2394 the first 127 ASCII characters (the "Basic Latin" Unicode block) into
2395 their ASCII equivalents, if one exists. ISOLatin1AccentFilter, which
2396 handles a subset of this filter, has been deprecated.
2397 (Andi Vajda, Steven Rowe via Mark Miller)
2399 * LUCENE-1478: Added new SortField constructor allowing you to
2400 specify a custom FieldCache parser to generate numeric values from
2401 terms for a field. (Uwe Schindler via Mike McCandless)
2403 * LUCENE-1528: Add support for Ideographic Space to the queryparser.
2404 (Luis Alves via Michael Busch)
2406 * LUCENE-1487: Added FieldCacheTermsFilter, to filter by multiple
2407 terms on single-valued fields. The filter loads the FieldCache
2408 for the field the first time it's called, and subsequent usage of
2409 that field, even with different Terms in the filter, are fast.
2410 (Tim Sturge, Shalin Shekhar Mangar via Mike McCandless).
2412 * LUCENE-1314: Add clone(), clone(boolean readOnly) and
2413 reopen(boolean readOnly) to IndexReader. Cloning an IndexReader
2414 gives you a new reader which you can make changes to (deletions,
2415 norms) without affecting the original reader. Now, with clone or
2416 reopen you can change the readOnly of the original reader. (Jason
2417 Rutherglen, Mike McCandless)
2419 * LUCENE-1506: Added FilteredDocIdSet, an abstract class which you
2420 subclass to implement the "match" method to accept or reject each
2421 docID. Unlike ChainedFilter (under contrib/misc),
2422 FilteredDocIdSet never requires you to materialize the full
2423 bitset. Instead, match() is called on demand per docID. (John
2424 Wang via Mike McCandless)
2426 * LUCENE-1398: Add ReverseStringFilter to contrib/analyzers, a filter
2427 to reverse the characters in each token. (Koji Sekiguchi via yonik)
2429 * LUCENE-1551: Add expert IndexReader.reopen(IndexCommit) to allow
2430 efficiently opening a new reader on a specific commit, sharing
2431 resources with the original reader. (Torin Danil via Mike
2434 * LUCENE-1434: Added org.apache.lucene.util.IndexableBinaryStringTools,
2435 to encode byte[] as String values that are valid terms, and
2436 maintain sort order of the original byte[] when the bytes are
2437 interpreted as unsigned. (Steven Rowe via Mike McCandless)
2439 * LUCENE-1543: Allow MatchAllDocsQuery to optionally use norms from
2440 a specific field to set the score for a document. (Karl Wettin
2441 via Mike McCandless)
2443 * LUCENE-1586: Add IndexReader.getUniqueTermCount(). (Mike
2444 McCandless via Derek)
2446 * LUCENE-1516: Added "near real-time search" to IndexWriter, via a
2447 new expert getReader() method. This method returns a reader that
2448 searches the full index, including any uncommitted changes in the
2449 current IndexWriter session. This should result in a faster
2450 turnaround than the normal approach of committing the changes and
2451 then reopening a reader. (Jason Rutherglen via Mike McCandless)
2453 * LUCENE-1603: Added new MultiTermQueryWrapperFilter, to wrap any
2454 MultiTermQuery as a Filter. Also made some improvements to
2455 MultiTermQuery: return DocIdSet.EMPTY_DOCIDSET if there are no
2456 terms in the enum; track the total number of terms it visited
2457 during rewrite (getTotalNumberOfTerms). FilteredTermEnum is also
2458 more friendly to subclassing. (Uwe Schindler via Mike McCandless)
2460 * LUCENE-1605: Added BitVector.subset(). (Jeremy Volkman via Mike
2463 * LUCENE-1618: Added FileSwitchDirectory that enables files with
2464 specified extensions to be stored in a primary directory and the
2465 rest of the files to be stored in the secondary directory. For
2466 example, this can be useful for the large doc-store (stored
2467 fields, term vectors) files in FSDirectory and the rest of the
2468 index files in a RAMDirectory. (Jason Rutherglen via Mike
2471 * LUCENE-1494: Added FieldMaskingSpanQuery which can be used to
2472 cross-correlate Spans from different fields.
2473 (Paul Cowan and Chris Hostetter)
2475 * LUCENE-1634: Add calibrateSizeByDeletes to LogMergePolicy, to take
2476 deletions into account when considering merges. (Yasuhiro Matsuda
2477 via Mike McCandless)
2479 * LUCENE-1550: Added new n-gram based String distance measure for spell checking.
2480 See the Javadocs for NGramDistance.java for a reference paper on why
2481 this is helpful (Tom Morton via Grant Ingersoll)
2483 * LUCENE-1470, LUCENE-1582, LUCENE-1602, LUCENE-1673, LUCENE-1701, LUCENE-1712:
2484 Added NumericRangeQuery and NumericRangeFilter, a fast alternative to
2485 RangeQuery/RangeFilter for numeric searches. They depend on a specific
2486 structure of terms in the index that can be created by indexing
2487 using the new NumericField or NumericTokenStream classes. NumericField
2488 can only be used for indexing and optionally stores the values as
2489 string representation in the doc store. Documents returned from
2490 IndexReader/IndexSearcher will return only the String value using
2491 the standard Fieldable interface. NumericFields can be sorted on
2492 and loaded into the FieldCache. (Uwe Schindler, Yonik Seeley,
2495 * LUCENE-1405: Added support for Ant resource collections in contrib/ant
2496 <index> task. (Przemyslaw Sztoch via Erik Hatcher)
2498 * LUCENE-1699: Allow setting a TokenStream on Field/Fieldable for indexing
2499 in conjunction with any other ways to specify stored field values,
2500 currently binary or string values. (yonik)
2502 * LUCENE-1701: Made the standard FieldCache.Parsers public and added
2503 parsers for fields generated using NumericField/NumericTokenStream.
2504 All standard parsers now also implement Serializable and enforce
2505 their singleton status. (Uwe Schindler, Mike McCandless)
2507 * LUCENE-1741: User configurable maximum chunk size in MMapDirectory.
2508 On 32 bit platforms, the address space can be very fragmented, so
2509 one big ByteBuffer for the whole file may not fit into address space.
2510 (Eks Dev via Uwe Schindler)
2512 * LUCENE-1644: Enable 4 rewrite modes for queries deriving from
2513 MultiTermQuery (WildcardQuery, PrefixQuery, TermRangeQuery,
2514 NumericRangeQuery): CONSTANT_SCORE_FILTER_REWRITE first creates a
2515 filter and then assigns constant score (boost) to docs;
2516 CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE creates a BooleanQuery but
2517 uses a constant score (boost); SCORING_BOOLEAN_QUERY_REWRITE also
2518 creates a BooleanQuery but keeps the BooleanQuery's scores;
2519 CONSTANT_SCORE_AUTO_REWRITE tries to pick the most performant
2520 constant-score rewrite method. (Mike McCandless)
2522 * LUCENE-1448: Added TokenStream.end(), to perform end-of-stream
2523 operations. This is currently used to fix offset problems when
2524 multiple fields with the same name are added to a document.
2525 (Mike McCandless, Mark Miller, Michael Busch)
2527 * LUCENE-1776: Add an option to not collect payloads for an ordered
2528 SpanNearQuery. Payloads were not lazily loaded in this case as
2529 the javadocs implied. If you have payloads and want to use an ordered
2530 SpanNearQuery that does not need to use the payloads, you can
2531 disable loading them with a new constructor switch. (Mark Miller)
2533 * LUCENE-1341: Added PayloadNearQuery to enable SpanNearQuery functionality
2534 with payloads (Peter Keegan, Grant Ingersoll, Mark Miller)
2536 * LUCENE-1790: Added PayloadTermQuery to enable scoring of payloads
2537 based on the maximum payload seen for a document.
2538 Slight refactoring of Similarity and other payload queries (Grant Ingersoll, Mark Miller)
2540 * LUCENE-1749: Addition of FieldCacheSanityChecker utility, and
2541 hooks to use it in all existing Lucene Tests. This class can
2542 be used by any application to inspect the FieldCache and provide
2543 diagnostic information about the possibility of inconsistent
2544 FieldCache usage. Namely: FieldCache entries for the same field
2545 with different datatypes or parsers; and FieldCache entries for
2546 the same field in both a reader, and one of its (descendant) sub
2548 (Chris Hostetter, Mark Miller)
2550 * LUCENE-1789: Added utility class
2551 oal.search.function.MultiValueSource to ease the transition to
2552 segment based searching for any apps that directly call
2553 oal.search.function.* APIs. This class wraps any other
2554 ValueSource, but takes care when composite (multi-segment) readers are
2555 passed to not double RAM usage in the FieldCache. (Chris
2556 Hostetter, Mark Miller, Mike McCandless)
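The default norm computation mentioned in LUCENE-1420 (boost *
lengthNorm(numTerms), with optional discounting of overlapping tokens) can be
sketched as plain arithmetic. The class below is a hypothetical stand-in, not
Lucene's Similarity; in particular, the 1/sqrt(numTerms) length normalization
is an assumption about the default implementation and is not stated in the
entry itself.

```java
public class ComputeNormSketch {

    /** Assumed length normalization for this sketch: 1 / sqrt(numTerms). */
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /**
     * boost * lengthNorm(numTerms), where overlapping tokens (position
     * increment 0) are left out of numTerms when discountOverlaps is true.
     */
    static float computeNorm(float boost, int length, int numOverlap, boolean discountOverlaps) {
        int numTerms = discountOverlaps ? length - numOverlap : length;
        return boost * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // 5 tokens, 1 of them overlapping: discounting leaves 4 terms.
        System.out.println(computeNorm(2.0f, 5, 1, true));  // prints 1.0
        // Without discounting, all 5 tokens count, so the norm is smaller.
        System.out.println(computeNorm(2.0f, 5, 1, false));
    }
}
```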
2560 * LUCENE-1427: Fixed QueryWrapperFilter to not waste time computing
2561 scores of the query, since they are just discarded. Also, made it
2562 more efficient (single pass) by not creating & populating an
2563 intermediate OpenBitSet (Paul Elschot, Mike McCandless)
2565 * LUCENE-1443: Performance improvement for OpenBitSetDISI.inPlaceAnd()
2566 (Paul Elschot via yonik)
2568 * LUCENE-1484: Remove synchronization of IndexReader.document() by
2569 using CloseableThreadLocal internally. (Jason Rutherglen via Mike
2572 * LUCENE-1124: Short circuit FuzzyQuery.rewrite when input token length
2573 is small compared to minSimilarity. (Timo Nentwig, Mark Miller)
2575 * LUCENE-1316: MatchAllDocsQuery now avoids the synchronized
2576 IndexReader.isDeleted() call per document, by directly accessing
2577 the underlying deleteDocs BitVector. This improves performance
2578 with non-readOnly readers, especially in a multi-threaded
2579 environment. (Todd Feak, Yonik Seeley, Jason Rutherglen via Mike
2582 * LUCENE-1483: When searching over multiple segments we now visit
2583 each sub-reader one at a time. This speeds up warming, since
2584 FieldCache entries (if required) can be shared across reopens for
2585 those segments that did not change, and also speeds up searches
2586 that sort by relevance or by field values. (Mark Miller, Mike
2589 * LUCENE-1575: The new Collector class decouples collect() from
2590 score computation. Collector.setScorer is called to establish the
2591 current Scorer in-use per segment. Collectors that require the
2592 score should then call Scorer.score() per hit inside
2593 collect(). (Shai Erera via Mike McCandless)
2595 * LUCENE-1596: MultiTermDocs speedup when set with
2596 MultiTermDocs.seek(MultiTermEnum) (yonik)
2598 * LUCENE-1653: Avoid creating a Calendar in every call to
2599 DateTools#dateToString, DateTools#timeToString and
2600 DateTools#round. (Shai Erera via Mark Miller)
2602 * LUCENE-1688: Deprecate static final String stop word array and
2603 replace it with an immutable implementation of CharArraySet.
2604 Removes conversions between Set and array.
2605 (Simon Willnauer via Mark Miller)
2607 * LUCENE-1754: BooleanQuery.queryWeight.scorer() will return null if
2608 it won't match any documents (e.g. if there are no required and
2609 optional scorers, or not enough optional scorers to satisfy
2610 minShouldMatch). (Shai Erera via Mike McCandless)
2612 * LUCENE-1607: To speed up string interning for commonly used
2613 strings, the StringHelper.intern() interface was added with a
2614 default implementation that uses a lockless cache.
2615 (Earwin Burrfoot, yonik)
2617 * LUCENE-1800: QueryParser should use reusable TokenStreams. (yonik)
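The decoupling described in LUCENE-1575 (collect() no longer receives a score;
a collector that needs one asks the Scorer it was given via setScorer) can be
sketched without Lucene. All types below are hypothetical stand-ins that only
mirror the two callbacks named in the entry. The sketch counts how often a
score is actually computed.

```java
import java.util.ArrayList;
import java.util.List;

public class CollectorSketch {

    /** Hypothetical scorer that counts how many times a score is computed. */
    static final class CountingScorer {
        int scoreCalls = 0;

        float score() {
            scoreCalls++;
            return 1.0f;
        }
    }

    /** Hypothetical stand-in with the two callbacks named in LUCENE-1575. */
    static abstract class SketchCollector {
        abstract void setScorer(CountingScorer scorer);
        abstract void collect(int doc);
    }

    /** Only records doc IDs, so it never pays for score computation. */
    static final class DocIdCollector extends SketchCollector {
        final List<Integer> docs = new ArrayList<Integer>();

        @Override void setScorer(CountingScorer scorer) { /* score not needed */ }
        @Override void collect(int doc) { docs.add(doc); }
    }

    /** Needs scores, so it calls score() once per hit inside collect(). */
    static final class ScoreSummingCollector extends SketchCollector {
        private CountingScorer scorer;
        float total = 0f;

        @Override void setScorer(CountingScorer scorer) { this.scorer = scorer; }
        @Override void collect(int doc) { total += scorer.score(); }
    }

    /** Drives a collector over the given hits; returns how many scores were computed. */
    static int scoreCallsFor(SketchCollector collector, int[] hits) {
        CountingScorer scorer = new CountingScorer();
        collector.setScorer(scorer);
        for (int doc : hits) {
            collector.collect(doc);
        }
        return scorer.scoreCalls;
    }

    public static void main(String[] args) {
        int[] hits = {2, 5, 9};
        System.out.println(scoreCallsFor(new DocIdCollector(), hits));        // prints 0
        System.out.println(scoreCallsFor(new ScoreSummingCollector(), hits)); // prints 3
    }
}
```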
2622 * LUCENE-1908: Scoring documentation improvements in Similarity javadocs.
2623 (Mark Miller, Shai Erera, Ted Dunning, Jiri Kuhn, Marvin Humphrey, Doron Cohen)
2625 * LUCENE-1872: NumericField javadoc improvements
2626 (Michael McCandless, Uwe Schindler)
2628 * LUCENE-1875: Make TokenStream.end javadoc less confusing.
2631 * LUCENE-1862: Rectified duplicate package level javadocs for
2632 o.a.l.queryParser and o.a.l.analysis.cn.
2635 * LUCENE-1886: Improved hyperlinking in key Analysis javadocs
2636 (Bernd Fondermann via Chris Hostetter)
2638 * LUCENE-1884: massive javadoc and comment cleanup, primarily dealing with
2640 (Robert Muir via Chris Hostetter)
2642 * LUCENE-1898: Switch changes to use bullets rather than numbers and
2643 update changes-to-html script to handle the new format.
2644 (Steven Rowe, Mark Miller)
2646 * LUCENE-1900: Improve Searchable Javadoc.
2647 (Nadav Har'El, Doron Cohen, Marvin Humphrey, Mark Miller)
2649 * LUCENE-1896: Improve Similarity#queryNorm javadocs.
2650 (Jiri Kuhn, Mark Miller)
2654 * LUCENE-1440: Add new targets to build.xml that allow downloading
2655 and executing the junit testcases from an older release for
2656 backwards-compatibility testing. (Michael Busch)
2658 * LUCENE-1446: Add compatibility tag to common-build.xml and run
2659 backwards-compatibility tests in the nightly build. (Michael Busch)
2661 * LUCENE-1529: Properly test "drop-in" replacement of jar with
2662 backwards-compatibility tests. (Mike McCandless, Michael Busch)
2664 * LUCENE-1851: Change 'javacc' and 'clean-javacc' targets to build
2665 and clean contrib/surround files. (Luis Alves via Michael Busch)
2667 * LUCENE-1854: tar task should use longfile="gnu" to avoid false file
2668 name length warnings. (Mark Miller)
2672 * LUCENE-1791: Enhancements to the QueryUtils and CheckHits utility
2673 classes to wrap IndexReaders and Searchers in MultiReaders or
2674 MultiSearcher when possible to help exercise more edge cases.
2675 (Chris Hostetter, Mark Miller)
2677 * LUCENE-1852: Fix localization test failures.
2678 (Robert Muir via Michael Busch)
2680 * LUCENE-1843: Refactored all tests that use assertAnalyzesTo() & others
2681 in core and contrib to use a new BaseTokenStreamTestCase
2682 base class. Also rewrote some tests to use these general analysis assert
2683 functions instead of their own (e.g. TestMappingCharFilter).
2684 The new base class also tests tokenization with the TokenStream.next()
2685 backwards layer enabled (using Token/TokenWrapper as attribute
2686 implementation) and disabled (default for Lucene 3.0)
2687 (Uwe Schindler, Robert Muir)
2689 * LUCENE-1836: Added a new LocalizedTestCase as base class for localization
2690 junit tests. (Robert Muir, Uwe Schindler via Michael Busch)
2692 ======================= Release 2.4.1 =======================
2696 1. LUCENE-1186: Add Analyzer.close() to free internal ThreadLocal
2697 resources. (Christian Kohlschütter via Mike McCandless)
2701 1. LUCENE-1452: Fixed silent data-loss case whereby binary fields are
2702 truncated to 0 bytes during merging if the segments being merged
2703 are non-congruent (same field name maps to different field
2704 numbers). This bug was introduced with LUCENE-1219. (Andrzej
2705 Bialecki via Mike McCandless).
2707 2. LUCENE-1429: Don't throw incorrect IllegalStateException from
2708 IndexWriter.close() if you've hit an OOM when autoCommit is true.
2711 3. LUCENE-1474: If IndexReader.flush() is called twice when there were
2712 pending deletions, it could lead to later false AssertionError
2713 during IndexReader.open. (Mike McCandless)
2715 4. LUCENE-1430: Fix false AlreadyClosedException from IndexReader.open
2716 (masking an actual IOException) that takes String or File path.
2719 5. LUCENE-1442: Multiple-valued NOT_ANALYZED fields can double-count
2720 token offsets. (Mike McCandless)
2722 6. LUCENE-1453: Ensure IndexReader.reopen()/clone() does not result in
2723 incorrectly closing the shared FSDirectory. This bug would only
2724 happen if you use IndexReader.open() with a File or String argument.
2725 The returned readers are wrapped by a FilterIndexReader that
2726 correctly handles closing of directory after reopen()/clone().
2727 (Mark Miller, Uwe Schindler, Mike McCandless)
2729 7. LUCENE-1457: Fix possible overflow bugs during binary
2730 searches. (Mark Miller via Mike McCandless)
2732 8. LUCENE-1459: Fix CachingWrapperFilter to not throw exception if
2733 both bits() and getDocIdSet() methods are called. (Matt Jones via
2736 9. LUCENE-1519: Fix int overflow bug during segment merging. (Deepak
2737 via Mike McCandless)
2739 10. LUCENE-1521: Fix int overflow bug when flushing segment.
2740 (Shon Vella via Mike McCandless).
2742 11. LUCENE-1544: Fix deadlock in IndexWriter.addIndexes(IndexReader[]).
2743 (Mike McCandless via Doug Sale)
2745 12. LUCENE-1547: Fix rare thread safety issue if two threads call
2746 IndexWriter commit() at the same time. (Mike McCandless)
2748 13. LUCENE-1465: NearSpansOrdered returns payloads from first possible match
2749 rather than the correct, shortest match; Payloads could be returned even
2750 if the max slop was exceeded; The wrong payload could be returned in
2751 certain situations. (Jonathan Mamou, Greg Shackles, Mark Miller)
2753 14. LUCENE-1186: Add Analyzer.close() to free internal ThreadLocal
2754 resources. (Christian Kohlschütter via Mike McCandless)
2756 15. LUCENE-1552: Fix IndexWriter.addIndexes(IndexReader[]) to properly
2757 rollback IndexWriter's internal state on hitting an
2758 exception. (Scott Garland via Mike McCandless)
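
Note on LUCENE-1457 above: the usual cause of overflow in a binary
search is computing the midpoint as (low + high) / 2, which wraps to
a negative value once the sum exceeds Integer.MAX_VALUE. The sketch
below is illustrative only and is not taken from the Lucene sources;
the class and method names are hypothetical.

```java
// Illustrative sketch only; not code from Lucene. All names are hypothetical.
final class SafeBinarySearch {

    // Overflow-prone form: low + high can exceed Integer.MAX_VALUE and wrap negative.
    static int naiveMid(int low, int high) {
        return (low + high) / 2;
    }

    // Overflow-safe form: the unsigned shift reads the wrapped sum as an unsigned value.
    static int safeMid(int low, int high) {
        return (low + high) >>> 1;
    }

    // Plain binary search over a sorted array, using the safe midpoint.
    static int search(int[] sorted, int key) {
        int low = 0;
        int high = sorted.length - 1;
        while (low <= high) {
            int mid = safeMid(low, high);
            if (sorted[mid] < key) {
                low = mid + 1;
            } else if (sorted[mid] > key) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        return -(low + 1); // not found; encodes the insertion point
    }

    public static void main(String[] args) {
        int low = Integer.MAX_VALUE - 10;
        int high = Integer.MAX_VALUE - 2;
        System.out.println("naive midpoint: " + naiveMid(low, high));
        System.out.println("safe midpoint:  " + safeMid(low, high));
        System.out.println("index of 5:     " + search(new int[] {1, 3, 5, 7}, 5));
    }
}
```
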
2760 ======================= Release 2.4.0 =======================
2762 Changes in backwards compatibility policy
2764 1. LUCENE-1340: In a minor change to Lucene's backward compatibility
2765 policy, we are now allowing the Fieldable interface to have
2766 changes, within reason, and made on a case-by-case basis. If an
2767 application implements its own Fieldable, please be aware of
2768 this. Otherwise, no need to be concerned. This is in effect for
2769 all 2.X releases, starting with 2.4. Also note that, in all
2770 likelihood, Fieldable will be changed in 3.0.
2773 Changes in runtime behavior
2775 1. LUCENE-1151: Fix StandardAnalyzer to not mis-identify host names
2776 (eg lucene.apache.org) as an ACRONYM. To get back to the pre-2.4
2777 backwards compatible, but buggy, behavior, you can either call
2778 StandardAnalyzer.setDefaultReplaceInvalidAcronym(false) (static
2779 method), or, set system property
2780 org.apache.lucene.analysis.standard.StandardAnalyzer.replaceInvalidAcronym
2781 to "false" on JVM startup. All StandardAnalyzer instances created
2782 after that will then show the pre-2.4 behavior. Alternatively,
2783 you can call setReplaceInvalidAcronym(false) to change the
2784 behavior per instance of StandardAnalyzer. This backwards
2785 compatibility will be removed in 3.0 (hardwiring the value to
2786 true). (Mike McCandless)
2788 2. LUCENE-1044: IndexWriter with autoCommit=true now commits (such
2789 that a reader can see the changes) far less often than it used to.
2790 Previously, every flush was also a commit. You can always force a
2791 commit by calling IndexWriter.commit(). Furthermore, in 3.0,
2792 autoCommit will be hardwired to false (IndexWriter constructors
2793 that take an autoCommit argument have been deprecated) (Mike
2796 3. LUCENE-1335: IndexWriter.addIndexes(Directory[]) and
2797 addIndexesNoOptimize no longer allow the same Directory instance
2798 to be passed in more than once. Internally, IndexWriter uses
2799 Directory and segment name to uniquely identify segments, so
2800 adding the same Directory more than once was causing duplicates
2801 which led to problems (Mike McCandless)
2803 4. LUCENE-1396: Improve PhraseQuery.toString() so that gaps in the
2804 positions are indicated with a ? and multiple terms at the same
2805 position are joined with a |. (Andrzej Bialecki via Mike
2810 1. LUCENE-1084: Changed all IndexWriter constructors to take an
2811 explicit parameter for maximum field size. Deprecated all the
2812 pre-existing constructors; these will be removed in release 3.0.
2813 NOTE: these new constructors set autoCommit to false. (Steven
2814 Rowe via Mike McCandless)
2816 2. LUCENE-584: Changed Filter API to return a DocIdSet instead of a
2817 java.util.BitSet. This allows using more efficient data structures
2818 for Filters and makes them more flexible. This deprecates
2819 Filter.bits(), so all filters that implement this outside
2820 the Lucene code base will need to be adapted. See also the javadocs
2821 of the Filter class. (Paul Elschot, Michael Busch)
2823 3. LUCENE-1044: Added IndexWriter.commit() which flushes any buffered
2824 adds/deletes and then commits a new segments file so readers will
2825 see the changes. Deprecate IndexWriter.flush() in favor of
2826 IndexWriter.commit(). (Mike McCandless)
2828 4. LUCENE-325: Added IndexWriter.expungeDeletes methods, which
2829 consult the MergePolicy to find merges necessary to merge away all
2830 deletes from the index. This should be a somewhat lower cost
2831 operation than optimize. (John Wang via Mike McCandless)
2833 5. LUCENE-1233: Return empty array instead of null when no fields
2834 match the specified name in these methods in Document:
2835 getFieldables, getFields, getValues, getBinaryValues. (Stefan
2836 Trcek via Mike McCandless)
2838 6. LUCENE-1234: Make BoostingSpanScorer protected. (Andi Vajda via Grant Ingersoll)
2840 7. LUCENE-510: The index now stores strings as true UTF-8 bytes
2841 (previously it was Java's modified UTF-8). If any text, either
2842 stored fields or a token, has illegal UTF-16 surrogate characters,
2843 these characters are now silently replaced with the Unicode
2844 replacement character U+FFFD. This is a change to the index file
2845 format. (Marvin Humphrey via Mike McCandless)
2847 8. LUCENE-852: Let the SpellChecker caller specify IndexWriter mergeFactor
2848 and RAM buffer size. (Otis Gospodnetic)
2850 9. LUCENE-1290: Deprecate org.apache.lucene.search.Hits, Hit and HitIterator
2851 and remove all references to these classes from the core. Also update demos
2852 and tutorials. (Michael Busch)
2854 10. LUCENE-1288: Add getVersion() and getGeneration() to IndexCommit.
2855 getVersion() returns the same value that IndexReader.getVersion()
2856 returns when the reader is opened on the same commit. (Jason
2857 Rutherglen via Mike McCandless)
2859 11. LUCENE-1311: Added IndexReader.listCommits(Directory) static
2860 method to list all commits in a Directory, plus IndexReader.open
2861 methods that accept an IndexCommit and open the index as of that
2862 commit. These methods are only useful if you implement a custom
2863 DeletionPolicy that keeps more than the last commit around.
2864 (Jason Rutherglen via Mike McCandless)
2866 12. LUCENE-1325: Added IndexCommit.isOptimized(). (Shalin Shekhar
2867 Mangar via Mike McCandless)
2869 13. LUCENE-1324: Added TokenFilter.reset(). (Shai Erera via Mike
2872 14. LUCENE-1340: Added Fieldable.omitTf() method to skip indexing term
2873 frequency, positions and payloads. This saves index space, and
2874 indexing/searching time. (Eks Dev via Mike McCandless)
2876 15. LUCENE-1219: Add basic reuse API to Fieldable for binary fields:
2877 getBinaryValue/Offset/Length(); currently only lazy fields reuse
2878 the provided byte[] result to getBinaryValue. (Eks Dev via Mike
2881 16. LUCENE-1334: Add new constructor for Term: Term(String fieldName)
2882 which defaults term text to "". (DM Smith via Mike McCandless)
2884 17. LUCENE-1333: Added Token.reinit(*) APIs to re-initialize (reuse) a
2885 Token. Also added term() method to return a String, with a
2886 performance penalty clearly documented. Also implemented
2887 hashCode() and equals() in Token, and fixed all core and contrib
2888 analyzers to use the re-use APIs. (DM Smith via Mike McCandless)
2890 18. LUCENE-1329: Add optional readOnly boolean when opening an
2891 IndexReader. A readOnly reader is not allowed to make changes
2892 (deletions, norms) to the index; in exchange, the isDeleted
2893 method, often a bottleneck when searching with many threads, is
2894 not synchronized. The default for readOnly is still false, but in
2895 3.0 the default will become true. (Jason Rutherglen via Mike
2898 19. LUCENE-1367: Add IndexCommit.isDeleted(). (Shalin Shekhar Mangar
2899 via Mike McCandless)
2901 20. LUCENE-1061: Factored out all "new XXXQuery(...)" in
2902 QueryParser.java into protected methods newXXXQuery(...) so that
2903 subclasses can create their own subclasses of each Query type.
2904 (John Wang via Mike McCandless)
2906 21. LUCENE-753: Added new Directory implementation
2907 org.apache.lucene.store.NIOFSDirectory, which uses java.nio's
2908 FileChannel to do file reads. On most non-Windows platforms, with
2909 many threads sharing a single searcher, this may yield sizable
2910 improvement to query throughput when compared to FSDirectory,
2911 which only allows a single thread to read from an open file at a
2912 time. (Jason Rutherglen via Mike McCandless)
2914 22. LUCENE-1371: Added convenience method TopDocs Searcher.search(Query query, int n).
2917 23. LUCENE-1356: Allow easy extensions of TopDocCollector by turning
2918 constructor and fields from package to protected. (Shai Erera
2921 24. LUCENE-1375: Added convenience method IndexCommit.getTimestamp,
2922 which is equivalent to
2923 getDirectory().fileModified(getSegmentsFileName()). (Mike McCandless)
2925 25. LUCENE-1366: Rename Field.Index options to be more accurate:
2926 TOKENIZED becomes ANALYZED; UN_TOKENIZED becomes NOT_ANALYZED;
2927 NO_NORMS becomes NOT_ANALYZED_NO_NORMS and a new ANALYZED_NO_NORMS
2928 is added. (Mike McCandless)
2930 26. LUCENE-1131: Added numDeletedDocs method to IndexReader (Otis Gospodnetic)
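
Note on LUCENE-510 above: the entry says that illegal UTF-16
surrogate characters are replaced with U+FFFD before the text is
written as UTF-8. The sketch below shows one way such a replacement
can be done in plain Java. It is not Lucene's implementation, and the
class and method names are hypothetical.

```java
// Illustrative sketch only; not code from Lucene. All names are hypothetical.
final class SurrogateSanitizer {

    // The Unicode replacement character, U+FFFD.
    static final char REPLACEMENT = '\uFFFD';

    // Returns a copy of text in which every unpaired surrogate is replaced by U+FFFD.
    static String replaceUnpairedSurrogates(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isHighSurrogate(c)) {
                boolean paired = i + 1 < text.length()
                        && Character.isLowSurrogate(text.charAt(i + 1));
                if (paired) {
                    // Valid pair: copy both code units and skip the low surrogate.
                    out.append(c).append(text.charAt(i + 1));
                    i++;
                } else {
                    out.append(REPLACEMENT);
                }
            } else if (Character.isLowSurrogate(c)) {
                // A low surrogate with no preceding high surrogate is illegal.
                out.append(REPLACEMENT);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String cleaned = replaceUnpairedSurrogates("a\uD800b");
        System.out.println(cleaned.length() + " chars, replaced: "
                + (cleaned.charAt(1) == REPLACEMENT));
    }
}
```
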
2934 1. LUCENE-1134: Fixed BooleanQuery.rewrite to only optimize a single
2935 clause query if minNumShouldMatch<=0. (Shai Erera via Michael Busch)
2937 2. LUCENE-1169: Fixed bug in IndexSearcher.search(): searching with
2938 a filter might miss some hits because scorer.skipTo() is called
2939 without checking if the scorer is already at the right position.
2940 scorer.skipTo(scorer.doc()) is not a NOOP; it behaves as
2941 scorer.next(). (Eks Dev, Michael Busch)
2943 3. LUCENE-1182: Added scorePayload to SimilarityDelegator (Andi Vajda via Grant Ingersoll)
2945 4. LUCENE-1213: MultiFieldQueryParser was ignoring slop in case
2946 of a single field phrase. (Trejkaz via Doron Cohen)
2948 5. LUCENE-1228: IndexWriter.commit() was not updating the index version and as
2949 result IndexReader.reopen() failed to sense index changes. (Doron Cohen)
2951 6. LUCENE-1267: Added numDocs() and maxDoc() to IndexWriter;
2952 deprecated docCount(). (Mike McCandless)
2954 7. LUCENE-1274: Added new prepareCommit() method to IndexWriter,
2955 which does phase 1 of a 2-phase commit (commit() does phase 2).
2956 This is needed when you want to update an index as part of a
2957 transaction involving external resources (eg a database). Also
2958 deprecated abort(), renaming it to rollback(). (Mike McCandless)
2960 8. LUCENE-1003: Stop RussianAnalyzer from removing numbers.
2961 (TUSUR OpenTeam, Dmitry Lihachev via Otis Gospodnetic)
2963 9. LUCENE-1152: SpellChecker fix around clearIndex and indexDictionary
2964 methods, plus removal of IndexReader reference.
2965 (Naveen Belkale via Otis Gospodnetic)
2967 10. LUCENE-1046: Removed dead code in SpellChecker
2968 (Daniel Naber via Otis Gospodnetic)
2970 11. LUCENE-1189: Fixed the QueryParser to handle escaped characters within
2971 quoted terms correctly. (Tomer Gabel via Michael Busch)
2973 12. LUCENE-1299: Fixed NPE in SpellChecker when IndexReader is not null and field is null (Grant Ingersoll)
2975 13. LUCENE-1303: Fixed BoostingTermQuery's explanation to be marked as a Match
2976 depending only upon the non-payload score part, regardless of the effect of
2977 the payload on the score. Prior to this, score of a query containing a BTQ
2978 differed from its explanation. (Doron Cohen)
2980 14. LUCENE-1310: Fixed SloppyPhraseScorer to work also for terms repeating more
2981 than twice in the query. (Doron Cohen)
2983 15. LUCENE-1351: ISOLatin1AccentFilter now cleans additional ligatures (Cedrik Lime via Grant Ingersoll)
2985 16. LUCENE-1383: Work around a nasty "leak" in Java's builtin
2986 ThreadLocal, to prevent Lucene from causing unexpected
2987 OutOfMemoryError in certain situations (notably J2EE
2988 applications). (Chris Lu via Mike McCandless)
2992 1. LUCENE-1137: Added Token.set/getFlags() accessors for passing more information about a Token through the analysis
2993 process. The flag is not indexed/stored and is thus only used by analysis.
2995 2. LUCENE-1147: Add -segment option to CheckIndex tool so you can
2996 check only a specific segment or segments in your index. (Mike
2999 3. LUCENE-1045: Reopened this issue to add support for short and bytes.
3001 4. LUCENE-584: Added new data structures to o.a.l.util, such as
3002 OpenBitSet and SortedVIntList. These extend DocIdSet and can
3003 directly be used for Filters with the new Filter API. Also changed
3004 the core Filters to use OpenBitSet instead of java.util.BitSet.
3005 (Paul Elschot, Michael Busch)
3007 5. LUCENE-494: Added QueryAutoStopWordAnalyzer to allow for the automatic removal, from a query, of frequently occurring terms.
3008 This Analyzer is not intended for use during indexing. (Mark Harwood via Grant Ingersoll)
3010 6. LUCENE-1044: Change Lucene to properly "sync" files after
3011 committing, to ensure on a machine or OS crash or power cut, even
3012 with cached writes, the index remains consistent. Also added
3013 explicit commit() method to IndexWriter to force a commit without
3014 having to close. (Mike McCandless)
3016 7. LUCENE-997: Add search timeout (partial) support.
3017 A TimeLimitedCollector was added to allow limiting search time.
3018 It is a partial solution since timeout is checked only when
3019 collecting a hit, and therefore a search for rare words in a
3020 huge index might not stop within the specified time.
3021 (Sean Timm via Doron Cohen)
3023 8. LUCENE-1184: Allow SnapshotDeletionPolicy to be re-used across
3024 close/re-open of IndexWriter while still protecting an open
3025 snapshot (Tim Brennan via Mike McCandless)
3027 9. LUCENE-1194: Added IndexWriter.deleteDocuments(Query) to delete
3028 documents matching the specified query. Also added static unlock
3029 and isLocked methods (deprecating the ones in IndexReader). (Mike
3032 10. LUCENE-1201: Add IndexReader.getIndexCommit() method. (Tim Brennan
3033 via Mike McCandless)
3035 11. LUCENE-550: Added InstantiatedIndex implementation. Experimental
3036 Index store similar to MemoryIndex but allows for multiple documents
3037 in memory. (Karl Wettin via Grant Ingersoll)
3039 12. LUCENE-400: Added word based n-gram filter (in contrib/analyzers) called ShingleFilter and an Analyzer wrapper
3040 that wraps another Analyzer's token stream with a ShingleFilter (Sebastian Kirsch, Steve Rowe via Grant Ingersoll)
3042 13. LUCENE-1166: Decomposition tokenfilter for languages like German and Swedish (Thomas Peuss via Grant Ingersoll)
3044 14. LUCENE-1187: ChainedFilter and BooleanFilter now work with new Filter API
3045 and DocIdSetIterator-based filters. Backwards-compatibility with old
3046 BitSet-based filters is ensured. (Paul Elschot via Michael Busch)
3048 15. LUCENE-1295: Added new method to MoreLikeThis for retrieving interesting terms and made retrieveTerms(int) public. (Grant Ingersoll)
3050 16. LUCENE-1298: MoreLikeThis can now accept a custom Similarity (Grant Ingersoll)
3052 17. LUCENE-1297: Allow other string distance measures for the SpellChecker
3053 (Thomas Morton via Otis Gospodnetic)
3055 18. LUCENE-1001: Provide access to Payloads via Spans. All existing Span Query implementations in Lucene implement this. (Mark Miller, Grant Ingersoll)
3057 19. LUCENE-1354: Provide programmatic access to CheckIndex (Grant Ingersoll, Mike McCandless)
3059 20. LUCENE-1279: Add support for Collators to RangeFilter/Query and Query Parser. (Steve Rowe via Grant Ingersoll)
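
Note on LUCENE-997 above: the entry explains that the timeout is
only checked when a hit is collected, so a search that rarely
produces hits may run past its budget. The sketch below models that
behaviour with a fake clock. It is not the TimeLimitedCollector
source; the class names, the exception type and the injected clock
are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Illustrative sketch only; not code from Lucene. All names are hypothetical.
final class TimeLimitedCollectSketch {

    // Thrown when a hit arrives after the time budget has been used up.
    static final class TimeExceeded extends RuntimeException {
        TimeExceeded(String message) {
            super(message);
        }
    }

    // Collects doc ids and checks the clock only inside collect().
    static final class Collector {
        private final LongSupplier clock;
        private final long deadlineMillis;
        final List<Integer> hits = new ArrayList<>();

        Collector(long timeAllowedMillis, LongSupplier clock) {
            this.clock = clock;
            this.deadlineMillis = clock.getAsLong() + timeAllowedMillis;
        }

        void collect(int doc) {
            // No hit means no check: time spent between hits goes unnoticed.
            if (clock.getAsLong() > deadlineMillis) {
                throw new TimeExceeded("time budget exceeded before doc " + doc);
            }
            hits.add(doc);
        }
    }

    public static void main(String[] args) {
        long[] now = {0L}; // fake clock so the run is deterministic
        Collector collector = new Collector(10, () -> now[0]);
        collector.collect(1);
        now[0] = 5;
        collector.collect(2);
        now[0] = 50; // the budget ran out long ago, but nobody checked until now
        try {
            collector.collect(3);
        } catch (TimeExceeded e) {
            System.out.println(e.getMessage() + "; hits kept: " + collector.hits);
        }
    }
}
```
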
3063 1. LUCENE-705: When building a compound file, use
3064 RandomAccessFile.setLength() to tell the OS/filesystem to
3065 pre-allocate space for the file. This may improve fragmentation
3066 in how the CFS file is stored, and allows us to detect an upcoming
3067 disk full situation before actually filling up the disk. (Mike
3070 2. LUCENE-1120: Speed up merging of term vectors by bulk-copying the
3071 raw bytes for each contiguous range of non-deleted documents.
3074 3. LUCENE-1185: Avoid checking if the TermBuffer 'scratch' in
3075 SegmentTermEnum is null for every call of scanTo().
3076 (Christian Kohlschuetter via Michael Busch)
3078 4. LUCENE-1217: Internal to Field.java, use isBinary instead of
3079 runtime type checking for possible speedup of binaryValue().
3080 (Eks Dev via Mike McCandless)
3082 5. LUCENE-1183: Optimized TRStringDistance class (in contrib/spell) that uses
3083 less memory than the previous version. (Cédrik LIME via Otis Gospodnetic)
3085 6. LUCENE-1195: Improve term lookup performance by adding a LRU cache to the
3086 TermInfosReader. In performance experiments the speedup was about 25% on
3087 average on mid-size indexes with ~500,000 documents for queries with 3
3088 terms and about 7% on larger indexes with ~4.3M documents. (Michael Busch)
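
Note on LUCENE-1195 above: an LRU cache keeps the most recently used
entries and evicts the least recently used one when it is full. The
sketch below shows the general technique using java.util.LinkedHashMap
in access order. It is not the cache used by TermInfosReader; the
class name and capacity are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; not code from Lucene. All names are hypothetical.
final class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCacheSketch(int maxEntries) {
        super(16, 0.75f, true); // true = iterate in access order, not insertion order
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruCacheSketch<String, Integer> cache = new LruCacheSketch<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" becomes the most recently used entry
        cache.put("c", 3); // the cache is full, so the least recently used entry goes
        System.out.println(cache.keySet());
    }
}
```
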
3092 1. LUCENE-1236: Added some clarifying remarks to EdgeNGram*.java (Hiroaki Kawai via Grant Ingersoll)
3094 2. LUCENE-1157 and LUCENE-1256: HTML changes log, created automatically
3095 from CHANGES.txt. This HTML file is currently visible only via the developers page.
3096 (Steven Rowe via Doron Cohen)
3098 3. LUCENE-1349: Fieldable can now be changed, within reason, without breaking backward compatibility rules. See the note at
3099 the top of this file and also in Fieldable.java. (Grant Ingersoll)
3101 4. LUCENE-1873: Update documentation to reflect current Contrib area status.
3102 (Steven Rowe, Mark Miller)
3106 1. LUCENE-1153: Added JUnit JAR to new lib directory. Updated build to rely on local JUnit instead of ANT/lib.
3108 2. LUCENE-1202: Small fixes to the way Clover is used to work better
3109 with contribs. Of particular note: a single clover db is used
3110 regardless of whether tests are run globally or in the specific
3111 contrib directories.
3113 3. LUCENE-1353: Javacc target in contrib/miscellaneous for
3114 generating the precedence query parser.
3118 1. LUCENE-1238: Fixed intermittent failures of TestTimeLimitedCollector.testTimeoutMultiThreaded.
3119 As part of this fix, a "greedy" flag was added to TimeLimitedCollector to allow the wrapped
3120 collector to also collect the last doc after the allowed time has passed. (Doron Cohen)
3122 2. LUCENE-1348: relax TestTimeLimitedCollector to not fail due to
3123 timeout exceeded (just because test machine is very busy).
3125 ======================= Release 2.3.2 =======================
3129 1. LUCENE-1191: On hitting OutOfMemoryError in any index-modifying
3130 methods in IndexWriter, do not commit any further changes to the
3131 index to prevent risk of possible corruption. (Mike McCandless)
3133 2. LUCENE-1197: Fixed issue whereby IndexWriter would flush by RAM
3134 too early when TermVectors were in use. (Mike McCandless)
3136 3. LUCENE-1198: Don't corrupt index if an exception happens inside
3137 DocumentsWriter.init (Mike McCandless)
3139 4. LUCENE-1199: Added defensive check for null indexReader before
3140 calling close in IndexModifier.close() (Mike McCandless)
3142 5. LUCENE-1200: Fix rare deadlock case in addIndexes* when
3143 ConcurrentMergeScheduler is in use (Mike McCandless)
3145 6. LUCENE-1208: Fix deadlock case on hitting an exception while
3146 processing a document that had triggered a flush (Mike McCandless)
3148 7. LUCENE-1210: Fix deadlock case on hitting an exception while
3149 starting a merge when using ConcurrentMergeScheduler (Mike McCandless)
3151 8. LUCENE-1222: Fix IndexWriter.doAfterFlush to always be called on
3152 flush (Mark Ferguson via Mike McCandless)
3154 9. LUCENE-1226: Fixed IndexWriter.addIndexes(IndexReader[]) to commit
3155 successfully created compound files. (Michael Busch)
3157 10. LUCENE-1150: Re-expose StandardTokenizer's constants publicly;
3158 this was accidentally lost with LUCENE-966. (Nicolas Lalevée via
3161 11. LUCENE-1262: Fixed bug in BufferedIndexInput.refill whereby on
3162 hitting an exception in readInternal, the buffer is incorrectly
3163 filled with stale bytes such that subsequent calls to readByte()
3164 return incorrect results. (Trejkaz via Mike McCandless)
3166 12. LUCENE-1270: Fixed intermittent case where IndexWriter.close()
3167 would hang after IndexWriter.addIndexesNoOptimize had been
3168 called. (Stu Hood via Mike McCandless)
3172 1. LUCENE-1230: Include *pom.xml* in source release files. (Michael Busch)
3175 ======================= Release 2.3.1 =======================
3179 1. LUCENE-1168: Fixed corruption cases when autoCommit=false and
3180 documents have mixed term vectors (Suresh Guvvala via Mike
3183 2. LUCENE-1171: Fixed some cases where OOM errors could cause
3184 deadlock in IndexWriter (Mike McCandless).
3186 3. LUCENE-1173: Fixed corruption case when autoCommit=false and bulk
3187 merging of stored fields is used (Yonik via Mike McCandless).
3189 4. LUCENE-1163: Fixed bug in CharArraySet.contains(char[] buffer, int
3190 offset, int len) that was ignoring offset and thus giving the
3191 wrong answer. (Thomas Peuss via Mike McCandless)
3193 5. LUCENE-1177: Fix rare case where IndexWriter.optimize might do too
3194 many merges at the end. (Mike McCandless)
3196 6. LUCENE-1176: Fix corruption case when documents with no term
3197 vector fields are added before documents with term vector fields.
3200 7. LUCENE-1179: Fixed assert statement that was incorrectly
3201 preventing Fields with empty-string field name from working.
3202 (Sergey Kabashnyuk via Mike McCandless)
3204 ======================= Release 2.3.0 =======================
3206 Changes in runtime behavior
3208 1. LUCENE-994: Defaults for IndexWriter have been changed to maximize
3209 out-of-the-box indexing speed. First, IndexWriter now flushes by
3210 RAM usage (16 MB by default) instead of a fixed doc count (call
3211 IndexWriter.setMaxBufferedDocs to get backwards compatible
3212 behavior). Second, ConcurrentMergeScheduler is used to run merges
3213 using background threads (call IndexWriter.setMergeScheduler(new
3214 SerialMergeScheduler()) to get backwards compatible behavior).
3215 Third, merges are chosen based on size in bytes of each segment
3216 rather than document count of each segment (call
3217 IndexWriter.setMergePolicy(new LogDocMergePolicy()) to get
3218 backwards compatible behavior).
3220 NOTE: users of ParallelReader must change back all of these
3221 defaults in order to ensure the docIDs "align" across all parallel
3226 2. LUCENE-1045: SortField.AUTO didn't work with long. When detecting
3227 the field type for sorting automatically, numbers used to be
3228 interpreted as int, then as float, if parsing the number as an int
3229 failed. Now the detection checks for int, then for long,
3230 then for float. (Daniel Naber)
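
Note on LUCENE-1045 above: the detection order described there
(int, then long, then float) can be sketched as a chain of parse
attempts. This is not the SortField.AUTO implementation; the class,
the enum and the final fallback to STRING are assumptions made for
illustration.

```java
// Illustrative sketch only; not code from Lucene. All names are hypothetical.
final class SortTypeDetection {

    enum Kind { INT, LONG, FLOAT, STRING }

    // Tries the narrowest numeric type first and widens only when parsing fails.
    static Kind detect(String value) {
        try {
            Integer.parseInt(value);
            return Kind.INT;
        } catch (NumberFormatException notAnInt) {
            // fall through to the next candidate type
        }
        try {
            Long.parseLong(value);
            return Kind.LONG;
        } catch (NumberFormatException notALong) {
            // fall through to the next candidate type
        }
        try {
            Float.parseFloat(value);
            return Kind.FLOAT;
        } catch (NumberFormatException notAFloat) {
            // Assumption: anything non-numeric is treated as a string.
            return Kind.STRING;
        }
    }

    public static void main(String[] args) {
        for (String sample : new String[] {"42", "3000000000", "1.5", "abc"}) {
            System.out.println(sample + " -> " + detect(sample));
        }
    }
}
```

Without the long step, a value such as 3000000000 would fail the int
parse and be handled as a float, losing precision for large values.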
3234 1. LUCENE-843: Added IndexWriter.setRAMBufferSizeMB(...) to have
3235 IndexWriter flush whenever the buffered documents are using more
3236 than the specified amount of RAM. Also added new APIs to Token
3237 that allow one to set a char[] plus offset and length to specify a
3238 token (to avoid creating a new String() for each Token). (Mike
3241 2. LUCENE-963: Add setters to Field to allow for re-using a single
3242 Field instance during indexing. This is a sizable performance
3243 gain, especially for small documents. (Mike McCandless)
3245 3. LUCENE-969: Add new APIs to Token, TokenStream and Analyzer to
3246 permit re-using of Token and TokenStream instances during
3247 indexing. Changed Token to use a char[] as the store for the
3248 termText instead of String. This gives faster tokenization
3249 performance (~10-15%). (Mike McCandless)
3251 4. LUCENE-847: Factored MergePolicy, which determines which merges
3252 should take place and when, as well as MergeScheduler, which
3253 determines when the selected merges should actually run, out of
3254 IndexWriter. The default merge policy is now
3255 LogByteSizeMergePolicy (see LUCENE-845) and the default merge
3256 scheduler is now ConcurrentMergeScheduler (see
3257 LUCENE-870). (Steven Parkes via Mike McCandless)
3259 5. LUCENE-1052: Add IndexReader.setTermInfosIndexDivisor(int) method
3260 that allows you to reduce memory usage of the termInfos by further
3261 sub-sampling (over the termIndexInterval that was used during
3262 indexing) which terms are loaded into memory. (Chuck Williams,
3263 Doug Cutting via Mike McCandless)
3265 6. LUCENE-743: Add IndexReader.reopen() method that re-opens an
3266 existing IndexReader (see New features -> 8.) (Michael Busch)
3268 7. LUCENE-1062: Add setData(byte[] data),
3269 setData(byte[] data, int offset, int length), getData(), getOffset()
3270 and clone() methods to o.a.l.index.Payload. Also add the field name
3271 as arg to Similarity.scorePayload(). (Michael Busch)
3273 8. LUCENE-982: Add IndexWriter.optimize(int maxNumSegments) method to
3274 "partially optimize" an index down to maxNumSegments segments.
3277 9. LUCENE-1080: Changed Token.DEFAULT_TYPE to be public.
3279 10. LUCENE-1064: Changed TopDocs constructor to be public.
3280 (Shai Erera via Michael Busch)
3282 11. LUCENE-1079: DocValues cleanup: constructor now has no params,
3283 and getInnerArray() now throws UnsupportedOperationException (Doron Cohen)
3285 12. LUCENE-1089: Added PriorityQueue.insertWithOverflow, which returns
3286 the Object (if any) that was bumped from the queue to allow
3287 re-use. (Shai Erera via Mike McCandless)
3289 13. LUCENE-1101: Token reuse 'contract' (defined in LUCENE-969)
3290 modified so that it is the token producer's responsibility
3291 to call Token.clear(). (Doron Cohen)
3293 14. LUCENE-1118: Changed StandardAnalyzer to skip too-long (default >
3294 255 characters) tokens. You can increase this limit by calling
3295 StandardAnalyzer.setMaxTokenLength(...). (Michael McCandless)
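
Note on LUCENE-1089 above: returning the bumped object lets the
caller recycle it instead of allocating a new one. The sketch below
shows the idea on top of java.util.PriorityQueue. It is not Lucene's
PriorityQueue; the class name is hypothetical, and returning the
rejected element itself when it does not make it into the queue is an
assumption of this sketch.

```java
import java.util.PriorityQueue;

// Illustrative sketch only; not code from Lucene. All names are hypothetical.
final class BoundedTopN {
    private final PriorityQueue<Integer> heap = new PriorityQueue<>(); // least element on top
    private final int maxSize;

    BoundedTopN(int maxSize) {
        this.maxSize = maxSize;
    }

    // Returns null if the queue had room, the evicted least element if the
    // new element displaced it, or the element itself if it was rejected.
    Integer insertWithOverflow(Integer element) {
        if (heap.size() < maxSize) {
            heap.add(element);
            return null;
        }
        Integer least = heap.peek();
        if (least != null && element.compareTo(least) > 0) {
            heap.poll();
            heap.add(element);
            return least;
        }
        return element;
    }

    int size() {
        return heap.size();
    }

    public static void main(String[] args) {
        BoundedTopN top = new BoundedTopN(2);
        for (int value : new int[] {5, 3, 4, 1}) {
            System.out.println("insert " + value + " -> bumped " + top.insertWithOverflow(value));
        }
    }
}
```
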
3300 1. LUCENE-933: QueryParser fixed to not produce empty sub
3301 BooleanQueries "()" even if the Analyzer produced no
3302 tokens for input. (Doron Cohen)
3304 2. LUCENE-955: Fixed SegmentTermPositions to work correctly with the
3305 first term in the dictionary. (Michael Busch)
3307 3. LUCENE-951: Fixed NullPointerException in MultiLevelSkipListReader
3308 that was thrown after a call of TermPositions.seek().
3309 (Rich Johnson via Michael Busch)
3311 4. LUCENE-938: Fixed cases where an unhandled exception in
3312 IndexWriter's methods could cause deletes to be lost.
3313 (Steven Parkes via Mike McCandless)
3315 5. LUCENE-962: Fixed case where an unhandled exception in
3316 IndexWriter.addDocument or IndexWriter.updateDocument could cause
3317 unreferenced files in the index to not be deleted
3318 (Steven Parkes via Mike McCandless)
3320 6. LUCENE-957: RAMDirectory fixed to properly handle directories
3321 larger than Integer.MAX_VALUE. (Doron Cohen)
3323 7. LUCENE-781: MultiReader fixed to not throw NPE if isCurrent(),
3324 isOptimized() or getVersion() is called. Separated MultiReader
3325 into two classes: MultiSegmentReader extends IndexReader, is
3326 package-protected and is created automatically by IndexReader.open()
3327 in case the index has multiple segments. The public MultiReader
3328 now extends MultiSegmentReader and is intended to be used by users
3329 who want to add their own subreaders. (Daniel Naber, Michael Busch)
3331 8. LUCENE-970: FilterIndexReader now implements isOptimized(). Previously,
3332 a call to isOptimized() would throw an NPE. (Michael Busch)
3334 9. LUCENE-832: ParallelReader fixed to not throw NPE if isCurrent(),
3335 isOptimized() or getVersion() is called. (Michael Busch)
3337 10. LUCENE-948: Fix FileNotFoundException caused by stale NFS client
3338 directory listing caches when writers on different machines are
3339 sharing an index over NFS and using a custom deletion policy (Mike
3342 11. LUCENE-978: Ensure TermInfosReader, FieldsReader, and FieldsReader
3343 close any streams they had opened if an exception is hit in the
3344 constructor. (Ning Li via Mike McCandless)
3346 12. LUCENE-985: If an extremely long term is in a doc (> 16383 chars),
3347 we now throw an IllegalArgumentException saying the term is too
3348 long, instead of cryptic ArrayIndexOutOfBoundsException. (Karl
3349 Wettin via Mike McCandless)
3351 13. LUCENE-991: The explain() method of BoostingTermQuery had errors
3352 when no payloads were present on a document. (Peter Keegan via
3355 14. LUCENE-992: Fixed IndexWriter.updateDocument to be atomic again
3356 (this was broken by LUCENE-843). (Ning Li via Mike McCandless)
3358 15. LUCENE-1008: Fixed corruption case when document with no term
3359 vector fields is added after documents with term vector fields.
3360 This bug was introduced with LUCENE-843. (Grant Ingersoll via
3363 16. LUCENE-1006: Fixed QueryParser to accept a "" field value (zero
3364 length quoted string.) (yonik)
3366 17. LUCENE-1010: Fixed corruption case when document with no term
3367 vector fields is added after documents with term vector fields.
3368 This case is hit during merge and would cause an EOFException.
3369 This bug was introduced with LUCENE-984. (Andi Vajda via Mike
3372 19. LUCENE-1009: Fix merge slowdown with LogByteSizeMergePolicy when
3373 autoCommit=false and documents are using stored fields and/or term
3374 vectors. (Mark Miller via Mike McCandless)
3376 20. LUCENE-1011: Fixed corruption case when two or more machines,
3377 sharing an index over NFS, can be writers in quick succession.
3378 (Patrick Kimber via Mike McCandless)
3380 21. LUCENE-1028: Fixed Weight serialization for a few queries:
3381 DisjunctionMaxQuery, ValueSourceQuery, CustomScoreQuery.
3382 Serialization check added for all queries.
3383 (Kyle Maxwell via Doron Cohen)
3385 22. LUCENE-1048: Fixed incorrect behavior in Lock.obtain(...) when the
3386 timeout argument is very large (eg Long.MAX_VALUE). Also added
3387 Lock.LOCK_OBTAIN_WAIT_FOREVER constant to never timeout. (Nikolay
3388 Diakov via Mike McCandless)
3390 23. LUCENE-1050: Throw LockReleaseFailedException in
3391 Simple/NativeFSLockFactory if we fail to delete the lock file when
3392 releasing the lock. (Nikolay Diakov via Mike McCandless)
3394 24. LUCENE-1071: Fixed SegmentMerger to correctly set payload bit in
3395 the merged segment. (Michael Busch)
3397 25. LUCENE-1042: Remove throwing of IOException in getTermFreqVector(int, String, TermVectorMapper) to be consistent
3398 with other getTermFreqVector calls. Also removed the throwing of the other IOException in that method to be consistent. (Karl Wettin via Grant Ingersoll)
3400 26. LUCENE-1096: Fixed Hits behavior when hits' docs are deleted
3401 while iterating over the hits. Deleting docs already retrieved
3402 now works seamlessly. If docs not yet retrieved are deleted
3403 (e.g. from another thread), and then, relying on the initial
3404 Hits.length(), an application attempts to retrieve more hits
3405 than actually exist, a ConcurrentModificationException
3406 is thrown. (Doron Cohen)
3408 27. LUCENE-1068: Changed StandardTokenizer to fix an issue with it marking
3409 the type of some tokens incorrectly. This is done by adding a new flag named
3410 replaceInvalidAcronym which defaults to false, the current, incorrect behavior. Setting
3411 this flag to true fixes the problem. This flag is a temporary fix and is already
3412 marked as being deprecated. 3.x will implement the correct approach. (Shai Erera via Grant Ingersoll)
3413 LUCENE-1140: Fixed NPE caused by 1068 (Alexei Dets via Grant Ingersoll)
3415 28. LUCENE-749: ChainedFilter behavior fixed when logic of
3416 first filter is ANDNOT. (Antonio Bruno via Doron Cohen)
3418 29. LUCENE-508: Make sure SegmentTermEnum.prev() is accurate (= last
3419 term) after next() returns false. (Steven Tamm via Mike
3425 1. LUCENE-906: Elision filter for French.
3426 (Mathieu Lecarme via Otis Gospodnetic)
3428 2. LUCENE-960: Added a SpanQueryFilter and related classes to allow for
3429 not only filtering, but knowing where in a Document a Filter matches
3432 3. LUCENE-868: Added new Term Vector access features. New callback
3433 mechanism allows application to define how and where to read Term
3434 Vectors from disk. This implementation contains several extensions
3435 of the new abstract TermVectorMapper class. The new API should be
3436 back-compatible. No changes in the actual storage of Term Vectors
3438 3.1 LUCENE-1038: Added setDocumentNumber() method to TermVectorMapper
3439 to provide information about what document is being accessed.
3440 (Karl Wettin via Grant Ingersoll)
3442 4. LUCENE-975: Added PositionBasedTermVectorMapper that allows for
3443 position based lookup of term vector information.
3444 See item #3 above (LUCENE-868).
3446 5. LUCENE-1011: Added simple tools (all in org.apache.lucene.store)
3447 to verify that locking is working properly. LockVerifyServer runs
3448 a separate server to verify locks. LockStressTest runs a simple
3449 tool that rapidly obtains and releases locks.
3450 VerifyingLockFactory is a LockFactory that wraps any other
3451 LockFactory and consults the LockVerifyServer whenever a lock is
3452 obtained or released, throwing an exception if an illegal lock
3453 obtain occurred. (Patrick Kimber via Mike McCandless)
3455 6. LUCENE-1015: Added FieldCache extension (ExtendedFieldCache) to
3456 support doubles and longs. Added support into SortField for sorting
3457 on doubles and longs as well. (Grant Ingersoll)
3459 7. LUCENE-1020: Created basic index checking & repair tool
3460 (o.a.l.index.CheckIndex). When run without -fix it does a
3461 detailed test of all segments in the index and reports summary
3462 information and any errors it hit. With -fix it will remove
3463 segments that had errors. (Mike McCandless)
3465 8. LUCENE-743: Add IndexReader.reopen() method that re-opens an
3466 existing IndexReader by only loading those portions of an index
3467 that have changed since the reader was (re)opened. reopen() can
3468 be significantly faster than open(), depending on the amount of
3469 index changes. SegmentReader, MultiSegmentReader, MultiReader,
3470 and ParallelReader implement reopen(). (Michael Busch)
3472 9. LUCENE-1040: CharArraySet useful for efficiently checking
3473 set membership of text specified by char[]. (yonik)
3475 10. LUCENE-1073: Created SnapshotDeletionPolicy to facilitate taking a
3476 live backup of an index without pausing indexing. (Mike
3479 11. LUCENE-1019: CustomScoreQuery enhanced to support multiple
3480 ValueSource queries. (Kyle Maxwell via Doron Cohen)
3482 12. LUCENE-1095: Added an option to StopFilter to increase
3483 positionIncrement of the token succeeding a stopped token.
3484 Disabled by default. Similar option added to QueryParser
3485 to consider token positions when creating PhraseQuery
3486 and MultiPhraseQuery. Disabled by default (so by default
3487 the query parser ignores position increments).
3490 13. LUCENE-1380: Added TokenFilter for setting position increment in special cases related to the ShingleFilter (Mck SembWever, Steve Rowe, Karl Wettin via Grant Ingersoll)
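
To illustrate item 12 above (LUCENE-1095), here is a minimal Java sketch of
what "increasing the positionIncrement of the token succeeding a stopped
token" means. It does not use Lucene's StopFilter: the class
StopIncrementSketch and the method positionIncrements are hypothetical names
introduced only for this example, and tokens are modelled as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; this is NOT Lucene's StopFilter implementation.
final class StopIncrementSketch {

    // Returns the position increment of each token that survives stop-word removal.
    // With enableIncrements=false every surviving token gets increment 1 (the default
    // behaviour described in the entry); with true, each removed stop word adds 1 to
    // the increment of the next surviving token.
    static List<Integer> positionIncrements(List<String> tokens,
                                            Set<String> stopWords,
                                            boolean enableIncrements) {
        List<Integer> increments = new ArrayList<Integer>();
        int skipped = 0;
        for (String token : tokens) {
            if (stopWords.contains(token)) {
                skipped++;          // token is dropped; remember the gap
                continue;
            }
            increments.add(enableIncrements ? 1 + skipped : 1);
            skipped = 0;
        }
        return increments;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "quick", "fox");
        Set<String> stops = new HashSet<String>(Arrays.asList("the"));
        System.out.println(positionIncrements(tokens, stops, false)); // prints [1, 1]
        System.out.println(positionIncrements(tokens, stops, true));  // prints [2, 1]
    }
}
```

With increments enabled, a phrase query can tell that a word was removed
between two surviving tokens, which is why the query parser gained a matching
option.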
3496 1. LUCENE-937: CachingTokenFilter now uses an iterator to access the
3497 Tokens that are cached in the LinkedList. This increases performance
3498 significantly, especially when the number of Tokens is large.
3499 (Mark Miller via Michael Busch)
3501 2. LUCENE-843: Substantial optimizations to improve how IndexWriter
3502 uses RAM for buffering documents and to speed up indexing (2X-8X
3503 faster). A single shared hash table now records the in-memory
3504 postings per unique term and is directly flushed into a single
3505 segment. (Mike McCandless)
3507 3. LUCENE-892: Fixed extra "buffer to buffer copy" that sometimes
3508 takes place when using compound files. (Mike McCandless)
3510 4. LUCENE-959: Remove synchronization in Document (yonik)
3512 5. LUCENE-963: Add setters to Field to allow for re-using a single
3513 Field instance during indexing. This is a sizable performance
3514 gain, especially for small documents. (Mike McCandless)
3516 6. LUCENE-939: Check explicitly for boundary conditions in FieldInfos
3517 and don't rely on exceptions. (Michael Busch)
3519 7. LUCENE-966: Very substantial speedups (~6X faster) for
3520 StandardTokenizer (StandardAnalyzer) by using JFlex instead of
3521 JavaCC to generate the tokenizer.
3522 (Stanislaw Osinski via Mike McCandless)
3524 8. LUCENE-969: Changed core tokenizers & filters to re-use Token and
3525 TokenStream instances when possible to improve tokenization
3526 performance (~10-15%). (Mike McCandless)
3528 9. LUCENE-871: Speedup ISOLatin1AccentFilter (Ian Boston via Mike
3531 10. LUCENE-986: Refactored SegmentInfos from IndexReader into the new
3532 subclass DirectoryIndexReader. SegmentReader and MultiSegmentReader
3533 now extend DirectoryIndexReader and are the only IndexReader
3534 implementations that use SegmentInfos to access an index and
3535 acquire a write lock for index modifications. (Michael Busch)
3537 11. LUCENE-1007: Allow flushing in IndexWriter to be triggered by
3538 either RAM usage or document count or both (whichever comes
3539 first), by adding symbolic constant DISABLE_AUTO_FLUSH to disable
3540 one of the flush triggers. (Ning Li via Mike McCandless)
3542 12. LUCENE-1043: Speed up merging of stored fields by bulk-copying the
3543 raw bytes for each contiguous range of non-deleted documents.
3544 (Robert Engels via Mike McCandless)
3546 13. LUCENE-693: Speed up nested conjunctions (~2x) that match many
3547 documents, and a slight performance increase for top level
3548 conjunctions. (yonik)
3550 14. LUCENE-1098: Make inner class StandardAnalyzer.SavedStreams static
3551 and final. (Nathan Beyer via Michael Busch)
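
As an illustration of item 12 above (LUCENE-1043), the sketch below shows one
way to find each contiguous range of non-deleted documents so that a range can
be copied in one bulk operation. It is not the code used by Lucene's merger:
LiveRangeSketch and liveRanges are hypothetical names, and deletions are
modelled with java.util.BitSet as an assumption.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch; this is NOT Lucene's stored-fields merge code.
final class LiveRangeSketch {

    // Returns {start, endExclusive} pairs, one per maximal run of non-deleted
    // document numbers in [0, maxDoc). A set bit in 'deleted' marks a deleted doc.
    static List<int[]> liveRanges(BitSet deleted, int maxDoc) {
        List<int[]> ranges = new ArrayList<int[]>();
        int doc = deleted.nextClearBit(0);        // first live doc
        while (doc < maxDoc) {
            int end = deleted.nextSetBit(doc);    // next deleted doc, or -1 if none
            if (end < 0 || end > maxDoc) {
                end = maxDoc;
            }
            ranges.add(new int[] { doc, end });   // this whole run could be bulk-copied
            doc = deleted.nextClearBit(end);      // skip over the deleted run
        }
        return ranges;
    }

    public static void main(String[] args) {
        BitSet deleted = new BitSet();
        deleted.set(3);
        deleted.set(4);
        deleted.set(8);
        for (int[] range : liveRanges(deleted, 10)) {
            System.out.println(range[0] + ".." + range[1]);
        }
    }
}
```

The fewer deletions a segment has, the longer the runs become, so the benefit
of copying raw bytes per run rather than per document grows accordingly.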
3555 1. LUCENE-1051: Generate separate javadocs for core, demo and contrib
3556 classes, as well as a unified view. Also add an appropriate menu
3557 structure to the website. (Michael Busch)
3559 2. LUCENE-746: Fix error message in AnalyzingQueryParser.getPrefixQuery.
3560 (Ronnie Kolehmainen via Michael Busch)
3564 1. LUCENE-908: Improvements and simplifications for how the MANIFEST
3565 file and the META-INF dir are created. (Michael Busch)
3567 2. LUCENE-935: Various improvements for the maven artifacts. Now the
3568 artifacts also include the sources as .jar files. (Michael Busch)
3570 3. Added apply-patch target to top-level build. Defaults to looking for
3571 a patch in ${basedir}/../patches with name specified by -Dpatch.name.
3572 Can also specify any location by -Dpatch.file property on the command
3573 line. This should be helpful for easy application of patches, but it
3574 is also a step towards integrating automatic patch application with
3575 JIRA and Hudson, and is thus subject to change. (Grant Ingersoll)
3577 4. LUCENE-935: Defined property "m2.repository.url" to allow setting
3578 the url to a maven remote repository to deploy to. (Michael Busch)
3580 5. LUCENE-1051: Include javadocs in the maven artifacts. (Michael Busch)
3582 6. LUCENE-1055: Remove gdata-server from build files and its sources
3583 from trunk. (Michael Busch)
3585 7. LUCENE-935: Allow deploying maven artifacts to a remote m2 repository
3586 via scp and ssh authentication. (Michael Busch)
3588 8. LUCENE-1123: Allow overriding the specification version for
3589 MANIFEST.MF (Michael Busch)
3593 1. LUCENE-766: Test adding two fields with the same name but different
3594 term vector setting. (Nicolas Lalevée via Doron Cohen)
3596 ======================= Release 2.2.0 =======================
3598 Changes in runtime behavior
3602 1. LUCENE-793: created new exceptions and added them to throws clause
3603 for many methods (all subclasses of IOException for backwards
3604 compatibility): index.StaleReaderException,
3605 index.CorruptIndexException, store.LockObtainFailedException.
3606 This was done to better call out the possible root causes of an
3607 IOException from these methods. (Mike McCandless)
3609 2. LUCENE-811: make SegmentInfos class, plus a few methods from related
3610 classes, package-private again (they were unnecessarily made public
3611 as part of LUCENE-701). (Mike McCandless)
3613 3. LUCENE-710: added optional autoCommit boolean to IndexWriter
3614 constructors. When this is false, index changes are not committed
3615 until the writer is closed. This gives explicit control over when
3616 a reader will see the changes. Also added optional custom
3617 deletion policy to explicitly control when prior commits are
3618 removed from the index. This is intended to allow applications to
3619 share an index over NFS by customizing when prior commits are
3620 deleted. (Mike McCandless)
3622 4. LUCENE-818: changed most public methods of IndexWriter,
3623 IndexReader (and its subclasses), FieldsReader and RAMDirectory to
3624 throw AlreadyClosedException if they are accessed after being
3625 closed. (Mike McCandless)
3627 5. LUCENE-834: Changed some access levels for certain Span classes to allow them
3628 to be overridden. They have been marked expert only and not for public
3629 consumption. (Grant Ingersoll)
3631 6. LUCENE-796: Removed calls to super.* from various get*Query methods in
3632 MultiFieldQueryParser, in order to allow sub-classes to override them.
3633 (Steven Parkes via Otis Gospodnetic)
3635 7. LUCENE-857: Removed caching from QueryFilter and deprecated QueryFilter
3636 in favour of QueryWrapperFilter or QueryWrapperFilter + CachingWrapperFilter
3637 combination when caching is desired.
3638 (Chris Hostetter, Otis Gospodnetic)
3640 8. LUCENE-869: Changed FSIndexInput and FSIndexOutput to inner classes of FSDirectory
3641 to enable extensibility of these classes. (Michael Busch)
3643 9. LUCENE-580: Added the public method reset() to TokenStream. This method does
3644 nothing by default, but may be overridden by subclasses to support consuming
3645 the TokenStream more than once. (Michael Busch)
3647 10. LUCENE-580: Added a new constructor to Field that takes a TokenStream as
3648 argument, available as tokenStreamValue(). This is useful to avoid the need of
3649 "dummy analyzers" for pre-analyzed fields. (Karl Wettin, Michael Busch)
3651 11. LUCENE-730: Added the new methods to BooleanQuery setAllowDocsOutOfOrder() and
3652 getAllowDocsOutOfOrder(). Deprecated the methods setUseScorer14() and
3653 getUseScorer14(). The optimization patch LUCENE-730 (see Optimizations->3.)
3654 improves performance for certain queries but results in scoring out of docid
3655 order. This patch reverses this change, so now by default hit docs are scored
3656 in docid order unless setAllowDocsOutOfOrder(true) is explicitly called.
3657 This patch also enables the tests in QueryUtils again that check for docid
3658 order. (Paul Elschot, Doron Cohen, Michael Busch)
3660 12. LUCENE-888: Added Directory.openInput(File path, int bufferSize)
3661 to optionally specify the size of the read buffer. Also added
3662 BufferedIndexInput.setBufferSize(int) to change the buffer size.
3665 13. LUCENE-923: Make SegmentTermPositionVector package-private. It does not need
3666 to be public because it implements the public interface TermPositionVector.
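
Item 1 above (LUCENE-793) keeps backwards compatibility by making the new
exceptions subclasses of IOException. The following sketch shows why that
works: callers that only catch IOException keep compiling and behaving as
before, while newer callers can catch the more specific type. The class names
here are hypothetical stand-ins, not Lucene's own classes.

```java
import java.io.IOException;

// Hypothetical sketch; CorruptIndexSketchException merely stands in for a
// specific exception type such as the ones named in the entry above.
final class ExceptionCompatSketch {

    static class CorruptIndexSketchException extends IOException {
        CorruptIndexSketchException(String message) {
            super(message);
        }
    }

    // The throws clause may keep declaring plain IOException.
    static void openIndex(boolean corrupt) throws IOException {
        if (corrupt) {
            throw new CorruptIndexSketchException("index is corrupt");
        }
    }

    // Code written before the new exception existed still catches it.
    static String legacyCaller(boolean corrupt) {
        try {
            openIndex(corrupt);
            return "opened";
        } catch (IOException e) {
            return "io failure: " + e.getMessage();
        }
    }

    // Newer code can distinguish the root cause.
    static String newCaller(boolean corrupt) {
        try {
            openIndex(corrupt);
            return "opened";
        } catch (CorruptIndexSketchException e) {
            return "corrupt: " + e.getMessage();
        } catch (IOException e) {
            return "io failure: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(legacyCaller(true));
        System.out.println(newCaller(true));
    }
}
```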
3671 1. LUCENE-804: Fixed build.xml to pack a fully compilable src dist. (Doron Cohen)
3673 2. LUCENE-813: Leading wildcard fixed to work with trailing wildcard.
3674 Query parser modified to create a prefix query only for the case
3675 that there is a single trailing wildcard (and no additional wildcard
3676 or '?' in the query text). (Doron Cohen)
3678 3. LUCENE-812: Add no-argument constructors to NativeFSLockFactory
3679 and SimpleFSLockFactory. This enables all 4 builtin LockFactory
3680 implementations to be specified via the System property
3681 org.apache.lucene.store.FSDirectoryLockFactoryClass. (Mike McCandless)
3683 4. LUCENE-821: The new single-norm-file introduced by LUCENE-756
3684 failed to reduce the number of open descriptors since it was still
3685 opened once per field with norms. (yonik)
3687 5. LUCENE-823: Make sure internal file handles are closed when
3688 hitting an exception (eg disk full) while flushing deletes in
3689 IndexWriter's mergeSegments, and also during
3690 IndexWriter.addIndexes. (Mike McCandless)
3692 6. LUCENE-825: If directory is removed after
3693 FSDirectory.getDirectory() but before IndexReader.open you now get
3694 a FileNotFoundException like Lucene pre-2.1 (before this fix you
3695 got an NPE). (Mike McCandless)
3697 7. LUCENE-800: Removed backslash from the TERM_CHAR list in the queryparser,
3698 because the backslash is the escape character. Also changed the ESCAPED_CHAR
3699 list to contain all possible characters, because every character that
3700 follows a backslash should be considered as escaped. (Michael Busch)
3702 8. LUCENE-372: QueryParser.parse() now ensures that the entire input string
3703 is consumed. Now a ParseException is thrown if a query contains too many
3704 closing parentheses. (Andreas Neumann via Michael Busch)
3706 9. LUCENE-814: javacc build targets now fix line-end-style of generated files.
3707 Now also deleting all javacc generated files before calling javacc.
3708 (Steven Parkes, Doron Cohen)
3710 10. LUCENE-829: close readers in contrib/benchmark. (Karl Wettin, Doron Cohen)
3712 11. LUCENE-828: Minor fix for Term's equals().
3713 (Paul Cowan via Otis Gospodnetic)
3715 12. LUCENE-846: Fixed: if IndexWriter is opened with autoCommit=false,
3716 and you call addIndexes, and hit an exception (eg disk full) then
3717 when IndexWriter rolls back its internal state this could corrupt
3718 the instance of IndexWriter (but, not the index itself) by
3719 referencing already deleted segments. This bug was only present
3720 in 2.2 (trunk), ie was never released. (Mike McCandless)
3722 13. LUCENE-736: Sloppy phrase query with repeating terms matches wrong docs.
3723 For example query "B C B"~2 matches the doc "A B C D E". (Doron Cohen)
3725 14. LUCENE-789: Fixed: custom similarity is ignored when using MultiSearcher (problem reported
3726 by Alexey Lef). Now the similarity applied by MultiSearcher.setSimilarity(sim) is being used.
3727 Note that as before this fix, creating a multiSearcher from Searchers for whom custom similarity
3728 was set has no effect - it is masked by the similarity of the MultiSearcher. This is as
3729 designed, because MultiSearcher operates on Searchables (not Searchers). (Doron Cohen)
3731 15. LUCENE-880: Fixed DocumentWriter to close the TokenStreams after it
3732 has written the postings. Then the resources associated with the
3733 TokenStreams can safely be released. (Michael Busch)
3735 16. LUCENE-883: consecutive calls to Spellchecker.indexDictionary()
3736 won't insert terms twice anymore. (Daniel Naber)
3738 17. LUCENE-881: QueryParser.escape() now also escapes the characters
3739 '|' and '&' which are part of the queryparser syntax. (Michael Busch)
3741 18. LUCENE-886: Spellchecker clean up: exceptions are no longer printed to STDERR
3742 and ignored; they are re-thrown instead. Some javadoc improvements.
3745 19. LUCENE-698: FilteredQuery now takes the query boost into account for
3746 scoring. (Michael Busch)
3748 20. LUCENE-763: Spellchecker: LuceneDictionary used to skip first word in
3749 enumeration. (Christian Mallwitz via Daniel Naber)
3751 21. LUCENE-903: FilteredQuery explanation inaccuracy with boost.
3752 Explanation tests now "deep" check the explanation details.
3753 (Chris Hostetter, Doron Cohen)
3755 22. LUCENE-912: DisjunctionMaxScorer first skipTo(target) call ignores the
3756 skip target param and ends up at the first match.
3757 (Sudaakeran B. via Chris Hostetter & Doron Cohen)
3759 23. LUCENE-913: Two consecutive score() calls return different
3760 scores for Boolean Queries. (Michael Busch, Doron Cohen)
3762 24. LUCENE-1013: Fix IndexWriter.setMaxMergeDocs to work "out of the
3763 box", again, by moving set/getMaxMergeDocs up from
3764 LogDocMergePolicy into LogMergePolicy. This fixes the API
3765 breakage (non backwards compatible change) caused by LUCENE-994.
3766 (Yonik Seeley via Mike McCandless)
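
The rule in item 2 above (LUCENE-813) can be sketched as a small predicate: a
prefix query is chosen only when the term has exactly one wildcard, a '*' in
the last position. This is a simplified illustration, not the query parser's
grammar; WildcardKindSketch and isPrefixOnly are hypothetical names, and
backslash escaping is deliberately ignored.

```java
// Hypothetical sketch; this is NOT the QueryParser's actual logic and it
// ignores escaped characters.
final class WildcardKindSketch {

    // True only for terms such as "luc*": a single '*' at the very end and
    // no '?' or additional '*' anywhere else in the term.
    static boolean isPrefixOnly(String term) {
        int firstStar = term.indexOf('*');
        return term.length() > 1
                && firstStar == term.length() - 1
                && term.indexOf('?') < 0;
    }

    public static void main(String[] args) {
        String[] samples = { "luc*", "*luc*", "l?c*", "lucene" };
        for (String sample : samples) {
            System.out.println(sample + " -> " + isPrefixOnly(sample));
        }
    }
}
```

Under this rule a term like "*luc*" falls through to a general wildcard
query, which is what allows a leading wildcard to work together with a
trailing one.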
3770 1. LUCENE-759: Added two n-gram-producing TokenFilters.
3773 2. LUCENE-822: Added FieldSelector capabilities to Searchable for use with
3774 RemoteSearcher, and other Searchable implementations. (Mark Miller, Grant Ingersoll)
3776 3. LUCENE-755: Added the ability to store arbitrary binary metadata in the posting list.
3777 These metadata are called Payloads. For every position of a Token one Payload in the form
3778 of a variable length byte array can be stored in the prox file.
3779 Remark: The APIs introduced with this feature are in experimental state and thus
3780 contain appropriate warnings in the javadocs.
3783 4. LUCENE-834: Added BoostingTermQuery which can boost scores based on the
3784 values of a payload (see #3 above.) (Grant Ingersoll)
3786 5. LUCENE-834: Similarity has a new method for scoring payloads called
3787 scorePayloads that can be overridden to take advantage of payload
3788 storage (see #3 above)
3790 6. LUCENE-834: Added isPayloadAvailable() onto TermPositions interface and
3791 implemented it in the appropriate places (Grant Ingersoll)
3793 7. LUCENE-853: Added RemoteCachingWrapperFilter to enable caching of Filters
3794 on the remote side of the RMI connection.
3795 (Matt Ericson via Otis Gospodnetic)
3797 8. LUCENE-446: Added Solr's search.function for scores based on field
3798 values, plus CustomScoreQuery for simple score (post) customization.
3799 (Yonik Seeley, Doron Cohen)
3801 9. LUCENE-1058: Added new TeeTokenFilter (like the UNIX 'tee' command) and SinkTokenizer which can be used to share tokens between two or more
3802 Fields such that the other Fields do not have to go through the whole Analysis process over again. For instance, if you have two
3803 Fields that share all the same analysis steps except one lowercases tokens and the other does not, you can coordinate the operations
3804 between the two using the TeeTokenFilter and the SinkTokenizer. See TeeSinkTokenTest.java for examples.
3805 (Grant Ingersoll, Michael Busch, Yonik Seeley)
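
The idea behind item 9 above (LUCENE-1058) can be sketched without Lucene: a
"tee" passes each token downstream unchanged while also recording a copy in a
sink, so a second consumer can reuse the tokens instead of repeating the
shared analysis. The sketch models tokens as strings and uses
java.util.Iterator; TeeSketch, TeeIterator and lowercased are hypothetical
names and do not mirror the TeeTokenFilter or SinkTokenizer APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; this is NOT Lucene's TeeTokenFilter/SinkTokenizer.
final class TeeSketch {

    // Hands every token to the caller unchanged and appends a copy to the sink.
    static final class TeeIterator implements Iterator<String> {
        private final Iterator<String> source;
        private final List<String> sink;

        TeeIterator(Iterator<String> source, List<String> sink) {
            this.source = source;
            this.sink = sink;
        }

        public boolean hasNext() {
            return source.hasNext();
        }

        public String next() {
            String token = source.next();
            sink.add(token);
            return token;
        }

        public void remove() {
            throw new UnsupportedOperationException();
        }
    }

    // The one analysis step that differs between the two fields in this example.
    static List<String> lowercased(Iterator<String> tokens) {
        List<String> result = new ArrayList<String>();
        while (tokens.hasNext()) {
            result.add(tokens.next().toLowerCase(Locale.ROOT));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> shared = Arrays.asList("Quick", "Brown", "Fox");
        List<String> sink = new ArrayList<String>();
        // First field: lowercased tokens, consumed through the tee.
        List<String> lowerField = lowercased(new TeeIterator(shared.iterator(), sink));
        // Second field: reads the original-case tokens back from the sink.
        System.out.println(lowerField);
        System.out.println(sink);
    }
}
```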
3809 1. LUCENE-761: The proxStream is now cloned lazily in SegmentTermPositions
3810 when nextPosition() is called for the first time. This allows using instances
3811 of SegmentTermPositions instead of SegmentTermDocs without additional costs.
3814 2. LUCENE-431: RAMInputStream and RAMOutputStream extend IndexInput and
3815 IndexOutput directly now. This avoids further buffering and thus avoids
3816 unnecessary array copies. (Michael Busch)
3818 3. LUCENE-730: Updated BooleanScorer2 to make use of BooleanScorer in some
3819 cases and possibly improve scoring performance. Documents can now be
3820 delivered out-of-order as they are scored (e.g. to HitCollector).
3821 N.B. A bit of code had to be disabled in QueryUtils in order for
3822 TestBoolean2 test to keep passing.
3823 (Paul Elschot via Otis Gospodnetic)
3825 4. LUCENE-882: Spellchecker doesn't store the ngrams anymore but only indexes
3826 them to keep the spell index small. (Daniel Naber)
3828 5. LUCENE-430: Delay allocation of the buffer after a clone of BufferedIndexInput.
3829 Together with LUCENE-888 this will allow adjusting the buffer size
3830 dynamically. (Paul Elschot, Michael Busch)
3832 6. LUCENE-888: Increase buffer sizes inside CompoundFileWriter and
3833 BufferedIndexOutput. Also increase buffer size in
3834 BufferedIndexInput, but only when used during merging. Together,
3835 these increases yield 10-18% overall performance gain vs the
3836 previous 1K defaults. (Mike McCandless)
3838 7. LUCENE-866: Adds multi-level skip lists to the posting lists. This speeds
3839 up most queries that use skipTo(), especially on big indexes with large posting
3840 lists. For average AND queries the speedup is about 20%, for queries that
3841 contain very frequent and very unique terms the speedup can be over 80%.
3846 1. LUCENE-791 && INFRA-1173: Infrastructure moved the Wiki to
3847 http://wiki.apache.org/lucene-java/ Updated the links in the docs and
3848 wherever else I found references. (Grant Ingersoll, Joe Schaefer)
3850 2. LUCENE-807: Fixed the javadoc for ScoreDocComparator.compare() to be
3851 consistent with java.util.Comparator.compare(): Any integer is allowed to
3852 be returned instead of only -1/0/1.
3853 (Paul Cowan via Michael Busch)
3855 3. LUCENE-875: Solved javadoc warnings & errors under jdk1.4.
3856 Solved javadoc errors under jdk5 (jars in path for gdata).
3857 Made "javadocs" target depend on "build-contrib" for first downloading
3858 contrib jars configured for dynamic download. (Note: when running
3859 behind a firewall, a firewall prompt might pop up) (Doron Cohen)
3861 4. LUCENE-740: Added SNOWBALL-LICENSE.txt to the snowball package and a
3862 remark about the license to NOTICE.TXT. (Steven Parkes via Michael Busch)
3864 5. LUCENE-925: Added analysis package javadocs. (Grant Ingersoll and Doron Cohen)
3866 6. LUCENE-926: Added document package javadocs. (Grant Ingersoll)
3870 1. LUCENE-802: Added LICENSE.TXT and NOTICE.TXT to Lucene jars.
3871 (Steven Parkes via Michael Busch)
3873 2. LUCENE-885: "ant test" now includes all contrib tests. The new
3874 "ant test-core" target can be used to run only the Core (non
3878 3. LUCENE-900: "ant test" now enables Java assertions (in Lucene packages).
3881 4. LUCENE-894: Add custom build file for binary distributions that includes
3882 targets to build the demos. (Chris Hostetter, Michael Busch)
3884 5. LUCENE-904: The "package" targets in build.xml now also generate .md5
3885 checksum files. (Chris Hostetter, Michael Busch)
3887 6. LUCENE-907: Include LICENSE.TXT and NOTICE.TXT in the META-INF dirs of
3888 demo war, demo jar, and the contrib jars. (Michael Busch)
3890 7. LUCENE-909: Demo targets for running the demo. (Doron Cohen)
3892 8. LUCENE-908: Improves content of MANIFEST file and makes it customizable
3893 for the contribs. Adds SNOWBALL-LICENSE.txt to META-INF of the snowball
3894 jar and makes sure that the lucli jar contains LICENSE.txt and NOTICE.txt.
3895 (Chris Hostetter, Michael Busch)
3897 9. LUCENE-930: Various contrib building improvements to ensure contrib
3898 dependencies are met, and test compilation errors fail the build.
3899 (Steven Parkes, Chris Hostetter)
3901 10. LUCENE-622: Add ant target and pom.xml files for building maven artifacts
3902 of the Lucene core and the contrib modules.
3903 (Sami Siren, Karl Wettin, Michael Busch)
3905 ======================= Release 2.1.0 =======================
3907 Changes in runtime behavior
3909 1. 's' and 't' have been removed from the list of default stopwords
3910 in StopAnalyzer (also used by StandardAnalyzer). Having e.g. 's'
3911 as a stopword meant that 's-class' led to the same results as 'class'.
3912 Note that this problem still exists for 'a', e.g. in 'a-class' as
3913 'a' continues to be a stopword.
3916 2. LUCENE-478: Updated the list of Unicode code point ranges for CJK
3917 (now split into CJ and K) in StandardAnalyzer. (John Wang and
3918 Steven Rowe via Otis Gospodnetic)
3920 3. Modified some CJK Unicode code point ranges in StandardTokenizer.jj,
3921 and added a few more of them to increase CJK character coverage.
3922 Also documented some of the ranges.
3925 4. LUCENE-489: Add support for leading wildcard characters (*, ?) to
3926 QueryParser. Default is to disallow them, as before.
3927 (Steven Parkes via Otis Gospodnetic)
3929 5. LUCENE-703: QueryParser changed to default to use of ConstantScoreRangeQuery
3930 for range queries. Added useOldRangeQuery property to QueryParser to allow
3931 selection of old RangeQuery class if required.
3934 6. LUCENE-543: WildcardQuery now performs a TermQuery if the provided term
3935 does not contain a wildcard character (? or *), when previously a
3936 StringIndexOutOfBoundsException was thrown.
3937 (Michael Busch via Erik Hatcher)
3939 7. LUCENE-726: Removed the use of deprecated doc.fields() method and
3941 (Michael Busch via Otis Gospodnetic)
3943 8. LUCENE-436: Removed finalize() in TermInfosReader and SegmentReader,
3944 and added a call to enumerators.remove() in TermInfosReader.close().
3945 The finalize() overrides were added to help with a pre-1.4.2 JVM bug
3946 that has since been fixed, plus we no longer support pre-1.4.2 JVMs.
3949 9. LUCENE-771: The default location of the write lock is now the
3950 index directory, and is named simply "write.lock" (without a big
3951 digest prefix). The system properties "org.apache.lucene.lockDir"
3952 and "java.io.tmpdir" are no longer used as the global directory
3953 for storing lock files, and the LOCK_DIR field of FSDirectory is
3954 now deprecated. (Mike McCandless)
3958 1. LUCENE-503: New ThaiAnalyzer and ThaiWordFilter in contrib/analyzers
3959 (Samphan Raruenrom via Chris Hostetter)
3961 2. LUCENE-545: New FieldSelector API and associated changes to
3962 IndexReader and implementations. New Fieldable interface for use
3963 with the lazy field loading mechanism. (Grant Ingersoll and Chuck
3964 Williams via Grant Ingersoll)
3966 3. LUCENE-676: Move Solr's PrefixFilter to Lucene core. (Yura
3967 Smolsky, Yonik Seeley)
3969 4. LUCENE-678: Added NativeFSLockFactory, which implements locking
3970 using OS native locking (via java.nio.*). (Michael McCandless via
3973 5. LUCENE-544: Added the ability to specify different boosts for
3974 different fields when using MultiFieldQueryParser (Matt Ericson
3975 via Otis Gospodnetic)
3977 6. LUCENE-528: New IndexWriter.addIndexesNoOptimize() that doesn't
3978 optimize the index when adding new segments, only performing
3979 merges as needed. (Ning Li via Yonik Seeley)
3981 7. LUCENE-573: QueryParser now allows backslash escaping in
3982 quoted terms and phrases. (Michael Busch via Yonik Seeley)
3984 8. LUCENE-716: QueryParser now allows specification of Unicode
3985 characters in terms via a unicode escape of the form \uXXXX
3986 (Michael Busch via Yonik Seeley)
3988 9. LUCENE-709: Added RAMDirectory.sizeInBytes(), IndexWriter.ramSizeInBytes()
3989 and IndexWriter.flushRamSegments(), allowing applications to
3990 control the amount of memory used to buffer documents.
3991 (Chuck Williams via Yonik Seeley)
3993 10. LUCENE-723: QueryParser now parses *:* as MatchAllDocsQuery
3996 11. LUCENE-741: Command-line utility for modifying or removing norms
3997 on fields in an existing index. This is mostly based on LUCENE-496
3998 and lives in contrib/miscellaneous.
3999 (Chris Hostetter, Otis Gospodnetic)
4001 12. LUCENE-759: Added NGramTokenizer and EdgeNGramTokenizer classes and
4002 their passing unit tests.
4005 13. LUCENE-565: Added methods to IndexWriter to more efficiently
4006 handle updating documents (the "delete then add" use case). This
4007 is intended to be an eventual replacement for the existing
4008 IndexModifier. Added IndexWriter.flush() (renamed from
4009 flushRamSegments()) to flush all pending updates (held in RAM), to
4010 the Directory. (Ning Li via Mike McCandless)
4012 14. LUCENE-762: Added in SIZE and SIZE_AND_BREAK FieldSelectorResult options
4013 which allow one to retrieve the size of a field without retrieving the
4014 actual field. (Chuck Williams via Grant Ingersoll)
4016 15. LUCENE-799: Properly handle lazy, compressed fields.
4017 (Mike Klaas via Grant Ingersoll)
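
For readers unfamiliar with the terms in item 12 above (LUCENE-759), the
sketch below shows what character n-grams and edge n-grams of a string look
like. It is a plain-Java illustration only: NGramSketch, nGrams and edgeNGrams
are hypothetical names, and the parameters and output order are assumptions
rather than the behaviour of NGramTokenizer or EdgeNGramTokenizer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; this is NOT Lucene's NGramTokenizer/EdgeNGramTokenizer.
final class NGramSketch {

    // All substrings of length n, taken left to right.
    static List<String> nGrams(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        List<String> grams = new ArrayList<String>();
        for (int i = 0; i + n <= text.length(); i++) {
            grams.add(text.substring(i, i + n));
        }
        return grams;
    }

    // Prefixes of the text with lengths from minGram to maxGram (front edge only).
    static List<String> edgeNGrams(String text, int minGram, int maxGram) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("require 1 <= minGram <= maxGram");
        }
        List<String> grams = new ArrayList<String>();
        for (int n = minGram; n <= maxGram && n <= text.length(); n++) {
            grams.add(text.substring(0, n));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(nGrams("lucene", 3));        // prints [luc, uce, cen, ene]
        System.out.println(edgeNGrams("lucene", 1, 3)); // prints [l, lu, luc]
    }
}
```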
4021 1. LUCENE-438: Remove "final" from Token, implement Cloneable, allow
4022 changing of termText via setTermText(). (Yonik Seeley)
4024 2. org.apache.lucene.analysis.nl.WordlistLoader has been deprecated
4025 and is supposed to be replaced with the WordlistLoader class in
4026 package org.apache.lucene.analysis (Daniel Naber)
4028 3. LUCENE-609: Revert return type of Document.getField(s) to Field
4029 for backward compatibility, added new Document.getFieldable(s)
4030 for access to new lazy loaded fields. (Yonik Seeley)
4032 4. LUCENE-608: Document.fields() has been deprecated and a new method
4033 Document.getFields() has been added that returns a List instead of
4034 an Enumeration (Daniel Naber)
4036 5. LUCENE-605: New Explanation.isMatch() method and new ComplexExplanation
4037 subclass allows explain methods to produce Explanations which model
4038 "matching" independent of having a positive value.
4041 6. LUCENE-621: New static methods IndexWriter.setDefaultWriteLockTimeout
4042 and IndexWriter.setDefaultCommitLockTimeout for overriding default
4043 timeout values for all future instances of IndexWriter (as well
4044 as for any other classes that may reference the static values,
4046 (Michael McCandless via Chris Hostetter)
4048 7. LUCENE-638: FSDirectory.list() now only returns the directory's
4049 Lucene-related files. Thanks to this change one can now construct
4050 a RAMDirectory from a file system directory that contains files
4051 not related to Lucene.
4052 (Simon Willnauer via Daniel Naber)
4054 8. LUCENE-635: Decoupling locking implementation from Directory
4055 implementation. Added set/getLockFactory to Directory and moved
4056 all locking code into subclasses of abstract class LockFactory.
4057 FSDirectory and RAMDirectory still default to their prior locking
4058 implementations, but now you can mix & match, for example using
4059 SingleInstanceLockFactory (ie, in memory locking) locking with an
4060 FSDirectory. Note that now you must call setDisableLocks before
4061 instantiating an FSDirectory if you wish to disable locking
4063 (Michael McCandless, Jeff Patterson via Yonik Seeley)
4065 9. LUCENE-657: Made FuzzyQuery non-final and inner ScoreTerm protected.
4066 (Steven Parkes via Otis Gospodnetic)
4068 10. LUCENE-701: Lockless commits: a commit lock is no longer required
4069 when a writer commits and a reader opens the index. This includes
4070 a change to the index file format (see docs/fileformats.html for
4071 details). It also removes all APIs associated with the commit
4072 lock & its timeout. Readers are now truly read-only and do not
4073 block one another on startup. This is the first step to getting
4074 Lucene to work correctly over NFS (second step is
4075 LUCENE-710). (Mike McCandless)
4077 11. LUCENE-722: DEFAULT_MIN_DOC_FREQ was misspelled DEFALT_MIN_DOC_FREQ
4078 in Similarity's MoreLikeThis class. The misspelling has been
4079 replaced by the correct spelling.
4080 (Andi Vajda via Daniel Naber)
4082 12. LUCENE-738: Reduce the size of the file that keeps track of which
4083 documents are deleted when the number of deleted documents is
4084 small. This changes the index file format and cannot be
4085 read by previous versions of Lucene. (Doron Cohen via Yonik Seeley)
4087 13. LUCENE-756: Maintain all norms in a single .nrm file to reduce the
4088 number of open files and file descriptors for the non-compound index
4089 format. This changes the index file format, but maintains the
4090 ability to read and update older indices. The first segment merge
4091 on an older format index will create a single .nrm file for the new
4092 segment. (Doron Cohen via Yonik Seeley)
4094 14. LUCENE-732: DateTools support has been added to QueryParser, with
4095 setters for both the default Resolution, and per-field Resolution.
4096 For backwards compatibility, DateField is still used if no Resolutions
4097 are specified. (Michael Busch via Chris Hostetter)
4099 15. Added isOptimized() method to IndexReader.
4102 16. LUCENE-773: Deprecate the FSDirectory.getDirectory(*) methods that
4103 take a boolean "create" argument. Instead you should use
4104 IndexWriter's "create" argument to create a new index.
4107 17. LUCENE-780: Add a static Directory.copy() method to copy files
4108 from one Directory to another. (Jiri Kuhn via Mike McCandless)
4110 18. LUCENE-773: Added Directory.clearLock(String name) to forcefully
4111 remove an old lock. The default implementation is to ask the
4112 lockFactory (if non null) to clear the lock. (Mike McCandless)
4114 19. LUCENE-795: Directory.renameFile() has been deprecated as it is
4115 not used anymore inside Lucene. (Daniel Naber)
4119 1. Fixed the web application demo (built with "ant war-demo") which
4120 didn't work because it used a QueryParser method that had
4121 been removed (Daniel Naber)
4123 2. LUCENE-583: ISOLatin1AccentFilter fails to preserve positionIncrement
4126 3. LUCENE-575: SpellChecker min score is incorrectly changed by suggestSimilar
4127 (Karl Wettin via Yonik Seeley)
4129 4. LUCENE-587: Explanation.toHtml was producing malformed HTML
4132 5. Fix to allow MatchAllDocsQuery to be used with RemoteSearcher (Yonik Seeley)
4134 6. LUCENE-601: RAMDirectory and RAMFile made Serializable
4135 (Karl Wettin via Otis Gospodnetic)
4137 7. LUCENE-557: Fixes to BooleanQuery and FilteredQuery so that the score
4138 Explanations match up with the real scores.
4141 8. LUCENE-607: ParallelReader's TermEnum fails to advance properly to
4142 new fields (Chuck Williams, Christian Kohlschuetter via Yonik Seeley)
4144 9. LUCENE-610,LUCENE-611: Simple syntax changes to allow compilation with ecj:
4145 disambiguate inner class scorer's use of doc() in BooleanScorer2,
4146 other test code changes. (DM Smith via Yonik Seeley)
4148 10. LUCENE-451: All core query types now use ComplexExplanations so that
4149 boosts of zero don't confuse the BooleanWeight explain method.
4152 11. LUCENE-593: Fixed LuceneDictionary's inner Iterator
4153 (Kåre Fiedler Christiansen via Otis Gospodnetic)
4155 12. LUCENE-641: fixed an off-by-one bug with IndexWriter.setMaxFieldLength()
4158 13. LUCENE-659: Make PerFieldAnalyzerWrapper delegate getPositionIncrementGap()
4159 to the correct analyzer for the field. (Chuck Williams via Yonik Seeley)
4161 14. LUCENE-650: Fixed NPE in Locale specific String Sort when Document
4163 (Oliver Hutchison via Chris Hostetter)
4165 15. LUCENE-683: Fixed data corruption when reading lazy loaded fields.
4168 16. LUCENE-678: Fixed bug in NativeFSLockFactory which caused the same
4169 lock to be shared between different directories.
4170 (Michael McCandless via Yonik Seeley)
4172 17. LUCENE-690: Fixed thread unsafe use of IndexInput by lazy loaded fields.
4175 18. LUCENE-696: Fix bug when scorer for DisjunctionMaxQuery has skipTo()
4176 called on it before next(). (Yonik Seeley)
4178 19. LUCENE-569: Fixed SpanNearQuery bug, for 'inOrder' queries it would fail
4179 to recognize ordered spans if they overlapped with unordered spans.
4180 (Paul Elschot via Chris Hostetter)
4182 20. LUCENE-706: Updated fileformats.xml|html concerning the docdelta value
4183 in the frequency file. (Johan Stuyts, Doron Cohen via Grant Ingersoll)
4185 21. LUCENE-715: Fixed private constructor in IndexWriter.java to
4186 properly release the acquired write lock if there is an
4187 IOException after acquiring the write lock but before finishing
4188 instantiation. (Matthew Bogosian via Mike McCandless)
4190 22. LUCENE-651: Multiple different threads requesting the same
4191 FieldCache entry (often for Sorting by a field) at the same
4192 time caused multiple generations of that entry, which was
4193 detrimental to performance and memory use.
4194 (Oliver Hutchison via Otis Gospodnetic)
4196 23. LUCENE-717: Fixed build.xml not to fail when there is no lib dir.
4197 (Doron Cohen via Otis Gospodnetic)
4199 24. LUCENE-728: Removed duplicate/old MoreLikeThis and SimilarityQueries
4200 classes from contrib/similarity, as their new home is under
4204 25. LUCENE-669: Do not double-close the RandomAccessFile in
4205 FSIndexInput/Output during finalize(). Besides sending an
4206 IOException up to the GC, this may also be the cause of intermittent
4207 "The handle is invalid" IOExceptions on Windows when trying to
4208 close readers or writers. (Michael Busch via Mike McCandless)
4210 26. LUCENE-702: Fix IndexWriter.addIndexes(*) to not corrupt the index
4211 on any exceptions (eg disk full). The semantics of these methods
4212 is now transactional: either all indices are merged or none are.
4213 Also fixed IndexWriter.mergeSegments (called outside of
4214 addIndexes(*) by addDocument, optimize, flushRamSegments) and
4215 IndexReader.commit() (called by close) to clean up and keep the
4216 instance state consistent with what's actually in the index (Mike
4219 27. LUCENE-129: Change finalizers to do "try {...} finally
4220 {super.finalize();}" to make sure we don't miss finalizers in
4221 classes above us. (Esmond Pitt via Mike McCandless)
4223 28. LUCENE-754: Fix a problem introduced by LUCENE-651, causing
4224 IndexReaders to hang around forever, in addition to not
4225 fixing the original FieldCache performance problem.
4226 (Chris Hostetter, Yonik Seeley)
4228 29. LUCENE-140: Fix IndexReader.deleteDocument(int docNum) to
4229 correctly raise ArrayIndexOutOfBoundsException when docNum is too
4230 large. Previously, if docNum was only slightly too large (within
4231 the same multiple of 8, ie, up to 7 ints beyond maxDoc), no
4232 exception would be raised and instead the index would become
4233 silently corrupted. The corruption then only appears much later,
4234 in mergeSegments, when the corrupted segment is merged with
4235 segment(s) after it. (Mike McCandless)
4237 30. LUCENE-768: Fix case where an Exception during deleteDocument,
4238 undeleteAll or setNorm in IndexReader could leave the reader in a
4239 state where close() fails to release the write lock.
4242 31. Remove "tvp" from known index file extensions because it is
4243 never used. (Nicolas Lalevée via Bernhard Messer)
4245 32. LUCENE-767: Change how SegmentReader.maxDoc() is computed to not
4246 rely on file length check and instead use the SegmentInfo's
4247 docCount that's already stored explicitly in the index. This is a
4248 defensive bug fix (ie, there is no known problem seen "in real
4249 life" due to this, just a possible future problem). (Chuck
4250 Williams via Mike McCandless)
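The failure mode described in item 29 (LUCENE-140) can be illustrated with a minimal sketch. It assumes deletions are tracked in a byte-backed bit set, so the backing storage rounds up to a multiple of 8 bits and an index slightly past the valid range still lands inside the array. The class and method names below are hypothetical and are not Lucene's own.

```java
// Hypothetical sketch, not Lucene code: a byte-backed bit set whose
// storage rounds up to a multiple of 8 bits.
final class DeletedDocsSketch {
    private final byte[] bits;
    private final int size; // number of valid document slots (plays the role of maxDoc)

    DeletedDocsSketch(int size) {
        this.size = size;
        this.bits = new byte[(size + 7) >> 3]; // round up to whole bytes
    }

    // Unchecked variant: a docNum in [size, bits.length * 8) is silently accepted,
    // because the byte it maps to exists in the array.
    void markDeletedUnchecked(int docNum) {
        bits[docNum >> 3] |= (byte) (1 << (docNum & 7));
    }

    // Checked variant: reject anything outside [0, size) explicitly.
    void markDeleted(int docNum) {
        if (docNum < 0 || docNum >= size) {
            throw new ArrayIndexOutOfBoundsException(docNum);
        }
        markDeletedUnchecked(docNum);
    }

    boolean isDeleted(int docNum) {
        return (bits[docNum >> 3] & (1 << (docNum & 7))) != 0;
    }
}
```

With a size of 10 the sketch allocates 2 bytes (16 bits), so only the explicit range check catches document numbers 10 through 15.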
4254 1. LUCENE-586: TermDocs.skipTo() is now more efficient for
4255 multi-segment indexes. This will improve the performance of many
4256 types of queries against a non-optimized index. (Andrew Hudson
4259 2. LUCENE-623: RAMDirectory.close now nulls out its reference to all
4260 internal "files", allowing them to be GCed even if references to the
4261 RAMDirectory itself still exist. (Nadav Har'El via Chris Hostetter)
4263 3. LUCENE-629: Compressed fields are no longer uncompressed and
4264 recompressed during segment merges (e.g. during indexing or
4265 optimizing), thus improving performance. (Michael Busch via Otis
4268 4. LUCENE-388: Improve indexing performance when maxBufferedDocs is
4269 large by keeping a count of buffered documents rather than
4270 counting after each document addition. (Doron Cohen, Paul Smith,
4273 5. Modified TermScorer.explain to use TermDocs.skipTo() instead of
4274 looping through docs. (Grant Ingersoll)
4276 6. LUCENE-672: New indexing segment merge policy flushes all
4277 buffered docs to their own segment and delays a merge until
4278 mergeFactor segments of a certain level have been accumulated.
4279 This increases indexing performance in the presence of deleted
4280 docs or partially full segments as well as enabling future
4283 NOTE: this also fixes an "under-merging" bug whereby it is
4284 possible to get far too many segments in your index (which will
4285 drastically slow down search, risk exhausting the file descriptor
4286 limit, etc.). This can happen when the number of buffered docs
4287 at close, plus the number of docs in the last non-ram segment is
4288 greater than mergeFactor. (Ning Li, Yonik Seeley)
4290 7. Lazy loaded fields unnecessarily retained an extra copy of loaded
4291 String data. (Yonik Seeley)
4293 8. LUCENE-443: ConjunctionScorer performance increase. Speed up
4294 any BooleanQuery with more than one mandatory clause.
4295 (Abdul Chaudhry, Paul Elschot via Yonik Seeley)
4297 9. LUCENE-365: DisjunctionSumScorer performance increase of
4298 ~30%. Speeds up queries with optional clauses. (Paul Elschot via
4301 10. LUCENE-695: Optimized BufferedIndexInput.readBytes() for medium
4302 size buffers, which will speed up merging and retrieving binary
4303 and compressed fields. (Nadav Har'El via Yonik Seeley)
4305 11. LUCENE-687: Lazy skipping on proximity file speeds up most
4306 queries involving term positions, including phrase queries.
4307 (Michael Busch via Yonik Seeley)
4309 12. LUCENE-714: Replaced 2 cases of manual for-loop array copying
4310 with calls to System.arraycopy instead, in DocumentWriter.java.
4311 (Nicolas Lalevee via Mike McCandless)
4313 13. LUCENE-729: Non-recursive skipTo and next implementation of
4314 TermDocs for a MultiReader. The old implementation could
4315 recurse up to the number of segments in the index. (Yonik Seeley)
4317 14. LUCENE-739: Improve segment merging performance by reusing
4318 the norm array across different fields and doing bulk writes
4319 of norms of segments with no deleted docs.
4320 (Michael Busch via Yonik Seeley)
4322 15. LUCENE-745: Add BooleanQuery.clauses(), allowing direct access
4323 to the List of clauses and replaced the internal synchronized Vector
4324 with an unsynchronized List. (Yonik Seeley)
4326 16. LUCENE-750: Remove finalizers from FSIndexOutput and move the
4327 FSIndexInput finalizer to the actual file so all clones don't
4328 register a new finalizer. (Yonik Seeley)
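The non-recursive iteration mentioned in item 13 (LUCENE-729) might look roughly like the following sketch. It uses plain int arrays in place of per-segment TermDocs, and every name in it is hypothetical. The point is only that a loop advances to the next segment, so the call depth stays constant regardless of how many segments are empty or exhausted.

```java
// Hypothetical sketch, not Lucene code: iterate documents across segments
// with a loop instead of recursing into the next segment.
final class MultiSegmentDocsSketch {
    private final int[][] segmentDocs; // per-segment doc ids, local to each segment, ascending
    private final int[] docBase;       // global id of each segment's first document
    private int segment = 0;
    private int position = -1;
    private int current = -1;

    MultiSegmentDocsSketch(int[][] segmentDocs, int[] docBase) {
        this.segmentDocs = segmentDocs;
        this.docBase = docBase;
    }

    // Advances to the next document, skipping exhausted or empty segments iteratively.
    boolean next() {
        while (segment < segmentDocs.length) {
            if (++position < segmentDocs[segment].length) {
                current = docBase[segment] + segmentDocs[segment][position];
                return true;
            }
            segment++;      // move on to the next segment
            position = -1;  // and restart at its beginning
        }
        return false;
    }

    // Global document id of the current position; valid only after next() returned true.
    int doc() {
        return current;
    }
}
```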
4332 1. Added TestTermScorer.java (Grant Ingersoll)
4334 2. Added TestWindowsMMap.java (Benson Margulies via Mike McCandless)
4336 3. LUCENE-744: Append the user.name property onto the temporary directory
4337 that is created so it doesn't interfere with other users. (Grant Ingersoll)
4341 1. Added style sheet to xdocs named lucene.css and included in the
4342 Anakia VSL descriptor. (Grant Ingersoll)
4344 2. Added scoring.xml document into xdocs. Updated Similarity.java
4345 scoring formula. (Grant Ingersoll and Steve Rowe. Updates from:
4346 Michael McCandless, Doron Cohen, Chris Hostetter, Doug Cutting).
4349 3. Added javadocs for FieldSelectorResult.java. (Grant Ingersoll)
4351 4. Moved xdocs directory to src/site/src/documentation/content/xdocs per
4352 Issue 707. Site now builds using Forrest, just like the other Lucene
4353 siblings. See http://wiki.apache.org/jakarta-lucene/HowToUpdateTheWebsite
4354 for info on updating the website. (Grant Ingersoll with help from Steve Rowe,
4355 Chris Hostetter, Doug Cutting, Otis Gospodnetic, Yonik Seeley)
4357 5. Added in Developer and System Requirements sections under Resources (Grant Ingersoll)
4359 6. LUCENE-713: Updated the Term Vector section of File Formats to include
4360 documentation on how Offset and Position info are stored in the TVF file.
4361 (Grant Ingersoll, Samir Abdou)
4363 7. Added in link to Clover Test Code Coverage Reports under the Develop
4364 section in Resources (Grant Ingersoll)
4366 8. LUCENE-748: Added details for semantics of IndexWriter.close on
4367 hitting an Exception. (Jed Wesley-Smith via Mike McCandless)
4369 9. Added some text about what is contained in releases.
4370 (Eric Haszlakiewicz via Grant Ingersoll)
4372 10. LUCENE-758: Fix javadoc to clarify that RAMDirectory(Directory)
4373 makes a full copy of the starting Directory. (Mike McCandless)
4375 11. LUCENE-764: Fix javadocs to detail temporary space requirements
4376 for IndexWriter's optimize(), addIndexes(*) and addDocument(...)
4377 methods. (Mike McCandless)
4381 1. Added in clover test code coverage per http://issues.apache.org/jira/browse/LUCENE-721
4382 To enable clover code coverage, you must have clover.jar in the ANT
4383 classpath and specify -Drun.clover=true on the command line.
4384 (Michael Busch and Grant Ingersoll)
4386 2. Added a sysproperty in common-build.xml per LUCENE-752 to map java.io.tmpdir to
4387 ${build.dir}/test just like the tempDir sysproperty.
4389 3. LUCENE-757: Added new target named init-dist that does setup for
4390 distribution of both binary and source distributions. Called by package
4393 ======================= Release 2.0.0 =======================
4397 1. All deprecated methods and fields have been removed, except
4398 DateField, which will still be supported for some time
4399 so Lucene can read its date fields from old indexes
4400 (Yonik Seeley & Grant Ingersoll)
4402 2. DisjunctionSumScorer is no longer public.
4403 (Paul Elschot via Otis Gospodnetic)
4405 3. Creating a Field with both an empty name and an empty value
4406 now throws an IllegalArgumentException
4409 4. LUCENE-301: Added new IndexWriter({String,File,Directory},
4410 Analyzer) constructors that do not take a boolean "create"
4411 argument. These new constructors will create a new index if
4412 necessary, else append to the existing one. (Dan Armbrust via
4417 1. LUCENE-496: Command line tool for modifying the field norms of an
4418 existing index; added to contrib/miscellaneous. (Chris Hostetter)
4420 2. LUCENE-577: SweetSpotSimilarity added to contrib/miscellaneous.
4425 1. LUCENE-330: Fix issue of FilteredQuery not working properly within
4426 BooleanQuery. (Paul Elschot via Erik Hatcher)
4428 2. LUCENE-515: Make ConstantScoreRangeQuery and ConstantScoreQuery work
4429 with RemoteSearchable. (Philippe Laflamme via Yonik Seeley)
4431 3. Added methods to get/set writeLockTimeout and commitLockTimeout in
4432 IndexWriter. These could be set in Lucene 1.4 using a system property.
4433 This feature had been removed without adding the corresponding
4434 getter/setter methods. (Daniel Naber)
4436 4. LUCENE-413: Fixed ArrayIndexOutOfBoundsException exceptions
4437 when using SpanQueries. (Paul Elschot via Yonik Seeley)
4439 5. Implemented FilterIndexReader.getVersion() and isCurrent()
4442 6. LUCENE-540: Fixed a bug with IndexWriter.addIndexes(Directory[])
4443 that sometimes caused the index order of documents to change.
4446 7. LUCENE-526: Fixed a bug in FieldSortedHitQueue that caused
4447 subsequent String sorts with different locales to sort identically.
4448 (Paul Cowan via Yonik Seeley)
4450 8. LUCENE-541: Add missing extractTerms() to DisjunctionMaxQuery
4451 (Stefan Will via Yonik Seeley)
4453 9. LUCENE-514: Added getTermArrays() and extractTerms() to
4454 MultiPhraseQuery (Eric Jain & Yonik Seeley)
4456 10. LUCENE-512: Fixed ClassCastException in ParallelReader.getTermFreqVectors
4457 (frederic via Yonik)
4459 11. LUCENE-352: Fixed bug in SpanNotQuery that manifested as
4460 NullPointerException when "exclude" query was not a SpanTermQuery.
4463 12. LUCENE-572: Fixed bug in SpanNotQuery hashCode, which was ignoring the exclude clause
4466 13. LUCENE-561: Fixed some ParallelReader bugs: a NullPointerException if the reader
4467 didn't know about the field yet, the reader not keeping track of whether it had
4468 deletions, and deleteDocument calls that could circumvent synchronization on the subreaders.
4469 (Chuck Williams via Yonik Seeley)
4471 14. LUCENE-556: Added empty extractTerms() implementation to MatchAllDocsQuery and
4472 ConstantScoreQuery in order to allow their use with a MultiSearcher.
4475 15. LUCENE-546: Removed 2GB file size limitations for RAMDirectory.
4476 (Peter Royal, Michael Chan, Yonik Seeley)
4478 16. LUCENE-485: Don't hold commit lock while removing obsolete index
4479 files. (Luc Vanlerberghe via cutting)
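The kind of fix described in item 12 (LUCENE-572) can be sketched as follows. The class is hypothetical and uses plain strings to stand in for the include and exclude sub-queries. It shows only the general idea that hashCode should combine every field that equals compares, so that queries differing in the exclude clause do not systematically collide.

```java
import java.util.Objects;

// Hypothetical sketch, not Lucene code: a two-part key whose hashCode
// covers both parts, matching what equals() compares.
final class SpanNotKeySketch {
    private final String include;
    private final String exclude;

    SpanNotKeySketch(String include, String exclude) {
        this.include = include;
        this.exclude = exclude;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SpanNotKeySketch)) return false;
        SpanNotKeySketch other = (SpanNotKeySketch) o;
        return include.equals(other.include) && exclude.equals(other.exclude);
    }

    @Override
    public int hashCode() {
        // Hashing only `include` would still satisfy the equals/hashCode contract,
        // but every key sharing an include clause would land in the same bucket.
        return Objects.hash(include, exclude);
    }
}
```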
4486 1. LUCENE-511: Fix a bug in the BufferedIndexOutput optimization
4487 introduced in 1.9-final. (Shay Banon & Steven Tamm via cutting)
4491 Note that this release is mostly but not 100% source compatible with
4492 the previous release of Lucene (1.4.3). In other words, you should
4493 make sure your application compiles with this version of Lucene before
4494 you replace the old Lucene JAR with the new one. Many methods have
4495 been deprecated in anticipation of release 2.0, so deprecation
4496 warnings are to be expected when upgrading from 1.4.3 to 1.9.
4500 1. The fix that made IndexWriter.setMaxBufferedDocs(1) work had negative
4501 effects on indexing performance and has thus been reverted. The
4502 argument for setMaxBufferedDocs(int) must now be at least 2, otherwise
4503 an exception is thrown. (Daniel Naber)
4507 1. Optimized BufferedIndexOutput.writeBytes() to use
4508 System.arraycopy() in more cases, rather than copying byte-by-byte.
4509 (Lukas Zapletal via Cutting)
4515 1. To compile and use Lucene you now need Java 1.4 or later.
4517 Changes in runtime behavior
4519 1. FuzzyQuery can no longer throw a TooManyClauses exception. If a
4520 FuzzyQuery expands to more than BooleanQuery.maxClauseCount
4521 terms only the BooleanQuery.maxClauseCount most similar terms
4522 go into the rewritten query and thus the exception is avoided.
4525 2. Changed system property from "org.apache.lucene.lockdir" to
4526 "org.apache.lucene.lockDir", so that its casing follows the existing
4527 pattern used in other Lucene system properties. (Bernhard)
4529 3. The terms of RangeQueries and FuzzyQueries are now converted to
4530 lowercase by default (as has been the case for PrefixQueries
4531 and WildcardQueries before). Use setLowercaseExpandedTerms(false)
4532 to disable that behavior but note that this also affects
4533 PrefixQueries and WildcardQueries. (Daniel Naber)
4535 4. Document frequency that is computed when MultiSearcher is used is now
4536 computed correctly and "globally" across subsearchers and indices, while
4537 before it used to be computed locally to each index, which caused
4538 ranking across multiple indices not to be equivalent.
4539 (Chuck Williams, Wolf Siberski via Otis, bug #31841)
4541 5. When opening an IndexWriter with create=true, Lucene now only deletes
4542 its own files from the index directory (looking at the file name suffixes
4543 to decide if a file belongs to Lucene). The old behavior was to delete
4544 all files. (Daniel Naber and Bernhard Messer, bug #34695)
4546 6. The version of an IndexReader, as returned by getCurrentVersion()
4547 and getVersion() doesn't start at 0 anymore for new indexes. Instead, it
4548 is now initialized by the system time in milliseconds.
4549 (Bernhard Messer via Daniel Naber)
4551 7. Several default values cannot be set via system properties anymore, as
4552 this has been considered inappropriate for a library like Lucene. For
4553 most properties there are set/get methods available in IndexWriter which
4554 you should use instead. This affects the following properties:
4555 See IndexWriter for getter/setter methods:
4556 org.apache.lucene.writeLockTimeout, org.apache.lucene.commitLockTimeout,
4557 org.apache.lucene.minMergeDocs, org.apache.lucene.maxMergeDocs,
4558 org.apache.lucene.maxFieldLength, org.apache.lucene.termIndexInterval,
4559 org.apache.lucene.mergeFactor,
4560 See BooleanQuery for getter/setter methods:
4561 org.apache.lucene.maxClauseCount
4562 See FSDirectory for getter/setter methods:
4566 8. Fixed FieldCacheImpl to use user-provided IntParser and FloatParser,
4567 instead of using Integer and Float classes for parsing.
4568 (Yonik Seeley via Otis Gospodnetic)
4570 9. Expert level search routines returning TopDocs and TopFieldDocs
4571 no longer normalize scores. This also fixes bugs related to
4572 MultiSearchers and score sorting/normalization.
4573 (Luc Vanlerberghe via Yonik Seeley, LUCENE-469)
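The behavior in item 1, keeping only the maxClauseCount most similar terms, can be approximated with a bounded min-heap. The sketch below is an illustration under that assumption and does not claim to be how FuzzyQuery is implemented; ScoredTerm and keepMostSimilar are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch, not Lucene code: keep the N best-scoring terms
// instead of failing when an expansion produces too many.
final class TopSimilarTermsSketch {
    static final class ScoredTerm implements Comparable<ScoredTerm> {
        final String term;
        final float similarity;

        ScoredTerm(String term, float similarity) {
            this.term = term;
            this.similarity = similarity;
        }

        @Override
        public int compareTo(ScoredTerm other) {
            return Float.compare(similarity, other.similarity);
        }
    }

    // Returns at most maxClauseCount terms, most similar first.
    static List<String> keepMostSimilar(List<ScoredTerm> candidates, int maxClauseCount) {
        PriorityQueue<ScoredTerm> heap = new PriorityQueue<>(); // min-heap: least similar on top
        for (ScoredTerm candidate : candidates) {
            heap.add(candidate);
            if (heap.size() > maxClauseCount) {
                heap.poll(); // evict the least similar term seen so far
            }
        }
        List<String> kept = new ArrayList<>();
        while (!heap.isEmpty()) {
            kept.add(heap.poll().term); // ascending similarity
        }
        Collections.reverse(kept);
        return kept;
    }
}
```

Because the heap never holds more than maxClauseCount + 1 entries, memory stays bounded however many terms the expansion enumerates.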
4577 1. Added support for stored compressed fields (patch #31149)
4578 (Bernhard Messer via Christoph)
4580 2. Added support for binary stored fields (patch #29370)
4581 (Drew Farris and Bernhard Messer via Christoph)
4583 3. Added support for position and offset information in term vectors
4584 (patch #18927). (Grant Ingersoll & Christoph)
4586 4. A new class DateTools has been added. It allows you to format dates
4587 in a readable format adequate for indexing. Unlike the existing
4588 DateField class DateTools can cope with dates before 1970 and it
4589 forces you to specify the desired date resolution (e.g. month, day,
4590 second, ...) which can make RangeQuerys on those fields more efficient.
4593 5. QueryParser now correctly works with Analyzers that can return more
4594 than one token per position. For example, a query "+fast +car"
4595 would be parsed as "+fast +(car automobile)" if the Analyzer
4596 returns "car" and "automobile" at the same position whenever it
4597 finds "car" (Patch #23307).
4598 (Pierrick Brihaye, Daniel Naber)
4600 6. Permit unbuffered Directory implementations (e.g., using mmap).
4601 InputStream is replaced by the new classes IndexInput and
4602 BufferedIndexInput. OutputStream is replaced by the new classes
4603 IndexOutput and BufferedIndexOutput. InputStream and OutputStream
4604 are now deprecated and FSDirectory is now subclassable. (cutting)
4606 7. Add native Directory and TermDocs implementations that work under
4607 GCJ. These require GCC 3.4.0 or later and have only been tested
4608 on Linux. Use 'ant gcj' to build demo applications. (cutting)
4610 8. Add MMapDirectory, which uses nio to mmap input files. This is
4611 still somewhat slower than FSDirectory. However it uses less
4612 memory per query term, since a new buffer is not allocated per
4613 term, which may help applications which use, e.g., wildcard
4614 queries. It may also someday be faster. (cutting & Paul Elschot)
4616 9. Added javadocs-internal to build.xml - bug #30360
4617 (Paul Elschot via Otis)
4619 10. Added RangeFilter, a more generically useful filter than DateFilter.
4620 (Chris M Hostetter via Erik)
4622 11. Added NumberTools, a utility class for indexing numeric fields.
4623 (adapted from code contributed by Matt Quail; committed by Erik)
4625 12. Added public static IndexReader.main(String[] args) method.
4626 IndexReader can now be used directly at command line level
4627 to list and optionally extract the individual files from an existing
4628 compound index file.
4629 (adapted from code contributed by Garrett Rooney; committed by Bernhard)
4631 13. Add IndexWriter.setTermIndexInterval() method. See javadocs.
4634 14. Added LucenePackage, whose static get() method returns java.util.Package,
4635 which lets the caller get the Lucene version information specified in
4637 (Doug Cutting via Otis)
4639 15. Added Hits.iterator() method and corresponding HitIterator and Hit objects.
4640 This provides standard java.util.Iterator iteration over Hits.
4641 Each call to the iterator's next() method returns a Hit object.
4642 (Jeremy Rayner via Erik)
4644 16. Add ParallelReader, an IndexReader that combines separate indexes
4645 over different fields into a single virtual index. (Doug Cutting)
4647 17. Add IntParser and FloatParser interfaces to FieldCache, so that
4648 fields in arbitrary formats can be cached as ints and floats.
4651 18. Added class org.apache.lucene.index.IndexModifier which combines
4652 IndexWriter and IndexReader, so you can add and delete documents without
4653 worrying about synchronization/locking issues.
4656 19. Lucene can now be used inside an unsigned applet, as Lucene's access
4657 to system properties will not cause a SecurityException anymore.
4658 (Jon Schuster via Daniel Naber, bug #34359)
4660 20. Added a new class MatchAllDocsQuery that matches all documents.
4661 (John Wang via Daniel Naber, bug #34946)
4663 21. Added ability to omit norms on a per field basis to decrease
4664 index size and memory consumption when there are many indexed fields.
4665 See Field.setOmitNorms()
4666 (Yonik Seeley, LUCENE-448)
4668 22. Added NullFragmenter to contrib/highlighter, which is useful for
4669 highlighting entire documents or fields.
4672 23. Added regular expression queries, RegexQuery and SpanRegexQuery.
4673 Note the same term enumeration caveats apply with these queries as
4674 apply to WildcardQuery and other term expanding queries.
4675 These two new queries are not currently supported via QueryParser.
4678 24. Added ConstantScoreQuery which wraps a filter and produces a score
4679 equal to the query boost for every matching document.
4680 (Yonik Seeley, LUCENE-383)
4682 25. Added ConstantScoreRangeQuery which produces a constant score for
4683 every document in the range. One advantage over a normal RangeQuery
4684 is that it doesn't expand to a BooleanQuery and thus doesn't have a maximum
4685 number of terms the range can cover. Both endpoints may also be open.
4686 (Yonik Seeley, LUCENE-383)
4688 26. Added ability to specify a minimum number of optional clauses that
4689 must match in a BooleanQuery. See BooleanQuery.setMinimumNumberShouldMatch().
4690 (Paul Elschot, Chris Hostetter via Yonik Seeley, LUCENE-395)
4692 27. Added DisjunctionMaxQuery which provides the maximum score across its clauses.
4693 It's very useful for searching across multiple fields.
4694 (Chuck Williams via Yonik Seeley, LUCENE-323)
4696 28. New class ISOLatin1AccentFilter that replaces accented characters in the ISO
4697 Latin 1 character set by their unaccented equivalent.
4698 (Sven Duzont via Erik Hatcher)
4700 29. New class KeywordAnalyzer. "Tokenizes" the entire stream as a single token.
4701 This is useful for data like zip codes, ids, and some product names.
4704 30. Copied LengthFilter from contrib area to core. Removes words that are too
4705 long or too short from the stream.
4706 (David Spencer via Otis and Daniel)
4708 31. Added getPositionIncrementGap(String fieldName) to Analyzer. This allows
4709 custom analyzers to put gaps between Field instances with the same field
4710 name, preventing phrase or span queries crossing these boundaries. The
4711 default implementation issues a gap of 0, allowing the default token
4712 position increment of 1 to put the next field's first token into a
4713 successive position.
4714 (Erik Hatcher, with advice from Yonik)
4716 32. StopFilter can now ignore case when checking for stop words.
4717 (Grant Ingersoll via Yonik, LUCENE-248)
4719 33. Add TopDocCollector and TopFieldDocCollector. These simplify the
4720 implementation of hit collectors that collect only the
4721 top-scoring or top-sorting hits.
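The effect of the position increment gap from item 31 can be shown with a small model of position assignment. This is a sketch under simplifying assumptions (every token has a position increment of 1, and the gap is added before each field instance after the first); the class and method are hypothetical and do not use the Analyzer API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Lucene code: assign token positions across several
// instances of the same field, inserting a gap between instances.
final class PositionGapSketch {
    static List<Integer> positions(List<String[]> fieldInstances, int gap) {
        List<Integer> result = new ArrayList<>();
        int position = -1;
        boolean firstInstance = true;
        for (String[] tokens : fieldInstances) {
            if (!firstInstance) {
                position += gap; // extra distance between field instances
            }
            firstInstance = false;
            for (int i = 0; i < tokens.length; i++) {
                position += 1;   // default position increment of each token
                result.add(position);
            }
        }
        return result;
    }
}
```

For the instances ["a", "b"] and ["c"], a gap of 0 yields positions 0, 1, 2, so a phrase query for "b c" would see adjacent positions. A gap of 100 yields 0, 1, 102, which keeps an exact phrase from matching across the boundary.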
4725 1. Several methods and fields have been deprecated. The API documentation
4726 contains information about the recommended replacements. It is planned
4727 that most of the deprecated methods and fields will be removed in
4728 Lucene 2.0. (Daniel Naber)
4730 2. The Russian and the German analyzers have been moved to contrib/analyzers.
4731 Also, the WordlistLoader class has been moved one level up in the
4732 hierarchy and is now org.apache.lucene.analysis.WordlistLoader
4735 3. The API contained methods that were declared to throw an IOException
4736 but never did so. These declarations have been removed. If
4737 your code tries to catch these exceptions you might need to remove
4738 those catch clauses to avoid compile errors. (Daniel Naber)
4740 4. Add a serializable Parameter Class to standardize parameter enum
4741 classes in BooleanClause and Field. (Christoph)
4743 5. Added rewrite methods to all SpanQuery subclasses that nest other SpanQuerys.
4744 This allows custom SpanQuery subclasses that rewrite (for term expansion, for
4745 example) to nest within the built-in SpanQuery classes successfully.
4749 1. The JSP demo page (src/jsp/results.jsp) now properly closes the
4750 IndexSearcher it opens. (Daniel Naber)
4752 2. Fixed a bug in IndexWriter.addIndexes(IndexReader[] readers) that
4753 prevented deletion of obsolete segments. (Christoph Goller)
4755 3. Fix in FieldInfos to avoid the return of an extra blank field in
4756 IndexReader.getFieldNames() (Patch #19058). (Mark Harwood via Bernhard)
4758 4. Some combinations of BooleanQuery and MultiPhraseQuery (formerly
4759 PhrasePrefixQuery) could provoke UnsupportedOperationException
4760 (bug #33161). (Rhett Sutphin via Daniel Naber)
4762 5. Fixed a small bug in skipTo of ConjunctionScorer that caused a NullPointerException
4763 if skipTo() was called without a prior call to next(). (Christoph)
4765 6. Disable Similarity.coord() in the scoring of most automatically
4766 generated boolean queries. The coord() score factor is
4767 appropriate when clauses are independently specified by a user,
4768 but is usually not appropriate when clauses are generated
4769 automatically, e.g., by a fuzzy, wildcard or range query. Matches
4770 on such automatically generated queries are no longer penalized
4771 for not matching all terms. (Doug Cutting, Patch #33472)
4773 7. Getting a lock file with Lock.obtain(long) was supposed to wait for
4774 a given number of milliseconds, but this didn't work.
4775 (John Wang via Daniel Naber, Bug #33799)
4777 8. Fix FSDirectory.createOutput() to always create new files.
4778 Previously, existing files were overwritten, and an index could be
4779 corrupted when the old version of a file was longer than the new.
4780 Now any existing file is first removed. (Doug Cutting)
4782 9. Fix BooleanQuery containing nested SpanTermQuery's, which previously
4783 could return an incorrect number of hits.
4784 (Reece Wilton via Erik Hatcher, Bug #35157)
4786 10. Fix NullPointerException that could occur with a MultiPhraseQuery
4787 inside a BooleanQuery.
4788 (Hans Hjelm and Scotty Allen via Daniel Naber, Bug #35626)
4790 11. Fixed SnowballFilter to pass through the position increment from
4792 (Yonik Seeley via Erik Hatcher, LUCENE-437)
4794 12. Added Unicode range of Korean characters to StandardTokenizer,
4795 grouping contiguous characters into a token rather than one token
4796 per character. This change also changes the token type to "<CJ>"
4797 for Chinese and Japanese character tokens (previously it was "<CJK>").
4798 (Cheolgoo Kang via Otis and Erik, LUCENE-444 and LUCENE-461)
4800 13. FieldsReader now looks at FieldInfo.storeOffsetWithTermVector and
4801 FieldInfo.storePositionWithTermVector and creates the Field with
4802 correct TermVector parameter.
4803 (Frank Steinmann via Bernhard, LUCENE-455)
4805 14. Fixed WildcardQuery to prevent "cat" matching "ca??".
4806 (Xiaozheng Ma via Bernhard, LUCENE-306)
4808 15. Fixed a bug where MultiSearcher and ParallelMultiSearcher could
4809 change the sort order when sorting by string for documents without
4810 a value for the sort field.
4811 (Luc Vanlerberghe via Yonik, LUCENE-453)
4813 16. Fixed a sorting problem with MultiSearchers that can lead to
4814 missing or duplicate docs due to equal docs sorting in an arbitrary order.
4815 (Yonik Seeley, LUCENE-456)
4817 17. A single hit using the expert level sorted search methods
4818 resulted in the score not being normalized.
4819 (Yonik Seeley, LUCENE-462)
4821 18. Fixed inefficient memory usage when loading an index into RAMDirectory.
4822 (Volodymyr Bychkoviak via Bernhard, LUCENE-475)
4824 19. Corrected term offsets returned by ChineseTokenizer.
4825 (Ray Tsang via Erik Hatcher, LUCENE-324)
4827 20. Fixed MultiReader.undeleteAll() to correctly update numDocs.
4828 (Robert Kirchgessner via Doug Cutting, LUCENE-479)
4830 21. Race condition in IndexReader.getCurrentVersion() and isCurrent()
4831 fixed by acquiring the commit lock.
4832 (Luc Vanlerberghe via Yonik Seeley, LUCENE-481)
4834 22. IndexWriter.setMaxBufferedDocs(1) didn't have the expected effect;
4835 this has now been fixed. (Daniel Naber)
4837 23. Fixed QueryParser when called with a date in local form like
4838 "[1/16/2000 TO 1/18/2000]". This query did not include the documents
4839 of 1/18/2000, i.e. the last day was not included. (Daniel Naber)
4841 24. Removed sorting constraint that threw an exception if there were
4842 not yet any values for the sort field (Yonik Seeley, LUCENE-374)
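Entry 16 above traces missing or duplicate documents to hits with equal
sort keys being ordered arbitrarily. One common remedy, sketched here as an
assumption and not as the actual LUCENE-456 patch, is to break ties on the
document id so that the ordering is total and every merge of per-searcher
results agrees. The class, field and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class StableHitOrderSketch {
    /** Hypothetical hit: a global document id plus its string sort key. */
    static final class Hit {
        final int doc;
        final String key;
        Hit(int doc, String key) { this.doc = doc; this.key = key; }
    }

    // Order by sort key first; break ties by document id so the order is total.
    static final Comparator<Hit> BY_KEY_THEN_DOC =
        Comparator.comparing((Hit h) -> h.key).thenComparingInt(h -> h.doc);

    /** Merges per-searcher result lists and returns the top n hits. */
    static List<Hit> topN(List<List<Hit>> perSearcher, int n) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perSearcher) all.addAll(hits);
        all.sort(BY_KEY_THEN_DOC);
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }
}
```

With the tie-break in place, two hits that share a key always come back in
the same relative order, so a cut-off at n cannot drop one document and
repeat another depending on how the sub-results happened to arrive.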
4846 1. Disk usage (peak requirements during indexing and optimization)
4847 in case of compound file format has been improved.
4848 (Bernhard, Dmitry, and Christoph)
4850 2. Optimize the performance of certain uses of BooleanScorer,
4851 TermScorer and IndexSearcher. In particular, a BooleanQuery
4852 composed of TermQuery, with not all terms required, that returns a
4853 TopDocs (e.g., through a Hits with no Sort specified) runs much faster.
4856 3. Removed synchronization from reading of term vectors with an
4857 IndexReader (Patch #30736). (Bernhard Messer via Christoph)
4859 4. Optimize term-dictionary lookup to allocate far fewer terms when
4860 scanning for the matching term. This speeds searches involving
4861 low-frequency terms, where the cost of dictionary lookup can be
4862 significant. (cutting)
4864 5. Optimize fuzzy queries so the standard fuzzy queries with a prefix
4865 of 0 now run 20-50% faster (Patch #31882).
4866 (Jonathan Hager via Daniel Naber)
4868 6. Added a version of BooleanScorer (BooleanScorer2) that delivers
4869 documents in increasing order and implements skipTo. For queries
4870 with required or forbidden clauses it may be faster than the old
4871 BooleanScorer; for BooleanQueries consisting only of optional
4872 clauses it is probably slower. The new BooleanScorer is now the
4873 default. (Patch 31785 by Paul Elschot via Christoph)
4875 7. Use uncached access to norms when merging to reduce RAM usage.
4876 (Bug #32847). (Doug Cutting)
4878 8. Don't read term index when random-access is not required. This
4879 reduces time to open IndexReaders and they use less memory when
4880 random access is not required, e.g., when merging segments. The
4881 term index is now read into memory lazily at the first
4882 random-access. (Doug Cutting)
4884 9. Optimize IndexWriter.addIndexes(Directory[]) when the number of
4885 added indexes is larger than mergeFactor. Previously this could
4886 result in quadratic performance. Now performance is n log(n).
4889 10. Speed up the creation of TermEnum for indices with multiple
4890 segments and deleted documents, and thus speed up PrefixQuery,
4891 RangeQuery, WildcardQuery, FuzzyQuery, RangeFilter, DateFilter,
4892 and sorting the first time on a field.
4893 (Yonik Seeley, LUCENE-454)
4895 11. Optimized and generalized 32 bit floating point to byte
4896 (custom 8 bit floating point) conversions. Increased the speed of
4897 Similarity.encodeNorm() anywhere from 10% to 250%, depending on the JVM.
4898 (Yonik Seeley, LUCENE-467)
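Entry 11 above refers to a custom 8 bit floating point format. The sketch
below shows one way such a format can be built by slicing bits out of an
IEEE 754 float. The layout chosen here (3 mantissa bits, 5 exponent bits,
an exponent zero point of 15) and the names encode/decode are assumptions
for illustration; this entry does not confirm that Similarity.encodeNorm()
uses exactly this layout.

```java
final class SmallFloatSketch {
    // Assumed layout: 3 mantissa bits, 5 exponent bits, zero exponent point 15.
    private static final int MANTISSA_SHIFT = 24 - 3;
    private static final int EXPONENT_BASE = (63 - 15) << 3;

    /** Packs a float into one byte, truncating the mantissa to 3 bits. */
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> MANTISSA_SHIFT;
        if (small <= EXPONENT_BASE) {
            // Zero and negatives map to 0; tiny positives to the smallest code.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= EXPONENT_BASE + 0x100) {
            return (byte) -1; // overflow saturates at the largest code (255)
        }
        return (byte) (small - EXPONENT_BASE);
    }

    /** Expands a byte produced by encode() back into a float. */
    static float decode(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << MANTISSA_SHIFT;
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }
}
```

Because both directions are plain shifts and adds on the raw bit pattern,
no floating point arithmetic or table lookup is needed, which is the kind
of change that could plausibly account for a speed-up like the one reported.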
4902 1. Lucene's source code repository has converted from CVS to
4903 Subversion. The new repository is at
4904 http://svn.apache.org/repos/asf/lucene/java/trunk
4906 2. Lucene's issue tracker has migrated from Bugzilla to JIRA.
4907 Lucene's JIRA is at http://issues.apache.org/jira/browse/LUCENE
4908 The old issues are still available at
4909 http://issues.apache.org/bugzilla/show_bug.cgi?id=xxxx
4910 (use the bug number instead of xxxx)
4915 1. The JSP demo page (src/jsp/results.jsp) now properly escapes error
4916 messages which might contain user input (e.g. error messages about
4917 query parsing). If you used that page as a starting point for your
4918 own code please make sure your code also properly escapes HTML
4919 characters from user input in order to avoid so-called cross site
4920 scripting attacks. (Daniel Naber)
4922 2. QueryParser changes in 1.4.2 broke the QueryParser API. Now the old
4923 API is supported again. (Christoph)
4928 1. Fixed bug #31241: Sorting could lead to incorrect results (documents
4929 missing, others duplicated) if the sort keys were not unique and there
4930 were more than 100 matches. (Daniel Naber)
4932 2. Memory leak in Sort code (bug #31240) eliminated.
4933 (Rafal Krzewski via Christoph and Daniel)
4935 3. FuzzyQuery now takes an additional parameter that specifies the
4936 minimum similarity that is required for a term to match the query.
4937 The QueryParser syntax for this is term~x, where x is a floating
4938 point number >= 0 and < 1 (a bigger number means that a higher
4939 similarity is required). Furthermore, a prefix can be specified
4940 for FuzzyQuerys so that only those terms are considered similar that
4941 start with this prefix. This can speed up FuzzyQuery greatly.
4942 (Daniel Naber, Christoph Goller)
4944 4. PhraseQuery and PhrasePrefixQuery now allow the explicit specification
4945 of relative positions. (Christoph Goller)
4947 5. QueryParser changes: Fix for ArrayIndexOutOfBoundsExceptions
4948 (patch #9110); some unused method parameters removed; the ability
4949 to specify a minimum similarity for FuzzyQuery has been added.
4952 6. IndexSearcher optimization: a new ScoreDoc is no longer allocated
4953 for every non-zero-scoring hit. This makes 'OR' queries that
4954 contain common terms substantially faster. (cutting)
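Entry 3 above describes a minimum similarity and an optional prefix for
FuzzyQuery. The sketch below illustrates the idea with a plain Levenshtein
distance. The similarity formula (1 minus the distance divided by the
length of the shorter term), the use of >= for the threshold, and all
names are assumptions; the formula FuzzyQuery actually uses may differ.

```java
final class FuzzySimilaritySketch {
    /** Classic dynamic-programming Levenshtein edit distance. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed similarity: 1 - distance / length of the shorter term. */
    static float similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        if (shorter == 0) return a.length() == b.length() ? 1.0f : 0.0f;
        return 1.0f - (float) editDistance(a, b) / shorter;
    }

    /** A candidate matches only if it shares the prefix and is similar enough. */
    static boolean matches(String query, String candidate, int prefixLength, float minSimilarity) {
        if (prefixLength > query.length() || prefixLength > candidate.length()) return false;
        // Cheap rejection: terms without the common prefix never reach the distance computation.
        if (!candidate.startsWith(query.substring(0, prefixLength))) return false;
        return similarity(query, candidate) >= minSimilarity;
    }
}
```

The prefix check is what makes the speed-up plausible: the quadratic edit
distance is only computed for terms that already agree on their first
prefixLength characters.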
4959 1. Fixed a performance bug in hit sorting code, where values were not
4960 correctly cached. (Aviran via cutting)
4962 2. Fixed errors in file format documentation. (Daniel Naber)
4967 1. Added "an" to the list of stop words in StopAnalyzer, to complement
4968 the existing "a" there. Fix for bug 28960
4969 (http://issues.apache.org/bugzilla/show_bug.cgi?id=28960). (Otis)
4971 2. Added new class FieldCache to manage in-memory caches of field term
4974 3. Added overloaded getFieldQuery method to QueryParser which
4975 accepts the slop factor specified for the phrase (or the default
4976 phrase slop for the QueryParser instance). This allows overriding
4977 methods to replace a PhraseQuery with a SpanNearQuery instead,
4978 keeping the proper slop factor. (Erik Hatcher)
4980 4. Changed the encoding of GermanAnalyzer.java and GermanStemmer.java to
4981 UTF-8 and changed the build encoding to UTF-8, to make changed files
4982 compile. (Otis Gospodnetic)
4984 5. Removed synchronization from term lookup under IndexReader methods
4985 termFreq(), termDocs() or termPositions() to improve
4986 multi-threaded performance. (cutting)
4988 6. Fix a bug where obsolete segment files were not deleted on Win32.
4993 1. Fixed several search bugs introduced by the skipTo() changes in
4994 release 1.4RC1. The index file format was changed a bit, so
4995 collections must be re-indexed to take advantage of the skipTo()
4996 optimizations. (Christoph Goller)
4998 2. Added new Document methods, removeField() and removeFields().
5001 3. Fixed inconsistencies with index closing. Indexes and directories
5002 are now only closed automatically by Lucene when Lucene opened
5003 them automatically. (Christoph Goller)
5005 4. Added new class: FilteredQuery. (Tim Jones)
5007 5. Added a new SortField type for custom comparators. (Tim Jones)
5009 6. Lock obtain timed out message now displays the full path to the lock
5010 file. (Daniel Naber via Erik)
5012 7. Fixed a bug in SpanNearQuery when ordered. (Paul Elschot via cutting)
5014 8. Fixed so that FSDirectory's locks still work when the
5015 java.io.tmpdir system property is null. (cutting)
5017 9. Changed FilteredTermEnum's constructor to take no parameters,
5018 as the parameters were ignored anyway (bug #28858)
5022 1. GermanAnalyzer now throws an exception if the stopword file
5023 cannot be found (bug #27987). It now uses LowerCaseFilter
5024 (bug #18410) (Daniel Naber via Otis, Erik)
5026 2. Fixed a few bugs in the file format documentation. (cutting)
5031 1. Changed the format of the .tis file, so that:
5033 - it has a format version number, which makes it easier to
5034 back-compatibly change file formats in the future.
5036 - the term count is now stored as a long. This was the one aspect
5037 of Lucene's file formats which limited index size.
5039 - a few internal index parameters are now stored in the index, so
5040 that they can (in theory) now be changed from index to index,
5041 although there is not yet an API to do so.
5043 These changes are back compatible. The new code can read old
5044 indexes. But old code will not be able to read new indexes. (cutting)
5046 2. Added an optimized implementation of TermDocs.skipTo(). A skip
5047 table is now stored for each term in the .frq file. This only
5048 adds a percent or two to overall index size, but can substantially
5049 speed up many searches. (cutting)
5051 3. Restructured the Scorer API and all Scorer implementations to take
5052 advantage of an optimized TermDocs.skipTo() implementation. In
5053 particular, PhraseQuerys and conjunctive BooleanQuerys are
5054 faster when one clause has substantially fewer matches than the
5055 others. (A conjunctive BooleanQuery is a BooleanQuery where all
5056 clauses are required.) (cutting)
5058 4. Added new class ParallelMultiSearcher. Combined with
5059 RemoteSearchable this makes it easy to implement distributed
5060 search systems. (Jean-Francois Halleux via cutting)
5062 5. Added support for hit sorting. Results may now be sorted by any
5063 indexed field. For details see the javadoc for
5064 Searcher#search(Query, Sort). (Tim Jones via Cutting)
5066 6. Changed FSDirectory to auto-create a full directory tree that it
5067 needs by using mkdirs() instead of mkdir(). (Mladen Turk via Otis)
5069 7. Added a new span-based query API. This implements, among other
5070 things, nested phrases. See javadocs for details. (Doug Cutting)
5072 8. Added new method Query.getSimilarity(Searcher), and changed
5073 scorers to use it. This permits one to subclass a Query class so
5074 that it can specify its own Similarity implementation, perhaps
5075 one that delegates through that of the Searcher. (Julien Nioche
5078 9. Added MultiReader, an IndexReader that combines multiple other
5079 IndexReaders. (Cutting)
5081 10. Added support for term vectors. See Field#isTermVectorStored().
5082 (Grant Ingersoll, Cutting & Dmitry)
5084 11. Fixed the old bug with escaping of special characters in query
5085 strings: http://issues.apache.org/bugzilla/show_bug.cgi?id=24665
5086 (Jean-Francois Halleux via Otis)
5088 12. Added support for overriding default values for the following,
5089 using system properties:
5090 - default commit lock timeout
5091 - default maxFieldLength
5092 - default maxMergeDocs
5093 - default mergeFactor
5094 - default minMergeDocs
5095 - default write lock timeout
5098 13. Changed QueryParser.jj to allow '-' and '+' within tokens:
5099 http://issues.apache.org/bugzilla/show_bug.cgi?id=27491
5100 (Morus Walter via Otis)
5102 14. Changed so that the compound index format is used by default.
5103 This makes indexing a bit slower, but vastly reduces the chances
5104 of file handle problems. (Cutting)
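Entry 2 above mentions a skip table stored per term to support
TermDocs.skipTo(). The in-memory sketch below shows the general technique:
keep every k-th document id in a small side table, jump over whole blocks
using that table, then finish with a short linear scan. It is not the .frq
file layout; the class name, the interval parameter and the convention of
returning -1 when no document remains are all assumptions.

```java
final class SkipListSketch {
    private final int[] docs;      // sorted document ids for one term
    private final int interval;    // one skip entry per `interval` postings
    private final int[] skipDocs;  // skipDocs[k] == docs[k * interval]
    private int pos = -1;          // current index into docs, -1 before first use

    SkipListSketch(int[] sortedDocs, int interval) {
        this.docs = sortedDocs;
        this.interval = interval;
        this.skipDocs = new int[(sortedDocs.length + interval - 1) / interval];
        for (int k = 0; k < skipDocs.length; k++) skipDocs[k] = sortedDocs[k * interval];
    }

    /** Advances to the first doc >= target; returns it, or -1 if none remains. */
    int skipTo(int target) {
        // Jump over whole blocks while the next block still starts at or below the target.
        int block = Math.max(pos, 0) / interval;
        while (block + 1 < skipDocs.length && skipDocs[block + 1] <= target) block++;
        pos = Math.max(pos, block * interval);
        // Finish with a linear scan inside the block that may contain the target.
        while (pos < docs.length && docs[pos] < target) pos++;
        return pos < docs.length ? docs[pos] : -1;
    }
}
```

The side table holds only one entry per interval postings, which is
consistent with the entry's remark that the skip data adds little to the
overall index size.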
5109 1. Added catch of BooleanQuery$TooManyClauses in QueryParser to
5110 throw ParseException instead. (Erik Hatcher)
5112 2. Fixed a NullPointerException in Query.explain(). (Doug Cutting)
5114 3. Added a new method IndexReader.setNorm(), that permits one to
5115 alter the boosting of fields after an index is created.
5117 4. Distinguish between the final position and length when indexing a
5118 field. The length is now defined as the total number of tokens,
5119 instead of the final position, as it was previously. Length is
5120 used for score normalization (Similarity.lengthNorm()) and for
5121 controlling memory usage (IndexWriter.maxFieldLength). In both of
5122 these cases, the total number of tokens is a better value to use
5123 than the final token position. Position is used in phrase
5124 searching (see PhraseQuery and Token.setPositionIncrement()).
5126 5. Fix StandardTokenizer's handling of CJK characters (Chinese,
5127 Japanese and Korean ideograms). Previously contiguous sequences
5128 were combined in a single token, which is not very useful. Now
5129 each ideogram generates a separate token, which is more useful.
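Entry 4 above separates a field's length (its total number of tokens) from
its final position. The sketch below makes the difference concrete by
deriving both values from a list of position increments. It assumes that
the first token lands on position 0 when its increment is 1; the class and
method names are hypothetical.

```java
final class FieldLengthSketch {
    /** Length as defined in the entry: the total number of tokens. */
    static int length(int[] increments) {
        return increments.length;
    }

    /** Position of the last token, obtained by summing the increments. */
    static int finalPosition(int[] increments) {
        int position = -1; // assumed start, so an increment of 1 yields position 0
        for (int inc : increments) position += inc;
        return position;
    }
}
```

With increments {1, 0, 0, 1} (three tokens stacked on one position, then
one more) the length is 4 while the final position is only 1; with {1, 3}
(a gap of two positions) the length is 2 while the final position is 3.
Score normalization and memory limits depend on how many tokens there
are, so the token count is the better input in both cases.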
5134 1. Added minMergeDocs in IndexWriter. This can be raised to speed
5135 indexing without altering the number of files, but only using more
5136 memory. (Julien Nioche via Otis)
5138 2. Fix bug #24786, in query rewriting. (bschneeman via Cutting)
5140 3. Fix bug #16952, in demo HTML parser, skip comments in
5141 javascript. (Christoph Goller)
5143 4. Fix bug #19253, in demo HTML parser, add whitespace as needed to
5144 output (Daniel Naber via Christoph Goller)
5146 5. Fix bug #24301, in demo HTML parser, long titles no longer
5147 hang things. (Christoph Goller)
5149 6. Fix bug #23534, Replace use of file timestamp of segments file
5150 with an index version number stored in the segments file. This
5151 resolves problems when running on file systems with low-resolution
5152 timestamps, e.g., HFS under MacOS X. (Christoph Goller)
5154 7. Fix QueryParser so that TokenMgrError is not thrown, only
5155 ParseException. (Erik Hatcher)
5157 8. Fix some bugs introduced by change 11 of RC2. (Christoph Goller)
5159 9. Fixed a problem compiling TestRussianStem. (Christoph Goller)
5161 10. Cleaned up some build stuff. (Erik Hatcher)
5166 1. Added getFieldNames(boolean) to IndexReader, SegmentReader, and
5167 SegmentsReader. (Julien Nioche via otis)
5169 2. Changed file locking to place lock files in
5170 System.getProperty("java.io.tmpdir"), where all users are
5171 permitted to write files. This way folks can open and correctly
5172 lock indexes which are read-only to them.
5174 3. IndexWriter: added a new method, addDocument(Document, Analyzer),
5175 permitting one to easily use different analyzers for different
5176 documents in the same index.
5178 4. Minor enhancements to FuzzyTermEnum.
5179 (Christoph Goller via Otis)
5181 5. PriorityQueue: added insert(Object) method and adjusted IndexSearcher
5182 and MultiIndexSearcher to use it.
5183 (Christoph Goller via Otis)
5185 6. Fixed a bug in IndexWriter that returned incorrect docCount().
5186 (Christoph Goller via Otis)
5188 7. Fixed SegmentsReader to eliminate the confusing and slightly different
5189 behaviour of TermEnum when dealing with an enumeration of all terms,
5190 versus an enumeration starting from a specific term.
5191 This patch also fixes incorrect term document frequencies when the same term
5192 is present in multiple segments.
5193 (Christoph Goller via Otis)
5195 8. Added CachingWrapperFilter and PerFieldAnalyzerWrapper. (Erik Hatcher)
5197 9. Added support for the new "compound file" index format (Dmitry
5200 10. Added Locale setting to QueryParser, for use by date range parsing.
5202 11. Changed IndexReader so that it can be subclassed by classes
5203 outside of its package. Previously it had package-private
5204 abstract methods. Also modified the index merging code so that it
5205 can work on an arbitrary IndexReader implementation, and added a
5206 new method, IndexWriter.addIndexes(IndexReader[]), to take
5207 advantage of this. (cutting)
5209 12. Added a limit to the number of clauses which may be added to a
5210 BooleanQuery. The default limit is 1024 clauses. This should
5211 stop most OutOfMemoryExceptions by prefix, wildcard and fuzzy
5212 queries which run amok. (cutting)
5214 13. Add new method: IndexReader.undeleteAll(). This undeletes all
5215 deleted documents which still remain in the index. (cutting)
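Entry 12 above introduces a cap on the number of clauses in a
BooleanQuery. The sketch below shows the fail-fast idea behind such a cap
using a hypothetical container and a hypothetical exception type; it does
not use or reproduce the real BooleanQuery API.

```java
import java.util.ArrayList;
import java.util.List;

final class BoundedClausesSketch {
    /** Hypothetical stand-in for the exception raised when the limit is hit. */
    static final class TooManyClausesException extends RuntimeException {
        TooManyClausesException(int max) { super("more than " + max + " clauses"); }
    }

    private final int maxClauses;
    private final List<String> clauses = new ArrayList<>();

    BoundedClausesSketch(int maxClauses) { this.maxClauses = maxClauses; }

    /** Adds a clause, failing fast instead of growing without bound. */
    void add(String clause) {
        if (clauses.size() >= maxClauses) throw new TooManyClausesException(maxClauses);
        clauses.add(clause);
    }

    int size() { return clauses.size(); }

    /** Expands a prefix against a term list, one clause per matching term. */
    static BoundedClausesSketch expandPrefix(String prefix, List<String> terms, int maxClauses) {
        BoundedClausesSketch query = new BoundedClausesSketch(maxClauses);
        for (String term : terms) {
            if (term.startsWith(prefix)) query.add(term);
        }
        return query;
    }
}
```

A short prefix over a large term dictionary can match a very large number
of terms; with the cap, the expansion stops with an exception at the limit
rather than allocating one clause per matching term until memory runs out.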
5220 1. Fixed PriorityQueue's clear() method.
5221 Fix for bug 9454, http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9454
5222 (Matthijs Bomhoff via otis)
5224 2. Changed StandardTokenizer.jj grammar for EMAIL tokens.
5225 Fix for bug 9015, http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9015
5226 (Dale Anson via otis)
5228 3. Added the ability to disable lock creation by using disableLuceneLocks
5229 system property. This is useful for read-only media, such as CD-ROMs.
5232 4. Added id method to Hits to be able to access the index global id.
5233 Required for sorting options.
5236 5. Added support for new range query syntax to QueryParser.jj.
5239 6. Added the ability to retrieve HTML documents' META tag values to
5241 (Mark Harwood via otis)
5243 7. Modified QueryParser to make it possible to programmatically specify the
5244 default Boolean operator (OR or AND).
5245 (Péter Halácsy via otis)
5247 8. Made many search methods and classes non-final, per requests.
5248 This includes IndexWriter and IndexSearcher, among others.
5251 9. Added class RemoteSearchable, providing support for remote
5252 searching via RMI. The test class RemoteSearchableTest.java
5253 provides an example of how this can be used. (cutting)
5255 10. Added PhrasePrefixQuery (and supporting MultipleTermPositions). The
5256 test class TestPhrasePrefixQuery provides the usage example.
5257 (Anders Nielsen via otis)
5259 11. Changed the German stemming algorithm to ignore case while
5260 stripping. The new algorithm is faster and produces more equal
5261 stems from nouns and verbs derived from the same word.
5264 12. Added support for boosting the score of documents and fields via
5265 the new methods Document.setBoost(float) and Field.setBoost(float).
5267 Note: This changes the encoding of an indexed value. Indexes
5268 should be re-created from scratch in order for search scores to
5269 be correct. With the new code and an old index, searches will
5270 yield very large scores for shorter fields, and very small scores
5271 for longer fields. Once the index is re-created, scores will be
5272 as before. (cutting)
5274 13. Added new method Token.setPositionIncrement().
5276 This permits, for the purpose of phrase searching, placing
5277 multiple terms in a single position. This is useful with
5278 stemmers that produce multiple possible stems for a word.
5280 This also permits the introduction of gaps between terms, so that
5281 terms which are adjacent in a token stream will not be matched by
5282 an exact phrase query. This makes it possible, e.g., to build
5283 an analyzer where phrases are not matched over stop words which
5286 Finally, repeating a token with an increment of zero can also be
5287 used to boost scores of matches on that token. (cutting)
5289 14. Added new Filter class, QueryFilter. This constrains search
5290 results to only match those which also match a provided query.
5291 Results are cached, so that searches after the first on the same
5292 index using this filter are very fast.
5294 This could be used, for example, with a RangeQuery on a formatted
5295 date field to implement date filtering. One could re-use a
5296 single QueryFilter that matches, e.g., only documents modified
5297 within the last week. The QueryFilter and RangeQuery would only
5298 need to be reconstructed once per day. (cutting)
5300 15. Added a new IndexWriter method, getAnalyzer(). This returns the
5301 analyzer used when adding documents to this index. (cutting)
5303 16. Fixed a bug with IndexReader.lastModified(). Before, document
5304 deletion did not update this. Now it does. (cutting)
5306 17. Added Russian Analyzer.
5307 (Boris Okner via otis)
5309 18. Added a public, extensible scoring API. For details, see the
5310 javadoc for org.apache.lucene.search.Similarity.
5312 19. Fixed the return type of Hits.id(), changing it from float to int. (Terry Steichen via Peter)
5314 20. Added getFieldNames() to IndexReader and Segment(s)Reader classes.
5315 (Peter Mularien via otis)
5317 21. Added getFields(String) and getValues(String) methods.
5318 Contributed by Rasik Pandey on 2002-10-09
5319 (Rasik Pandey via otis)
5321 22. Revised internal search APIs. Changes include:
5323 a. Queries are no longer modified during a search. This makes
5324 it possible, e.g., to reuse the same query instance with
5325 multiple indexes from multiple threads.
5327 b. Term-expanding queries (e.g. PrefixQuery, WildcardQuery,
5328 etc.) now work correctly with MultiSearcher, fixing bugs 12619
5331 c. Boosting a BooleanQuery now works, and is supported by the
5332 query parser (problem reported by Lee Mallabone). Thus a query
5333 like "(+foo +bar)^2 +baz" is now supported and equivalent to
5334 "(+foo^2 +bar^2) +baz".
5336 d. New method: Query.rewrite(IndexReader). This permits a
5337 query to re-write itself as an alternate, more primitive query.
5338 Most of the term-expanding query classes (PrefixQuery,
5339 WildcardQuery, etc.) are now implemented using this method.
5341 e. New method: Searchable.explain(Query q, int doc). This
5342 returns an Explanation instance that describes how a particular
5343 document is scored against a query. An explanation can be
5344 displayed as either plain text, with the toString() method, or
5345 as HTML, with the toHtml() method. Note that computing an
5346 explanation is as expensive as executing the query over the
5347 entire index. This is intended to be used in developing
5348 Similarity implementations, and, for good performance, should
5349 not be displayed with every hit.
5351 f. Scorer and Weight are public, not package protected. It is now
5352 possible for someone to write a Scorer implementation that is
5353 not in the org.apache.lucene.search package. This is still
5354 fairly advanced programming, and I don't expect anyone to do
5355 this anytime soon, but at least now it is possible.
5357 g. Added public accessors to the primitive query classes
5358 (TermQuery, PhraseQuery and BooleanQuery), permitting access to
5359 their terms and clauses.
5361 Caution: These are extensive changes and they have not yet been
5362 tested extensively. Bug reports are appreciated.
5365 23. Added convenience RAMDirectory constructors taking File and String
5366 arguments, for easy FSDirectory to RAMDirectory conversion.
5369 24. Added code for manual renaming of files in FSDirectory, since it
5370 has been reported that java.io.File's renameTo(File) method sometimes
5371 fails on Windows JVMs.
5372 (Matt Tucker via otis)
5374 25. Refactored QueryParser to make it easier for people to extend it.
5375 Added the ability to automatically lower-case Wildcard terms in
5377 (Tatu Saloranta via otis)
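Entry 14 above says that QueryFilter caches its results so that later
searches on the same index are fast. The sketch below shows one way such
caching can work: compute a bit set of matching documents once per index
and reuse it afterwards. The generic reader type R, the weak-keyed map and
the computations counter are assumptions made for illustration, not a
description of QueryFilter's internals.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

final class CachingFilterSketch<R> {
    private final Function<R, BitSet> compute;                // runs the underlying query
    private final Map<R, BitSet> cache = new WeakHashMap<>(); // one bit set per index/reader
    int computations = 0;                                     // exposed only to observe caching

    CachingFilterSketch(Function<R, BitSet> compute) { this.compute = compute; }

    /** Returns the matching-document bits for reader, computing them at most once. */
    synchronized BitSet bits(R reader) {
        BitSet cached = cache.get(reader);
        if (cached == null) {
            cached = compute.apply(reader);
            computations++;
            cache.put(reader, cached);
        }
        return cached;
    }

    /** Keeps only the candidate hits that the cached bits allow. */
    BitSet filter(R reader, BitSet candidates) {
        BitSet result = (BitSet) candidates.clone();
        result.and(bits(reader));
        return result;
    }
}
```

Keying the cache on the reader means a new index snapshot gets a fresh bit
set, while a filter object held for a day, as in the date example of the
entry, pays for the query only on its first use per reader.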
5382 1. Changed QueryParser.jj to treat "?" as a special character, which
5383 allows it to be used in a wildcard term. Updated the TestWildcard
5384 unit test also. (Ralf Hettesheimer via carlson)
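The entry above makes "?" usable in wildcard terms. As an illustration of
the conventional semantics, the stand-alone matcher below treats "?" as
exactly one character and "*" as any run of characters, including an empty
one. It is a sketch with hypothetical names, not the WildcardQuery or
QueryParser implementation.

```java
final class WildcardSketch {
    /** True if text matches pattern; '?' is one character, '*' is any run (possibly empty). */
    static boolean matches(String pattern, String text) {
        return matches(pattern, 0, text, 0);
    }

    private static boolean matches(String p, int pi, String t, int ti) {
        while (pi < p.length()) {
            char c = p.charAt(pi);
            if (c == '*') {
                // Try every possible length for the run consumed by '*', including zero.
                for (int k = ti; k <= t.length(); k++) {
                    if (matches(p, pi + 1, t, k)) return true;
                }
                return false;
            }
            // '?' and literal characters both need one character of text to be present.
            if (ti >= t.length()) return false;
            if (c != '?' && c != t.charAt(ti)) return false;
            pi++;
            ti++;
        }
        // Pattern exhausted: it is a match only if the text is exhausted too.
        return ti == t.length();
    }
}
```

Under these rules "ca??" matches "cats" but not "cat", because each "?"
must consume a character, while "ca*" matches "ca" because "*" may consume
nothing.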
5388 1. Renamed build.properties to default.properties and updated
5389 the BUILD.txt document to describe how to override the
5390 default.property settings without having to edit the file. This
5391 brings the build process closer to Scarab's build process.
5394 2. Added MultiFieldQueryParser class. (Kelvin Tan, via otis)
5396 3. Updated "powered by" links. (otis)
5398 4. Fixed instruction for setting up JavaCC - Bug #7017 (otis)
5400 5. FSDirectory now throws an exception if it cannot create a directory
5401 - Bug #6914 (Eugene Gluzberg via otis)
5403 6. Update MultiSearcher, MultiFieldParse, Constants, DateFilter,
5404 LowerCaseTokenizer javadoc (otis)
5406 7. Added fix to avoid NullPointerException in results.jsp
5407 (Mark Hayes via otis)
5409 8. Changed Wildcard search to find 0 or more char instead of 1 or more
5410 (Lee Mallabone, via otis)
5412 9. Fixed an offset error in GermanStemFilter - Bug #7412
5413 (Rodrigo Reyes, via otis)
5415 10. Added unit tests for wildcard search and DateFilter (otis)
5417 11. Allow co-existence of indexed and non-indexed fields with the same name
5418 (cutting/casper, via otis)
5420 12. Add escape character to query parser.
5423 13. Applied a patch that ensures that searches that use DateFilter
5424 don't throw an exception when no matches are found. (David Smiley, via
5427 14. Fixed bugs in DateFilter and WildcardQuery unit tests. (cutting, otis, carlson)
5432 1. Updated contributions section of website.
5433 Add XML Document #3 implementation to Document Section.
5434 Also added Term Highlighting to Misc Section. (carlson)
5436 2. Fixed NullPointerException for phrase searches containing
5437 unindexed terms, introduced in 1.2RC3. (cutting)
5439 3. Changed document deletion code to obtain the index write lock,
5440 enforcing the fact that document addition and deletion cannot be
5441 performed concurrently. (cutting)
5443 4. Various documentation cleanups. (otis, acoliver)
5445 5. Updated "powered by" links. (cutting, jon)
5447 6. Fixed a bug in the GermanStemmer. (Bernhard Messer, via otis)
5449 7. Changed Term and Query to implement Serializable. (scottganyo)
5451 8. Fixed to never delete indexes added with IndexWriter.addIndexes().
5454 9. Upgraded to JUnit 3.7. (otis)
5458 1. IndexWriter: fixed a bug where adding an optimized index to an
5459 empty index failed. This was encountered using addIndexes to copy
5460 a RAMDirectory index to an FSDirectory.
5462 2. RAMDirectory: fixed a bug where RAMInputStream could not read
5463 across more than a single buffer boundary.
5465 3. Fix query parser so it accepts queries with Unicode characters.
5468 4. Fix query parser so that PrefixQuery is used in preference to
5469 WildcardQuery when there's only an asterisk at the end of the
5470 term. Previously PrefixQuery would never be used.
5472 5. Fix tests so they compile; fix ant file so it compiles tests
5473 properly. Added test cases for Analyzers and PriorityQueue.
5475 6. Updated demos, added Getting Started documentation. (acoliver)
5477 7. Added 'contributions' section to website & docs. (carlson)
5479 8. Removed JavaCC from source distribution for copyright reasons.
5480 Folks must now download this separately from metamata in order to
5481 compile Lucene. (cutting)
5483 9. Substantially improved the performance of DateFilter by adding the
5484 ability to reuse TermDocs objects. (cutting)
5486 10. Added IndexReader methods:
5487 public static boolean indexExists(String directory);
5488 public static boolean indexExists(File directory);
5489 public static boolean indexExists(Directory directory);
5490 public static boolean isLocked(Directory directory);
5491 public static void unlock(Directory directory);
5494 11. Fixed bugs in GermanAnalyzer (gschwarz)
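Entry 4 above has the query parser prefer PrefixQuery when the only
wildcard is a single trailing asterisk. The decision can be sketched as a
small classification step. The enum, the method name, and the choice to
treat a bare "*" as a general wildcard are assumptions; the real grammar
in QueryParser.jj may draw the lines differently.

```java
final class TermKindSketch {
    enum Kind { TERM, PREFIX, WILDCARD }

    /** Classifies a raw query term by where its wildcard characters appear. */
    static Kind classify(String term) {
        int firstStar = term.indexOf('*');
        boolean hasQuestion = term.indexOf('?') >= 0;
        if (firstStar < 0 && !hasQuestion) return Kind.TERM;
        // The first '*' being the last character means it is the only '*';
        // with no '?' present, a prefix query covers the term.
        boolean onlyTrailingStar =
            !hasQuestion && term.length() > 1 && firstStar == term.length() - 1;
        return onlyTrailingStar ? Kind.PREFIX : Kind.WILDCARD;
    }
}
```

Under these assumptions "foo*" is classified as PREFIX, while "f*o*" and
"fo?*" fall back to WILDCARD and "foo" stays a plain TERM.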
5498 - added sources to distribution
5499 - removed broken build scripts and libraries from distribution
5500 - SegmentsReader: fixed potential race condition
5501 - FSDirectory: fixed so that getDirectory(xxx,true) correctly
5502 erases the directory contents, even when the directory
5503 has already been accessed in this JVM.
5504 - RangeQuery: Fix issue where an inclusive range query would
5505 include the nearest term in the index above a non-existant
5506 specified upper term.
5507 - SegmentTermEnum: Fix NullPointerException in clone() method
5508 when the Term is null.
5509 - JDK 1.1 compatibility fix: disabled lock files for JDK 1.1,
5510 since they rely on a feature added in JDK 1.2.
5512 1.2 RC1 (first Apache release):
5513 - packages renamed from com.lucene to org.apache.lucene
5514 - license switched from LGPL to Apache
5515 - ant-only build -- no more makefiles
5516 - addition of lock files--now fully thread & process safe
5517 - addition of German stemmer
5518 - MultiSearcher now supports low-level search API
5519 - added RangeQuery, for term-range searching
5520 - Analyzers can choose tokenizer based on field name
5523 1.01b (last Sourceforge release)
5526 . new prefix query (search for "foo*" matches "food")
5530 This release fixes a few serious bugs and also includes some
5531 performance optimizations, a stemmer, and a few other minor
5536 Lucene now includes a grammar-based tokenizer, StandardTokenizer.
5538 The only tokenizer included in the previous release (LetterTokenizer)
5539 identified terms consisting entirely of alphabetic characters. The
5540 new tokenizer uses a regular-expression grammar to identify more
5541 complex classes of terms, including numbers, acronyms, email
5544 StandardTokenizer serves two purposes:
5546 1. It is a much better, general purpose tokenizer for use by
5549 The easiest way for applications to start using
5550 StandardTokenizer is to use StandardAnalyzer.
5552 2. It provides a good example of grammar-based tokenization.
5554 If an application has special tokenization requirements, it can
5555 implement a custom tokenizer by copying the directory containing
5556 the new tokenizer into the application and modifying it
5561 First open source release.
5563 The code has been re-organized into a new package and directory
5564 structure for this release. It builds OK, but has not been tested
5565 beyond that since the re-organization.