For more information on past and future Lucene versions, please see:
http://s.apache.org/luceneversions

======================= Lucene 3.5.0 =======================

Changes in backwards compatibility policy

* LUCENE-3390: The first approach in Lucene 3.4.0 for missing values
  support for sorting had a design problem that made the missing value
  be populated directly into the FieldCache arrays during sorting,
  leading to concurrency issues. To fix this behaviour, the method
  signatures had to be changed:
  - FieldCache.getUnValuedDocs() was renamed to FieldCache.getDocsWithField()
    returning a Bits interface (backported from Lucene 4.0).
  - FieldComparator.setMissingValue() was removed and added to
  As this is an expert API, most code will not be affected.
  (Uwe Schindler, Doron Cohen, Mike McCandless)

* LUCENE-3464: IndexReader.reopen has been renamed to
  IndexReader.openIfChanged (a static method), and now returns null
  (instead of the old reader) if there are no changes in the index, to
  prevent the common pitfall of accidentally closing the old reader.

* LUCENE-3541: Remove IndexInput's protected copyBuf. If you want to
  keep a buffer in your IndexInput, do this yourself in your implementation,
  and be sure to do the right thing on clone()! (Robert Muir)

* LUCENE-2822: TimeLimitingCollector now expects a counter clock instead of
  relying on a private daemon thread. The global time limiting clock thread
  has been exposed and is now lazily loaded and fully optional.
  TimeLimitingCollector now supports setting the clock baseline manually to
  include the prelude of a search. Previous versions set the baseline at
  construction time; now the baseline is set once the first IndexReader is
  passed to the collector, unless it was set before. (Simon Willnauer)
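
The baseline rules described for LUCENE-2822 can be sketched in plain Java. This is not Lucene's TimeLimitingCollector: TickCounter and TimeBudget are hypothetical stand-ins, and the shared clock is advanced by hand here instead of by a background thread, so the behaviour is deterministic.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical shared clock; in a real setup a background thread would advance it. */
final class TickCounter {
    private final AtomicLong ticks = new AtomicLong();

    long get() { return ticks.get(); }

    void advance(long delta) { ticks.addAndGet(delta); }
}

/** Hypothetical time budget following the baseline rules described above. */
final class TimeBudget {
    private static final long UNSET = Long.MIN_VALUE;

    private final TickCounter clock;
    private final long allowedTicks;
    private long baseline = UNSET;

    TimeBudget(TickCounter clock, long allowedTicks) {
        this.clock = clock;
        this.allowedTicks = allowedTicks;
    }

    /** Set the baseline explicitly, e.g. before the prelude of a search. */
    void setBaseline() { baseline = clock.get(); }

    /** Called per segment; only sets the baseline if nobody set it before. */
    void onNextReader() {
        if (baseline == UNSET) {
            baseline = clock.get();
        }
    }

    /** True once more than allowedTicks have elapsed since the baseline. */
    boolean isExceeded() {
        return baseline != UNSET && clock.get() - baseline > allowedTicks;
    }
}
```

Because onNextReader() leaves an explicit baseline alone, time spent before the first segment counts against the budget only when the caller opts in by calling setBaseline() early.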

Changes in runtime behavior

* LUCENE-3520: IndexReader.openIfChanged, when passed a near-real-time
  reader, will now return null if there are no changes. The API has
  always reserved the right to do this; it's just that in the past for
  near-real-time readers it never did. (Mike McCandless)
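
Since openIfChanged returns null when nothing changed (LUCENE-3464, LUCENE-3520), callers should close the old reader only when a new one actually comes back. The sketch below models that contract with a hypothetical Snapshot class and a static openIfChanged helper; it is not Lucene's IndexReader, only a self-contained illustration of the refresh pattern.

```java
/** Hypothetical reader stand-in: tracks a version and whether it was closed. */
final class Snapshot {
    final long version;
    boolean closed;

    Snapshot(long version) { this.version = version; }

    void close() { closed = true; }

    /** Mirrors the described contract: null when the "index" has not changed. */
    static Snapshot openIfChanged(Snapshot current, long latestVersion) {
        return latestVersion == current.version ? null : new Snapshot(latestVersion);
    }
}

/** Hypothetical holder that swaps in a newer snapshot when one is available. */
final class Refresher {
    private Snapshot current;

    Refresher(Snapshot initial) { current = initial; }

    Snapshot current() { return current; }

    /** Close the old snapshot only after a different one has been obtained. */
    void maybeRefresh(long latestVersion) {
        Snapshot newer = Snapshot.openIfChanged(current, latestVersion);
        if (newer != null) {
            current.close();
            current = newer;
        }
    }
}
```

The null check is the whole point: when nothing changed there is no second object, so there is nothing the caller could close by mistake.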

* SOLR-2762: (backport from 4.x line): FSTLookup could return duplicate
  results or one result fewer than requested. (David Smiley, Dawid Weiss)

* LUCENE-3412: SloppyPhraseScorer was returning non-deterministic results
  for queries with many repeats (Doron Cohen)

* LUCENE-3421: PayloadTermQuery's explain was wrong when includeSpanScore=false.
  (Edward Drapkin via Robert Muir)

* LUCENE-3432: IndexWriter.expungeDeletes with TieredMergePolicy
  should ignore the maxMergedSegmentMB setting (v.sevel via Mike
  McCandless)

* LUCENE-3442: TermQuery.TermWeight.scorer() returns null for non-atomic
  IndexReaders (optimization bug, introduced by LUCENE-2829), preventing
  QueryWrapperFilter and similar classes from getting a top-level DocIdSet.
  (Dan C., Uwe Schindler)

* LUCENE-3390: Corrected handling of missing values when two parallel searches
  use different missing values for sorting: the missing value was populated
  directly into the FieldCache arrays during sorting, leading to concurrency
  issues. (Uwe Schindler, Doron Cohen, Mike McCandless)

* LUCENE-3439: Closing an NRT reader after the writer was closed was
  incorrectly invoking the DeletionPolicy (and then possibly deleting
  files) on the closed IndexWriter (Robert Muir, Mike McCandless)

* LUCENE-3215: SloppyPhraseScorer sometimes computed an infinite freq
  (Robert Muir, Doron Cohen)

* LUCENE-3465: IndexSearcher with ExecutorService was always passing 0
  for docBase to Collector.setNextReader. (Robert Muir, Mike
  McCandless)

* LUCENE-3503: DisjunctionSumScorer would give slightly different scores
  for a document depending on whether you used nextDoc() or advance().
  (Mike McCandless, Robert Muir)

* LUCENE-3529: Properly support indexing an empty field with empty term text.
  Previously, if you had assertions enabled you would receive an error during
  flush; if you didn't, you would get an invalid index.
  (Mike McCandless, Robert Muir)

* LUCENE-2633: PackedInts Packed32 and Packed64 did not support internal
  structures larger than 256MB (Toke Eskildsen via Mike McCandless)

* LUCENE-3540: LUCENE-3255 dropped support for pre-1.9 indexes, but the
  error message in IndexFormatTooOldException was incorrect. (Uwe Schindler,

* LUCENE-3541: IndexInput's default copyBytes() implementation was not safe
  across multiple threads, because all clones shared the same buffer.

* LUCENE-3548: Fix CharsRef#append to extend length of the existing char[]
  and preserve existing chars. (Simon Willnauer)

* LUCENE-3582: Normalize NaN values in NumericUtils.floatToSortableInt() /
  NumericUtils.doubleToSortableLong(), so this is consistent with stored
  fields. Also fix NumericRangeQuery to not falsely hit NaNs on half-open
  ranges (one bound is null). Because of normalization, NumericRangeQuery
  can now be used to hit NaN values by creating a query with
  upper == lower == NaN (inclusive). (Dawid Weiss, Uwe Schindler)
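
The sortable-int encoding behind LUCENE-3582 can be illustrated in plain Java. This is a sketch of the usual bit trick, not a copy of NumericUtils: using Float.floatToIntBits, which collapses every NaN to one canonical bit pattern, is one way to obtain the NaN normalization the entry describes, so that a range with upper == lower == NaN matches exactly one key. The class name SortableFloat is hypothetical.

```java
/** Hypothetical helper illustrating an order-preserving float-to-int mapping. */
final class SortableFloat {
    private SortableFloat() {}

    /**
     * Maps a float to an int whose signed order matches the float order.
     * Float.floatToIntBits canonicalizes all NaNs to 0x7fc00000, so every NaN
     * gets the same key, which sorts above +Infinity.
     */
    static int toSortableInt(float value) {
        int bits = Float.floatToIntBits(value);
        // Negative floats: flip all but the sign bit so larger magnitudes sort lower.
        if (bits < 0) {
            bits ^= 0x7fffffff;
        }
        return bits;
    }

    /** Inverse mapping; the xor is its own inverse and preserves the sign bit. */
    static float fromSortableInt(int key) {
        if (key < 0) {
            key ^= 0x7fffffff;
        }
        return Float.intBitsToFloat(key);
    }
}
```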

* LUCENE-3454: Rename IndexWriter.optimize to forceMerge to discourage
  use of this method since it is horribly costly and rarely justified
  anymore. MergePolicy.findMergesForOptimize was renamed to
  findForcedMerges. IndexReader.isOptimized was
  deprecated. IndexCommit.isOptimized was replaced with
  getSegmentCount. (Robert Muir, Mike McCandless)

* LUCENE-3205: Deprecated MultiTermQuery.getTotalNumberOfTerms() [and
  related methods], as the numbers returned are not useful
  for multi-segment indexes. They were only needed for tests of
  NumericRangeQuery. (Mike McCandless, Uwe Schindler)

* LUCENE-3574: Deprecate outdated constants in org.apache.lucene.util.Constants
  and add new ones for Java 6 and Java 7. (Uwe Schindler)

* LUCENE-3571: Deprecate IndexSearcher(Directory). Use the constructors
  that take IndexReader instead. (Robert Muir)

* LUCENE-3577: Rename IndexWriter.expungeDeletes to forceMergeDeletes,
  and revamp the javadocs, to discourage use of this method since it is
  horribly costly and rarely justified. MergePolicy.findMergesToExpungeDeletes
  was renamed to findForcedDeletesMerges. (Robert Muir, Mike McCandless)

* LUCENE-3448: Added FixedBitSet.and(other/DISI), andNot(other/DISI).

* LUCENE-2215: Added IndexSearcher.searchAfter which returns results after a
  specified ScoreDoc (e.g. last document on the previous page) to support deep
  paging use cases. (Aaron McCurry, Grant Ingersoll, Robert Muir)

* LUCENE-1990: Added internal packed ints implementation, to be used
  for more efficient storage of int arrays when the values are
  bounded, for example for storing the terms dict index (Toke
  Eskildsen via Mike McCandless)

* LUCENE-3558: Moved SearcherManager, NRTManager & SearcherLifetimeManager into
  core. All classes are contained in o.a.l.search. (Simon Willnauer)
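
The deep-paging idea behind IndexSearcher.searchAfter (LUCENE-2215) is that the next page contains only hits that sort strictly after the last hit already shown, so only one page worth of candidates needs to be kept. The sketch below assumes relevance order (score descending, ties broken by ascending docID) and works on an in-memory array; Hit, Paging and pageAfter are hypothetical names, not Lucene's ScoreDoc API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical hit: a document id plus its score. */
final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) { this.doc = doc; this.score = score; }
}

final class Paging {
    private Paging() {}

    /** Relevance order: higher score first, then lower docID first. */
    static final Comparator<Hit> RELEVANCE = (a, b) -> {
        int byScore = Float.compare(b.score, a.score);
        return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
    };

    /** Returns up to n hits that sort strictly after {@code after} (null = first page). */
    static List<Hit> pageAfter(Hit[] all, Hit after, int n) {
        // Keep only the n best candidates; the queue head is the worst of them.
        PriorityQueue<Hit> best = new PriorityQueue<>(RELEVANCE.reversed());
        for (Hit h : all) {
            if (after != null && RELEVANCE.compare(h, after) <= 0) {
                continue; // already shown on an earlier page
            }
            best.add(h);
            if (best.size() > n) {
                best.poll(); // drop the current worst candidate
            }
        }
        List<Hit> page = new ArrayList<>(best);
        page.sort(RELEVANCE);
        return page;
    }
}
```

Memory stays proportional to the page size rather than to the page number, which is what makes deep pages affordable compared with collecting the top (page * n) hits every time.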

* LUCENE-3426: Add NGramPhraseQuery which extends PhraseQuery and tries to
  reduce the number of terms in the query on rewrite(), in order to improve
  performance. (Robert Muir, Koji Sekiguchi)

* LUCENE-3494: Optimize FilteredQuery to remove a multiply in score()
  (Uwe Schindler, Robert Muir)

* LUCENE-3534: Remove filter logic from IndexSearcher and delegate to
  FilteredQuery's Scorer. This is a partial backport of a cleanup in
  FilteredQuery/IndexSearcher added by LUCENE-1536 to Lucene 4.0.

* LUCENE-2205: Very substantial (3-5X) reduction in the RAM required to hold
  the terms index when opening an IndexReader (Aaron McCurry via Mike McCandless)

* LUCENE-3443: FieldCache can now set docsWithField, and create an
  array, in a single pass. This results in faster init time for apps
  that need both (such as sorting by a field with a missing value).
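
The LUCENE-3426 optimization relies on the fact that, for an exact (slop 0) phrase made of consecutive n-grams, many grams are redundant: keeping every n-th gram plus the last one still covers every character of the original string. The selection rule below is a hedged sketch of that idea in plain Java; NGramReduction.keptPositions is a hypothetical helper, not a transcription of NGramPhraseQuery.

```java
import java.util.ArrayList;
import java.util.List;

final class NGramReduction {
    private NGramReduction() {}

    /**
     * Given the number of consecutive n-gram terms in an exact phrase, returns
     * the positions worth keeping: every n-th gram plus the final gram.
     */
    static List<Integer> keptPositions(int termCount, int n) {
        List<Integer> kept = new ArrayList<>();
        if (n < 2 || termCount < 3) {
            // Too short to benefit from the reduction; keep every term.
            for (int i = 0; i < termCount; i++) kept.add(i);
            return kept;
        }
        int last = termCount - 1;
        for (int i = 0; i < termCount; i++) {
            if (i % n == 0 || i == last) {
                kept.add(i);
            }
        }
        return kept;
    }
}
```

For "ABCDE" indexed as bigrams (AB, BC, CD, DE) this keeps AB at position 0, CD at 2 and DE at 3; as long as the original positions are kept, those three grams still pin down every character.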

* LUCENE-3420: Disable the finalness checks in TokenStream and Analyzer
  for implementing subclasses in different packages, where assertions are not
  enabled. (Uwe Schindler)

* LUCENE-3506: Tests relying on assertions being enabled were no-ops because
  they ignored AssertionError. With this fix, the entire test framework
  (every test) now fails if assertions are disabled, unless
  -Dtests.asserts.gracious=true is specified. (Doron Cohen)

* SOLR-2849: Fix dependencies in Maven POMs. (David Smiley via Steve Rowe)

* LUCENE-3561: Fix maven xxx-src.jar files that were missing resources.

======================= Lucene 3.4.0 =======================

* LUCENE-3251: Directory#copy failed to close target output if opening the
  source stream failed. (Simon Willnauer)

* LUCENE-3255: If segments_N file is all zeros (due to file
  corruption), don't read that to mean the index is empty. (Gregory
  Tarr, Mark Harwood, Simon Willnauer, Mike McCandless)

* LUCENE-3254: Fixed minor bug in how deletes were written to disk,
  causing the file to sometimes be larger than it needed to be. (Mike
  McCandless)

* LUCENE-3224: Fixed a bug where CheckIndex would incorrectly report a
  corrupt index if a term with docfreq >= 16 was indexed more than once
  at the same position. (Robert Muir)

* LUCENE-3339: Fixed deadlock case when multiple threads use the new
  block-add (IndexWriter.add/updateDocuments) methods. (Robert Muir,

* LUCENE-3340: Fixed case where IndexWriter was not flushing at
  exactly maxBufferedDeleteTerms (Mike McCandless)

* LUCENE-3358, LUCENE-3361: StandardTokenizer and UAX29URLEmailTokenizer
  wrongly discarded combining marks attached to Han or Hiragana characters;
  this is fixed if you supply Version >= 3.4. If you supply a previous
  Lucene version, you get the old buggy behavior for backwards compatibility.
  (Trejkaz, Robert Muir)

* LUCENE-3368: IndexWriter commits segments without applying their buffered
  deletes when flushing concurrently. (Simon Willnauer, Mike McCandless)

* LUCENE-3365: Create or Append mode was determined before obtaining the write
  lock, which could cause IndexWriter to overwrite an existing index.
  (Geoff Cooney via Simon Willnauer)

* LUCENE-3380: Fixed a bug where FileSwitchDirectory's listAll() would wrongly
  throw NoSuchDirectoryException when all files written so far have been
  written to one directory, but the other still has not yet been created on the
  filesystem. (Robert Muir)

* LUCENE-3402: Term vectors disappeared from the index if optimize() was called
  following addIndexes(). (Shai Erera)

* LUCENE-3409: IndexWriter.deleteAll was failing to close pooled NRT
  SegmentReaders, leading to unused files accumulating in the
  Directory. (tal steier via Mike McCandless)

* LUCENE-3390: Added SortField.setMissingValue(v) to enable well defined
  sorting behavior for documents that do not include the given field.
  (Gilad Barkai via Doron Cohen)

* LUCENE-3418: Lucene was failing to fsync index files on commit,
  meaning an operating system or hardware crash, or power loss, could
  easily corrupt the index. (Mark Miller, Robert Muir, Mike
  McCandless)
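
The LUCENE-3390 entries (the 3.4.0 SortField.setMissingValue feature and its 3.5.0 concurrency fix) come down to one rule: the per-field value array is shared between searches, so each search's missing value must be substituted at comparison time, never written into that array. The sketch below uses plain arrays and java.util.BitSet; MissingAwareComparator is hypothetical, not Lucene's FieldComparator.

```java
import java.util.BitSet;
import java.util.Comparator;

/** Hypothetical docID comparator backed by a shared, read-only value cache. */
final class MissingAwareComparator implements Comparator<Integer> {
    private final int[] sharedValues;   // shared by all searches; never modified here
    private final BitSet docsWithField; // which docs actually have a value
    private final int missingValue;     // chosen per search

    MissingAwareComparator(int[] sharedValues, BitSet docsWithField, int missingValue) {
        this.sharedValues = sharedValues;
        this.docsWithField = docsWithField;
        this.missingValue = missingValue;
    }

    /** Substitute the missing value on the fly instead of storing it in the cache. */
    private int valueOf(int doc) {
        return docsWithField.get(doc) ? sharedValues[doc] : missingValue;
    }

    @Override
    public int compare(Integer a, Integer b) {
        return Integer.compare(valueOf(a), valueOf(b));
    }
}
```

Two concurrent searches can then sort the same documents with different missing values (for example Integer.MAX_VALUE to put them last, Integer.MIN_VALUE to put them first) without interfering with each other.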

* LUCENE-3290: Added FieldInvertState.numUniqueTerms
  (Mike McCandless, Robert Muir)

* LUCENE-3280: Add FixedBitSet, like OpenBitSet but not elastic (it does
  not grow on demand if you set/get/clear too-large indices). (Mike
  McCandless)

* LUCENE-2048: Added the ability to omit positions but still index
  term frequencies; you can now control what is indexed into
  the postings via AbstractField.setIndexOptions:
    DOCS_ONLY: only documents are indexed; term frequencies and positions are omitted
    DOCS_AND_FREQS: only documents and term frequencies are indexed; positions are omitted
    DOCS_AND_FREQS_AND_POSITIONS: full postings: documents, frequencies, and positions
  AbstractField.setOmitTermFrequenciesAndPositions is deprecated;
  you should use DOCS_ONLY instead. (Robert Muir)

* LUCENE-3097: Added a new grouping collector that can be used to retrieve the
  most relevant document of every group. This can be useful in situations when
  one wants to compute grouping-based facets / statistics on the complete query
  result. (Martijn van Groningen)

* LUCENE-3334: If Java7 is detected, IOUtils.closeSafely() will log
  suppressed exceptions in the original exception, so the stack trace
  will contain them. (Uwe Schindler)
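
LUCENE-3280 describes FixedBitSet as OpenBitSet without the elasticity: its size is fixed at construction, and an out-of-range index is a caller bug rather than a reason to grow. A minimal sketch over a long[] follows; FixedBits is a hypothetical class, and its and/andNot only echo the operations named in LUCENE-3448 (unlike a production implementation, this sketch accepts only bit sets of equal length).

```java
/** Hypothetical fixed-size bit set in the spirit of LUCENE-3280 (not Lucene's class). */
final class FixedBits {
    private final long[] words;
    private final int numBits;

    FixedBits(int numBits) {
        this.numBits = numBits;
        this.words = new long[(numBits + 63) >>> 6]; // ceil(numBits / 64) words
    }

    private void check(int index) {
        // No growing: an out-of-range index is an error, unlike an elastic bit set.
        if (index < 0 || index >= numBits) {
            throw new IndexOutOfBoundsException("index=" + index + ", numBits=" + numBits);
        }
    }

    // Java masks the shift distance of a long to 6 bits, so 1L << index picks the bit within its word.
    void set(int index) { check(index); words[index >> 6] |= 1L << index; }

    void clear(int index) { check(index); words[index >> 6] &= ~(1L << index); }

    boolean get(int index) { check(index); return (words[index >> 6] & (1L << index)) != 0; }

    /** Intersect in place. */
    void and(FixedBits other) {
        requireSameLength(other);
        for (int i = 0; i < words.length; i++) words[i] &= other.words[i];
    }

    /** Remove every bit that is set in other. */
    void andNot(FixedBits other) {
        requireSameLength(other);
        for (int i = 0; i < words.length; i++) words[i] &= ~other.words[i];
    }

    int cardinality() {
        int count = 0;
        for (long w : words) count += Long.bitCount(w);
        return count;
    }

    private void requireSameLength(FixedBits other) {
        if (other.numBits != numBits) throw new IllegalArgumentException("length mismatch");
    }
}
```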

* LUCENE-3289: When building an FST you can now tune how aggressively
  the FST should try to share common suffixes. Typically you can
  greatly reduce RAM required during building, and CPU consumed, at
  the cost of a somewhat larger FST. (Mike McCandless)

* LUCENE-3327: Fix AIOOBE when TestFSTs is run with
  -Dtests.verbose=true (James Dyer via Mike McCandless)

* LUCENE-3406: Add ant target 'package-local-src-tgz' to Lucene and Solr
  to package sources from the local working copy.
  (Seung-Yeoul Yang via Steve Rowe)

======================= Lucene 3.3.0 =======================

Changes in backwards compatibility policy

* LUCENE-3140: IndexOutput.copyBytes now takes a DataInput (superclass
  of IndexInput) as its first argument. (Robert Muir, Dawid Weiss,

* LUCENE-3191: FieldComparator.value now returns an Object not
  Comparable; FieldDoc.fields also changed from Comparable[] to
  Object[] (Uwe Schindler, Mike McCandless)

* LUCENE-3208: Made deprecated methods Query.weight(Searcher) and
  Searcher.createWeight() final to prevent override. If you have
  overridden one of these methods, cut over to the non-deprecated
  implementation. (Uwe Schindler, Robert Muir, Yonik Seeley)

* LUCENE-3238: Made MultiTermQuery.rewrite() final, to prevent
  problems (such as not properly setting rewrite methods, or
  not working correctly with things like SpanMultiTermQueryWrapper).
  To rewrite to a simpler form, instead return a simpler enum
  from getEnum(IndexReader). For example, to rewrite to a single term,
  return a SingleTermEnum. (ludovic Boutros, Uwe Schindler, Robert Muir)

Changes in runtime behavior

* LUCENE-2834: The hash used to compute the lock file name when the
  lock file is not stored in the index has changed. This means you
  will see a different lucene-XXX-write.lock in your lock directory.
  (Robert Muir, Uwe Schindler, Mike McCandless)

* LUCENE-3146: IndexReader.setNorm throws IllegalStateException if the field
  does not store norms. (Shai Erera, Mike McCandless)

* LUCENE-3198: On Linux, if the JRE is 64 bit and supports unmapping,
  FSDirectory.open now defaults to MMapDirectory instead of
  NIOFSDirectory since MMapDirectory gives better performance. (Mike
  McCandless)

* LUCENE-3200: MMapDirectory now uses chunk sizes that are powers of 2.
  When setting the chunk size, it is rounded down to a power of 2.
  The new default value for 64 bit platforms is 2^30 (1 GiB); for
  32 bit platforms it stays unchanged at 2^28 (256 MiB).
  Internally, MMapDirectory now only uses one dedicated final IndexInput
  implementation supporting multiple chunks, which makes Hotspot's life
  easier. (Uwe Schindler, Robert Muir, Mike McCandless)
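
Rounding a requested chunk size down to a power of 2, as LUCENE-3200 describes, is essentially a one-liner with Integer.highestOneBit. The sketch below is illustrative only: ChunkSizes and its methods are hypothetical names, and MMapDirectory's own setter may validate its input differently.

```java
/** Hypothetical helpers for power-of-2 chunk sizes. */
final class ChunkSizes {
    private ChunkSizes() {}

    /** Rounds a positive requested chunk size down to a power of 2. */
    static int roundDownToPowerOfTwo(int requested) {
        if (requested <= 0) {
            throw new IllegalArgumentException("chunk size must be positive: " + requested);
        }
        return Integer.highestOneBit(requested); // keeps only the highest set bit
    }

    /** log2 of the rounded chunk size, usable as a shift distance. */
    static int chunkSizePower(int requested) {
        return 31 - Integer.numberOfLeadingZeros(roundDownToPowerOfTwo(requested));
    }
}
```

One benefit of power-of-2 chunks: a file position splits into a chunk index (pos >>> power) and an offset inside that chunk (pos & ((1L << power) - 1)) with a shift and a mask instead of a division.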

* LUCENE-3147, LUCENE-3152: Fixed open file handle leaks in many places in the
  code. Now MockDirectoryWrapper (in test-framework) tracks all open files,
  including locks, and fails if the test fails to release all of them.
  (Mike McCandless, Robert Muir, Shai Erera, Simon Willnauer)

* LUCENE-3102: CachingCollector.replay was failing to call setScorer
  per-segment (Martijn van Groningen via Mike McCandless)

* LUCENE-3183: Fix rare corner case where seeking to empty term
  (field="", term="") with terms index interval 1 could hit
  ArrayIndexOutOfBoundsException (selckin, Robert Muir, Mike
  McCandless)

* LUCENE-3208: IndexSearcher had its own private similarity field
  and corresponding get/setter overriding Searcher's implementation. If you
  set a different Similarity instance on IndexSearcher, methods implemented
  in the superclass Searcher were not using it, leading to strange bugs.
  (Uwe Schindler, Robert Muir)

* LUCENE-3197: Fix core merge policies to not over-merge during
  background optimize when documents are still being deleted
  concurrently with the optimize (Mike McCandless)

* LUCENE-3222: The RAM accounting for buffered delete terms was
  failing to measure the space required to hold the term's field and
  text character data. (Mike McCandless)

* LUCENE-3238: Fixed bug where using WildcardQuery("prefix*") inside
  of a SpanMultiTermQueryWrapper rewrote incorrectly and returned
  an error instead. (ludovic Boutros, Uwe Schindler, Robert Muir)

* LUCENE-3208: Renamed protected IndexSearcher.createWeight() to expert
  public method IndexSearcher.createNormalizedWeight() as this better describes
  what this method does. The old method is still there for backwards
  compatibility. Query.weight() was deprecated and simply delegates to
  IndexSearcher. Both deprecated methods will be removed in Lucene 4.0.
  (Uwe Schindler, Robert Muir, Yonik Seeley)

* LUCENE-3197: MergePolicy.findMergesForOptimize now takes
  Map<SegmentInfo,Boolean> instead of Set<SegmentInfo> as the second
  argument, so the merge policy knows which segments were originally
  present vs produced by an optimizing merge (Mike McCandless)

* LUCENE-1736: DateTools.java general improvements.
  (David Smiley via Steve Rowe)

* LUCENE-3140: Added experimental FST implementation to Lucene.
  (Robert Muir, Dawid Weiss, Mike McCandless)

* LUCENE-3193: A new TwoPhaseCommitTool allows running a 2-phase commit
  algorithm over objects that implement the new TwoPhaseCommit interface (such
  as IndexWriter). (Shai Erera)

* LUCENE-3191: Added TopDocs.merge, to facilitate merging results from
  different shards (Uwe Schindler, Mike McCandless)

* LUCENE-3179: Added OpenBitSet.prevSetBit (Paul Elschot via Mike McCandless)

* LUCENE-3210: Made TieredMergePolicy more aggressive in reclaiming
  segments with deletions; added new methods
  set/getReclaimDeletesWeight to control this. (Mike McCandless)
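
LUCENE-3193's TwoPhaseCommitTool runs a two-phase commit over several participants. The sketch below shows the general protocol under stated assumptions: Participant, TwoPhase.execute and RecordingParticipant are hypothetical (Lucene's TwoPhaseCommit interface has more methods), and this version commits only after every prepare succeeded, rolling every participant back on any failure.

```java
import java.util.List;

/** Hypothetical participant in a two-phase commit. */
interface Participant {
    void prepareCommit() throws Exception;
    void commit() throws Exception;
    void rollback();
}

final class TwoPhase {
    private TwoPhase() {}

    /** Phase 1 prepares everyone; phase 2 commits only if every prepare succeeded. */
    static void execute(List<? extends Participant> participants) throws Exception {
        try {
            for (Participant p : participants) p.prepareCommit();
            for (Participant p : participants) p.commit();
        } catch (Exception e) {
            // Best effort: roll everyone back, then report the original failure.
            for (Participant p : participants) {
                try {
                    p.rollback();
                } catch (RuntimeException suppressed) {
                    e.addSuppressed(suppressed);
                }
            }
            throw e;
        }
    }
}

/** Demo participant that records calls and can be told to fail during prepare. */
final class RecordingParticipant implements Participant {
    private final List<String> log;
    private final String name;
    private final boolean failOnPrepare;

    RecordingParticipant(List<String> log, String name, boolean failOnPrepare) {
        this.log = log;
        this.name = name;
        this.failOnPrepare = failOnPrepare;
    }

    @Override public void prepareCommit() throws Exception {
        log.add(name + ":prepare");
        if (failOnPrepare) throw new Exception("prepare failed: " + name);
    }

    @Override public void commit() { log.add(name + ":commit"); }

    @Override public void rollback() { log.add(name + ":rollback"); }
}
```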

* LUCENE-1344: Create OSGi bundle using dev-tools/maven.
  (Nicolas Lalevée, Luca Stancapiano via ryan)

* LUCENE-3204: The maven-ant-tasks jar is now included in the source tree;
  users of the generate-maven-artifacts target no longer have to manually
  place this jar in the Ant classpath. NOTE: when Ant looks for the
  maven-ant-tasks jar, it looks first in its pre-existing classpath, so
  any copies it finds will be used instead of the copy included in the
  Lucene/Solr source tree. For this reason, it is recommended to remove
  any copies of the maven-ant-tasks jar in the Ant classpath, e.g. under
  ~/.ant/lib/ or under the Ant installation's lib/ directory. (Steve Rowe)

======================= Lucene 3.2.0 =======================

Changes in backwards compatibility policy

* LUCENE-2953: PriorityQueue's internal heap was made private, as subclassing
  with generics can lead to ClassCastException. For advanced use (e.g. in Solr)
  a method getHeapArray() was added to retrieve the internal heap array as a
  non-generic Object[]. (Uwe Schindler, Yonik Seeley)

* LUCENE-1076: IndexWriter.setInfoStream now throws IOException
  (Mike McCandless, Shai Erera)

* LUCENE-3084: MergePolicy.OneMerge.segments was changed from
  SegmentInfos to a List<SegmentInfo>. SegmentInfos itself was changed
  to no longer extend Vector<SegmentInfo> (to update code that is using
  Vector-API, use the new asList() and asSet() methods returning unmodifiable
  collections; modifying SegmentInfos is now only possible through
  the explicitly declared methods). IndexWriter.segString() now takes
  Iterable<SegmentInfo> instead of List<SegmentInfo>. A simple recompile
  should fix this. MergePolicy and SegmentInfos are internal/experimental
  APIs not covered by the strict backwards compatibility policy.
  (Uwe Schindler, Mike McCandless)

Changes in runtime behavior

* LUCENE-3065: When a NumericField is retrieved from a Document loaded
  from IndexReader (or IndexSearcher), it will now come back as
  NumericField not as a Field with a string-ified version of the
  numeric value you had indexed. Note that this only applies for
  newly-indexed Documents; older indices will still return Field
  with the string-ified numeric value. If you call Document.get(),
  the value still comes back as a String, but Document.getFieldable()
  returns NumericField instances. (Uwe Schindler, Ryan McKinley,
  Mike McCandless)

* LUCENE-1076: Changed the default merge policy from
  LogByteSizeMergePolicy to TieredMergePolicy, as of Version.LUCENE_32
  (passed to IndexWriterConfig), which is able to merge non-contiguous
  segments. This means docIDs no longer necessarily stay "in order"
  during indexing. If this is a problem then you can use either of
  the LogMergePolicy impls. (Mike McCandless)

* LUCENE-3082: Added index upgrade tool oal.index.IndexUpgrader
  that allows upgrading all segments to the most recent supported index
  format without fully optimizing. (Uwe Schindler, Mike McCandless)

* LUCENE-1076: Added TieredMergePolicy which is able to merge non-contiguous
  segments, which means docIDs no longer necessarily stay "in order".
  (Mike McCandless, Shai Erera)

* LUCENE-3071: Added ReversePathHierarchyTokenizer, added skip parameter to
  PathHierarchyTokenizer (Olivier Favre via ryan)

* LUCENE-1421, LUCENE-3102: Added CachingCollector which allows you to cache
  document IDs and scores encountered during the search, and "replay" them to
  another Collector. (Mike McCandless, Shai Erera)

* LUCENE-3112: Added experimental IndexWriter.add/updateDocuments,
  enabling a block of documents to be indexed, atomically, with
  guaranteed sequential docIDs. (Mike McCandless)

* LUCENE-3061: IndexWriter's getNextMerge() and merge(OneMerge) are now public
  (though @lucene.experimental), allowing for custom MergeScheduler
  implementations. (Shai Erera)

* LUCENE-3065: Document.getField() was deprecated, as it throws
  ClassCastException when loading lazy fields or NumericFields.
  (Uwe Schindler, Ryan McKinley, Mike McCandless)

* LUCENE-2027: Directory.touchFile is deprecated and will be removed
  in 4.0. (Mike McCandless)

* LUCENE-2990: ArrayUtil/CollectionUtil.*Sort() methods now exit early
  on empty or one-element lists/arrays. (Uwe Schindler)

* LUCENE-2897: Apply deleted terms while flushing a segment. We still
  buffer deleted terms to later apply to past segments. (Mike McCandless)

* LUCENE-3126: IndexWriter.addIndexes copies incoming segments into CFS if they
  aren't already and MergePolicy allows that. (Shai Erera)

* LUCENE-2996: addIndexes(IndexReader) did not flush before adding the new
  indexes, causing existing deletions to be applied on the incoming indexes as
  well. (Shai Erera, Mike McCandless)

* LUCENE-3024: Index with more than 2.1B terms was hitting AIOOBE when
  seeking TermEnum (eg used by Solr's faceting) (Tom Burton-West, Mike
  McCandless)

* LUCENE-3042: When a filter or consumer added Attributes to a TokenStream
  chain after it was already (partly) consumed [or clearAttributes(),
  captureState(), cloneAttributes(),... was called by the Tokenizer],
  the Tokenizer calling clearAttributes() or capturing state after addition
  may not do this on the newly added Attribute. This bug affected only
  very special use cases of the TokenStream-API; most users would not
  have noticed it. (Uwe Schindler, Robert Muir)

* LUCENE-3054: PhraseQuery can in some cases stack overflow in
  SorterTemplate.quickSort(). This fix also adds an optimization to
  PhraseQuery, as terms with lower doc freq will also have fewer positions.
  (Uwe Schindler, Robert Muir, Otis Gospodnetic)

* LUCENE-3068: Sloppy phrase query failed to match valid documents when multiple
  query terms had the same position in the query. (Doron Cohen)

* LUCENE-3012: Lucene now writes the header for separate norm files (*.sNNN)

* LUCENE-3006: Building javadocs will fail on warnings by default.
  Override with -Dfailonjavadocwarning=false (sarowe, gsingers)

* LUCENE-3128: "ant eclipse" creates a .project file for easier Eclipse
  integration (unless one already exists). (Daniel Serodio via Shai Erera)

* LUCENE-3002: Added 'tests.iter.min' to control 'tests.iter' by allowing
  iteration to stop once at least 'tests.iter.min' iterations have run and a
  failure occurred. (Shai Erera, Chris Hostetter)

======================= Lucene 3.1.0 =======================

Changes in backwards compatibility policy

* LUCENE-2719: Changed API of internal utility class
  org.apache.lucene.util.SorterTemplate to support faster quickSort using
  pivot values and also merge sort and insertion sort. If you have used
  this class, you have to implement two more methods for handling pivots.
  (Uwe Schindler, Robert Muir, Mike McCandless)

* LUCENE-1923: Renamed SegmentInfo & SegmentInfos segString method to
  toString. These are advanced APIs and subject to change suddenly.
  (Tim Smith via Mike McCandless)

* LUCENE-2190: Removed deprecated customScore() and customExplain()
  methods from experimental CustomScoreQuery. (Uwe Schindler)

* LUCENE-2286: Enabled DefaultSimilarity.setDiscountOverlaps by default.
  This means that terms with a position increment gap of zero do not
  affect the norms calculation by default. (Robert Muir)

* LUCENE-2320: MergePolicy.writer is now of type SetOnce, which allows setting
  the IndexWriter for a MergePolicy exactly once. You can change references to
  'writer' from <code>writer.doXYZ()</code> to <code>writer.get().doXYZ()</code>
  (it is also advisable to add an <code>assert writer != null;</code> before you
  access the wrapped IndexWriter.)

  In addition, MergePolicy only exposes a default constructor, and the one that
  took IndexWriter as argument has been removed from all MergePolicy extensions.
  (Shai Erera via Mike McCandless)

* LUCENE-2328: SimpleFSDirectory.SimpleFSIndexInput is moved to
  FSDirectory.FSIndexInput. Anyone extending this class will have to
  fix their code on upgrading. (Earwin Burrfoot via Mike McCandless)

* LUCENE-2302: The new interface for term attributes, CharTermAttribute,
  now implements CharSequence. This requires the toString() methods of
  CharTermAttribute, deprecated TermAttribute, and Token to return only
  the term text and no other attribute contents. LUCENE-2374 implements
  an attribute reflection API to no longer rely on toString() for attribute
  inspection. (Uwe Schindler, Robert Muir)

* LUCENE-2372, LUCENE-2389: StandardAnalyzer, KeywordAnalyzer,
  PerFieldAnalyzerWrapper, WhitespaceTokenizer are now final. Also removed
  the now obsolete and deprecated Analyzer.setOverridesTokenStreamMethod().
  Analyzer and TokenStream base classes now have an assertion in their ctor
  that checks that subclasses are final or at least have final implementations
  of incrementToken(), tokenStream(), and reusableTokenStream().
  (Uwe Schindler, Robert Muir)

* LUCENE-2316: Directory.fileLength contract was clarified - it returns the
  actual file's length if the file exists, and throws FileNotFoundException
  otherwise. Returning length=0 for a non-existent file is no longer allowed. If
  you relied on that, make sure to catch the exception. (Shai Erera)

* LUCENE-2386: IndexWriter no longer performs an empty commit upon new index
  creation. Previously, if you passed an empty Directory and set OpenMode to
  CREATE*, IndexWriter would make a first empty commit. If you need that
  behavior you can call writer.commit()/close() immediately after you create it.
  (Shai Erera, Mike McCandless)

* LUCENE-2733: Removed public constructors of utility classes with only static
  methods to prevent instantiation. (Uwe Schindler)

* LUCENE-2602: The default (LogByteSizeMergePolicy) merge policy now
  takes deletions into account by default. You can disable this by
  calling setCalibrateSizeByDeletes(false) on the merge policy. (Mike
  McCandless)

* LUCENE-2529, LUCENE-2668: Position increment gap and offset gap of empty
  values in multi-valued fields have been changed in some cases in the index.
  If you index empty fields and use position/offset information on those
  fields, reindexing is recommended. (David Smiley, Koji Sekiguchi)

* LUCENE-2804: Directory.setLockFactory now declares throwing an IOException.
  (Shai Erera, Robert Muir)

* LUCENE-2837: Added deprecations noting that in 4.0, Searcher and
  Searchable are collapsed into IndexSearcher; contrib/remote and
  MultiSearcher have been removed. (Mike McCandless)

* LUCENE-2854: Deprecated SimilarityDelegator and
  Similarity.lengthNorm; the latter is now final, forcing any custom
  Similarity impls to cut over to the more general computeNorm (Robert
  Muir, Mike McCandless)

* LUCENE-2869: Deprecated Query.getSimilarity: instead of using
  "runtime" subclassing/delegation, subclass the Weight instead.

* LUCENE-2674: A new idfExplain method was added to Similarity, that
  accepts an incoming docFreq. If you subclass Similarity, make sure
  you also override this method on upgrade. (Robert Muir, Mike
  McCandless)
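
LUCENE-2320 wraps MergePolicy.writer in SetOnce, so the IndexWriter can be assigned exactly once and is then read through get(). A self-contained sketch of such a write-once holder follows; SetOnceRef is a hypothetical name, and it throws IllegalStateException on a second set(), which may differ from the exception Lucene's SetOnce uses.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical write-once holder; get() returns null until set() completes. */
final class SetOnceRef<T> {
    private final AtomicBoolean assigned = new AtomicBoolean(false);
    private volatile T value;

    /** Stores the value; any second call fails, even with the same value. */
    void set(T newValue) {
        if (!assigned.compareAndSet(false, true)) {
            throw new IllegalStateException("value can only be set once");
        }
        value = newValue;
    }

    T get() { return value; }
}
```

The compareAndSet makes "first caller wins" hold even if two threads race to set the value; code that reads through get() should still be prepared for null before the owner has called set().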
657 Changes in runtime behavior
659 * LUCENE-1923: Made IndexReader.toString() produce something
660 meaningful (Tim Smith via Mike McCandless)
662 * LUCENE-2179: CharArraySet.clear() is now functional.
663 (Robert Muir, Uwe Schindler)
665 * LUCENE-2455: IndexWriter.addIndexes no longer optimizes the target index
666 before it adds the new ones. Also, the existing segments are not merged and so
667 the index will not end up with a single segment (unless it was empty before).
668 In addition, addIndexesNoOptimize was renamed to addIndexes and no longer
669 invokes a merge on the incoming and target segments, but instead copies the
670 segments to the target index. You can call maybeMerge or optimize after this
671 method completes, if you need to.
673 In addition, Directory.copyTo* were removed in favor of copy which takes the
674 target Directory, source and target files as arguments, and copies the source
675 file to the target Directory under the target file name. (Shai Erera)
677 * LUCENE-2663: IndexWriter no longer forcefully clears any existing
678 locks when create=true. This was a holdover from when
679 SimpleFSLockFactory was the default locking implementation, and,
680 even then it was dangerous since it could mask bugs in IndexWriter's
681 usage, allowing applications to accidentally open two writers on the
682 same directory. (Mike McCandless)
684 * LUCENE-2701: maxMergeMBForOptimize and maxMergeDocs constraints set on
685 LogMergePolicy now affect optimize() as well (as opposed to only regular
686 merges). This means that you can run optimize() and too large segments won't
687 be merged. (Shai Erera)
689 * LUCENE-2753: IndexReader and DirectoryReader .listCommits() now return a List,
690 guaranteeing the commits are sorted from oldest to latest. (Shai Erera)
692 * LUCENE-2785: TopScoreDocCollector, TopFieldCollector and
693 the IndexSearcher search methods that take an int nDocs will now
694 throw IllegalArgumentException if nDocs is 0. Instead, you should
695 use the newly added TotalHitCountCollector. (Mike McCandless)
697 * LUCENE-2790: LogMergePolicy.useCompoundFile's logic now factors in noCFSRatio
698 to determine whether the passed in segment should be compound.
699 (Shai Erera, Earwin Burrfoot)
701 * LUCENE-2805: IndexWriter now increments the index version on every change to
702 the index instead of for every commit. Committing or closing the IndexWriter
703 without any changes to the index will not cause any index version increment.
704 (Simon Willnauer, Mike McCandless)
706 * LUCENE-2650, LUCENE-2825: The behavior of FSDirectory.open has changed. On 64-bit
707 Windows and Solaris systems that support unmapping, FSDirectory.open returns
708 MMapDirectory. Additionally the behavior of MMapDirectory has been
709 changed to enable unmapping by default if supported by the JRE.
710 (Mike McCandless, Uwe Schindler, Robert Muir)
712 * LUCENE-2829: Improve the performance of "primary key" lookup use
713 case (running a TermQuery that matches one document) on a
714 multi-segment index. (Robert Muir, Mike McCandless)
716 * LUCENE-2010: Segments with 100% deleted documents are now removed on
717 IndexReader or IndexWriter commit. (Uwe Schindler, Mike McCandless)
719 * LUCENE-2960: Allow some changes to IndexWriterConfig to take effect
720 "live" (after an IW is instantiated), via
721 IndexWriter.getConfig().setXXX(...) (Shay Banon, Mike McCandless)
725 * LUCENE-2076: Rename FSDirectory.getFile -> getDirectory. (George
726 Aroush via Mike McCandless)
728 * LUCENE-1260: Change norm encode (float->byte) and decode
729 (byte->float) to be instance methods, not static methods. This way a
730 custom Similarity can alter how norms are encoded, though they must
731 still be encoded as a single byte (Johan Kindgren via Mike
734 * LUCENE-2103: NoLockFactory should have a private constructor;
735 until Lucene 4.0 the default one will be deprecated.
736 (Shai Erera via Uwe Schindler)
738 * LUCENE-2177: Deprecate the Field ctors that take byte[] and Store.
739 Since the removal of compressed fields, Store can only be YES, so
740 it's not necessary to specify. (Erik Hatcher via Mike McCandless)
742 * LUCENE-2200: Several final classes had non-overriding protected
743 members. These were converted to private and unused protected
744 constructors removed. (Steven Rowe via Robert Muir)
746 * LUCENE-2240: SimpleAnalyzer and WhitespaceAnalyzer now have
747 Version ctors. (Simon Willnauer via Uwe Schindler)
749 * LUCENE-2259: Add IndexWriter.deleteUnusedFiles, to attempt removing
750 unused files. This is only useful on Windows, which prevents
751 deletion of open files. IndexWriter will eventually remove these
752 files itself; this method just lets you do so when you know the
753 files are no longer open by IndexReaders. (luocanrao via Mike
756 * LUCENE-2282: IndexFileNames is exposed as a public class allowing for easier
757 use by external code. In addition, it offers a matchExtension method which
758 callers can use to query whether a certain file matches a certain extension.
759 (Shai Erera via Mike McCandless)
761 * LUCENE-124: Add a TopTermsBoostOnlyBooleanQueryRewrite to MultiTermQuery.
762 This rewrite method is similar to TopTermsScoringBooleanQueryRewrite, but
763 only scores terms by their boost values. For example, this can be used
764 with FuzzyQuery to ensure that exact matches are always scored higher,
765 because only the boost will be used in scoring. (Robert Muir)
767 * LUCENE-2015: Add a static method foldToASCII to ASCIIFoldingFilter to
768 expose its folding logic. (Cédrik Lime via Robert Muir)
770 * LUCENE-2294: IndexWriter constructors have been deprecated in favor of a
771 single ctor which accepts IndexWriterConfig and a Directory. You can set all
772 the parameters related to IndexWriter on IndexWriterConfig. The different
773 setter/getter methods were deprecated as well. One should call
774 writer.getConfig().getXYZ() to query for a parameter XYZ.
775 Additionally, the setters/getters related to MergePolicy were deprecated as
776 well. One should interact with the MergePolicy directly.
777 (Shai Erera via Mike McCandless)
779 * LUCENE-2320: IndexWriter's MergePolicy configuration was moved to
780 IndexWriterConfig and the respective methods on IndexWriter were deprecated.
781 (Shai Erera via Mike McCandless)
783 * LUCENE-2328: Directory now keeps track itself of the files that are written
784 but not yet fsynced. The old Directory.sync(String file) method is deprecated
785 and replaced with Directory.sync(Collection<String> files). Take a look at
786 FSDirectory to see an example of how such tracking might look, if needed
787 in your custom Directories. (Earwin Burrfoot via Mike McCandless)
789 * LUCENE-2302: Deprecated TermAttribute and replaced it with a new
790 CharTermAttribute. The change is backwards compatible, so
791 mixed new/old TokenStreams all work on the same char[] buffer
792 independent of which interface they use. CharTermAttribute
793 has shorter method names and implements CharSequence and
794 Appendable. This allows usage like Java's StringBuilder in
795 addition to direct char[] access. Also terms can directly be
796 used in places where CharSequence is allowed (e.g. regular
798 (Uwe Schindler, Robert Muir)
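To illustrate why implementing CharSequence and Appendable matters, here is a minimal sketch of a growable char[] term buffer that implements both interfaces. The class name TermBufferSketch and its methods are hypothetical stand-ins, not Lucene's CharTermAttribute implementation; the point is only that such a buffer can be built like a StringBuilder while still exposing its raw char[].

```java
import java.util.Arrays;

// Hypothetical stand-in for a term attribute: a growable char[] buffer that
// is also a CharSequence and an Appendable (not Lucene's actual class).
final class TermBufferSketch implements CharSequence, Appendable {
  private char[] buffer = new char[8];
  private int length = 0;

  /** Direct access to the backing array, as with the char[]-based API. */
  char[] buffer() { return buffer; }

  /** Clears the term so the buffer can be reused for the next token. */
  TermBufferSketch setEmpty() { length = 0; return this; }

  private void ensureCapacity(int needed) {
    if (needed > buffer.length) {
      buffer = Arrays.copyOf(buffer, Math.max(needed, buffer.length * 2));
    }
  }

  @Override public TermBufferSketch append(CharSequence csq) {
    CharSequence s = (csq == null) ? "null" : csq; // Appendable contract for null
    return append(s, 0, s.length());
  }

  @Override public TermBufferSketch append(CharSequence csq, int start, int end) {
    CharSequence s = (csq == null) ? "null" : csq;
    ensureCapacity(length + (end - start));
    for (int i = start; i < end; i++) {
      buffer[length++] = s.charAt(i);
    }
    return this;
  }

  @Override public TermBufferSketch append(char c) {
    ensureCapacity(length + 1);
    buffer[length++] = c;
    return this;
  }

  @Override public int length() { return length; }

  @Override public char charAt(int index) {
    if (index < 0 || index >= length) throw new IndexOutOfBoundsException();
    return buffer[index];
  }

  @Override public CharSequence subSequence(int start, int end) {
    if (start < 0 || end > length || start > end) throw new IndexOutOfBoundsException();
    return new String(buffer, start, end - start);
  }

  @Override public String toString() { return new String(buffer, 0, length); }
}
```

Because the buffer is a CharSequence, it can be passed directly to java.util.regex, e.g. Pattern.compile("foo-\\w+").matcher(term), without first copying it into a String.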
800 * LUCENE-2402: IndexWriter.deleteUnusedFiles now deletes unreferenced commit
801 points too. If you use an IndexDeletionPolicy which holds onto index commits
802 (such as SnapshotDeletionPolicy), you can call this method to remove those
803 commit points when they are not needed anymore (instead of waiting for the
804 next commit). (Shai Erera)
806 * LUCENE-2481: SnapshotDeletionPolicy.snapshot() and release() were replaced
807 with equivalent ones that take a String (id) as argument. You can pass
808 whatever ID you want, as long as you use the same one when calling both.
811 * LUCENE-2356: Add IndexWriterConfig.set/getReaderTermIndexDivisor, to
812 set what IndexWriter passes for termsIndexDivisor to the readers it
813 opens internally when applying deletions or creating a near-real-time
814 reader. (Earwin Burrfoot via Mike McCandless)
816 * LUCENE-2167,LUCENE-2699,LUCENE-2763,LUCENE-2847: StandardTokenizer/Analyzer
817 in common/standard/ now implement the Word Break rules from the Unicode 6.0.0
818 Text Segmentation algorithm (UAX#29), covering the full range of Unicode code
819 points, including values from U+10000 to U+10FFFF.
821 ClassicTokenizer/Analyzer retains the old (pre-Lucene 3.1) StandardTokenizer/
822 Analyzer implementation and behavior. Only the Unicode Basic Multilingual
823 Plane (code points from U+0000 to U+FFFF) is covered.
825 UAX29URLEmailTokenizer tokenizes URLs and E-mail addresses according to the
826 relevant RFCs, in addition to implementing the UAX#29 Word Break rules.
827 (Steven Rowe, Robert Muir, Uwe Schindler)
829 * LUCENE-2778: RAMDirectory now exposes newRAMFile(), which can be overridden
830 to return a different RAMFile implementation. (Shai Erera)
832 * LUCENE-2785: Added TotalHitCountCollector whose sole purpose is to
833 count the number of hits matching the query. (Mike McCandless)
835 * LUCENE-2846: Deprecated IndexReader.setNorm(int, String, float). This method
836 is only syntactic sugar for setNorm(int, String, byte), but it uses the global
837 Similarity.getDefault().encodeNormValue(). Use the byte-based method instead
838 to ensure that the norm is encoded with your Similarity.
839 (Robert Muir, Mike McCandless)
841 * LUCENE-2374: Added Attribute reflection API: It's now possible to inspect the
842 contents of AttributeImpl and AttributeSource using a well-defined API.
843 This is used, e.g., by Solr's AnalysisRequestHandlers to display all attributes
845 There are also some backwards incompatible changes in toString() output,
846 as LUCENE-2302 introduced the CharSequence interface to CharTermAttribute
847 leading to changed toString() return values. The new API makes it possible to get a
848 string representation in a well-defined way using a new method
849 reflectAsString(). For backwards compatibility reasons, when toString()
850 was implemented by implementation subclasses, the default implementation of
851 AttributeImpl.reflectWith() uses toString()'s output instead to report the
852 Attribute's properties. Otherwise, reflectWith() uses Java's reflection
853 (like toString() did before) to get the attribute properties.
854 In addition, the previously mandatory equals() and hashCode() are no longer required
855 for AttributeImpls, but can still be provided (if needed).
858 * LUCENE-2691: Deprecate IndexWriter.getReader in favor of
859 IndexReader.open(IndexWriter) (Grant Ingersoll, Mike McCandless)
861 * LUCENE-2876: Deprecated Scorer.getSimilarity(). If your Scorer uses a Similarity,
862 it should keep it itself. Fixed Scorers to pass their parent Weight, so that
863 Scorer.visitSubScorers (LUCENE-2590) will work correctly.
864 (Robert Muir, Doron Cohen)
866 * LUCENE-2900: When opening a near-real-time (NRT) reader
867 (IndexReader.re/open(IndexWriter)) you can now specify whether
868 deletes should be applied. Applying deletes can be costly, and some
869 expert use cases can handle seeing deleted documents returned. The
870 deletes remain buffered so that the next time you open an NRT reader
871 and pass true, all deletes will be applied. (Mike McCandless)
873 * LUCENE-1253: LengthFilter (and Solr's KeepWordTokenFilter) now
874 require up-front specification of enablePositionIncrement. Together with
875 StopFilter they have a common base class (FilteringTokenFilter) that handles
876 the position increments automatically. Implementors only need to override an
877 accept() method that filters tokens. (Uwe Schindler, Robert Muir)
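The position-increment handling that FilteringTokenFilter centralizes can be sketched without the TokenStream machinery. In this hypothetical model (FilteringSketch and its Token class are illustrative names, not Lucene API), a subclass only implements accept(), and the base class adds the increments of dropped tokens to the next kept token when position increments are enabled.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of the FilteringTokenFilter idea: subclasses
// decide accept(); the base class carries over the position increments of
// removed tokens so that phrase positions stay correct.
abstract class FilteringSketch {
  static final class Token {
    final String term;
    final int posInc;
    Token(String term, int posInc) { this.term = term; this.posInc = posInc; }
  }

  private final boolean enablePositionIncrements;

  FilteringSketch(boolean enablePositionIncrements) {
    this.enablePositionIncrements = enablePositionIncrements;
  }

  /** Return true to keep the token. */
  protected abstract boolean accept(Token token);

  final List<Token> filter(List<Token> input) {
    List<Token> out = new ArrayList<Token>();
    int skippedPositions = 0;
    for (Token t : input) {
      if (accept(t)) {
        // Add the increments of tokens removed since the last kept token.
        int inc = enablePositionIncrements ? t.posInc + skippedPositions : t.posInc;
        out.add(new Token(t.term, inc));
        skippedPositions = 0;
      } else {
        skippedPositions += t.posInc;
      }
    }
    return out;
  }
}
```

A stop filter in this model is just an anonymous subclass whose accept() returns false for stop words; with increments enabled, the token after a removed stop word gets a position increment of 2.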
881 * LUCENE-2249: ParallelMultiSearcher should shut down thread pool on
882 close. (Martin Traverso via Uwe Schindler)
884 * LUCENE-2273: FieldCacheImpl.getCacheEntries() used WeakHashMap
885 incorrectly and led to ConcurrentModificationException.
886 (Uwe Schindler, Robert Muir)
888 * LUCENE-2328: Index files fsync tracking moved from
889 IndexWriter/IndexReader to Directory, and it no longer leaks memory.
890 (Earwin Burrfoot via Mike McCandless)
892 * LUCENE-2074: Reduce buffer size of lexer back to default on reset.
893 (Ruben Laguna, Shai Erera via Uwe Schindler)
895 * LUCENE-2496: Don't throw NPE if IndexWriter is opened with CREATE on
896 a prior (corrupt) index missing its segments_N file. (Mike
899 * LUCENE-2458: QueryParser no longer automatically forms phrase queries,
900 assuming whitespace tokenization. Previously all CJK queries, for example,
901 would be turned into phrase queries. The old behavior is preserved with
902 the matchVersion parameter for previous versions. Additionally, you can
903 explicitly enable the old behavior with setAutoGeneratePhraseQueries(true)
906 * LUCENE-2537: FSDirectory.copy() implementation was unsafe and could result in
907 OOM if a large file was copied. (Shai Erera)
909 * LUCENE-2580: MultiPhraseQuery throws AIOOBE if number of positions
910 exceeds number of terms at one position (Jayendra Patil via Mike McCandless)
912 * LUCENE-2617: Optional clauses of a BooleanQuery were not factored
913 into coord if the scorer for that segment returned null. This
914 can cause the same document to score differently depending on
915 what segment it resides in. (yonik)
917 * LUCENE-2272: Fix explain in PayloadNearQuery and also fix scoring issue (Peter Keegan via Grant Ingersoll)
919 * LUCENE-2732: Fix charset problems in XML loading in
920 HyphenationCompoundWordTokenFilter. (Uwe Schindler)
922 * LUCENE-2802: NRT DirectoryReader returned incorrect values from
923 getVersion, isOptimized, getCommitUserData, getIndexCommit and isCurrent due
924 to a mutable reference to the IndexWriter's SegmentInfos.
925 (Simon Willnauer, Earwin Burrfoot)
927 * LUCENE-2852: Fixed corner case in RAMInputStream that would hit a
928 false EOF after seeking to EOF then seeking back to same block you
929 were just in and then calling readBytes (Robert Muir, Mike McCandless)
931 * LUCENE-2860: Fixed SegmentInfo.sizeInBytes to factor in includeDocStores when it
932 decides whether to return the cached computed size or not. (Shai Erera)
934 * LUCENE-2584: SegmentInfo.files() could hit ConcurrentModificationException if
935 called by multiple threads. (Alexander Kanarsky via Shai Erera)
937 * LUCENE-2809: Fixed IndexWriter.numDocs to take into account
938 applied but not yet flushed deletes. (Mike McCandless)
940 * LUCENE-2879: MultiPhraseQuery previously calculated its phrase IDF by summing
941 internally; it now calls Similarity.idfExplain(Collection, IndexSearcher).
944 * LUCENE-2693: RAM used by IndexWriter was slightly incorrectly computed.
945 (Jason Rutherglen via Shai Erera)
947 * LUCENE-1846: DateTools now uses the US locale everywhere, so DateTools.round()
948 is also safe in unusual locales. (Uwe Schindler)
950 * LUCENE-2891: IndexWriterConfig did not accept -1 in setReaderTermIndexDivisor,
951 which can be used to prevent loading the terms index into memory. (Shai Erera)
953 * LUCENE-2937: Encoding a float into a byte (e.g. encoding field norms during
954 indexing) had an underflow detection bug that caused floatToByte(f)==0 where
955 f was greater than 0, but slightly less than byteToFloat(1). This meant that
956 certain very small field norms (index_boost * length_norm) could have
957 been rounded down to 0 instead of being rounded up to the smallest
958 positive number. (yonik)
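A minimal sketch of a one-byte float encoding (3 mantissa bits, zero exponent 15) modeled on Lucene's SmallFloat.floatToByte315/byte315ToFloat. The class name is hypothetical. As we understand the LUCENE-2937 fix, the key detail is the "<=" in the underflow test: positive floats whose shifted bits equal the zero point would otherwise be cast to byte 0, whereas here they map to 1, the smallest positive encoding.

```java
// Sketch of a 1-byte norm encoding modeled on SmallFloat (3 mantissa bits,
// zero exponent 15). Not Lucene's class; names are illustrative.
final class SmallFloatSketch {
  private static final int MANTISSA_BITS = 3;
  private static final int ZERO_EXP = 15;
  private static final int FZERO = (63 - ZERO_EXP) << MANTISSA_BITS; // 384

  static byte floatToByte315(float f) {
    int bits = Float.floatToRawIntBits(f);
    int smallfloat = bits >> (24 - MANTISSA_BITS);
    if (smallfloat <= FZERO) {
      // Zero and negative values -> 0; positive underflow -> 1 (smallest > 0).
      // Using "<" here would send some positive values to 0.
      return (bits <= 0) ? (byte) 0 : (byte) 1;
    } else if (smallfloat >= FZERO + 0x100) {
      return (byte) -1; // overflow -> largest encodable value (0xFF)
    }
    return (byte) (smallfloat - FZERO);
  }

  static float byte315ToFloat(byte b) {
    if (b == 0) return 0.0f;
    int bits = (b & 0xff) << (24 - MANTISSA_BITS);
    bits += (63 - ZERO_EXP) << 24;
    return Float.intBitsToFloat(bits);
  }
}
```

With these constants byte315ToFloat((byte) 1) is 1.25 * 2^-31; a float such as 1.125 * 2^-31 lies strictly between 0 and that value and encodes to 1 rather than 0.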
960 * LUCENE-2936: PhraseQuery score explanations were not correctly
961 identifying matches vs non-matches. (hossman)
963 * LUCENE-2975: A HotSpot JVM bug corrupts IndexInput#readVInt()/readVLong() if
964 the underlying readByte() is inlined (which happens, e.g., in MMapDirectory).
965 The loop was unrolled, which makes the HotSpot bug disappear.
966 (Uwe Schindler, Robert Muir, Mike McCandless)
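The following is a standalone sketch of variable-length int (VInt) encoding with an unrolled readVInt, roughly the shape the fix above moved to. It reads from a byte[] instead of an IndexInput, and the class name is hypothetical. Each byte carries 7 data bits, low-order group first, with the high bit set on every byte except the last; this sketch additionally rejects a fifth byte carrying more than 4 data bits, which may differ from the 3.x code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Sketch of VInt encoding with an unrolled decoder (illustrative, not Lucene's IndexInput).
final class VIntSketch {
  static byte[] writeVInt(int i) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while ((i & ~0x7F) != 0) {
      out.write((i & 0x7F) | 0x80); // 7 data bits + continuation flag
      i >>>= 7;
    }
    out.write(i); // last byte: high bit clear
    return out.toByteArray();
  }

  private final byte[] buf;
  private int pos;

  VIntSketch(byte[] buf) { this.buf = buf; }

  private byte readByte() { return buf[pos++]; }

  // Unrolled: one explicit step per possible byte instead of a loop.
  int readVInt() throws IOException {
    byte b = readByte();
    int i = b & 0x7F;
    if ((b & 0x80) == 0) return i;
    b = readByte();
    i |= (b & 0x7F) << 7;
    if ((b & 0x80) == 0) return i;
    b = readByte();
    i |= (b & 0x7F) << 14;
    if ((b & 0x80) == 0) return i;
    b = readByte();
    i |= (b & 0x7F) << 21;
    if ((b & 0x80) == 0) return i;
    b = readByte();
    // Fifth byte: only the low 4 bits still fit into a 32-bit int.
    i |= (b & 0x0F) << 28;
    if ((b & 0xF0) == 0) return i;
    throw new IOException("Invalid vInt: too many bits");
  }
}
```

Negative ints always take five bytes in this encoding, which is why Lucene uses VInts only for values that are normally non-negative.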
970 * LUCENE-2128: Parallelized fetching of document frequencies during weight
971 creation. (Israel Tsadok, Simon Willnauer via Uwe Schindler)
973 * LUCENE-2069: Added Unicode 4 support to CharArraySet. Due to the switch
974 to Java 5, supplementary characters are now lowercased correctly if the
975 set is created as case insensitive.
976 CharArraySet now requires a Version argument to preserve
977 backwards compatibility. If Version < 3.1 is passed to the constructor,
978 CharArraySet yields the old behavior. (Simon Willnauer)
980 * LUCENE-2069: Added Unicode 4 support to LowerCaseFilter. Due to the switch
981 to Java 5, supplementary characters are now lowercased correctly.
982 LowerCaseFilter now requires a Version argument to preserve
983 backwards compatibility. If Version < 3.1 is passed to the constructor,
984 LowerCaseFilter yields the old behavior. (Simon Willnauer, Robert Muir)
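A minimal sketch of code-point-aware lowercasing of a char[] term buffer, the idea behind the Unicode 4 support in LowerCaseFilter and CharArraySet. The class and method names are hypothetical. Surrogate pairs are decoded to a code point before calling Character.toLowerCase(int); the per-char variant, shown for comparison, leaves supplementary characters unchanged. Like the approach described above, the sketch assumes the lowercase form occupies the same number of UTF-16 chars.

```java
// Hypothetical sketch: lowercase a char[] buffer by code point vs. by char.
final class LowerCaseSketch {
  // Unicode 4 aware: handles supplementary characters (surrogate pairs).
  static void toLowerCase(char[] buffer, int length) {
    for (int i = 0; i < length; ) {
      int cp = Character.codePointAt(buffer, i, length);
      // Assumes the lowercase code point needs the same number of chars.
      i += Character.toChars(Character.toLowerCase(cp), buffer, i);
    }
  }

  // Old-style behaviour for comparison: each UTF-16 char on its own.
  static void toLowerCasePerChar(char[] buffer, int length) {
    for (int i = 0; i < length; i++) {
      buffer[i] = Character.toLowerCase(buffer[i]);
    }
  }
}
```

For example, U+10400 (DESERET CAPITAL LETTER LONG I) is stored as two chars; the code-point version turns it into U+10428, while the per-char version leaves both surrogates untouched.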
986 * LUCENE-2034: Added ReusableAnalyzerBase, an abstract subclass of Analyzer
987 that makes it easier to reuse TokenStreams correctly. This issue also added
988 StopwordAnalyzerBase, which improves consistency of all Analyzers that use
989 stopwords, and reimplemented many analyzers in contrib with it.
990 (Simon Willnauer via Robert Muir)
992 * LUCENE-2198, LUCENE-2901: Support protected words in stemming TokenFilters using a
993 new KeywordAttribute. (Simon Willnauer, Drew Farris via Uwe Schindler)
995 * LUCENE-2183, LUCENE-2240, LUCENE-2241: Added Unicode 4 support
996 to CharTokenizer and its subclasses. CharTokenizer now has a new
997 int-API which is conditionally preferred to the old char-API depending
998 on the provided Version. Version < 3.1 will use the char-API.
999 (Simon Willnauer via Uwe Schindler)
1001 * LUCENE-2247: Added a CharArrayMap<V> for performance improvements
1002 in some stemmers and synonym filters. (Uwe Schindler)
1004 * LUCENE-2320: Added SetOnce which wraps an object and allows it to be set
1005 exactly once. (Shai Erera via Mike McCandless)
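A set-exactly-once holder can be sketched with an AtomicBoolean guarding a volatile field. This is a hypothetical illustration in the spirit of SetOnce, not its source: the class name is invented here, and the sketch throws a plain IllegalStateException on a second set() (Lucene's class uses its own exception type).

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical "set exactly once" holder (illustrative, not Lucene's SetOnce).
final class SetOnceSketch<T> {
  private volatile T obj = null;
  private final AtomicBoolean set = new AtomicBoolean(false);

  /** Sets the value; a second call throws, whichever thread makes it. */
  void set(T obj) {
    if (set.compareAndSet(false, true)) {
      this.obj = obj;
    } else {
      throw new IllegalStateException("The object cannot be set twice!");
    }
  }

  /** Returns the value, or null if it has not been set (yet). */
  T get() { return obj; }
}
```

The compareAndSet ensures only one caller wins even under concurrent set() calls; a reader racing with the winner may still briefly observe null.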
1007 * LUCENE-2314: Added AttributeSource.copyTo(AttributeSource) that
1008 allows using cloneAttributes() and this method as a replacement
1009 for captureState()/restoreState(), if the state itself
1010 needs to be inspected/modified. (Uwe Schindler)
1012 * LUCENE-2293: Expose control over max number of threads that
1013 IndexWriter will allow to run concurrently while indexing
1014 documents (previously this was hardwired to 5), using
1015 IndexWriterConfig.setMaxThreadStates. (Mike McCandless)
1017 * LUCENE-2297: Enable turning on reader pooling inside IndexWriter
1018 even when getReader (near-real-time reader) is not in use, through
1019 IndexWriterConfig.enable/disableReaderPooling. (Mike McCandless)
1021 * LUCENE-2331: Add NoMergePolicy which never returns any merges to execute. In
1022 addition, add NoMergeScheduler which never executes any merges. These two are
1023 convenient classes in case you want to disable segment merges by IndexWriter
1024 without tweaking particular MergePolicy parameters, such as mergeFactor.
1025 MergeScheduler's methods are now public. (Shai Erera via Mike McCandless)
1027 * LUCENE-2339: Deprecate static method Directory.copy in favor of
1028 Directory.copyTo, and use nio's FileChannel.transferTo when copying
1029 files between FSDirectory instances. (Earwin Burrfoot via Mike
1032 * LUCENE-2074: Make StandardTokenizer fit for Unicode 4.0, if the
1033 matchVersion parameter is Version.LUCENE_31. (Uwe Schindler)
1035 * LUCENE-2385: Moved NoDeletionPolicy from benchmark to core. NoDeletionPolicy
1036 can be used to prevent commits from ever getting deleted from the index.
1039 * LUCENE-1585: IndexWriter now accepts a PayloadProcessorProvider which can
1040 return a DirPayloadProcessor for a given Directory, which returns a
1041 PayloadProcessor for a given Term. The PayloadProcessor will be used to
1042 process the payloads of the segments as they are merged (e.g. if one wants to
1043 rewrite payloads of external indexes as they are added, or of local ones).
1044 (Shai Erera, Michael Busch, Mike McCandless)
1046 * LUCENE-2440: Add support for custom ExecutorService in
1047 ParallelMultiSearcher (Edward Drapkin via Mike McCandless)
1049 * LUCENE-2295: Added a LimitTokenCountAnalyzer / LimitTokenCountFilter
1050 to wrap any other Analyzer and provide the same functionality as
1051 MaxFieldLength provided on IndexWriter. This patch also fixes a bug
1052 in the offset calculation in CharTokenizer. (Uwe Schindler, Shai Erera)
1054 * LUCENE-2526: Don't throw NPE from MultiPhraseQuery.toString when
1055 it's empty. (Ross Woolf via Mike McCandless)
1057 * LUCENE-2559: Added SegmentReader.reopen methods (John Wang via Mike
1060 * LUCENE-2590: Added Scorer.visitSubScorers, and Scorer.freq. Along
1061 with a custom Collector these experimental methods make it possible
1062 to gather the hit-count per sub-clause and per document while a
1063 search is running. (Simon Willnauer, Mike McCandless)
1065 * LUCENE-2636: Added MultiCollector which allows running the search with several
1066 Collectors. (Shai Erera)
1068 * LUCENE-2754, LUCENE-2757: Added a wrapper around MultiTermQueries
1069 to add span support: SpanMultiTermQueryWrapper<Q extends MultiTermQuery>.
1070 Using this wrapper it's easy to add fuzzy/wildcard to e.g. a SpanNearQuery.
1071 (Robert Muir, Uwe Schindler)
1073 * LUCENE-2838: ConstantScoreQuery now directly supports wrapping a Query
1074 instance for stripping off scores. The use of a QueryWrapperFilter
1075 is no longer needed and discouraged for that use case. Directly wrapping
1076 Query improves performance, as out-of-order collection is now supported.
1079 * LUCENE-2864: Add getMaxTermFrequency (maximum within-document TF) to
1080 FieldInvertState so that it can be used in Similarity.computeNorm.
1083 * LUCENE-2720: Segments now record the code version which created them.
1084 (Shai Erera, Mike McCandless, Uwe Schindler)
1086 * LUCENE-2474: Added expert ReaderFinishedListener API to
1087 IndexReader, to allow apps that maintain external per-segment caches
1088 to evict entries when a segment is finished. (Shay Banon, Yonik
1089 Seeley, Mike McCandless)
1091 * LUCENE-2911: The new StandardTokenizer, UAX29URLEmailTokenizer, and
1092 the ICUTokenizer in contrib now all tag tokens with a consistent set
1093 of token types (defined in StandardTokenizer). Tokens in the major
1094 CJK types are explicitly marked to allow for custom downstream handling:
1095 <IDEOGRAPHIC>, <HANGUL>, <KATAKANA>, and <HIRAGANA>.
1096 (Robert Muir, Steven Rowe)
1098 * LUCENE-2913: Add missing getters to Numeric* classes. (Uwe Schindler)
1100 * LUCENE-1810: Added FieldSelectorResult.LATENT to not cache lazy loaded fields
1101 (Tim Smith, Grant Ingersoll)
1103 * LUCENE-2692: Added several new SpanQuery classes for positional checking
1104 (match is in a range, payload is a specific value) (Grant Ingersoll)
1108 * LUCENE-2494: Use CompletionService in ParallelMultiSearcher instead of
1109 simple polling for results. (Edward Drapkin, Simon Willnauer)
1111 * LUCENE-2075: Terms dict cache is now shared across threads instead
1112 of being stored separately in thread local storage. Also fixed
1113 terms dict so that the cache is used when seeking the thread local
1114 term enum, which will be important for MultiTermQuery impls that do
1115 lots of seeking (Mike McCandless, Uwe Schindler, Robert Muir, Yonik
1118 * LUCENE-2136: If the multi reader (DirectoryReader or MultiReader)
1119 only has a single sub-reader, delegate all enum requests to it.
1120 This avoids the overhead of using a PQ unnecessarily. (Mike
1123 * LUCENE-2137: Switch to AtomicInteger for some ref counting (Earwin
1124 Burrfoot via Mike McCandless)
1126 * LUCENE-2123, LUCENE-2261: Move FuzzyQuery rewrite to separate RewriteMode
1127 into MultiTermQuery. The number of fuzzy expansions can be specified with
1128 the maxExpansions parameter to FuzzyQuery.
1129 (Uwe Schindler, Robert Muir, Mike McCandless)
1131 * LUCENE-2164: ConcurrentMergeScheduler has more control over merge
1132 threads. First, it gives smaller merges higher thread priority than
1133 larger ones. Second, a new set/getMaxMergeCount setting will pause
1134 the larger merges to allow smaller ones to finish. The defaults for
1135 these settings are now dynamic, depending on the number of CPU cores as
1136 reported by Runtime.getRuntime().availableProcessors() (Mike
1139 * LUCENE-2169: Improved CharArraySet.copy(), if source set is
1140 also a CharArraySet. (Simon Willnauer via Uwe Schindler)
1142 * LUCENE-2084: Change IndexableBinaryStringTools to work on byte[] and char[]
1143 directly, instead of Byte/CharBuffers, and modify CollationKeyFilter to
1144 take advantage of this for faster performance.
1145 (Steven Rowe, Uwe Schindler, Robert Muir)
1147 * LUCENE-2188: Add a utility class for tracking deprecated overridden
1148 methods in non-final subclasses.
1149 (Uwe Schindler, Robert Muir)
1151 * LUCENE-2195: Speedup CharArraySet if set is empty.
1152 (Simon Willnauer via Robert Muir)
1154 * LUCENE-2285: Code cleanup. (Shai Erera via Uwe Schindler)
1156 * LUCENE-2303: Remove code duplication in Token class by subclassing
1157 TermAttributeImpl, move DEFAULT_TYPE constant to TypeInterface, improve
1158 null-handling for TypeAttribute. (Uwe Schindler)
1160 * LUCENE-2329: Switch TermsHash* from using a PostingList object per unique
1161 term to parallel arrays, indexed by termID. This reduces garbage collection
1162 overhead significantly, which results in great indexing performance wins
1163 when the available JVM heap space is low. This will become even more
1164 important when the DocumentsWriter RAM buffer is searchable in the future,
1165 because then it will make sense to make the RAM buffers as large as
1166 possible. (Mike McCandless, Michael Busch)
1168 * LUCENE-2380: The terms field cache methods (getTerms,
1169 getTermsIndex), which replace the older String equivalents
1170 (getStrings, getStringIndex), consume quite a bit less RAM in most
1171 cases. (Mike McCandless)
1173 * LUCENE-2410: ~20% speedup on exact (slop=0) PhraseQuery matching.
1176 * LUCENE-2531: Fix issue when sorting by a String field that was
1177 causing too many fallbacks to compare-by-value (instead of by-ord).
1180 * LUCENE-2574: IndexInput exposes copyBytes(IndexOutput, long) to allow for
1181 efficient copying by sub-classes. Optimized copy is implemented for RAM and FS
1182 streams. (Shai Erera)
1184 * LUCENE-2719: Improved TermsHashPerField's sorting to use a better
1185 quick sort algorithm that does not dereference the pivot element on
1186 every compare call. Also replaced lots of sorting code in Lucene
1187 by the improved SorterTemplate class.
1188 (Uwe Schindler, Robert Muir, Mike McCandless)
1190 * LUCENE-2760: Optimize SpanFirstQuery and SpanPositionRangeQuery.
1193 * LUCENE-2770: Make SegmentMerger always work on atomic subreaders,
1194 even when IndexWriter.addIndexes(IndexReader...) is used with
1195 DirectoryReaders or other MultiReaders. This saves lots of memory
1196 during merge of norms. (Uwe Schindler, Mike McCandless)
1198 * LUCENE-2824: Optimize BufferedIndexInput to do fewer bounds checks.
1201 * LUCENE-2010: Segments with 100% deleted documents are now removed on
1202 IndexReader or IndexWriter commit. (Uwe Schindler, Mike McCandless)
1204 * LUCENE-1472: Removed synchronization from static DateTools methods
1205 by using a ThreadLocal. Also converted DateTools.Resolution to a
1206 Java 5 enum (this should not break backwards compatibility). (Uwe Schindler)
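Removing synchronization from the static DateTools methods via a ThreadLocal can be sketched as follows: each thread gets its own SimpleDateFormat (which is not thread-safe), fixed to Locale.US and GMT so results do not depend on the default locale or time zone. The class name is hypothetical and the millisecond pattern is an assumption for this example, not a quote of DateTools.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch: one SimpleDateFormat per thread instead of a
// synchronized shared instance.
final class DateFormatSketch {
  private static final ThreadLocal<SimpleDateFormat> MILLIS_FORMAT =
      new ThreadLocal<SimpleDateFormat>() {
        @Override protected SimpleDateFormat initialValue() {
          SimpleDateFormat f = new SimpleDateFormat("yyyyMMddHHmmssSSS", Locale.US);
          f.setTimeZone(TimeZone.getTimeZone("GMT"));
          return f;
        }
      };

  static String dateToString(Date date) {
    return MILLIS_FORMAT.get().format(date);
  }

  static Date stringToDate(String s) throws ParseException {
    return MILLIS_FORMAT.get().parse(s);
  }
}
```

Each thread pays the construction cost once, and no lock is taken per call.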
1210 * LUCENE-2124: Moved the JDK-based collation support from contrib/collation
1211 into core, and moved the ICU-based collation support into contrib/icu.
1214 * LUCENE-2326: Removed SVN checkouts for backwards tests. The backwards
1215 branch is now included in the svn repository using "svn copy"
1216 after release. (Uwe Schindler)
1218 * LUCENE-2074: Regenerating StandardTokenizerImpl files now needs
1219 JFlex 1.5 (currently only available on SVN). (Uwe Schindler)
1221 * LUCENE-1709: Tests are now parallelized by default (except for benchmark). You
1222 can force them to run sequentially by passing -Drunsequential=1 on the command
1223 line. The number of threads that are spawned per CPU defaults to '1'. If you
1224 wish to change that, you can run the tests with -DthreadsPerProcessor=[num].
1225 (Robert Muir, Shai Erera, Peter Kofler)
1227 * LUCENE-2516: Backwards tests are now compiled against released lucene-core.jar
1228 from tarball of previous version. Backwards tests are now packaged together
1229 with src distribution. (Uwe Schindler)
1231 * LUCENE-2611: Added Ant target to install IntelliJ IDEA configuration:
1232 "ant idea". See http://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ
1235 * LUCENE-2657: Switch from using Maven POM templates to full POMs when
1236 generating Maven artifacts (Steven Rowe)
1238 * LUCENE-2609: Added jar-test-framework Ant target which packages Lucene's
1239 tests' framework classes. (Drew Farris, Grant Ingersoll, Shai Erera,
1244 * LUCENE-2037: Allow JUnit4 tests in our environment (Erick Erickson
1245 via Mike McCandless)
1247 * LUCENE-1844: Speed up the unit tests (Mark Miller, Erick Erickson,
1250 * LUCENE-2065: Use Java 5 generics throughout our unit tests. (Kay
1251 Kay via Mike McCandless)
1253 * LUCENE-2155: Fix time and zone dependent localization test failures
1254 in queryparser tests. (Uwe Schindler, Chris Male, Robert Muir)
1256 * LUCENE-2170: Fix thread starvation problems. (Uwe Schindler)
1258 * LUCENE-2248, LUCENE-2251, LUCENE-2285: Refactor tests to not use
1259 Version.LUCENE_CURRENT, but instead use a global static value
1260 from LuceneTestCase(J4), that contains the release version.
1261 (Uwe Schindler, Simon Willnauer, Shai Erera)
1263 * LUCENE-2313, LUCENE-2322: Add VERBOSE to LuceneTestCase(J4) to control
1264 verbosity of tests. If VERBOSE==false (default) tests should not print
1265 anything other than errors to System.(out|err). The setting can be
1266 changed with -Dtests.verbose=true on test invocation.
1267 (Shai Erera, Paul Elschot, Uwe Schindler)
1269 * LUCENE-2318: Remove inconsistent system property code for retrieving
1270 temp and data directories inside test cases. It is now centralized in
1271 LuceneTestCase(J4). Also changed lots of tests to use
1272 getClass().getResourceAsStream() to retrieve test data. Tests needing
1273 access to "real" files from the test folder itself can use
1274 LuceneTestCase(J4).getDataFile(). (Uwe Schindler)
1276 * LUCENE-2398, LUCENE-2611: Improve tests to work better from IDEs such
1277 as Eclipse and IntelliJ.
1278 (Paolo Castagna, Steven Rowe via Robert Muir)
1280 * LUCENE-2804: add newFSDirectory to LuceneTestCase to create an FSDirectory at
1281 random. (Shai Erera, Robert Muir)
1285 * LUCENE-2579: Fix oal.search's package.html description of abstract
1286 methods. (Santiago M. Mola via Mike McCandless)
1288 * LUCENE-2625: Add a note to IndexReader.termDocs() with additional verbiage
1289 that the TermEnum must be seeked since it is unpositioned.
1290 (Adriano Crestani via Robert Muir)
1292 * LUCENE-2894: Use google-code-prettify for syntax highlighting in javadoc.
1293 (Shinichiro Abe, Koji Sekiguchi)
1295 ================== Release 2.9.4 / 3.0.3 ====================
1297 Changes in runtime behavior
1299 * LUCENE-2689: NativeFSLockFactory no longer attempts to acquire a
1300 test lock just before the real lock is acquired. (Surinder Pal
1301 Singh Bindra via Mike McCandless)
1303 * LUCENE-2762: Fixed bug in IndexWriter causing it to hold open file
1304 handles against deleted files when compound-file was enabled (the
1305 default) and readers are pooled. As a result of this the peak
1306 worst-case free disk space required during optimize is now 3X the
1307 index size, when compound file is enabled (else 2X). (Mike
1310 * LUCENE-2773: LogMergePolicy accepts a double noCFSRatio (default =
1311 0.1), which means any time a merged segment is greater than 10% of
1312 the index size, it will be left in non-compound format even if
1313 compound format is on. This change was made to reduce peak
1314 transient disk usage during optimize which increased due to
1315 LUCENE-2762. (Mike McCandless)
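The noCFSRatio rule above reduces to a size comparison. The sketch below is an illustration under stated assumptions, not LogMergePolicy's code: the method name is hypothetical, "total index size" is taken as the summed size of the index's segments, and a ratio of 1.0 is treated as "always use compound format when enabled".

```java
// Hypothetical sketch of the noCFSRatio decision for a newly merged segment.
final class CompoundDecisionSketch {
  static boolean useCompoundFile(boolean compoundEnabled, double noCFSRatio,
                                 long mergedSegmentBytes, long totalIndexBytes) {
    if (!compoundEnabled) return false;
    if (noCFSRatio >= 1.0) return true; // assumed: ratio 1.0 means "always compound"
    // Keep compound format only for segments up to noCFSRatio of the index.
    return mergedSegmentBytes <= noCFSRatio * totalIndexBytes;
  }
}
```

With the default ratio of 0.1 and a 1,000 MB index, a 150 MB merged segment stays in non-compound format, while a 40 MB one is still written as a compound file.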
1319 * LUCENE-2142 (correct fix): FieldCacheImpl.getStringIndex no longer
1320 throws an exception when term count exceeds doc count.
1321 (Mike McCandless, Uwe Schindler)
1323 * LUCENE-2513: when opening a writable IndexReader on a not-current
1324 commit, do not overwrite "future" commits. (Mike McCandless)
1326 * LUCENE-2536: IndexWriter.rollback was failing to properly rollback
1327 buffered deletions against segments that were flushed (Mark Harwood
1328 via Mike McCandless)
1330 * LUCENE-2541: Fixed NumericRangeQuery that returned incorrect results
1331 with endpoints near Long.MIN_VALUE and Long.MAX_VALUE:
1332 NumericUtils.splitRange() overflowed, if
1333 - the range contained a LOWER bound
1334 that was greater than (Long.MAX_VALUE - (1L << precisionStep))
1335 - the range contained an UPPER bound
1336 that was less than (Long.MIN_VALUE + (1L << precisionStep))
1337 With standard precision steps around 4, this had no effect on
1338 most queries, only those that met the above conditions.
1339 Queries with large precision steps failed more easily. Queries with
1340 precision step >=64 were not affected. Also, the 32-bit data types int
1341 and float were not affected.
1342 (Yonik Seeley, Uwe Schindler)
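The overflow conditions listed above can be checked in isolation. This is an illustrative sketch, not the NumericUtils.splitRange code: the class and method names are hypothetical. Stepping a lower bound up by 1L << precisionStep wraps past Long.MAX_VALUE exactly when the lower bound exceeds Long.MAX_VALUE - (1L << precisionStep), and symmetrically for the upper bound near Long.MIN_VALUE. Comparing the stepped value with the original detects the wrap.

```java
// Illustration of detecting wrap-around when stepping range bounds.
// Assumes 1 <= precisionStep <= 62 so that 1L << precisionStep is positive.
final class RangeStepSketch {
  static boolean lowerStepWraps(long lower, int precisionStep) {
    long next = lower + (1L << precisionStep);
    return next < lower; // true exactly when the addition overflowed
  }

  static boolean upperStepWraps(long upper, int precisionStep) {
    long next = upper - (1L << precisionStep);
    return next > upper; // true exactly when the subtraction underflowed
  }
}
```

With precisionStep = 4, a lower bound of Long.MAX_VALUE - 1 wraps, while Long.MAX_VALUE - 16 (not greater than Long.MAX_VALUE - 16) does not.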
1344 * LUCENE-2593: Fixed certain rare cases where a disk full could lead
1345 to a corrupted index (Robert Muir, Mike McCandless)
1347 * LUCENE-2620: Fixed a bug in WildcardQuery where too many asterisks
1348 would result in unbearably slow performance. (Nick Barkas via Robert Muir)
1350 * LUCENE-2627: Fixed bug in MMapDirectory chunking when a file is an
1351 exact multiple of the chunk size. (Robert Muir)
1353 * LUCENE-2634: isCurrent on an NRT reader was failing to return false
1354 if the writer had just committed (Nikolay Zamosenchuk via Mike McCandless)
1356 * LUCENE-2650: Added extra safety to MMapIndexInput clones to prevent accessing
1357 an unmapped buffer if the input is closed (Mike McCandless, Uwe Schindler, Robert Muir)
1359 * LUCENE-2384: Reset zzBuffer in StandardTokenizerImpl when lexer is reset.
1360 (Ruben Laguna via Uwe Schindler, sub-issue of LUCENE-2074)
1362 * LUCENE-2658: Exceptions while processing term vectors enabled for multiple
1363 fields could lead to invalid ArrayIndexOutOfBoundsExceptions.
1364 (Robert Muir, Mike McCandless)
1366 * LUCENE-2235: Implement missing PerFieldAnalyzerWrapper.getOffsetGap().
1367 (Javier Godoy via Uwe Schindler)
1369 * LUCENE-2328: Fixed memory leak in how IndexWriter/Reader tracked
1370 already sync'd files. (Earwin Burrfoot via Mike McCandless)
1372 * LUCENE-2549: Fix TimeLimitingCollector#TimeExceededException to record
1373 the absolute docid. (Uwe Schindler)
1375 * LUCENE-2533: fix FileSwitchDirectory.listAll to not return dups when
1376 primary & secondary dirs share the same underlying directory.
1377 (Michael McCandless)
1379 * LUCENE-2365: IndexWriter.newestSegment (used normally for testing)
1380 is fixed to return null if there are no segments. (Karthick
1381 Sankarachary via Mike McCandless)
1383 * LUCENE-2730: Fix two rare deadlock cases in IndexWriter (Mike McCandless)
1385 * LUCENE-2744: CheckIndex was reporting the total number of fields,
1386 rather than the number that have norms enabled, in the "test: field
1387 norms..." output. (Mark Kristensson via Mike McCandless)
1389 * LUCENE-2759: Fixed two near-real-time cases where doc store files
1390 may be opened for read even though they are still open for write.
1393 * LUCENE-2618: Fix rare thread safety issue whereby
1394 IndexWriter.optimize could sometimes return even though the index
1395 wasn't fully optimized (Mike McCandless)
1397 * LUCENE-2767: Fix thread safety issue in addIndexes(IndexReader[])
1398 that could potentially result in index corruption. (Mike
1401 * LUCENE-2762: Fixed bug in IndexWriter causing it to hold open file
1402 handles against deleted files when compound-file was enabled (the
1403 default) and readers are pooled. As a result of this the peak
1404 worst-case free disk space required during optimize is now 3X the
1405 index size, when compound file is enabled (else 2X). (Mike
1408 * LUCENE-2216: OpenBitSet.hashCode returned different hash codes for
1409 sets that only differed by trailing zeros. (Dawid Weiss, yonik)
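Since such sets contain the same bits, equals() and hashCode() should agree for them. A minimal sketch, assuming the set is backed by a long[] of 64-bit words (the class name is hypothetical and this is not OpenBitSet's actual hash function), that hashes only up to the last non-zero word:

```java
import java.util.Arrays;

// Hypothetical sketch; not OpenBitSet's actual hash function.
final class TrailingZeroSafeHash {

    // Hash only the words up to the last non-zero one, so word arrays that
    // differ only by trailing zero words (e.g. after the array grew) hash alike.
    static int hashWords(long[] words) {
        int significant = words.length;
        while (significant > 0 && words[significant - 1] == 0L) {
            significant--;
        }
        return Arrays.hashCode(Arrays.copyOf(words, significant));
    }

    public static void main(String[] args) {
        long[] small = {0b1011L, 42L};
        long[] grown = {0b1011L, 42L, 0L, 0L};
        System.out.println(hashWords(small) == hashWords(grown)); // true
    }
}
```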
1411 * LUCENE-2782: Fix rare potential thread hazard with
1412 IndexWriter.commit (Mike McCandless)
1416 * LUCENE-2773: LogMergePolicy accepts a double noCFSRatio (default =
1417 0.1), which means any time a merged segment is greater than 10% of
1418 the index size, it will be left in non-compound format even if
1419 compound format is on. This change was made to reduce peak
1420 transient disk usage during optimize which increased due to
1421 LUCENE-2762. (Mike McCandless)
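The noCFSRatio rule above can be sketched as a single predicate. The method and parameter names are hypothetical; this is not LogMergePolicy's code, only the size comparison the entry describes:

```java
// Hypothetical sketch of the noCFSRatio rule; not LogMergePolicy's code.
final class CompoundFileDecision {

    static boolean useCompoundFile(long mergedSegmentBytes, long totalIndexBytes,
                                   double noCFSRatio, boolean compoundFileEnabled) {
        if (!compoundFileEnabled) {
            return false; // compound format switched off entirely
        }
        // A merged segment larger than noCFSRatio of the index stays non-compound.
        return mergedSegmentBytes <= noCFSRatio * totalIndexBytes;
    }

    public static void main(String[] args) {
        long index = 1_000_000L;
        System.out.println(useCompoundFile(50_000L, index, 0.1, true));  // true  (5%)
        System.out.println(useCompoundFile(200_000L, index, 0.1, true)); // false (20%)
    }
}
```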
1425 * LUCENE-2556: Improve memory usage after cloning TermAttribute.
1426 (Adriano Crestani via Uwe Schindler)
1428 * LUCENE-2098: Improve the performance of BaseCharFilter, especially for
1429 large documents. (Robin Wojciki, Koji Sekiguchi, Robert Muir)
1433 * LUCENE-2675 (2.9.4 only): Add support for Lucene 3.0 stored field files
1434 also in 2.9. The file format did not change, only the version number was
1435 upgraded to mark segments that have no compression. FieldsWriter still only
1436 writes 2.9 segments as they could contain compressed fields. This cross-version
1437 index format compatibility is provided here solely because Lucene 2.9 and 3.0
1438 have the same bugfix level, features, and the same index format with this slight
1439 compression difference. In general, Lucene does not support reading newer
1440 indexes with older library versions. (Uwe Schindler)
1444 * LUCENE-2239: Documented limitations in NIOFSDirectory and MMapDirectory due to
1445 Java NIO behavior when a Thread is interrupted while blocking on IO.
1446 (Simon Willnauer, Robert Muir)
1448 ================== Release 2.9.3 / 3.0.2 ====================
1450 Changes in backwards compatibility policy
1452 * LUCENE-2135: Added FieldCache.purge(IndexReader) method to the
1453 interface. Anyone implementing FieldCache externally will need to
1454 fix their code to implement this, on upgrading. (Mike McCandless)
1456 Changes in runtime behavior
1458 * LUCENE-2421: NativeFSLockFactory does not throw LockReleaseFailedException if
1459 it cannot delete the lock file, since obtaining the lock does not fail if the
1460 file is there. (Shai Erera)
1462 * LUCENE-2060 (2.9.3 only): Changed ConcurrentMergeScheduler's default for
1463 maxNumThreads from 3 to 1, because in practice we get the most gains
1464 from running a single merge in the background. More than one
1465 concurrent merge causes a lot of thrashing (though it's possible on
1466 SSD storage that there would be net gains). (Jason Rutherglen, Mike
1471 * LUCENE-2046 (2.9.3 only): IndexReader should not see the index as changed, after
1472 IndexWriter.prepareCommit has been called but before
1473 IndexWriter.commit is called. (Peter Keegan via Mike McCandless)
1475 * LUCENE-2119: Don't throw NegativeArraySizeException if you pass
1476 Integer.MAX_VALUE as nDocs to IndexSearcher search methods. (Paul
1477 Taylor via Mike McCandless)
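One plausible way to hit NegativeArraySizeException is sizing a heap array as nDocs + 1, since Integer.MAX_VALUE + 1 wraps to Integer.MIN_VALUE. The sketch below assumes such a 1-based heap (an assumption, not taken from the entry) and clamps nDocs to the reader's maxDoc before allocating; the names are hypothetical:

```java
// Hypothetical sketch; assumes a 1-based heap array of length nDocs + 1.
final class NDocsClamp {

    // Clamp the requested hit count to the number of documents first,
    // so the "+ 1" cannot overflow (assumes maxDoc < Integer.MAX_VALUE).
    static int heapArrayLength(int nDocs, int maxDoc) {
        int limit = Math.max(maxDoc, 1); // never size the queue at zero
        int clamped = Math.min(nDocs, limit);
        return clamped + 1;
    }

    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE + 1);                     // -2147483648
        System.out.println(heapArrayLength(Integer.MAX_VALUE, 1000)); // 1001
    }
}
```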
1479 * LUCENE-2142: FieldCacheImpl.getStringIndex no longer throws an
1480 exception when term count exceeds doc count. (Mike McCandless)
1482 * LUCENE-2104: NativeFSLock.release() would silently fail if the lock is held by
1483 another thread/process. (Shai Erera via Uwe Schindler)
1485 * LUCENE-2283: Use shared memory pool for term vector and stored
1486 fields buffers. This memory will be reclaimed if needed according to
1487 the configured RAM Buffer Size for the IndexWriter. This also fixes
1488 potentially excessive memory usage when many threads are indexing a
1489 mix of small and large documents. (Tim Smith via Mike McCandless)
1491 * LUCENE-2300: If IndexWriter is pooling readers (because an NRT reader
1492 has been obtained), and addIndexes* is run, do not pool the
1493 readers from the external directory. This is harmless (NRT reader is
1494 correct), but a waste of resources. (Mike McCandless)
1496 * LUCENE-2422: Don't reuse byte[] in IndexInput/Output -- it gains
1497 little performance, and ties up possibly large amounts of memory
1498 for apps that index large docs. (Ross Woolf via Mike McCandless)
1500 * LUCENE-2387: Don't hang onto Fieldables from the last doc indexed,
1501 in IndexWriter, nor the Reader in Tokenizer after close is
1502 called. (Ruben Laguna, Uwe Schindler, Mike McCandless)
1504 * LUCENE-2417: IndexCommit did not implement hashCode() and equals()
1505 consistently. Now they both take Directory and version into consideration. In
1506 addition, all of IndexCommit's methods that threw
1507 UnsupportedOperationException are now abstract. (Shai Erera)
1509 * LUCENE-2467: Fixed memory leaks in IndexWriter when large documents
1510 are indexed. (Mike McCandless)
1512 * LUCENE-2473: Clicking on the "More Results" link in the luceneweb.war
1513 demo resulted in ArrayIndexOutOfBoundsException.
1514 (Sami Siren via Robert Muir)
1516 * LUCENE-2476: If any exception is hit init'ing IW, release the write
1517 lock (previously we only released on IOException). (Tamas Cservenak
1518 via Mike McCandless)
1520 * LUCENE-2478: Fix CachingWrapperFilter to not throw NPE when
1521 Filter.getDocIdSet() returns null. (Uwe Schindler, Daniel Noll)
1523 * LUCENE-2468: Allow specifying how new deletions should be handled in
1524 CachingWrapperFilter and CachingSpanFilter. By default, new
1525 deletions are ignored in CachingWrapperFilter, since typically this
1526 filter is AND'd with a query that correctly takes new deletions into
1527 account. This should be a performance gain (higher cache hit rate)
1528 in apps that reopen readers, or use near-real-time reader
1529 (IndexWriter.getReader()), but may introduce invalid search results
1530 (allowing deleted docs to be returned) for certain cases, so a new
1531 expert ctor was added to CachingWrapperFilter to enforce deletions
1532 at a performance cost. CachingSpanFilter by default recaches if
1533 there are new deletions (Shay Banon via Mike McCandless)
1535 * LUCENE-2299: If you open an NRT reader while addIndexes* is running,
1536 it may miss some segments (Earwin Burrfoot via Mike McCandless)
1538 * LUCENE-2397: Don't throw NPE from SnapshotDeletionPolicy.snapshot if
1539 there are no commits yet (Shai Erera)
1541 * LUCENE-2424: Fix FieldDoc.toString to actually return its fields
1542 (Stephen Green via Mike McCandless)
1544 * LUCENE-2311: Always pass a "fully loaded" (terms index & doc stores)
1545 SegmentReader to IndexWriter's mergedSegmentWarmer (if set), so
1546 that warming is free to do whatever it needs to. (Earwin Burrfoot
1547 via Mike McCandless)
1549 * LUCENE-3029: Fix corner case when MultiPhraseQuery is used with zero
1550 position-increment tokens that would sometimes assign different
1551 scores to identical docs. (Mike McCandless)
1553 * LUCENE-2486: Fixed intermittent FileNotFoundException on doc store
1554 files when a mergedSegmentWarmer is set on IndexWriter. (Mike
1557 * LUCENE-2130: Fix performance issue when FuzzyQuery runs on a
1558 multi-segment index (Michael McCandless)
1562 * LUCENE-2281: added doBeforeFlush to IndexWriter to allow extensions to perform
1563 operations before flush starts. Also exposed doAfterFlush as protected instead
1564 of package-private. (Shai Erera via Mike McCandless)
1566 * LUCENE-2356: Add IndexWriter.set/getReaderTermsIndexDivisor, to set
1567 what IndexWriter passes for termsIndexDivisor to the readers it
1568 opens internally when applying deletions or creating a
1569 near-real-time reader. (Earwin Burrfoot via Mike McCandless)
1573 * LUCENE-2494 (3.0.2 only): Use CompletionService in ParallelMultiSearcher
1574 instead of simple polling for results. (Edward Drapkin, Simon Willnauer)
1576 * LUCENE-2135: On IndexReader.close, forcefully evict any entries from
1577 the FieldCache rather than waiting for the WeakHashMap to release
1578 the reference (Mike McCandless)
1580 * LUCENE-2161: Improve concurrency of IndexReader, especially in the
1581 context of near real-time readers. (Mike McCandless)
1583 * LUCENE-2360: Small speedup to recycling of reused per-doc RAM in
1584 IndexWriter (Robert Muir, Mike McCandless)
1588 * LUCENE-2488 (2.9.3 only): Support build with JDK 1.4 and exclude Java 1.5
1589 contrib modules on request (pass '-Dforce.jdk14.build=true') when
1590 compiling/testing/packaging. This marks the benchmark contrib also
1591 as Java 1.5, as it depends on fast-vector-highlighter. (Uwe Schindler)
1593 ================== Release 2.9.2 / 3.0.1 ====================
1595 Changes in backwards compatibility policy
1597 * LUCENE-2123 (3.0.1 only): Removed the protected inner class ScoreTerm
1598 from FuzzyQuery. The change was needed because the comparator of this
1599 class had to be changed in an incompatible way. The class was never
1600 intended to be public. (Uwe Schindler, Mike McCandless)
1604 * LUCENE-2092: BooleanQuery was ignoring disableCoord in its hashCode
1605 and equals methods, so BooleanQueries that differed only in
1606 disableCoord were treated as equal when cached. (Chris Hostetter, Mike McCandless)
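Why disableCoord must take part in equals() and hashCode() is easiest to see with a cache map. A minimal sketch, assuming a simplified query that holds only clause strings and a disableCoord flag (QueryKey is a hypothetical class, not BooleanQuery):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical value class standing in for a query used as a cache key.
final class QueryKey {
    private final List<String> clauses;
    private final boolean disableCoord;

    QueryKey(List<String> clauses, boolean disableCoord) {
        this.clauses = new ArrayList<>(clauses);
        this.disableCoord = disableCoord;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof QueryKey)) return false;
        QueryKey other = (QueryKey) o;
        // Every field that affects scoring must take part in equality.
        return disableCoord == other.disableCoord && clauses.equals(other.clauses);
    }

    @Override
    public int hashCode() {
        return Objects.hash(clauses, disableCoord);
    }

    public static void main(String[] args) {
        Map<QueryKey, String> cache = new HashMap<>();
        List<String> clauses = Arrays.asList("title:lucene", "body:search");
        cache.put(new QueryKey(clauses, false), "results scored with coord");
        // Differs only in disableCoord, so this must be a cache miss.
        System.out.println(cache.get(new QueryKey(clauses, true))); // null
    }
}
```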
1608 * LUCENE-2095: Fixed a case where, if two threads call IndexWriter.commit() at
1609 the same time, commit could return control to
1610 one of the threads before all changes are actually committed.
1611 (Sanne Grinovero via Mike McCandless)
1613 * LUCENE-2132 (3.0.1 only): Fix the demo result.jsp to use QueryParser
1614 with a Version argument. (Brian Li via Robert Muir)
1616 * LUCENE-2166: Don't incorrectly keep warning about the same immense
1617 term, when IndexWriter.infoStream is on. (Mike McCandless)
1619 * LUCENE-2158: At high indexing rates, NRT reader could temporarily
1620 lose deletions. (Mike McCandless)
1622 * LUCENE-2182: DEFAULT_ATTRIBUTE_FACTORY was failing to load
1623 implementation class when interface was loaded by a different
1624 class loader. (Uwe Schindler, reported on java-user by Ahmed El-dawy)
1626 * LUCENE-2257: Increase max number of unique terms in one segment to
1627 termIndexInterval (default 128) * ~2.1 billion = ~274 billion.
1628 (Tom Burton-West via Mike McCandless)
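The figure above is plain arithmetic, assuming the ~2.1 billion is Integer.MAX_VALUE (2^31 - 1) terms-index entries, each covering termIndexInterval terms. The multiplication has to be done in long; the names below are hypothetical:

```java
// Hypothetical sketch of the arithmetic behind the ~274 billion figure.
final class TermCountLimit {

    // Assumes at most Integer.MAX_VALUE indexed terms, one per
    // termIndexInterval unique terms.
    static long maxUniqueTerms(int termIndexInterval) {
        return (long) termIndexInterval * Integer.MAX_VALUE; // widen before multiplying
    }

    public static void main(String[] args) {
        System.out.println(maxUniqueTerms(128));     // 274877906816
        System.out.println(128 * Integer.MAX_VALUE); // -128: int arithmetic overflows
    }
}
```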
1630 * LUCENE-2260: Fixed AttributeSource to not hold a strong
1631 reference to the Attribute/AttributeImpl classes which prevents
1632 unloading of custom attributes loaded by other classloaders
1633 (e.g. in Solr plugins). (Uwe Schindler)
1635 * LUCENE-1941: Fix Min/MaxPayloadFunction returning 0 when
1636 only one payload is present. (Erik Hatcher, Mike McCandless
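The entry does not state the cause. One way to produce exactly this symptom for the minimum, shown here as an assumption, is seeding the running value with 0 instead of the first payload; the class is hypothetical and is not Lucene's MinPayloadFunction:

```java
// Hypothetical sketch; not Lucene's MinPayloadFunction.
final class MinPayloadSketch {

    // Buggy pattern: seeding the running minimum with 0 means a single
    // positive payload can never be returned.
    static float minSeededWithZero(float[] payloads) {
        float current = 0f;
        for (float p : payloads) {
            current = Math.min(current, p);
        }
        return current;
    }

    // Fixed pattern: the first payload seen initializes the running value.
    static float min(float[] payloads) {
        float current = 0f;
        int seen = 0;
        for (float p : payloads) {
            current = (seen == 0) ? p : Math.min(current, p);
            seen++;
        }
        return current;
    }

    public static void main(String[] args) {
        float[] single = {2.5f};
        System.out.println(minSeededWithZero(single)); // 0.0
        System.out.println(min(single));               // 2.5
    }
}
```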
1639 * LUCENE-2270: Queries consisting of all zero-boost clauses
1640 (for example, text:foo^0) sorted incorrectly and produced
1641 invalid docids. (yonik)
1645 * LUCENE-1609 (3.0.1 only): Restore IndexReader.getTermInfosIndexDivisor
1646 (it was accidentally removed in 3.0.0) (Mike McCandless)
1648 * LUCENE-1972 (3.0.1 only): Restore SortField.getComparatorSource
1649 (it was accidentally removed in 3.0.0) (John Wang via Uwe Schindler)
1651 * LUCENE-2190: Added a new class CustomScoreProvider to function package
1652 that can be subclassed to provide custom scoring to CustomScoreQuery.
1653 The methods in CustomScoreQuery that did this before were deprecated
1654 and replaced by a method getCustomScoreProvider(IndexReader) that
1655 returns a custom score implementation using the above class. The change
1656 is necessary with per-segment searching, as CustomScoreQuery is
1657 a stateless class (like all other Queries) and does not know about
1658 the currently searched segment. This API works similarly to Filter's
1659 getDocIdSet(IndexReader). (Paul chez Jamespot via Mike McCandless,
1662 * LUCENE-2080: Deprecate Version.LUCENE_CURRENT, as using this constant
1663 will cause backwards compatibility problems when upgrading Lucene. See
1664 the Version javadocs for additional information.
1669 * LUCENE-2086: When resolving deleted terms, do so in term sort order
1670 for better performance (Bogdan Ghidireac via Mike McCandless)
1672 * LUCENE-2123 (partly, 3.0.1 only): Fixes a slowdown / memory issue
1673 introduced by LUCENE-504. (Uwe Schindler, Robert Muir, Mike McCandless)
1675 * LUCENE-2258: Remove unneeded synchronization in FuzzyTermEnum.
1676 (Uwe Schindler, Robert Muir)
1680 * LUCENE-2114: Change TestFilteredSearch to test on multi-segment
1681 index as well. (Simon Willnauer via Mike McCandless)
1683 * LUCENE-2211: Improves BaseTokenStreamTestCase to use a fake attribute
1684 that checks if clearAttributes() was called correctly.
1685 (Uwe Schindler, Robert Muir)
1687 * LUCENE-2207, LUCENE-2219: Improve BaseTokenStreamTestCase to check if
1688 end() is implemented correctly. (Koji Sekiguchi, Robert Muir)
1692 * LUCENE-2114: Improve javadocs of Filter to call out that the
1693 provided reader is per-segment (Simon Willnauer via Mike
1696 ======================= Release 3.0.0 =======================
1698 Changes in backwards compatibility policy
1700 * LUCENE-1979: Change return type of SnapshotDeletionPolicy#snapshot()
1701 from IndexCommitPoint to IndexCommit. Code that uses this method
1702 needs to be recompiled against Lucene 3.0 in order to work. The
1703 previously deprecated IndexCommitPoint is also removed.
1706 * o.a.l.Lock.isLocked() is now allowed to throw an IOException.
1709 * LUCENE-2030: CachingWrapperFilter and CachingSpanFilter now hide
1710 the internal cache implementation for thread safety; previously it was
1711 declared protected. (Peter Lenahan, Uwe Schindler, Simon Willnauer)
1713 * LUCENE-2053: If you call Thread.interrupt() on a thread inside
1714 Lucene, Lucene will do its best to interrupt the thread. However,
1715 instead of throwing InterruptedException (which is a checked
1716 exception), you'll get an oal.util.ThreadInterruptedException (an
1717 unchecked exception, subclassing RuntimeException). The interrupt
1718 status on the thread is cleared when this exception is thrown.
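The conversion pattern can be sketched in plain Java. The exception class here is a hypothetical stand-in for oal.util.ThreadInterruptedException. The comment on the cleared interrupt status relies on Thread.sleep's documented behavior: it clears the status when it throws InterruptedException:

```java
// Hypothetical sketch; the exception type is a stand-in, not Lucene's class.
final class InterruptDemo {

    static final class UncheckedInterruptedException extends RuntimeException {
        UncheckedInterruptedException(InterruptedException cause) {
            super(cause);
        }
    }

    // Blocking call that converts the checked InterruptedException into an
    // unchecked exception, so callers' method signatures need not change.
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            // Thread.sleep has already cleared the interrupt status here.
            throw new UncheckedInterruptedException(ie);
        }
    }

    public static void main(String[] args) {
        Thread.currentThread().interrupt(); // simulate an interrupt from elsewhere
        try {
            pause(10_000);
        } catch (UncheckedInterruptedException e) {
            System.out.println(e.getCause() instanceof InterruptedException); // true
            System.out.println(Thread.currentThread().isInterrupted());       // false
        }
    }
}
```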
1721 * LUCENE-2052: Some methods in Lucene core were changed to accept
1722 Java 5 varargs. This is not a backwards compatibility problem as
1723 long as you do not try to override such a method. We left common
1724 overridden methods unchanged and added varargs to constructors,
1725 static, or final methods (MultiSearcher,...). (Uwe Schindler)
1727 * LUCENE-1558: IndexReader.open(Directory) now opens a readOnly=true
1728 reader, and new IndexSearcher(Directory) does the same. Note that
1729 this is a change in the default from 2.9, when these methods were
1730 previously deprecated. (Mike McCandless)
1732 * LUCENE-1753: Make the remaining non-final TokenStreams final to enforce
1733 the decorator pattern. (Uwe Schindler)
1735 Changes in runtime behavior
1737 * LUCENE-1677: Remove the system property to set SegmentReader class
1738 implementation. (Uwe Schindler)
1740 * LUCENE-1960: As a consequence of the removal of Field.Store.COMPRESS,
1741 support for this type of field was removed. Lucene 3.0 is still able
1742 to read indexes with compressed fields, but as soon as merges occur
1743 or the index is optimized, all compressed fields are decompressed
1744 and converted to Field.Store.YES. Because of this, indexes with
1745 compressed fields can suddenly get larger. Also the first merge with
1746 decompression cannot be done in raw mode; it is therefore slower.
1747 This change has no effect for code that uses such old indexes,
1748 they behave as before (fields are automatically decompressed
1749 during read). Indexes converted to Lucene 3.0 format can no longer be read
1750 by previous versions.
1751 It is recommended to optimize your indexes after upgrading to convert
1752 to the new format and decompress all fields.
1753 If you want compressed fields, you can use CompressionTools, which
1754 creates compressed byte[] to be added as binary stored field. This
1755 cannot be done automatically, as you also have to decompress such
1756 fields when reading. You have to reindex to do that.
1757 (Michael Busch, Uwe Schindler)
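Compressing a value before storing it as a binary field, and decompressing it after reading, can be sketched with java.util.zip. This sketch does not use CompressionTools itself, and whether CompressionTools uses the same Deflater settings is not claimed; the class name is hypothetical:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Sketch using java.util.zip directly; not Lucene's CompressionTools.
final class BinaryFieldCompression {

    // Compress before adding the bytes as a binary stored field.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    // The application must decompress explicitly after reading the field.
    static byte[] decompress(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or unsupported input");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) throws DataFormatException {
        String original = "stored field text, stored field text, stored field text";
        byte[] packed = compress(original.getBytes(StandardCharsets.UTF_8));
        String restored = new String(decompress(packed), StandardCharsets.UTF_8);
        System.out.println(restored.equals(original)); // true
    }
}
```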
1759 * LUCENE-2060: Changed ConcurrentMergeScheduler's default for
1760 maxNumThreads from 3 to 1, because in practice we get the most
1761 gains from running a single merge in the background. More than one
1762 concurrent merge causes a lot of thrashing (though it's possible on
1763 SSD storage that there would be net gains). (Jason Rutherglen,
1768 * LUCENE-1257, LUCENE-1984, LUCENE-1985, LUCENE-2057, LUCENE-1833, LUCENE-2012,
1769 LUCENE-1998: Port to Java 1.5:
1771 - Add generics to public and internal APIs (see below).
1772 - Replace new Integer(int), new Double(double),... by static valueOf() calls.
1773 - Replace for-loops with Iterator by foreach loops.
1774 - Replace StringBuffer with StringBuilder.
1775 - Replace o.a.l.util.Parameter by Java 5 enums (see below).
1776 - Add @Override annotations.
1777 (Uwe Schindler, Robert Muir, Karl Wettin, Paul Elschot, Kay Kay, Shai Erera,
1780 * Generify Lucene API:
1782 - TokenStream/AttributeSource: Now addAttribute()/getAttribute() return an
1783 instance of the requested attribute interface and no cast is needed anymore
1785 - NumericRangeQuery, NumericRangeFilter, and FieldCacheRangeFilter
1786 now have Integer, Long, Float, Double as type param (LUCENE-1857).
1787 - Document.getFields() returns List<Fieldable>.
1788 - Query.extractTerms(Set<Term>)
1789 - CharArraySet and stop word sets in core/contrib
1790 - PriorityQueue (LUCENE-1935)
1792 - DisjunctionMaxQuery (LUCENE-1984)
1793 - MultiTermQueryWrapperFilter
1794 - CloseableThreadLocal
1796 - o.a.l.util.cache package
1797 - lots of internal APIs of IndexWriter
1798 (Uwe Schindler, Michael Busch, Kay Kay, Robert Muir, Adriano Crestani)
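In 3.0, addAttribute() takes a Class<A> and returns an A, which is what removes the cast. The sketch below models only that generic signature in plain Java; the Supplier factory argument and all type names are assumptions for illustration (Lucene locates the implementation class itself), and the isInterface() check mirrors the restriction described in LUCENE-2088:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Minimal hypothetical model of a generic addAttribute(); not Lucene's AttributeSource.
final class GenericAttributeDemo {

    interface Attribute {} // marker, standing in for o.a.l.util.Attribute

    interface LengthAttribute extends Attribute {
        int length();
        void setLength(int length);
    }

    static final class LengthAttributeImpl implements LengthAttribute {
        private int length;
        @Override public int length() { return length; }
        @Override public void setLength(int length) { this.length = length; }
    }

    static final class MiniAttributeSource {
        private final Map<Class<? extends Attribute>, Attribute> attributes = new LinkedHashMap<>();

        // The type parameter ties the result to the requested interface,
        // so callers get a LengthAttribute back without casting.
        <A extends Attribute> A addAttribute(Class<A> attClass, Supplier<? extends A> factory) {
            if (!attClass.isInterface()) {
                throw new IllegalArgumentException(attClass.getName() + " is not an interface");
            }
            Attribute existing = attributes.get(attClass);
            if (existing != null) {
                return attClass.cast(existing); // one instance per attribute interface
            }
            A created = factory.get();
            attributes.put(attClass, created);
            return created;
        }
    }

    public static void main(String[] args) {
        MiniAttributeSource source = new MiniAttributeSource();
        LengthAttribute len = source.addAttribute(LengthAttribute.class, LengthAttributeImpl::new);
        len.setLength(7);
        LengthAttribute again = source.addAttribute(LengthAttribute.class, LengthAttributeImpl::new);
        System.out.println(again.length()); // 7
    }
}
```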
1800 * LUCENE-1944, LUCENE-1856, LUCENE-1957, LUCENE-1960, LUCENE-1961,
1801 LUCENE-1968, LUCENE-1970, LUCENE-1946, LUCENE-1971, LUCENE-1975,
1802 LUCENE-1972, LUCENE-1978, LUCENE-944, LUCENE-1979, LUCENE-1973, LUCENE-2011:
1803 Remove deprecated methods/constructors/classes:
1805 - Remove all String/File directory paths in IndexReader /
1806 IndexSearcher / IndexWriter.
1807 - Remove FSDirectory.getDirectory()
1808 - Make FSDirectory abstract.
1809 - Remove Field.Store.COMPRESS (see above).
1810 - Remove Filter.bits(IndexReader) method and make
1811 Filter.getDocIdSet(IndexReader) abstract.
1812 - Remove old DocIdSetIterator methods and make the new ones abstract.
1813 - Remove some methods in PriorityQueue.
1814 - Remove old TokenStream API and backwards compatibility layer.
1815 - Remove RangeQuery, RangeFilter and ConstantScoreRangeQuery.
1816 - Remove SpanQuery.getTerms().
1817 - Remove ExtendedFieldCache, custom and auto caches, SortField.AUTO.
1818 - Remove old-style custom sort.
1819 - Remove legacy search setting in SortField.
1820 - Remove Hits and all references from core and contrib.
1821 - Remove HitCollector and its TopDocs support implementations.
1822 - Remove term field and accessors in MultiTermQuery
1823 (and fix Highlighter).
1824 - Remove deprecated methods in BooleanQuery.
1825 - Remove deprecated methods in Similarity.
1826 - Remove BoostingTermQuery.
1827 - Remove MultiValueSource.
1828 - Remove Scorer.explain(int).
1829 ...and some other minor ones (Uwe Schindler, Michael Busch, Mark Miller)
1831 * LUCENE-1925: Make IndexSearcher's subReaders and docStarts members
1832 protected; add expert ctor to directly specify reader, subReaders
1833 and docStarts. (John Wang, Tim Smith via Mike McCandless)
1835 * LUCENE-1945: All public classes that have a close() method now
1836 also implement java.io.Closeable (IndexReader, IndexWriter, Directory,...).
1839 * LUCENE-1998: Change all Parameter instances to Java 5 enums. This
1840 is not a backwards-compatibility break, only a change of the super class. Parameter
1841 was deprecated and will be removed in a later version.
1842 (DM Smith, Uwe Schindler)
1846 * LUCENE-1951: When the text provided to WildcardQuery has no wildcard
1847 characters (i.e., it matches a single term), don't lose the boost and
1848 rewrite method settings. Also, rewrite to PrefixQuery if the
1849 wildcard is of the form "foo*", for slightly faster performance. (Robert
1850 Muir via Mike McCandless)
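The rewrite decision described above can be sketched as a classification of the pattern. This is not WildcardQuery's code; it assumes '*' and '?' are the only wildcard characters and uses hypothetical names:

```java
// Hypothetical helper mirroring the rewrite rule above; not WildcardQuery's code.
final class WildcardRewriteSketch {

    enum Rewrite { TERM, PREFIX, WILDCARD }

    static Rewrite classify(String pattern) {
        int star = pattern.indexOf('*');
        int question = pattern.indexOf('?');
        if (star < 0 && question < 0) {
            return Rewrite.TERM;     // no wildcards: matches a single term
        }
        if (question < 0 && star == pattern.length() - 1) {
            return Rewrite.PREFIX;   // "foo*": only a single trailing '*'
        }
        return Rewrite.WILDCARD;     // anything else needs full wildcard matching
    }

    public static void main(String[] args) {
        System.out.println(classify("foo"));  // TERM
        System.out.println(classify("foo*")); // PREFIX
        System.out.println(classify("f?o*")); // WILDCARD
    }
}
```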
1852 * LUCENE-2013: SpanRegexQuery did not work with QueryScorer.
1853 (Benjamin Keil via Mark Miller)
1855 * LUCENE-2088: addAttribute() should only accept interfaces that
1856 extend Attribute. (Shai Erera, Uwe Schindler)
1858 * LUCENE-2045: Fix silly FileNotFoundException hit if you enable
1859 infoStream on IndexWriter and then add an empty document and commit
1860 (Shai Erera via Mike McCandless)
1862 * LUCENE-2046: IndexReader should not see the index as changed, after
1863 IndexWriter.prepareCommit has been called but before
1864 IndexWriter.commit is called. (Peter Keegan via Mike McCandless)
1868 * LUCENE-1933: Provide a convenience AttributeFactory that creates a
1869 Token instance for all basic attributes. (Uwe Schindler)
1871 * LUCENE-2041: Parallelize the rest of ParallelMultiSearcher. Lots of
1872 code refactoring and Java 5 concurrent support in MultiSearcher.
1873 (Joey Surls, Simon Willnauer via Uwe Schindler)
1875 * LUCENE-2051: Add CharArraySet.copy() as a simple method to copy
1876 any Set<?> to a CharArraySet; the copy is optimized if the Set<?> is already
1877 a CharArraySet. (Simon Willnauer)
1881 * LUCENE-1183: Optimize Levenshtein Distance computation in
1882 FuzzyQuery. (Cédrik Lime via Mike McCandless)
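The entry does not say which optimization was applied. A common one, shown here only as an illustration, is computing the edit distance with two rolling rows of the dynamic-programming table instead of the full matrix, which keeps memory proportional to one string's length:

```java
// Standard two-row edit distance; an illustration, not FuzzyQuery's code.
final class TwoRowLevenshtein {

    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from "" to the first j chars of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance from the first i chars of a to ""
            char ca = a.charAt(i - 1);
            for (int j = 1; j <= b.length(); j++) {
                int cost = (ca == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,  // insertion
                                            prev[j] + 1),     // deletion
                                   prev[j - 1] + cost);       // substitution
            }
            int[] tmp = prev; // swap rows instead of allocating a new one
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // 3
        System.out.println(distance("lucene", "lucene"));  // 0
    }
}
```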
1884 * LUCENE-2006: Optimization of FieldDocSortedHitQueue to always
1885 use Comparable<?> interface. (Uwe Schindler, Mark Miller)
1887 * LUCENE-2087: Remove recursion in NumericRangeTermEnum.
1892 * LUCENE-486: Remove test->demo dependencies. (Michael Busch)
1894 * LUCENE-2024: Raise build requirements to Java 1.5 and ANT 1.7.0
1895 (Uwe Schindler, Mike McCandless)
1897 ======================= Release 2.9.1 =======================
1899 Changes in backwards compatibility policy
1901 * LUCENE-2002: Add required Version matchVersion argument when
1902 constructing QueryParser or MultiFieldQueryParser, and default
1903 enablePositionIncrements to true (as of 2.9) to match
1904 StandardAnalyzer's 2.9 default (Uwe Schindler, Mike McCandless)
1908 * LUCENE-1974: Fixed nasty bug in BooleanQuery (when it used
1909 BooleanScorer for scoring), whereby some matching documents failed to
1910 be collected. (Fulin Tang via Mike McCandless)
1912 * LUCENE-1124: Make sure FuzzyQuery always matches the precise term.
1913 (stefatwork@gmail.com via Mike McCandless)
1915 * LUCENE-1976: Fix IndexReader.isCurrent() to return the right thing
1916 when the reader is a near real-time reader. (Jake Mannix via Mike
1919 * LUCENE-1986: Fix NPE when scoring PayloadNearQuery (Peter Keegan,
1920 Mark Miller via Mike McCandless)
1922 * LUCENE-1992: Fix thread hazard if a merge is committing just as an
1923 exception occurs during sync (Uwe Schindler, Mike McCandless)
1925 * LUCENE-1995: Note in javadocs that IndexWriter.setRAMBufferSizeMB
1926 cannot exceed 2048 MB, and throw IllegalArgumentException if it
1927 does. (Aaron McKee, Yonik Seeley, Mike McCandless)
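The documented limit can be sketched as an argument check. The class, constant, and message text are hypothetical; only the 2048 MB ceiling and the IllegalArgumentException come from the entry:

```java
// Hypothetical validation mirroring the documented limit; not IndexWriter's code.
final class RamBufferCheck {

    static final double MAX_RAM_BUFFER_MB = 2048.0;

    static void checkRamBufferSizeMB(double mb) {
        if (mb > MAX_RAM_BUFFER_MB) {
            throw new IllegalArgumentException(
                "ramBufferSizeMB " + mb + " exceeds " + MAX_RAM_BUFFER_MB + " MB");
        }
    }

    public static void main(String[] args) {
        checkRamBufferSizeMB(16.0); // accepted
        try {
            checkRamBufferSizeMB(4096.0);
            System.out.println("accepted");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected"); // printed
        }
    }
}
```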
1929 * LUCENE-2004: Fix Constants.LUCENE_MAIN_VERSION to not be inlined
1930 by client code. (Uwe Schindler)
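In Java, a static final String initialized with a literal is a compile-time constant, and javac copies its value into every class that references it, so client code keeps the old value until it is recompiled. A common workaround, sketched here with hypothetical names (the entry does not say how LUCENE-2004 did it), routes the literal through a method call:

```java
// Sketch of the general Java technique; names and values are hypothetical.
final class VersionConstants {

    // A literal initializer makes this a constant variable: javac inlines
    // "2.9" into every class that reads it.
    static final String INLINED_VERSION = "2.9";

    // A method call is not a constant expression, so other classes read this
    // field at runtime and see the value from the library actually loaded.
    static final String MAIN_VERSION = ident("2.9");

    private static String ident(String s) {
        return s.toString();
    }

    public static void main(String[] args) {
        System.out.println(MAIN_VERSION); // 2.9
    }
}
```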
1932 * LUCENE-2016: Replace illegal U+FFFF character with the replacement
1933 char (U+FFFD) during indexing, to prevent silent index corruption.
1934 (Peter Keegan, Mike McCandless)
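The substitution itself can be sketched over a token's char buffer. The names are hypothetical and this is not Lucene's indexing code; it only shows replacing U+FFFF with U+FFFD in place:

```java
// Hypothetical sketch of the U+FFFF -> U+FFFD substitution; not Lucene's code.
final class IllegalCharScrubber {

    static final char ILLEGAL = '\uFFFF';
    static final char REPLACEMENT = '\uFFFD';

    // Replaces U+FFFF with U+FFFD in buffer[0..length) and returns how many
    // characters were changed.
    static int scrub(char[] buffer, int length) {
        int replaced = 0;
        for (int i = 0; i < length; i++) {
            if (buffer[i] == ILLEGAL) {
                buffer[i] = REPLACEMENT;
                replaced++;
            }
        }
        return replaced;
    }

    public static void main(String[] args) {
        char[] term = {'a', '\uFFFF', 'b'};
        System.out.println(scrub(term, term.length)); // 1
        System.out.println(term[1] == REPLACEMENT);   // true
    }
}
```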
1938 * Un-deprecate search(Weight weight, Filter filter, int n) from
1939 Searchable interface (deprecated by accident). (Uwe Schindler)
1941 * Un-deprecate o.a.l.util.Version constants. (Mike McCandless)
1943 * LUCENE-1987: Un-deprecate some ctors of Token, as they will not
1944 be removed in 3.0 and are still useful. Also add some missing
1945 o.a.l.util.Version constants for enabling invalid acronym
1946 settings in StandardAnalyzer to be compatible with the coming
1947 Lucene 3.0. (Uwe Schindler)
1949 * LUCENE-1973: Un-deprecate IndexSearcher.setDefaultFieldSortScoring,
1950 to allow controlling per-IndexSearcher whether scores are computed
1951 when sorting by field. (Uwe Schindler, Mike McCandless)
1953 * LUCENE-2043: Make IndexReader.commit(Map<String,String>) public.
1958 * LUCENE-1955: Fix Hits deprecation notice to point users in right
1959 direction. (Mike McCandless, Mark Miller)
1961 * Fix javadoc about score tracking done by search methods in Searcher
1962 and IndexSearcher. (Mike McCandless)
1964 * LUCENE-2008: Javadoc improvements for TokenStream/Tokenizer/Token
1965 (Luke Nezda via Mike McCandless)
1967 ======================= Release 2.9.0 =======================
1969 Changes in backwards compatibility policy
1971 * LUCENE-1575: Searchable.search(Weight, Filter, int, Sort) no
1972 longer computes a document score for each hit by default. If
1973 document score tracking is still needed, you can call
1974 IndexSearcher.setDefaultFieldSortScoring(true, true) to enable
1975 both per-hit and maxScore tracking; however, this is deprecated
1976 and will be removed in 3.0.
1978 Alternatively, use Searchable.search(Weight, Filter, Collector)
1979 and pass in a TopFieldCollector instance, using the following code
1983 TopFieldCollector tfc = TopFieldCollector.create(sort, numHits, fillFields,
1984 true /* trackDocScores */,
1985 true /* trackMaxScore */,
1986 false /* docsInOrder */);
1987 searcher.search(query, tfc);
1988 TopDocs results = tfc.topDocs();
1991 Note that your Sort object cannot use SortField.AUTO when you
1992 directly instantiate TopFieldCollector.
1994 Also, the method search(Weight, Filter, Collector) was added to
1995 the Searchable interface and the Searcher abstract class to
1996 replace the deprecated HitCollector versions. If you either
1997 implement Searchable or extend Searcher, you should change your
1998 code to implement this method. If you already extend
1999 IndexSearcher, no further changes are needed to use Collector.
2001 Finally, the values Float.NaN and Float.NEGATIVE_INFINITY are not
2002 valid scores. Lucene uses these values internally in certain
2003 places, so if you have hits with such scores, it will cause
2004 problems. (Shai Erera via Mike McCandless)
2006 * LUCENE-1687: All methods and parsers from the interface ExtendedFieldCache
2007 have been moved into FieldCache. ExtendedFieldCache is now deprecated and
2008 contains only a few declarations for binary backwards compatibility.
2009 ExtendedFieldCache will be removed in version 3.0. Users of FieldCache and
2010 ExtendedFieldCache will be able to plug in Lucene 2.9 without recompilation.
2011 The auto cache (FieldCache.getAuto) is now deprecated. Due to the merge of
2012 ExtendedFieldCache and FieldCache, FieldCache can now return
2013 long[] and double[] arrays in addition to int[], float[], and StringIndex.
2015 The interface changes are only notable for users implementing the interfaces,
2016 which is unlikely, because there is no way to replace
2017 Lucene's FieldCache implementation. (Grant Ingersoll, Uwe Schindler)
2019 * LUCENE-1630, LUCENE-1771: Weight, previously an interface, is now an abstract
2020 class. Some of the method signatures have changed, but it should be fairly
2021 easy to see what adjustments must be made to existing code to sync up
2022 with the new API. You can find more detail in the API Changes section.
2024 Going forward Searchable will be kept for convenience only and may
2025 be changed between minor releases without any deprecation
2026 process. It is not recommended that you implement it, but rather extend
2028 (Shai Erera, Chris Hostetter, Martin Ruckli, Mark Miller via Mike McCandless)
2030 * LUCENE-1422, LUCENE-1693: The new Attribute based TokenStream API (see below)
2031 has some backwards breaks in rare cases. We did our best to make the
2032 transition as easy as possible and you are not likely to run into any problems.
2033 If your tokenizers still implement next(Token) or next(), the calls are
2034 automatically wrapped. The indexer and query parser use the new API
2035 (e.g., they use incrementToken() calls). All core TokenStreams are implemented using
2036 the new API. You can mix old and new API style TokenFilters/TokenStream.
2037 Problems only occur when you have done the following:
2038 You have overridden next(Token) or next() in one of the non-abstract core
2039 TokenStreams/-Filters. These classes should normally be final, but some
2040 of them are not. In this case, next(Token)/next() would never be called.
2041 To fail early with a hard compile/runtime error, the next(Token)/next()
2042 methods in these TokenStreams/-Filters were made final in this release.
2043 (Michael Busch, Uwe Schindler)
2045 * LUCENE-1763: MergePolicy now requires an IndexWriter instance to
2046 be passed upon instantiation. As a result, IndexWriter was removed
2047 as a method argument from all MergePolicy methods. (Shai Erera via
2050 * LUCENE-1748: LUCENE-1001 introduced PayloadSpans, but this was a back
2051 compat break and caused custom SpanQuery implementations to fail at runtime
2052 in a variety of ways. This issue attempts to remedy things by causing
2053 a compile time break on custom SpanQuery implementations and removing
2054 the PayloadSpans class, with its functionality now moved to Spans. To
2055 help in alleviating future back compat pain, Spans has been changed from
2056 an interface to an abstract class.
2057 (Hugh Cayless, Mark Miller)
2059 * LUCENE-1808: Query.createWeight has been changed from protected to
2060 public. This will be a back compat break if you have overridden this
2061 method - but you are likely already affected by the LUCENE-1693 (make Weight
2062 abstract rather than an interface) back compat break if you have overridden
2063 Query.createWeight, so we have taken the opportunity to make this change.
2064 (Tim Smith, Shai Erera via Mark Miller)
2066 * LUCENE-1708: IndexReader.document() no longer checks if the document is
2067 deleted. You can call IndexReader.isDeleted(n) prior to calling document(n).
2068 (Shai Erera via Mike McCandless)
2071 Changes in runtime behavior
2073 * LUCENE-1424: QueryParser now by default uses constant score auto
2074 rewriting when it generates a WildcardQuery and PrefixQuery (it
2075 already does so for TermRangeQuery, as well). Call
2076 setMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE)
2077 to revert to the slower BooleanQuery rewriting method. (Mark Miller via Mike

* LUCENE-1575: As of 2.9, the core collectors, as well as
  IndexSearcher's search methods that return the top N results, no
  longer filter out documents with scores <= 0.0. If you rely on this
  functionality you can use PositiveScoresOnlyCollector like this:

    // assumes an existing IndexSearcher searcher and Query query
    TopDocsCollector tdc = TopScoreDocCollector.create(10, true);
    Collector c = new PositiveScoresOnlyCollector(tdc);
    searcher.search(query, c);
    TopDocs hits = tdc.topDocs();

* LUCENE-1604: IndexReader.norms(String field) is now allowed to
  return null if the field has no norms, as long as you've
  previously called IndexReader.setDisableFakeNorms(true). This
  setting now defaults to false (to preserve the fake norms back
  compatible behavior) but in 3.0 will be hardwired to true. (Shon
  Vella via Mike McCandless).

* LUCENE-1624: If you open IndexWriter with create=true and
  autoCommit=false on an existing index, IndexWriter no longer
  writes an empty commit when it's created. (Paul Taylor via Mike
  McCandless)

* LUCENE-1593: When you call Sort() or Sort.setSort(String field,
  boolean reverse), the resulting SortField array no longer ends
  with SortField.FIELD_DOC (it was unnecessary as Lucene breaks ties
  internally by docID). (Shai Erera via Michael McCandless)

* LUCENE-1542: When the first token(s) have 0 position increment,
  IndexWriter used to incorrectly record the position as -1, if no
  payload is present, or Integer.MAX_VALUE if a payload is present.
  This causes positional queries to fail to match. The bug is now
  fixed, but if your app relies on the buggy behavior then you must
  call IndexWriter.setAllowMinus1Position(). That API is deprecated
  so you must fix your application, and rebuild your index, to not
  rely on this behavior by the 3.0 release of Lucene. (Jonathan
  Mamou, Mark Miller via Mike McCandless)

* LUCENE-1715: Finalizers have been removed from the 4 core classes
  that still had them, since they cause GC to take longer, thus
  tying up memory for longer, and at best they mask buggy app code.
  DirectoryReader (returned from IndexReader.open) and IndexWriter
  previously released the write lock during finalize.
  SimpleFSDirectory.FSIndexInput closed the descriptor in its
  finalizer, and NativeFSLock released the lock. It's possible
  applications will be affected by this, but only if the application
  is failing to close readers/writers. (Brian Groose via Mike
  McCandless)

* LUCENE-1717: Fixed IndexWriter to account for RAM usage of
  buffered deletions. (Mike McCandless)

* LUCENE-1727: Ensure that fields are stored & retrieved in the
  exact order in which they were added to the document. This was
  true in all Lucene releases before 2.3, but was broken in 2.3 and
  2.4, and is now fixed in 2.9. (Mike McCandless)

* LUCENE-1678: The addition of Analyzer.reusableTokenStream
  accidentally broke back compatibility of external analyzers that
  subclassed core analyzers that implemented tokenStream but not
  reusableTokenStream. This is now fixed, such that if
  reusableTokenStream is invoked on such a subclass, that method
  will forcefully fall back to tokenStream. (Mike McCandless)

* LUCENE-1801: Token.clear() and Token.clearNoTermBuffer() now also clear
  startOffset, endOffset and type. This is not likely to affect any
  Tokenizer chains, as Tokenizers normally always set these three values.
  This change was made so that Token conforms to the new AttributeImpl.clear()
  and AttributeSource.clearAttributes() contracts, and so that clearing works
  identically whether Token is used as one AttributeImpl for all attributes
  or the 6 separate AttributeImpls are used. (Uwe Schindler, Michael Busch)

* LUCENE-1483: When searching over multiple segments, a new Scorer is now created
  for each segment. Searching has been telescoped out a level and IndexSearcher now
  operates much like MultiSearcher does. The Weight is created only once for the top
  level Searcher, but each Scorer is passed a per-segment IndexReader. As a result,
  the doc ids seen by the Scorer are internal to the per-segment IndexReader. It
  has always been outside of the API to count on a given IndexReader to contain every
  doc id in the index, and if you have been ignoring MultiSearcher in your custom code
  and counting on this fact, you will find your code no longer works correctly. If a
  custom Scorer implementation uses any caches/filters that rely on being based on the
  top level IndexReader, it will need to be updated to correctly use contextless
  caches/filters, e.g. you can't count on the IndexReader to contain any given doc id
  or all of the doc ids. (Mark Miller, Mike McCandless)

* LUCENE-1846: DateTools now uses the US locale to format the numbers in its
  date/time strings instead of the default locale. For most locales there will
  be no change in the index format, as they already format numbers with ASCII
  digits. The usage of the US locale is important to guarantee correct ordering
  of generated terms. (Uwe Schindler)

* LUCENE-1860: MultiTermQuery now defaults to
  CONSTANT_SCORE_AUTO_REWRITE_DEFAULT rewrite method (previously it
  was SCORING_BOOLEAN_QUERY_REWRITE). This means that PrefixQuery
  and WildcardQuery will now produce constant score for all matching
  docs, equal to the boost of the query. (Mike McCandless)
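
The tie-breaking rule behind LUCENE-1593 (ties broken internally by docID)
can be sketched in plain Java: compare the sort field first and fall back to
ascending doc ID, so documents with equal values always come out in index
order. SortedHit and TieBreakSort are hypothetical names, not Lucene's
Sort/SortField API, and keeping the doc ID tie-break ascending under a
reversed sort is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical hit: a doc ID plus the value of the field being sorted on. */
final class SortedHit {
  final int doc;
  final String value;

  SortedHit(int doc, String value) {
    this.doc = doc;
    this.value = value;
  }
}

final class TieBreakSort {
  /** Sorts by field value (optionally reversed), then by ascending doc ID. */
  static List<SortedHit> sort(List<SortedHit> hits, boolean reverse) {
    Comparator<SortedHit> byValue = Comparator.comparing(h -> h.value);
    if (reverse) {
      byValue = byValue.reversed();
    }
    // The doc ID comparison is appended after the (possibly reversed) value
    // comparison, so ties are always resolved in index order.
    Comparator<SortedHit> cmp = byValue.thenComparingInt(h -> h.doc);
    List<SortedHit> sorted = new ArrayList<>(hits);
    sorted.sort(cmp);
    return sorted;
  }
}
```

Because the tie-break is implicit, appending an explicit doc ID sort key
(the old SortField.FIELD_DOC) would not change the result.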
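
With per-segment searching (LUCENE-1483), a Scorer only sees doc IDs local to
its segment. The minimal sketch below, which is not Lucene code, shows the
usual bookkeeping: each segment gets a doc base equal to the total size of
the segments before it, and a top-level doc ID is translated by locating its
segment and subtracting that base. SegmentDocIds and its methods are
hypothetical names, and the sketch assumes no segment is empty.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Lucene: maps top-level doc IDs to per-segment IDs. */
final class SegmentDocIds {
  private final int[] docBases; // docBases[i] = first top-level doc ID of segment i

  /** Assumes every segment has at least one document (no duplicate bases). */
  SegmentDocIds(int[] segmentMaxDocs) {
    docBases = new int[segmentMaxDocs.length];
    int base = 0;
    for (int i = 0; i < segmentMaxDocs.length; i++) {
      docBases[i] = base; // segment i starts where the previous ones end
      base += segmentMaxDocs[i];
    }
  }

  /** Returns the index of the segment containing the given top-level doc ID. */
  int segmentOf(int globalDoc) {
    int idx = Arrays.binarySearch(docBases, globalDoc);
    // binarySearch returns -(insertionPoint) - 1 when the key is absent;
    // the containing segment is the one just before the insertion point.
    return idx >= 0 ? idx : -idx - 2;
  }

  /** Converts a top-level doc ID into the segment-local ID a per-segment Scorer sees. */
  int toLocal(int globalDoc) {
    return globalDoc - docBases[segmentOf(globalDoc)];
  }

  /** Converts a segment-local doc ID back into a top-level doc ID. */
  int toGlobal(int segment, int localDoc) {
    return docBases[segment] + localDoc;
  }
}
```

Code that caches data keyed by top-level doc IDs needs this kind of
translation (or, better, a per-segment cache) once scoring is per segment.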
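
The point of LUCENE-1846 is that generated date terms must sort
lexicographically in chronological order, which requires ASCII digits no
matter what the JVM default locale is. A minimal sketch of that idea using
only the JDK: pin the formatter to Locale.US and a fixed time zone. The GMT
zone, the pattern strings and the SortableDates name are assumptions of this
sketch, not a description of DateTools' exact internals.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class SortableDates {
  /**
   * Formats a timestamp as a lexicographically sortable string. Pinning the
   * locale to Locale.US keeps the digits ASCII regardless of the default
   * locale, so string order matches chronological order.
   */
  static String toSortableString(long millis, String pattern) {
    SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.US);
    fmt.setTimeZone(TimeZone.getTimeZone("GMT")); // fixed zone: same instant, same term
    return fmt.format(new Date(millis));
  }
}
```

For example, toSortableString(0L, "yyyyMMddHHmm") yields "197001010000".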

API Changes

* LUCENE-1419: Add expert API to set custom indexing chain. This API is
  package-protected for now, so we don't have to officially support it.
  Yet, it will give us the possibility to try out different consumers
  in the chain. (Michael Busch)

* LUCENE-1427: DocIdSet.iterator() is now allowed to throw
  IOException. (Paul Elschot, Mike McCandless)

* LUCENE-1422, LUCENE-1693: New TokenStream API that uses a new class called
  AttributeSource instead of the Token class, which is now a utility class that
  holds common Token attributes. All attributes that the Token class had have
  been moved into separate classes: TermAttribute, OffsetAttribute,
  PositionIncrementAttribute, PayloadAttribute, TypeAttribute and FlagsAttribute.
  The new API is much more flexible; it allows Attributes to be combined
  arbitrarily and custom Attributes to be defined. The new API has the same
  performance as the old next(Token) approach. For conformance with this new
  API, Tee-/SinkTokenizer was deprecated and replaced by a new TeeSinkTokenFilter.
  (Michael Busch, Uwe Schindler; additional contributions and bug fixes by
  Daniel Shane, Doron Cohen)

* LUCENE-1467: Add nextDoc() and next(int) methods to OpenBitSetIterator.
  These methods can be used to avoid additional calls to doc().

* LUCENE-1468: Deprecate Directory.list(), which sometimes (in
  FSDirectory) filters out files that don't look like index files, in
  favor of new Directory.listAll(), which does no filtering. Also,
  listAll() will never return null; instead, it throws an IOException
  (or subclass). Specifically, FSDirectory.listAll() will throw the
  newly added NoSuchDirectoryException if the directory does not
  exist. (Marcel Reutegger, Mike McCandless)

* LUCENE-1546: Add IndexReader.flush(Map commitUserData), allowing
  you to record an opaque commitUserData (maps String -> String) into
  the commit written by IndexReader. This matches IndexWriter's
  commit methods. (Jason Rutherglen via Mike McCandless)

* LUCENE-652: Added org.apache.lucene.document.CompressionTools, to
  enable compressing & decompressing binary content, external to
  Lucene's indexing. Deprecated Field.Store.COMPRESS.

* LUCENE-1561: Renamed Field.omitTf to Field.omitTermFreqAndPositions
  (Otis Gospodnetic via Mike McCandless)

* LUCENE-1500: Added new InvalidTokenOffsetsException to Highlighter methods
  to denote issues when offsets in TokenStream tokens exceed the length of the
  provided text. (Mark Harwood)

* LUCENE-1575, LUCENE-1483: HitCollector is now deprecated in favor of
  a new Collector abstract class. For easy migration, people can use
  HitCollectorWrapper which translates (wraps) HitCollector into
  Collector. Note that this class is also deprecated and will be
  removed when HitCollector is removed. Also TimeLimitedCollector
  is deprecated in favor of the new TimeLimitingCollector which
  extends Collector. (Shai Erera, Mark Miller, Mike McCandless)

* LUCENE-1592: The method TermEnum.skipTo() was deprecated, because
  it is used nowhere in core/contrib and there is only a very inefficient
  default implementation available. If you want to position a TermEnum
  to another Term, create a new one using IndexReader.terms(Term).

* LUCENE-1621: MultiTermQuery.getTerm() has been deprecated as it does
  not make sense for all subclasses of MultiTermQuery. Check individual
  subclasses to see if they support getTerm(). (Mark Miller)

* LUCENE-1636: Make TokenFilter.input final so it's set only
  once. (Wouter Heijke, Uwe Schindler via Mike McCandless).

* LUCENE-1658, LUCENE-1451: Renamed FSDirectory to SimpleFSDirectory
  (but left an FSDirectory base class). Added an FSDirectory.open
  static method to pick a good default FSDirectory implementation
  given the OS. FSDirectories should now be instantiated using
  FSDirectory.open or with public constructors rather than
  FSDirectory.getDirectory(), which has been deprecated.
  (Michael McCandless, Uwe Schindler, yonik)

* LUCENE-1665: Deprecate SortField.AUTO, to be removed in 3.0.
  Instead, when sorting by field, the application should explicitly
  state the type of the field. (Mike McCandless)

* LUCENE-1660: StopFilter, StandardAnalyzer, StopAnalyzer now
  require up-front specification of enablePositionIncrement. (Mike
  McCandless)

* LUCENE-1614: DocIdSetIterator's next() and skipTo() were deprecated in favor
  of the new nextDoc() and advance(). The new methods return the doc ID they
  landed on, saving an extra call to doc() in most cases.
  For easy migration of the code, you can change the calls to next() to
  nextDoc() != DocIdSetIterator.NO_MORE_DOCS and similarly for skipTo().
  However, it is advised that you take advantage of the returned doc ID and not
  call doc() following those two.
  Also, doc() was deprecated in favor of docID(). docID() should return -1 or
  NO_MORE_DOCS if nextDoc/advance were not called yet, or NO_MORE_DOCS if the
  iterator is exhausted. Otherwise it should return the current doc ID.
  (Shai Erera via Mike McCandless)

* LUCENE-1672: All ctors/opens and other methods using String/File to
  specify the directory in IndexReader, IndexWriter, and IndexSearcher
  were deprecated. You should instantiate the Directory manually before
  and pass it to these classes (LUCENE-1451, LUCENE-1658).

* LUCENE-1407: Move RemoteSearchable, RemoteCachingWrapperFilter out
  of Lucene's core into new contrib/remote package. Searchable no
  longer extends java.rmi.Remote. (Simon Willnauer via Mike
  McCandless)

* LUCENE-1677: The global properties
  org.apache.lucene.SegmentReader.class and
  ReadOnlySegmentReader.class are now deprecated, to be removed in
  3.0. src/gcj/* has been removed. (Earwin Burrfoot via Mike
  McCandless)

* LUCENE-1673: Deprecated NumberTools in favour of the new
  NumericRangeQuery and its new indexing format for numeric or
  date values. (Uwe Schindler)

* LUCENE-1630, LUCENE-1771: Weight is now an abstract class, and adds
  a scorer(IndexReader, boolean /* scoreDocsInOrder */, boolean /*
  topScorer */) method instead of scorer(IndexReader). IndexSearcher uses
  this method to obtain a scorer matching the capabilities of the Collector
  with respect to the ordering of docIDs. Some Scorers (like BooleanScorer)
  are much more efficient if out-of-order document scoring is allowed by a
  Collector. Collector must now implement acceptsDocsOutOfOrder. If you write
  a Collector which does not care about doc ID order, it is recommended
  that you return true. Weight has a scoresDocsOutOfOrder method, which by
  default returns false. If you create a Weight which will score documents
  out of order if requested, you should override that method to return true.
  BooleanQuery's setAllowDocsOutOfOrder and getAllowDocsOutOfOrder have been
  deprecated as they are not needed anymore. BooleanQuery will now score docs
  out of order when used with a Collector that can accept docs out of order.
  Finally, Weight#explain now takes a sub-reader and sub-docID, rather than
  a top level reader and docID.
  (Shai Erera, Chris Hostetter, Martin Ruckli, Mark Miller via Mike McCandless)

* LUCENE-1466, LUCENE-1906: Added CharFilter and MappingCharFilter, which allow
  chaining and mapping of characters before tokenizers run. CharStream (a
  subclass of Reader) is the base class for custom java.io.Readers that support
  offset correction. Tokenizers got an additional method correctOffset() that is
  passed down to the underlying CharStream if input is a subclass of
  CharStream/-Filter. (Koji Sekiguchi via Mike McCandless, Uwe Schindler)

* LUCENE-1703: Add IndexWriter.waitForMerges. (Tim Smith via Mike
  McCandless)

* LUCENE-1625: CheckIndex's programmatic API now returns separate
  classes detailing the status of each component in the index, and
  includes more detailed status than previously. (Tim Smith via

* LUCENE-1713: Deprecated RangeQuery and RangeFilter and renamed to
  TermRangeQuery and TermRangeFilter. TermRangeQuery is in constant
  score auto rewrite mode by default. The new classes also have new
  ctors taking field and term ranges as Strings (see also
  LUCENE-1424). (Uwe Schindler)

* LUCENE-1609: The termInfosIndexDivisor must now be specified
  up-front when opening the IndexReader. Attempts to call
  IndexReader.setTermInfosIndexDivisor will hit an
  UnsupportedOperationException. This was done to enable removal of
  all synchronization in TermInfosReader, which previously could
  cause threads to pile up in certain cases. (Dan Rosher via Mike
  McCandless)

* LUCENE-1688: Deprecate the static final String stop word array in
  StopAnalyzer and replace it with an immutable implementation of
  CharArraySet. (Simon Willnauer via Mark Miller)

* LUCENE-1742: SegmentInfos, SegmentInfo and SegmentReader have been
  made public as expert, experimental APIs. These APIs may suddenly
  change from release to release. (Jason Rutherglen via Mike
  McCandless)

* LUCENE-1754: QueryWeight.scorer() can return null if no documents
  are going to be matched by the query. Similarly,
  Filter.getDocIdSet() can return null if no documents are going to
  be accepted by the Filter. Note that these 'can' return null,
  however they don't have to and can return a Scorer/DocIdSet which
  matches no documents / rejects all documents. This is already the
  behavior of some QueryWeight/Filter implementations, and is
  documented here just for emphasis. (Shai Erera via Mike
  McCandless)

* LUCENE-1705: Added IndexWriter.deleteAll(). (Tim Smith via

* LUCENE-1460: Changed TokenStreams/TokenFilters in contrib to
  use the new TokenStream API. (Robert Muir, Michael Busch)

* LUCENE-1748: LUCENE-1001 introduced PayloadSpans, but this was a back
  compat break and caused custom SpanQuery implementations to fail at runtime
  in a variety of ways. This issue attempts to remedy things by causing
  a compile time break on custom SpanQuery implementations and removing
  the PayloadSpans class, with its functionality now moved to Spans. To
  help in alleviating future back compat pain, Spans has been changed from
  an interface to an abstract class.
  (Hugh Cayless, Mark Miller)

* LUCENE-1808: Query.createWeight has been changed from protected to
  public. (Tim Smith, Shai Erera via Mark Miller)

* LUCENE-1826: Add constructors that take AttributeSource and
  AttributeFactory to all Tokenizer implementations.

* LUCENE-1847: The Similarity#idf methods for a single Term and for a
  Collection of Terms have been deprecated. New versions that return an
  IDFExplanation have been added. (Yasoja Seneviratne, Mike McCandless,
  Mark Miller)

* LUCENE-1877: Made NativeFSLockFactory the default for
  the new FSDirectory API (open(), FSDirectory subclass ctors).
  All FSDirectory system properties were deprecated and all lock
  implementations use no lock prefix if the locks are stored inside
  the index directory. Because the deprecated String/File ctors of
  IndexWriter and IndexReader (LUCENE-1672) and FSDirectory.getDirectory()
  still use the old SimpleFSLockFactory, while the new API uses
  NativeFSLockFactory, we strongly recommend not mixing the deprecated
  and new APIs. (Uwe Schindler, Mike McCandless)

* LUCENE-1911: Added a new method isCacheable() to DocIdSet. This method
  should return true if the underlying implementation does not use disk
  I/O and is fast enough to be directly cached by CachingWrapperFilter.
  OpenBitSet, SortedVIntList, and DocIdBitSet are such candidates.
  The default implementation of the abstract DocIdSet class returns false.
  In this case, CachingWrapperFilter copies the DocIdSetIterator into an
  OpenBitSet for caching. (Uwe Schindler, Thomas Becker)
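
The nextDoc()/advance()/docID() contract described in LUCENE-1614 can be
illustrated with a standalone iterator over a sorted int[]. This is a sketch,
not Lucene's DocIdSetIterator: ArrayDocIdIterator and sumOfDocs are
hypothetical, and the sentinel reuses Integer.MAX_VALUE, the value of
Lucene's NO_MORE_DOCS. As a simplification, advance() stays put if the
iterator is already at or past the target.

```java
/** Hypothetical iterator following the nextDoc()/advance()/docID() contract. */
final class ArrayDocIdIterator {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE; // end-of-iteration sentinel

  private final int[] docs; // strictly increasing doc IDs
  private int index = -1;
  private int doc = -1;     // -1 until nextDoc()/advance() is first called

  ArrayDocIdIterator(int[] docs) {
    this.docs = docs;
  }

  int docID() {
    return doc;
  }

  /** Moves to the next doc and returns it, or NO_MORE_DOCS when exhausted. */
  int nextDoc() {
    index++;
    doc = index < docs.length ? docs[index] : NO_MORE_DOCS;
    return doc;
  }

  /** Moves to the first doc >= target and returns it, or NO_MORE_DOCS. */
  int advance(int target) {
    while (doc < target) {
      if (nextDoc() == NO_MORE_DOCS) {
        break;
      }
    }
    return doc;
  }

  /** Migration pattern: use the returned doc ID instead of calling docID() again. */
  static int sumOfDocs(ArrayDocIdIterator it) {
    int sum = 0;
    int d;
    while ((d = it.nextDoc()) != NO_MORE_DOCS) {
      sum += d;
    }
    return sum;
  }
}
```

The loop in sumOfDocs is the shape an old `while (it.next()) { ... it.doc() ... }`
loop takes after migration.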
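
The offset correction mentioned for LUCENE-1466 exists because a CharFilter
can change the length of the text, so token offsets computed on the filtered
text must be mapped back to the original input. The following is a minimal
sketch of that bookkeeping, not MappingCharFilter itself: it applies a
replacement map (longest match wins) and records cumulative length
differences. OffsetCorrectingMapper and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical mapper that rewrites text and can correct offsets back to the input. */
final class OffsetCorrectingMapper {
  final String output;
  // Each entry is {outputOffset, cumulativeDiff}: from outputOffset onward,
  // add cumulativeDiff to an output offset to get the original offset.
  private final List<int[]> corrections = new ArrayList<>();

  OffsetCorrectingMapper(String input, Map<String, String> replacements) {
    StringBuilder out = new StringBuilder();
    int diff = 0;
    int i = 0;
    while (i < input.length()) {
      String matched = null;
      for (String key : replacements.keySet()) {
        if (!key.isEmpty() && input.startsWith(key, i)
            && (matched == null || key.length() > matched.length())) {
          matched = key; // prefer the longest matching key
        }
      }
      if (matched == null) {
        out.append(input.charAt(i++)); // unmapped character copied through
        continue;
      }
      String replacement = replacements.get(matched);
      out.append(replacement);
      i += matched.length();
      diff += matched.length() - replacement.length();
      corrections.add(new int[] {out.length(), diff});
    }
    output = out.toString();
  }

  /** Maps an offset in the rewritten text back to the original input. */
  int correctOffset(int outputOffset) {
    int diff = 0;
    for (int[] c : corrections) {
      if (c[0] > outputOffset) {
        break;
      }
      diff = c[1];
    }
    return outputOffset + diff;
  }
}
```

For input "a&amp;b" with "&amp;" mapped to "&", the output is "a&b", and the
token "b" at output offsets [2, 3) corrects to [6, 7) in the original.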

Bug fixes

* LUCENE-1415: MultiPhraseQuery had incorrect hashCode() and equals()
  implementations, leading to Solr cache misses.
  (Todd Feak, Mark Miller via yonik)

* LUCENE-1327: Fix TermSpans#skipTo() to behave as specified in the javadocs
  of Spans#skipTo(). (Michael Busch)

* LUCENE-1573: Do not ignore InterruptedException (caused by
  Thread.interrupt()), and do not enter a deadlock/spin loop. Now, an
  interrupt will cause a RuntimeException to be thrown. In 3.0 we will
  change public APIs to throw InterruptedException. (Jeremy Volkman via

* LUCENE-1590: Fixed stored-only Field instances so that they no longer
  change the value of omitNorms and omitTermFreqAndPositions in FieldInfo;
  when you retrieve such fields they will now have omitNorms=true and
  omitTermFreqAndPositions=false (though these values are unused).
  (Uwe Schindler via Mike McCandless)

* LUCENE-1587: RangeQuery#equals() could consider a RangeQuery
  without a collator equal to one with a collator.
  (Mark Platvoet via Mark Miller)

* LUCENE-1600: Don't call String.intern unnecessarily in some cases
  when loading documents from the index. (P Eger via Mike
  McCandless)

* LUCENE-1611: Fix case where an OutOfMemoryError in IndexWriter
  could cause "infinite merging" to happen. (Christiaan Fluit via

* LUCENE-1623: Properly handle back-compatibility of 2.3.x indexes that
  contain field names with non-ASCII characters. (Mike Streeton via

* LUCENE-1593: MultiSearcher and ParallelMultiSearcher did not break ties (in
  sort) by doc Id in a consistent manner (i.e., if Sort.FIELD_DOC was used vs.
  when it wasn't). (Shai Erera via Michael McCandless)

* LUCENE-1647: Fix case where IndexReader.undeleteAll would cause
  the segment's deletion count to be incorrect. (Mike McCandless)

* LUCENE-1542: When the first token(s) have 0 position increment,
  IndexWriter used to incorrectly record the position as -1, if no
  payload is present, or Integer.MAX_VALUE if a payload is present.
  This causes positional queries to fail to match. The bug is now
  fixed, but if your app relies on the buggy behavior then you must
  call IndexWriter.setAllowMinus1Position(). That API is deprecated
  so you must fix your application, and rebuild your index, to not
  rely on this behavior by the 3.0 release of Lucene. (Jonathan
  Mamou, Mark Miller via Mike McCandless)

* LUCENE-1658: Fixed MMapDirectory to correctly throw IOExceptions
  on EOF, removed numeric overflow possibilities and added support
  for a hack to unmap the buffers on closing IndexInput.

* LUCENE-1681: Fix infinite loop caused by a call to DocValues methods
  getMinValue, getMaxValue, getAverageValue. (Simon Willnauer via Mark Miller)

* LUCENE-1599: Add clone support for SpanQuery classes. SpanRegexQuery relies
  on this functionality and does not work correctly without it.
  (Billow Gao, Mark Miller)

* LUCENE-1718: Fix termInfosIndexDivisor to carry over to reopened
  readers. (Mike McCandless)

* LUCENE-1583: SpanOrQuery skipTo() did not always move forward as the Spans
  documentation indicates it should. (Moti Nisenson via Mark Miller)

* LUCENE-1566: Sun JVM bug
  http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6478546 causes an
  invalid OutOfMemoryError when reading too many bytes at once from
  a file on 32-bit JVMs that have a large maximum heap size. This
  fix adds set/getReadChunkSize to FSDirectory so that large reads
  are broken into chunks, to work around this JVM bug. On 32-bit
  JVMs the default chunk size is 100 MB; on 64-bit JVMs, which don't
  show the bug, the default is Integer.MAX_VALUE. (Simon Willnauer
  via Mike McCandless)

* LUCENE-1448: Added TokenStream.end() to perform end-of-stream
  operations (i.e., to return the end offset of the tokenization).
  This is important when multiple fields with the same name are added
  to a document, to ensure offsets recorded in term vectors for all
  of the instances are correct.
  (Mike McCandless, Mark Miller, Michael Busch)

* LUCENE-1805: CloseableThreadLocal did not allow a null Object in get(),
  although it does allow it in set(Object). Fix get() to not assert the object
  is not null. (Shai Erera via Mike McCandless)

* LUCENE-1801: Changed all Tokenizers or TokenStreams (in core/contrib)
  that are the source of Tokens to always call
  AttributeSource.clearAttributes() first. (Uwe Schindler)

* LUCENE-1819: MatchAllDocsQuery.toString(field) now produces output
  that is parsable by the QueryParser. (John Wang, Mark Miller)

* LUCENE-1836: Fix localization bug in the new query parser and add
  new LocalizedTestCase as base class for localization junit tests.
  (Robert Muir, Uwe Schindler via Michael Busch)

* LUCENE-1847: PhraseQuery/TermQuery/SpanQuery use IndexReader specific stats
  in their Weight#explain methods - these stats should be corpus wide.
  (Yasoja Seneviratne, Mike McCandless, Mark Miller)

* LUCENE-1885: Fixed a bug where NativeFSLock.isLocked() did not work
  if the lock was obtained by another NativeFSLock(Factory) instance.
  Because of this, IndexReader.isLocked() and IndexWriter.isLocked() did
  not work correctly. (Uwe Schindler)

* LUCENE-1899: Fix O(N^2) CPU cost when setting docIDs in order in an
  OpenBitSet, due to an inefficiency in how the underlying storage is
  reallocated. (Nadav Har'El via Mike McCandless)

* LUCENE-1918: Fixed cases where a ParallelReader would
  generate exceptions on being passed to
  IndexWriter.addIndexes(IndexReader[]). First case was when the
  ParallelReader was empty. Second case was when the ParallelReader
  used to contain documents with TermVectors, but all such documents
  have been deleted. (Christian Kohlschütter via Mike McCandless)
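
The LUCENE-1566 workaround splits one large read into bounded chunks. A
minimal sketch of that loop over a plain java.io.InputStream follows;
ChunkedReader.readFully is a hypothetical helper, not FSDirectory's code,
and chunkSize plays the role the readChunkSize setting describes.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class ChunkedReader {
  /**
   * Fills buf[offset, offset + len) from in, never requesting more than
   * chunkSize bytes per read() call. Throws EOFException if the stream ends early.
   */
  static void readFully(InputStream in, byte[] buf, int offset, int len, int chunkSize)
      throws IOException {
    int done = 0;
    while (done < len) {
      int toRead = Math.min(chunkSize, len - done); // bounded request size
      int n = in.read(buf, offset + done, toRead);
      if (n < 0) {
        throw new EOFException("read past EOF after " + done + " bytes");
      }
      done += n; // read() may return fewer bytes than requested
    }
  }
}
```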
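
LUCENE-1899 concerns the cost of reallocating bit set storage when docIDs are
set in increasing order. Growing the array to exactly the required size on
each extension copies it once per new word, which is quadratic overall; one
common remedy, shown in this sketch, is geometric (doubling) growth, which
keeps total copying linear. GrowableBits is hypothetical, and this is not
necessarily how OpenBitSet itself was changed.

```java
/** Hypothetical bit set backed by long[] words that grows geometrically. */
final class GrowableBits {
  private long[] words = new long[1];
  int reallocations = 0; // exposed only to illustrate the growth behavior

  void set(int index) {
    int word = index >>> 6; // 64 bits per long
    if (word >= words.length) {
      // Grow to at least double the current size, or to the required size if larger.
      int newLength = Math.max(words.length * 2, word + 1);
      words = java.util.Arrays.copyOf(words, newLength);
      reallocations++;
    }
    words[word] |= 1L << index; // Java masks long shift distances to 0..63
  }

  boolean get(int index) {
    int word = index >>> 6;
    return word < words.length && (words[word] & (1L << index)) != 0;
  }
}
```

Setting bits 0 through 65535 in order needs 1024 words and triggers only 10
reallocations here, rather than one per new word.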

New features

* LUCENE-1411: Added expert API to open an IndexWriter on a prior
  commit, obtained from IndexReader.listCommits. This makes it
  possible to roll back changes to an index even after you've closed
  the IndexWriter that made the changes, assuming you are using an
  IndexDeletionPolicy that keeps past commits around. This is useful
  when building transactional support on top of Lucene. (Mike
  McCandless)

* LUCENE-1382: Add an optional arbitrary Map (String -> String)
  "commitUserData" to IndexWriter.commit(), which is stored in the
  segments file and is then retrievable via
  IndexReader.getCommitUserData instance and static methods.
  (Shalin Shekhar Mangar via Mike McCandless)

* LUCENE-1420: Similarity now has a computeNorm method that allows
  custom Similarity classes to override how norm is computed. It's
  provided a FieldInvertState instance that contains details from
  inverting the field. The default impl is boost *
  lengthNorm(numTerms), to be backwards compatible. Also added
  {set/get}DiscountOverlaps to DefaultSimilarity, to control whether
  overlapping tokens (tokens with 0 position increment) should be
  counted in lengthNorm. (Andrzej Bialecki via Mike McCandless)

* LUCENE-1424: Moved constant score query rewrite capability into
  MultiTermQuery, allowing TermRangeQuery, PrefixQuery and WildcardQuery
  to switch between constant-score rewriting or BooleanQuery
  expansion rewriting via a new setRewriteMethod method.
  Deprecated ConstantScoreRangeQuery. (Mark Miller via Mike
  McCandless)

* LUCENE-1461: Added FieldCacheRangeFilter, a RangeFilter for
  single-term fields that uses FieldCache to compute the filter. If
  your documents all have a single term for a given field, and you
  need to create many RangeFilters with varying lower/upper bounds,
  then this is likely a much faster way to create the filters than
  RangeFilter. FieldCacheRangeFilter allows ranges on all data types
  FieldCache supports (term ranges, byte, short, int, long, float, double).
  However, it comes at the expense of added RAM consumption and slower
  first-time usage due to populating the FieldCache. It also does not
  support collation. (Tim Sturge, Matt Ericson via Mike McCandless and

* LUCENE-1296: add protected method CachingWrapperFilter.docIdSetToCache
  to allow subclasses to choose which DocIdSet implementation to use
  (Paul Elschot via Mike McCandless)

* LUCENE-1390: Added ASCIIFoldingFilter, a Filter that converts
  alphabetic, numeric, and symbolic Unicode characters which are not in
  the first 127 ASCII characters (the "Basic Latin" Unicode block) into
  their ASCII equivalents, if one exists. ISOLatin1AccentFilter, which
  handles a subset of this filter, has been deprecated.
  (Andi Vajda, Steven Rowe via Mark Miller)

* LUCENE-1478: Added new SortField constructor allowing you to
  specify a custom FieldCache parser to generate numeric values from
  terms for a field. (Uwe Schindler via Mike McCandless)

* LUCENE-1528: Add support for Ideographic Space to the queryparser.
  (Luis Alves via Michael Busch)

* LUCENE-1487: Added FieldCacheTermsFilter, to filter by multiple
  terms on single-valued fields. The filter loads the FieldCache
  for the field the first time it's called, and subsequent usage of
  that field, even with different Terms in the filter, is fast.
  (Tim Sturge, Shalin Shekhar Mangar via Mike McCandless).

* LUCENE-1314: Add clone(), clone(boolean readOnly) and
  reopen(boolean readOnly) to IndexReader. Cloning an IndexReader
  gives you a new reader which you can make changes to (deletions,
  norms) without affecting the original reader. Now, with clone or
  reopen you can also change the readOnly setting relative to the
  original reader. (Jason Rutherglen, Mike McCandless)

* LUCENE-1506: Added FilteredDocIdSet, an abstract class which you
  subclass to implement the "match" method to accept or reject each
  docID. Unlike ChainedFilter (under contrib/misc),
  FilteredDocIdSet never requires you to materialize the full
  bitset. Instead, match() is called on demand per docID. (John
  Wang via Mike McCandless)

* LUCENE-1398: Add ReverseStringFilter to contrib/analyzers, a filter
  to reverse the characters in each token. (Koji Sekiguchi via yonik)

* LUCENE-1551: Add expert IndexReader.reopen(IndexCommit) to allow
  efficiently opening a new reader on a specific commit, sharing
  resources with the original reader. (Torin Danil via Mike
  McCandless)

* LUCENE-1434: Added org.apache.lucene.util.IndexableBinaryStringTools,
  to encode byte[] as String values that are valid terms, and
  maintain sort order of the original byte[] when the bytes are
  interpreted as unsigned. (Steven Rowe via Mike McCandless)

* LUCENE-1543: Allow MatchAllDocsQuery to optionally use norms from
  a specific field to set the score for a document. (Karl Wettin
  via Mike McCandless)

* LUCENE-1586: Add IndexReader.getUniqueTermCount(). (Mike
  McCandless via Derek)

* LUCENE-1516: Added "near real-time search" to IndexWriter, via a
  new expert getReader() method. This method returns a reader that
  searches the full index, including any uncommitted changes in the
  current IndexWriter session. This should result in a faster
  turnaround than the normal approach of committing the changes and
  then reopening a reader. (Jason Rutherglen via Mike McCandless)

* LUCENE-1603: Added new MultiTermQueryWrapperFilter, to wrap any
  MultiTermQuery as a Filter. Also made some improvements to
  MultiTermQuery: return DocIdSet.EMPTY_DOCIDSET if there are no
  terms in the enum; track the total number of terms it visited
  during rewrite (getTotalNumberOfTerms). FilteredTermEnum is also
  more friendly to subclassing. (Uwe Schindler via Mike McCandless)

* LUCENE-1605: Added BitVector.subset(). (Jeremy Volkman via Mike
  McCandless)

* LUCENE-1618: Added FileSwitchDirectory that enables files with
  specified extensions to be stored in a primary directory and the
  rest of the files to be stored in the secondary directory. For
  example, this can be useful for the large doc-store (stored
  fields, term vectors) files in FSDirectory and the rest of the
  index files in a RAMDirectory. (Jason Rutherglen via Mike
  McCandless)

* LUCENE-1494: Added FieldMaskingSpanQuery which can be used to
  cross-correlate Spans from different fields.
  (Paul Cowan and Chris Hostetter)

* LUCENE-1634: Add calibrateSizeByDeletes to LogMergePolicy, to take
  deletions into account when considering merges. (Yasuhiro Matsuda
  via Mike McCandless)

* LUCENE-1550: Added new n-gram based String distance measure for spell checking.
  See the Javadocs for NGramDistance.java for a reference paper on why
  this is helpful. (Tom Morton via Grant Ingersoll)

* LUCENE-1470, LUCENE-1582, LUCENE-1602, LUCENE-1673, LUCENE-1701, LUCENE-1712:
  Added NumericRangeQuery and NumericRangeFilter, a fast alternative to
  RangeQuery/RangeFilter for numeric searches. They depend on a specific
  structure of terms in the index that can be created by indexing
  using the new NumericField or NumericTokenStream classes. NumericField
  can only be used for indexing and optionally stores the values as
  string representation in the doc store. Documents returned from
  IndexReader/IndexSearcher will return only the String value using
  the standard Fieldable interface. NumericFields can be sorted on
  and loaded into the FieldCache. (Uwe Schindler, Yonik Seeley,

* LUCENE-1405: Added support for Ant resource collections in contrib/ant
  <index> task. (Przemyslaw Sztoch via Erik Hatcher)

* LUCENE-1699: Allow setting a TokenStream on Field/Fieldable for indexing
  in conjunction with any other ways to specify stored field values,
  currently binary or string values. (yonik)

* LUCENE-1701: Made the standard FieldCache.Parsers public and added
  parsers for fields generated using NumericField/NumericTokenStream.
  All standard parsers now also implement Serializable and enforce
  their singleton status. (Uwe Schindler, Mike McCandless)

* LUCENE-1741: User configurable maximum chunk size in MMapDirectory.
  On 32 bit platforms, the address space can be very fragmented, so
  one big ByteBuffer for the whole file may not fit into address space.
  (Eks Dev via Uwe Schindler)
2701 * LUCENE-1644: Enable 4 rewrite modes for queries deriving from
2702 MultiTermQuery (WildcardQuery, PrefixQuery, TermRangeQuery,
2703 NumericRangeQuery): CONSTANT_SCORE_FILTER_REWRITE first creates a
2704 filter and then assigns constant score (boost) to docs;
2705 CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE creates a BooleanQuery but
2706 uses a constant score (boost); SCORING_BOOLEAN_QUERY_REWRITE also
2707 creates a BooleanQuery but keeps the BooleanQuery's scores;
2708 CONSTANT_SCORE_AUTO_REWRITE tries to pick the most performant
2709 constant-score rewrite method. (Mike McCandless)
2711 * LUCENE-1448: Added TokenStream.end(), to perform end-of-stream
2712 operations. This is currently used to fix offset problems when
2713 multiple fields with the same name are added to a document.
2714 (Mike McCandless, Mark Miller, Michael Busch)
2716 * LUCENE-1776: Add an option to not collect payloads for an ordered
2717 SpanNearQuery. Payloads were not lazily loaded in this case as
2718 the javadocs implied. If you have payloads and want to use an ordered
2719 SpanNearQuery that does not need to use the payloads, you can
2720 disable loading them with a new constructor switch. (Mark Miller)
2722 * LUCENE-1341: Added PayloadNearQuery to enable SpanNearQuery functionality
2723 with payloads (Peter Keegan, Grant Ingersoll, Mark Miller)
2725 * LUCENE-1790: Added PayloadTermQuery to enable scoring of payloads
2726 based on the maximum payload seen for a document.
2727 Slight refactoring of Similarity and other payload queries (Grant Ingersoll, Mark Miller)
2729 * LUCENE-1749: Addition of FieldCacheSanityChecker utility, and
2730 hooks to use it in all existing Lucene Tests. This class can
2731 be used by any application to inspect the FieldCache and provide
2732 diagnostic information about the possibility of inconsistent
2733 FieldCache usage. Namely: FieldCache entries for the same field
2734 with different datatypes or parsers; and FieldCache entries for
2735 the same field in both a reader and one of its (descendant) sub-readers.
2737 (Chris Hostetter, Mark Miller)
2739 * LUCENE-1789: Added utility class
2740 oal.search.function.MultiValueSource to ease the transition to
2741 segment based searching for any apps that directly call
2742 oal.search.function.* APIs. This class wraps any other
2743 ValueSource, but takes care, when composite (multi-segment) readers
2744 are passed, not to double RAM usage in the FieldCache. (Chris
2745 Hostetter, Mark Miller, Mike McCandless)
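NumericRangeQuery (LUCENE-1470 above) depends on numeric terms whose order as text matches their numeric order. Lucene's actual trie encoding (precision steps, prefix-coded terms) is not reproduced here; the following is only a minimal sketch of one common ingredient of such encodings, under that assumption: flip the sign bit so signed ints compare correctly as unsigned values, then write them as fixed-width hex. SortableInts and its methods are hypothetical names.

```java
/**
 * Hypothetical helper (not Lucene's NumericUtils): turns signed ints into
 * fixed-width strings whose lexicographic order matches numeric order.
 */
final class SortableInts {
    private SortableInts() {}

    /** Flips the sign bit so negative values come before positive ones when compared as unsigned. */
    static int toSortableBits(int value) {
        return value ^ 0x80000000;
    }

    /** Inverse of toSortableBits: flipping the same bit again restores the value. */
    static int fromSortableBits(int bits) {
        return bits ^ 0x80000000;
    }

    /** Writes the sortable bits as 8 lowercase hex digits, zero-padded so all keys have equal length. */
    static String encode(int value) {
        String hex = Integer.toHexString(toSortableBits(value)); // unsigned, no leading zeros
        return "00000000".substring(hex.length()) + hex;
    }

    /** Decodes a key produced by encode. */
    static int decode(String key) {
        return fromSortableBits(Integer.parseUnsignedInt(key, 16));
    }
}
```

For example, encode(-5) is "7ffffffb" and encode(3) is "80000003", so a plain String comparison already puts -5 before 3. Fixed width matters: without the zero padding, shorter keys would sort by length rather than value.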
2749 * LUCENE-1427: Fixed QueryWrapperFilter to not waste time computing
2750 scores of the query, since they are just discarded. Also, made it
2751 more efficient (single pass) by not creating & populating an
2752 intermediate OpenBitSet (Paul Elschot, Mike McCandless)
2754 * LUCENE-1443: Performance improvement for OpenBitSetDISI.inPlaceAnd()
2755 (Paul Elschot via yonik)
2757 * LUCENE-1484: Remove synchronization of IndexReader.document() by
2758 using CloseableThreadLocal internally. (Jason Rutherglen via Mike
2761 * LUCENE-1124: Short circuit FuzzyQuery.rewrite when input token length
2762 is small compared to minSimilarity. (Timo Nentwig, Mark Miller)
2764 * LUCENE-1316: MatchAllDocsQuery now avoids the synchronized
2765 IndexReader.isDeleted() call per document, by directly accessing
2766 the underlying deletedDocs BitVector. This improves performance
2767 with non-readOnly readers, especially in a multi-threaded
2768 environment. (Todd Feak, Yonik Seeley, Jason Rutherglen via Mike
2771 * LUCENE-1483: When searching over multiple segments we now visit
2772 each sub-reader one at a time. This speeds up warming, since
2773 FieldCache entries (if required) can be shared across reopens for
2774 those segments that did not change, and also speeds up searches
2775 that sort by relevance or by field values. (Mark Miller, Mike
2778 * LUCENE-1575: The new Collector class decouples collect() from
2779 score computation. Collector.setScorer is called to establish the
2780 current Scorer in-use per segment. Collectors that require the
2781 score should then call Scorer.score() per hit inside
2782 collect(). (Shai Erera via Mike McCandless)
2784 * LUCENE-1596: MultiTermDocs speedup when set with
2785 MultiTermDocs.seek(MultiTermEnum) (yonik)
2787 * LUCENE-1653: Avoid creating a Calendar in every call to
2788 DateTools#dateToString, DateTools#timeToString and
2789 DateTools#round. (Shai Erera via Mark Miller)
2791 * LUCENE-1688: Deprecate static final String stop word array and
2792 replace it with an immutable implementation of CharArraySet.
2793 Removes conversions between Set and array.
2794 (Simon Willnauer via Mark Miller)
2796 * LUCENE-1754: BooleanQuery.queryWeight.scorer() will return null if
2797 it won't match any documents (e.g. if there are no required and
2798 optional scorers, or not enough optional scorers to satisfy
2799 minShouldMatch). (Shai Erera via Mike McCandless)
2801 * LUCENE-1607: To speed up string interning for commonly used
2802 strings, the StringHelper.intern() interface was added with a
2803 default implementation that uses a lockless cache.
2804 (Earwin Burrfoot, yonik)
2806 * LUCENE-1800: QueryParser should use reusable TokenStreams. (yonik)
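The lockless intern cache from LUCENE-1607 can be pictured roughly as follows. This is a hypothetical sketch, not the StringHelper implementation. A power-of-two table is probed by hash; a hit returns the cached canonical String without calling String.intern(), and a miss falls back to the JVM's intern table. Unsynchronized reads and writes are tolerated because String is immutable: a racing thread sees either no entry, a different entry, or a complete String, so the worst case is an extra intern() call.

```java
/** Hypothetical lockless cache in front of String.intern(); not Lucene's actual class. */
final class LocklessInternCache {
    private final String[] table; // slot i holds a canonical (interned) String or null
    private final int mask;

    LocklessInternCache(int sizePowerOfTwo) {
        if (sizePowerOfTwo <= 0 || Integer.bitCount(sizePowerOfTwo) != 1) {
            throw new IllegalArgumentException("size must be a power of two");
        }
        table = new String[sizePowerOfTwo];
        mask = sizePowerOfTwo - 1;
    }

    String intern(String s) {
        int slot = s.hashCode() & mask;
        String cached = table[slot]; // unsynchronized read
        if (cached != null && cached.equals(s)) {
            return cached; // fast path: no lock and no JVM intern call
        }
        String canonical = s.intern(); // slow path: JVM intern table
        table[slot] = canonical;       // unsynchronized write; last writer wins
        return canonical;
    }
}
```

A collision simply overwrites the slot, so the table behaves like a small direct-mapped cache rather than a complete map; correctness comes from String.intern(), and the table only saves repeated calls to it.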
2811 * LUCENE-1908: Scoring documentation improvements in Similarity javadocs.
2812 (Mark Miller, Shai Erera, Ted Dunning, Jiri Kuhn, Marvin Humphrey, Doron Cohen)
2814 * LUCENE-1872: NumericField javadoc improvements
2815 (Michael McCandless, Uwe Schindler)
2817 * LUCENE-1875: Make TokenStream.end javadoc less confusing.
2820 * LUCENE-1862: Rectified duplicate package level javadocs for
2821 o.a.l.queryParser and o.a.l.analysis.cn.
2824 * LUCENE-1886: Improved hyperlinking in key Analysis javadocs
2825 (Bernd Fondermann via Chris Hostetter)
2827 * LUCENE-1884: massive javadoc and comment cleanup, primarily dealing with
2829 (Robert Muir via Chris Hostetter)
2831 * LUCENE-1898: Switch changes to use bullets rather than numbers and
2832 update changes-to-html script to handle the new format.
2833 (Steven Rowe, Mark Miller)
2835 * LUCENE-1900: Improve Searchable Javadoc.
2836 (Nadav Har'El, Doron Cohen, Marvin Humphrey, Mark Miller)
2838 * LUCENE-1896: Improve Similarity#queryNorm javadocs.
2839 (Jiri Kuhn, Mark Miller)
2843 * LUCENE-1440: Add new targets to build.xml that allow downloading
2844 and executing the junit testcases from an older release for
2845 backwards-compatibility testing. (Michael Busch)
2847 * LUCENE-1446: Add compatibility tag to common-build.xml and run
2848 backwards-compatibility tests in the nightly build. (Michael Busch)
2850 * LUCENE-1529: Properly test "drop-in" replacement of jar with
2851 backwards-compatibility tests. (Mike McCandless, Michael Busch)
2853 * LUCENE-1851: Change 'javacc' and 'clean-javacc' targets to build
2854 and clean contrib/surround files. (Luis Alves via Michael Busch)
2856 * LUCENE-1854: tar task should use longfile="gnu" to avoid false file
2857 name length warnings. (Mark Miller)
2861 * LUCENE-1791: Enhancements to the QueryUtils and CheckHits utility
2862 classes to wrap IndexReaders and Searchers in MultiReaders or
2863 MultiSearchers when possible to help exercise more edge cases.
2864 (Chris Hostetter, Mark Miller)
2866 * LUCENE-1852: Fix localization test failures.
2867 (Robert Muir via Michael Busch)
2869 * LUCENE-1843: Refactored all tests that use assertAnalyzesTo() & others
2870 in core and contrib to use a new BaseTokenStreamTestCase
2871 base class. Also rewrote some tests to use these general analysis assert
2872 functions instead of their own (e.g. TestMappingCharFilter).
2873 The new base class also tests tokenization with the TokenStream.next()
2874 backwards layer enabled (using Token/TokenWrapper as attribute
2875 implementation) and disabled (default for Lucene 3.0)
2876 (Uwe Schindler, Robert Muir)
2878 * LUCENE-1836: Added a new LocalizedTestCase as base class for localization
2879 junit tests. (Robert Muir, Uwe Schindler via Michael Busch)
2881 ======================= Release 2.4.1 =======================
2885 1. LUCENE-1186: Add Analyzer.close() to free internal ThreadLocal
2886 resources. (Christian Kohlschütter via Mike McCandless)
2890 1. LUCENE-1452: Fixed silent data-loss case whereby binary fields are
2891 truncated to 0 bytes during merging if the segments being merged
2892 are non-congruent (same field name maps to different field
2893 numbers). This bug was introduced with LUCENE-1219. (Andrzej
2894 Bialecki via Mike McCandless).
2896 2. LUCENE-1429: Don't throw incorrect IllegalStateException from
2897 IndexWriter.close() if you've hit an OOM when autoCommit is true.
2900 3. LUCENE-1474: If IndexReader.flush() is called twice when there were
2901 pending deletions, it could lead to later false AssertionError
2902 during IndexReader.open. (Mike McCandless)
2904 4. LUCENE-1430: Fix false AlreadyClosedException (masking an actual
2905 IOException) from the IndexReader.open methods that take a String or File path.
2908 5. LUCENE-1442: Multiple-valued NOT_ANALYZED fields can double-count
2909 token offsets. (Mike McCandless)
2911 6. LUCENE-1453: Ensure IndexReader.reopen()/clone() does not result in
2912 incorrectly closing the shared FSDirectory. This bug would only
2913 happen if you use IndexReader.open() with a File or String argument.
2914 The returned readers are wrapped by a FilterIndexReader that
2915 correctly handles closing of directory after reopen()/clone().
2916 (Mark Miller, Uwe Schindler, Mike McCandless)
2918 7. LUCENE-1457: Fix possible overflow bugs during binary
2919 searches. (Mark Miller via Mike McCandless)
2921 8. LUCENE-1459: Fix CachingWrapperFilter to not throw exception if
2922 both bits() and getDocIdSet() methods are called. (Matt Jones via
2925 9. LUCENE-1519: Fix int overflow bug during segment merging. (Deepak
2926 via Mike McCandless)
2928 10. LUCENE-1521: Fix int overflow bug when flushing segment.
2929 (Shon Vella via Mike McCandless).
2931 11. LUCENE-1544: Fix deadlock in IndexWriter.addIndexes(IndexReader[]).
2932 (Mike McCandless via Doug Sale)
2934 12. LUCENE-1547: Fix rare thread safety issue if two threads call
2935 IndexWriter.commit() at the same time. (Mike McCandless)
2937 13. LUCENE-1465: NearSpansOrdered returns payloads from first possible match
2938 rather than the correct, shortest match; Payloads could be returned even
2939 if the max slop was exceeded; The wrong payload could be returned in
2940 certain situations. (Jonathan Mamou, Greg Shackles, Mark Miller)
2942 14. LUCENE-1186: Add Analyzer.close() to free internal ThreadLocal
2943 resources. (Christian Kohlschütter via Mike McCandless)
2945 15. LUCENE-1552: Fix IndexWriter.addIndexes(IndexReader[]) to properly
2946 rollback IndexWriter's internal state on hitting an
2947 exception. (Scott Garland via Mike McCandless)
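LUCENE-1457 above concerns overflow during binary searches. The classic form of that bug computes the midpoint as (lo + hi) / 2, which wraps to a negative number once lo + hi exceeds Integer.MAX_VALUE. The sketch below shows the usual remedy, an unsigned right shift; it illustrates the bug class and is not claimed to be the exact Lucene patch. SafeBinarySearch is a hypothetical name.

```java
/** Hypothetical illustration of the overflow-safe midpoint idiom for binary search. */
final class SafeBinarySearch {
    private SafeBinarySearch() {}

    /** Buggy form: lo + hi can overflow to a negative int for large indexes. */
    static int naiveMid(int lo, int hi) {
        return (lo + hi) / 2;
    }

    /** Safe form: >>> treats the (possibly wrapped) sum as unsigned, giving the true midpoint. */
    static int safeMid(int lo, int hi) {
        return (lo + hi) >>> 1;
    }

    /** Returns the index of key in sorted a, or -(insertionPoint + 1) if absent, like java.util.Arrays. */
    static int search(int[] a, int key) {
        int lo = 0;
        int hi = a.length - 1;
        while (lo <= hi) {
            int mid = safeMid(lo, hi);
            int v = a[mid];
            if (v < key) {
                lo = mid + 1;
            } else if (v > key) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -(lo + 1);
    }
}
```

With lo = Integer.MAX_VALUE - 1 and hi = Integer.MAX_VALUE, naiveMid returns -1 while safeMid returns Integer.MAX_VALUE - 1.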
2949 ======================= Release 2.4.0 =======================
2951 Changes in backwards compatibility policy
2953 1. LUCENE-1340: In a minor change to Lucene's backward compatibility
2954 policy, we are now allowing the Fieldable interface to have
2955 changes, within reason, and made on a case-by-case basis. If an
2956 application implements its own Fieldable, please be aware of
2957 this. Otherwise, there is no need to be concerned. This is in effect for
2958 all 2.X releases, starting with 2.4. Also note that, in all
2959 likelihood, Fieldable will be changed in 3.0.
2962 Changes in runtime behavior
2964 1. LUCENE-1151: Fix StandardAnalyzer to not mis-identify host names
2965 (eg lucene.apache.org) as an ACRONYM. To get back to the pre-2.4
2966 backwards compatible, but buggy, behavior, you can either call
2967 StandardAnalyzer.setDefaultReplaceInvalidAcronym(false) (static
2968 method), or, set system property
2969 org.apache.lucene.analysis.standard.StandardAnalyzer.replaceInvalidAcronym
2970 to "false" on JVM startup. All StandardAnalyzer instances created
2971 after that will then show the pre-2.4 behavior. Alternatively,
2972 you can call setReplaceInvalidAcronym(false) to change the
2973 behavior per instance of StandardAnalyzer. This backwards
2974 compatibility will be removed in 3.0 (hardwiring the value to
2975 true). (Mike McCandless)
2977 2. LUCENE-1044: IndexWriter with autoCommit=true now commits (such
2978 that a reader can see the changes) far less often than it used to.
2979 Previously, every flush was also a commit. You can always force a
2980 commit by calling IndexWriter.commit(). Furthermore, in 3.0,
2981 autoCommit will be hardwired to false (IndexWriter constructors
2982 that take an autoCommit argument have been deprecated) (Mike
2985 3. LUCENE-1335: IndexWriter.addIndexes(Directory[]) and
2986 addIndexesNoOptimize no longer allow the same Directory instance
2987 to be passed in more than once. Internally, IndexWriter uses
2988 Directory and segment name to uniquely identify segments, so
2989 adding the same Directory more than once was causing duplicates
2990 which led to problems (Mike McCandless)
2992 4. LUCENE-1396: Improve PhraseQuery.toString() so that gaps in the
2993 positions are indicated with a ? and multiple terms at the same
2994 position are joined with a |. (Andrzej Bialecki via Mike
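To make the PhraseQuery.toString() format from LUCENE-1396 concrete, here is a hypothetical renderer. It prints ? for each unused position between the first and last term and joins terms that share a position with |. It assumes output starts at the smallest position used; Lucene's own toString() may differ in details such as the field prefix, slop and boost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical renderer for the gap and same-position notation described for LUCENE-1396. */
final class PhraseRenderer {
    private PhraseRenderer() {}

    /** terms[i] sits at positions[i]; gaps print as "?", terms sharing a position are joined with "|". */
    static String render(String[] terms, int[] positions) {
        TreeMap<Integer, List<String>> byPos = new TreeMap<>();
        for (int i = 0; i < terms.length; i++) {
            byPos.computeIfAbsent(positions[i], p -> new ArrayList<>()).add(terms[i]);
        }
        StringBuilder sb = new StringBuilder("\"");
        if (!byPos.isEmpty()) {
            int first = byPos.firstKey();
            int last = byPos.lastKey();
            for (int pos = first; pos <= last; pos++) {
                if (pos > first) {
                    sb.append(' ');
                }
                List<String> atPos = byPos.get(pos);
                sb.append(atPos == null ? "?" : String.join("|", atPos));
            }
        }
        return sb.append('"').toString();
    }
}
```

For example, "quick" at position 0 and "fox" at position 2 render as "quick ? fox", and "a" and "b" both at 0 followed by "c" at 1 render as "a|b c".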
2999 1. LUCENE-1084: Changed all IndexWriter constructors to take an
3000 explicit parameter for maximum field size. Deprecated all the
3001 pre-existing constructors; these will be removed in release 3.0.
3002 NOTE: these new constructors set autoCommit to false. (Steven
3003 Rowe via Mike McCandless)
3005 2. LUCENE-584: Changed Filter API to return a DocIdSet instead of a
3006 java.util.BitSet. This allows using more efficient data structures
3007 for Filters and makes them more flexible. This deprecates
3008 Filter.bits(), so all filters that implement this outside
3009 the Lucene code base will need to be adapted. See also the javadocs
3010 of the Filter class. (Paul Elschot, Michael Busch)
3012 3. LUCENE-1044: Added IndexWriter.commit() which flushes any buffered
3013 adds/deletes and then commits a new segments file so readers will
3014 see the changes. Deprecate IndexWriter.flush() in favor of
3015 IndexWriter.commit(). (Mike McCandless)
3017 4. LUCENE-325: Added IndexWriter.expungeDeletes methods, which
3018 consult the MergePolicy to find merges necessary to merge away all
3019 deletes from the index. This should be a somewhat lower cost
3020 operation than optimize. (John Wang via Mike McCandless)
3022 5. LUCENE-1233: Return empty array instead of null when no fields
3023 match the specified name in these methods in Document:
3024 getFieldables, getFields, getValues, getBinaryValues. (Stefan
3025 Trcek via Mike McCandless)
3027 6. LUCENE-1234: Make BoostingSpanScorer protected. (Andi Vajda via Grant Ingersoll)
3029 7. LUCENE-510: The index now stores strings as true UTF-8 bytes
3030 (previously it was Java's modified UTF-8). If any text, either
3031 stored fields or a token, has illegal UTF-16 surrogate characters,
3032 these characters are now silently replaced with the Unicode
3033 replacement character U+FFFD. This is a change to the index file
3034 format. (Marvin Humphrey via Mike McCandless)
3036 8. LUCENE-852: Let the SpellChecker caller specify IndexWriter mergeFactor
3037 and RAM buffer size. (Otis Gospodnetic)
3039 9. LUCENE-1290: Deprecate org.apache.lucene.search.Hits, Hit and HitIterator
3040 and remove all references to these classes from the core. Also update demos
3041 and tutorials. (Michael Busch)
3043 10. LUCENE-1288: Add getVersion() and getGeneration() to IndexCommit.
3044 getVersion() returns the same value that IndexReader.getVersion()
3045 returns when the reader is opened on the same commit. (Jason
3046 Rutherglen via Mike McCandless)
3048 11. LUCENE-1311: Added IndexReader.listCommits(Directory) static
3049 method to list all commits in a Directory, plus IndexReader.open
3050 methods that accept an IndexCommit and open the index as of that
3051 commit. These methods are only useful if you implement a custom
3052 DeletionPolicy that keeps more than the last commit around.
3053 (Jason Rutherglen via Mike McCandless)
3055 12. LUCENE-1325: Added IndexCommit.isOptimized(). (Shalin Shekhar
3056 Mangar via Mike McCandless)
3058 13. LUCENE-1324: Added TokenFilter.reset(). (Shai Erera via Mike
3061 14. LUCENE-1340: Added Fieldable.omitTf() method to skip indexing term
3062 frequency, positions and payloads. This saves index space, and
3063 indexing/searching time. (Eks Dev via Mike McCandless)
3065 15. LUCENE-1219: Add basic reuse API to Fieldable for binary fields:
3066 getBinaryValue/Offset/Length(); currently only lazy fields reuse
3067 the provided byte[] result to getBinaryValue. (Eks Dev via Mike
3070 16. LUCENE-1334: Add new constructor for Term: Term(String fieldName)
3071 which defaults term text to "". (DM Smith via Mike McCandless)
3073 17. LUCENE-1333: Added Token.reinit(*) APIs to re-initialize (reuse) a
3074 Token. Also added term() method to return a String, with a
3075 performance penalty clearly documented. Also implemented
3076 hashCode() and equals() in Token, and fixed all core and contrib
3077 analyzers to use the re-use APIs. (DM Smith via Mike McCandless)
3079 18. LUCENE-1329: Add optional readOnly boolean when opening an
3080 IndexReader. A readOnly reader is not allowed to make changes
3081 (deletions, norms) to the index; in exchange, the isDeleted
3082 method, often a bottleneck when searching with many threads, is
3083 not synchronized. The default for readOnly is still false, but in
3084 3.0 the default will become true. (Jason Rutherglen via Mike
3087 19. LUCENE-1367: Add IndexCommit.isDeleted(). (Shalin Shekhar Mangar
3088 via Mike McCandless)
3090 20. LUCENE-1061: Factored out all "new XXXQuery(...)" in
3091 QueryParser.java into protected methods newXXXQuery(...) so that
3092 subclasses can create their own subclasses of each Query type.
3093 (John Wang via Mike McCandless)
3095 21. LUCENE-753: Added new Directory implementation
3096 org.apache.lucene.store.NIOFSDirectory, which uses java.nio's
3097 FileChannel to do file reads. On most non-Windows platforms, with
3098 many threads sharing a single searcher, this may yield sizable
3099 improvement to query throughput when compared to FSDirectory,
3100 which only allows a single thread to read from an open file at a
3101 time. (Jason Rutherglen via Mike McCandless)
3103 22. LUCENE-1371: Added convenience method TopDocs Searcher.search(Query query, int n).
3106 23. LUCENE-1356: Allow easy extensions of TopDocCollector by turning
3107 constructor and fields from package to protected. (Shai Erera
3110 24. LUCENE-1375: Added convenience method IndexCommit.getTimestamp,
3111 which is equivalent to
3112 getDirectory().fileModified(getSegmentsFileName()). (Mike McCandless)
3114 25. LUCENE-1366: Rename Field.Index options to be more accurate:
3115 TOKENIZED becomes ANALYZED; UN_TOKENIZED becomes NOT_ANALYZED;
3116 NO_NORMS becomes NOT_ANALYZED_NO_NORMS and a new ANALYZED_NO_NORMS
3117 is added. (Mike McCandless)
3119 26. LUCENE-1131: Added numDeletedDocs method to IndexReader (Otis Gospodnetic)
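LUCENE-510 above switches the index to standard UTF-8 and replaces unpaired UTF-16 surrogates with U+FFFD. The sketch below encodes that rule with a hypothetical StrictUtf8 helper; it is not Lucene's own conversion code. Unlike Java's modified UTF-8, it writes a supplementary character (a valid surrogate pair) as a single 4-byte sequence. The loop is written by hand because String.getBytes(StandardCharsets.UTF_8) substitutes '?' rather than U+FFFD for malformed input.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical encoder: UTF-16 String to standard UTF-8, unpaired surrogates become U+FFFD. */
final class StrictUtf8 {
    private StrictUtf8() {}

    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int cp;
            if (Character.isHighSurrogate(c) && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                cp = Character.toCodePoint(c, s.charAt(i + 1)); // valid pair: one supplementary code point
                i += 2;
            } else if (Character.isSurrogate(c)) {
                cp = 0xFFFD; // unpaired surrogate: substitute the replacement character
                i += 1;
            } else {
                cp = c;
                i += 1;
            }
            writeCodePoint(out, cp);
        }
        return out.toByteArray();
    }

    /** Writes one code point as 1 to 4 UTF-8 bytes. */
    private static void writeCodePoint(ByteArrayOutputStream out, int cp) {
        if (cp < 0x80) {
            out.write(cp);
        } else if (cp < 0x800) {
            out.write(0xC0 | (cp >> 6));
            out.write(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out.write(0xE0 | (cp >> 12));
            out.write(0x80 | ((cp >> 6) & 0x3F));
            out.write(0x80 | (cp & 0x3F));
        } else {
            out.write(0xF0 | (cp >> 18));
            out.write(0x80 | ((cp >> 12) & 0x3F));
            out.write(0x80 | ((cp >> 6) & 0x3F));
            out.write(0x80 | (cp & 0x3F));
        }
    }
}
```

For well-formed text the output matches the JDK's UTF-8 encoder; for "a\uD800b" it yields 61 EF BF BD 62.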
3123 1. LUCENE-1134: Fixed BooleanQuery.rewrite to only optimize a single
3124 clause query if minNumShouldMatch<=0. (Shai Erera via Michael Busch)
3126 2. LUCENE-1169: Fixed bug in IndexSearcher.search(): searching with
3127 a filter might miss some hits because scorer.skipTo() is called
3128 without checking if the scorer is already at the right position.
3129 scorer.skipTo(scorer.doc()) is not a NOOP, it behaves as
3130 scorer.next(). (Eks Dev, Michael Busch)
3132 3. LUCENE-1182: Added scorePayload to SimilarityDelegator (Andi Vajda via Grant Ingersoll)
3134 4. LUCENE-1213: MultiFieldQueryParser was ignoring slop in case
3135 of a single field phrase. (Trejkaz via Doron Cohen)
3137 5. LUCENE-1228: IndexWriter.commit() was not updating the index version and as a
3138 result IndexReader.reopen() failed to sense index changes. (Doron Cohen)
3140 6. LUCENE-1267: Added numDocs() and maxDoc() to IndexWriter;
3141 deprecated docCount(). (Mike McCandless)
3143 7. LUCENE-1274: Added new prepareCommit() method to IndexWriter,
3144 which does phase 1 of a 2-phase commit (commit() does phase 2).
3145 This is needed when you want to update an index as part of a
3146 transaction involving external resources (eg a database). Also
3147 deprecated abort(), renaming it to rollback(). (Mike McCandless)
3149 8. LUCENE-1003: Stop RussianAnalyzer from removing numbers.
3150 (TUSUR OpenTeam, Dmitry Lihachev via Otis Gospodnetic)
3152 9. LUCENE-1152: SpellChecker fix around clearIndex and indexDictionary
3153 methods, plus removal of IndexReader reference.
3154 (Naveen Belkale via Otis Gospodnetic)
3156 10. LUCENE-1046: Removed dead code in SpellChecker
3157 (Daniel Naber via Otis Gospodnetic)
3159 11. LUCENE-1189: Fixed the QueryParser to handle escaped characters within
3160 quoted terms correctly. (Tomer Gabel via Michael Busch)
3162 12. LUCENE-1299: Fixed NPE in SpellChecker when IndexReader is not null and field is (Grant Ingersoll)
3164 13. LUCENE-1303: Fixed BoostingTermQuery's explanation to be marked as a Match
3165 depending only upon the non-payload score part, regardless of the effect of
3166 the payload on the score. Prior to this, score of a query containing a BTQ
3167 differed from its explanation. (Doron Cohen)
3169 14. LUCENE-1310: Fixed SloppyPhraseScorer to work also for terms repeating more
3170 than twice in the query. (Doron Cohen)
3172 15. LUCENE-1351: ISOLatin1AccentFilter now cleans additional ligatures (Cedrik Lime via Grant Ingersoll)
3174 16. LUCENE-1383: Workaround a nasty "leak" in Java's builtin
3175 ThreadLocal, to prevent Lucene from causing unexpected
3176 OutOfMemoryError in certain situations (notably J2EE
3177 applications). (Chris Lu via Mike McCandless)
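LUCENE-1274 above splits committing into prepareCommit() (phase 1) and commit() (phase 2), with abort() renamed to rollback(). The sketch below shows how a caller might coordinate such participants, for instance an index and a database, in one transaction. TwoPhaseParticipant and TwoPhaseCoordinator are hypothetical names, and failures are signalled with unchecked exceptions for brevity, whereas IndexWriter's real methods throw IOException. A failure during phase 2 is simply propagated here; a real transaction manager would need recovery logic for that case.

```java
import java.util.List;

/** Hypothetical participant in a two-phase commit; failures are reported as unchecked exceptions. */
interface TwoPhaseParticipant {
    /** Phase 1: do everything that might fail, without making changes visible. */
    void prepare();

    /** Phase 2: make the prepared changes visible. */
    void commit();

    /** Discard prepared or pending changes. */
    void rollback();
}

/** Sketch of a coordinator that commits only if every participant prepared successfully. */
final class TwoPhaseCoordinator {
    private TwoPhaseCoordinator() {}

    /** Returns true if all participants committed, false if phase 1 failed and all were rolled back. */
    static boolean run(List<? extends TwoPhaseParticipant> participants) {
        try {
            for (TwoPhaseParticipant p : participants) {
                p.prepare();
            }
        } catch (RuntimeException prepareFailed) {
            // Roll back every participant, including the one that failed and any not yet prepared.
            for (TwoPhaseParticipant p : participants) {
                p.rollback();
            }
            return false;
        }
        // Every participant is prepared; phase-2 failures propagate to the caller.
        for (TwoPhaseParticipant p : participants) {
            p.commit();
        }
        return true;
    }
}
```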
3181 1. LUCENE-1137: Added Token.set/getFlags() accessors for passing more information about a Token through the analysis
3182 process. The flag is not indexed/stored and is thus only used by analysis.
3184 2. LUCENE-1147: Add -segment option to CheckIndex tool so you can
3185 check only a specific segment or segments in your index. (Mike
3188 3. LUCENE-1045: Reopened this issue to add support for short and bytes.
3190 4. LUCENE-584: Added new data structures to o.a.l.util, such as
3191 OpenBitSet and SortedVIntList. These extend DocIdSet and can
3192 directly be used for Filters with the new Filter API. Also changed
3193 the core Filters to use OpenBitSet instead of java.util.BitSet.
3194 (Paul Elschot, Michael Busch)
3196 5. LUCENE-494: Added QueryAutoStopWordAnalyzer to allow automatic removal of frequently occurring terms from a query.
3197 This Analyzer is not intended for use during indexing. (Mark Harwood via Grant Ingersoll)
3199 6. LUCENE-1044: Change Lucene to properly "sync" files after
3200 committing, to ensure on a machine or OS crash or power cut, even
3201 with cached writes, the index remains consistent. Also added
3202 explicit commit() method to IndexWriter to force a commit without
3203 having to close. (Mike McCandless)
3205 7. LUCENE-997: Add search timeout (partial) support.
3206 A TimeLimitedCollector was added to allow limiting search time.
3207 It is a partial solution since timeout is checked only when
3208 collecting a hit, and therefore a search for rare words in a
3209 huge index might not stop within the specified time.
3210 (Sean Timm via Doron Cohen)
3212 8. LUCENE-1184: Allow SnapshotDeletionPolicy to be re-used across
3213 close/re-open of IndexWriter while still protecting an open
3214 snapshot (Tim Brennan via Mike McCandless)
3216 9. LUCENE-1194: Added IndexWriter.deleteDocuments(Query) to delete
3217 documents matching the specified query. Also added static unlock
3218 and isLocked methods (deprecating the ones in IndexReader). (Mike
3221 10. LUCENE-1201: Add IndexReader.getIndexCommit() method. (Tim Brennan
3222 via Mike McCandless)
3224 11. LUCENE-550: Added InstantiatedIndex implementation. Experimental
3225 Index store similar to MemoryIndex but allows for multiple documents
3226 in memory. (Karl Wettin via Grant Ingersoll)
3228 12. LUCENE-400: Added word based n-gram filter (in contrib/analyzers) called ShingleFilter and an Analyzer wrapper
3229 that wraps another Analyzer's token stream with a ShingleFilter (Sebastian Kirsch, Steve Rowe via Grant Ingersoll)
3231 13. LUCENE-1166: Decomposition tokenfilter for languages like German and Swedish (Thomas Peuss via Grant Ingersoll)
3233 14. LUCENE-1187: ChainedFilter and BooleanFilter now work with new Filter API
3234 and DocIdSetIterator-based filters. Backwards-compatibility with old
3235 BitSet-based filters is ensured. (Paul Elschot via Michael Busch)
3237 15. LUCENE-1295: Added new method to MoreLikeThis for retrieving interesting terms and made retrieveTerms(int) public. (Grant Ingersoll)
3239 16. LUCENE-1298: MoreLikeThis can now accept a custom Similarity (Grant Ingersoll)
3241 17. LUCENE-1297: Allow other string distance measures for the SpellChecker
3242 (Thomas Morton via Otis Gospodnetic)
3244 18. LUCENE-1001: Provide access to Payloads via Spans. All existing Span Query implementations in Lucene implement this. (Mark Miller, Grant Ingersoll)
3246 19. LUCENE-1354: Provide programmatic access to CheckIndex (Grant Ingersoll, Mike McCandless)
3248 20. LUCENE-1279: Add support for Collators to RangeFilter/Query and Query Parser. (Steve Rowe via Grant Ingersoll)
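The partial timeout of LUCENE-997 (item 7 above) follows from where the check happens: the clock is consulted only when a hit is collected, so a search that produces few hits can overrun the limit. The sketch below uses a hypothetical HitSink interface and an injectable clock (a LongSupplier, so it can be tested deterministically); it is not TimeLimitedCollector's code.

```java
import java.util.function.LongSupplier;

/** Hypothetical hit consumer, standing in for a collector. */
interface HitSink {
    void collect(int doc, float score);
}

/** Sketch of the idea behind a time-limited collector: check the clock only when a hit arrives. */
final class TimeLimitedSink implements HitSink {
    /** Thrown when a hit arrives after the deadline. */
    static final class TimeExceeded extends RuntimeException {
        TimeExceeded(int doc) {
            super("time limit exceeded at doc " + doc);
        }
    }

    private final HitSink delegate;
    private final LongSupplier clockMillis;
    private final long deadline;

    TimeLimitedSink(HitSink delegate, LongSupplier clockMillis, long allowedMillis) {
        this.delegate = delegate;
        this.clockMillis = clockMillis;
        this.deadline = clockMillis.getAsLong() + allowedMillis;
    }

    @Override
    public void collect(int doc, float score) {
        // The timeout is noticed only here: a query that matches rarely can run
        // well past the deadline before the next hit triggers this check.
        if (clockMillis.getAsLong() > deadline) {
            throw new TimeExceeded(doc);
        }
        delegate.collect(doc, score);
    }
}
```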
3252 1. LUCENE-705: When building a compound file, use
3253 RandomAccessFile.setLength() to tell the OS/filesystem to
3254 pre-allocate space for the file. This may improve fragmentation
3255 in how the CFS file is stored, and allows us to detect an upcoming
3256 disk full situation before actually filling up the disk. (Mike
3259 2. LUCENE-1120: Speed up merging of term vectors by bulk-copying the
3260 raw bytes for each contiguous range of non-deleted documents.
3263 3. LUCENE-1185: Avoid checking if the TermBuffer 'scratch' in
3264 SegmentTermEnum is null for every call of scanTo().
3265 (Christian Kohlschuetter via Michael Busch)
3267 4. LUCENE-1217: Internal to Field.java, use isBinary instead of
3268 runtime type checking for possible speedup of binaryValue().
3269 (Eks Dev via Mike McCandless)
3271 5. LUCENE-1183: Optimized TRStringDistance class (in contrib/spell) that uses
3272 less memory than the previous version. (Cédrik LIME via Otis Gospodnetic)
3274 6. LUCENE-1195: Improve term lookup performance by adding an LRU cache to the
3275 TermInfosReader. In performance experiments the speedup was about 25% on
3276 average on mid-size indexes with ~500,000 documents for queries with 3
3277 terms and about 7% on larger indexes with ~4.3M documents. (Michael Busch)
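The term-lookup cache from LUCENE-1195 is an LRU cache. In plain Java, java.util.LinkedHashMap in access-order mode plus an overridden removeEldestEntry gives a compact one; the sketch below illustrates the idea and is not TermInfosReader's cache class. LinkedHashMap is not thread-safe, so a cache shared by searching threads would need external synchronization or one instance per thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bounded LRU cache built on LinkedHashMap's access order. */
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the most-recently-used end
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // after an insert, evict the least recently used entry if over capacity
    }
}
```

For example, with capacity 2, putting "a" and "b", reading "a", then putting "c" evicts "b", because "a" was used more recently.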
3281 1. LUCENE-1236: Added some clarifying remarks to EdgeNGram*.java (Hiroaki Kawai via Grant Ingersoll)
3283 2. LUCENE-1157 and LUCENE-1256: HTML changes log, created automatically
3284 from CHANGES.txt. This HTML file is currently visible only via developers page.
3285 (Steven Rowe via Doron Cohen)
3287 3. LUCENE-1349: Fieldable can now be changed without breaking backward compatibility rules (within reason. See the note at
3288 the top of this file and also on Fieldable.java). (Grant Ingersoll)
3290 4. LUCENE-1873: Update documentation to reflect current Contrib area status.
3291 (Steven Rowe, Mark Miller)
3295 1. LUCENE-1153: Added JUnit JAR to new lib directory. Updated build to rely on local JUnit instead of ANT/lib.
3297 2. LUCENE-1202: Small fixes to the way Clover is used to work better
3298 with contribs. Of particular note: a single clover db is used
3299 regardless of whether tests are run globally or in the specific
3300 contrib directories.
3302 3. LUCENE-1353: Javacc target in contrib/miscellaneous for
3303 generating the precedence query parser.
3307 1. LUCENE-1238: Fixed intermittent failures of TestTimeLimitedCollector.testTimeoutMultiThreaded.
3308 Within this fix, "greedy" flag was added to TimeLimitedCollector, to allow the wrapped
3309 collector to also collect the last doc after the allowed time has passed. (Doron Cohen)
3311 2. LUCENE-1348: Relax TestTimeLimitedCollector so that it does not fail with
3312 a timeout-exceeded error just because the test machine is very busy.
3314 ======================= Release 2.3.2 =======================
3318 1. LUCENE-1191: On hitting OutOfMemoryError in any index-modifying
3319 methods in IndexWriter, do not commit any further changes to the
3320 index to prevent risk of possible corruption. (Mike McCandless)
3322 2. LUCENE-1197: Fixed issue whereby IndexWriter would flush by RAM
3323 too early when TermVectors were in use. (Mike McCandless)
3325 3. LUCENE-1198: Don't corrupt index if an exception happens inside
3326 DocumentsWriter.init (Mike McCandless)
3328 4. LUCENE-1199: Added defensive check for null indexReader before
3329 calling close in IndexModifier.close() (Mike McCandless)
3331 5. LUCENE-1200: Fix rare deadlock case in addIndexes* when
3332 ConcurrentMergeScheduler is in use (Mike McCandless)
3334 6. LUCENE-1208: Fix deadlock case on hitting an exception while
3335 processing a document that had triggered a flush (Mike McCandless)
3337 7. LUCENE-1210: Fix deadlock case on hitting an exception while
3338 starting a merge when using ConcurrentMergeScheduler (Mike McCandless)
3340 8. LUCENE-1222: Fix IndexWriter.doAfterFlush to always be called on
3341 flush (Mark Ferguson via Mike McCandless)
3343 9. LUCENE-1226: Fixed IndexWriter.addIndexes(IndexReader[]) to commit
3344 successfully created compound files. (Michael Busch)
3346 10. LUCENE-1150: Re-expose StandardTokenizer's constants publicly;
3347 this was accidentally lost with LUCENE-966. (Nicolas Lalevée via
3350 11. LUCENE-1262: Fixed bug in BufferedIndexInput.refill whereby on
3351 hitting an exception in readInternal, the buffer is incorrectly
3352 filled with stale bytes such that subsequent calls to readByte()
3353 return incorrect results. (Trejkaz via Mike McCandless)
3355 12. LUCENE-1270: Fixed intermittent case where IndexWriter.close()
3356 would hang after IndexWriter.addIndexesNoOptimize had been
3357 called. (Stu Hood via Mike McCandless)
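LUCENE-1262 above is about ordering inside a buffered reader's refill step. In the hypothetical sketch below (BufferedInputSketch, with readChunk standing in for readInternal; neither matches Lucene's signatures), refill marks the buffer empty before reading and records the new length only after the read succeeds, so an exception part-way through cannot leave stale bytes to be served by the next readByte().

```java
import java.io.IOException;

/** Hypothetical buffered reader illustrating safe refill ordering. */
abstract class BufferedInputSketch {
    private final byte[] buffer;
    private int bufferLength = 0;   // number of valid bytes currently in buffer
    private int bufferPosition = 0; // index of the next byte to hand out

    BufferedInputSketch(int bufferSize) {
        buffer = new byte[bufferSize];
    }

    /** Fills dest starting at index 0; returns the number of bytes read, or 0 at end of input. */
    protected abstract int readChunk(byte[] dest) throws IOException;

    final byte readByte() throws IOException {
        if (bufferPosition >= bufferLength) {
            refill();
        }
        return buffer[bufferPosition++];
    }

    private void refill() throws IOException {
        // Mark the buffer empty before reading: if readChunk throws part-way
        // through, the next readByte() retries instead of returning old bytes.
        bufferLength = 0;
        bufferPosition = 0;
        int n = readChunk(buffer);
        if (n <= 0) {
            throw new IOException("read past EOF");
        }
        bufferLength = n; // only now are the new bytes considered valid
    }
}
```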
3361 1. LUCENE-1230: Include *pom.xml* in source release files. (Michael Busch)
3364 ======================= Release 2.3.1 =======================
3368 1. LUCENE-1168: Fixed corruption cases when autoCommit=false and
3369 documents have mixed term vectors (Suresh Guvvala via Mike
3372 2. LUCENE-1171: Fixed some cases where OOM errors could cause
3373 deadlock in IndexWriter (Mike McCandless).
3375 3. LUCENE-1173: Fixed corruption case when autoCommit=false and bulk
3376 merging of stored fields is used (Yonik via Mike McCandless).
3378 4. LUCENE-1163: Fixed bug in CharArraySet.contains(char[] buffer, int
3379 offset, int len) that was ignoring offset and thus giving the
3380 wrong answer. (Thomas Peuss via Mike McCandless)
3382 5. LUCENE-1177: Fix rare case where IndexWriter.optimize might do too
3383 many merges at the end. (Mike McCandless)
3385 6. LUCENE-1176: Fix corruption case when documents with no term
3386 vector fields are added before documents with term vector fields.
3389 7. LUCENE-1179: Fixed assert statement that was incorrectly
3390 preventing Fields with empty-string field name from working.
3391 (Sergey Kabashnyuk via Mike McCandless)
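LUCENE-1163 above concerns probing a set with a slice of a char[] buffer. The sketch below is a hypothetical open-addressing set (CharSliceSet, not CharArraySet itself) that shows where the offset must be applied: both the hash and the equality check walk buffer[offset .. offset+len). Ignoring the offset in either place produces the kind of wrong answer the entry describes.

```java
/**
 * Hypothetical open-addressing set of char[] keys that can be probed with a
 * slice of a larger buffer, without allocating a String per lookup.
 */
final class CharSliceSet {
    private char[][] slots = new char[16][]; // power-of-two table; null means empty
    private int count = 0;

    /** Hashes text[offset .. offset+len); the loop must start at offset, not 0. */
    private static int hash(char[] text, int offset, int len) {
        int h = 0;
        for (int i = offset; i < offset + len; i++) {
            h = 31 * h + text[i];
        }
        return h;
    }

    /** Compares the slice with a stored key, shifting every index by offset. */
    private static boolean sliceEquals(char[] text, int offset, int len, char[] key) {
        if (key.length != len) {
            return false;
        }
        for (int i = 0; i < len; i++) {
            if (text[offset + i] != key[i]) {
                return false;
            }
        }
        return true;
    }

    /** Linear probing: returns the slot holding the slice, or the empty slot where it would go. */
    private int findSlot(char[] text, int offset, int len) {
        int mask = slots.length - 1;
        int pos = hash(text, offset, len) & mask;
        while (slots[pos] != null && !sliceEquals(text, offset, len, slots[pos])) {
            pos = (pos + 1) & mask;
        }
        return pos;
    }

    boolean contains(char[] text, int offset, int len) {
        return slots[findSlot(text, offset, len)] != null;
    }

    void add(String word) {
        char[] key = word.toCharArray();
        int pos = findSlot(key, 0, key.length);
        if (slots[pos] != null) {
            return; // already present
        }
        slots[pos] = key;
        if (++count * 2 > slots.length) {
            rehash(); // keep the load factor at or below 0.5 so probing always finds an empty slot
        }
    }

    private void rehash() {
        char[][] old = slots;
        slots = new char[old.length * 2][];
        for (char[] key : old) {
            if (key != null) {
                slots[findSlot(key, 0, key.length)] = key;
            }
        }
    }
}
```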
3393 ======================= Release 2.3.0 =======================
3395 Changes in runtime behavior
3397 1. LUCENE-994: Defaults for IndexWriter have been changed to maximize
3398 out-of-the-box indexing speed. First, IndexWriter now flushes by
3399 RAM usage (16 MB by default) instead of a fixed doc count (call
3400 IndexWriter.setMaxBufferedDocs to get backwards compatible
3401 behavior). Second, ConcurrentMergeScheduler is used to run merges
3402 using background threads (call IndexWriter.setMergeScheduler(new
3403 SerialMergeScheduler()) to get backwards compatible behavior).
3404 Third, merges are chosen based on size in bytes of each segment
3405 rather than document count of each segment (call
3406 IndexWriter.setMergePolicy(new LogDocMergePolicy()) to get
3407 backwards compatible behavior).
3409 NOTE: users of ParallelReader must change back all of these
3410 defaults in order to ensure the docIDs "align" across all parallel
3415 2. LUCENE-1045: SortField.AUTO didn't work with long. When detecting
3416 the field type for sorting automatically, numbers used to be
3417 interpreted as int, then as float, if parsing the number as an int
3418 failed. Now the detection checks for int, then for long,
3419 then for float. (Daniel Naber)
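The detection order from LUCENE-1045 above can be sketched as a chain of parse attempts. This is a hypothetical illustration, not Lucene's FieldCache code. The fallback to string sorting for non-numeric text is an assumption of the sketch.

```java
// Hypothetical sketch of SortField.AUTO-style type detection after LUCENE-1045.
class AutoSortTypeDetector {
    enum SortType { INT, LONG, FLOAT, STRING }

    // Tries the narrowest numeric type first: int, then long, then float.
    static SortType detect(String termText) {
        try {
            Integer.parseInt(termText);
            return SortType.INT;
        } catch (NumberFormatException notAnInt) {
            // too large for int, or not an integer at all
        }
        try {
            Long.parseLong(termText);
            return SortType.LONG; // before LUCENE-1045 this case fell through to float
        } catch (NumberFormatException notALong) {
            // not an integer that fits in a long
        }
        try {
            Float.parseFloat(termText);
            return SortType.FLOAT;
        } catch (NumberFormatException notAFloat) {
            return SortType.STRING; // assumed fallback for non-numeric terms
        }
    }

    public static void main(String[] args) {
        System.out.println(detect("42"));         // INT
        System.out.println(detect("3000000000")); // LONG (exceeds Integer.MAX_VALUE)
        System.out.println(detect("2.5"));        // FLOAT
        System.out.println(detect("apple"));      // STRING
    }
}
```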
3423 1. LUCENE-843: Added IndexWriter.setRAMBufferSizeMB(...) to have
3424 IndexWriter flush whenever the buffered documents are using more
3425 than the specified amount of RAM. Also added new APIs to Token
3426 that allow one to set a char[] plus offset and length to specify a
3427 token (to avoid creating a new String() for each Token). (Mike
3430 2. LUCENE-963: Add setters to Field to allow for re-using a single
3431 Field instance during indexing. This is a sizable performance
3432 gain, especially for small documents. (Mike McCandless)
3434 3. LUCENE-969: Add new APIs to Token, TokenStream and Analyzer to
3435 permit re-using of Token and TokenStream instances during
3436 indexing. Changed Token to use a char[] as the store for the
3437 termText instead of String. This gives faster tokenization
3438 performance (~10-15%). (Mike McCandless)
3440 4. LUCENE-847: Factored MergePolicy, which determines which merges
3441 should take place and when, as well as MergeScheduler, which
3442 determines when the selected merges should actually run, out of
3443 IndexWriter. The default merge policy is now
3444 LogByteSizeMergePolicy (see LUCENE-845) and the default merge
3445 scheduler is now ConcurrentMergeScheduler (see
3446 LUCENE-870). (Steven Parkes via Mike McCandless)
3448 5. LUCENE-1052: Add IndexReader.setTermInfosIndexDivisor(int) method
3449 that allows you to reduce memory usage of the termInfos by further
3450 sub-sampling (over the termIndexInterval that was used during
3451 indexing) which terms are loaded into memory. (Chuck Williams,
3452 Doug Cutting via Mike McCandless)
3454 6. LUCENE-743: Add IndexReader.reopen() method that re-opens an
3455 existing IndexReader (see New features -> 8.) (Michael Busch)
3457 7. LUCENE-1062: Add setData(byte[] data),
3458 setData(byte[] data, int offset, int length), getData(), getOffset()
3459 and clone() methods to o.a.l.index.Payload. Also add the field name
3460 as arg to Similarity.scorePayload(). (Michael Busch)
3462 8. LUCENE-982: Add IndexWriter.optimize(int maxNumSegments) method to
3463 "partially optimize" an index down to maxNumSegments segments.
3466 9. LUCENE-1080: Changed Token.DEFAULT_TYPE to be public.
3468 10. LUCENE-1064: Changed TopDocs constructor to be public.
3469 (Shai Erera via Michael Busch)
3471 11. LUCENE-1079: DocValues cleanup: constructor now has no params,
3472 and getInnerArray() now throws UnsupportedOperationException (Doron Cohen)
3474 12. LUCENE-1089: Added PriorityQueue.insertWithOverflow, which returns
3475 the Object (if any) that was bumped from the queue to allow
3476 re-use. (Shai Erera via Mike McCandless)
3478 13. LUCENE-1101: Token reuse 'contract' (defined in LUCENE-969)
3479 modified so that it is the token producer's responsibility
3480 to call Token.clear(). (Doron Cohen)
3482 14. LUCENE-1118: Changed StandardAnalyzer to skip too-long (default >
3483 255 characters) tokens. You can increase this limit by calling
3484 StandardAnalyzer.setMaxTokenLength(...). (Michael McCandless)
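The insertWithOverflow contract from LUCENE-1089 above can be sketched on top of java.util.PriorityQueue, which is a min-heap. The BoundedQueue class is hypothetical and is not Lucene's own PriorityQueue. It illustrates the purpose of the change: when an insert into a full queue leaves an object out, that object is handed back so the caller can re-use it instead of allocating a new one.

```java
import java.util.PriorityQueue;

// Hypothetical bounded queue illustrating the LUCENE-1089 contract.
class BoundedQueue<T extends Comparable<T>> {
    private final PriorityQueue<T> heap = new PriorityQueue<>(); // smallest element at the head
    private final int maxSize;

    BoundedQueue(int maxSize) {
        this.maxSize = maxSize;
    }

    // Returns null if the element was added and nothing was bumped;
    // otherwise returns the object that did not make it: either the
    // previous smallest element (evicted) or 'element' itself (rejected).
    T insertWithOverflow(T element) {
        if (heap.size() < maxSize) {
            heap.add(element);
            return null;
        }
        T smallest = heap.peek();
        if (smallest != null && element.compareTo(smallest) > 0) {
            heap.poll();
            heap.add(element);
            return smallest;
        }
        return element;
    }

    int size() {
        return heap.size();
    }
}
```

For example, with maxSize 2 and inserts of 5, 1 and 3, the third call returns the evicted 1. A caller collecting top hits could refill that object and pass it to the next insert.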
3489 1. LUCENE-933: QueryParser fixed to not produce empty sub
3490 BooleanQueries "()" even if the Analyzer produced no
3491 tokens for input. (Doron Cohen)
3493 2. LUCENE-955: Fixed SegmentTermPositions to work correctly with the
3494 first term in the dictionary. (Michael Busch)
3496 3. LUCENE-951: Fixed NullPointerException in MultiLevelSkipListReader
3497 that was thrown after a call of TermPositions.seek().
3498 (Rich Johnson via Michael Busch)
3500 4. LUCENE-938: Fixed cases where an unhandled exception in
3501 IndexWriter's methods could cause deletes to be lost.
3502 (Steven Parkes via Mike McCandless)
3504 5. LUCENE-962: Fixed case where an unhandled exception in
3505 IndexWriter.addDocument or IndexWriter.updateDocument could cause
3506 unreferenced files in the index to not be deleted
3507 (Steven Parkes via Mike McCandless)
3509 6. LUCENE-957: RAMDirectory fixed to properly handle directories
3510 larger than Integer.MAX_VALUE. (Doron Cohen)
3512 7. LUCENE-781: MultiReader fixed to not throw NPE if isCurrent(),
3513 isOptimized() or getVersion() is called. Separated MultiReader
3514 into two classes: MultiSegmentReader extends IndexReader, is
3515 package-protected and is created automatically by IndexReader.open()
3516 in case the index has multiple segments. The public MultiReader
3517 now extends MultiSegmentReader and is intended to be used by users
3518 who want to add their own subreaders. (Daniel Naber, Michael Busch)
3520 8. LUCENE-970: FilterIndexReader now implements isOptimized(). Before,
3521 a call to isOptimized() would throw an NPE. (Michael Busch)
3523 9. LUCENE-832: ParallelReader fixed to not throw NPE if isCurrent(),
3524 isOptimized() or getVersion() is called. (Michael Busch)
3526 10. LUCENE-948: Fix FNFE exception caused by stale NFS client
3527 directory listing caches when writers on different machines are
3528 sharing an index over NFS and using a custom deletion policy (Mike
3531 11. LUCENE-978: Ensure TermInfosReader, FieldsReader, and FieldsReader
3532 close any streams they had opened if an exception is hit in the
3533 constructor. (Ning Li via Mike McCandless)
3535 12. LUCENE-985: If an extremely long term is in a doc (> 16383 chars),
3536 we now throw an IllegalArgumentException saying the term is too
3537 long, instead of a cryptic ArrayIndexOutOfBoundsException. (Karl
3538 Wettin via Mike McCandless)
3540 13. LUCENE-991: The explain() method of BoostingTermQuery had errors
3541 when no payloads were present on a document. (Peter Keegan via
3544 14. LUCENE-992: Fixed IndexWriter.updateDocument to be atomic again
3545 (this was broken by LUCENE-843). (Ning Li via Mike McCandless)
3547 15. LUCENE-1008: Fixed corruption case when document with no term
3548 vector fields is added after documents with term vector fields.
3549 This bug was introduced with LUCENE-843. (Grant Ingersoll via
3552 16. LUCENE-1006: Fixed QueryParser to accept a "" field value (zero
3553 length quoted string.) (yonik)
3555 17. LUCENE-1010: Fixed corruption case when document with no term
3556 vector fields is added after documents with term vector fields.
3557 This case is hit during merge and would cause an EOFException.
3558 This bug was introduced with LUCENE-984. (Andi Vajda via Mike
3561 19. LUCENE-1009: Fix merge slowdown with LogByteSizeMergePolicy when
3562 autoCommit=false and documents are using stored fields and/or term
3563 vectors. (Mark Miller via Mike McCandless)
3565 20. LUCENE-1011: Fixed corruption case when two or more machines,
3566 sharing an index over NFS, can be writers in quick succession.
3567 (Patrick Kimber via Mike McCandless)
3569 21. LUCENE-1028: Fixed Weight serialization for a few queries:
3570 DisjunctionMaxQuery, ValueSourceQuery, CustomScoreQuery.
3571 Serialization check added for all queries.
3572 (Kyle Maxwell via Doron Cohen)
3574 22. LUCENE-1048: Fixed incorrect behavior in Lock.obtain(...) when the
3575 timeout argument is very large (eg Long.MAX_VALUE). Also added
3576 Lock.LOCK_OBTAIN_WAIT_FOREVER constant to never time out. (Nikolay
3577 Diakov via Mike McCandless)
3579 23. LUCENE-1050: Throw LockReleaseFailedException in
3580 Simple/NativeFSLockFactory if we fail to delete the lock file when
3581 releasing the lock. (Nikolay Diakov via Mike McCandless)
3583 24. LUCENE-1071: Fixed SegmentMerger to correctly set payload bit in
3584 the merged segment. (Michael Busch)
3586 25. LUCENE-1042: Remove throwing of IOException in getTermFreqVector(int, String, TermVectorMapper) to be consistent
3587 with other getTermFreqVector calls. Also removed the throwing of the other IOException in that method to be consistent. (Karl Wettin via Grant Ingersoll)
3589 26. LUCENE-1096: Fixed Hits behavior when hits' docs are deleted
3590 while iterating over the hits. Deleting docs already retrieved
3591 now works seamlessly. If docs not yet retrieved are deleted
3592 (e.g. from another thread), and then, relying on the initial
3593 Hits.length(), an application attempts to retrieve more hits
3594 than actually exist, a ConcurrentModificationException
3595 is thrown. (Doron Cohen)
3597 27. LUCENE-1068: Changed StandardTokenizer to fix an issue with it marking
3598 the type of some tokens incorrectly. This is done by adding a new flag named
3599 replaceInvalidAcronym which defaults to false, the current, incorrect behavior. Setting
3600 this flag to true fixes the problem. This flag is a temporary fix and is already
3601 marked as being deprecated. 3.x will implement the correct approach. (Shai Erera via Grant Ingersoll)
3602 LUCENE-1140: Fixed NPE caused by LUCENE-1068 (Alexei Dets via Grant Ingersoll)
3604 28. LUCENE-749: ChainedFilter behavior fixed when logic of
3605 first filter is ANDNOT. (Antonio Bruno via Doron Cohen)
3607 29. LUCENE-508: Make sure SegmentTermEnum.prev() is accurate (= last
3608 term) after next() returns false. (Steven Tamm via Mike
3614 1. LUCENE-906: Elision filter for French.
3615 (Mathieu Lecarme via Otis Gospodnetic)
3617 2. LUCENE-960: Added a SpanQueryFilter and related classes to allow for
3618 not only filtering, but knowing where in a Document a Filter matches
3621 3. LUCENE-868: Added new Term Vector access features. New callback
3622 mechanism allows application to define how and where to read Term
3623 Vectors from disk. This implementation contains several extensions
3624 of the new abstract TermVectorMapper class. The new API should be
3625 back-compatible. No changes in the actual storage of Term Vectors
3627 3.1 LUCENE-1038: Added setDocumentNumber() method to TermVectorMapper
3628 to provide information about what document is being accessed.
3629 (Karl Wettin via Grant Ingersoll)
3631 4. LUCENE-975: Added PositionBasedTermVectorMapper that allows for
3632 position based lookup of term vector information.
3633 See item #3 above (LUCENE-868).
3635 5. LUCENE-1011: Added simple tools (all in org.apache.lucene.store)
3636 to verify that locking is working properly. LockVerifyServer runs
3637 a separate server to verify locks. LockStressTest runs a simple
3638 tool that rapidly obtains and releases locks.
3639 VerifyingLockFactory is a LockFactory that wraps any other
3640 LockFactory and consults the LockVerifyServer whenever a lock is
3641 obtained or released, throwing an exception if an illegal lock
3642 obtain occurred. (Patrick Kimber via Mike McCandless)
3644 6. LUCENE-1015: Added FieldCache extension (ExtendedFieldCache) to
3645 support doubles and longs. Added support into SortField for sorting
3646 on doubles and longs as well. (Grant Ingersoll)
3648 7. LUCENE-1020: Created basic index checking & repair tool
3649 (o.a.l.index.CheckIndex). When run without -fix it does a
3650 detailed test of all segments in the index and reports summary
3651 information and any errors it hit. With -fix it will remove
3652 segments that had errors. (Mike McCandless)
3654 8. LUCENE-743: Add IndexReader.reopen() method that re-opens an
3655 existing IndexReader by only loading those portions of an index
3656 that have changed since the reader was (re)opened. reopen() can
3657 be significantly faster than open(), depending on the amount of
3658 index changes. SegmentReader, MultiSegmentReader, MultiReader,
3659 and ParallelReader implement reopen(). (Michael Busch)
3661 9. LUCENE-1040: Added CharArraySet, useful for efficiently checking
3662 set membership of text specified by char[]. (yonik)
3664 10. LUCENE-1073: Created SnapshotDeletionPolicy to facilitate taking a
3665 live backup of an index without pausing indexing. (Mike
3668 11. LUCENE-1019: CustomScoreQuery enhanced to support multiple
3669 ValueSource queries. (Kyle Maxwell via Doron Cohen)
3671 12. LUCENE-1095: Added an option to StopFilter to increase
3672 positionIncrement of the token succeeding a stopped token.
3673 Disabled by default. Similar option added to QueryParser
3674 to consider token positions when creating PhraseQuery
3675 and MultiPhraseQuery. Disabled by default (so by default
3676 the query parser ignores position increments).
3679 13. LUCENE-1380: Added TokenFilter for setting position increment in special cases related to the ShingleFilter (Mck SembWever, Steve Rowe, Karl Wettin via Grant Ingersoll)
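The LUCENE-1095 option above can be sketched as follows. When a stop word is removed, its position is optionally added to the position increment of the next surviving token, so phrase matching can see that a gap exists. StopWordPositions, Tok and the enablePositionIncrements parameter are hypothetical names for this sketch. They are not Lucene's StopFilter API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of stop-word removal with optional position increments.
class StopWordPositions {
    static final class Tok {
        final String text;
        final int positionIncrement; // 1 = directly follows the previous kept token

        Tok(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }
    }

    static List<Tok> filter(List<String> words, Set<String> stopWords,
                            boolean enablePositionIncrements) {
        List<Tok> kept = new ArrayList<>();
        int skipped = 0; // stop words removed since the last kept token
        for (String word : words) {
            if (stopWords.contains(word)) {
                skipped++;
                continue;
            }
            // With the option disabled (the default per LUCENE-1095), gaps are ignored.
            int increment = enablePositionIncrements ? 1 + skipped : 1;
            kept.add(new Tok(word, increment));
            skipped = 0;
        }
        return kept;
    }
}
```

For the words "the", "quick", "fox" with "the" as a stop word, enabling the option gives "quick" an increment of 2. Leaving it disabled gives 1.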
3685 1. LUCENE-937: CachingTokenFilter now uses an iterator to access the
3686 Tokens that are cached in the LinkedList. This increases performance
3687 significantly, especially when the number of Tokens is large.
3688 (Mark Miller via Michael Busch)
3690 2. LUCENE-843: Substantial optimizations to improve how IndexWriter
3691 uses RAM for buffering documents and to speed up indexing (2X-8X
3692 faster). A single shared hash table now records the in-memory
3693 postings per unique term and is directly flushed into a single
3694 segment. (Mike McCandless)
3696 3. LUCENE-892: Fixed extra "buffer to buffer copy" that sometimes
3697 takes place when using compound files. (Mike McCandless)
3699 4. LUCENE-959: Remove synchronization in Document (yonik)
3701 5. LUCENE-963: Add setters to Field to allow for re-using a single
3702 Field instance during indexing. This is a sizable performance
3703 gain, especially for small documents. (Mike McCandless)
3705 6. LUCENE-939: Check explicitly for boundary conditions in FieldInfos
3706 and don't rely on exceptions. (Michael Busch)
3708 7. LUCENE-966: Very substantial speedups (~6X faster) for
3709 StandardTokenizer (StandardAnalyzer) by using JFlex instead of
3710 JavaCC to generate the tokenizer.
3711 (Stanislaw Osinski via Mike McCandless)
3713 8. LUCENE-969: Changed core tokenizers & filters to re-use Token and
3714 TokenStream instances when possible to improve tokenization
3715 performance (~10-15%). (Mike McCandless)
3717 9. LUCENE-871: Speed up ISOLatin1AccentFilter (Ian Boston via Mike
3720 10. LUCENE-986: Refactored SegmentInfos from IndexReader into the new
3721 subclass DirectoryIndexReader. SegmentReader and MultiSegmentReader
3722 now extend DirectoryIndexReader and are the only IndexReader
3723 implementations that use SegmentInfos to access an index and
3724 acquire a write lock for index modifications. (Michael Busch)
3726 11. LUCENE-1007: Allow flushing in IndexWriter to be triggered by
3727 either RAM usage or document count or both (whichever comes
3728 first), by adding symbolic constant DISABLE_AUTO_FLUSH to disable
3729 one of the flush triggers. (Ning Li via Mike McCandless)
3731 12. LUCENE-1043: Speed up merging of stored fields by bulk-copying the
3732 raw bytes for each contiguous range of non-deleted documents.
3733 (Robert Engels via Mike McCandless)
3735 13. LUCENE-693: Speed up nested conjunctions (~2x) that match many
3736 documents, and a slight performance increase for top level
3737 conjunctions. (yonik)
3739 14. LUCENE-1098: Make inner class StandardAnalyzer.SavedStreams static
3740 and final. (Nathan Beyer via Michael Busch)
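The flush rule from LUCENE-1007 above amounts to "flush when either enabled limit is reached". The following is a minimal sketch that assumes a sentinel value of -1 for DISABLE_AUTO_FLUSH and assumes that disabling both triggers is rejected. FlushTrigger is a hypothetical class, not IndexWriter's code.

```java
// Hypothetical sketch of the RAM-or-doc-count flush decision.
class FlushTrigger {
    static final int DISABLE_AUTO_FLUSH = -1; // assumed sentinel value

    private final double ramBufferSizeMB; // DISABLE_AUTO_FLUSH turns off the RAM trigger
    private final int maxBufferedDocs;    // DISABLE_AUTO_FLUSH turns off the doc-count trigger

    FlushTrigger(double ramBufferSizeMB, int maxBufferedDocs) {
        if (ramBufferSizeMB == DISABLE_AUTO_FLUSH && maxBufferedDocs == DISABLE_AUTO_FLUSH) {
            // assumption: at least one trigger must stay enabled
            throw new IllegalArgumentException("at least one flush trigger must be enabled");
        }
        this.ramBufferSizeMB = ramBufferSizeMB;
        this.maxBufferedDocs = maxBufferedDocs;
    }

    // True as soon as whichever enabled limit comes first is reached.
    boolean shouldFlush(long bufferedBytes, int bufferedDocs) {
        boolean ramFull = ramBufferSizeMB != DISABLE_AUTO_FLUSH
                && bufferedBytes >= (long) (ramBufferSizeMB * 1024 * 1024);
        boolean docsFull = maxBufferedDocs != DISABLE_AUTO_FLUSH
                && bufferedDocs >= maxBufferedDocs;
        return ramFull || docsFull;
    }
}
```

With these assumptions, new FlushTrigger(16.0, 1000) flushes at 16 MB or at 1,000 buffered documents, whichever comes first. Passing DISABLE_AUTO_FLUSH as the second argument gives a RAM-only policy.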
3744 1. LUCENE-1051: Generate separate javadocs for core, demo and contrib
3745 classes, as well as a unified view. Also add an appropriate menu
3746 structure to the website. (Michael Busch)
3748 2. LUCENE-746: Fix error message in AnalyzingQueryParser.getPrefixQuery.
3749 (Ronnie Kolehmainen via Michael Busch)
3753 1. LUCENE-908: Improvements and simplifications for how the MANIFEST
3754 file and the META-INF dir are created. (Michael Busch)
3756 2. LUCENE-935: Various improvements for the maven artifacts. Now the
3757 artifacts also include the sources as .jar files. (Michael Busch)
3759 3. Added apply-patch target to top-level build. Defaults to looking for
3760 a patch in ${basedir}/../patches with name specified by -Dpatch.name.
3761 Can also specify any location by -Dpatch.file property on the command
3762 line. This should be helpful for easy application of patches, but it
3763 is also a step towards integrating automatic patch application with
3764 JIRA and Hudson, and is thus subject to change. (Grant Ingersoll)
3766 4. LUCENE-935: Defined property "m2.repository.url" to allow setting
3767 the url to a maven remote repository to deploy to. (Michael Busch)
3769 5. LUCENE-1051: Include javadocs in the maven artifacts. (Michael Busch)
3771 6. LUCENE-1055: Remove gdata-server from build files and its sources
3772 from trunk. (Michael Busch)
3774 7. LUCENE-935: Allow deploying maven artifacts to a remote m2 repository
3775 via scp and ssh authentication. (Michael Busch)
3777 8. LUCENE-1123: Allow overriding the specification version for
3778 MANIFEST.MF (Michael Busch)
3782 1. LUCENE-766: Test adding two fields with the same name but different
3783 term vector setting. (Nicolas Lalevée via Doron Cohen)
3785 ======================= Release 2.2.0 =======================
3787 Changes in runtime behavior
3791 1. LUCENE-793: created new exceptions and added them to the throws clause
3792 for many methods (all subclasses of IOException for backwards
3793 compatibility): index.StaleReaderException,
3794 index.CorruptIndexException, store.LockObtainFailedException.
3795 This was done to better call out the possible root causes of an
3796 IOException from these methods. (Mike McCandless)
3798 2. LUCENE-811: make SegmentInfos class, plus a few methods from related
3799 classes, package-private again (they were unnecessarily made public
3800 as part of LUCENE-701). (Mike McCandless)
3802 3. LUCENE-710: added optional autoCommit boolean to IndexWriter
3803 constructors. When this is false, index changes are not committed
3804 until the writer is closed. This gives explicit control over when
3805 a reader will see the changes. Also added optional custom
3806 deletion policy to explicitly control when prior commits are
3807 removed from the index. This is intended to allow applications to
3808 share an index over NFS by customizing when prior commits are
3809 deleted. (Mike McCandless)
3811 4. LUCENE-818: changed most public methods of IndexWriter,
3812 IndexReader (and its subclasses), FieldsReader and RAMDirectory to
3813 throw AlreadyClosedException if they are accessed after being
3814 closed. (Mike McCandless)
3816 5. LUCENE-834: Changed some access levels for certain Span classes to allow them
3817 to be overridden. They have been marked expert only and not for public
3818 consumption. (Grant Ingersoll)
3820 6. LUCENE-796: Removed calls to super.* from various get*Query methods in
3821 MultiFieldQueryParser, in order to allow sub-classes to override them.
3822 (Steven Parkes via Otis Gospodnetic)
3824 7. LUCENE-857: Removed caching from QueryFilter and deprecated QueryFilter
3825 in favour of QueryWrapperFilter or QueryWrapperFilter + CachingWrapperFilter
3826 combination when caching is desired.
3827 (Chris Hostetter, Otis Gospodnetic)
3829 8. LUCENE-869: Changed FSIndexInput and FSIndexOutput to inner classes of FSDirectory
3830 to enable extensibility of these classes. (Michael Busch)
3832 9. LUCENE-580: Added the public method reset() to TokenStream. This method does
3833 nothing by default, but may be overridden by subclasses to support consuming
3834 the TokenStream more than once. (Michael Busch)
3836 10. LUCENE-580: Added a new constructor to Field that takes a TokenStream as
3837 argument, available as tokenStreamValue(). This is useful to avoid the need for
3838 "dummy analyzers" for pre-analyzed fields. (Karl Wettin, Michael Busch)
3840 11. LUCENE-730: Added the new methods setAllowDocsOutOfOrder() and
3841 getAllowDocsOutOfOrder() to BooleanQuery. Deprecated the methods setUseScorer14() and
3842 getUseScorer14(). The optimization patch LUCENE-730 (see Optimizations->3.)
3843 improves performance for certain queries but results in scoring out of docid
3844 order. This patch reverses this change, so now by default hit docs are scored
3845 in docid order unless setAllowDocsOutOfOrder(true) is explicitly called.
3846 This patch also re-enables the tests in QueryUtils that check for docid
3847 order. (Paul Elschot, Doron Cohen, Michael Busch)
3849 12. LUCENE-888: Added Directory.openInput(File path, int bufferSize)
3850 to optionally specify the size of the read buffer. Also added
3851 BufferedIndexInput.setBufferSize(int) to change the buffer size.
3854 13. LUCENE-923: Make SegmentTermPositionVector package-private. It does not need
3855 to be public because it implements the public interface TermPositionVector.
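The kind of behavior the LUCENE-580 reset() hook above allows can be sketched with a minimal stream class. The base reset() does nothing, and a subclass that holds its tokens overrides it to rewind. SimpleTokenStream and ListTokenStream are hypothetical classes, not Lucene's TokenStream.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal token stream; next() returns null when exhausted.
abstract class SimpleTokenStream {
    abstract String next();

    // Default: does nothing, so a plain stream can only be consumed once.
    void reset() {
    }

    // Reads the stream to the end and returns how many tokens it produced.
    static int drain(SimpleTokenStream stream) {
        int count = 0;
        while (stream.next() != null) {
            count++;
        }
        return count;
    }
}

// A stream over an in-memory list that supports being consumed more than once.
class ListTokenStream extends SimpleTokenStream {
    private final List<String> tokens;
    private int upto = 0;

    ListTokenStream(List<String> tokens) {
        this.tokens = new ArrayList<>(tokens); // defensive copy
    }

    @Override
    String next() {
        return upto < tokens.size() ? tokens.get(upto++) : null;
    }

    @Override
    void reset() {
        upto = 0; // rewind so the same tokens can be read again
    }
}
```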
3860 1. LUCENE-804: Fixed build.xml to pack a fully compilable src dist. (Doron Cohen)
3862 2. LUCENE-813: Leading wildcard fixed to work with trailing wildcard.
3863 Query parser modified to create a prefix query only for the case
3864 that there is a single trailing wildcard (and no additional wildcard
3865 or '?' in the query text). (Doron Cohen)
3867 3. LUCENE-812: Add no-argument constructors to NativeFSLockFactory
3868 and SimpleFSLockFactory. This enables all 4 built-in LockFactory
3869 implementations to be specified via the System property
3870 org.apache.lucene.store.FSDirectoryLockFactoryClass. (Mike McCandless)
3872 4. LUCENE-821: The new single-norm-file introduced by LUCENE-756
3873 failed to reduce the number of open descriptors since it was still
3874 opened once per field with norms. (yonik)
3876 5. LUCENE-823: Make sure internal file handles are closed when
3877 hitting an exception (eg disk full) while flushing deletes in
3878 IndexWriter's mergeSegments, and also during
3879 IndexWriter.addIndexes. (Mike McCandless)
3881 6. LUCENE-825: If directory is removed after
3882 FSDirectory.getDirectory() but before IndexReader.open you now get
3883 a FileNotFoundException like Lucene pre-2.1 (before this fix you
3884 got an NPE). (Mike McCandless)
3886 7. LUCENE-800: Removed backslash from the TERM_CHAR list in the queryparser,
3887 because the backslash is the escape character. Also changed the ESCAPED_CHAR
3888 list to contain all possible characters, because every character that
3889 follows a backslash should be considered as escaped. (Michael Busch)
3891 8. LUCENE-372: QueryParser.parse() now ensures that the entire input string
3892 is consumed. Now a ParseException is thrown if a query contains too many
3893 closing parentheses. (Andreas Neumann via Michael Busch)
3895 9. LUCENE-814: javacc build targets now fix line-end-style of generated files.
3896 Now also deleting all javacc generated files before calling javacc.
3897 (Steven Parkes, Doron Cohen)
3899 10. LUCENE-829: close readers in contrib/benchmark. (Karl Wettin, Doron Cohen)
3901 11. LUCENE-828: Minor fix for Term's equals().
3902 (Paul Cowan via Otis Gospodnetic)
3904 12. LUCENE-846: Fixed: if IndexWriter is opened with autoCommit=false,
3905 and you call addIndexes, and hit an exception (eg disk full) then
3906 when IndexWriter rolls back its internal state this could corrupt
3907 the instance of IndexWriter (but not the index itself) by
3908 referencing already deleted segments. This bug was only present
3909 in 2.2 (trunk), i.e. it was never released. (Mike McCandless)
3911 13. LUCENE-736: Fixed sloppy phrase queries with repeating terms matching wrong docs.
3912 For example, the query "B C B"~2 matched the doc "A B C D E". (Doron Cohen)
3914 14. LUCENE-789: Fixed: custom similarity is ignored when using MultiSearcher (problem reported
3915 by Alexey Lef). Now the similarity applied by MultiSearcher.setSimilarity(sim) is used.
3916 Note that, as before this fix, creating a MultiSearcher from Searchers for which a custom similarity
3917 was set has no effect; it is masked by the similarity of the MultiSearcher. This is as
3918 designed, because MultiSearcher operates on Searchables (not Searchers). (Doron Cohen)
3920 15. LUCENE-880: Fixed DocumentWriter to close the TokenStreams after it
3921 has written the postings. Then the resources associated with the
3922 TokenStreams can safely be released. (Michael Busch)
3924 16. LUCENE-883: consecutive calls to Spellchecker.indexDictionary()
3925 won't insert terms twice anymore. (Daniel Naber)
3927 17. LUCENE-881: QueryParser.escape() now also escapes the characters
3928 '|' and '&' which are part of the queryparser syntax. (Michael Busch)
3930 18. LUCENE-886: Spellchecker cleanup: exceptions are no longer printed to STDERR
3931 and ignored, but re-thrown. Some javadoc improvements.
3934 19. LUCENE-698: FilteredQuery now takes the query boost into account for
3935 scoring. (Michael Busch)
3937 20. LUCENE-763: Spellchecker: LuceneDictionary used to skip first word in
3938 enumeration. (Christian Mallwitz via Daniel Naber)
3940 21. LUCENE-903: FilteredQuery explanation inaccuracy with boost.
3941 Explanation tests now "deep" check the explanation details.
3942 (Chris Hostetter, Doron Cohen)
3944 22. LUCENE-912: DisjunctionMaxScorer first skipTo(target) call ignores the
3945 skip target param and ends up at the first match.
3946 (Sudaakeran B. via Chris Hostetter & Doron Cohen)
3948 23. LUCENE-913: Two consecutive score() calls return different
3949 scores for Boolean Queries. (Michael Busch, Doron Cohen)
3951 24. LUCENE-1013: Fix IndexWriter.setMaxMergeDocs to work "out of the
3952 box", again, by moving set/getMaxMergeDocs up from
3953 LogDocMergePolicy into LogMergePolicy. This fixes the API
3954 breakage (non backwards compatible change) caused by LUCENE-994.
3955 (Yonik Seeley via Mike McCandless)
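The LUCENE-813 rule above can be sketched as a small classifier. A term becomes a prefix query only when its single wildcard is one trailing '*'. This is a hypothetical simplification that ignores backslash escapes, and it is not QueryParser's actual grammar.

```java
// Hypothetical sketch of choosing a query type for one term of query text.
class WildcardClassifier {
    enum Kind { TERM, PREFIX, WILDCARD }

    static Kind classify(String text) {
        int firstStar = text.indexOf('*');
        int firstQuestion = text.indexOf('?');
        if (firstStar < 0 && firstQuestion < 0) {
            return Kind.TERM; // no wildcard at all
        }
        // If the first '*' is the last character, it is the only '*'.
        boolean singleTrailingStar = firstQuestion < 0 && firstStar == text.length() - 1;
        return singleTrailingStar ? Kind.PREFIX : Kind.WILDCARD;
    }

    public static void main(String[] args) {
        System.out.println(classify("luc*"));   // PREFIX
        System.out.println(classify("*luc*"));  // WILDCARD (leading and trailing '*')
        System.out.println(classify("lu?e*"));  // WILDCARD ('?' present)
        System.out.println(classify("lucene")); // TERM
    }
}
```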
3959 1. LUCENE-759: Added two n-gram-producing TokenFilters.
3962 2. LUCENE-822: Added FieldSelector capabilities to Searchable for use with
3963 RemoteSearcher, and other Searchable implementations. (Mark Miller, Grant Ingersoll)
3965 3. LUCENE-755: Added the ability to store arbitrary binary metadata in the posting list.
3966 These metadata are called Payloads. For every position of a Token one Payload in the form
3967 of a variable length byte array can be stored in the prox file.
3968 Remark: The APIs introduced with this feature are in an experimental state and thus
3969 contain appropriate warnings in the javadocs.
3972 4. LUCENE-834: Added BoostingTermQuery which can boost scores based on the
3973 values of a payload (see #3 above.) (Grant Ingersoll)
3975 5. LUCENE-834: Similarity has a new method for scoring payloads called
3976 scorePayload that can be overridden to take advantage of payload
3977 storage (see #3 above)
3979 6. LUCENE-834: Added isPayloadAvailable() onto TermPositions interface and
3980 implemented it in the appropriate places (Grant Ingersoll)
3982 7. LUCENE-853: Added RemoteCachingWrapperFilter to enable caching of Filters
3983 on the remote side of the RMI connection.
3984 (Matt Ericson via Otis Gospodnetic)
3986 8. LUCENE-446: Added Solr's search.function for scores based on field
3987 values, plus CustomScoreQuery for simple score (post) customization.
3988 (Yonik Seeley, Doron Cohen)
3990 9. LUCENE-1058: Added new TeeTokenFilter (like the UNIX 'tee' command) and SinkTokenizer which can be used to share tokens between two or more
3991 Fields such that the other Fields do not have to go through the whole Analysis process over again. For instance, if you have two
3992 Fields that share all the same analysis steps except one lowercases tokens and the other does not, you can coordinate the operations
3993 between the two using the TeeTokenFilter and the SinkTokenizer. See TeeSinkTokenTest.java for examples.
3994 (Grant Ingersoll, Michael Busch, Yonik Seeley)
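The tee/sink idea behind LUCENE-1058 above can be sketched with plain lists. Tokens from the shared analysis steps are copied into a sink as they pass through, and the second field reads the sink instead of re-analyzing. TeeSinkSketch and its methods are hypothetical. The real TeeTokenFilter and SinkTokenizer work on Lucene token streams, not lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical list-based sketch of sharing tokens between two fields.
class TeeSinkSketch {
    // Stand-in for the analysis steps both fields share (here: whitespace split).
    static List<String> sharedAnalysis(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // The "tee": passes tokens through unchanged while saving a copy in the sink.
    static List<String> tee(List<String> tokens, List<String> sink) {
        List<String> passedThrough = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            sink.add(token);
            passedThrough.add(token);
        }
        return passedThrough;
    }

    // The extra step only the first field applies.
    static List<String> lowercase(List<String> tokens) {
        List<String> result = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            result.add(token.toLowerCase(Locale.ROOT));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sink = new ArrayList<>();
        List<String> lowercased = lowercase(tee(sharedAnalysis("Apache Lucene"), sink));
        System.out.println(lowercased); // [apache, lucene] (field that lowercases)
        System.out.println(sink);       // [Apache, Lucene] (field that keeps case)
    }
}
```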
3998 1. LUCENE-761: The proxStream is now cloned lazily in SegmentTermPositions
3999 when nextPosition() is called for the first time. This allows using instances
4000 of SegmentTermPositions instead of SegmentTermDocs without additional costs.
4003 2. LUCENE-431: RAMInputStream and RAMOutputStream extend IndexInput and
4004 IndexOutput directly now. This avoids further buffering and thus avoids
4005 unnecessary array copies. (Michael Busch)
4007 3. LUCENE-730: Updated BooleanScorer2 to make use of BooleanScorer in some
4008 cases and possibly improve scoring performance. Documents can now be
4009 delivered out-of-order as they are scored (e.g. to HitCollector).
4010 N.B. A bit of code had to be disabled in QueryUtils in order for
4011 TestBoolean2 test to keep passing.
4012 (Paul Elschot via Otis Gospodnetic)
4014 4. LUCENE-882: Spellchecker doesn't store the ngrams anymore but only indexes
4015 them to keep the spell index small. (Daniel Naber)
4017 5. LUCENE-430: Delay allocation of the buffer after a clone of BufferedIndexInput.
4018 Together with LUCENE-888 this will allow adjusting the buffer size
4019 dynamically. (Paul Elschot, Michael Busch)
4021 6. LUCENE-888: Increase buffer sizes inside CompoundFileWriter and
4022 BufferedIndexOutput. Also increase buffer size in
4023 BufferedIndexInput, but only when used during merging. Together,
4024 these increases yield 10-18% overall performance gain vs the
4025 previous 1K defaults. (Mike McCandless)
4027 7. LUCENE-866: Adds multi-level skip lists to the posting lists. This speeds
4028 up most queries that use skipTo(), especially on big indexes with large posting
4029 lists. For average AND queries the speedup is about 20%, for queries that
4030 contain very frequent and very unique terms the speedup can be over 80%.
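The skipping behind LUCENE-866 above can be illustrated with a single level of skip entries. The multi-level lists in that change stack several such levels so that long posting lists can be crossed in fewer steps. SkipPostings is a hypothetical in-memory class, not Lucene's on-disk format.

```java
// Hypothetical in-memory posting list with one level of skip entries.
class SkipPostings {
    private final int[] docs;   // sorted, ascending doc IDs
    private final int interval; // a skip entry every 'interval' postings
    private int pos = -1;       // current index; -1 before the first skipTo

    SkipPostings(int[] docs, int interval) {
        if (interval < 1) {
            throw new IllegalArgumentException("interval must be >= 1");
        }
        this.docs = docs;
        this.interval = interval;
    }

    // Moves to the first doc >= target (never backwards) and returns it,
    // or returns -1 once the list is exhausted.
    int skipTo(int target) {
        int start = Math.max(pos, 0);
        // 1) Follow skip entries: jump block by block while the next
        //    block's first doc is still <= target.
        int block = start / interval;
        while ((block + 1) * interval < docs.length
                && docs[(block + 1) * interval] <= target) {
            block++;
        }
        // 2) Scan linearly inside the block that may contain target.
        int i = Math.max(start, block * interval);
        while (i < docs.length && docs[i] < target) {
            i++;
        }
        pos = i;
        return i < docs.length ? docs[i] : -1;
    }
}
```

With doc IDs {1, 3, 5, 7, 9, 11, 13, 15} and an interval of 3, a first call to skipTo(14) follows the skip entries at 7 and 13. It then steps forward one posting to reach 15.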
4035 1. LUCENE-791 && INFRA-1173: Infrastructure moved the Wiki to
4036 http://wiki.apache.org/lucene-java/ Updated the links in the docs and
4037 wherever else I found references. (Grant Ingersoll, Joe Schaefer)
4039 2. LUCENE-807: Fixed the javadoc for ScoreDocComparator.compare() to be
4040 consistent with java.util.Comparator.compare(): Any integer is allowed to
4041 be returned instead of only -1/0/1.
4042 (Paul Cowan via Michael Busch)
4044 3. LUCENE-875: Solved javadoc warnings & errors under jdk1.4.
4045 Solved javadoc errors under jdk5 (jars in path for gdata).
4046 Made "javadocs" target depend on "build-contrib" for first downloading
4047 contrib jars configured for dynamic download. (Note: when running
4048 behind a firewall, a firewall prompt might pop up) (Doron Cohen)
4050 4. LUCENE-740: Added SNOWBALL-LICENSE.txt to the snowball package and a
4051 remark about the license to NOTICE.TXT. (Steven Parkes via Michael Busch)
4053 5. LUCENE-925: Added analysis package javadocs. (Grant Ingersoll and Doron Cohen)
4055 6. LUCENE-926: Added document package javadocs. (Grant Ingersoll)
4059 1. LUCENE-802: Added LICENSE.TXT and NOTICE.TXT to Lucene jars.
4060 (Steven Parkes via Michael Busch)
4062 2. LUCENE-885: "ant test" now includes all contrib tests. The new
4063 "ant test-core" target can be used to run only the Core (non
4067 3. LUCENE-900: "ant test" now enables Java assertions (in Lucene packages).
4070 4. LUCENE-894: Add custom build file for binary distributions that includes
4071 targets to build the demos. (Chris Hostetter, Michael Busch)
4073 5. LUCENE-904: The "package" targets in build.xml now also generate .md5
4074 checksum files. (Chris Hostetter, Michael Busch)
4076 6. LUCENE-907: Include LICENSE.TXT and NOTICE.TXT in the META-INF dirs of
4077 demo war, demo jar, and the contrib jars. (Michael Busch)
4079 7. LUCENE-909: Demo targets for running the demo. (Doron Cohen)
4081 8. LUCENE-908: Improves content of MANIFEST file and makes it customizable
4082 for the contribs. Adds SNOWBALL-LICENSE.txt to META-INF of the snowball
4083 jar and makes sure that the lucli jar contains LICENSE.txt and NOTICE.txt.
4084 (Chris Hostetter, Michael Busch)
4086 9. LUCENE-930: Various contrib building improvements to ensure contrib
4087 dependencies are met, and test compilation errors fail the build.
4088 (Steven Parkes, Chris Hostetter)
4090 10. LUCENE-622: Add ant target and pom.xml files for building maven artifacts
4091 of the Lucene core and the contrib modules.
4092 (Sami Siren, Karl Wettin, Michael Busch)
4094 ======================= Release 2.1.0 =======================
4096 Changes in runtime behavior
4098 1. 's' and 't' have been removed from the list of default stopwords
4099 in StopAnalyzer (also used by StandardAnalyzer). Having e.g. 's'
4100 as a stopword meant that 's-class' led to the same results as 'class'.
4101 Note that this problem still exists for 'a', e.g. in 'a-class' as
4102 'a' continues to be a stopword.
4105 2. LUCENE-478: Updated the list of Unicode code point ranges for CJK
4106 (now split into CJ and K) in StandardAnalyzer. (John Wang and
4107 Steven Rowe via Otis Gospodnetic)
4109 3. Modified some CJK Unicode code point ranges in StandardTokenizer.jj,
4110 and added a few more of them to increase CJK character coverage.
4111 Also documented some of the ranges.
4114 4. LUCENE-489: Add support for leading wildcard characters (*, ?) to
4115 QueryParser. Default is to disallow them, as before.
4116 (Steven Parkes via Otis Gospodnetic)
4118 5. LUCENE-703: QueryParser now uses ConstantScoreRangeQuery by default
4119 for range queries. Added useOldRangeQuery property to QueryParser to allow
4120 selection of old RangeQuery class if required.
4123 6. LUCENE-543: WildcardQuery now performs a TermQuery if the provided term
4124 does not contain a wildcard character (? or *), whereas previously a
4125 StringIndexOutOfBoundsException was thrown.
4126 (Michael Busch via Erik Hatcher)
4128 7. LUCENE-726: Removed the use of deprecated doc.fields() method and
4130 (Michael Busch via Otis Gospodnetic)
4132 8. LUCENE-436: Removed finalize() in TermInfosReader and SegmentReader,
4133 and added a call to enumerators.remove() in TermInfosReader.close().
4134 The finalize() overrides were added to help with a pre-1.4.2 JVM bug
4135 that has since been fixed, plus we no longer support pre-1.4.2 JVMs.
4138 9. LUCENE-771: The default location of the write lock is now the
4139 index directory, and is named simply "write.lock" (without a big
4140 digest prefix). Neither the system property "org.apache.lucene.lockDir"
4141 nor "java.io.tmpdir" is used any longer as the global directory
4142 for storing lock files, and the LOCK_DIR field of FSDirectory is
4143 now deprecated. (Mike McCandless)
4147 1. LUCENE-503: New ThaiAnalyzer and ThaiWordFilter in contrib/analyzers
4148 (Samphan Raruenrom via Chris Hostetter)
4150 2. LUCENE-545: New FieldSelector API and associated changes to
4151 IndexReader and implementations. New Fieldable interface for use
4152 with the lazy field loading mechanism. (Grant Ingersoll and Chuck
4153 Williams via Grant Ingersoll)
4155 3. LUCENE-676: Move Solr's PrefixFilter to Lucene core. (Yura
4156 Smolsky, Yonik Seeley)
4158 4. LUCENE-678: Added NativeFSLockFactory, which implements locking
4159 using OS native locking (via java.nio.*). (Michael McCandless via
4162 5. LUCENE-544: Added the ability to specify different boosts for
4163 different fields when using MultiFieldQueryParser (Matt Ericson
4164 via Otis Gospodnetic)
4166 6. LUCENE-528: New IndexWriter.addIndexesNoOptimize() that doesn't
4167 optimize the index when adding new segments, only performing
4168 merges as needed. (Ning Li via Yonik Seeley)
4170 7. LUCENE-573: QueryParser now allows backslash escaping in
4171 quoted terms and phrases. (Michael Busch via Yonik Seeley)
4173 8. LUCENE-716: QueryParser now allows specification of Unicode
4174 characters in terms via a unicode escape of the form \uXXXX
4175 (Michael Busch via Yonik Seeley)
4177 9. LUCENE-709: Added RAMDirectory.sizeInBytes(), IndexWriter.ramSizeInBytes()
4178 and IndexWriter.flushRamSegments(), allowing applications to
4179 control the amount of memory used to buffer documents.
4180 (Chuck Williams via Yonik Seeley)
4182 10. LUCENE-723: QueryParser now parses *:* as MatchAllDocsQuery
4185 11. LUCENE-741: Command-line utility for modifying or removing norms
4186 on fields in an existing index. This is mostly based on LUCENE-496
4187 and lives in contrib/miscellaneous.
4188 (Chris Hostetter, Otis Gospodnetic)
4190 12. LUCENE-759: Added NGramTokenizer and EdgeNGramTokenizer classes and
4191 their unit tests.
4194 13. LUCENE-565: Added methods to IndexWriter to more efficiently
4195 handle updating documents (the "delete then add" use case). This
4196 is intended to be an eventual replacement for the existing
4197 IndexModifier. Added IndexWriter.flush() (renamed from
4198 flushRamSegments()) to flush all pending updates (held in RAM) to
4199 the Directory. (Ning Li via Mike McCandless)
4201 14. LUCENE-762: Added SIZE and SIZE_AND_BREAK FieldSelectorResult options
4202 which allow one to retrieve the size of a field without retrieving the
4203 actual field. (Chuck Williams via Grant Ingersoll)
4205 15. LUCENE-799: Properly handle lazy, compressed fields.
4206 (Mike Klaas via Grant Ingersoll)
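NGramTokenizer and EdgeNGramTokenizer (LUCENE-759, item 12 above) split their input into character n-grams. The following is a minimal standalone sketch of that idea, not Lucene's implementation: the class and method names are hypothetical, and the output order (grouped by gram length) is an assumption that may not match the real tokenizers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; does not use Lucene's tokenizer classes.
public class NGramSketch {
    // All substrings whose length lies in [minGram, maxGram], grouped by length.
    static List<String> ngrams(String text, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int start = 0; start + n <= text.length(); start++) {
                out.add(text.substring(start, start + n));
            }
        }
        return out;
    }

    // Only the n-grams anchored at the front ("edge") of the input.
    static List<String> edgeNgrams(String text, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (int n = minGram; n <= Math.min(maxGram, text.length()); n++) {
            out.add(text.substring(0, n));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("abcd", 1, 2));     // [a, b, c, d, ab, bc, cd]
        System.out.println(edgeNgrams("abcd", 1, 3)); // [a, ab, abc]
    }
}
```

Edge n-grams are typically used for prefix-style matching, while full n-grams support matching on arbitrary substrings.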
4210 1. LUCENE-438: Remove "final" from Token, implement Cloneable, allow
4211 changing of termText via setTermText(). (Yonik Seeley)
4213 2. org.apache.lucene.analysis.nl.WordlistLoader has been deprecated
4214 and should be replaced by the WordlistLoader class in
4215 package org.apache.lucene.analysis (Daniel Naber)
4217 3. LUCENE-609: Revert return type of Document.getField(s) to Field
4218 for backward compatibility, added new Document.getFieldable(s)
4219 for access to new lazy loaded fields. (Yonik Seeley)
4221 4. LUCENE-608: Document.fields() has been deprecated and a new method
4222 Document.getFields() has been added that returns a List instead of
4223 an Enumeration (Daniel Naber)
4225 5. LUCENE-605: New Explanation.isMatch() method and new ComplexExplanation
4226 subclass allows explain methods to produce Explanations which model
4227 "matching" independent of having a positive value.
4230 6. LUCENE-621: New static methods IndexWriter.setDefaultWriteLockTimeout
4231 and IndexWriter.setDefaultCommitLockTimeout for overriding default
4232 timeout values for all future instances of IndexWriter (as well
4233 as for any other classes that may reference the static values,
4235 (Michael McCandless via Chris Hostetter)
4237 7. LUCENE-638: FSDirectory.list() now only returns the directory's
4238 Lucene-related files. Thanks to this change one can now construct
4239 a RAMDirectory from a file system directory that contains files
4240 not related to Lucene.
4241 (Simon Willnauer via Daniel Naber)
4243 8. LUCENE-635: Decoupling locking implementation from Directory
4244 implementation. Added set/getLockFactory to Directory and moved
4245 all locking code into subclasses of abstract class LockFactory.
4246 FSDirectory and RAMDirectory still default to their prior locking
4247 implementations, but now you can mix & match, for example using
4248 SingleInstanceLockFactory (i.e., in-memory locking) with an
4249 FSDirectory. Note that you must now call setDisableLocks before
4250 instantiating an FSDirectory if you wish to disable locking
4252 (Michael McCandless, Jeff Patterson via Yonik Seeley)
4254 9. LUCENE-657: Made FuzzyQuery non-final and inner ScoreTerm protected.
4255 (Steven Parkes via Otis Gospodnetic)
4257 10. LUCENE-701: Lockless commits: a commit lock is no longer required
4258 when a writer commits and a reader opens the index. This includes
4259 a change to the index file format (see docs/fileformats.html for
4260 details). It also removes all APIs associated with the commit
4261 lock & its timeout. Readers are now truly read-only and do not
4262 block one another on startup. This is the first step to getting
4263 Lucene to work correctly over NFS (second step is
4264 LUCENE-710). (Mike McCandless)
4266 11. LUCENE-722: DEFAULT_MIN_DOC_FREQ was misspelled DEFALT_MIN_DOC_FREQ
4267 in Similarity's MoreLikeThis class. The misspelling has been
4268 replaced by the correct spelling.
4269 (Andi Vajda via Daniel Naber)
4271 12. LUCENE-738: Reduce the size of the file that keeps track of which
4272 documents are deleted when the number of deleted documents is
4273 small. This changes the index file format and cannot be
4274 read by previous versions of Lucene. (Doron Cohen via Yonik Seeley)
4276 13. LUCENE-756: Maintain all norms in a single .nrm file to reduce the
4277 number of open files and file descriptors for the non-compound index
4278 format. This changes the index file format, but maintains the
4279 ability to read and update older indices. The first segment merge
4280 on an older format index will create a single .nrm file for the new
4281 segment. (Doron Cohen via Yonik Seeley)
4283 14. LUCENE-732: DateTools support has been added to QueryParser, with
4284 setters for both the default Resolution, and per-field Resolution.
4285 For backwards compatibility, DateField is still used if no Resolutions
4286 are specified. (Michael Busch via Chris Hostetter)
4288 15. Added isOptimized() method to IndexReader.
4291 16. LUCENE-773: Deprecate the FSDirectory.getDirectory(*) methods that
4292 take a boolean "create" argument. Instead you should use
4293 IndexWriter's "create" argument to create a new index.
4296 17. LUCENE-780: Add a static Directory.copy() method to copy files
4297 from one Directory to another. (Jiri Kuhn via Mike McCandless)
4299 18. LUCENE-773: Added Directory.clearLock(String name) to forcefully
4300 remove an old lock. The default implementation is to ask the
4301 lockFactory (if non null) to clear the lock. (Mike McCandless)
4303 19. LUCENE-795: Directory.renameFile() has been deprecated as it is
4304 not used anymore inside Lucene. (Daniel Naber)
4308 1. Fixed the web application demo (built with "ant war-demo") which
4309 didn't work because it used a QueryParser method that had
4310 been removed (Daniel Naber)
4312 2. LUCENE-583: ISOLatin1AccentFilter fails to preserve positionIncrement
4315 3. LUCENE-575: SpellChecker min score is incorrectly changed by suggestSimilar
4316 (Karl Wettin via Yonik Seeley)
4318 4. LUCENE-587: Explanation.toHtml was producing malformed HTML
4321 5. Fix to allow MatchAllDocsQuery to be used with RemoteSearcher (Yonik Seeley)
4323 6. LUCENE-601: RAMDirectory and RAMFile made Serializable
4324 (Karl Wettin via Otis Gospodnetic)
4326 7. LUCENE-557: Fixes to BooleanQuery and FilteredQuery so that the score
4327 Explanations match up with the real scores.
4330 8. LUCENE-607: ParallelReader's TermEnum fails to advance properly to
4331 new fields (Chuck Williams, Christian Kohlschuetter via Yonik Seeley)
4333 9. LUCENE-610, LUCENE-611: Simple syntax changes to allow compilation with ecj:
4334 disambiguate inner class scorer's use of doc() in BooleanScorer2,
4335 other test code changes. (DM Smith via Yonik Seeley)
4337 10. LUCENE-451: All core query types now use ComplexExplanations so that
4338 boosts of zero don't confuse the BooleanWeight explain method.
4341 11. LUCENE-593: Fixed LuceneDictionary's inner Iterator
4342 (Kåre Fiedler Christiansen via Otis Gospodnetic)
4344 12. LUCENE-641: fixed an off-by-one bug with IndexWriter.setMaxFieldLength()
4347 13. LUCENE-659: Make PerFieldAnalyzerWrapper delegate getPositionIncrementGap()
4348 to the correct analyzer for the field. (Chuck Williams via Yonik Seeley)
4350 14. LUCENE-650: Fixed NPE in Locale specific String Sort when Document
4352 (Oliver Hutchison via Chris Hostetter)
4354 15. LUCENE-683: Fixed data corruption when reading lazy loaded fields.
4357 16. LUCENE-678: Fixed bug in NativeFSLockFactory which caused the same
4358 lock to be shared between different directories.
4359 (Michael McCandless via Yonik Seeley)
4361 17. LUCENE-690: Fixed thread unsafe use of IndexInput by lazy loaded fields.
4364 18. LUCENE-696: Fix bug when scorer for DisjunctionMaxQuery has skipTo()
4365 called on it before next(). (Yonik Seeley)
4367 19. LUCENE-569: Fixed SpanNearQuery bug, for 'inOrder' queries it would fail
4368 to recognize ordered spans if they overlapped with unordered spans.
4369 (Paul Elschot via Chris Hostetter)
4371 20. LUCENE-706: Updated fileformats.xml|html concerning the docdelta value
4372 in the frequency file. (Johan Stuyts, Doron Cohen via Grant Ingersoll)
4374 21. LUCENE-715: Fixed private constructor in IndexWriter.java to
4375 properly release the acquired write lock if there is an
4376 IOException after acquiring the write lock but before finishing
4377 instantiation. (Matthew Bogosian via Mike McCandless)
4379 22. LUCENE-651: Multiple different threads requesting the same
4380 FieldCache entry (often for Sorting by a field) at the same
4381 time caused multiple generations of that entry, which was
4382 detrimental to performance and memory use.
4383 (Oliver Hutchison via Otis Gospodnetic)
4385 23. LUCENE-717: Fixed build.xml not to fail when there is no lib dir.
4386 (Doron Cohen via Otis Gospodnetic)
4388 24. LUCENE-728: Removed duplicate/old MoreLikeThis and SimilarityQueries
4389 classes from contrib/similarity, as their new home is under
4393 25. LUCENE-669: Do not double-close the RandomAccessFile in
4394 FSIndexInput/Output during finalize(). Besides sending an
4395 IOException up to the GC, this may also be the cause of intermittent
4396 "The handle is invalid" IOExceptions on Windows when trying to
4397 close readers or writers. (Michael Busch via Mike McCandless)
4399 26. LUCENE-702: Fix IndexWriter.addIndexes(*) to not corrupt the index
4400 on any exception (e.g., disk full). The semantics of these methods
4401 are now transactional: either all indices are merged or none are.
4402 Also fixed IndexWriter.mergeSegments (called outside of
4403 addIndexes(*) by addDocument, optimize, flushRamSegments) and
4404 IndexReader.commit() (called by close) to clean up and keep the
4405 instance state consistent with what's actually in the index (Mike
4408 27. LUCENE-129: Change finalizers to do "try {...} finally
4409 {super.finalize();}" to make sure we don't miss finalizers in
4410 classes above us. (Esmond Pitt via Mike McCandless)
4412 28. LUCENE-754: Fix a problem introduced by LUCENE-651, causing
4413 IndexReaders to hang around forever, in addition to not
4414 fixing the original FieldCache performance problem.
4415 (Chris Hostetter, Yonik Seeley)
4417 29. LUCENE-140: Fix IndexReader.deleteDocument(int docNum) to
4418 correctly raise ArrayIndexOutOfBoundsException when docNum is too
4419 large. Previously, if docNum was only slightly too large (within
4420 the same multiple of 8, ie, up to 7 ints beyond maxDoc), no
4421 exception would be raised and instead the index would become
4422 silently corrupted. The corruption then only appears much later,
4423 in mergeSegments, when the corrupted segment is merged with
4424 segment(s) after it. (Mike McCandless)
4426 30. LUCENE-768: Fix case where an Exception during deleteDocument,
4427 undeleteAll or setNorm in IndexReader could leave the reader in a
4428 state where close() fails to release the write lock.
4431 31. Remove "tvp" from known index file extensions because it is
4432 never used. (Nicolas Lalevée via Bernhard Messer)
4434 32. LUCENE-767: Change how SegmentReader.maxDoc() is computed to not
4435 rely on file length check and instead use the SegmentInfo's
4436 docCount that's already stored explicitly in the index. This is a
4437 defensive bug fix (ie, there is no known problem seen "in real
4438 life" due to this, just a possible future problem). (Chuck
4439 Williams via Mike McCandless)
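The LUCENE-140 fix (item 29 above) concerns deletion bit vectors whose last byte has spare bits. The following hypothetical Java sketch assumes a byte array sized (size >> 3) + 1 to show why an unchecked index just past the end is silently accepted, and how an explicit range check rejects it. It is an illustration, not Lucene's BitVector code.

```java
// Hypothetical sketch; the byte layout is an assumption, not Lucene's actual class.
public class BitVectorSketch {
    private final byte[] bits;
    private final int size;

    BitVectorSketch(int size) {
        this.size = size;
        // Assumed layout: one bit per document, rounded up to whole bytes.
        this.bits = new byte[(size >> 3) + 1];
    }

    // Unchecked: fails only when the byte index runs off the array, so
    // indices just past 'size' silently flip padding bits in the last byte.
    void setUnchecked(int bit) {
        bits[bit >> 3] |= (byte) (1 << (bit & 7));
    }

    // Checked: rejects any index outside [0, size).
    void set(int bit) {
        if (bit < 0 || bit >= size) {
            throw new ArrayIndexOutOfBoundsException("bit " + bit + " out of range [0, " + size + ")");
        }
        setUnchecked(bit);
    }

    boolean get(int bit) {
        return (bits[bit >> 3] & (1 << (bit & 7))) != 0;
    }

    public static void main(String[] args) {
        BitVectorSketch v = new BitVectorSketch(10); // 2 bytes, so 16 addressable bits
        v.setUnchecked(12);                           // no exception: a padding bit is set
        System.out.println(v.get(12));                // true
        try {
            v.set(12);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```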
4443 1. LUCENE-586: TermDocs.skipTo() is now more efficient for
4444 multi-segment indexes. This will improve the performance of many
4445 types of queries against a non-optimized index. (Andrew Hudson
4448 2. LUCENE-623: RAMDirectory.close now nulls out its reference to all
4449 internal "files", allowing them to be GCed even if references to the
4450 RAMDirectory itself still exist. (Nadav Har'El via Chris Hostetter)
4452 3. LUCENE-629: Compressed fields are no longer uncompressed and
4453 recompressed during segment merges (e.g. during indexing or
4454 optimizing), thus improving performance. (Michael Busch via Otis
4457 4. LUCENE-388: Improve indexing performance when maxBufferedDocs is
4458 large by keeping a count of buffered documents rather than
4459 counting after each document addition. (Doron Cohen, Paul Smith,
4462 5. Modified TermScorer.explain to use TermDocs.skipTo() instead of
4463 looping through docs. (Grant Ingersoll)
4465 6. LUCENE-672: New indexing segment merge policy flushes all
4466 buffered docs to their own segment and delays a merge until
4467 mergeFactor segments of a certain level have been accumulated.
4468 This increases indexing performance in the presence of deleted
4469 docs or partially full segments as well as enabling future
4472 NOTE: this also fixes an "under-merging" bug whereby it is
4473 possible to get far too many segments in your index (which will
4474 drastically slow down search, risk exhausting the file descriptor
4475 limit, etc.). This can happen when the number of buffered docs
4476 at close, plus the number of docs in the last non-ram segment is
4477 greater than mergeFactor. (Ning Li, Yonik Seeley)
4479 7. Lazy loaded fields unnecessarily retained an extra copy of loaded
4480 String data. (Yonik Seeley)
4482 8. LUCENE-443: ConjunctionScorer performance increase. Speed up
4483 any BooleanQuery with more than one mandatory clause.
4484 (Abdul Chaudhry, Paul Elschot via Yonik Seeley)
4486 9. LUCENE-365: DisjunctionSumScorer performance increase of
4487 ~30%. Speeds up queries with optional clauses. (Paul Elschot via
4490 10. LUCENE-695: Optimized BufferedIndexInput.readBytes() for medium
4491 size buffers, which will speed up merging and retrieving binary
4492 and compressed fields. (Nadav Har'El via Yonik Seeley)
4494 11. LUCENE-687: Lazy skipping on proximity file speeds up most
4495 queries involving term positions, including phrase queries.
4496 (Michael Busch via Yonik Seeley)
4498 12. LUCENE-714: Replaced 2 cases of manual for-loop array copying
4499 with calls to System.arraycopy instead, in DocumentWriter.java.
4500 (Nicolas Lalevee via Mike McCandless)
4502 13. LUCENE-729: Non-recursive skipTo and next implementation of
4503 TermDocs for a MultiReader. The old implementation could
4504 recurse up to the number of segments in the index. (Yonik Seeley)
4506 14. LUCENE-739: Improve segment merging performance by reusing
4507 the norm array across different fields and doing bulk writes
4508 of norms of segments with no deleted docs.
4509 (Michael Busch via Yonik Seeley)
4511 15. LUCENE-745: Add BooleanQuery.clauses(), allowing direct access
4512 to the List of clauses and replaced the internal synchronized Vector
4513 with an unsynchronized List. (Yonik Seeley)
4515 16. LUCENE-750: Remove finalizers from FSIndexOutput and move the
4516 FSIndexInput finalizer to the actual file so all clones don't
4517 register a new finalizer. (Yonik Seeley)
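LUCENE-729 (item 13 above) replaced recursion with iteration when TermDocs for a MultiReader moves across segments. The following minimal sketch uses hypothetical names, with plain int arrays standing in for per-segment postings. It shows loop-based next() and skipTo() methods that advance segment by segment and add each segment's base to its local doc ids. The linear scan in skipTo() is a simplification. Real implementations skip within a segment using skip data.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not Lucene's MultiTermDocs.
public class MultiSegmentDocsSketch {
    private final List<int[]> segments; // sorted, segment-local doc ids per segment
    private final int[] bases;          // global doc id offset of each segment
    private int seg = 0;                // index of the current segment
    private int pos = -1;               // position inside the current segment
    private int doc = -1;               // current global doc id

    MultiSegmentDocsSketch(List<int[]> segments, int[] bases) {
        this.segments = segments;
        this.bases = bases;
    }

    int doc() {
        return doc;
    }

    // Moves to the next document; a loop (not recursion) crosses segment boundaries,
    // so stack depth does not grow with the number of segments.
    boolean next() {
        while (seg < segments.size()) {
            int[] docs = segments.get(seg);
            if (++pos < docs.length) {
                doc = bases[seg] + docs[pos];
                return true;
            }
            seg++;   // current segment exhausted: move on to the next one
            pos = -1;
        }
        return false;
    }

    // Moves past the current document to the first one >= target.
    // Linear scan for simplicity; real code would use per-segment skip data.
    boolean skipTo(int target) {
        while (next()) {
            if (doc >= target) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Three segments with bases 0, 3 and 5.
        List<int[]> segs = Arrays.asList(new int[] {0, 2}, new int[] {1}, new int[] {0, 3});
        MultiSegmentDocsSketch it = new MultiSegmentDocsSketch(segs, new int[] {0, 3, 5});
        while (it.next()) {
            System.out.print(it.doc() + " "); // 0 2 4 5 8
        }
        System.out.println();
    }
}
```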
4521 1. Added TestTermScorer.java (Grant Ingersoll)
4523 2. Added TestWindowsMMap.java (Benson Margulies via Mike McCandless)
4525 3. LUCENE-744: Append the user.name property onto the temporary directory
4526 that is created so it doesn't interfere with other users. (Grant Ingersoll)
4530 1. Added style sheet to xdocs named lucene.css and included in the
4531 Anakia VSL descriptor. (Grant Ingersoll)
4533 2. Added scoring.xml document into xdocs. Updated Similarity.java
4534 scoring formula. (Grant Ingersoll and Steve Rowe. Updates from:
4535 Michael McCandless, Doron Cohen, Chris Hostetter, Doug Cutting).
4538 3. Added javadocs for FieldSelectorResult.java. (Grant Ingersoll)
4540 4. Moved xdocs directory to src/site/src/documentation/content/xdocs per
4541 Issue 707. Site now builds using Forrest, just like the other Lucene
4542 siblings. See http://wiki.apache.org/jakarta-lucene/HowToUpdateTheWebsite
4543 for info on updating the website. (Grant Ingersoll with help from Steve Rowe,
4544 Chris Hostetter, Doug Cutting, Otis Gospodnetic, Yonik Seeley)
4546 5. Added Developer and System Requirements sections under Resources (Grant Ingersoll)
4548 6. LUCENE-713: Updated the Term Vector section of File Formats to include
4549 documentation on how Offset and Position info are stored in the TVF file.
4550 (Grant Ingersoll, Samir Abdou)
4552 7. Added a link to Clover Test Code Coverage Reports under the Develop
4553 section in Resources (Grant Ingersoll)
4555 8. LUCENE-748: Added details for semantics of IndexWriter.close on
4556 hitting an Exception. (Jed Wesley-Smith via Mike McCandless)
4558 9. Added some text about what is contained in releases.
4559 (Eric Haszlakiewicz via Grant Ingersoll)
4561 10. LUCENE-758: Fix javadoc to clarify that RAMDirectory(Directory)
4562 makes a full copy of the starting Directory. (Mike McCandless)
4564 11. LUCENE-764: Fix javadocs to detail temporary space requirements
4565 for IndexWriter's optimize(), addIndexes(*) and addDocument(...)
4566 methods. (Mike McCandless)
4570 1. Added Clover test code coverage per http://issues.apache.org/jira/browse/LUCENE-721
4571 To enable clover code coverage, you must have clover.jar in the ANT
4572 classpath and specify -Drun.clover=true on the command line.
4573 (Michael Busch and Grant Ingersoll)
4575 2. Added a sysproperty in common-build.xml per LUCENE-752 to map java.io.tmpdir to
4576 ${build.dir}/test just like the tempDir sysproperty.
4578 3. LUCENE-757: Added a new target named init-dist that does setup for
4579 distribution of both binary and source distributions. Called by package
4582 ======================= Release 2.0.0 =======================
4586 1. All deprecated methods and fields have been removed, except
4587 DateField, which will still be supported for some time
4588 so Lucene can read its date fields from old indexes
4589 (Yonik Seeley & Grant Ingersoll)
4591 2. DisjunctionSumScorer is no longer public.
4592 (Paul Elschot via Otis Gospodnetic)
4594 3. Creating a Field with both an empty name and an empty value
4595 now throws an IllegalArgumentException
4598 4. LUCENE-301: Added new IndexWriter({String,File,Directory},
4599 Analyzer) constructors that do not take a boolean "create"
4600 argument. These new constructors will create a new index if
4601 necessary, else append to the existing one. (Dan Armbrust via
4606 1. LUCENE-496: Command line tool for modifying the field norms of an
4607 existing index; added to contrib/miscellaneous. (Chris Hostetter)
4609 2. LUCENE-577: SweetSpotSimilarity added to contrib/miscellaneous.
4614 1. LUCENE-330: Fix issue of FilteredQuery not working properly within
4615 BooleanQuery. (Paul Elschot via Erik Hatcher)
4617 2. LUCENE-515: Make ConstantScoreRangeQuery and ConstantScoreQuery work
4618 with RemoteSearchable. (Philippe Laflamme via Yonik Seeley)
4620 3. Added methods to get/set writeLockTimeout and commitLockTimeout in
4621 IndexWriter. These could be set in Lucene 1.4 using a system property.
4622 This feature had been removed without adding the corresponding
4623 getter/setter methods. (Daniel Naber)
4625 4. LUCENE-413: Fixed ArrayIndexOutOfBoundsExceptions thrown
4626 when using SpanQueries. (Paul Elschot via Yonik Seeley)
4628 5. Implemented FilterIndexReader.getVersion() and isCurrent()
4631 6. LUCENE-540: Fixed a bug with IndexWriter.addIndexes(Directory[])
4632 that sometimes caused the index order of documents to change.
4635 7. LUCENE-526: Fixed a bug in FieldSortedHitQueue that caused
4636 subsequent String sorts with different locales to sort identically.
4637 (Paul Cowan via Yonik Seeley)
4639 8. LUCENE-541: Add missing extractTerms() to DisjunctionMaxQuery
4640 (Stefan Will via Yonik Seeley)
4642 9. LUCENE-514: Added getTermArrays() and extractTerms() to
4643 MultiPhraseQuery (Eric Jain & Yonik Seeley)
4645 10. LUCENE-512: Fixed ClassCastException in ParallelReader.getTermFreqVectors
4646 (frederic via Yonik)
4648 11. LUCENE-352: Fixed bug in SpanNotQuery that manifested as
4649 NullPointerException when "exclude" query was not a SpanTermQuery.
4652 12. LUCENE-572: Fixed a bug in SpanNotQuery.hashCode(), which ignored the exclude clause
4655 13. LUCENE-561: Fixed some ParallelReader bugs. NullPointerException if the reader
4656 didn't know about the field yet, reader didn't keep track if it had deletions,
4657 and deleteDocument calls could circumvent synchronization on the subreaders.
4658 (Chuck Williams via Yonik Seeley)
4660 14. LUCENE-556: Added empty extractTerms() implementation to MatchAllDocsQuery and
4661 ConstantScoreQuery in order to allow their use with a MultiSearcher.
4664 15. LUCENE-546: Removed 2GB file size limitations for RAMDirectory.
4665 (Peter Royal, Michael Chan, Yonik Seeley)
4667 16. LUCENE-485: Don't hold commit lock while removing obsolete index
4668 files. (Luc Vanlerberghe via cutting)
4675 1. LUCENE-511: Fix a bug in the BufferedIndexOutput optimization
4676 introduced in 1.9-final. (Shay Banon & Steven Tamm via cutting)
4680 Note that this release is mostly but not 100% source compatible with
4681 the previous release of Lucene (1.4.3). In other words, you should
4682 make sure your application compiles with this version of Lucene before
4683 you replace the old Lucene JAR with the new one. Many methods have
4684 been deprecated in anticipation of release 2.0, so deprecation
4685 warnings are to be expected when upgrading from 1.4.3 to 1.9.
4689 1. The fix that made IndexWriter.setMaxBufferedDocs(1) work had negative
4690 effects on indexing performance and has thus been reverted. The
4691 argument for setMaxBufferedDocs(int) must now be at least 2, otherwise
4692 an exception is thrown. (Daniel Naber)
4696 1. Optimized BufferedIndexOutput.writeBytes() to use
4697 System.arraycopy() in more cases, rather than copying byte-by-byte.
4698 (Lukas Zapletal via Cutting)
4704 1. To compile and use Lucene you now need Java 1.4 or later.
4706 Changes in runtime behavior
4708 1. FuzzyQuery can no longer throw a TooManyClauses exception. If a
4709 FuzzyQuery expands to more than BooleanQuery.maxClauseCount
4710 terms, only the BooleanQuery.maxClauseCount most similar terms
4711 go into the rewritten query and thus the exception is avoided.
4714 2. Changed system property from "org.apache.lucene.lockdir" to
4715 "org.apache.lucene.lockDir", so that its casing follows the existing
4716 pattern used in other Lucene system properties. (Bernhard)
4718 3. The terms of RangeQueries and FuzzyQueries are now converted to
4719 lowercase by default (as has been the case for PrefixQueries
4720 and WildcardQueries before). Use setLowercaseExpandedTerms(false)
4721 to disable that behavior but note that this also affects
4722 PrefixQueries and WildcardQueries. (Daniel Naber)
4724 4. Document frequency that is computed when MultiSearcher is used is now
4725 computed correctly and "globally" across subsearchers and indices, while
4726 before it used to be computed locally to each index, which caused
4727 ranking across multiple indices not to be equivalent.
4728 (Chuck Williams, Wolf Siberski via Otis, bug #31841)
4730 5. When opening an IndexWriter with create=true, Lucene now only deletes
4731 its own files from the index directory (looking at the file name suffixes
4732 to decide if a file belongs to Lucene). The old behavior was to delete
4733 all files. (Daniel Naber and Bernhard Messer, bug #34695)
4735 6. The version of an IndexReader, as returned by getCurrentVersion()
4736 and getVersion() doesn't start at 0 anymore for new indexes. Instead, it
4737 is now initialized to the system time in milliseconds.
4738 (Bernhard Messer via Daniel Naber)
4740 7. Several default values cannot be set via system properties anymore, as
4741 this has been considered inappropriate for a library like Lucene. For
4742 most properties there are set/get methods available in IndexWriter which
4743 you should use instead. This affects the following properties:
4744 See IndexWriter for getter/setter methods:
4745 org.apache.lucene.writeLockTimeout, org.apache.lucene.commitLockTimeout,
4746 org.apache.lucene.minMergeDocs, org.apache.lucene.maxMergeDocs,
4747 org.apache.lucene.maxFieldLength, org.apache.lucene.termIndexInterval,
4748 org.apache.lucene.mergeFactor,
4749 See BooleanQuery for getter/setter methods:
4750 org.apache.lucene.maxClauseCount
4751 See FSDirectory for getter/setter methods:
4755 8. Fixed FieldCacheImpl to use user-provided IntParser and FloatParser,
4756 instead of using Integer and Float classes for parsing.
4757 (Yonik Seeley via Otis Gospodnetic)
4759 9. Expert level search routines returning TopDocs and TopFieldDocs
4760 no longer normalize scores. This also fixes bugs related to
4761 MultiSearchers and score sorting/normalization.
4762 (Luc Vanlerberghe via Yonik Seeley, LUCENE-469)
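Item 1 above describes FuzzyQuery keeping only the BooleanQuery.maxClauseCount most similar terms instead of throwing TooManyClauses. A bounded min-heap is one common way to do this. The sketch below uses hypothetical names and a plain map from term to similarity score rather than Lucene's classes. Ties between equal scores are broken arbitrarily.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch; not Lucene's FuzzyQuery rewrite code.
public class TopTermsSketch {
    // Keeps the maxClauses most similar terms instead of failing when the
    // expansion exceeds the clause limit. Returned terms are sorted alphabetically.
    static List<String> mostSimilar(Map<String, Float> similarity, int maxClauses) {
        // Min-heap on similarity: the head is the weakest term kept so far.
        PriorityQueue<Map.Entry<String, Float>> heap =
            new PriorityQueue<Map.Entry<String, Float>>(
                (a, b) -> Float.compare(a.getValue(), b.getValue()));
        for (Map.Entry<String, Float> e : similarity.entrySet()) {
            heap.offer(e);
            if (heap.size() > maxClauses) {
                heap.poll(); // evict the least similar term
            }
        }
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Float> e : heap) {
            kept.add(e.getKey());
        }
        Collections.sort(kept);
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Float> sim = new HashMap<>();
        sim.put("lucene", 1.0f);
        sim.put("lucent", 0.8f);
        sim.put("luce", 0.6f);
        sim.put("lucid", 0.5f);
        System.out.println(mostSimilar(sim, 2)); // [lucene, lucent]
    }
}
```

The heap never holds more than maxClauses + 1 entries, so memory stays bounded by the clause limit rather than by the number of candidate terms.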
4766 1. Added support for stored compressed fields (patch #31149)
4767 (Bernhard Messer via Christoph)
4769 2. Added support for binary stored fields (patch #29370)
4770 (Drew Farris and Bernhard Messer via Christoph)
4772 3. Added support for position and offset information in term vectors
4773 (patch #18927). (Grant Ingersoll & Christoph)
4775 4. A new class DateTools has been added. It allows you to format dates
4776 in a readable format adequate for indexing. Unlike the existing
4777 DateField class DateTools can cope with dates before 1970 and it
4778 forces you to specify the desired date resolution (e.g. month, day,
4779 second, ...), which can make RangeQueries on those fields more efficient.
4782 5. QueryParser now correctly works with Analyzers that can return more
4783 than one token per position. For example, a query "+fast +car"
4784 would be parsed as "+fast +(car automobile)" if the Analyzer
4785 returns "car" and "automobile" at the same position whenever it
4786 finds "car" (Patch #23307).
4787 (Pierrick Brihaye, Daniel Naber)
4789 6. Permit unbuffered Directory implementations (e.g., using mmap).
4790 InputStream is replaced by the new classes IndexInput and
4791 BufferedIndexInput. OutputStream is replaced by the new classes
4792 IndexOutput and BufferedIndexOutput. InputStream and OutputStream
4793 are now deprecated and FSDirectory is now subclassable. (cutting)
4795 7. Add native Directory and TermDocs implementations that work under
4796 GCJ. These require GCC 3.4.0 or later and have only been tested
4797 on Linux. Use 'ant gcj' to build demo applications. (cutting)
4799 8. Add MMapDirectory, which uses nio to mmap input files. This is
4800 still somewhat slower than FSDirectory. However it uses less
4801 memory per query term, since a new buffer is not allocated per
4802 term, which may help applications which use, e.g., wildcard
4803 queries. It may also someday be faster. (cutting & Paul Elschot)
4805 9. Added javadocs-internal to build.xml - bug #30360
4806 (Paul Elschot via Otis)
4808 10. Added RangeFilter, a more generically useful filter than DateFilter.
4809 (Chris M Hostetter via Erik)
4811 11. Added NumberTools, a utility class for indexing numeric fields.
4812 (adapted from code contributed by Matt Quail; committed by Erik)
4814 12. Added public static IndexReader.main(String[] args) method.
4815 IndexReader can now be used directly at command line level
4816 to list and optionally extract the individual files from an existing
4817 compound index file.
4818 (adapted from code contributed by Garrett Rooney; committed by Bernhard)
4820 13. Add IndexWriter.setTermIndexInterval() method. See javadocs.
4823 14. Added LucenePackage, whose static get() method returns java.util.Package,
4824 which lets the caller get the Lucene version information specified in
4826 (Doug Cutting via Otis)
4828 15. Added Hits.iterator() method and corresponding HitIterator and Hit objects.
4829 This provides standard java.util.Iterator iteration over Hits.
4830 Each call to the iterator's next() method returns a Hit object.
4831 (Jeremy Rayner via Erik)
4833 16. Add ParallelReader, an IndexReader that combines separate indexes
4834 over different fields into a single virtual index. (Doug Cutting)
4836 17. Add IntParser and FloatParser interfaces to FieldCache, so that
4837 fields in arbitrary formats can be cached as ints and floats.
4840 18. Added class org.apache.lucene.index.IndexModifier which combines
4841 IndexWriter and IndexReader, so you can add and delete documents without
4842 worrying about synchronization/locking issues.
4845 19. Lucene can now be used inside an unsigned applet, as Lucene's access
4846 to system properties will not cause a SecurityException anymore.
4847 (Jon Schuster via Daniel Naber, bug #34359)
4849 20. Added a new class MatchAllDocsQuery that matches all documents.
4850 (John Wang via Daniel Naber, bug #34946)
4852 21. Added ability to omit norms on a per field basis to decrease
4853 index size and memory consumption when there are many indexed fields.
4854 See Field.setOmitNorms()
4855 (Yonik Seeley, LUCENE-448)
4857 22. Added NullFragmenter to contrib/highlighter, which is useful for
4858 highlighting entire documents or fields.
4861 23. Added regular expression queries, RegexQuery and SpanRegexQuery.
4862 Note the same term enumeration caveats apply with these queries as
4863 apply to WildcardQuery and other term expanding queries.
4864 These two new queries are not currently supported via QueryParser.
4867 24. Added ConstantScoreQuery which wraps a filter and produces a score
4868 equal to the query boost for every matching document.
4869 (Yonik Seeley, LUCENE-383)
4871 25. Added ConstantScoreRangeQuery which produces a constant score for
4872 every document in the range. One advantage over a normal RangeQuery
4873 is that it doesn't expand to a BooleanQuery and thus doesn't have a maximum
4874 number of terms the range can cover. Both endpoints may also be open.
4875 (Yonik Seeley, LUCENE-383)
4877 26. Added ability to specify a minimum number of optional clauses that
4878 must match in a BooleanQuery. See BooleanQuery.setMinimumNumberShouldMatch().
4879 (Paul Elschot, Chris Hostetter via Yonik Seeley, LUCENE-395)
4881 27. Added DisjunctionMaxQuery which provides the maximum score across its clauses.
4882 It's very useful for searching across multiple fields.
4883 (Chuck Williams via Yonik Seeley, LUCENE-323)
4885 28. New class ISOLatin1AccentFilter that replaces accented characters in the ISO
4886 Latin 1 character set with their unaccented equivalents.
4887 (Sven Duzont via Erik Hatcher)
4889 29. New class KeywordAnalyzer. "Tokenizes" the entire stream as a single token.
4890 This is useful for data like zip codes, ids, and some product names.
4893 30. Copied LengthFilter from contrib area to core. Removes words that are too
4894 long or too short from the stream.
4895 (David Spencer via Otis and Daniel)
4897 31. Added getPositionIncrementGap(String fieldName) to Analyzer. This allows
4898 custom analyzers to put gaps between Field instances with the same field
4899 name, preventing phrase or span queries crossing these boundaries. The
4900 default implementation issues a gap of 0, allowing the default token
4901 position increment of 1 to put the next field's first token into a
4902 successive position.
4903 (Erik Hatcher, with advice from Yonik)
4905 32. StopFilter can now ignore case when checking for stop words.
4906 (Grant Ingersoll via Yonik, LUCENE-248)
4908 33. Add TopDocCollector and TopFieldDocCollector. These simplify the
4909 implementation of hit collectors that collect only the
4910 top-scoring or top-sorting hits.
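Item 33 above describes collectors that keep only the top-scoring hits. The idea can be sketched with a bounded min-heap whose head is always the weakest hit retained so far, so each new hit costs at most one heap replacement. This is a minimal sketch under stated assumptions: the class TopKCollectorSketch and its methods are hypothetical and do not reproduce the TopDocCollector API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical top-k collector illustrating the technique; not Lucene's TopDocCollector API. */
final class TopKCollectorSketch {
    /** A (document id, score) pair. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int k;
    // Min-heap ordered by score: the weakest retained hit is always at the head.
    private final PriorityQueue<Hit> heap =
        new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));

    TopKCollectorSketch(int k) {
        if (k <= 0) throw new IllegalArgumentException("k must be positive");
        this.k = k;
    }

    /** Called once per matching document. */
    void collect(int doc, float score) {
        if (heap.size() < k) {
            heap.add(new Hit(doc, score));
        } else if (score > heap.peek().score) {
            heap.poll();                   // evict the weakest retained hit
            heap.add(new Hit(doc, score));
        }
    }

    /** Returns the retained hits, best first. */
    List<Hit> topHits() {
        List<Hit> result = new ArrayList<>(heap);
        result.sort((a, b) -> Float.compare(b.score, a.score));
        return result;
    }

    public static void main(String[] args) {
        TopKCollectorSketch c = new TopKCollectorSketch(2);
        float[] scores = {0.5f, 2.0f, 1.0f, 0.1f};
        for (int doc = 0; doc < scores.length; doc++) c.collect(doc, scores[doc]);
        for (Hit h : c.topHits()) System.out.println(h.doc + " " + h.score);
    }
}
```

A new hit replaces the head only when its score is strictly greater, so among equal scores the hit collected first is kept.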
4914 1. Several methods and fields have been deprecated. The API documentation
4915 contains information about the recommended replacements. It is planned
4916 that most of the deprecated methods and fields will be removed in
4917 Lucene 2.0. (Daniel Naber)
4919 2. The Russian and the German analyzers have been moved to contrib/analyzers.
4920 Also, the WordlistLoader class has been moved one level up in the
4921 hierarchy and is now org.apache.lucene.analysis.WordlistLoader
4924 3. The API contained methods that were declared to throw an IOException
4925 but never did. These declarations have been removed. If
4926 your code tries to catch these exceptions you might need to remove
4927 those catch clauses to avoid compile errors. (Daniel Naber)
4929 4. Added a serializable Parameter class to standardize parameter enum
4930 classes in BooleanClause and Field. (Christoph)
4932 5. Added rewrite methods to all SpanQuery subclasses that nest other SpanQuerys.
4933 This allows custom SpanQuery subclasses that rewrite (for term expansion, for
4934 example) to nest within the built-in SpanQuery classes successfully.
4938 1. The JSP demo page (src/jsp/results.jsp) now properly closes the
4939 IndexSearcher it opens. (Daniel Naber)
4941 2. Fixed a bug in IndexWriter.addIndexes(IndexReader[] readers) that
4942 prevented deletion of obsolete segments. (Christoph Goller)
4944 3. Fix in FieldInfos to avoid the return of an extra blank field in
4945 IndexReader.getFieldNames() (Patch #19058). (Mark Harwood via Bernhard)
4947 4. Some combinations of BooleanQuery and MultiPhraseQuery (formerly
4948 PhrasePrefixQuery) could provoke UnsupportedOperationException
4949 (bug #33161). (Rhett Sutphin via Daniel Naber)
4951 5. Fixed a small bug in ConjunctionScorer.skipTo() that caused a NullPointerException
4952 if skipTo() was called without a prior call to next(). (Christoph)
4954 6. Disable Similarity.coord() in the scoring of most automatically
4955 generated boolean queries. The coord() score factor is
4956 appropriate when clauses are independently specified by a user,
4957 but is usually not appropriate when clauses are generated
4958 automatically, e.g., by a fuzzy, wildcard or range query. Matches
4959 on such automatically generated queries are no longer penalized
4960 for not matching all terms. (Doug Cutting, Patch #33472)
4962 7. Getting a lock file with Lock.obtain(long) was supposed to wait for
4963 a given number of milliseconds, but this didn't work.
4964 (John Wang via Daniel Naber, Bug #33799)
4966 8. Fix FSDirectory.createOutput() to always create new files.
4967 Previously, existing files were overwritten, and an index could be
4968 corrupted when the old version of a file was longer than the new.
4969 Now any existing file is first removed. (Doug Cutting)
4971 9. Fix BooleanQuery containing nested SpanTermQuerys, which previously
4972 could return an incorrect number of hits.
4973 (Reece Wilton via Erik Hatcher, Bug #35157)
4975 10. Fix NullPointerException that could occur with a MultiPhraseQuery
4976 inside a BooleanQuery.
4977 (Hans Hjelm and Scotty Allen via Daniel Naber, Bug #35626)
4979 11. Fixed SnowballFilter to pass through the position increment from
4981 (Yonik Seeley via Erik Hatcher, LUCENE-437)
4983 12. Added Unicode range of Korean characters to StandardTokenizer,
4984 grouping contiguous characters into a token rather than one token
4985 per character. This change also changes the token type to "<CJ>"
4986 for Chinese and Japanese character tokens (previously it was "<CJK>").
4987 (Cheolgoo Kang via Otis and Erik, LUCENE-444 and LUCENE-461)
4989 13. FieldsReader now looks at FieldInfo.storeOffsetWithTermVector and
4990 FieldInfo.storePositionWithTermVector and creates the Field with
4991 the correct TermVector parameter.
4992 (Frank Steinmann via Bernhard, LUCENE-455)
4994 14. Fixed WildcardQuery to prevent "cat" matching "ca??".
4995 (Xiaozheng Ma via Bernhard, LUCENE-306)
4997 15. Fixed a bug where MultiSearcher and ParallelMultiSearcher could
4998 change the sort order when sorting by string for documents without
4999 a value for the sort field.
5000 (Luc Vanlerberghe via Yonik, LUCENE-453)
5002 16. Fixed a sorting problem with MultiSearchers that can lead to
5003 missing or duplicate docs due to equal docs sorting in an arbitrary order.
5004 (Yonik Seeley, LUCENE-456)
5006 17. A single hit using the expert level sorted search methods
5007 resulted in the score not being normalized.
5008 (Yonik Seeley, LUCENE-462)
5010 18. Fixed inefficient memory usage when loading an index into RAMDirectory.
5011 (Volodymyr Bychkoviak via Bernhard, LUCENE-475)
5013 19. Corrected term offsets returned by ChineseTokenizer.
5014 (Ray Tsang via Erik Hatcher, LUCENE-324)
5016 20. Fixed MultiReader.undeleteAll() to correctly update numDocs.
5017 (Robert Kirchgessner via Doug Cutting, LUCENE-479)
5019 21. Race condition in IndexReader.getCurrentVersion() and isCurrent()
5020 fixed by acquiring the commit lock.
5021 (Luc Vanlerberghe via Yonik Seeley, LUCENE-481)
5023 22. IndexWriter.setMaxBufferedDocs(1) didn't have the expected effect;
5024 this has now been fixed. (Daniel Naber)
5026 23. Fixed QueryParser when called with a date in local form like
5027 "[1/16/2000 TO 1/18/2000]". This query did not include the documents
5028 of 1/18/2000, i.e. the last day was not included. (Daniel Naber)
5030 24. Removed sorting constraint that threw an exception if there were
5031 not yet any values for the sort field (Yonik Seeley, LUCENE-374)
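Item 6 above turns off coord() for automatically expanded queries. A minimal sketch of the effect, assuming a coordination factor of overlap / maxOverlap (the fraction of clauses that matched) multiplied into the summed clause scores; CoordSketch is a hypothetical helper, not Lucene's Similarity class.

```java
/** Hypothetical helper showing the effect of a coord() factor; not Lucene's Similarity API. */
final class CoordSketch {
    /** Coordination factor: fraction of the query's clauses that matched. */
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    /**
     * Combines per-clause scores (0 for clauses that did not match). When coordDisabled
     * is true, as for expanded fuzzy, wildcard or range queries, the factor is treated as 1.
     */
    static float score(float[] clauseScores, boolean coordDisabled) {
        float sum = 0f;
        int overlap = 0;
        for (float s : clauseScores) {
            if (s > 0f) { sum += s; overlap++; }
        }
        if (overlap == 0) return 0f;
        return coordDisabled ? sum : sum * coord(overlap, clauseScores.length);
    }

    public static void main(String[] args) {
        float[] oneOfFour = {1.0f, 0f, 0f, 0f};
        System.out.println(score(oneOfFour, false) + " vs " + score(oneOfFour, true));
    }
}
```

With one matching clause out of four, the coordinated score is a quarter of the raw sum; with coord disabled, as for a wildcard expanded into four terms, the single match is not penalized.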
5035 1. Disk usage (peak requirements during indexing and optimization)
5036 in case of compound file format has been improved.
5037 (Bernhard, Dmitry, and Christoph)
5039 2. Optimize the performance of certain uses of BooleanScorer,
5040 TermScorer and IndexSearcher. In particular, a BooleanQuery
5041 composed of TermQuery, with not all terms required, that returns a
5042 TopDocs (e.g., through a Hits with no Sort specified) runs much
5045 3. Removed synchronization from reading of term vectors with an
5046 IndexReader (Patch #30736). (Bernhard Messer via Christoph)
5048 4. Optimize term-dictionary lookup to allocate far fewer terms when
5049 scanning for the matching term. This speeds searches involving
5050 low-frequency terms, where the cost of dictionary lookup can be
5051 significant. (cutting)
5053 5. Optimize fuzzy queries so the standard fuzzy queries with a prefix
5054 of 0 now run 20-50% faster (Patch #31882).
5055 (Jonathan Hager via Daniel Naber)
5057 6. A new version of BooleanScorer (BooleanScorer2) has been added that delivers
5058 documents in increasing order and implements skipTo. For queries
5059 with required or forbidden clauses it may be faster than the old
5060 BooleanScorer; for BooleanQueries consisting only of optional
5061 clauses it is probably slower. BooleanScorer2 is now the
5062 default. (Patch 31785 by Paul Elschot via Christoph)
5064 7. Use uncached access to norms when merging to reduce RAM usage.
5065 (Bug #32847). (Doug Cutting)
5067 8. Don't read term index when random-access is not required. This
5068 reduces time to open IndexReaders and they use less memory when
5069 random access is not required, e.g., when merging segments. The
5070 term index is now read into memory lazily at the first
5071 random-access. (Doug Cutting)
5073 9. Optimize IndexWriter.addIndexes(Directory[]) when the number of
5074 added indexes is larger than mergeFactor. Previously this could
5075 result in quadratic performance. Now performance is n log(n).
5078 10. Speed up the creation of TermEnum for indices with multiple
5079 segments and deleted documents, and thus speed up PrefixQuery,
5080 RangeQuery, WildcardQuery, FuzzyQuery, RangeFilter, DateFilter,
5081 and sorting the first time on a field.
5082 (Yonik Seeley, LUCENE-454)
5084 11. Optimized and generalized 32 bit floating point to byte
5085 (custom 8 bit floating point) conversions. Increased the speed of
5086 Similarity.encodeNorm() anywhere from 10% to 250%, depending on the JVM.
5087 (Yonik Seeley, LUCENE-467)
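Item 11 above refers to encoding norms as custom 8-bit floats. The sketch below uses 3 mantissa bits and a zero-exponent offset of 15, modeled on SmallFloat.floatToByte315 from later Lucene releases; whether these exact constants match this release is an assumption, and NormByteSketch is a hypothetical name. The conversion is pure bit manipulation, which is what makes it cheap.

```java
/** Sketch of an 8-bit float encoding (3 mantissa bits, zero exponent 15); names are hypothetical. */
final class NormByteSketch {
    private static final int MANTISSA_BITS = 3;
    private static final int ZERO_EXP = 15;
    // Shifted float bits at or below this value underflow to byte 0 or 1.
    private static final int FZERO = (63 - ZERO_EXP) << MANTISSA_BITS;

    /** Encodes a float into one byte, truncating the extra mantissa bits. */
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - MANTISSA_BITS);
        if (small <= FZERO) {
            // Zero and negatives map to 0; tiny positives map to the smallest nonzero byte.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        } else if (small >= FZERO + 0x100) {
            // Too large: clamp to the largest representable value.
            return (byte) -1;
        }
        return (byte) (small - FZERO);
    }

    /** Decodes a byte produced by encode(). */
    static float decode(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - MANTISSA_BITS);
        bits += (63 - ZERO_EXP) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        for (float f : new float[]{1.0f, 0.9f, 0.5f}) {
            System.out.println(f + " -> " + encode(f) + " -> " + decode(encode(f)));
        }
    }
}
```

The encoding is lossy: 1.0f survives a round trip exactly, while 0.9f decodes to 0.875f because the extra mantissa bits are truncated.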
5091 1. Lucene's source code repository has been converted from CVS to
5092 Subversion. The new repository is at
5093 http://svn.apache.org/repos/asf/lucene/java/trunk
5095 2. Lucene's issue tracker has migrated from Bugzilla to JIRA.
5096 Lucene's JIRA is at http://issues.apache.org/jira/browse/LUCENE
5097 The old issues are still available at
5098 http://issues.apache.org/bugzilla/show_bug.cgi?id=xxxx
5099 (use the bug number instead of xxxx)
5104 1. The JSP demo page (src/jsp/results.jsp) now properly escapes error
5105 messages which might contain user input (e.g. error messages about
5106 query parsing). If you used that page as a starting point for your
5107 own code please make sure your code also properly escapes HTML
5108 characters from user input in order to avoid so-called cross site
5109 scripting attacks. (Daniel Naber)
5111 2. QueryParser changes in 1.4.2 broke the QueryParser API. Now the old
5112 API is supported again. (Christoph)
5117 1. Fixed bug #31241: Sorting could lead to incorrect results (documents
5118 missing, others duplicated) if the sort keys were not unique and there
5119 were more than 100 matches. (Daniel Naber)
5121 2. Memory leak in Sort code (bug #31240) eliminated.
5122 (Rafal Krzewski via Christoph and Daniel)
5124 3. FuzzyQuery now takes an additional parameter that specifies the
5125 minimum similarity that is required for a term to match the query.
5126 The QueryParser syntax for this is term~x, where x is a floating
5127 point number >= 0 and < 1 (a bigger number means that a higher
5128 similarity is required). Furthermore, a prefix can be specified
5129 for FuzzyQuerys so that only terms that start with this prefix
5130 are considered similar. This can speed up FuzzyQuery greatly.
5131 (Daniel Naber, Christoph Goller)
5133 4. PhraseQuery and PhrasePrefixQuery now allow the explicit specification
5134 of relative positions. (Christoph Goller)
5136 5. QueryParser changes: Fix for ArrayIndexOutOfBoundsExceptions
5137 (patch #9110); some unused method parameters removed; the ability
5138 to specify a minimum similarity for FuzzyQuery has been added.
5141 6. IndexSearcher optimization: a new ScoreDoc is no longer allocated
5142 for every non-zero-scoring hit. This makes 'OR' queries that
5143 contain common terms substantially faster. (cutting)
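Item 3 above adds a minimum similarity and an optional prefix to FuzzyQuery. A minimal sketch of the matching test, assuming similarity = 1 - editDistance / min(length of the two terms); Lucene's exact formula may differ in detail, and FuzzySketch is a hypothetical class. The prefix check runs first because it is cheap.

```java
/** Hypothetical illustration of fuzzy matching with a minimum similarity and a required prefix. */
final class FuzzySketch {
    /** Classic Levenshtein edit distance (insertions, deletions, substitutions). */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * True if candidate starts with the first prefixLength chars of query and its similarity
     * (assumed here as 1 - distance / min length) is at least minSimilarity.
     */
    static boolean matches(String query, String candidate, float minSimilarity, int prefixLength) {
        String prefix = query.substring(0, Math.min(prefixLength, query.length()));
        if (!candidate.startsWith(prefix)) return false;
        int minLen = Math.min(query.length(), candidate.length());
        if (minLen == 0) return query.equals(candidate);
        float similarity = 1.0f - (float) editDistance(query, candidate) / minLen;
        return similarity >= minSimilarity;
    }

    public static void main(String[] args) {
        System.out.println(matches("lucene", "lucine", 0.5f, 3));
    }
}
```

Here "lucine" is one edit away from "lucene" (similarity about 0.83), so it passes a 0.5 threshold but would fail a 0.9 one; a candidate such as "zucene" is rejected by a one-character prefix before any distance is computed.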
5148 1. Fixed a performance bug in hit sorting code, where values were not
5149 correctly cached. (Aviran via cutting)
5151 2. Fixed errors in file format documentation. (Daniel Naber)
5156 1. Added "an" to the list of stop words in StopAnalyzer, to complement
5157 the existing "a" there. Fix for bug 28960
5158 (http://issues.apache.org/bugzilla/show_bug.cgi?id=28960). (Otis)
5160 2. Added new class FieldCache to manage in-memory caches of field term
5163 3. Added overloaded getFieldQuery method to QueryParser which
5164 accepts the slop factor specified for the phrase (or the default
5165 phrase slop for the QueryParser instance). This allows overriding
5166 methods to replace a PhraseQuery with a SpanNearQuery instead,
5167 keeping the proper slop factor. (Erik Hatcher)
5169 4. Changed the encoding of GermanAnalyzer.java and GermanStemmer.java to
5170 UTF-8 and changed the build encoding to UTF-8, to make changed files
5171 compile. (Otis Gospodnetic)
5173 5. Removed synchronization from term lookup under IndexReader methods
5174 termFreq(), termDocs() or termPositions() to improve
5175 multi-threaded performance. (cutting)
5177 6. Fix a bug where obsolete segment files were not deleted on Win32.
5182 1. Fixed several search bugs introduced by the skipTo() changes in
5183 release 1.4RC1. The index file format was changed a bit, so
5184 collections must be re-indexed to take advantage of the skipTo()
5185 optimizations. (Christoph Goller)
5187 2. Added new Document methods, removeField() and removeFields().
5190 3. Fixed inconsistencies with index closing. Indexes and directories
5191 are now only closed automatically by Lucene when Lucene opened
5192 them automatically. (Christoph Goller)
5194 4. Added new class: FilteredQuery. (Tim Jones)
5196 5. Added a new SortField type for custom comparators. (Tim Jones)
5198 6. Lock obtain timed out message now displays the full path to the lock
5199 file. (Daniel Naber via Erik)
5201 7. Fixed a bug in SpanNearQuery when ordered. (Paul Elschot via cutting)
5203 8. Fixed so that FSDirectory's locks still work when the
5204 java.io.tmpdir system property is null. (cutting)
5206 9. Changed FilteredTermEnum's constructor to take no parameters,
5207 as the parameters were ignored anyway (bug #28858)
5211 1. GermanAnalyzer now throws an exception if the stopword file
5212 cannot be found (bug #27987). It now uses LowerCaseFilter
5213 (bug #18410) (Daniel Naber via Otis, Erik)
5215 2. Fixed a few bugs in the file format documentation. (cutting)
5220 1. Changed the format of the .tis file, so that:
5222 - it has a format version number, which makes it easier to
5223 back-compatibly change file formats in the future.
5225 - the term count is now stored as a long. This was the one aspect
5226 of Lucene's file formats which limited index size.
5228 - a few internal index parameters are now stored in the index, so
5229 that they can (in theory) now be changed from index to index,
5230 although there is not yet an API to do so.
5232 These changes are backward compatible. The new code can read old
5233 indexes. But old code will not be able to read new indexes. (cutting)
5235 2. Added an optimized implementation of TermDocs.skipTo(). A skip
5236 table is now stored for each term in the .frq file. This only
5237 adds a percent or two to overall index size, but can substantially
5238 speed up many searches. (cutting)
5240 3. Restructured the Scorer API and all Scorer implementations to take
5241 advantage of an optimized TermDocs.skipTo() implementation. In
5242 particular, PhraseQuerys and conjunctive BooleanQuerys are
5243 faster when one clause has substantially fewer matches than the
5244 others. (A conjunctive BooleanQuery is a BooleanQuery where all
5245 clauses are required.) (cutting)
5247 4. Added new class ParallelMultiSearcher. Combined with
5248 RemoteSearchable this makes it easy to implement distributed
5249 search systems. (Jean-Francois Halleux via cutting)
5251 5. Added support for hit sorting. Results may now be sorted by any
5252 indexed field. For details see the javadoc for
5253 Searcher#search(Query, Sort). (Tim Jones via Cutting)
5255 6. Changed FSDirectory to auto-create a full directory tree that it
5256 needs by using mkdirs() instead of mkdir(). (Mladen Turk via Otis)
5258 7. Added a new span-based query API. This implements, among other
5259 things, nested phrases. See javadocs for details. (Doug Cutting)
5261 8. Added new method Query.getSimilarity(Searcher), and changed
5262 scorers to use it. This permits one to subclass a Query class so
5263 that it can specify its own Similarity implementation, perhaps
5264 one that delegates through that of the Searcher. (Julien Nioche
5267 9. Added MultiReader, an IndexReader that combines multiple other
5268 IndexReaders. (Cutting)
5270 10. Added support for term vectors. See Field#isTermVectorStored().
5271 (Grant Ingersoll, Cutting & Dmitry)
5273 11. Fixed the old bug with escaping of special characters in query
5274 strings: http://issues.apache.org/bugzilla/show_bug.cgi?id=24665
5275 (Jean-Francois Halleux via Otis)
5277 12. Added support for overriding default values for the following,
5278 using system properties:
5279 - default commit lock timeout
5280 - default maxFieldLength
5281 - default maxMergeDocs
5282 - default mergeFactor
5283 - default minMergeDocs
5284 - default write lock timeout
5287 13. Changed QueryParser.jj to allow '-' and '+' within tokens:
5288 http://issues.apache.org/bugzilla/show_bug.cgi?id=27491
5289 (Morus Walter via Otis)
5291 14. Changed so that the compound index format is used by default.
5292 This makes indexing a bit slower, but vastly reduces the chances
5293 of file handle problems. (Cutting)
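Items 2 and 3 above combine TermDocs.skipTo() with conjunctive scoring. The sketch below shows the leapfrog pattern: whichever iterator is behind skips to the other's current document, so a rare term drives the intersection. As a simplification, binary search over an in-memory array stands in for the on-disk skip table in the .frq file; PostingsSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory postings; binary search stands in for an on-disk skip table. */
final class PostingsSketch {
    private final int[] docs; // ascending document ids
    private int pos = -1;     // index of the current document

    PostingsSketch(int... docs) { this.docs = docs; }

    int doc() { return docs[pos]; }

    /** Advances to the next document; returns false when exhausted. */
    boolean next() { return ++pos < docs.length; }

    /** Moves to the first document after the current one whose id is >= target. */
    boolean skipTo(int target) {
        int lo = pos + 1, hi = docs.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (docs[mid] < target) lo = mid + 1; else hi = mid;
        }
        pos = lo;
        return pos < docs.length;
    }

    /** Leapfrog intersection: the iterator that is behind skips to the other's document. */
    static List<Integer> intersect(PostingsSketch a, PostingsSketch b) {
        List<Integer> out = new ArrayList<>();
        if (!a.next() || !b.next()) return out;
        while (true) {
            if (a.doc() == b.doc()) {
                out.add(a.doc());
                if (!a.next() || !b.next()) return out;
            } else if (a.doc() < b.doc()) {
                if (!a.skipTo(b.doc())) return out;
            } else {
                if (!b.skipTo(a.doc())) return out;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(intersect(new PostingsSketch(1, 3, 5, 7, 9), new PostingsSketch(3, 4, 5, 9)));
    }
}
```

When one list is much shorter than the other, most entries of the longer list are jumped over rather than visited, which is why conjunctions with one rare clause benefit most.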
5298 1. Added catch of BooleanQuery$TooManyClauses in QueryParser to
5299 throw ParseException instead. (Erik Hatcher)
5301 2. Fixed a NullPointerException in Query.explain(). (Doug Cutting)
5303 3. Added a new method IndexReader.setNorm(), that permits one to
5304 alter the boosting of fields after an index is created.
5306 4. Distinguish between the final position and length when indexing a
5307 field. The length is now defined as the total number of tokens,
5308 instead of the final position, as it was previously. Length is
5309 used for score normalization (Similarity.lengthNorm()) and for
5310 controlling memory usage (IndexWriter.maxFieldLength). In both of
5311 these cases, the total number of tokens is a better value to use
5312 than the final token position. Position is used in phrase
5313 searching (see PhraseQuery and Token.setPositionIncrement()).
5315 5. Fix StandardTokenizer's handling of CJK characters (Chinese,
5316 Japanese and Korean ideograms). Previously contiguous sequences
5317 were combined in a single token, which is not very useful. Now
5318 each ideogram generates a separate token, which is more useful.
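Item 4 above separates a field's length from its final token position. A minimal sketch, assuming positions are accumulated from each token's position increment so that a first token with increment 1 lands at position 0: three alternative stems stacked at one position (increments 1, 0, 0) followed by one more token give a final position of 1 but a length of 4 tokens. PositionSketch is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical illustration of token positions versus field length; not a Lucene API. */
final class PositionSketch {
    /** Accumulates position increments; a first token with increment 1 gets position 0. */
    static int[] positions(int[] increments) {
        int[] result = new int[increments.length];
        int position = -1;
        for (int i = 0; i < increments.length; i++) {
            position += increments[i];
            result[i] = position;
        }
        return result;
    }

    /** Length as described by the change: the number of tokens, regardless of positions. */
    static int length(int[] increments) {
        return increments.length;
    }

    public static void main(String[] args) {
        // Three stems stacked at one position (increment 0), then one more token.
        int[] increments = {1, 0, 0, 1};
        int[] pos = positions(increments);
        System.out.println("positions=" + Arrays.toString(pos)
            + " final position=" + pos[pos.length - 1]
            + " length=" + length(increments));
    }
}
```

Increments greater than 1 work the other way, leaving gaps so that the final position exceeds the token count.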
5323 1. Added minMergeDocs in IndexWriter. This can be raised to speed
5324 indexing without altering the number of files, at the cost of
5325 using more memory. (Julien Nioche via Otis)
5327 2. Fix bug #24786, in query rewriting. (bschneeman via Cutting)
5329 3. Fix bug #16952, in demo HTML parser, skip comments in
5330 javascript. (Christoph Goller)
5332 4. Fix bug #19253, in demo HTML parser, add whitespace as needed to
5333 output (Daniel Naber via Christoph Goller)
5335 5. Fix bug #24301, in demo HTML parser, long titles no longer
5336 hang things. (Christoph Goller)
5338 6. Fix bug #23534, Replace use of file timestamp of segments file
5339 with an index version number stored in the segments file. This
5340 resolves problems when running on file systems with low-resolution
5341 timestamps, e.g., HFS under MacOS X. (Christoph Goller)
5343 7. Fix QueryParser so that TokenMgrError is not thrown, only
5344 ParseException. (Erik Hatcher)
5346 8. Fix some bugs introduced by change 11 of RC2. (Christoph Goller)
5348 9. Fixed a problem compiling TestRussianStem. (Christoph Goller)
5350 10. Cleaned up some build stuff. (Erik Hatcher)
5355 1. Added getFieldNames(boolean) to IndexReader, SegmentReader, and
5356 SegmentsReader. (Julien Nioche via otis)
5358 2. Changed file locking to place lock files in
5359 System.getProperty("java.io.tmpdir"), where all users are
5360 permitted to write files. This way folks can open and correctly
5361 lock indexes which are read-only to them.
5363 3. IndexWriter: added a new method, addDocument(Document, Analyzer),
5364 permitting one to easily use different analyzers for different
5365 documents in the same index.
5367 4. Minor enhancements to FuzzyTermEnum.
5368 (Christoph Goller via Otis)
5370 5. PriorityQueue: added insert(Object) method and adjusted IndexSearcher
5371 and MultiIndexSearcher to use it.
5372 (Christoph Goller via Otis)
5374 6. Fixed a bug in IndexWriter that returned incorrect docCount().
5375 (Christoph Goller via Otis)
5377 7. Fixed SegmentsReader to eliminate the confusing and slightly different
5378 behaviour of TermEnum when dealing with an enumeration of all terms,
5379 versus an enumeration starting from a specific term.
5380 This patch also fixes incorrect term document frequencies when the same term
5381 is present in multiple segments.
5382 (Christoph Goller via Otis)
5384 8. Added CachingWrapperFilter and PerFieldAnalyzerWrapper. (Erik Hatcher)
5386 9. Added support for the new "compound file" index format (Dmitry
5389 10. Added Locale setting to QueryParser, for use by date range parsing.
5391 11. Changed IndexReader so that it can be subclassed by classes
5392 outside of its package. Previously it had package-private
5393 abstract methods. Also modified the index merging code so that it
5394 can work on an arbitrary IndexReader implementation, and added a
5395 new method, IndexWriter.addIndexes(IndexReader[]), to take
5396 advantage of this. (cutting)
5398 12. Added a limit to the number of clauses which may be added to a
5399 BooleanQuery. The default limit is 1024 clauses. This should
5400 prevent most OutOfMemoryErrors caused by prefix, wildcard and fuzzy
5401 queries which run amok. (cutting)
5403 13. Add new method: IndexReader.undeleteAll(). This undeletes all
5404 deleted documents which still remain in the index. (cutting)
5409 1. Fixed PriorityQueue's clear() method.
5410 Fix for bug 9454, http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9454
5411 (Matthijs Bomhoff via otis)
5413 2. Changed StandardTokenizer.jj grammar for EMAIL tokens.
5414 Fix for bug 9015, http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9015
5415 (Dale Anson via otis)
5417 3. Added the ability to disable lock creation by using disableLuceneLocks
5418 system property. This is useful for read-only media, such as CD-ROMs.
5421 4. Added id method to Hits to be able to access the index global id.
5422 Required for sorting options.
5425 5. Added support for new range query syntax to QueryParser.jj.
5428 6. Added the ability to retrieve HTML documents' META tag values to
5430 (Mark Harwood via otis)
5432 7. Modified QueryParser to make it possible to programmatically specify the
5433 default Boolean operator (OR or AND).
5434 (Péter Halácsy via otis)
5436 8. Made many search methods and classes non-final, per requests.
5437 This includes IndexWriter and IndexSearcher, among others.
5440 9. Added class RemoteSearchable, providing support for remote
5441 searching via RMI. The test class RemoteSearchableTest.java
5442 provides an example of how this can be used. (cutting)
5444 10. Added PhrasePrefixQuery (and supporting MultipleTermPositions). The
5445 test class TestPhrasePrefixQuery provides the usage example.
5446 (Anders Nielsen via otis)
5448 11. Changed the German stemming algorithm to ignore case while
5449 stripping. The new algorithm is faster and produces more consistent
5450 stems from nouns and verbs derived from the same word.
5453 12. Added support for boosting the score of documents and fields via
5454 the new methods Document.setBoost(float) and Field.setBoost(float).
5456 Note: This changes the encoding of an indexed value. Indexes
5457 should be re-created from scratch in order for search scores to
5458 be correct. With the new code and an old index, searches will
5459 yield very large scores for shorter fields, and very small scores
5460 for longer fields. Once the index is re-created, scores will be
5461 as before. (cutting)
5463 13. Added new method Token.setPositionIncrement().
5465 This permits, for the purpose of phrase searching, placing
5466 multiple terms in a single position. This is useful with
5467 stemmers that produce multiple possible stems for a word.
5469 This also permits the introduction of gaps between terms, so that
5470 terms which are adjacent in a token stream will not be matched by
5471 an exact phrase query. This makes it possible, e.g., to build
5472 an analyzer where phrases are not matched over stop words which
5475 Finally, repeating a token with an increment of zero can also be
5476 used to boost scores of matches on that token. (cutting)
5478 14. Added new Filter class, QueryFilter. This constrains search
5479 results to only match those which also match a provided query.
5480 Results are cached, so that searches after the first on the same
5481 index using this filter are very fast.
5483 This could be used, for example, with a RangeQuery on a formatted
5484 date field to implement date filtering. One could re-use a
5485 single QueryFilter that matches, e.g., only documents modified
5486 within the last week. The QueryFilter and RangeQuery would only
5487 need to be reconstructed once per day. (cutting)
5489 15. Added a new IndexWriter method, getAnalyzer(). This returns the
5490 analyzer used when adding documents to this index. (cutting)
5492 16. Fixed a bug with IndexReader.lastModified(). Before, document
5493 deletion did not update this. Now it does. (cutting)
5495 17. Added Russian Analyzer.
5496 (Boris Okner via otis)
5498 18. Added a public, extensible scoring API. For details, see the
5499 javadoc for org.apache.lucene.search.Similarity.
5501 19. Fixed the return type of Hits.id() from float to int. (Terry Steichen via Peter).
5503 20. Added getFieldNames() to IndexReader and Segment(s)Reader classes.
5504 (Peter Mularien via otis)
5506 21. Added getFields(String) and getValues(String) methods.
5507 Contributed by Rasik Pandey on 2002-10-09
5508 (Rasik Pandey via otis)
5510 22. Revised internal search APIs. Changes include:
5512 a. Queries are no longer modified during a search. This makes
5513 it possible, e.g., to reuse the same query instance with
5514 multiple indexes from multiple threads.
5516 b. Term-expanding queries (e.g. PrefixQuery, WildcardQuery,
5517 etc.) now work correctly with MultiSearcher, fixing bugs 12619
5520 c. Boosting BooleanQuerys now works, and is supported by the
5521 query parser (problem reported by Lee Mallabone). Thus a query
5522 like "(+foo +bar)^2 +baz" is now supported and equivalent to
5523 "(+foo^2 +bar^2) +baz".
5525 d. New method: Query.rewrite(IndexReader). This permits a
5526 query to re-write itself as an alternate, more primitive query.
5527 Most of the term-expanding query classes (PrefixQuery,
5528 WildcardQuery, etc.) are now implemented using this method.
5530 e. New method: Searchable.explain(Query q, int doc). This
5531 returns an Explanation instance that describes how a particular
5532 document is scored against a query. An explanation can be
5533 displayed as either plain text, with the toString() method, or
5534 as HTML, with the toHtml() method. Note that computing an
5535 explanation is as expensive as executing the query over the
5536 entire index. This is intended to be used in developing
5537 Similarity implementations, and, for good performance, should
5538 not be displayed with every hit.
5540 f. Scorer and Weight are public, not package protected. It is now
5541 possible for someone to write a Scorer implementation that is
5542 not in the org.apache.lucene.search package. This is still
5543 fairly advanced programming, and I don't expect anyone to do
5544 this anytime soon, but at least now it is possible.
5546 g. Added public accessors to the primitive query classes
5547 (TermQuery, PhraseQuery and BooleanQuery), permitting access to
5548 their terms and clauses.
5550 Caution: These are extensive changes and they have not yet been
5551 tested extensively. Bug reports are appreciated.
5554 23. Added convenience RAMDirectory constructors taking File and String
5555 arguments, for easy FSDirectory to RAMDirectory conversion.
5558 24. Added code for manual renaming of files in FSDirectory, since it
5559 has been reported that java.io.File's renameTo(File) method sometimes
5560 fails on Windows JVMs.
5561 (Matt Tucker via otis)
5563 25. Refactored QueryParser to make it easier for people to extend it.
5564 Added the ability to automatically lower-case Wildcard terms in
5566 (Tatu Saloranta via otis)
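Item 14 above caches QueryFilter results per index. A minimal sketch of such a cache, assuming the reader object itself is the cache key and the matching bits are computed by a caller-supplied function; CachedFilterSketch is hypothetical. A WeakHashMap is used here so entries can be dropped once a reader is no longer referenced; a reopened index yields a new reader and therefore a fresh computation.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Hypothetical per-index filter cache; the Object key stands in for an IndexReader. */
final class CachedFilterSketch {
    private final Function<Object, BitSet> computeBits; // e.g. runs the wrapped query once
    private final Map<Object, BitSet> cache = new WeakHashMap<>();

    CachedFilterSketch(Function<Object, BitSet> computeBits) {
        this.computeBits = computeBits;
    }

    /** Returns the matching-document bits for this reader, computing them only on first use. */
    synchronized BitSet bits(Object reader) {
        return cache.computeIfAbsent(reader, computeBits);
    }

    public static void main(String[] args) {
        CachedFilterSketch filter = new CachedFilterSketch(reader -> {
            System.out.println("computing bits");
            BitSet bits = new BitSet();
            bits.set(2); // pretend document 2 matches the wrapped query
            return bits;
        });
        Object reader = new Object();
        System.out.println(filter.bits(reader));
        System.out.println(filter.bits(reader)); // served from the cache
    }
}
```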
5571 1. Changed QueryParser.jj to have "?" be a special character which
5572 allowed it to be used as a wildcard term. Updated TestWildcard
5573 unit test also. (Ralf Hettesheimer via carlson)
5577 1. Renamed build.properties to default.properties and updated
5578 the BUILD.txt document to describe how to override the
5579 default.properties settings without having to edit the file. This
5580 brings the build process closer to Scarab's build process.
5583 2. Added MultiFieldQueryParser class. (Kelvin Tan, via otis)
5585 3. Updated "powered by" links. (otis)
5587 4. Fixed instruction for setting up JavaCC - Bug #7017 (otis)
5589 5. FSDirectory now throws an exception if it cannot create a directory
5590 - Bug #6914 (Eugene Gluzberg via otis)
5592 6. Update MultiSearcher, MultiFieldParse, Constants, DateFilter,
5593 LowerCaseTokenizer javadoc (otis)
5595 7. Added fix to avoid NullPointerException in results.jsp
5596 (Mark Hayes via otis)
5598 8. Changed Wildcard search to match 0 or more characters instead of 1 or more
5599 (Lee Mallobone, via otis)
5601 9. Fixed an offset error in GermanStemFilter - Bug #7412
5602 (Rodrigo Reyes, via otis)
5604 10. Added unit tests for wildcard search and DateFilter (otis)
5606 11. Allow co-existence of indexed and non-indexed fields with the same name
5607 (cutting/casper, via otis)
5609 12. Add escape character to query parser.
5612 13. Applied a patch that ensures that searches that use DateFilter
5613 don't throw an exception when no matches are found. (David Smiley, via
5616 14. Fixed bugs in DateFilter and wildcardquery unit tests. (cutting, otis, carlson)
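The two wildcard entries above define the matching semantics: "?" matches exactly one character, and "*" matches zero or more characters (no longer one or more). A minimal matcher with exactly those semantics is sketched below. WildcardMatch is a hypothetical name, and this naive backtracking version is for illustration only (it can be slow on patterns containing many "*"); it is not Lucene's WildcardQuery code.

```java
// Hypothetical matcher illustrating the wildcard semantics, not Lucene code.
final class WildcardMatch {
    private WildcardMatch() {}

    // '*' matches zero or more characters; '?' matches exactly one character.
    static boolean matches(String pattern, String text) {
        return matches(pattern, 0, text, 0);
    }

    private static boolean matches(String p, int pi, String t, int ti) {
        while (pi < p.length()) {
            char c = p.charAt(pi);
            if (c == '*') {
                // Try every possible length for this '*', starting with zero.
                for (int k = ti; k <= t.length(); k++) {
                    if (matches(p, pi + 1, t, k)) {
                        return true;
                    }
                }
                return false;
            }
            // '?' and literal characters each consume exactly one text character.
            if (ti >= t.length()) {
                return false;
            }
            if (c != '?' && c != t.charAt(ti)) {
                return false;
            }
            pi++;
            ti++;
        }
        // The whole pattern is consumed; the text must be consumed too.
        return ti == t.length();
    }
}
```

For example, WildcardMatch.matches("foo*", "foo") is true because "*" may match nothing, while matches("f?o", "fo") is false because "?" needs one character.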
5621 1. Updated contributions section of website.
5622 Added XML Document #3 implementation to Document Section.
5623 Also added Term Highlighting to Misc Section. (carlson)
5625 2. Fixed NullPointerException for phrase searches containing
5626 unindexed terms, introduced in 1.2RC3. (cutting)
5628 3. Changed document deletion code to obtain the index write lock,
5629 enforcing the fact that document addition and deletion cannot be
5630 performed concurrently. (cutting)
5632 4. Various documentation cleanups. (otis, acoliver)
5634 5. Updated "powered by" links. (cutting, jon)
5636 6. Fixed a bug in the GermanStemmer. (Bernhard Messer, via otis)
5638 7. Changed Term and Query to implement Serializable. (scottganyo)
5640 8. Fixed so that indexes added with IndexWriter.addIndexes() are never deleted.
5643 9. Upgraded to JUnit 3.7. (otis)
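Entry 7 above makes Term and Query implement Serializable, which means they can be written and read with standard Java object streams (for example, to hand them to another process). The round trip below is a sketch that assumes only the standard java.io serialization API; SimpleTerm is a hypothetical stand-in with a field name and term text, not Lucene's Term class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SerializationSketch {
    private SerializationSketch() {}

    // Hypothetical stand-in for a term: a field name plus the term text.
    static final class SimpleTerm implements Serializable {
        private static final long serialVersionUID = 1L;
        final String field;
        final String text;

        SimpleTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof SimpleTerm)) return false;
            SimpleTerm other = (SimpleTerm) o;
            return field.equals(other.field) && text.equals(other.text);
        }

        @Override
        public int hashCode() {
            return 31 * field.hashCode() + text.hashCode();
        }
    }

    // Serialize any Serializable object to a byte array.
    static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Rebuild the object from its serialized form.
    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }
}
```

The deserialized object is a new instance that compares equal to the original, which is what a receiver of a serialized term or query relies on.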
5647 1. IndexWriter: fixed a bug where adding an optimized index to an
5648 empty index failed. This was encountered using addIndexes to copy
5649 a RAMDirectory index to an FSDirectory.
5651 2. RAMDirectory: fixed a bug where RAMInputStream could not read
5652 across more than a single buffer boundary.
5654 3. Fix query parser so it accepts queries with Unicode characters.
5657 4. Fix query parser so that PrefixQuery is used in preference to
5658 WildcardQuery when there's only an asterisk at the end of the
5659 term. Previously PrefixQuery would never be used.
5661 5. Fix tests so they compile; fix ant file so it compiles tests
5662 properly. Added test cases for Analyzers and PriorityQueue.
5664 6. Updated demos, added Getting Started documentation. (acoliver)
5666 7. Added 'contributions' section to website & docs. (carlson)
5668 8. Removed JavaCC from source distribution for copyright reasons.
5669 Folks must now download this separately from Metamata in order to
5670 compile Lucene. (cutting)
5672 9. Substantially improved the performance of DateFilter by adding the
5673 ability to reuse TermDocs objects. (cutting)
5675 10. Added IndexReader methods:
5676 public static boolean indexExists(String directory);
5677 public static boolean indexExists(File directory);
5678 public static boolean indexExists(Directory directory);
5679 public static boolean isLocked(Directory directory);
5680 public static void unlock(Directory directory);
5683 11. Fixed bugs in GermanAnalyzer (gschwarz)
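Entry 2 above fixes RAMInputStream so that a single read can cross more than one buffer boundary. A common way to get this right is to loop, copying at most one buffer's worth of bytes per iteration until the request is satisfied. The sketch below assumes an in-memory file stored as a list of equal-sized byte arrays; MultiBufferInput and readBytes are hypothetical names, and this is not Lucene's RAMInputStream source.

```java
import java.util.List;

// Hypothetical in-memory file: contents split across fixed-size buffers.
final class MultiBufferInput {
    private final List<byte[]> buffers;
    private final int bufferSize;
    private final long length;
    private long position = 0;

    MultiBufferInput(List<byte[]> buffers, int bufferSize, long length) {
        this.buffers = buffers;
        this.bufferSize = bufferSize;
        this.length = length;
    }

    // Copies len bytes into dest starting at offset. Each loop iteration stays
    // inside one buffer, so any number of boundaries can be crossed.
    void readBytes(byte[] dest, int offset, int len) {
        if (position + len > length) {
            throw new IndexOutOfBoundsException("read past end of file");
        }
        while (len > 0) {
            int bufferIndex = (int) (position / bufferSize);
            int bufferOffset = (int) (position % bufferSize);
            int chunk = Math.min(len, bufferSize - bufferOffset);
            System.arraycopy(buffers.get(bufferIndex), bufferOffset, dest, offset, chunk);
            offset += chunk;
            len -= chunk;
            position += chunk;
        }
    }
}
```

With 4-byte buffers holding the bytes 0 through 9, reading one byte and then nine more returns 1 through 9 in the second call, crossing two buffer boundaries in one read.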
5687 - added sources to distribution
5688 - removed broken build scripts and libraries from distribution
5689 - SegmentsReader: fixed potential race condition
5690 - FSDirectory: fixed so that getDirectory(xxx,true) correctly
5691 erases the directory contents, even when the directory
5692 has already been accessed in this JVM.
5693 - RangeQuery: Fix issue where an inclusive range query would
5694 include the nearest term in the index above a non-existant
5695 specified upper term.
5696 - SegmentTermEnum: Fix NullPointerException in clone() method
5697 when the Term is null.
5698 - JDK 1.1 compatibility fix: disabled lock files for JDK 1.1,
5699 since they rely on a feature added in JDK 1.2.
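The RangeQuery entry above describes an inclusive range that also picked up the nearest indexed term above an upper bound that was not itself in the index. The sketch below enumerates terms in sorted order and stops by comparing each term against the upper bound itself, so a missing upper term cannot pull in its successor. It uses java.util.NavigableSet as a stand-in for the term dictionary; TermRange is a hypothetical name and this is not Lucene's RangeQuery implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

// Hypothetical range enumeration over a sorted term set, not Lucene code.
final class TermRange {
    private TermRange() {}

    static List<String> range(NavigableSet<String> terms, String lower, String upper,
                              boolean inclusive) {
        List<String> result = new ArrayList<>();
        // Seek to the first term >= lower, as a term enumeration would.
        for (String term : terms.tailSet(lower, true)) {
            int cmp = term.compareTo(upper);
            // Stop by comparing against the upper bound itself. Whether or not
            // upper exists in the index, nothing past it is collected.
            if (cmp > 0 || (cmp == 0 && !inclusive)) {
                break;
            }
            if (!inclusive && term.equals(lower)) {
                continue; // exclusive ranges skip the lower bound
            }
            result.add(term);
        }
        return result;
    }
}
```

With the terms apple, banana, cherry and date, the inclusive range from "apple" to "blueberry" yields apple and banana; cherry is excluded even though "blueberry" is not indexed.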
5701 1.2 RC1 (first Apache release):
5702 - packages renamed from com.lucene to org.apache.lucene
5703 - license switched from LGPL to Apache
5704 - ant-only build -- no more makefiles
5705 - addition of lock files--now fully thread & process safe
5706 - addition of German stemmer
5707 - MultiSearcher now supports low-level search API
5708 - added RangeQuery, for term-range searching
5709 - Analyzers can choose tokenizer based on field name
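The 1.2 RC1 notes add lock files, and the JDK 1.1 entry above says they rely on a feature added in JDK 1.2. A likely candidate is java.io.File.createNewFile(), added in JDK 1.2, which atomically creates a file only if it does not already exist; that identification is an assumption, since the notes do not name the feature. The class below is a hypothetical sketch of a lock built on it, not Lucene's lock implementation. Note that the createNewFile() documentation itself cautions against using it for file locking and recommends java.nio.channels.FileLock instead.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical lock-file sketch, not Lucene's lock implementation.
final class LockFileSketch {
    private final File lockFile;

    LockFileSketch(File directory, String name) {
        this.lockFile = new File(directory, name);
    }

    // createNewFile checks for the file and creates it in one atomic step,
    // so when several threads or processes race, only one gets true.
    boolean obtain() throws IOException {
        return lockFile.createNewFile();
    }

    boolean isLocked() {
        return lockFile.exists();
    }

    // Releasing the lock means deleting the file.
    void release() {
        lockFile.delete();
    }
}
```

A writer would call obtain() before modifying the index and release() afterwards; a second caller's obtain() returns false while the file exists.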
5712 1.01b (last SourceForge release)
5715 . new prefix query (a search for "foo*" matches "food")
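A prefix query such as "foo*" can be answered by seeking to the first indexed term at or after "foo" in sorted order and collecting terms until one no longer starts with the prefix. The sketch below shows that scan, together with the query parser rule noted earlier in this file (use a prefix query when the only wildcard is a single trailing asterisk). PrefixTerms and its methods are hypothetical, and a NavigableSet stands in for the sorted term index; this is not Lucene's PrefixQuery code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

// Hypothetical helpers; a NavigableSet<String> stands in for the sorted term index.
final class PrefixTerms {
    private PrefixTerms() {}

    // True when the pattern's only wildcard is one trailing '*' after a non-empty prefix.
    static boolean isPrefixPattern(String pattern) {
        if (pattern.length() < 2 || !pattern.endsWith("*")) {
            return false;
        }
        String prefix = pattern.substring(0, pattern.length() - 1);
        return prefix.indexOf('*') < 0 && prefix.indexOf('?') < 0;
    }

    // Collects every term starting with prefix. Because the terms are sorted,
    // the scan can stop at the first term that does not match.
    static List<String> expand(NavigableSet<String> terms, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            matches.add(term);
        }
        return matches;
    }
}
```

Over the terms fob, foo, food, fool and fop, expanding the prefix "foo" returns foo, food and fool, and the scan stops at fop without visiting any later terms.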
5719 This release fixes a few serious bugs and also includes some
5720 performance optimizations, a stemmer, and a few other minor enhancements.
5725 Lucene now includes a grammar-based tokenizer, StandardTokenizer.
5727 The only tokenizer included in the previous release (LetterTokenizer)
5728 identified terms consisting entirely of alphabetic characters. The
5729 new tokenizer uses a regular-expression grammar to identify more
5730 complex classes of terms, including numbers, acronyms, email addresses, etc.
5733 StandardTokenizer serves two purposes:
5735 1. It is a much better, general-purpose tokenizer for use by applications.
5738 The easiest way for applications to start using
5739 StandardTokenizer is to use StandardAnalyzer.
5741 2. It provides a good example of grammar-based tokenization.
5743 If an application has special tokenization requirements, it can
5744 implement a custom tokenizer by copying the directory containing
5745 the new tokenizer into the application and modifying it accordingly.
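To illustrate what grammar-based tokenization means, the sketch below recognizes a few term classes (email addresses, dotted acronyms, numbers, and words) with java.util.regex. The patterns are simplified assumptions chosen for illustration; they are not StandardTokenizer's actual grammar, and GrammarTokenizerSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical regex-based tokenizer, not StandardTokenizer's grammar.
final class GrammarTokenizerSketch {
    private GrammarTokenizerSketch() {}

    // Alternatives are tried left to right at each position, so the more
    // specific term classes are listed first.
    private static final Pattern TOKEN = Pattern.compile(
        "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}"  // email address
        + "|(?:[A-Za-z]\\.){2,}"                            // acronym such as U.S.A.
        + "|\\d+(?:[.,]\\d+)*"                              // number such as 1,000.50
        + "|[A-Za-z]+(?:'[A-Za-z]+)?");                     // word, optional apostrophe

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }
}
```

Unlike a letters-only tokenizer, which would split "info@example.com" into "info", "example" and "com", this sketch keeps the address, the acronym "U.S.A." and the number "1,000.50" as single tokens.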
5750 First open source release.
5752 The code has been re-organized into a new package and directory
5753 structure for this release. It builds OK, but has not been tested
5754 beyond that since the re-organization.