lucene-java-3.5.0/lucene/CHANGES.txt

   1 Lucene Change Log
   2
   3 For more information on past and future Lucene versions, please see:
   4 http://s.apache.org/luceneversions
   5
   6 ======================= Lucene 3.5.0 =======================
   7
   8 Changes in backwards compatibility policy
   9
  10 * LUCENE-3390: The first approach in Lucene 3.4.0 for missing values
  11   support for sorting had a design problem that made the missing value
  12   be populated directly into the FieldCache arrays during sorting,
  13   leading to concurrency issues. To fix this behaviour, the method
  14   signatures had to be changed:
  15   - FieldCache.getUnValuedDocs() was renamed to FieldCache.getDocsWithField()
  16     returning a Bits interface (backported from Lucene 4.0).
  17   - FieldComparator.setMissingValue() was removed; the missing value is
  18     now passed to the constructor.
  19   As this is expert API, most code will not be affected.
  20   (Uwe Schindler, Doron Cohen, Mike McCandless)
  21
  22 * LUCENE-3464: IndexReader.reopen has been renamed to
  23   IndexReader.openIfChanged (a static method), and now returns null
  24   (instead of the old reader) if there are no changes in the index, to
  25   prevent the common pitfall of accidentally closing the old reader.
  26
  27 * LUCENE-3541: Remove IndexInput's protected copyBuf. If you want to
  28   keep a buffer in your IndexInput, do this yourself in your implementation,
  29   and be sure to do the right thing on clone()!  (Robert Muir)
  30
  31 * LUCENE-2822: TimeLimitingCollector now expects a counter clock instead of
  32   relying on a private daemon thread. The global time limiting clock thread
  33   has been exposed and is now lazily loaded and fully optional.
  34   TimeLimitingCollector now supports setting clock baseline manually to include
  35   prelude of a search. Previous versions set the baseline at construction time;
  36   now the baseline is set once the first IndexReader is passed to the collector
  37   unless set before. (Simon Willnauer)
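The LUCENE-3464 contract above (null means "no changes") suggests a refresh
idiom in which the old reader is closed only when a new one was actually
returned. The following is a minimal sketch of that idiom. ReaderSketch, its
fields, and refresh() are hypothetical stand-ins written for this note, not
Lucene classes; only the null-on-no-change behavior is taken from the entry.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for an index reader; NOT a Lucene class. */
final class ReaderSketch {
    /** Simulated index state: bumping it plays the role of a new commit. */
    static final AtomicLong indexVersion = new AtomicLong(1);

    final long version;
    boolean closed;

    ReaderSketch(long version) {
        this.version = version;
    }

    /** Mirrors the contract described in LUCENE-3464: null means "no changes". */
    static ReaderSketch openIfChanged(ReaderSketch old) {
        long current = indexVersion.get();
        return current == old.version ? null : new ReaderSketch(current);
    }

    void close() {
        closed = true;
    }

    /** Caller-side refresh: close the old reader only if a new one exists. */
    static ReaderSketch refresh(ReaderSketch reader) {
        ReaderSketch newer = openIfChanged(reader);
        if (newer == null) {
            return reader; // unchanged: keep using (and do not close) the old reader
        }
        reader.close();
        return newer;
    }

    public static void main(String[] args) {
        ReaderSketch first = new ReaderSketch(indexVersion.get());
        ReaderSketch same = refresh(first);      // no changes yet
        indexVersion.incrementAndGet();          // simulate a commit
        ReaderSketch second = refresh(same);     // now a new reader is returned
        System.out.println("reused=" + (same == first) + " replaced=" + (second != first));
    }
}
```

Because a null return never hands back the old instance, a caller cannot
accidentally close the reader it is still using, which is the pitfall the
entry describes.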
  38
  39 Changes in runtime behavior
  40
  41 * LUCENE-3520: IndexReader.openIfChanged, when passed a near-real-time
  42   reader, will now return null if there are no changes.  The API has
  43   always reserved the right to do this; it's just that in the past for
  44   near-real-time readers it never did. (Mike McCandless)
  45
  46 Bug fixes
  47
  48 * SOLR-2762: (backport from 4.x line): FSTLookup could return duplicate
  49   results or one result fewer than requested. (David Smiley, Dawid Weiss)
  50
  51 * LUCENE-3412: SloppyPhraseScorer was returning non-deterministic results
  52   for queries with many repeats (Doron Cohen)
  53
  54 * LUCENE-3421: PayloadTermQuery's explain was wrong when includeSpanScore=false.
  55   (Edward Drapkin via Robert Muir)
  56
  57 * LUCENE-3432: IndexWriter.expungeDeletes with TieredMergePolicy
  58   should ignore the maxMergedSegmentMB setting (v.sevel via Mike
  59   McCandless)
  60
  61 * LUCENE-3442: TermQuery.TermWeight.scorer() returns null for non-atomic
  62   IndexReaders (optimization bug, introduced by LUCENE-2829), preventing
  63   QueryWrapperFilter and similar classes from getting a top-level DocIdSet.
  64   (Dan C., Uwe Schindler)
  65
  66 * LUCENE-3390: Corrected handling of missing values when two parallel searches
  67   use different missing values for sorting: the missing value was populated
  68   directly into the FieldCache arrays during sorting, leading to concurrency
  69   issues.  (Uwe Schindler, Doron Cohen, Mike McCandless)
  70
  71 * LUCENE-3439: Closing an NRT reader after the writer was closed was
  72   incorrectly invoking the DeletionPolicy and (then possibly deleting
  73   files) on the closed IndexWriter (Robert Muir, Mike McCandless)
  74
  75 * LUCENE-3215: SloppyPhraseScorer sometimes computed Infinite freq
  76   (Robert Muir, Doron Cohen)
  77
  78 * LUCENE-3465: IndexSearcher with ExecutorService was always passing 0
  79   for docBase to Collector.setNextReader.  (Robert Muir, Mike
  80   McCandless)
  81
  82 * LUCENE-3503: DisjunctionSumScorer would give slightly different scores
  83   for a document depending on whether you used nextDoc() versus advance().
  84   (Mike McCandless, Robert Muir)
  85
  86 * LUCENE-3529: Properly support indexing an empty field with empty term text.
  87   Previously, if you had assertions enabled you would receive an error during
  88   flush, if you didn't, you would get an invalid index.
  89   (Mike McCandless, Robert Muir)
  90
  91 * LUCENE-2633: PackedInts Packed32 and Packed64 did not support internal
  92   structures larger than 256MB (Toke Eskildsen via Mike McCandless)
  93
  94 * LUCENE-3540: LUCENE-3255 dropped support for pre-1.9 indexes, but the
  95   error message in IndexFormatTooOldException was incorrect. (Uwe Schindler,
  96   Mike McCandless)
  97
  98 * LUCENE-3541: IndexInput's default copyBytes() implementation was not safe
  99   across multiple threads, because all clones shared the same buffer.
 100   (Robert Muir)
 101
 102 * LUCENE-3548: Fix CharsRef#append to extend length of the existing char[]
 103   and preserve existing chars. (Simon Willnauer)
 104
 105 * LUCENE-3582: Normalize NaN values in NumericUtils.floatToSortableInt() /
 106   NumericUtils.doubleToSortableLong(), so this is consistent with stored
 107   fields. Also fix NumericRangeQuery to not falsely hit NaNs on half-open
 108   ranges (one bound is null). Because of normalization, NumericRangeQuery
 109   can now be used to hit NaN values by creating a query with
 110   upper == lower == NaN (inclusive).  (Dawid Weiss, Uwe Schindler)
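The NaN normalization in LUCENE-3582 can be illustrated with the general
"sortable bits" technique. The sketch below is an assumption-laden
illustration using only the JDK, not a copy of NumericUtils; the class and
method names are hypothetical. It relies on Float.floatToIntBits, which
collapses every NaN bit pattern to one canonical value.

```java
/** Hypothetical illustration of sortable float encoding; not Lucene's NumericUtils. */
final class SortableFloatSketch {

    /** Maps a float to an int whose signed order matches the float's numeric order. */
    static int toSortableInt(float value) {
        // floatToIntBits (unlike floatToRawIntBits) maps every NaN to 0x7fc00000.
        int bits = Float.floatToIntBits(value);
        // Negative floats have the sign bit set; flipping the other 31 bits
        // reverses their order so more-negative values map to smaller ints.
        return bits < 0 ? bits ^ 0x7fffffff : bits;
    }

    public static void main(String[] args) {
        // A NaN with a non-canonical payload.
        float oddNaN = Float.intBitsToFloat(0x7fc00001);
        System.out.println("ordered: " + (toSortableInt(-2f) < toSortableInt(-1f)));
        System.out.println("NaNs equal: " + (toSortableInt(oddNaN) == toSortableInt(Float.NaN)));
    }
}
```

With all NaNs encoded identically, an inclusive range whose lower and upper
bounds are both NaN matches exactly that one encoded value, which is the
behavior the entry describes for NumericRangeQuery.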
 111
 112 API Changes
 113
 114 * LUCENE-3454: Rename IndexWriter.optimize to forceMerge to discourage
 115   use of this method since it is horribly costly and rarely justified
 116   anymore.  MergePolicy.findMergesForOptimize was renamed to
 117   findForcedMerges.  IndexReader.isOptimized was
 118   deprecated. IndexCommit.isOptimized was replaced with
 119   getSegmentCount. (Robert Muir, Mike McCandless)
 120
 121 * LUCENE-3205: Deprecated MultiTermQuery.getTotalNumberOfTerms() [and
 122   related methods], as the numbers returned are not useful
 123   for multi-segment indexes. They were only needed for tests of
 124   NumericRangeQuery.  (Mike McCandless, Uwe Schindler)
 125
 126 * LUCENE-3574: Deprecate outdated constants in org.apache.lucene.util.Constants
 127   and add new ones for Java 6 and Java 7.  (Uwe Schindler)
 128
 129 * LUCENE-3571: Deprecate IndexSearcher(Directory). Use the constructors
 130   that take IndexReader instead.  (Robert Muir)
 131
 132 * LUCENE-3577: Rename IndexWriter.expungeDeletes to forceMergeDeletes,
 133   and revamped the javadocs, to discourage
 134   use of this method since it is horribly costly and rarely
 135   justified.  MergePolicy.findMergesToExpungeDeletes was renamed to
 136   findForcedDeletesMerges. (Robert Muir, Mike McCandless)
 137
 138 New Features
 139
 140 * LUCENE-3448: Added FixedBitSet.and(other/DISI), andNot(other/DISI).
 141   (Uwe Schindler)
 142
 143 * LUCENE-2215: Added IndexSearcher.searchAfter which returns results after a
 144   specified ScoreDoc (e.g. last document on the previous page) to support deep
 145   paging use cases.  (Aaron McCurry, Grant Ingersoll, Robert Muir)
 146
 147 * LUCENE-1990: Adds internal packed ints implementation, to be used
 148   for more efficient storage of int arrays when the values are
 149   bounded, for example for storing the terms dict index (Toke
 150   Eskildsen via Mike McCandless)
 151
 152 * LUCENE-3558: Moved SearcherManager, NRTManager & SearcherLifetimeManager into
 153   core. All classes are contained in o.a.l.search. (Simon Willnauer)
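The packed ints idea in LUCENE-1990 (storing bounded values in fewer bits
than a full int) can be sketched as follows. This is a hypothetical,
JDK-only illustration of fixed-width bit packing into a long[]; the class
name and layout are assumptions made for this note and do not reflect the
actual PackedInts implementation.

```java
/** Hypothetical fixed-width bit packer; illustrates the idea only, not Lucene's PackedInts. */
final class PackedSketch {
    private final long[] blocks;
    private final int bitsPerValue;
    private final long mask;

    PackedSketch(int valueCount, int bitsPerValue) {
        if (bitsPerValue < 1 || bitsPerValue > 63) {
            throw new IllegalArgumentException("bitsPerValue must be in 1..63");
        }
        this.bitsPerValue = bitsPerValue;
        this.mask = (1L << bitsPerValue) - 1;
        long totalBits = (long) valueCount * bitsPerValue;
        this.blocks = new long[(int) ((totalBits + 63) / 64)];
    }

    int blockCount() {
        return blocks.length;
    }

    void set(int index, long value) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);   // which long holds the first bit
        int shift = (int) (bitPos & 63);    // bit offset inside that long
        blocks[block] = (blocks[block] & ~(mask << shift)) | ((value & mask) << shift);
        int spill = shift + bitsPerValue - 64; // bits that overflow into the next long
        if (spill > 0) {
            int lowBits = bitsPerValue - spill;
            long highMask = mask >>> lowBits;
            blocks[block + 1] = (blocks[block + 1] & ~highMask) | ((value & mask) >>> lowBits);
        }
    }

    long get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            value |= blocks[block + 1] << (bitsPerValue - spill);
        }
        return value & mask;
    }

    public static void main(String[] args) {
        PackedSketch packed = new PackedSketch(100, 5); // values 0..31 need 5 bits each
        for (int i = 0; i < 100; i++) {
            packed.set(i, i % 32);
        }
        System.out.println(packed.get(12) + " read back; longs used: " + packed.blockCount());
    }
}
```

In this sketch, 100 values at 5 bits each occupy 500 bits, so they fit in 8
longs instead of the 100 ints a plain int[] would need.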
 154
 155 Optimizations
 156
 157 * LUCENE-3426: Add NGramPhraseQuery which extends PhraseQuery and tries to
 158   reduce the number of terms of the query during rewrite(), in order to improve
 159   performance.  (Robert Muir, Koji Sekiguchi)
 160
 161 * LUCENE-3494: Optimize FilteredQuery to remove a multiply in score()
 162   (Uwe Schindler, Robert Muir)
 163
 164 * LUCENE-3534: Remove filter logic from IndexSearcher and delegate to
 165   FilteredQuery's Scorer. This is a partial backport of a cleanup in
 166   FilteredQuery/IndexSearcher added by LUCENE-1536 to Lucene 4.0.
 167   (Uwe Schindler)
 168
 169 * LUCENE-2205: Very substantial (3-5X) RAM reduction required to hold
 170   the terms index on opening an IndexReader (Aaron McCurry via Mike McCandless)
 171
 172 * LUCENE-3443: FieldCache can now set docsWithField, and create an
 173   array, in a single pass.  This results in faster init time for apps
 174   that need both (such as sorting by a field with a missing value).
 175   (Mike McCandless)
 176
 177 Test Cases
 178
 179 * LUCENE-3420: Disable the finalness checks in TokenStream and Analyzer
 180   for implementing subclasses in different packages, where assertions are not
 181   enabled. (Uwe Schindler)
 182
 183 * LUCENE-3506: Tests relying on assertions being enabled were no-ops because
 184   they ignored AssertionError. With this fix, the entire test framework
 185   (every test) now fails if assertions are disabled, unless
 186   -Dtests.asserts.gracious=true is specified. (Doron Cohen)
 187
 188 Build
 189
 190 * SOLR-2849: Fix dependencies in Maven POMs. (David Smiley via Steve Rowe)
 191
 192 * LUCENE-3561: Fix maven xxx-src.jar files that were missing resources.
 193   (Uwe Schindler)
 194
 195 ======================= Lucene 3.4.0 =======================
 196
 197 Bug fixes
 198
 199 * LUCENE-3251: Directory#copy failed to close target output if opening the
 200   source stream failed. (Simon Willnauer)
 201
 202 * LUCENE-3255: If segments_N file is all zeros (due to file
 203   corruption), don't read that to mean the index is empty.  (Gregory
 204   Tarr, Mark Harwood, Simon Willnauer, Mike McCandless)
 205
 206 * LUCENE-3254: Fixed minor bug in how deletes were written to disk,
 207   causing the file to sometimes be larger than it needed to be.  (Mike
 208   McCandless)
 209
 210 * LUCENE-3224: Fixed a bug where CheckIndex would incorrectly report a
 211   corrupt index if a term with docfreq >= 16 was indexed more than once
 212   at the same position.  (Robert Muir)
 213
 214 * LUCENE-3339: Fixed deadlock case when multiple threads use the new
 215   block-add (IndexWriter.add/updateDocuments) methods.  (Robert Muir,
 216   Mike McCandless)
 217
 218 * LUCENE-3340: Fixed case where IndexWriter was not flushing at
 219   exactly maxBufferedDeleteTerms (Mike McCandless)
 220
 221 * LUCENE-3358, LUCENE-3361: StandardTokenizer and UAX29URLEmailTokenizer
 222   wrongly discarded combining marks attached to Han or Hiragana characters.
 223   This is fixed if you supply Version >= 3.4. If you supply a previous
 224   Lucene version, you get the old buggy behavior for backwards compatibility.
 225   (Trejkaz, Robert Muir)
 226
 227 * LUCENE-3368: IndexWriter commits segments without applying their buffered
 228   deletes when flushing concurrently. (Simon Willnauer, Mike McCandless)
 229
 230 * LUCENE-3365: Create or Append mode determined before obtaining write lock
 231   could cause IndexWriter to overwrite an existing index.
 232   (Geoff Cooney via Simon Willnauer)
 233
 234 * LUCENE-3380: Fixed a bug where FileSwitchDirectory's listAll() would wrongly
 235   throw NoSuchDirectoryException when all files written so far have been
 236   written to one directory, but the other still has not yet been created on the
 237   filesystem.  (Robert Muir)
 238
 239 * LUCENE-3402: term vectors disappeared from the index if optimize() was called
 240   following addIndexes(). (Shai Erera)
 241
 242 * LUCENE-3409: IndexWriter.deleteAll was failing to close pooled NRT
 243   SegmentReaders, leading to unused files accumulating in the
 244   Directory.  (tal steier via Mike McCandless)
 245
 246 * LUCENE-3390: Added SortField.setMissingValue(v) to enable well defined
 247   sorting behavior for documents that do not include the given field.
 248   (Gilad Barkai via Doron Cohen)
 249
 250 * LUCENE-3418: Lucene was failing to fsync index files on commit,
 251   meaning an operating system or hardware crash, or power loss, could
 252   easily corrupt the index.  (Mark Miller, Robert Muir, Mike
 253   McCandless)
 254
 255 New Features
 256
 257 * LUCENE-3290: Added FieldInvertState.numUniqueTerms
 258   (Mike McCandless, Robert Muir)
 259
 260 * LUCENE-3280: Add FixedBitSet, like OpenBitSet but not elastic (it does
 261   not grow on demand if you set/get/clear too-large indices).  (Mike
 262   McCandless)
 263
 264 * LUCENE-2048: Added the ability to omit positions but still index
 265   term frequencies, you can now control what is indexed into
 266   the postings via AbstractField.setIndexOptions:
 267    DOCS_ONLY: only documents are indexed: term frequencies and positions are omitted
 268    DOCS_AND_FREQS: only documents and term frequencies are indexed: positions are omitted
 269    DOCS_AND_FREQS_AND_POSITIONS: full postings: documents, frequencies, and positions
 270   AbstractField.setOmitTermFrequenciesAndPositions is deprecated,
 271   you should use DOCS_ONLY instead.  (Robert Muir)
 272
 273 * LUCENE-3097: Added a new grouping collector that can be used to retrieve all most relevant
 274   documents per group. This can be useful in situations when one wants to compute grouping
 275   based facets / statistics on the complete query result. (Martijn van Groningen)
 276
 277 * LUCENE-3334: If Java7 is detected, IOUtils.closeSafely() will log
 278   suppressed exceptions in the original exception, so stack trace
 279   will contain them.  (Uwe Schindler)
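The suppressed-exception behavior described in LUCENE-3334 can be sketched
with the Java 7 Throwable.addSuppressed method. The helper below is a
hypothetical illustration, not the IOUtils.closeSafely source: it calls
addSuppressed directly and therefore assumes Java 7 or later (the lambdas in
main additionally assume Java 8).

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical helper illustrating suppressed exceptions; not Lucene's IOUtils. */
final class CloseSketch {

    /** Closes every resource; rethrows the first failure with later ones attached. */
    static void closeAll(Closeable... resources) throws IOException {
        IOException first = null;
        for (Closeable resource : resources) {
            try {
                if (resource != null) {
                    resource.close();
                }
            } catch (IOException e) {
                if (first == null) {
                    first = e;               // remember the primary failure
                } else {
                    first.addSuppressed(e);  // keep later failures in the stack trace
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }

    public static void main(String[] args) {
        Closeable failsA = () -> { throw new IOException("A"); };
        Closeable failsB = () -> { throw new IOException("B"); };
        try {
            closeAll(failsA, failsB);
        } catch (IOException e) {
            System.out.println(e.getMessage() + " with " + e.getSuppressed().length + " suppressed");
        }
    }
}
```

Every resource is still closed even when an earlier close() fails, and the
later failures travel with the first one instead of being lost.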
 280
 281 Optimizations
 282
 283 * LUCENE-3289: When building an FST you can now tune how aggressively
 284   the FST should try to share common suffixes.  Typically you can
 285   greatly reduce RAM required during building, and CPU consumed, at
 286   the cost of a somewhat larger FST.  (Mike McCandless)
 287
 288 Test Cases
 289
 290 * LUCENE-3327: Fix AIOOBE when TestFSTs is run with
 291   -Dtests.verbose=true (James Dyer via Mike McCandless)
 292
 293 Build
 294
 295 * LUCENE-3406: Add ant target 'package-local-src-tgz' to Lucene and Solr
 296   to package sources from the local working copy.
 297   (Seung-Yeoul Yang via Steve Rowe)
 298
 299
 300 ======================= Lucene 3.3.0 =======================
 301
 302 Changes in backwards compatibility policy
 303
 304 * LUCENE-3140: IndexOutput.copyBytes now takes a DataInput (superclass
 305   of IndexInput) as its first argument.  (Robert Muir, Dawid Weiss,
 306   Mike McCandless)
 307
 308 * LUCENE-3191: FieldComparator.value now returns an Object not
 309   Comparable; FieldDoc.fields also changed from Comparable[] to
 310   Object[] (Uwe Schindler, Mike McCandless)
 311
 312 * LUCENE-3208: Made deprecated methods Query.weight(Searcher) and
 313   Searcher.createWeight() final to prevent override. If you have
 314   overridden one of these methods, cut over to the non-deprecated
 315   implementation. (Uwe Schindler, Robert Muir, Yonik Seeley)
 316
 317 * LUCENE-3238: Made MultiTermQuery.rewrite() final, to prevent
 318   problems (such as not properly setting rewrite methods, or
 319   not working correctly with things like SpanMultiTermQueryWrapper).
 320   To rewrite to a simpler form, instead return a simpler enum
 321   from getEnum(IndexReader). For example, to rewrite to a single term,
 322   return a SingleTermEnum.  (ludovic Boutros, Uwe Schindler, Robert Muir)
 323
 324 Changes in runtime behavior
 325
 326 * LUCENE-2834: the hash used to compute the lock file name when the
 327   lock file is not stored in the index has changed.  This means you
 328   will see a different lucene-XXX-write.lock in your lock directory.
 329   (Robert Muir, Uwe Schindler, Mike McCandless)
 330
 331 * LUCENE-3146: IndexReader.setNorm throws IllegalStateException if the field
 332   does not store norms. (Shai Erera, Mike McCandless)
 333
 334 * LUCENE-3198: On Linux, if the JRE is 64 bit and supports unmapping,
 335   FSDirectory.open now defaults to MMapDirectory instead of
 336   NIOFSDirectory since MMapDirectory gives better performance.  (Mike
 337   McCandless)
 338
 339 * LUCENE-3200: MMapDirectory now uses chunk sizes that are powers of 2.
 340   When setting the chunk size, it is rounded down to the next possible
 341   value. The new default value for 64 bit platforms is 2^30 (1 GiB),
 342   for 32 bit platforms it stays unchanged at 2^28 (256 MiB).
 343   Internally, MMapDirectory now only uses one dedicated final IndexInput
 344   implementation supporting multiple chunks, which makes Hotspot's life
 345   easier.  (Uwe Schindler, Robert Muir, Mike McCandless)
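The power-of-2 chunk sizing in LUCENE-3200 makes it possible to locate a
file position with a shift and a mask instead of a division. The sketch
below illustrates that arithmetic under stated assumptions: ChunkSketch and
its method names are hypothetical and are not taken from MMapDirectory.

```java
/** Hypothetical illustration of power-of-two chunk arithmetic; not MMapDirectory code. */
final class ChunkSketch {

    /** Largest power of two that is less than or equal to the requested size. */
    static int roundDownToPowerOfTwo(int requested) {
        if (requested <= 0) {
            throw new IllegalArgumentException("chunk size must be positive");
        }
        return Integer.highestOneBit(requested);
    }

    /** Exponent p such that chunkSize == 2^p (chunkSize must be a power of two). */
    static int power(int chunkSize) {
        return Integer.numberOfTrailingZeros(chunkSize);
    }

    /** Index of the chunk that contains the given file position. */
    static int chunkIndex(long position, int power) {
        return (int) (position >>> power);
    }

    /** Offset of the given file position inside its chunk. */
    static long chunkOffset(long position, int power) {
        return position & ((1L << power) - 1);
    }

    public static void main(String[] args) {
        int chunkSize = roundDownToPowerOfTwo(1_500_000_000); // not a power of two
        int p = power(chunkSize);
        long position = (1L << 30) + 5;
        System.out.println("chunkSize=2^" + p
            + " index=" + chunkIndex(position, p)
            + " offset=" + chunkOffset(position, p));
    }
}
```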
 346
 347 Bug fixes
 348
 349 * LUCENE-3147,LUCENE-3152: Fixed open file handles leaks in many places in the
 350   code. Now MockDirectoryWrapper (in test-framework) tracks all open files,
 351   including locks, and fails if the test fails to release all of them.
 352   (Mike McCandless, Robert Muir, Shai Erera, Simon Willnauer)
 353
 354 * LUCENE-3102: CachingCollector.replay was failing to call setScorer
 355   per-segment (Martijn van Groningen via Mike McCandless)
 356
 357 * LUCENE-3183: Fix rare corner case where seeking to empty term
 358   (field="", term="") with terms index interval 1 could hit
 359   ArrayIndexOutOfBoundsException (selckin, Robert Muir, Mike
 360   McCandless)
 361
 362 * LUCENE-3208: IndexSearcher had its own private similarity field
 363   and corresponding get/setter overriding Searcher's implementation. If you
 364   set a different Similarity instance on IndexSearcher, methods implemented
 365   in the superclass Searcher were not using it, leading to strange bugs.
 366   (Uwe Schindler, Robert Muir)
 367
 368 * LUCENE-3197: Fix core merge policies to not over-merge during
 369   background optimize when documents are still being deleted
 370   concurrently with the optimize (Mike McCandless)
 371
 372 * LUCENE-3222: The RAM accounting for buffered delete terms was
 373   failing to measure the space required to hold the term's field and
 374   text character data.  (Mike McCandless)
 375
 376 * LUCENE-3238: Fixed bug where using WildcardQuery("prefix*") inside
 377   of a SpanMultiTermQueryWrapper rewrote incorrectly and returned
 378   an error instead.  (ludovic Boutros, Uwe Schindler, Robert Muir)
 379
 380 API Changes
 381
 382 * LUCENE-3208: Renamed protected IndexSearcher.createWeight() to expert
 383   public method IndexSearcher.createNormalizedWeight() as this better describes
 384   what this method does. The old method is still there for backwards
 385   compatibility. Query.weight() was deprecated and simply delegates to
 386   IndexSearcher. Both deprecated methods will be removed in Lucene 4.0.
 387   (Uwe Schindler, Robert Muir, Yonik Seeley)
 388
 389 * LUCENE-3197: MergePolicy.findMergesForOptimize now takes
 390   Map<SegmentInfo,Boolean> instead of Set<SegmentInfo> as the second
 391   argument, so the merge policy knows which segments were originally
 392   present vs produced by an optimizing merge (Mike McCandless)
 393
 394 Optimizations
 395
 396 * LUCENE-1736: DateTools.java general improvements.
 397   (David Smiley via Steve Rowe)
 398
 399 New Features
 400
 401 * LUCENE-3140: Added experimental FST implementation to Lucene.
 402   (Robert Muir, Dawid Weiss, Mike McCandless)
 403
 404 * LUCENE-3193: A new TwoPhaseCommitTool allows running a 2-phase commit
 405   algorithm over objects that implement the new TwoPhaseCommit interface (such
 406   as IndexWriter). (Shai Erera)
 407
 408 * LUCENE-3191: Added TopDocs.merge, to facilitate merging results from
 409   different shards (Uwe Schindler, Mike McCandless)
 410
 411 * LUCENE-3179: Added OpenBitSet.prevSetBit (Paul Elschot via Mike McCandless)
 412
 413 * LUCENE-3210: Made TieredMergePolicy more aggressive in reclaiming
 414   segments with deletions; added new methods
 415   set/getReclaimDeletesWeight to control this.  (Mike McCandless)
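Merging results from different shards, as LUCENE-3191 describes for
TopDocs.merge, is essentially a k-way merge of lists that are each already
sorted. The following sketch shows that technique with a priority queue. It
is a hypothetical, JDK-only illustration: Hit and ShardMergeSketch are
invented for this note, hits are ranked by descending score only, and none
of this is the TopDocs.merge implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical k-way merge of per-shard hit lists; not Lucene's TopDocs.merge. */
final class ShardMergeSketch {

    /** Minimal stand-in for a scored hit. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Merges lists that are each sorted by descending score; returns the best topN hits. */
    static List<Hit> merge(int topN, List<List<Hit>> shards) {
        // Queue entries are {shardIndex, positionInShard}; best current hit comes first.
        Comparator<int[]> byBestHit = (a, b) -> {
            float scoreA = shards.get(a[0]).get(a[1]).score;
            float scoreB = shards.get(b[0]).get(b[1]).score;
            int c = Float.compare(scoreB, scoreA);            // higher score first
            return c != 0 ? c : Integer.compare(a[0], b[0]);  // tie: lower shard index first
        };
        PriorityQueue<int[]> queue = new PriorityQueue<>(byBestHit);
        for (int s = 0; s < shards.size(); s++) {
            if (!shards.get(s).isEmpty()) {
                queue.add(new int[] {s, 0});
            }
        }
        List<Hit> merged = new ArrayList<>();
        while (merged.size() < topN && !queue.isEmpty()) {
            int[] top = queue.poll();
            List<Hit> shard = shards.get(top[0]);
            merged.add(shard.get(top[1]));
            if (top[1] + 1 < shard.size()) {
                queue.add(new int[] {top[0], top[1] + 1}); // advance within that shard
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<Hit>> shards = new ArrayList<>();
        List<Hit> shard0 = new ArrayList<>();
        shard0.add(new Hit(1, 0.9f));
        shard0.add(new Hit(4, 0.5f));
        List<Hit> shard1 = new ArrayList<>();
        shard1.add(new Hit(2, 0.7f));
        shards.add(shard0);
        shards.add(shard1);
        for (Hit hit : merge(2, shards)) {
            System.out.println("doc=" + hit.doc + " score=" + hit.score);
        }
    }
}
```

Only the head of each shard list sits in the queue at any time, so the work
per returned hit grows with the number of shards rather than with the total
number of hits.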
 416
 417 Build
 418
 419 * LUCENE-1344: Create OSGi bundle using dev-tools/maven.
 420   (Nicolas Lalevée, Luca Stancapiano via ryan)
 421
 422 * LUCENE-3204: The maven-ant-tasks jar is now included in the source tree;
 423   users of the generate-maven-artifacts target no longer have to manually
 424   place this jar in the Ant classpath.  NOTE: when Ant looks for the
 425   maven-ant-tasks jar, it looks first in its pre-existing classpath, so
 426   any copies it finds will be used instead of the copy included in the
 427   Lucene/Solr source tree.  For this reason, it is recommended to remove
 428   any copies of the maven-ant-tasks jar in the Ant classpath, e.g. under
 429   ~/.ant/lib/ or under the Ant installation's lib/ directory. (Steve Rowe)
 430
 431
 432 ======================= Lucene 3.2.0 =======================
 433
 434 Changes in backwards compatibility policy
 435
 436 * LUCENE-2953: PriorityQueue's internal heap was made private, as subclassing
 437   with generics can lead to ClassCastException. For advanced use (e.g. in Solr)
 438   a method getHeapArray() was added to retrieve the internal heap array as a
 439   non-generic Object[].  (Uwe Schindler, Yonik Seeley)
 440
 441 * LUCENE-1076: IndexWriter.setInfoStream now throws IOException
 442   (Mike McCandless, Shai Erera)
 443
 444 * LUCENE-3084: MergePolicy.OneMerge.segments was changed from
 445   SegmentInfos to a List<SegmentInfo>. SegmentInfos itself was changed
 446   to no longer extend Vector<SegmentInfo> (to update code that is using
 447   Vector-API, use the new asList() and asSet() methods returning unmodifiable
 448   collections; modifying SegmentInfos is now only possible through
 449   the explicitly declared methods). IndexWriter.segString() now takes
 450   Iterable<SegmentInfo> instead of List<SegmentInfo>. A simple recompile
 451   should fix this. MergePolicy and SegmentInfos are internal/experimental
 452   APIs not covered by the strict backwards compatibility policy.
 453   (Uwe Schindler, Mike McCandless)
 454
 455 Changes in runtime behavior
 456
 457 * LUCENE-3065: When a NumericField is retrieved from a Document loaded
 458   from IndexReader (or IndexSearcher), it will now come back as
 459   NumericField not as a Field with a string-ified version of the
 460   numeric value you had indexed.  Note that this only applies for
 461   newly-indexed Documents; older indices will still return Field
 462   with the string-ified numeric value. If you call Document.get(),
 463   the value still comes back as a String, but Document.getFieldable()
 464   returns NumericField instances. (Uwe Schindler, Ryan McKinley,
 465   Mike McCandless)
 466
 467 * LUCENE-1076: Changed the default merge policy from
 468   LogByteSizeMergePolicy to TieredMergePolicy, as of Version.LUCENE_32
 469   (passed to IndexWriterConfig), which is able to merge non-contiguous
 470   segments. This means docIDs no longer necessarily stay "in order"
 471   during indexing.  If this is a problem then you can use either of
 472   the LogMergePolicy impls.  (Mike McCandless)
 473
 474 New features
 475
 476 * LUCENE-3082: Added index upgrade tool oal.index.IndexUpgrader
 477   that allows upgrading all segments to the most recent supported index
 478   format without fully optimizing.  (Uwe Schindler, Mike McCandless)
 479
 480 * LUCENE-1076: Added TieredMergePolicy which is able to merge non-contiguous
 481   segments, which means docIDs no longer necessarily stay "in order".
 482   (Mike McCandless, Shai Erera)
 483
 484 * LUCENE-3071: Adding ReversePathHierarchyTokenizer, added skip parameter to
 485   PathHierarchyTokenizer (Olivier Favre via ryan)
 486
 487 * LUCENE-1421, LUCENE-3102: added CachingCollector which allows you to cache
 488   document IDs and scores encountered during the search, and "replay" them to
 489   another Collector. (Mike McCandless, Shai Erera)
 490
 491 * LUCENE-3112: Added experimental IndexWriter.add/updateDocuments,
 492   enabling a block of documents to be indexed, atomically, with
 493   guaranteed sequential docIDs.  (Mike McCandless)
 494
 495 API Changes
 496
 497 * LUCENE-3061: IndexWriter's getNextMerge() and merge(OneMerge) are now public
 498   (though @lucene.experimental), allowing for custom MergeScheduler
 499   implementations. (Shai Erera)
 500
 501 * LUCENE-3065: Document.getField() was deprecated, as it throws
 502   ClassCastException when loading lazy fields or NumericFields.
 503   (Uwe Schindler, Ryan McKinley, Mike McCandless)
 504
 505 * LUCENE-2027: Directory.touchFile is deprecated and will be removed
 506   in 4.0.  (Mike McCandless)
 507
 508 Optimizations
 509
 510 * LUCENE-2990: ArrayUtil/CollectionUtil.*Sort() methods now exit early
 511   on empty or one-element lists/arrays.  (Uwe Schindler)
 512
 513 * LUCENE-2897: Apply deleted terms while flushing a segment.  We still
 514   buffer deleted terms to later apply to past segments.  (Mike McCandless)
 515
 516 * LUCENE-3126: IndexWriter.addIndexes copies incoming segments into CFS if they
 517   aren't already and MergePolicy allows that. (Shai Erera)
 518
 519 Bug fixes
 520
 521 * LUCENE-2996: addIndexes(IndexReader) did not flush before adding the new
 522   indexes, causing existing deletions to be applied on the incoming indexes as
 523   well. (Shai Erera, Mike McCandless)
 524
 525 * LUCENE-3024: Index with more than 2.1B terms was hitting AIOOBE when
 526   seeking TermEnum (eg used by Solr's faceting) (Tom Burton-West, Mike
 527   McCandless)
 528
 529 * LUCENE-3042: When a filter or consumer added Attributes to a TokenStream
 530   chain after it was already (partly) consumed [or clearAttributes(),
 531   captureState(), cloneAttributes(),... was called by the Tokenizer],
 532   the Tokenizer calling clearAttributes() or capturing state after addition
 533   may not do this on the newly added Attribute. This bug affected only
 534   very special use cases of the TokenStream-API, most users would not
 535   have recognized it.  (Uwe Schindler, Robert Muir)
 536
 537 * LUCENE-3054: PhraseQuery can in some cases stack overflow in
 538   SorterTemplate.quickSort(). This fix also adds an optimization to
 539   PhraseQuery, as a term with lower doc freq will also have fewer positions.
 540   (Uwe Schindler, Robert Muir, Otis Gospodnetic)
 541
 542 * LUCENE-3068: sloppy phrase query failed to match valid documents when multiple
 543   query terms had same position in the query. (Doron Cohen)
 544
 545 * LUCENE-3012: Lucene writes the header now for separate norm files (*.sNNN)
 546   (Robert Muir)
 547
 548 Build
 549
 550 * LUCENE-3006: Building javadocs will fail on warnings by default.
 551   Override with -Dfailonjavadocwarning=false (sarowe, gsingers)
 552
 553 * LUCENE-3128: "ant eclipse" creates a .project file for easier Eclipse
 554   integration (unless one already exists). (Daniel Serodio via Shai Erera)
 555
 556 Test Cases
 557
 558 * LUCENE-3002: added 'tests.iter.min' to control 'tests.iter' by allowing
 559   iteration to stop once at least 'tests.iter.min' iterations have run and a
 560   failure occurred. (Shai Erera, Chris Hostetter)
 561
 562 ======================= Lucene 3.1.0 =======================
 563
 564 Changes in backwards compatibility policy
 565
 566 * LUCENE-2719: Changed API of internal utility class
 567   org.apache.lucene.util.SorterTemplate to support faster quickSort using
 568   pivot values and also merge sort and insertion sort. If you have used
 569   this class, you have to implement two more methods for handling pivots.
 570   (Uwe Schindler, Robert Muir, Mike McCandless)
 571
 572 * LUCENE-1923: Renamed SegmentInfo & SegmentInfos segString method to
 573   toString.  These are advanced APIs and subject to change suddenly.
 574   (Tim Smith via Mike McCandless)
 575
 576 * LUCENE-2190: Removed deprecated customScore() and customExplain()
 577   methods from experimental CustomScoreQuery.  (Uwe Schindler)
 578
 579 * LUCENE-2286: Enabled DefaultSimilarity.setDiscountOverlaps by default.
 580   This means that terms with a position increment gap of zero do not
 581   affect the norms calculation by default.  (Robert Muir)
 582
 583 * LUCENE-2320: MergePolicy.writer is now of type SetOnce, which allows setting
 584   the IndexWriter for a MergePolicy exactly once. You can change references to
 585   'writer' from <code>writer.doXYZ()</code> to <code>writer.get().doXYZ()</code>
 586   (it is also advisable to add an <code>assert writer != null;</code> before you
 587   access the wrapped IndexWriter.)
 588
 589   In addition, MergePolicy only exposes a default constructor, and the one that
 590   took IndexWriter as argument has been removed from all MergePolicy extensions.
 591   (Shai Erera via Mike McCandless)
 592
 593 * LUCENE-2328: SimpleFSDirectory.SimpleFSIndexInput is moved to
 594   FSDirectory.FSIndexInput. Anyone extending this class will have to
 595   fix their code on upgrading. (Earwin Burrfoot via Mike McCandless)
 596
 597 * LUCENE-2302: The new interface for term attributes, CharTermAttribute,
 598   now implements CharSequence. This requires the toString() methods of
 599   CharTermAttribute, deprecated TermAttribute, and Token to return only
 600   the term text and no other attribute contents. LUCENE-2374 implements
 601   an attribute reflection API to no longer rely on toString() for attribute
 602   inspection. (Uwe Schindler, Robert Muir)
 603
 604 * LUCENE-2372, LUCENE-2389: StandardAnalyzer, KeywordAnalyzer,
 605   PerFieldAnalyzerWrapper, WhitespaceTokenizer are now final.  Also removed
 606   the now obsolete and deprecated Analyzer.setOverridesTokenStreamMethod().
 607   Analyzer and TokenStream base classes now have an assertion in their ctor
 608   that checks that subclasses are final or at least have final implementations
 609   of incrementToken(), tokenStream(), and reusableTokenStream().
 610   (Uwe Schindler, Robert Muir)
 611
 612 * LUCENE-2316: Directory.fileLength contract was clarified - it returns the
 613   actual file's length if the file exists, and throws FileNotFoundException
 614   otherwise. Returning length=0 for a non-existent file is no longer allowed. If
 615   you relied on that, make sure to catch the exception. (Shai Erera)
 616
 617 * LUCENE-2386: IndexWriter no longer performs an empty commit upon new index
 618   creation. Previously, if you passed an empty Directory and set OpenMode to
 619   CREATE*, IndexWriter would make a first empty commit. If you need that
 620   behavior you can call writer.commit()/close() immediately after you create it.
 621   (Shai Erera, Mike McCandless)
 622
 623 * LUCENE-2733: Removed public constructors of utility classes with only static
 624   methods to prevent instantiation.  (Uwe Schindler)
 625
 626 * LUCENE-2602: The default (LogByteSizeMergePolicy) merge policy now
 627   takes deletions into account by default.  You can disable this by
 628   calling setCalibrateSizeByDeletes(false) on the merge policy.  (Mike
 629   McCandless)
 630
 631 * LUCENE-2529, LUCENE-2668: Position increment gap and offset gap of empty
 632   values in multi-valued fields have been changed for some cases in the index.
 633   If you index empty fields and use position/offset information on those
 634   fields, reindexing is recommended. (David Smiley, Koji Sekiguchi)
 635
 636 * LUCENE-2804: Directory.setLockFactory now declares throwing an IOException.
 637   (Shai Erera, Robert Muir)
 638
 639 * LUCENE-2837: Added deprecations noting that in 4.0, Searcher and
 640   Searchable are collapsed into IndexSearcher; contrib/remote and
 641   MultiSearcher have been removed.  (Mike McCandless)
 642
 643 * LUCENE-2854: Deprecated SimilarityDelegator and
 644   Similarity.lengthNorm; the latter is now final, forcing any custom
 645   Similarity impls to cutover to the more general computeNorm (Robert
 646   Muir, Mike McCandless)
 647
 648 * LUCENE-2869: Deprecated Query.getSimilarity: instead of using
 649   "runtime" subclassing/delegation, subclass the Weight instead.
 650   (Robert Muir)
 651
 652 * LUCENE-2674: A new idfExplain method was added to Similarity, that
 653   accepts an incoming docFreq.  If you subclass Similarity, make sure
 654   you also override this method on upgrade.  (Robert Muir, Mike
 655   McCandless)
 656
 657 Changes in runtime behavior
 658
 659 * LUCENE-1923: Made IndexReader.toString() produce something
 660   meaningful (Tim Smith via Mike McCandless)
 661
 662 * LUCENE-2179: CharArraySet.clear() is now functional.
 663   (Robert Muir, Uwe Schindler)
 664
 665 * LUCENE-2455: IndexWriter.addIndexes no longer optimizes the target index
 666   before it adds the new ones. Also, the existing segments are not merged and so
 667   the index will not end up with a single segment (unless it was empty before).
 668   In addition, addIndexesNoOptimize was renamed to addIndexes and no longer
 669   invokes a merge on the incoming and target segments, but instead copies the
 670   segments to the target index. You can call maybeMerge or optimize after this
 671   method completes, if you need to.
 672
 673   In addition, Directory.copyTo* were removed in favor of copy which takes the
 674   target Directory, source and target files as arguments, and copies the source
 675   file to the target Directory under the target file name. (Shai Erera)
 676
 677 * LUCENE-2663: IndexWriter no longer forcefully clears any existing
 678   locks when create=true.  This was a holdover from when
 679   SimpleFSLockFactory was the default locking implementation, and,
 680   even then it was dangerous since it could mask bugs in IndexWriter's
 681   usage, allowing applications to accidentally open two writers on the
 682   same directory.  (Mike McCandless)
 683
 684 * LUCENE-2701: maxMergeMBForOptimize and maxMergeDocs constraints set on
 685   LogMergePolicy now affect optimize() as well (as opposed to only regular
 686   merges). This means that you can run optimize() and too large segments won't
 687   be merged. (Shai Erera)
 688
 689 * LUCENE-2753: IndexReader and DirectoryReader .listCommits() now return a List,
 690   guaranteeing the commits are sorted from oldest to latest. (Shai Erera)
 691
 692 * LUCENE-2785: TopScoreDocCollector, TopFieldCollector and
 693   the IndexSearcher search methods that take an int nDocs will now
 694   throw IllegalArgumentException if nDocs is 0.  Instead, you should
 695   use the newly added TotalHitCountCollector.  (Mike McCandless)
 696
 697 * LUCENE-2790: LogMergePolicy.useCompoundFile's logic now factors in noCFSRatio
 698   to determine whether the passed in segment should be compound.
 699   (Shai Erera, Earwin Burrfoot)
 700
 701 * LUCENE-2805: IndexWriter now increments the index version on every change to
 702   the index instead of for every commit. Committing or closing the IndexWriter
 703   without any changes to the index will not cause any index version increment.
 704   (Simon Willnauer, Mike McCandless)
 705
 706 * LUCENE-2650, LUCENE-2825: The behavior of FSDirectory.open has changed. On 64-bit
 707   Windows and Solaris systems that support unmapping, FSDirectory.open returns
 708   MMapDirectory. Additionally the behavior of MMapDirectory has been
 709   changed to enable unmapping by default if supported by the JRE.
 710   (Mike McCandless, Uwe Schindler, Robert Muir)
 711
 712 * LUCENE-2829: Improve the performance of "primary key" lookup use
 713   case (running a TermQuery that matches one document) on a
 714   multi-segment index.  (Robert Muir, Mike McCandless)
 715
 716 * LUCENE-2010: Segments with 100% deleted documents are now removed on
 717   IndexReader or IndexWriter commit.   (Uwe Schindler, Mike McCandless)
 718
 719 * LUCENE-2960: Allow some changes to IndexWriterConfig to take effect
 720   "live" (after an IW is instantiated), via
 721   IndexWriter.getConfig().setXXX(...) (Shay Banon, Mike McCandless)
 722
 723 API Changes
 724
 725 * LUCENE-2076: Rename FSDirectory.getFile -> getDirectory.  (George
 726   Aroush via Mike McCandless)
 727
 728 * LUCENE-1260: Change norm encode (float->byte) and decode
 729   (byte->float) to be instance methods not static methods.  This way a
 730   custom Similarity can alter how norms are encoded, though they must
 731   still be encoded as a single byte (Johan Kindgren via Mike
 732   McCandless)
 733
 734 * LUCENE-2103: NoLockFactory should have a private constructor;
 735   until Lucene 4.0 the default one will be deprecated.
 736   (Shai Erera via Uwe Schindler)
 737
 738 * LUCENE-2177: Deprecate the Field ctors that take byte[] and Store.
 739   Since the removal of compressed fields, Store can only be YES, so
 740   it's not necessary to specify.  (Erik Hatcher via Mike McCandless)
 741
 742 * LUCENE-2200: Several final classes had non-overriding protected
 743   members. These were converted to private and unused protected
 744   constructors removed.  (Steven Rowe via Robert Muir)
 745
 746 * LUCENE-2240: SimpleAnalyzer and WhitespaceAnalyzer now have
 747   Version ctors.  (Simon Willnauer via Uwe Schindler)
 748
 749 * LUCENE-2259: Add IndexWriter.deleteUnusedFiles, to attempt removing
 750   unused files.  This is only useful on Windows, which prevents
 751   deletion of open files. IndexWriter will eventually remove these
 752   files itself; this method just lets you do so when you know the
 753   files are no longer open by IndexReaders. (luocanrao via Mike
 754   McCandless)
 755
 756 * LUCENE-2282: IndexFileNames is exposed as a public class allowing for easier
 757   use by external code. In addition it offers a matchExtension method which
 758   callers can use to query whether a certain file matches a certain extension.
 759   (Shai Erera via Mike McCandless)
 760
 761 * LUCENE-124: Add a TopTermsBoostOnlyBooleanQueryRewrite to MultiTermQuery.
 762   This rewrite method is similar to TopTermsScoringBooleanQueryRewrite, but
 763   only scores terms by their boost values. For example, this can be used
 764   with FuzzyQuery to ensure that exact matches are always scored higher,
 765   because only the boost will be used in scoring.  (Robert Muir)
 766
 767 * LUCENE-2015: Add a static method foldToASCII to ASCIIFoldingFilter to
 768   expose its folding logic.  (Cédrik Lime via Robert Muir)
 769
 770 * LUCENE-2294: IndexWriter constructors have been deprecated in favor of a
 771   single ctor which accepts IndexWriterConfig and a Directory. You can set all
 772   the parameters related to IndexWriter on IndexWriterConfig. The different
 773   setter/getter methods were deprecated as well. One should call
 774   writer.getConfig().getXYZ() to query for a parameter XYZ.
 775   Additionally, the setter/getter related to MergePolicy were deprecated as
 776   well. One should interact with the MergePolicy directly.
 777   (Shai Erera via Mike McCandless)
 778
 779 * LUCENE-2320: IndexWriter's MergePolicy configuration was moved to
 780   IndexWriterConfig and the respective methods on IndexWriter were deprecated.
 781   (Shai Erera via Mike McCandless)
 782
 783 * LUCENE-2328: Directory now keeps track itself of the files that are written
 784   but not yet fsynced. The old Directory.sync(String file) method is deprecated
 785   and replaced with Directory.sync(Collection<String> files). Take a look at
 786   FSDirectory to see a sample of what such tracking might look like, if needed
 787   in your custom Directories.  (Earwin Burrfoot via Mike McCandless)
 788
 789 * LUCENE-2302: Deprecated TermAttribute and replaced by a new
 790   CharTermAttribute. The change is backwards compatible, so
 791   mixed new/old TokenStreams all work on the same char[] buffer
 792   independent of which interface they use. CharTermAttribute
 793   has shorter method names and implements CharSequence and
 794   Appendable. This allows usage like Java's StringBuilder in
 795   addition to direct char[] access. Also terms can directly be
 796   used in places where CharSequence is allowed (e.g. regular
 797   expressions).
 798   (Uwe Schindler, Robert Muir)
 799
 800 * LUCENE-2402: IndexWriter.deleteUnusedFiles now deletes unreferenced commit
 801   points too. If you use an IndexDeletionPolicy which holds onto index commits
 802   (such as SnapshotDeletionPolicy), you can call this method to remove those
 803   commit points when they are not needed anymore (instead of waiting for the
 804   next commit). (Shai Erera)
 805
 806 * LUCENE-2481: SnapshotDeletionPolicy.snapshot() and release() were replaced
 807   with equivalent ones that take a String (id) as argument. You can pass
 808   whatever ID you want, as long as you use the same one when calling both.
 809   (Shai Erera)
 810
 811 * LUCENE-2356: Add IndexWriterConfig.set/getReaderTermIndexDivisor, to
 812   set what IndexWriter passes for termsIndexDivisor to the readers it
 813   opens internally when applying deletions or creating a near-real-time
 814   reader.  (Earwin Burrfoot via Mike McCandless)
 815
 816 * LUCENE-2167,LUCENE-2699,LUCENE-2763,LUCENE-2847: StandardTokenizer/Analyzer
 817   in common/standard/ now implement the Word Break rules from the Unicode 6.0.0
 818   Text Segmentation algorithm (UAX#29), covering the full range of Unicode code
 819   points, including values from U+FFFF to U+10FFFF
 820
 821   ClassicTokenizer/Analyzer retains the old (pre-Lucene 3.1) StandardTokenizer/
 822   Analyzer implementation and behavior.  Only the Unicode Basic Multilingual
 823   Plane (code points from U+0000 to U+FFFF) is covered.
 824
 825   UAX29URLEmailTokenizer tokenizes URLs and E-mail addresses according to the
 826   relevant RFCs, in addition to implementing the UAX#29 Word Break rules.
 827   (Steven Rowe, Robert Muir, Uwe Schindler)
 828
 829 * LUCENE-2778: RAMDirectory now exposes newRAMFile(), which subclasses can
 830   override to return a different RAMFile implementation. (Shai Erera)
 831
 832 * LUCENE-2785: Added TotalHitCountCollector whose sole purpose is to
 833   count the number of hits matching the query.  (Mike McCandless)
 834
 835 * LUCENE-2846: Deprecated IndexReader.setNorm(int, String, float). This method
 836   is only syntactic sugar for setNorm(int, String, byte), but using the global
 837   Similarity.getDefault().encodeNormValue().  Use the byte-based method instead
 838   to ensure that the norm is encoded with your Similarity.
 839   (Robert Muir, Mike McCandless)
 840
 841 * LUCENE-2374: Added Attribute reflection API: It's now possible to inspect the
 842   contents of AttributeImpl and AttributeSource using a well-defined API.
 843   This is e.g. used by Solr's AnalysisRequestHandlers to display all attributes
 844   in a structured way.
 845   There are also some backwards incompatible changes in toString() output,
 846   as LUCENE-2302 introduced the CharSequence interface to CharTermAttribute
 847   leading to changed toString() return values. The new API allows getting a
 848   string representation in a well-defined way using a new method
 849   reflectAsString(). For backwards compatibility reasons, when toString()
 850   was implemented by implementation subclasses, the default implementation of
 851   AttributeImpl.reflectWith() uses toString()'s output instead to report the
 852   Attribute's properties. Otherwise, reflectWith() uses Java's reflection
 853   (like toString() did before) to get the attribute properties.
 854   In addition, the mandatory equals() and hashCode() are no longer required
 855   for AttributeImpls, but can still be provided (if needed).
 856   (Uwe Schindler)
 857
 858 * LUCENE-2691: Deprecate IndexWriter.getReader in favor of
 859   IndexReader.open(IndexWriter) (Grant Ingersoll, Mike McCandless)
 860
 861 * LUCENE-2876: Deprecated Scorer.getSimilarity(). If your Scorer uses a Similarity,
 862   it should keep it itself. Fixed Scorers to pass their parent Weight, so that
 863   Scorer.visitSubScorers (LUCENE-2590) will work correctly.
 864   (Robert Muir, Doron Cohen)
 865
 866 * LUCENE-2900: When opening a near-real-time (NRT) reader
 867   (IndexReader.re/open(IndexWriter)) you can now specify whether
 868   deletes should be applied.  Applying deletes can be costly, and some
 869   expert use cases can handle seeing deleted documents returned.  The
 870   deletes remain buffered so that the next time you open an NRT reader
 871   and pass true, all deletes will be applied.  (Mike McCandless)
 872
 873 * LUCENE-1253: LengthFilter (and Solr's KeepWordTokenFilter) now
 874   require up front specification of enablePositionIncrement. Together with
 875   StopFilter they have a common base class (FilteringTokenFilter) that handles
 876   the position increments automatically. Implementors only need to override an
 877   accept() method that filters tokens.  (Uwe Schindler, Robert Muir)
 878
 879 Bug fixes
 880
 881 * LUCENE-2249: ParallelMultiSearcher should shut down thread pool on
 882   close.  (Martin Traverso via Uwe Schindler)
 883
 884 * LUCENE-2273: FieldCacheImpl.getCacheEntries() used WeakHashMap
 885   incorrectly and led to ConcurrentModificationException.
 886   (Uwe Schindler, Robert Muir)
 887
 888 * LUCENE-2328: Index files fsync tracking moved from
 889   IndexWriter/IndexReader to Directory, and it no longer leaks memory.
 890   (Earwin Burrfoot via Mike McCandless)
 891
 892 * LUCENE-2074: Reduce buffer size of lexer back to default on reset.
 893   (Ruben Laguna, Shai Erera via Uwe Schindler)
 894
 895 * LUCENE-2496: Don't throw NPE if IndexWriter is opened with CREATE on
 896   a prior (corrupt) index missing its segments_N file.  (Mike
 897   McCandless)
 898
 899 * LUCENE-2458: QueryParser no longer automatically forms phrase queries,
 900   assuming whitespace tokenization. Previously all CJK queries, for example,
 901   would be turned into phrase queries. The old behavior is preserved with
 902   the matchVersion parameter for previous versions. Additionally, you can
 903   explicitly enable the old behavior with setAutoGeneratePhraseQueries(true)
 904   (Robert Muir)
 905
 906 * LUCENE-2537: FSDirectory.copy() implementation was unsafe and could result in
 907   OOM if a large file was copied. (Shai Erera)
 908
 909 * LUCENE-2580: MultiPhraseQuery throws AIOOBE if number of positions
 910   exceeds number of terms at one position (Jayendra Patil via Mike McCandless)
 911
 912 * LUCENE-2617: Optional clauses of a BooleanQuery were not factored
 913   into coord if the scorer for that segment returned null.  This
 914   can cause the same document to score differently depending on
 915   what segment it resides in. (yonik)
 916
 917 * LUCENE-2272: Fix explain in PayloadNearQuery and also fix scoring issue (Peter Keegan via Grant Ingersoll)
 918
 919 * LUCENE-2732: Fix charset problems in XML loading in
 920   HyphenationCompoundWordTokenFilter.  (Uwe Schindler)
 921
 922 * LUCENE-2802: NRT DirectoryReader returned incorrect values from
 923   getVersion, isOptimized, getCommitUserData, getIndexCommit and isCurrent due
 924   to a mutable reference to the IndexWriter's SegmentInfos.
 925   (Simon Willnauer, Earwin Burrfoot)
 926
 927 * LUCENE-2852: Fixed corner case in RAMInputStream that would hit a
 928   false EOF after seeking to EOF then seeking back to same block you
 929   were just in and then calling readBytes (Robert Muir, Mike McCandless)
 930
 931 * LUCENE-2860: Fixed SegmentInfo.sizeInBytes to factor includeDocStores when it
 932   decides whether to return the cached computed size or not. (Shai Erera)
 933
 934 * LUCENE-2584: SegmentInfo.files() could hit ConcurrentModificationException if
 935   called by multiple threads. (Alexander Kanarsky via Shai Erera)
 936
 937 * LUCENE-2809: Fixed IndexWriter.numDocs to take into account
 938   applied but not yet flushed deletes.  (Mike McCandless)
 939
 940 * LUCENE-2879: MultiPhraseQuery previously calculated its phrase IDF by summing
 941   internally, it now calls Similarity.idfExplain(Collection, IndexSearcher).
 942   (Robert Muir)
 943
 944 * LUCENE-2693: RAM used by IndexWriter was slightly incorrectly computed.
 945   (Jason Rutherglen via Shai Erera)
 946
 947 * LUCENE-1846: DateTools now uses the US locale everywhere, so DateTools.round()
 948   is safe also in strange locales.  (Uwe Schindler)
 949
 950 * LUCENE-2891: IndexWriterConfig did not accept -1 in setReaderTermIndexDivisor,
 951   which can be used to prevent loading the terms index into memory. (Shai Erera)
 952
 953 * LUCENE-2937: Encoding a float into a byte (e.g. encoding field norms during
 954   indexing) had an underflow detection bug that caused floatToByte(f)==0 where
 955   f was greater than 0, but slightly less than byteToFloat(1).  This meant that
 956   certain very small field norms (index_boost * length_norm) could have
 957   been rounded down to 0 instead of being rounded up to the smallest
 958   positive number.  (yonik)
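
       The underflow case described above can be illustrated with a small,
       self-contained sketch. NormEncodingSketch and its methods are
       hypothetical stand-ins that assume a byte encoding with 3 mantissa
       bits and a zero exponent of 15; they are not copied from the Lucene
       source. The "fixed" flag switches between a strict '<' underflow test,
       which lets small positive values fall through to 0, and a '<=' test,
       which rounds them up to 1.

```java
// Hypothetical sketch of a float <-> byte norm encoding (3 mantissa bits,
// zero exponent 15). Not the actual Lucene implementation.
final class NormEncodingSketch {
    // Values at or below this "small float" have no byte above 0 of their own.
    private static final int UNDERFLOW_LIMIT = (63 - 15) << 3;

    /** Decodes a byte produced by floatToByte back to a float. */
    static float byteToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    /**
     * Encodes f into one byte. With fixed == false the underflow test uses
     * '<', reproducing the kind of bug described above; with fixed == true
     * it uses '<=' so small positive values round up to 1.
     */
    static byte floatToByte(float f, boolean fixed) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        boolean underflow = fixed ? smallfloat <= UNDERFLOW_LIMIT
                                  : smallfloat < UNDERFLOW_LIMIT;
        if (underflow) {
            // zero and negative inputs encode as 0, tiny positives as 1
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallfloat >= UNDERFLOW_LIMIT + 0x100) {
            return -1; // overflow: largest representable value
        }
        return (byte) (smallfloat - UNDERFLOW_LIMIT);
    }
}
```

       In this sketch, the float just below byteToFloat((byte) 1) encodes to
       0 with fixed == false and to 1 with fixed == true.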
 959
 960 * LUCENE-2936: PhraseQuery score explanations were not correctly
 961   identifying matches vs non-matches.  (hossman)
 962
 963 * LUCENE-2975: A hotspot bug corrupts IndexInput#readVInt()/readVLong() if
 964   the underlying readByte() is inlined (which happens e.g. in MMapDirectory).
 965   The loop was unrolled, which makes the hotspot bug disappear.
 966   (Uwe Schindler, Robert Muir, Mike McCandless)
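
       For readers unfamiliar with the technique, the sketch below contrasts
       a loop-based variable-length int decoder with one where the loop has
       been written out step by step. It assumes the usual VInt layout (seven
       data bits per byte, low-order group first, high bit set when more
       bytes follow). The class VIntSketch and its array-backed cursor are
       hypothetical and do not reproduce IndexInput.

```java
// Hypothetical sketch: decoding a variable-length int from a byte array.
final class VIntSketch {
    private final byte[] buf;
    private int pos;

    VIntSketch(byte[] buf) {
        this.buf = buf;
    }

    private byte readByte() {
        return buf[pos++];
    }

    /** Moves the read cursor to the given offset. */
    void seek(int newPos) {
        pos = newPos;
    }

    /** Loop-based decoder: keeps reading while the high bit is set. */
    int readVIntLoop() {
        byte b = readByte();
        int i = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = readByte();
            i |= (b & 0x7F) << shift;
        }
        return i;
    }

    /** Same decoding with one explicit step per possible byte (at most 5). */
    int readVIntUnrolled() {
        byte b = readByte();
        int i = b & 0x7F;
        if ((b & 0x80) == 0) return i;
        b = readByte();
        i |= (b & 0x7F) << 7;
        if ((b & 0x80) == 0) return i;
        b = readByte();
        i |= (b & 0x7F) << 14;
        if ((b & 0x80) == 0) return i;
        b = readByte();
        i |= (b & 0x7F) << 21;
        if ((b & 0x80) == 0) return i;
        b = readByte();
        // fifth byte: a 32-bit int has no room for a sixth group
        i |= (b & 0x7F) << 28;
        return i;
    }
}
```

       Both methods return the same values for well-formed input; only the
       control flow differs.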
 967
 968 New features
 969
 970 * LUCENE-2128: Parallelized fetching document frequencies during weight
 971   creation. (Israel Tsadok, Simon Willnauer via Uwe Schindler)
 972
 973 * LUCENE-2069: Added Unicode 4 support to CharArraySet. Due to the switch
 974   to Java 5, supplementary characters are now lowercased correctly if the
 975   set is created as case insensitive.
 976   CharArraySet now requires a Version argument to preserve
 977   backwards compatibility. If Version < 3.1 is passed to the constructor,
 978   CharArraySet yields the old behavior. (Simon Willnauer)
 979
 980 * LUCENE-2069: Added Unicode 4 support to LowerCaseFilter. Due to the switch
 981   to Java 5, supplementary characters are now lowercased correctly.
 982   LowerCaseFilter now requires a Version argument to preserve
 983   backwards compatibility. If Version < 3.1 is passed to the constructor,
 984   LowerCaseFilter yields the old behavior. (Simon Willnauer, Robert Muir)
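
       Both LUCENE-2069 entries concern supplementary characters, that is,
       code points above U+FFFF that Java stores as surrogate pairs. The
       JDK-only sketch below shows the difference between lowercasing char
       by char and lowercasing whole code points. The class LowerCaseSketch
       is hypothetical; it is not the LowerCaseFilter or CharArraySet code.

```java
// Hypothetical sketch: char-based vs. code-point-based lowercasing.
final class LowerCaseSketch {
    /** Lowercases each 16-bit char on its own; surrogate halves stay untouched. */
    static String lowerByChar(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            chars[i] = Character.toLowerCase(chars[i]);
        }
        return new String(chars);
    }

    /** Lowercases whole code points, so a surrogate pair is treated as one character. */
    static String lowerByCodePoint(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            sb.appendCodePoint(Character.toLowerCase(cp));
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return sb.toString();
    }
}
```

       For example, the Deseret capital letter U+10400 is mapped to U+10428
       by lowerByCodePoint but is left unchanged by lowerByChar.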
 985
 986 * LUCENE-2034: Added ReusableAnalyzerBase, an abstract subclass of Analyzer
 987   that makes it easier to reuse TokenStreams correctly. This issue also added
 988   StopwordAnalyzerBase, which improves consistency of all Analyzers that use
 989   stopwords, and implemented many analyzers in contrib with it.
 990   (Simon Willnauer via Robert Muir)
 991
 992 * LUCENE-2198, LUCENE-2901: Support protected words in stemming TokenFilters using a
 993   new KeywordAttribute.  (Simon Willnauer, Drew Farris via Uwe Schindler)
 994
 995 * LUCENE-2183, LUCENE-2240, LUCENE-2241: Added Unicode 4 support
 996   to CharTokenizer and its subclasses. CharTokenizer now has a new
 997   int-API which is conditionally preferred to the old char-API depending
 998   on the provided Version. Version < 3.1 will use the char-API.
 999   (Simon Willnauer via Uwe Schindler)
1000
1001 * LUCENE-2247: Added a CharArrayMap<V> for performance improvements
1002   in some stemmers and synonym filters. (Uwe Schindler)
1003
1004 * LUCENE-2320: Added SetOnce which wraps an object and allows it to be set
1005   exactly once. (Shai Erera via Mike McCandless)
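
       The set-exactly-once idea can be sketched without Lucene.
       SetOnceSketch below is a hypothetical stand-in, not the SetOnce class
       added by this issue. In particular, throwing IllegalStateException on
       a second set() call is an assumption made for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of a write-once holder.
final class SetOnceSketch<T> {
    private final AtomicBoolean alreadySet = new AtomicBoolean(false);
    private volatile T value;

    /** Stores the value on the first call; every later call fails. */
    void set(T newValue) {
        if (!alreadySet.compareAndSet(false, true)) {
            throw new IllegalStateException("value was already set");
        }
        value = newValue;
    }

    /** Returns the stored value, or null if set() has not been called yet. */
    T get() {
        return value;
    }
}
```

       The compareAndSet call ensures that only one of several racing
       threads can win the right to assign the value.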
1006
1007 * LUCENE-2314: Added AttributeSource.copyTo(AttributeSource) that
1008   allows using cloneAttributes() and this method as a replacement
1009   for captureState()/restoreState(), if the state itself
1010   needs to be inspected/modified.  (Uwe Schindler)
1011
1012 * LUCENE-2293: Expose control over max number of threads that
1013   IndexWriter will allow to run concurrently while indexing
1014   documents (previously this was hardwired to 5), using
1015   IndexWriterConfig.setMaxThreadStates.  (Mike McCandless)
1016
1017 * LUCENE-2297: Enable turning on reader pooling inside IndexWriter
1018   even when getReader (near-real-time reader) is not in use, through
1019   IndexWriterConfig.enable/disableReaderPooling.  (Mike McCandless)
1020
1021 * LUCENE-2331: Add NoMergePolicy which never returns any merges to execute. In
1022   addition, add NoMergeScheduler which never executes any merges. These two are
1023   convenient classes in case you want to disable segment merges by IndexWriter
1024   without tweaking a particular MergePolicy's parameters, such as mergeFactor.
1025   MergeScheduler's methods are now public. (Shai Erera via Mike McCandless)
1026
1027 * LUCENE-2339: Deprecate static method Directory.copy in favor of
1028   Directory.copyTo, and use nio's FileChannel.transferTo when copying
1029   files between FSDirectory instances.  (Earwin Burrfoot via Mike
1030   McCandless).
1031
1032 * LUCENE-2074: Make StandardTokenizer fit for Unicode 4.0, if the
1033   matchVersion parameter is Version.LUCENE_31. (Uwe Schindler)
1034
1035 * LUCENE-2385: Moved NoDeletionPolicy from benchmark to core. NoDeletionPolicy
1036   can be used to prevent commits from ever getting deleted from the index.
1037   (Shai Erera)
1038
1039 * LUCENE-1585: IndexWriter now accepts a PayloadProcessorProvider which can
1040   return a DirPayloadProcessor for a given Directory, which returns a
1041   PayloadProcessor for a given Term. The PayloadProcessor will be used to
1042   process the payloads of the segments as they are merged (e.g. if one wants to
1043   rewrite payloads of external indexes as they are added, or of local ones).
1044   (Shai Erera, Michael Busch, Mike McCandless)
1045
1046 * LUCENE-2440: Add support for custom ExecutorService in
1047   ParallelMultiSearcher (Edward Drapkin via Mike McCandless)
1048
1049 * LUCENE-2295: Added a LimitTokenCountAnalyzer / LimitTokenCountFilter
1050   to wrap any other Analyzer and provide the same functionality as
1051   MaxFieldLength provided on IndexWriter.  This patch also fixes a bug
1052   in the offset calculation in CharTokenizer. (Uwe Schindler, Shai Erera)
1053
1054 * LUCENE-2526: Don't throw NPE from MultiPhraseQuery.toString when
1055   it's empty.  (Ross Woolf via Mike McCandless)
1056
1057 * LUCENE-2559: Added SegmentReader.reopen methods (John Wang via Mike
1058   McCandless)
1059
1060 * LUCENE-2590: Added Scorer.visitSubScorers, and Scorer.freq.  Along
1061   with a custom Collector these experimental methods make it possible
1062   to gather the hit-count per sub-clause and per document while a
1063   search is running.  (Simon Willnauer, Mike McCandless)
1064
1065 * LUCENE-2636: Added MultiCollector which allows running the search with several
1066   Collectors. (Shai Erera)
1067
1068 * LUCENE-2754, LUCENE-2757: Added a wrapper around MultiTermQueries
1069   to add span support: SpanMultiTermQueryWrapper<Q extends MultiTermQuery>.
1070   Using this wrapper it's easy to add fuzzy/wildcard to e.g. a SpanNearQuery.
1071   (Robert Muir, Uwe Schindler)
1072
1073 * LUCENE-2838: ConstantScoreQuery now directly supports wrapping a Query
1074   instance for stripping off scores. The use of a QueryWrapperFilter
1075   is no longer needed and discouraged for that use case. Directly wrapping
1076   Query improves performance, as out-of-order collection is now supported.
1077   (Uwe Schindler)
1078
1079 * LUCENE-2864: Add getMaxTermFrequency (maximum within-document TF) to
1080   FieldInvertState so that it can be used in Similarity.computeNorm.
1081   (Robert Muir)
1082
1083 * LUCENE-2720: Segments now record the code version which created them.
1084   (Shai Erera, Mike McCandless, Uwe Schindler)
1085
1086 * LUCENE-2474: Added expert ReaderFinishedListener API to
1087   IndexReader, to allow apps that maintain external per-segment caches
1088   to evict entries when a segment is finished.  (Shay Banon, Yonik
1089   Seeley, Mike McCandless)
1090
1091 * LUCENE-2911: The new StandardTokenizer, UAX29URLEmailTokenizer, and
1092   the ICUTokenizer in contrib now all tag types with a consistent set
1093   of token types (defined in StandardTokenizer). Tokens in the major
1094   CJK types are explicitly marked to allow for custom downstream handling:
1095   <IDEOGRAPHIC>, <HANGUL>, <KATAKANA>, and <HIRAGANA>.
1096   (Robert Muir, Steven Rowe)
1097
1098 * LUCENE-2913: Add missing getters to Numeric* classes. (Uwe Schindler)
1099
1100 * LUCENE-1810: Added FieldSelectorResult.LATENT to not cache lazy loaded fields
1101   (Tim Smith, Grant Ingersoll)
1102
1103 * LUCENE-2692: Added several new SpanQuery classes for positional checking
1104   (match is in a range, payload is a specific value) (Grant Ingersoll)
1105
1106 Optimizations
1107
1108 * LUCENE-2494: Use CompletionService in ParallelMultiSearcher instead of
1109   simple polling for results. (Edward Drapkin, Simon Willnauer)
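
       Compared with polling, a CompletionService hands back each result as
       soon as its task finishes. A JDK-only sketch of that pattern follows.
       The class CompletionSketch, the pool size of 2 and the idea of
       summing per-task hit counts are hypothetical and do not mirror the
       internals of ParallelMultiSearcher.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: gather results in completion order instead of polling.
final class CompletionSketch {
    /** Runs all tasks on a small pool and sums their results as they complete. */
    static int sumResults(List<Callable<Integer>> tasks)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CompletionService<Integer> service =
                    new ExecutorCompletionService<Integer>(pool);
            for (Callable<Integer> task : tasks) {
                service.submit(task);
            }
            int total = 0;
            for (int i = 0; i < tasks.size(); i++) {
                // take() blocks until the next finished task is available
                total += service.take().get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```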
1110
1111 * LUCENE-2075: Terms dict cache is now shared across threads instead
1112   of being stored separately in thread local storage.  Also fixed
1113   terms dict so that the cache is used when seeking the thread local
1114   term enum, which will be important for MultiTermQuery impls that do
1115   lots of seeking (Mike McCandless, Uwe Schindler, Robert Muir, Yonik
1116   Seeley)
1117
1118 * LUCENE-2136: If the multi reader (DirectoryReader or MultiReader)
1119   only has a single sub-reader, delegate all enum requests to it.
1120   This avoids the overhead of using a PQ unnecessarily.  (Mike
1121   McCandless)
1122
1123 * LUCENE-2137: Switch to AtomicInteger for some ref counting (Earwin
1124   Burrfoot via Mike McCandless)
1125
1126 * LUCENE-2123, LUCENE-2261: Move FuzzyQuery rewrite to separate RewriteMode
1127   into MultiTermQuery. The number of fuzzy expansions can be specified with
1128   the maxExpansions parameter to FuzzyQuery.
1129   (Uwe Schindler, Robert Muir, Mike McCandless)
1130
1131 * LUCENE-2164: ConcurrentMergeScheduler has more control over merge
1132   threads.  First, it gives smaller merges higher thread priority than
1133   larger ones.  Second, a new set/getMaxMergeCount setting will pause
1134   the larger merges to allow smaller ones to finish.  The defaults for
1135   these settings are now dynamic, depending on the number of CPU cores as
1136   reported by Runtime.getRuntime().availableProcessors() (Mike
1137   McCandless)
1138
1139 * LUCENE-2169: Improved CharArraySet.copy(), if source set is
1140   also a CharArraySet.  (Simon Willnauer via Uwe Schindler)
1141
1142 * LUCENE-2084: Change IndexableBinaryStringTools to work on byte[] and char[]
1143   directly, instead of Byte/CharBuffers, and modify CollationKeyFilter to
1144   take advantage of this for faster performance.
1145   (Steven Rowe, Uwe Schindler, Robert Muir)
1146
1147 * LUCENE-2188: Add a utility class for tracking deprecated overridden
1148   methods in non-final subclasses.
1149   (Uwe Schindler, Robert Muir)
1150
1151 * LUCENE-2195: Speedup CharArraySet if set is empty.
1152   (Simon Willnauer via Robert Muir)
1153
1154 * LUCENE-2285: Code cleanup. (Shai Erera via Uwe Schindler)
1155
1156 * LUCENE-2303: Remove code duplication in Token class by subclassing
1157   TermAttributeImpl, move DEFAULT_TYPE constant to TypeInterface, improve
1158   null-handling for TypeAttribute.  (Uwe Schindler)
1159
1160 * LUCENE-2329: Switch TermsHash* from using a PostingList object per unique
1161   term to parallel arrays, indexed by termID. This reduces garbage collection
1162   overhead significantly, which results in great indexing performance wins
1163   when the available JVM heap space is low. This will become even more
1164   important when the DocumentsWriter RAM buffer is searchable in the future,
1165   because then it will make sense to make the RAM buffers as large as
1166   possible. (Mike McCandless, Michael Busch)
1167
1168 * LUCENE-2380: The terms field cache methods (getTerms,
1169   getTermsIndex), which replace the older String equivalents
1170   (getStrings, getStringIndex), consume quite a bit less RAM in most
1171   cases.  (Mike McCandless)
1172
1173 * LUCENE-2410: ~20% speedup on exact (slop=0) PhraseQuery matching.
1174   (Mike McCandless)
1175
1176 * LUCENE-2531: Fix issue when sorting by a String field that was
1177   causing too many fallbacks to compare-by-value (instead of by-ord).
1178   (Mike McCandless)
1179
1180 * LUCENE-2574: IndexInput exposes copyBytes(IndexOutput, long) to allow for
1181   efficient copying by sub-classes. Optimized copy is implemented for RAM and FS
1182   streams. (Shai Erera)
1183
1184 * LUCENE-2719: Improved TermsHashPerField's sorting to use a better
1185   quick sort algorithm that dereferences the pivot element not on
1186   every compare call. Also replaced lots of sorting code in Lucene
1187   by the improved SorterTemplate class.
1188   (Uwe Schindler, Robert Muir, Mike McCandless)
1189
1190 * LUCENE-2760: Optimize SpanFirstQuery and SpanPositionRangeQuery.
1191   (Robert Muir)
1192
1193 * LUCENE-2770: Make SegmentMerger always work on atomic subreaders,
1194   even when IndexWriter.addIndexes(IndexReader...) is used with
1195   DirectoryReaders or other MultiReaders. This saves lots of memory
1196   during merge of norms.  (Uwe Schindler, Mike McCandless)
1197
1198 * LUCENE-2824: Optimize BufferedIndexInput to do less bounds checks.
1199   (Robert Muir)
1200
1201 * LUCENE-2010: Segments with 100% deleted documents are now removed on
1202   IndexReader or IndexWriter commit.  (Uwe Schindler, Mike McCandless)
1203
1204 * LUCENE-1472: Removed synchronization from static DateTools methods
1205   by using a ThreadLocal. Also converted DateTools.Resolution to a
1206   Java 5 enum (this should not break backwards compatibility).  (Uwe Schindler)
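
       The general pattern of replacing synchronization with a ThreadLocal
       can be sketched as follows. DateFormatSketch is a hypothetical class,
       and the "yyyyMMdd" pattern, GMT time zone and US locale are
       assumptions chosen for illustration, not a copy of DateTools.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch: one SimpleDateFormat per thread, no locking needed.
final class DateFormatSketch {
    // SimpleDateFormat is not thread-safe, so each thread gets its own copy.
    private static final ThreadLocal<SimpleDateFormat> DAY_FORMAT =
            new ThreadLocal<SimpleDateFormat>() {
                @Override
                protected SimpleDateFormat initialValue() {
                    SimpleDateFormat format =
                            new SimpleDateFormat("yyyyMMdd", Locale.US);
                    format.setTimeZone(TimeZone.getTimeZone("GMT"));
                    return format;
                }
            };

    /** Formats a date at day resolution using the calling thread's formatter. */
    static String formatDay(Date date) {
        return DAY_FORMAT.get().format(date);
    }
}
```

       Callers invoke the static method from any thread without external
       locking, because no formatter instance is shared between threads.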
1207
1208 Build
1209
1210 * LUCENE-2124: Moved the JDK-based collation support from contrib/collation
1211   into core, and moved the ICU-based collation support into contrib/icu.
1212   (Robert Muir)
1213
1214 * LUCENE-2326: Removed SVN checkouts for backwards tests. The backwards
1215   branch is now included in the svn repository using "svn copy"
1216   after release. (Uwe Schindler)
1217
1218 * LUCENE-2074: Regenerating StandardTokenizerImpl files now needs
1219   JFlex 1.5 (currently only available on SVN). (Uwe Schindler)
1220
1221 * LUCENE-1709: Tests are now parallelized by default (except for benchmark). You
1222   can force them to run sequentially by passing -Drunsequential=1 on the command
1223   line. The number of threads that are spawned per CPU defaults to '1'. If you
1224   wish to change that, you can run the tests with -DthreadsPerProcessor=[num].
1225   (Robert Muir, Shai Erera, Peter Kofler)
1226
1227 * LUCENE-2516: Backwards tests are now compiled against released lucene-core.jar
1228   from tarball of previous version. Backwards tests are now packaged together
1229   with src distribution.  (Uwe Schindler)
1230
1231 * LUCENE-2611: Added Ant target to install IntelliJ IDEA configuration:
1232   "ant idea".  See http://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ
1233   (Steven Rowe)
1234
1235 * LUCENE-2657: Switch from using Maven POM templates to full POMs when
1236   generating Maven artifacts (Steven Rowe)
1237
1238 * LUCENE-2609: Added jar-test-framework Ant target which packages Lucene's
1239   tests' framework classes. (Drew Farris, Grant Ingersoll, Shai Erera,
1240   Steven Rowe)
1241
1242 Test Cases
1243
1244 * LUCENE-2037: Allow JUnit4 tests in our environment (Erick Erickson
1245   via Mike McCandless)
1246
1247 * LUCENE-1844: Speed up the unit tests (Mark Miller, Erick Erickson,
1248   Mike McCandless)
1249
1250 * LUCENE-2065: Use Java 5 generics throughout our unit tests.  (Kay
1251   Kay via Mike McCandless)
1252
1253 * LUCENE-2155: Fix time and zone dependent localization test failures
1254   in queryparser tests. (Uwe Schindler, Chris Male, Robert Muir)
1255
1256 * LUCENE-2170: Fix thread starvation problems.  (Uwe Schindler)
1257
1258 * LUCENE-2248, LUCENE-2251, LUCENE-2285: Refactor tests to not use
1259   Version.LUCENE_CURRENT, but instead use a global static value
1260   from LuceneTestCase(J4), that contains the release version.
1261   (Uwe Schindler, Simon Willnauer, Shai Erera)
1262
1263 * LUCENE-2313, LUCENE-2322: Add VERBOSE to LuceneTestCase(J4) to control
1264   verbosity of tests. If VERBOSE==false (default) tests should not print
1265   anything other than errors to System.(out|err). The setting can be
1266   changed with -Dtests.verbose=true on test invocation.
1267   (Shai Erera, Paul Elschot, Uwe Schindler)
1268
1269 * LUCENE-2318: Remove inconsistent system property code for retrieving
1270   temp and data directories inside test cases. It is now centralized in
1271   LuceneTestCase(J4). Also changed lots of tests to use
1272   getClass().getResourceAsStream() to retrieve test data. Tests needing
1273   access to "real" files from the test folder itself, can use
1274   LuceneTestCase(J4).getDataFile().  (Uwe Schindler)
1275
1276 * LUCENE-2398, LUCENE-2611: Improve tests to work better from IDEs such
1277   as Eclipse and IntelliJ.
1278   (Paolo Castagna, Steven Rowe via Robert Muir)
1279
1280 * LUCENE-2804: add newFSDirectory to LuceneTestCase to create a FSDirectory at
1281   random. (Shai Erera, Robert Muir)
1282
1283 Documentation
1284
1285 * LUCENE-2579: Fix oal.search's package.html description of abstract
1286   methods.  (Santiago M. Mola via Mike McCandless)
1287
1288 * LUCENE-2625: Add a note to IndexReader.termDocs() with additional verbiage
1289   that the TermEnum must be seeked since it is unpositioned.
1290   (Adriano Crestani via Robert Muir)
1291
1292 * LUCENE-2894: Use google-code-prettify for syntax highlighting in javadoc.
1293   (Shinichiro Abe, Koji Sekiguchi)
1294
1295 ================== Release 2.9.4 / 3.0.3 ====================
1296
1297 Changes in runtime behavior
1298
1299 * LUCENE-2689: NativeFSLockFactory no longer attempts to acquire a
1300   test lock just before the real lock is acquired.  (Surinder Pal
1301   Singh Bindra via Mike McCandless)
1302
1303 * LUCENE-2762: Fixed bug in IndexWriter causing it to hold open file
1304   handles against deleted files when compound-file was enabled (the
1305   default) and readers are pooled.  As a result of this the peak
1306   worst-case free disk space required during optimize is now 3X the
1307   index size, when compound file is enabled (else 2X).  (Mike
1308   McCandless)
1309
1310 * LUCENE-2773: LogMergePolicy accepts a double noCFSRatio (default =
1311   0.1), which means any time a merged segment is greater than 10% of
1312   the index size, it will be left in non-compound format even if
1313   compound format is on.  This change was made to reduce peak
1314   transient disk usage during optimize which increased due to
1315   LUCENE-2762.  (Mike McCandless)
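
  The noCFSRatio rule above can be sketched as follows. This is a
  minimal illustration, not LogMergePolicy's code: the class name,
  method name and byte-size parameters are hypothetical.

```java
// Illustrative sketch only; names and parameters are assumptions,
// not part of the LogMergePolicy API.
final class NoCfsRatioSketch {

    /**
     * Decides whether a merged segment is written in compound format.
     * A segment larger than noCFSRatio * totalIndexBytes stays in
     * non-compound format even when compound format is enabled.
     */
    static boolean useCompoundFile(boolean compoundEnabled, double noCFSRatio,
                                   long mergedSegmentBytes, long totalIndexBytes) {
        if (!compoundEnabled) {
            return false;
        }
        return mergedSegmentBytes <= noCFSRatio * totalIndexBytes;
    }

    public static void main(String[] args) {
        // With the default ratio of 0.1, a 20 MB segment in a 100 MB index
        // exceeds the 10% threshold and is left in non-compound format.
        System.out.println(useCompoundFile(true, 0.1, 20L << 20, 100L << 20));
        System.out.println(useCompoundFile(true, 0.1, 5L << 20, 100L << 20));
    }
}
```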
1316
1317 Bug fixes
1318
1319 * LUCENE-2142 (correct fix): FieldCacheImpl.getStringIndex no longer
1320   throws an exception when term count exceeds doc count.
1321   (Mike McCandless, Uwe Schindler)
1322
1323 * LUCENE-2513: when opening writable IndexReader on a not-current
1324   commit, do not overwrite "future" commits.  (Mike McCandless)
1325
1326 * LUCENE-2536: IndexWriter.rollback was failing to properly rollback
1327   buffered deletions against segments that were flushed (Mark Harwood
1328   via Mike McCandless)
1329
1330 * LUCENE-2541: Fixed NumericRangeQuery that returned incorrect results
1331   with endpoints near Long.MIN_VALUE and Long.MAX_VALUE:
1332   NumericUtils.splitRange() overflowed, if
1333   - the range contained a LOWER bound
1334     that was greater than (Long.MAX_VALUE - (1L << precisionStep))
1335   - the range contained an UPPER bound
1336     that was less than (Long.MIN_VALUE + (1L << precisionStep))
1337   With standard precision steps around 4, this had no effect on
1338   most queries, only those that met the above conditions.
1339   Queries with large precision steps failed more easily. Queries with
1340   precision step >=64 were not affected. Also 32 bit data types int
1341   and float were not affected.
1342   (Yonik Seeley, Uwe Schindler)
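
  The two overflow conditions listed above can be checked in isolation.
  The following is a minimal sketch of the arithmetic only; the class
  and method names are hypothetical, and it is not the
  NumericUtils.splitRange() source.

```java
// Illustrative sketch only; this is not the NumericUtils.splitRange() source.
final class SplitRangeOverflowSketch {

    /**
     * True if lower + (1L << precisionStep) would overflow a signed long.
     * Assumes 1 <= precisionStep <= 62 so the shifted value stays positive.
     */
    static boolean lowerBoundOverflows(long lower, int precisionStep) {
        return lower > Long.MAX_VALUE - (1L << precisionStep);
    }

    /**
     * True if upper - (1L << precisionStep) would underflow a signed long.
     * Same assumption on precisionStep as above.
     */
    static boolean upperBoundOverflows(long upper, int precisionStep) {
        return upper < Long.MIN_VALUE + (1L << precisionStep);
    }

    public static void main(String[] args) {
        int precisionStep = 4;
        long lower = Long.MAX_VALUE - 3;
        System.out.println(lowerBoundOverflows(lower, precisionStep));
        // The unchecked addition silently wraps around to a negative value.
        System.out.println(lower + (1L << precisionStep));
    }
}
```

  With a precision step of 4, only bounds within 16 of either extreme
  meet the conditions, which is why most queries were unaffected.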
1343
1344 * LUCENE-2593: Fixed certain rare cases where a disk full could lead
1345   to a corrupted index (Robert Muir, Mike McCandless)
1346
1347 * LUCENE-2620: Fixed a bug in WildcardQuery where too many asterisks
1348   would result in unbearably slow performance.  (Nick Barkas via Robert Muir)
1349
1350 * LUCENE-2627: Fixed bug in MMapDirectory chunking when a file is an
1351   exact multiple of the chunk size.  (Robert Muir)
1352
1353 * LUCENE-2634: isCurrent on an NRT reader was failing to return false
1354   if the writer had just committed (Nikolay Zamosenchuk via Mike McCandless)
1355
1356 * LUCENE-2650: Added extra safety to MMapIndexInput clones to prevent accessing
1357   an unmapped buffer if the input is closed (Mike McCandless, Uwe Schindler, Robert Muir)
1358
1359 * LUCENE-2384: Reset zzBuffer in StandardTokenizerImpl when lexer is reset.
1360   (Ruben Laguna via Uwe Schindler, sub-issue of LUCENE-2074)
1361
1362 * LUCENE-2658: Exceptions while processing term vectors enabled for multiple
1363   fields could lead to invalid ArrayIndexOutOfBoundsExceptions.
1364   (Robert Muir, Mike McCandless)
1365
1366 * LUCENE-2235: Implement missing PerFieldAnalyzerWrapper.getOffsetGap().
1367   (Javier Godoy via Uwe Schindler)
1368
1369 * LUCENE-2328: Fixed memory leak in how IndexWriter/Reader tracked
1370   already sync'd files. (Earwin Burrfoot via Mike McCandless)
1371
1372 * LUCENE-2549: Fix TimeLimitingCollector#TimeExceededException to record
1373   the absolute docid.  (Uwe Schindler)
1374
1375 * LUCENE-2533: fix FileSwitchDirectory.listAll to not return dups when
1376   primary & secondary dirs share the same underlying directory.
1377   (Michael McCandless)
1378
1379 * LUCENE-2365: IndexWriter.newestSegment (used normally for testing)
1380   is fixed to return null if there are no segments.  (Karthick
1381   Sankarachary via Mike McCandless)
1382
1383 * LUCENE-2730: Fix two rare deadlock cases in IndexWriter (Mike McCandless)
1384
1385 * LUCENE-2744: CheckIndex was stating total number of fields,
1386   not the number that have norms enabled, on the "test: field
1387   norms..." output.  (Mark Kristensson via Mike McCandless)
1388
1389 * LUCENE-2759: Fixed two near-real-time cases where doc store files
1390   may be opened for read even though they are still open for write.
1391   (Mike McCandless)
1392
1393 * LUCENE-2618: Fix rare thread safety issue whereby
1394   IndexWriter.optimize could sometimes return even though the index
1395   wasn't fully optimized (Mike McCandless)
1396
1397 * LUCENE-2767: Fix thread safety issue in addIndexes(IndexReader[])
1398   that could potentially result in index corruption.  (Mike
1399   McCandless)
1400
1401 * LUCENE-2762: Fixed bug in IndexWriter causing it to hold open file
1402   handles against deleted files when compound-file was enabled (the
1403   default) and readers are pooled.  As a result of this the peak
1404   worst-case free disk space required during optimize is now 3X the
1405   index size, when compound file is enabled (else 2X).  (Mike
1406   McCandless)
1407
1408 * LUCENE-2216: OpenBitSet.hashCode returned different hash codes for
1409   sets that only differed by trailing zeros. (Dawid Weiss, yonik)
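
  One way to make a bit-set hash independent of trailing zero words is
  to skip them before mixing. The sketch below shows that idea on a
  plain long[]; the class name and the mixing steps are assumptions for
  illustration, not the OpenBitSet implementation.

```java
// Illustrative sketch only; not the OpenBitSet.hashCode() source.
final class TrailingZeroHashSketch {

    /** Hashes the words of a bit set, ignoring any trailing all-zero words. */
    static int hash(long[] words) {
        // Find the last non-zero word so that padding does not affect the hash.
        int end = words.length;
        while (end > 0 && words[end - 1] == 0L) {
            end--;
        }
        long h = 0;
        for (int i = end - 1; i >= 0; i--) {
            h ^= words[i];
            h = (h << 1) | (h >>> 63); // rotate left by one bit
        }
        // Fold the 64-bit value to 32 bits.
        return (int) ((h >> 32) ^ h);
    }

    public static void main(String[] args) {
        long[] compact = {5L};
        long[] padded = {5L, 0L, 0L};
        System.out.println(hash(compact) == hash(padded));
    }
}
```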
1410
1411 * LUCENE-2782: Fix rare potential thread hazard with
1412   IndexWriter.commit (Mike McCandless)
1413
1414 API Changes
1415
1416 * LUCENE-2773: LogMergePolicy accepts a double noCFSRatio (default =
1417   0.1), which means any time a merged segment is greater than 10% of
1418   the index size, it will be left in non-compound format even if
1419   compound format is on.  This change was made to reduce peak
1420   transient disk usage during optimize which increased due to
1421   LUCENE-2762.  (Mike McCandless)
1422
1423 Optimizations
1424
1425 * LUCENE-2556: Improve memory usage after cloning TermAttribute.
1426   (Adriano Crestani via Uwe Schindler)
1427
1428 * LUCENE-2098: Improve the performance of BaseCharFilter, especially for
1429   large documents.  (Robin Wojciki, Koji Sekiguchi, Robert Muir)
1430
1431 New features
1432
1433 * LUCENE-2675 (2.9.4 only): Add support for Lucene 3.0 stored field files
1434   also in 2.9. The file format did not change, only the version number was
1435   upgraded to mark segments that have no compression. FieldsWriter still only
1436   writes 2.9 segments as they could contain compressed fields. This cross-version
1437   index format compatibility is provided here solely because Lucene 2.9 and 3.0
1438   have the same bugfix level, features, and the same index format with this slight
1439   compression difference. In general, Lucene does not support reading newer
1440   indexes with older library versions. (Uwe Schindler)
1441
1442 Documentation
1443
1444 * LUCENE-2239: Documented limitations in NIOFSDirectory and MMapDirectory due to
1445   Java NIO behavior when a Thread is interrupted while blocking on IO.
1446   (Simon Willnauer, Robert Muir)
1447
1448 ================== Release 2.9.3 / 3.0.2 ====================
1449
1450 Changes in backwards compatibility policy
1451
1452 * LUCENE-2135: Added FieldCache.purge(IndexReader) method to the
1453   interface.  Anyone implementing FieldCache externally will need to
1454   fix their code to implement this, on upgrading.  (Mike McCandless)
1455
1456 Changes in runtime behavior
1457
1458 * LUCENE-2421: NativeFSLockFactory does not throw LockReleaseFailedException if
1459   it cannot delete the lock file, since obtaining the lock does not fail if the
1460   file is there. (Shai Erera)
1461
1462 * LUCENE-2060 (2.9.3 only): Changed ConcurrentMergeScheduler's default for
1463   maxNumThreads from 3 to 1, because in practice we get the most gains
1464   from running a single merge in the background.  More than one
1465   concurrent merge causes a lot of thrashing (though it's possible on
1466   SSD storage that there would be net gains).  (Jason Rutherglen, Mike
1467   McCandless)
1468
1469 Bug fixes
1470
1471 * LUCENE-2046 (2.9.3 only): IndexReader should not see the index as changed, after
1472   IndexWriter.prepareCommit has been called but before
1473   IndexWriter.commit is called. (Peter Keegan via Mike McCandless)
1474
1475 * LUCENE-2119: Don't throw NegativeArraySizeException if you pass
1476   Integer.MAX_VALUE as nDocs to IndexSearcher search methods.  (Paul
1477   Taylor via Mike McCandless)
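
  A plausible cause for this kind of failure is an int overflow when a
  queue size is derived from nDocs. The sketch below assumes a
  computation of the form nDocs + 1, which this entry does not
  state; the class and method names are hypothetical.

```java
// Illustrative sketch only; the "nDocs + 1" sizing is an assumption,
// not taken from the entry above.
final class HeapSizeSketch {

    /** Naive sizing: wraps to a negative value for Integer.MAX_VALUE. */
    static int naiveHeapSize(int nDocs) {
        return nDocs + 1;
    }

    /** Guarded sizing: never returns a negative array length. */
    static int safeHeapSize(int nDocs) {
        return nDocs == Integer.MAX_VALUE ? Integer.MAX_VALUE : nDocs + 1;
    }

    public static void main(String[] args) {
        // A negative length passed to "new Object[length]" throws
        // NegativeArraySizeException.
        System.out.println(naiveHeapSize(Integer.MAX_VALUE));
        System.out.println(safeHeapSize(Integer.MAX_VALUE));
    }
}
```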
1478
1479 * LUCENE-2142: FieldCacheImpl.getStringIndex no longer throws an
1480   exception when term count exceeds doc count.  (Mike McCandless)
1481
1482 * LUCENE-2104: NativeFSLock.release() would silently fail if the lock is held by
1483   another thread/process.  (Shai Erera via Uwe Schindler)
1484
1485 * LUCENE-2283: Use shared memory pool for term vector and stored
1486   fields buffers. This memory will be reclaimed if needed according to
1487   the configured RAM Buffer Size for the IndexWriter.  This also fixes
1488   potentially excessive memory usage when many threads are indexing a
1489   mix of small and large documents.  (Tim Smith via Mike McCandless)
1490
1491 * LUCENE-2300: If IndexWriter is pooling reader (because NRT reader
1492   has been obtained), and addIndexes* is run, do not pool the
1493   readers from the external directory.  This is harmless (NRT reader is
1494   correct), but a waste of resources.  (Mike McCandless)
1495
1496 * LUCENE-2422: Don't reuse byte[] in IndexInput/Output -- it gains
1497   little performance, and ties up possibly large amounts of memory
1498   for apps that index large docs.  (Ross Woolf via Mike McCandless)
1499
1500 * LUCENE-2387: Don't hang onto Fieldables from the last doc indexed,
1501   in IndexWriter, nor the Reader in Tokenizer after close is
1502   called.  (Ruben Laguna, Uwe Schindler, Mike McCandless)
1503
1504 * LUCENE-2417: IndexCommit did not implement hashCode() and equals()
1505   consistently. Now they both take Directory and version into consideration. In
1506   addition, all IndexCommit methods which threw
1507   UnsupportedOperationException are now abstract. (Shai Erera)
1508
1509 * LUCENE-2467: Fixed memory leaks in IndexWriter when large documents
1510   are indexed.  (Mike McCandless)
1511
1512 * LUCENE-2473: Clicking on the "More Results" link in the luceneweb.war
1513   demo resulted in ArrayIndexOutOfBoundsException.
1514   (Sami Siren via Robert Muir)
1515
1516 * LUCENE-2476: If any exception is hit init'ing IW, release the write
1517   lock (previously we only released on IOException).  (Tamas Cservenak
1518   via Mike McCandless)
1519
1520 * LUCENE-2478: Fix CachingWrapperFilter to not throw NPE when
1521   Filter.getDocIdSet() returns null.  (Uwe Schindler, Daniel Noll)
1522
1523 * LUCENE-2468: Allow specifying how new deletions should be handled in
1524   CachingWrapperFilter and CachingSpanFilter.  By default, new
1525   deletions are ignored in CachingWrapperFilter, since typically this
1526   filter is AND'd with a query that correctly takes new deletions into
1527   account.  This should be a performance gain (higher cache hit rate)
1528   in apps that reopen readers, or use near-real-time reader
1529   (IndexWriter.getReader()), but may introduce invalid search results
1530   (allowing deleted docs to be returned) for certain cases, so a new
1531   expert ctor was added to CachingWrapperFilter to enforce deletions
1532   at a performance cost.  CachingSpanFilter by default recaches if
1533   there are new deletions (Shay Banon via Mike McCandless)
1534
1535 * LUCENE-2299: If you open an NRT reader while addIndexes* is running,
1536   it may miss some segments (Earwin Burrfoot via Mike McCandless)
1537
1538 * LUCENE-2397: Don't throw NPE from SnapshotDeletionPolicy.snapshot if
1539   there are no commits yet (Shai Erera)
1540
1541 * LUCENE-2424: Fix FieldDoc.toString to actually return its fields
1542   (Stephen Green via Mike McCandless)
1543
1544 * LUCENE-2311: Always pass a "fully loaded" (terms index & doc stores)
1545   SegmentsReader to IndexWriter's mergedSegmentWarmer (if set), so
1546   that warming is free to do whatever it needs to.  (Earwin Burrfoot
1547   via Mike McCandless)
1548
1549 * LUCENE-3029: Fix corner case when MultiPhraseQuery is used with zero
1550   position-increment tokens that would sometimes assign different
1551   scores to identical docs.  (Mike McCandless)
1552
1553 * LUCENE-2486: Fixed intermittent FileNotFoundException on doc store
1554   files when a mergedSegmentWarmer is set on IndexWriter.  (Mike
1555   McCandless)
1556
1557 * LUCENE-2130: Fix performance issue when FuzzyQuery runs on a
1558   multi-segment index (Michael McCandless)
1559
1560 API Changes
1561
1562 * LUCENE-2281: added doBeforeFlush to IndexWriter to allow extensions to perform
1563   operations before flush starts. Also exposed doAfterFlush as protected instead
1564   of package-private. (Shai Erera via Mike McCandless)
1565
1566 * LUCENE-2356: Add IndexWriter.set/getReaderTermsIndexDivisor, to set
1567   what IndexWriter passes for termsIndexDivisor to the readers it
1568   opens internally when applying deletions or creating a
1569   near-real-time reader.  (Earwin Burrfoot via Mike McCandless)
1570
1571 Optimizations
1572
1573 * LUCENE-2494 (3.0.2 only): Use CompletionService in ParallelMultiSearcher
1574   instead of simple polling for results. (Edward Drapkin, Simon Willnauer)
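
  The general pattern of collecting results with a CompletionService
  rather than polling looks roughly like this. It is a self-contained
  sketch using only java.util.concurrent; the class name, method name
  and the int "hit counts" standing in for sub-searcher results are
  hypothetical, and this is not ParallelMultiSearcher's code.

```java
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch only; not the ParallelMultiSearcher source.
final class CompletionServiceSketch {

    /** Runs one task per sub-result and sums them as each task completes. */
    static int totalHits(int[] hitsPerSubSearcher) {
        if (hitsPerSubSearcher.length == 0) {
            return 0;
        }
        ExecutorService pool = Executors.newFixedThreadPool(hitsPerSubSearcher.length);
        try {
            CompletionService<Integer> results = new ExecutorCompletionService<>(pool);
            for (int hits : hitsPerSubSearcher) {
                final int h = hits;
                results.submit(() -> h); // stands in for a sub-search
            }
            int total = 0;
            for (int i = 0; i < hitsPerSubSearcher.length; i++) {
                // take() blocks until the next task finishes, in completion order.
                total += results.take().get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(totalHits(new int[] {3, 4, 5}));
    }
}
```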
1575
1576 * LUCENE-2135: On IndexReader.close, forcefully evict any entries from
1577   the FieldCache rather than waiting for the WeakHashMap to release
1578   the reference (Mike McCandless)
1579
1580 * LUCENE-2161: Improve concurrency of IndexReader, especially in the
1581   context of near real-time readers.  (Mike McCandless)
1582
1583 * LUCENE-2360: Small speedup to recycling of reused per-doc RAM in
1584   IndexWriter (Robert Muir, Mike McCandless)
1585
1586 Build
1587
1588 * LUCENE-2488 (2.9.3 only): Support build with JDK 1.4 and exclude Java 1.5
1589   contrib modules on request (pass '-Dforce.jdk14.build=true') when
1590   compiling/testing/packaging. This marks the benchmark contrib also
1591   as Java 1.5, as it depends on fast-vector-highlighter. (Uwe Schindler)
1592
1593 ================== Release 2.9.2 / 3.0.1 ====================
1594
1595 Changes in backwards compatibility policy
1596
1597 * LUCENE-2123 (3.0.1 only): Removed the protected inner class ScoreTerm
1598   from FuzzyQuery. The change was needed because the comparator of this
1599   class had to be changed in an incompatible way. The class was never
1600   intended to be public.  (Uwe Schindler, Mike McCandless)
1601
1602 Bug fixes
1603
1604  * LUCENE-2092: BooleanQuery was ignoring disableCoord in its hashCode
1605    and equals methods, causing bad things to happen when caching
1606    BooleanQueries.  (Chris Hostetter, Mike McCandless)
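
   Why this matters for caching: if a flag is left out of equals and
   hashCode, two different queries collapse to the same cache key. A
   minimal sketch with a hypothetical stand-in class (not BooleanQuery
   itself) that includes the flag in both methods:

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; a hypothetical stand-in, not BooleanQuery.
final class CoordAwareKeySketch {
    private final boolean disableCoord;
    private final List<String> clauses;

    CoordAwareKeySketch(boolean disableCoord, String... clauses) {
        this.disableCoord = disableCoord;
        this.clauses = Arrays.asList(clauses.clone());
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof CoordAwareKeySketch)) {
            return false;
        }
        CoordAwareKeySketch other = (CoordAwareKeySketch) o;
        // The flag takes part in equality, so keys that differ only in
        // disableCoord are distinct.
        return disableCoord == other.disableCoord && clauses.equals(other.clauses);
    }

    @Override
    public int hashCode() {
        // Consistent with equals: the flag also feeds the hash.
        return 31 * clauses.hashCode() + (disableCoord ? 1231 : 1237);
    }
}
```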
1607
1608  * LUCENE-2095: Fixed a bug where, when two threads called
1609    IndexWriter.commit() at the same time, commit could return control
1610    to one of the threads before all changes were actually committed.
1611    (Sanne Grinovero via Mike McCandless)
1612
1613  * LUCENE-2132 (3.0.1 only): Fix the demo result.jsp to use QueryParser
1614    with a Version argument.  (Brian Li via Robert Muir)
1615
1616  * LUCENE-2166: Don't incorrectly keep warning about the same immense
1617    term, when IndexWriter.infoStream is on.  (Mike McCandless)
1618
1619  * LUCENE-2158: At high indexing rates, NRT reader could temporarily
1620    lose deletions.  (Mike McCandless)
1621
1622  * LUCENE-2182: DEFAULT_ATTRIBUTE_FACTORY was failing to load
1623    implementation class when interface was loaded by a different
1624    class loader.  (Uwe Schindler, reported on java-user by Ahmed El-dawy)
1625
1626  * LUCENE-2257: Increase max number of unique terms in one segment to
1627    termIndexInterval (default 128) * ~2.1 billion = ~274 billion.
1628    (Tom Burton-West via Mike McCandless)
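
   The figure above can be reproduced with long arithmetic, assuming
   that "~2.1 billion" refers to Integer.MAX_VALUE (the entry does not
   say so explicitly). The class and method names are hypothetical.

```java
// Illustrative sketch only; assumes "~2.1 billion" means Integer.MAX_VALUE.
final class MaxTermsSketch {

    /** Upper bound on unique terms per segment for a given index interval. */
    static long maxUniqueTerms(int termIndexInterval) {
        // Widen to long before multiplying to avoid int overflow.
        return (long) termIndexInterval * Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(maxUniqueTerms(128));
    }
}
```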
1629
1630  * LUCENE-2260: Fixed AttributeSource to not hold a strong
1631    reference to the Attribute/AttributeImpl classes which prevents
1632    unloading of custom attributes loaded by other classloaders
1633    (e.g. in Solr plugins).  (Uwe Schindler)
1634
1635  * LUCENE-1941: Fix Min/MaxPayloadFunction returning 0 when
1636    only one payload is present.  (Erik Hatcher, Mike McCandless
1637    via Uwe Schindler)
1638
1639  * LUCENE-2270: Queries consisting of all zero-boost clauses
1640    (for example, text:foo^0) sorted incorrectly and produced
1641    invalid docids. (yonik)
1642
1643 API Changes
1644
1645  * LUCENE-1609 (3.0.1 only): Restore IndexReader.getTermInfosIndexDivisor
1646    (it was accidentally removed in 3.0.0)  (Mike McCandless)
1647
1648  * LUCENE-1972 (3.0.1 only): Restore SortField.getComparatorSource
1649    (it was accidentally removed in 3.0.0)  (John Wang via Uwe Schindler)
1650
1651  * LUCENE-2190: Added a new class CustomScoreProvider to function package
1652    that can be subclassed to provide custom scoring to CustomScoreQuery.
1653    The methods in CustomScoreQuery that did this before were deprecated
1654    and replaced by a method getCustomScoreProvider(IndexReader) that
1655    returns a custom score implementation using the above class. The change
1656    is necessary with per-segment searching, as CustomScoreQuery is
1657    a stateless class (like all other Queries) and does not know about
1658    the currently searched segment. This API works similarly to Filter's
1659    getDocIdSet(IndexReader).  (Paul chez Jamespot via Mike McCandless,
1660    Uwe Schindler)
1661
1662  * LUCENE-2080: Deprecate Version.LUCENE_CURRENT, as using this constant
1663    will cause backwards compatibility problems when upgrading Lucene. See
1664    the Version javadocs for additional information.
1665    (Robert Muir)
1666
1667 Optimizations
1668
1669  * LUCENE-2086: When resolving deleted terms, do so in term sort order
1670    for better performance (Bogdan Ghidireac via Mike McCandless)
1671
1672  * LUCENE-2123 (partly, 3.0.1 only): Fixes a slowdown / memory issue
1673    added by LUCENE-504.  (Uwe Schindler, Robert Muir, Mike McCandless)
1674
1675  * LUCENE-2258: Remove unneeded synchronization in FuzzyTermEnum.
1676    (Uwe Schindler, Robert Muir)
1677
1678 Test Cases
1679
1680  * LUCENE-2114: Change TestFilteredSearch to test on multi-segment
1681    index as well. (Simon Willnauer via Mike McCandless)
1682
1683  * LUCENE-2211: Improves BaseTokenStreamTestCase to use a fake attribute
1684    that checks if clearAttributes() was called correctly.
1685    (Uwe Schindler, Robert Muir)
1686
1687  * LUCENE-2207, LUCENE-2219: Improve BaseTokenStreamTestCase to check if
1688    end() is implemented correctly.  (Koji Sekiguchi, Robert Muir)
1689
1690 Documentation
1691
1692  * LUCENE-2114: Improve javadocs of Filter to call out that the
1693    provided reader is per-segment (Simon Willnauer via Mike
1694    McCandless)
1695
1696 ======================= Release 3.0.0 =======================
1697
1698 Changes in backwards compatibility policy
1699
1700 * LUCENE-1979: Change return type of SnapshotDeletionPolicy#snapshot()
1701   from IndexCommitPoint to IndexCommit. Code that uses this method
1702   needs to be recompiled against Lucene 3.0 in order to work. The
1703   previously deprecated IndexCommitPoint is also removed.
1704   (Michael Busch)
1705
1706 * o.a.l.Lock.isLocked() is now allowed to throw an IOException.
1707   (Mike McCandless)
1708
1709 * LUCENE-2030: CachingWrapperFilter and CachingSpanFilter now hide
1710   the internal cache implementation for thread safety; previously it
1711   was declared protected.  (Peter Lenahan, Uwe Schindler, Simon Willnauer)
1712
1713 * LUCENE-2053: If you call Thread.interrupt() on a thread inside
1714   Lucene, Lucene will do its best to interrupt the thread.  However,
1715   instead of throwing InterruptedException (which is a checked
1716   exception), you'll get an oal.util.ThreadInterruptedException (an
1717   unchecked exception, subclassing RuntimeException).  The interrupt
1718   status on the thread is cleared when this exception is thrown.
1719   (Mike McCandless)
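
  The behaviour described above, a checked InterruptedException
  surfacing as an unchecked exception with the interrupt status
  cleared, can be sketched with plain JDK code. The exception class
  below is a hypothetical stand-in, not oal.util.ThreadInterruptedException.

```java
// Illustrative sketch only; SketchInterruptedException is a stand-in,
// not Lucene's ThreadInterruptedException.
final class UncheckedInterruptSketch {

    /** Unchecked wrapper around the checked InterruptedException. */
    static final class SketchInterruptedException extends RuntimeException {
        SketchInterruptedException(InterruptedException cause) {
            super(cause);
        }
    }

    /** Sleeps, rethrowing an interrupt as an unchecked exception. */
    static void sleepUnchecked(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            // Thread.sleep clears the interrupt status when it throws.
            throw new SketchInterruptedException(ie);
        }
    }

    /** Returns true if the interrupt status is clear once the exception is caught. */
    static boolean interruptStatusClearedAfterThrow() {
        Thread.currentThread().interrupt();
        try {
            sleepUnchecked(10);
            return false; // not reached: the pending interrupt aborts the sleep
        } catch (SketchInterruptedException e) {
            return !Thread.currentThread().isInterrupted();
        }
    }

    public static void main(String[] args) {
        System.out.println(interruptStatusClearedAfterThrow());
    }
}
```

  Callers that need the interrupt to propagate further can call
  Thread.currentThread().interrupt() again in their catch block.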
1720
1721 * LUCENE-2052: Some methods in Lucene core were changed to accept
1722   Java 5 varargs. This is not a backwards compatibility problem as
1723   long as you do not try to override such a method. We left common
1724   overridden methods unchanged and added varargs to constructors,
1725   static, or final methods (MultiSearcher,...).  (Uwe Schindler)
1726
1727 * LUCENE-1558: IndexReader.open(Directory) now opens a readOnly=true
1728   reader, and new IndexSearcher(Directory) does the same.  Note that
1729   this is a change in the default from 2.9, when these methods were
1730   previously deprecated.  (Mike McCandless)
1731
1732 * LUCENE-1753: Make not yet final TokenStreams final to enforce
1733   decorator pattern. (Uwe Schindler)
1734
1735 Changes in runtime behavior
1736
1737 * LUCENE-1677: Remove the system property to set SegmentReader class
1738   implementation.  (Uwe Schindler)
1739
1740 * LUCENE-1960: As a consequence of the removal of Field.Store.COMPRESS,
1741   support for this type of fields was removed. Lucene 3.0 is still able
1742   to read indexes with compressed fields, but as soon as merges occur
1743   or the index is optimized, all compressed fields are decompressed
1744   and converted to Field.Store.YES. Because of this, indexes with
1745   compressed fields can suddenly get larger. Also the first merge with
1746   decompression cannot be done in raw mode, it is therefore slower.
1747   This change has no effect for code that uses such old indexes,
1748   they behave as before (fields are automatically decompressed
1749   during read). Indexes converted to Lucene 3.0 format cannot be read
1750   anymore with previous versions.
1751   It is recommended to optimize your indexes after upgrading to convert
1752   to the new format and decompress all fields.
1753   If you want compressed fields, you can use CompressionTools, which
1754   creates compressed byte[] to be added as binary stored field. This
1755   cannot be done automatically, as you also have to decompress such
1756   fields when reading. You have to reindex to do that.
1757   (Michael Busch, Uwe Schindler)
1758
1759 * LUCENE-2060: Changed ConcurrentMergeScheduler's default for
1760   maxNumThreads from 3 to 1, because in practice we get the most
1761   gains from running a single merge in the background.  More than one
1762   concurrent merge causes a lot of thrashing (though it's possible on
1763   SSD storage that there would be net gains).  (Jason Rutherglen,
1764   Mike McCandless)
1765
1766 API Changes
1767
1768 * LUCENE-1257, LUCENE-1984, LUCENE-1985, LUCENE-2057, LUCENE-1833, LUCENE-2012,
1769   LUCENE-1998: Port to Java 1.5:
1770
1771   - Add generics to public and internal APIs (see below).
1772   - Replace new Integer(int), new Double(double),... by static valueOf() calls.
1773   - Replace for-loops with Iterator by foreach loops.
1774   - Replace StringBuffer with StringBuilder.
1775   - Replace o.a.l.util.Parameter by Java 5 enums (see below).
1776   - Add @Override annotations.
1777   (Uwe Schindler, Robert Muir, Karl Wettin, Paul Elschot, Kay Kay, Shai Erera,
1778   DM Smith)
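
  For readers unfamiliar with these idioms, the sketch below shows the
  Java 5 forms named in the list (generics, valueOf(), foreach,
  StringBuilder, @Override) in one small hypothetical class; it is not
  taken from the Lucene sources.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not Lucene code.
final class Java5IdiomsSketch {

    /** Generic collection plus valueOf() instead of "new Integer(int)". */
    static List<Integer> box(int... values) {
        List<Integer> boxed = new ArrayList<Integer>();
        for (int v : values) {
            boxed.add(Integer.valueOf(v));
        }
        return boxed;
    }

    /** foreach loop and StringBuilder instead of Iterator and StringBuffer. */
    static String join(List<String> terms) {
        StringBuilder sb = new StringBuilder();
        for (String term : terms) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(term);
        }
        return sb.toString();
    }

    @Override
    public String toString() {
        return "Java5IdiomsSketch";
    }
}
```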
1779
1780 * Generify Lucene API:
1781
1782   - TokenStream/AttributeSource: Now addAttribute()/getAttribute() return an
1783     instance of the requested attribute interface and no cast needed anymore
1784     (LUCENE-1855).
1785   - NumericRangeQuery, NumericRangeFilter, and FieldCacheRangeFilter
1786     now have Integer, Long, Float, Double as type param (LUCENE-1857).
1787   - Document.getFields() returns List<Fieldable>.
1788   - Query.extractTerms(Set<Term>)
1789   - CharArraySet and stop word sets in core/contrib
1790   - PriorityQueue (LUCENE-1935)
1791   - TopDocCollector
1792   - DisjunctionMaxQuery (LUCENE-1984)
1793   - MultiTermQueryWrapperFilter
1794   - CloseableThreadLocal
1795   - MapOfSets
1796   - o.a.l.util.cache package
1797   - lots of internal APIs of IndexWriter
1798  (Uwe Schindler, Michael Busch, Kay Kay, Robert Muir, Adriano Crestani)
1799
1800 * LUCENE-1944, LUCENE-1856, LUCENE-1957, LUCENE-1960, LUCENE-1961,
1801   LUCENE-1968, LUCENE-1970, LUCENE-1946, LUCENE-1971, LUCENE-1975,
1802   LUCENE-1972, LUCENE-1978, LUCENE-944, LUCENE-1979, LUCENE-1973, LUCENE-2011:
1803   Remove deprecated methods/constructors/classes:
1804
1805   - Remove all String/File directory paths in IndexReader /
1806     IndexSearcher / IndexWriter.
1807   - Remove FSDirectory.getDirectory()
1808   - Make FSDirectory abstract.
1809   - Remove Field.Store.COMPRESS (see above).
1810   - Remove Filter.bits(IndexReader) method and make
1811     Filter.getDocIdSet(IndexReader) abstract.
1812   - Remove old DocIdSetIterator methods and make the new ones abstract.
1813   - Remove some methods in PriorityQueue.
1814   - Remove old TokenStream API and backwards compatibility layer.
1815   - Remove RangeQuery, RangeFilter and ConstantScoreRangeQuery.
1816   - Remove SpanQuery.getTerms().
1817   - Remove ExtendedFieldCache, custom and auto caches, SortField.AUTO.
1818   - Remove old-style custom sort.
1819   - Remove legacy search setting in SortField.
1820   - Remove Hits and all references from core and contrib.
1821   - Remove HitCollector and its TopDocs support implementations.
1822   - Remove term field and accessors in MultiTermQuery
1823     (and fix Highlighter).
1824   - Remove deprecated methods in BooleanQuery.
1825   - Remove deprecated methods in Similarity.
1826   - Remove BoostingTermQuery.
1827   - Remove MultiValueSource.
1828   - Remove Scorer.explain(int).
1829   ...and some other minor ones (Uwe Schindler, Michael Busch, Mark Miller)
1830
1831 * LUCENE-1925: Make IndexSearcher's subReaders and docStarts members
1832   protected; add expert ctor to directly specify reader, subReaders
1833   and docStarts.  (John Wang, Tim Smith via Mike McCandless)
1834
1835 * LUCENE-1945: All public classes that have a close() method now
1836   also implement java.io.Closeable (IndexReader, IndexWriter, Directory,...).
1837   (Uwe Schindler)
1838
1839 * LUCENE-1998: Change all Parameter instances to Java 5 enums. This
1840   is no backwards-break, only a change of the super class. Parameter
1841   was deprecated and will be removed in a later version.
1842   (DM Smith, Uwe Schindler)
1843
1844 Bug fixes
1845
1846 * LUCENE-1951: When the text provided to WildcardQuery has no wildcard
1847   characters (ie matches a single term), don't lose the boost and
1848   rewrite method settings.  Also, rewrite to PrefixQuery if the
1849   wildcard is of the form "foo*", for slightly faster performance. (Robert
1850   Muir via Mike McCandless)
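
  The three pattern shapes this entry distinguishes can be told apart
  with simple string checks. The sketch below is a hypothetical
  classifier, not WildcardQuery's rewrite logic, and it ignores
  escaping.

```java
// Illustrative sketch only; not the WildcardQuery source. Escaped
// wildcard characters are not handled.
final class WildcardShapeSketch {

    enum Shape { SINGLE_TERM, PREFIX, GENERAL }

    static Shape classify(String text) {
        int firstStar = text.indexOf('*');
        int firstQuestion = text.indexOf('?');
        if (firstStar == -1 && firstQuestion == -1) {
            return Shape.SINGLE_TERM; // no wildcard characters at all
        }
        if (firstQuestion == -1 && firstStar == text.length() - 1) {
            return Shape.PREFIX; // a single trailing '*', as in "foo*"
        }
        return Shape.GENERAL;
    }

    public static void main(String[] args) {
        System.out.println(classify("foo"));
        System.out.println(classify("foo*"));
        System.out.println(classify("f?o*"));
    }
}
```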
1851
1852 * LUCENE-2013: SpanRegexQuery does not work with QueryScorer.
1853   (Benjamin Keil via Mark Miller)
1854
1855 * LUCENE-2088: addAttribute() should only accept interfaces that
1856   extend Attribute. (Shai Erera, Uwe Schindler)
1857
1858 * LUCENE-2045: Fix silly FileNotFoundException hit if you enable
1859   infoStream on IndexWriter and then add an empty document and commit
1860   (Shai Erera via Mike McCandless)
1861
1862 * LUCENE-2046: IndexReader should not see the index as changed, after
1863   IndexWriter.prepareCommit has been called but before
1864   IndexWriter.commit is called. (Peter Keegan via Mike McCandless)
1865
1866 New features
1867
1868 * LUCENE-1933: Provide a convenience AttributeFactory that creates a
1869   Token instance for all basic attributes.  (Uwe Schindler)
1870
1871 * LUCENE-2041: Parallelize the rest of ParallelMultiSearcher. Lots of
1872   code refactoring and Java 5 concurrent support in MultiSearcher.
1873   (Joey Surls, Simon Willnauer via Uwe Schindler)
1874
1875 * LUCENE-2051: Add CharArraySet.copy() as a simple method to copy
1876   any Set<?> to a CharArraySet that is optimized, if Set<?> is already
1877   a CharArraySet.  (Simon Willnauer)
1878
1879 Optimizations
1880
1881 * LUCENE-1183: Optimize Levenshtein Distance computation in
1882   FuzzyQuery.  (Cédrik Lime via Mike McCandless)
1883
1884 * LUCENE-2006: Optimization of FieldDocSortedHitQueue to always
1885   use Comparable<?> interface.  (Uwe Schindler, Mark Miller)
1886
1887 * LUCENE-2087: Remove recursion in NumericRangeTermEnum.
1888   (Uwe Schindler)
1889
1890 Build
1891
1892 * LUCENE-486: Remove test->demo dependencies. (Michael Busch)
1893
1894 * LUCENE-2024: Raise build requirements to Java 1.5 and ANT 1.7.0
1895   (Uwe Schindler, Mike McCandless)
1896
1897 ======================= Release 2.9.1 =======================
1898
1899 Changes in backwards compatibility policy
1900
1901  * LUCENE-2002: Add required Version matchVersion argument when
1902    constructing QueryParser or MultiFieldQueryParser and, default (as
1903    of 2.9) enablePositionIncrements to true to match
1904    StandardAnalyzer's 2.9 default (Uwe Schindler, Mike McCandless)
1905
1906 Bug fixes
1907
1908  * LUCENE-1974: Fixed nasty bug in BooleanQuery (when it used
1909    BooleanScorer for scoring), whereby some matching documents fail to
1910    be collected.  (Fulin Tang via Mike McCandless)
1911
1912  * LUCENE-1124: Make sure FuzzyQuery always matches the precise term.
1913    (stefatwork@gmail.com via Mike McCandless)
1914
1915  * LUCENE-1976: Fix IndexReader.isCurrent() to return the right thing
1916    when the reader is a near real-time reader.  (Jake Mannix via Mike
1917    McCandless)
1918
1919  * LUCENE-1986: Fix NPE when scoring PayloadNearQuery (Peter Keegan,
1920    Mark Miller via Mike McCandless)
1921
1922  * LUCENE-1992: Fix thread hazard if a merge is committing just as an
1923    exception occurs during sync (Uwe Schindler, Mike McCandless)
1924
1925  * LUCENE-1995: Note in javadocs that IndexWriter.setRAMBufferSizeMB
1926    cannot exceed 2048 MB, and throw IllegalArgumentException if it
1927    does.  (Aaron McKee, Yonik Seeley, Mike McCandless)
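
        The limit check described above can be sketched as follows. This is
        a minimal, self-contained illustration and not Lucene's actual code:
        the class RamBufferSetting, its 16.0 default and the message text
        are assumptions made for the example.

```java
/** Hypothetical holder for a RAM buffer setting with an upper bound. */
final class RamBufferSetting {
    static final double MAX_RAM_BUFFER_MB = 2048.0;

    private double ramBufferSizeMB = 16.0; // assumed default, for illustration only

    /** Rejects values above the documented 2048 MB ceiling (and non-positive values). */
    void setRAMBufferSizeMB(double mb) {
        if (mb > MAX_RAM_BUFFER_MB) {
            throw new IllegalArgumentException(
                "ramBufferSizeMB=" + mb + " exceeds the limit of " + MAX_RAM_BUFFER_MB + " MB");
        }
        if (mb <= 0.0) {
            throw new IllegalArgumentException("ramBufferSizeMB must be > 0, got " + mb);
        }
        ramBufferSizeMB = mb;
    }

    double getRAMBufferSizeMB() {
        return ramBufferSizeMB;
    }

    public static void main(String[] args) {
        RamBufferSetting setting = new RamBufferSetting();
        setting.setRAMBufferSizeMB(64.0);
        System.out.println(setting.getRAMBufferSizeMB());
    }
}
```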
1928
1929  * LUCENE-2004: Fix Constants.LUCENE_MAIN_VERSION to not be inlined
1930    by client code.  (Uwe Schindler)
1931
1932  * LUCENE-2016: Replace illegal U+FFFF character with the replacement
1933    char (U+FFFD) during indexing, to prevent silent index corruption.
1934    (Peter Keegan, Mike McCandless)
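
        Applications that want to clean their text before handing it to the
        indexer could apply the same substitution themselves. The sketch
        below only illustrates the character replacement named above; the
        class InvalidCharSanitizer is hypothetical and is not part of Lucene.

```java
/** Hypothetical helper that mirrors the U+FFFF to U+FFFD substitution. */
final class InvalidCharSanitizer {
    static final char INVALID = '\uFFFF';      // not a valid Unicode character
    static final char REPLACEMENT = '\uFFFD';  // Unicode replacement character

    /** Returns the text with every U+FFFF replaced by U+FFFD. */
    static String sanitize(String text) {
        return text.replace(INVALID, REPLACEMENT);
    }

    public static void main(String[] args) {
        String clean = sanitize("ab\uFFFFcd");
        // True when no U+FFFF is left in the sanitized text.
        System.out.println(clean.indexOf(INVALID) < 0);
    }
}
```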
1935
1936 API Changes
1937
1938  * Un-deprecate search(Weight weight, Filter filter, int n) from
1939    Searchable interface (deprecated by accident).  (Uwe Schindler)
1940
1941  * Un-deprecate o.a.l.util.Version constants.  (Mike McCandless)
1942
1943  * LUCENE-1987: Un-deprecate some ctors of Token, as they will not
1944    be removed in 3.0 and are still useful. Also add some missing
1945    o.a.l.util.Version constants for enabling invalid acronym
1946    settings in StandardAnalyzer to be compatible with the coming
1947    Lucene 3.0.  (Uwe Schindler)
1948
1949  * LUCENE-1973: Un-deprecate IndexSearcher.setDefaultFieldSortScoring,
1950    to allow controlling per-IndexSearcher whether scores are computed
1951    when sorting by field.  (Uwe Schindler, Mike McCandless)
1952
1953  * LUCENE-2043: Make IndexReader.commit(Map<String,String>) public.
1954    (Mike McCandless)
1955
1956 Documentation
1957
1958  * LUCENE-1955: Fix Hits deprecation notice to point users in right
1959    direction. (Mike McCandless, Mark Miller)
1960
1961  * Fix javadoc about score tracking done by search methods in Searcher
1962    and IndexSearcher.  (Mike McCandless)
1963
1964  * LUCENE-2008: Javadoc improvements for TokenStream/Tokenizer/Token
1965    (Luke Nezda via Mike McCandless)
1966
1967 ======================= Release 2.9.0 =======================
1968
1969 Changes in backwards compatibility policy
1970
1971  * LUCENE-1575: Searchable.search(Weight, Filter, int, Sort) no
1972     longer computes a document score for each hit by default.  If
1973     document score tracking is still needed, you can call
1974     IndexSearcher.setDefaultFieldSortScoring(true, true) to enable
1975     both per-hit and maxScore tracking; however, this is deprecated
1976     and will be removed in 3.0.
1977
1978     Alternatively, use Searchable.search(Weight, Filter, Collector)
1979     and pass in a TopFieldCollector instance, using the following code
1980     sample:
1981
1982     <code>
1983       TopFieldCollector tfc = TopFieldCollector.create(sort, numHits, fillFields,
1984                                                        true /* trackDocScores */,
1985                                                        true /* trackMaxScore */,
1986                                                        false /* docsInOrder */);
1987       searcher.search(query, tfc);
1988       TopDocs results = tfc.topDocs();
1989     </code>
1990
1991     Note that your Sort object cannot use SortField.AUTO when you
1992     directly instantiate TopFieldCollector.
1993
1994     Also, the method search(Weight, Filter, Collector) was added to
1995     the Searchable interface and the Searcher abstract class to
1996     replace the deprecated HitCollector versions.  If you either
1997     implement Searchable or extend Searcher, you should change your
1998     code to implement this method.  If you already extend
1999     IndexSearcher, no further changes are needed to use Collector.
2000
2001     Finally, the values Float.NaN and Float.NEGATIVE_INFINITY are not
2002     valid scores.  Lucene uses these values internally in certain
2003     places, so if you have hits with such scores, it will cause
2004     problems. (Shai Erera via Mike McCandless)
2005
2006  * LUCENE-1687: All methods and parsers from the interface ExtendedFieldCache
2007     have been moved into FieldCache. ExtendedFieldCache is now deprecated and
2008     contains only a few declarations for binary backwards compatibility.
2009     ExtendedFieldCache will be removed in version 3.0. Users of FieldCache and
2010     ExtendedFieldCache will be able to plug in Lucene 2.9 without recompilation.
2011     The auto cache (FieldCache.getAuto) is now deprecated. Due to the merge of
2012     ExtendedFieldCache and FieldCache, FieldCache can now additionally return
2013     long[] and double[] arrays in addition to int[] and float[] and StringIndex.
2014
2015     The interface changes are only notable for users implementing the interfaces,
2016     which is unlikely to have been done, because there is no way to change
2017     Lucene's FieldCache implementation.  (Grant Ingersoll, Uwe Schindler)
2018
2019  * LUCENE-1630, LUCENE-1771: Weight, previously an interface, is now an abstract
2020     class. Some of the method signatures have changed, but it should be fairly
2021     easy to see what adjustments must be made to existing code to sync up
2022     with the new API. You can find more detail in the API Changes section.
2023
2024     Going forward Searchable will be kept for convenience only and may
2025     be changed between minor releases without any deprecation
2026     process. It is not recommended that you implement it, but rather extend
2027     Searcher.
2028     (Shai Erera, Chris Hostetter, Martin Ruckli, Mark Miller via Mike McCandless)
2029
2030  * LUCENE-1422, LUCENE-1693: The new Attribute based TokenStream API (see below)
2031     has some backwards breaks in rare cases. We did our best to make the
2032     transition as easy as possible and you are not likely to run into any problems.
2033     If your tokenizers still implement next(Token) or next(), the calls are
2034     automatically wrapped. The indexer and query parser use the new API
2035     (e.g., use incrementToken() calls). All core TokenStreams are implemented using
2036     the new API. You can mix old and new API style TokenFilters/TokenStream.
2037     Problems only occur when you have done the following:
2038     You have overridden next(Token) or next() in one of the non-abstract core
2039     TokenStreams/-Filters. These classes should normally be final, but some
2040     of them are not. In this case, next(Token)/next() would never be called.
2041     To fail early with a hard compile/runtime error, the next(Token)/next()
2042     methods in these TokenStreams/-Filters were made final in this release.
2043     (Michael Busch, Uwe Schindler)
2044
2045  * LUCENE-1763: MergePolicy now requires an IndexWriter instance to
2046     be passed upon instantiation. As a result, IndexWriter was removed
2047     as a method argument from all MergePolicy methods. (Shai Erera via
2048     Mike McCandless)
2049
2050  * LUCENE-1748: LUCENE-1001 introduced PayloadSpans, but this was a back
2051     compat break and caused custom SpanQuery implementations to fail at runtime
2052     in a variety of ways. This issue attempts to remedy things by causing
2053     a compile time break on custom SpanQuery implementations and removing
2054     the PayloadSpans class, with its functionality now moved to Spans. To
2055     help in alleviating future back compat pain, Spans has been changed from
2056     an interface to an abstract class.
2057     (Hugh Cayless, Mark Miller)
2058
2059  * LUCENE-1808: Query.createWeight has been changed from protected to
2060     public. This will be a back compat break if you have overridden this
2061     method - but you are likely already affected by the LUCENE-1630 (make Weight
2062     abstract rather than an interface) back compat break if you have overridden
2063     Query.createWeight, so we have taken the opportunity to make this change.
2064     (Tim Smith, Shai Erera via Mark Miller)
2065
2066  * LUCENE-1708 - IndexReader.document() no longer checks if the document is
2067     deleted. You can call IndexReader.isDeleted(n) prior to calling document(n).
2068     (Shai Erera via Mike McCandless)
2069
2070
2071 Changes in runtime behavior
2072
2073  * LUCENE-1424: QueryParser now by default uses constant score auto
2074     rewriting when it generates a WildcardQuery and PrefixQuery (it
2075     already does so for TermRangeQuery, as well).  Call
2076     setMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE)
2077     to revert to slower BooleanQuery rewriting method.  (Mark Miller via Mike
2078     McCandless)
2079
2080  * LUCENE-1575: As of 2.9, the core collectors as well as
2081     IndexSearcher's search methods that return top N results, no
2082     longer filter documents with scores <= 0.0. If you rely on this
2083     functionality you can use PositiveScoresOnlyCollector like this:
2084       TopDocsCollector tdc = TopScoreDocCollector.create(10, true /* docsScoredInOrder */);
2085     <code>
2086       TopDocsCollector tdc = new TopScoreDocCollector(10);
2087       Collector c = new PositiveScoresOnlyCollector(tdc);
2088       searcher.search(query, c);
2089       TopDocs hits = tdc.topDocs();
2090       ...
2091     </code>
2092
2093  * LUCENE-1604: IndexReader.norms(String field) is now allowed to
2094     return null if the field has no norms, as long as you've
2095     previously called IndexReader.setDisableFakeNorms(true).  This
2096     setting now defaults to false (to preserve the fake norms back
2097     compatible behavior) but in 3.0 will be hardwired to true.  (Shon
2098     Vella via Mike McCandless).
2099
2100  * LUCENE-1624: If you open IndexWriter with create=true and
2101     autoCommit=false on an existing index, IndexWriter no longer
2102     writes an empty commit when it's created.  (Paul Taylor via Mike
2103     McCandless)
2104
2105  * LUCENE-1593: When you call Sort() or Sort.setSort(String field,
2106     boolean reverse), the resulting SortField array no longer ends
2107     with SortField.FIELD_DOC (it was unnecessary as Lucene breaks ties
2108     internally by docID). (Shai Erera via Michael McCandless)
2109
2110  * LUCENE-1542: When the first token(s) have 0 position increment,
2111     IndexWriter used to incorrectly record the position as -1, if no
2112     payload is present, or Integer.MAX_VALUE if a payload is present.
2113     This causes positional queries to fail to match.  The bug is now
2114     fixed, but if your app relies on the buggy behavior then you must
2115     call IndexWriter.setAllowMinus1Position().  That API is deprecated
2116     so you must fix your application, and rebuild your index, to not
2117     rely on this behavior by the 3.0 release of Lucene. (Jonathan
2118     Mamou, Mark Miller via Mike McCandless)
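
         How positions follow from position increments, and why a leading
         increment of 0 used to yield -1, can be sketched as below. This is
         a simplified model under stated assumptions, not the IndexWriter
         code: the class PositionSketch and the clamping rule are
         illustrative only.

```java
/** Hypothetical model of turning position increments into absolute positions. */
final class PositionSketch {
    /**
     * Starts "before the first token" at -1 and adds each increment.
     * A leading increment of 0 would leave the position at -1, so the
     * sketch clamps it to 0, modelling the fixed behavior.
     */
    static int[] positions(int[] increments) {
        int[] out = new int[increments.length];
        int position = -1;
        for (int i = 0; i < increments.length; i++) {
            position += increments[i];
            if (position < 0) {
                position = 0; // never record a negative position
            }
            out[i] = position;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(positions(new int[] {0, 1, 0, 2})));
    }
}
```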
2119
2120
2121  * LUCENE-1715: Finalizers have been removed from the 4 core classes
2122     that still had them, since they will cause GC to take longer, thus
2123     tying up memory for longer, and at best they mask buggy app code.
2124     DirectoryReader (returned from IndexReader.open) & IndexWriter
2125     previously released the write lock during finalize.
2126     SimpleFSDirectory.FSIndexInput closed the descriptor in its
2127     finalizer, and NativeFSLock released the lock.  It's possible
2128     applications will be affected by this, but only if the application
2129     is failing to close reader/writers.  (Brian Groose via Mike
2130     McCandless)
2131
2132  * LUCENE-1717: Fixed IndexWriter to account for RAM usage of
2133     buffered deletions.  (Mike McCandless)
2134
2135  * LUCENE-1727: Ensure that fields are stored & retrieved in the
2136     exact order in which they were added to the document.  This was
2137     true in all Lucene releases before 2.3, but was broken in 2.3 and
2138     2.4, and is now fixed in 2.9.  (Mike McCandless)
2139
2140  * LUCENE-1678: The addition of Analyzer.reusableTokenStream
2141     accidentally broke back compatibility of external analyzers that
2142     subclassed core analyzers that implemented tokenStream but not
2143     reusableTokenStream.  This is now fixed, such that if
2144     reusableTokenStream is invoked on such a subclass, that method
2145     will forcefully fallback to tokenStream.  (Mike McCandless)
2146
2147  * LUCENE-1801: Token.clear() and Token.clearNoTermBuffer() now also clear
2148     startOffset, endOffset and type. This is not likely to affect any
2149     Tokenizer chains, as Tokenizers normally always set these three values.
2150     This change was made to conform to the new AttributeImpl.clear() and
2151     AttributeSource.clearAttributes(), so that they work identically for Token (the
2152     all-in-one AttributeImpl) and for the 6 separate AttributeImpls. (Uwe Schindler, Michael Busch)
2153
2154  * LUCENE-1483: When searching over multiple segments, a new Scorer is now created
2155     for each segment. Searching has been telescoped out a level and IndexSearcher now
2156     operates much like MultiSearcher does. The Weight is created only once for the top
2157     level Searcher, but each Scorer is passed a per-segment IndexReader. This will
2158     result in doc ids in the Scorer being internal to the per-segment IndexReader. It
2159     has always been outside of the API to count on a given IndexReader to contain every
2160     doc id in the index - and if you have been ignoring MultiSearcher in your custom code
2161     and counting on this fact, you will find your code no longer works correctly. If a
2162     custom Scorer implementation uses any caches/filters that rely on being based on the
2163     top level IndexReader, it will need to be updated to correctly use contextless
2164     caches/filters (e.g., you can't count on the IndexReader to contain any given doc id or
2165     all of the doc ids). (Mark Miller, Mike McCandless)
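
         The usual way to cope with per-segment doc ids is to remember a
         per-segment base offset and add it to each segment-local id. The
         sketch below models that idea with plain arrays; the names
         DocBaseSketch, GlobalIdCollector and collectAll are hypothetical,
         and the real Collector API also receives the segment's IndexReader.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of rebasing segment-local doc ids to index-wide ids. */
final class DocBaseSketch {

    /** Collects index-wide doc ids from segment-local ones. */
    static final class GlobalIdCollector {
        private int docBase;
        final List<Integer> globalIds = new ArrayList<>();

        /** Called once per segment, before any hits of that segment. */
        void setNextReader(int docBase) {
            this.docBase = docBase;
        }

        /** Called per hit with a doc id relative to the current segment. */
        void collect(int segmentDocId) {
            globalIds.add(docBase + segmentDocId);
        }
    }

    /** Walks the segments in order; each segment's base is the sum of the sizes before it. */
    static List<Integer> collectAll(int[] segmentSizes, int[][] hitsPerSegment) {
        GlobalIdCollector collector = new GlobalIdCollector();
        int docBase = 0;
        for (int seg = 0; seg < segmentSizes.length; seg++) {
            collector.setNextReader(docBase);
            for (int doc : hitsPerSegment[seg]) {
                collector.collect(doc);
            }
            docBase += segmentSizes[seg];
        }
        return collector.globalIds;
    }

    public static void main(String[] args) {
        System.out.println(collectAll(new int[] {5, 3}, new int[][] {{0, 4}, {1}}));
    }
}
```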
2166
2167  * LUCENE-1846: DateTools now uses the US locale to format the numbers in its
2168     date/time strings instead of the default locale. For most locales there will
2169     be no change in the index format, as DateFormatSymbols is using ASCII digits.
2170     The usage of the US locale is important to guarantee correct ordering of
2171     generated terms.  (Uwe Schindler)
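
         The effect of pinning the locale can be illustrated with the JDK
         alone. The sketch below is not DateTools itself: the class
         SortableDateSketch, the second-resolution pattern and the GMT time
         zone are assumptions chosen for the example.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical helper producing date strings that sort lexicographically. */
final class SortableDateSketch {
    /** Formats with a fixed locale and time zone so the output never depends on JVM defaults. */
    static String toSortableString(Date date) {
        SimpleDateFormat format = new SimpleDateFormat("yyyyMMddHHmmss", Locale.US);
        format.setTimeZone(TimeZone.getTimeZone("GMT"));
        return format.format(date);
    }

    public static void main(String[] args) {
        System.out.println(toSortableString(new Date(0L)));
    }
}
```

         Because every field is zero-padded ASCII, comparing two such
         strings gives the same order as comparing the dates.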
2172
2173  * LUCENE-1860: MultiTermQuery now defaults to
2174     CONSTANT_SCORE_AUTO_REWRITE_DEFAULT rewrite method (previously it
2175     was SCORING_BOOLEAN_QUERY_REWRITE).  This means that PrefixQuery
2176     and WildcardQuery will now produce constant score for all matching
2177     docs, equal to the boost of the query.  (Mike McCandless)
2178
2179 API Changes
2180
2181  * LUCENE-1419: Add expert API to set custom indexing chain. This API is
2182    package-protected for now, so we don't have to officially support it.
2183    Yet, it will give us the possibility to try out different consumers
2184    in the chain. (Michael Busch)
2185
2186  * LUCENE-1427: DocIdSet.iterator() is now allowed to throw
2187    IOException.  (Paul Elschot, Mike McCandless)
2188
2189  * LUCENE-1422, LUCENE-1693: New TokenStream API that uses a new class called
2190    AttributeSource instead of the Token class, which is now a utility class that
2191    holds common Token attributes. All attributes that the Token class had have
2192    been moved into separate classes: TermAttribute, OffsetAttribute,
2193    PositionIncrementAttribute, PayloadAttribute, TypeAttribute and FlagsAttribute.
2194    The new API is much more flexible; it allows combining the Attributes
2195    arbitrarily and also defining custom Attributes. The new API has the same
2196    performance as the old next(Token) approach. For conformance with this new
2197    API Tee-/SinkTokenizer was deprecated and replaced by a new TeeSinkTokenFilter.
2198    (Michael Busch, Uwe Schindler; additional contributions and bug fixes by
2199    Daniel Shane, Doron Cohen)
2200
2201  * LUCENE-1467: Add nextDoc() and next(int) methods to OpenBitSetIterator.
2202    These methods can be used to avoid additional calls to doc().
2203    (Michael Busch)
2204
2205  * LUCENE-1468: Deprecate Directory.list(), which sometimes (in
2206    FSDirectory) filters out files that don't look like index files, in
2207    favor of new Directory.listAll(), which does no filtering.  Also,
2208    listAll() will never return null; instead, it throws an IOException
2209    (or subclass).  Specifically, FSDirectory.listAll() will throw the
2210    newly added NoSuchDirectoryException if the directory does not
2211    exist.  (Marcel Reutegger, Mike McCandless)
2212
2213  * LUCENE-1546: Add IndexReader.flush(Map commitUserData), allowing
2214    you to record an opaque commitUserData (maps String -> String) into
2215    the commit written by IndexReader.  This matches IndexWriter's
2216    commit methods.  (Jason Rutherglen via Mike McCandless)
2217
2218  * LUCENE-652: Added org.apache.lucene.document.CompressionTools, to
2219    enable compressing & decompressing binary content, external to
2220    Lucene's indexing.  Deprecated Field.Store.COMPRESS.
2221
2222  * LUCENE-1561: Renamed Field.omitTf to Field.omitTermFreqAndPositions
2223     (Otis Gospodnetic via Mike McCandless)
2224
2225  * LUCENE-1500: Added new InvalidTokenOffsetsException to Highlighter methods
2226     to denote issues when offsets in TokenStream tokens exceed the length of the
2227     provided text.  (Mark Harwood)
2228
2229  * LUCENE-1575, LUCENE-1483: HitCollector is now deprecated in favor of
2230     a new Collector abstract class. For easy migration, people can use
2231     HitCollectorWrapper which translates (wraps) HitCollector into
2232     Collector. Note that this class is also deprecated and will be
2233     removed when HitCollector is removed.  Also TimeLimitedCollector
2234     is deprecated in favor of the new TimeLimitingCollector which
2235     extends Collector.  (Shai Erera, Mark Miller, Mike McCandless)
2236
2237  * LUCENE-1592: The method TermEnum.skipTo() was deprecated, because
2238     it is used nowhere in core/contrib and there is only a very inefficient
2239     default implementation available. If you want to position a TermEnum
2240     to another Term, create a new one using IndexReader.terms(Term).
2241     (Uwe Schindler)
2242
2243  * LUCENE-1621: MultiTermQuery.getTerm() has been deprecated as it does
2244     not make sense for all subclasses of MultiTermQuery. Check individual
2245     subclasses to see if they support getTerm().  (Mark Miller)
2246
2247  * LUCENE-1636: Make TokenFilter.input final so it's set only
2248     once. (Wouter Heijke, Uwe Schindler via Mike McCandless).
2249
2250  * LUCENE-1658, LUCENE-1451: Renamed FSDirectory to SimpleFSDirectory
2251     (but left an FSDirectory base class).  Added an FSDirectory.open
2252     static method to pick a good default FSDirectory implementation
2253     given the OS. FSDirectories should now be instantiated using
2254     FSDirectory.open or with public constructors rather than
2255     FSDirectory.getDirectory(), which has been deprecated.
2256     (Michael McCandless, Uwe Schindler, yonik)
2257
2258  * LUCENE-1665: Deprecate SortField.AUTO, to be removed in 3.0.
2259     Instead, when sorting by field, the application should explicitly
2260     state the type of the field.  (Mike McCandless)
2261
2262  * LUCENE-1660: StopFilter, StandardAnalyzer, StopAnalyzer now
2263     require up front specification of enablePositionIncrement (Mike
2264     McCandless)
2265
2266  * LUCENE-1614: DocIdSetIterator's next() and skipTo() were deprecated in favor
2267     of the new nextDoc() and advance(). The new methods return the doc Id they
2268     landed on, saving an extra call to doc() in most cases.
2269     For easy migration of the code, you can change the calls to next() to
2270     nextDoc() != DocIdSetIterator.NO_MORE_DOCS and similarly for skipTo().
2271     However it is advised that you take advantage of the returned doc ID and not
2272     call doc() following those two.
2273     Also, doc() was deprecated in favor of docID(). docID() should return -1 or
2274     NO_MORE_DOCS if nextDoc/advance were not called yet, or NO_MORE_DOCS if the
2275     iterator is exhausted. Otherwise it should return the current doc ID.
2276     (Shai Erera via Mike McCandless)
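
         The contract described above can be sketched with a small iterator
         over a sorted int array. The class SortedIntDocIdIterator is
         hypothetical and does not extend the real DocIdSetIterator; it
         only models docID(), nextDoc() and advance() as the entry
         describes them.

```java
/** Hypothetical iterator over sorted doc ids, modelling the nextDoc()/advance() contract. */
final class SortedIntDocIdIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs; // must be sorted ascending
    private int index = -1;
    private int doc = -1;     // -1 until nextDoc()/advance() is first called

    SortedIntDocIdIterator(int[] sortedDocs) {
        this.docs = sortedDocs;
    }

    /** Current doc id, -1 before iteration starts, NO_MORE_DOCS when exhausted. */
    int docID() {
        return doc;
    }

    /** Moves to the next doc and returns it, so no separate doc lookup is needed. */
    int nextDoc() {
        if (index < docs.length) {
            index++;
        }
        doc = index < docs.length ? docs[index] : NO_MORE_DOCS;
        return doc;
    }

    /** Moves to the first doc at or beyond target (always at least one step forward). */
    int advance(int target) {
        while (nextDoc() < target) {
            // keep stepping; NO_MORE_DOCS is never smaller than target
        }
        return doc;
    }

    public static void main(String[] args) {
        SortedIntDocIdIterator it = new SortedIntDocIdIterator(new int[] {2, 5, 9});
        int d;
        while ((d = it.nextDoc()) != NO_MORE_DOCS) {
            System.out.println(d);
        }
    }
}
```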
2277
2278  * LUCENE-1672: All ctors/opens and other methods using String/File to
2279     specify the directory in IndexReader, IndexWriter, and IndexSearcher
2280     were deprecated. You should instantiate the Directory manually before
2281     and pass it to these classes (LUCENE-1451, LUCENE-1658).
2282     (Uwe Schindler)
2283
2284  * LUCENE-1407: Move RemoteSearchable, RemoteCachingWrapperFilter out
2285     of Lucene's core into new contrib/remote package.  Searchable no
2286     longer extends java.rmi.Remote (Simon Willnauer via Mike
2287     McCandless)
2288
2289  * LUCENE-1677: The global property
2290     org.apache.lucene.SegmentReader.class, and
2291     ReadOnlySegmentReader.class are now deprecated, to be removed in
2292     3.0.  src/gcj/* has been removed. (Earwin Burrfoot via Mike
2293     McCandless)
2294
2295  * LUCENE-1673: Deprecated NumberTools in favour of the new
2296     NumericRangeQuery and its new indexing format for numeric or
2297     date values.  (Uwe Schindler)
2298
2299  * LUCENE-1630, LUCENE-1771: Weight is now an abstract class, and adds
2300     a scorer(IndexReader, boolean /* scoreDocsInOrder */, boolean /*
2301     topScorer */) method instead of scorer(IndexReader). IndexSearcher uses
2302     this method to obtain a scorer matching the capabilities of the Collector
2303     wrt orderedness of docIDs. Some Scorers (like BooleanScorer) are much more
2304     efficient if out-of-order document scoring is allowed by a Collector.
2305     Collector must now implement acceptsDocsOutOfOrder. If you write a
2306     Collector which does not care about doc ID order, it is recommended
2307     that you return true.  Weight has a scoresDocsOutOfOrder method, which by
2308     default returns false.  If you create a Weight which will score documents
2309     out of order if requested, you should override that method to return true.
2310     BooleanQuery's setAllowDocsOutOfOrder and getAllowDocsOutOfOrder have been
2311     deprecated as they are not needed anymore. BooleanQuery will now score docs
2312     out of order when used with a Collector that can accept docs out of order.
2313     Finally, Weight#explain now takes a sub-reader and sub-docID, rather than
2314     a top level reader and docID.
2315     (Shai Erera, Chris Hostetter, Martin Ruckli, Mark Miller via Mike McCandless)
2316
2317  * LUCENE-1466, LUCENE-1906: Added CharFilter and MappingCharFilter, which allow
2318     chaining & mapping of characters before tokenizers run. CharStream (subclass of
2319     Reader) is the base class for custom java.io.Readers that support offset
2320     correction. Tokenizers got an additional method correctOffset() that is passed
2321     down to the underlying CharStream if input is a subclass of CharStream/-Filter.
2322     (Koji Sekiguchi via Mike McCandless, Uwe Schindler)
2323
2324  * LUCENE-1703: Add IndexWriter.waitForMerges.  (Tim Smith via Mike
2325     McCandless)
2326
2327  * LUCENE-1625: CheckIndex's programmatic API now returns separate
2328     classes detailing the status of each component in the index, and
2329     includes more detailed status than previously.  (Tim Smith via
2330     Mike McCandless)
2331
2332  * LUCENE-1713: Deprecated RangeQuery and RangeFilter and renamed to
2333     TermRangeQuery and TermRangeFilter. TermRangeQuery is in constant
2334     score auto rewrite mode by default. The new classes also have new
2335     ctors taking field and term ranges as Strings (see also
2336     LUCENE-1424).  (Uwe Schindler)
2337
2338  * LUCENE-1609: The termInfosIndexDivisor must now be specified
2339     up-front when opening the IndexReader.  Attempts to call
2340     IndexReader.setTermInfosIndexDivisor will hit an
2341     UnsupportedOperationException.  This was done to enable removal of
2342     all synchronization in TermInfosReader, which previously could
2343     cause threads to pile up in certain cases. (Dan Rosher via Mike
2344     McCandless)
2345
2346  * LUCENE-1688: Deprecate static final String stop word array in
2347     StopAnalyzer and replace it with an immutable implementation of
2348     CharArraySet.  (Simon Willnauer via Mark Miller)
2349
2350  * LUCENE-1742: SegmentInfos, SegmentInfo and SegmentReader have been
2351     made public as expert, experimental APIs.  These APIs may suddenly
2352     change from release to release (Jason Rutherglen via Mike
2353     McCandless).
2354
2355  * LUCENE-1754: QueryWeight.scorer() can return null if no documents
2356     are going to be matched by the query. Similarly,
2357     Filter.getDocIdSet() can return null if no documents are going to
2358     be accepted by the Filter. Note that these 'can' return null,
2359     however they don't have to and can return a Scorer/DocIdSet which
2360     does not match / reject all documents.  This is already the
2361     behavior of some QueryWeight/Filter implementations, and is
2362     documented here just for emphasis. (Shai Erera via Mike
2363     McCandless)
2364
2365  * LUCENE-1705: Added IndexWriter.deleteAllDocuments.  (Tim Smith via
2366     Mike McCandless)
2367
2368  * LUCENE-1460: Changed TokenStreams/TokenFilters in contrib to
2369     use the new TokenStream API. (Robert Muir, Michael Busch)
2370
2371  * LUCENE-1748: LUCENE-1001 introduced PayloadSpans, but this was a back
2372     compat break and caused custom SpanQuery implementations to fail at runtime
2373     in a variety of ways. This issue attempts to remedy things by causing
2374     a compile time break on custom SpanQuery implementations and removing
2375     the PayloadSpans class, with its functionality now moved to Spans. To
2376     help in alleviating future back compat pain, Spans has been changed from
2377     an interface to an abstract class.
2378     (Hugh Cayless, Mark Miller)
2379
2380  * LUCENE-1808: Query.createWeight has been changed from protected to
2381     public. (Tim Smith, Shai Erera via Mark Miller)
2382
2383  * LUCENE-1826: Add constructors that take AttributeSource and
2384     AttributeFactory to all Tokenizer implementations.
2385     (Michael Busch)
2386
2387  * LUCENE-1847: Similarity#idf for both a Term and Term Collection have
2388     been deprecated. New versions that return an IDFExplanation have been
2389     added.  (Yasoja Seneviratne, Mike McCandless, Mark Miller)
2390
2391  * LUCENE-1877: Made NativeFSLockFactory the default for
2392     the new FSDirectory API (open(), FSDirectory subclass ctors).
2393     All FSDirectory system properties were deprecated and all lock
2394     implementations use no lock prefix if the locks are stored inside
2395     the index directory. Because the deprecated String/File ctors of
2396     IndexWriter and IndexReader (LUCENE-1672) and FSDirectory.getDirectory()
2397     still use the old SimpleFSLockFactory while the new API uses
2398     NativeFSLockFactory, we strongly recommend not mixing deprecated
2399     and new API. (Uwe Schindler, Mike McCandless)
2400
2401  * LUCENE-1911: Added a new method isCacheable() to DocIdSet. This method
2402     should return true, if the underlying implementation does not use disk
2403     I/O and is fast enough to be directly cached by CachingWrapperFilter.
2404     OpenBitSet, SortedVIntList, and DocIdBitSet are such candidates.
2405     The default implementation of the abstract DocIdSet class returns false.
2406     In this case, CachingWrapperFilter copies the DocIdSetIterator into an
2407     OpenBitSet for caching.  (Uwe Schindler, Thomas Becker)
2408
2409 Bug fixes
2410
2411  * LUCENE-1415: MultiPhraseQuery has incorrect hashCode() and equals()
2412    implementation - Leads to Solr Cache misses.
2413    (Todd Feak, Mark Miller via yonik)
2414
2415  * LUCENE-1327: Fix TermSpans#skipTo() to behave as specified in javadocs
2416    of Spans#skipTo(). (Michael Busch)
2417
2418  * LUCENE-1573: Do not ignore InterruptedException (caused by
2419    Thread.interrupt()) nor enter deadlock/spin loop. Now, an interrupt
2420    will cause a RuntimeException to be thrown.  In 3.0 we will change
2421    public APIs to throw InterruptedException.  (Jeremy Volkman via
2422    Mike McCandless)
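
        One common way to surface an interrupt from code that cannot
        declare InterruptedException is sketched below. The class
        InterruptSketch is hypothetical, and restoring the interrupt flag
        before rethrowing is a general Java idiom assumed here rather than
        something stated in the entry.

```java
/** Hypothetical helper showing an interrupt being rethrown instead of swallowed. */
final class InterruptSketch {
    /** Sleeps, and turns an interrupt into an unchecked exception. */
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            // Restore the flag so callers further up can still observe the interrupt.
            Thread.currentThread().interrupt();
            throw new RuntimeException(ie);
        }
    }

    public static void main(String[] args) {
        pause(1L);
        System.out.println("not interrupted");
    }
}
```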
2423
2424  * LUCENE-1590: Fixed stored-only Field instances so that they do not change the
2425    value of omitNorms, omitTermFreqAndPositions in FieldInfo; when you
2426    retrieve such fields they will now have omitNorms=true and
2427    omitTermFreqAndPositions=false (though these values are unused).
2428    (Uwe Schindler via Mike McCandless)
2429
2430  * LUCENE-1587: RangeQuery#equals() could consider a RangeQuery
2431    without a collator equal to one with a collator.
2432    (Mark Platvoet via Mark Miller)
2433
2434  * LUCENE-1600: Don't call String.intern unnecessarily in some cases
2435    when loading documents from the index.  (P Eger via Mike
2436    McCandless)
2437
2438  * LUCENE-1611: Fix case where OutOfMemoryError in IndexWriter
2439    could cause "infinite merging" to happen.  (Christiaan Fluit via
2440    Mike McCandless)
2441
2442  * LUCENE-1623: Properly handle back-compatibility of 2.3.x indexes that
2443    contain field names with non-ascii characters.  (Mike Streeton via
2444    Mike McCandless)
2445
2446  * LUCENE-1593: MultiSearcher and ParallelMultiSearcher did not break ties (in
2447    sort) by doc Id in a consistent manner (i.e., if Sort.FIELD_DOC was used vs.
2448    when it wasn't). (Shai Erera via Michael McCandless)
2449
2450  * LUCENE-1647: Fix case where IndexReader.undeleteAll would cause
2451     the segment's deletion count to be incorrect. (Mike McCandless)
2452
2453  * LUCENE-1542: When the first token(s) have 0 position increment,
2454     IndexWriter used to incorrectly record the position as -1, if no
2455     payload is present, or Integer.MAX_VALUE if a payload is present.
2456     This causes positional queries to fail to match.  The bug is now
2457     fixed, but if your app relies on the buggy behavior then you must
2458     call IndexWriter.setAllowMinus1Position().  That API is deprecated
2459     so you must fix your application, and rebuild your index, to not
2460     rely on this behavior by the 3.0 release of Lucene. (Jonathan
2461     Mamou, Mark Miller via Mike McCandless)
2462
2463  * LUCENE-1658: Fixed MMapDirectory to correctly throw IOExceptions
2464     on EOF, removed numeric overflow possibilities and added support
2465     for a hack to unmap the buffers on closing IndexInput.
2466     (Uwe Schindler)
2467
2468  * LUCENE-1681: Fix infinite loop caused by a call to DocValues methods
2469     getMinValue, getMaxValue, getAverageValue. (Simon Willnauer via Mark Miller)
2470
2471  * LUCENE-1599: Add clone support for SpanQuerys. SpanRegexQuery counts
2472     on this functionality and does not work correctly without it.
2473     (Billow Gao, Mark Miller)
2474
2475  * LUCENE-1718: Fix termInfosIndexDivisor to carry over to reopened
2476     readers (Mike McCandless)
2477
2478  * LUCENE-1583: SpanOrQuery skipTo() doesn't always move forwards as Spans
2479     documentation indicates it should.  (Moti Nisenson via Mark Miller)
2480
2481  * LUCENE-1566: Sun JVM Bug
2482     http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6478546 causes
2483     invalid OutOfMemoryError when reading too many bytes at once from
2484     a file on 32bit JVMs that have a large maximum heap size.  This
2485     fix adds set/getReadChunkSize to FSDirectory so that large reads
2486     are broken into chunks, to work around this JVM bug.  On 32bit
2487     JVMs the default chunk size is 100 MB; on 64bit JVMs, which don't
2488     show the bug, the default is Integer.MAX_VALUE. (Simon Willnauer
2489     via Mike McCandless)
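
         Breaking one large read into bounded chunks can be sketched as
         below. This is an illustration on in-memory streams, not the
         FSDirectory code; the class ChunkedReadSketch and its method names
         are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper that never asks the stream for more than chunkSize bytes at once. */
final class ChunkedReadSketch {
    /** Fills dest completely, issuing reads of at most chunkSize bytes each. */
    static void readFully(InputStream in, byte[] dest, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be > 0");
        }
        int offset = 0;
        while (offset < dest.length) {
            int toRead = Math.min(chunkSize, dest.length - offset);
            int n = in.read(dest, offset, toRead);
            if (n < 0) {
                throw new EOFException("read past EOF");
            }
            offset += n;
        }
    }

    /** Convenience wrapper for in-memory data; wraps the checked exception. */
    static byte[] copyInChunks(byte[] source, int chunkSize) {
        byte[] dest = new byte[source.length];
        try {
            readFully(new ByteArrayInputStream(source), dest, chunkSize);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return dest;
    }

    public static void main(String[] args) {
        System.out.println(copyInChunks(new byte[] {1, 2, 3, 4, 5}, 2).length);
    }
}
```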
2490
2491  * LUCENE-1448: Added TokenStream.end() to perform end-of-stream
2492     operations (i.e., to return the end offset of the tokenization).
2493     This is important when multiple fields with the same name are added
2494     to a document, to ensure offsets recorded in term vectors for all
2495     of the instances are correct.
2496     (Mike McCandless, Mark Miller, Michael Busch)
2497
2498  * LUCENE-1805: CloseableThreadLocal did not allow a null Object in get(),
2499     although it does allow it in set(Object). Fix get() to not assert the object
2500     is not null. (Shai Erera via Mike McCandless)
2501
2502  * LUCENE-1801: Changed all Tokenizers or TokenStreams (in core/contrib)
2503     that are the source of Tokens to always call
2504     AttributeSource.clearAttributes() first. (Uwe Schindler)
2505
2506  * LUCENE-1819: MatchAllDocsQuery.toString(field) should produce output
2507     that is parsable by the QueryParser.  (John Wang, Mark Miller)
2508
2509  * LUCENE-1836: Fix localization bug in the new query parser and add
2510     new LocalizedTestCase as base class for localization junit tests.
2511     (Robert Muir, Uwe Schindler via Michael Busch)
2512
2513  * LUCENE-1847: PhraseQuery/TermQuery/SpanQuery use IndexReader specific stats
2514     in their Weight#explain methods - these stats should be corpus wide.
2515     (Yasoja Seneviratne, Mike McCandless, Mark Miller)
2516
2517  * LUCENE-1885: Fix the bug that NativeFSLock.isLocked() did not work,
2518     if the lock was obtained by another NativeFSLock(Factory) instance.
2519     Because of this IndexReader.isLocked() and IndexWriter.isLocked() did
2520     not work correctly.  (Uwe Schindler)
2521
2522  * LUCENE-1899: Fix O(N^2) CPU cost when setting docIDs in order in an
2523     OpenBitSet, due to an inefficiency in how the underlying storage is
2524     reallocated.  (Nadav Har'El via Mike McCandless)
2525
2526  * LUCENE-1918: Fixed cases where a ParallelReader would
2527    generate exceptions on being passed to
2528    IndexWriter.addIndexes(IndexReader[]).  First case was when the
2529    ParallelReader was empty.  Second case was when the ParallelReader
2530    used to contain documents with TermVectors, but all such documents
2531    have been deleted. (Christian Kohlschütter via Mike McCandless)
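
Example for LUCENE-1566 (illustrative sketch, not the FSDirectory
code): the workaround amounts to capping how many bytes a single
read call may request. The class and method names below are
hypothetical, and a plain InputStream stands in for the index file.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Illustrative only: reads a large range in bounded chunks. */
final class ChunkedReader {
    /**
     * Fills dest[offset, offset + length) from the stream, never asking for
     * more than chunkSize bytes in one read() call.
     * Returns the number of read() calls that were issued.
     */
    static int readFully(InputStream in, byte[] dest, int offset, int length, int chunkSize)
            throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be > 0");
        }
        int calls = 0;
        int done = 0;
        while (done < length) {
            // Cap the request so one call never exceeds the chunk size.
            int toRead = Math.min(chunkSize, length - done);
            int n = in.read(dest, offset + done, toRead);
            calls++;
            if (n < 0) {
                throw new EOFException("read past EOF");
            }
            done += n;
        }
        return calls;
    }
}
```

With a 10-byte source and chunkSize 4, the loop issues three reads
(4 + 4 + 2 bytes), assuming the stream returns every byte requested.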
2532
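Example for LUCENE-1899 (illustrative sketch, not the OpenBitSet
code): growing the backing array to exactly the size needed copies
every existing word on each growth step, which is quadratic when
bits are set in increasing order. Oversizing the array avoids this.
The class name and the doubling policy below are assumptions chosen
for illustration.

```java
import java.util.Arrays;

/** Illustrative bit set that counts how many words each growth policy copies. */
final class GrowableBits {
    private long[] words = new long[1];
    private final boolean oversize;
    long wordsCopied = 0;

    GrowableBits(boolean oversize) {
        this.oversize = oversize;
    }

    void set(int index) {
        int word = index >>> 6;
        if (word >= words.length) {
            int needed = word + 1;
            // Exact fit grows by one word at a time; oversizing at least doubles.
            int newLength = oversize ? Math.max(needed, words.length * 2) : needed;
            wordsCopied += words.length;
            words = Arrays.copyOf(words, newLength);
        }
        words[word] |= 1L << (index & 63);
    }

    boolean get(int index) {
        int word = index >>> 6;
        return word < words.length && (words[word] & (1L << (index & 63))) != 0;
    }
}
```

Setting one bit in each of 1000 consecutive words copies 499500
words with exact-fit growth and 1023 words with doubling.
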
2533 New features
2534
2535  * LUCENE-1411: Added expert API to open an IndexWriter on a prior
2536     commit, obtained from IndexReader.listCommits.  This makes it
2537     possible to rollback changes to an index even after you've closed
2538     the IndexWriter that made the changes, assuming you are using an
2539     IndexDeletionPolicy that keeps past commits around.  This is useful
2540     when building transactional support on top of Lucene.  (Mike
2541     McCandless)
2542
2543  * LUCENE-1382: Add an optional arbitrary Map (String -> String)
2544     "commitUserData" to IndexWriter.commit(), which is stored in the
2545     segments file and is then retrievable via
2546     IndexReader.getCommitUserData instance and static methods.
2547     (Shalin Shekhar Mangar via Mike McCandless)
2548
2549  * LUCENE-1420: Similarity now has a computeNorm method that allows
2550     custom Similarity classes to override how norm is computed.  It's
2551     provided a FieldInvertState instance that contains details from
2552     inverting the field.  The default impl is boost *
2553     lengthNorm(numTerms), to be backwards compatible.  Also added
2554     {set/get}DiscountOverlaps to DefaultSimilarity, to control whether
2555     overlapping tokens (tokens with 0 position increment) should be
2556     counted in lengthNorm.  (Andrzej Bialecki via Mike McCandless)
2557
2558  * LUCENE-1424: Moved constant score query rewrite capability into
2559     MultiTermQuery, allowing TermRangeQuery, PrefixQuery and WildcardQuery
2560     to switch between constant-score rewriting or BooleanQuery
2561     expansion rewriting via a new setRewriteMethod method.
2562     Deprecated ConstantScoreRangeQuery (Mark Miller via Mike
2563     McCandless)
2564
2565  * LUCENE-1461: Added FieldCacheRangeFilter, a RangeFilter for
2566     single-term fields that uses FieldCache to compute the filter.  If
2567     your documents all have a single term for a given field, and you
2568     need to create many RangeFilters with varying lower/upper bounds,
2569     then this is likely a much faster way to create the filters than
2570     RangeFilter.  FieldCacheRangeFilter allows ranges on all data types
2571     that FieldCache supports (term ranges, byte, short, int, long, float,
2572     double).  However, it comes at the expense of added RAM consumption and
2573     slower first-time usage due to populating the FieldCache.  It also does
2574     not support collation.  (Tim Sturge, Matt Ericson via Mike McCandless and
2575     Uwe Schindler)
2576
2577  * LUCENE-1296: add protected method CachingWrapperFilter.docIdSetToCache
2578     to allow subclasses to choose which DocIdSet implementation to use
2579     (Paul Elschot via Mike McCandless)
2580
2581  * LUCENE-1390: Added ASCIIFoldingFilter, a Filter that converts
2582     alphabetic, numeric, and symbolic Unicode characters which are not in
2583     the first 127 ASCII characters (the "Basic Latin" Unicode block) into
2584     their ASCII equivalents, if one exists. ISOLatin1AccentFilter, which
2585     handles a subset of this filter, has been deprecated.
2586     (Andi Vajda, Steven Rowe via Mark Miller)
2587
2588  * LUCENE-1478: Added new SortField constructor allowing you to
2589     specify a custom FieldCache parser to generate numeric values from
2590     terms for a field.  (Uwe Schindler via Mike McCandless)
2591
2592  * LUCENE-1528: Add support for Ideographic Space to the queryparser.
2593     (Luis Alves via Michael Busch)
2594
2595  * LUCENE-1487: Added FieldCacheTermsFilter, to filter by multiple
2596     terms on single-valued fields.  The filter loads the FieldCache
2597     for the field the first time it's called, and subsequent usage of
2598     that field, even with different Terms in the filter, is fast.
2599     (Tim Sturge, Shalin Shekhar Mangar via Mike McCandless).
2600
2601  * LUCENE-1314: Add clone(), clone(boolean readOnly) and
2602     reopen(boolean readOnly) to IndexReader.  Cloning an IndexReader
2603     gives you a new reader which you can make changes to (deletions,
2604     norms) without affecting the original reader.  Now, with clone or
2605     reopen you can change the readOnly of the original reader.  (Jason
2606     Rutherglen, Mike McCandless)
2607
2608  * LUCENE-1506: Added FilteredDocIdSet, an abstract class which you
2609     subclass to implement the "match" method to accept or reject each
2610     docID.  Unlike ChainedFilter (under contrib/misc),
2611     FilteredDocIdSet never requires you to materialize the full
2612     bitset.  Instead, match() is called on demand per docID.  (John
2613     Wang via Mike McCandless)
2614
2615  * LUCENE-1398: Add ReverseStringFilter to contrib/analyzers, a filter
2616     to reverse the characters in each token.  (Koji Sekiguchi via yonik)
2617
2618  * LUCENE-1551: Add expert IndexReader.reopen(IndexCommit) to allow
2619     efficiently opening a new reader on a specific commit, sharing
2620     resources with the original reader.  (Torin Danil via Mike
2621     McCandless)
2622
2623  * LUCENE-1434: Added org.apache.lucene.util.IndexableBinaryStringTools,
2624     to encode byte[] as String values that are valid terms, and
2625     maintain sort order of the original byte[] when the bytes are
2626     interpreted as unsigned.  (Steven Rowe via Mike McCandless)
2627
2628  * LUCENE-1543: Allow MatchAllDocsQuery to optionally use norms from
2629     a specific field to set the score for a document.  (Karl Wettin
2630     via Mike McCandless)
2631
2632  * LUCENE-1586: Add IndexReader.getUniqueTermCount().  (Mike
2633     McCandless via Derek)
2634
2635  * LUCENE-1516: Added "near real-time search" to IndexWriter, via a
2636     new expert getReader() method.  This method returns a reader that
2637     searches the full index, including any uncommitted changes in the
2638     current IndexWriter session.  This should result in a faster
2639     turnaround than the normal approach of committing the changes and
2640     then reopening a reader.  (Jason Rutherglen via Mike McCandless)
2641
2642  * LUCENE-1603: Added new MultiTermQueryWrapperFilter, to wrap any
2643     MultiTermQuery as a Filter.  Also made some improvements to
2644     MultiTermQuery: return DocIdSet.EMPTY_DOCIDSET if there are no
2645     terms in the enum; track the total number of terms it visited
2646     during rewrite (getTotalNumberOfTerms).  FilteredTermEnum is also
2647     more friendly to subclassing.  (Uwe Schindler via Mike McCandless)
2648
2649  * LUCENE-1605: Added BitVector.subset().  (Jeremy Volkman via Mike
2650     McCandless)
2651
2652  * LUCENE-1618: Added FileSwitchDirectory that enables files with
2653     specified extensions to be stored in a primary directory and the
2654     rest of the files to be stored in the secondary directory.  For
2655     example, this can be useful for the large doc-store (stored
2656     fields, term vectors) files in FSDirectory and the rest of the
2657     index files in a RAMDirectory. (Jason Rutherglen via Mike
2658     McCandless)
2659
2660  * LUCENE-1494: Added FieldMaskingSpanQuery which can be used to
2661     cross-correlate Spans from different fields.
2662     (Paul Cowan and Chris Hostetter)
2663
2664  * LUCENE-1634: Add calibrateSizeByDeletes to LogMergePolicy, to take
2665     deletions into account when considering merges.  (Yasuhiro Matsuda
2666     via Mike McCandless)
2667
2668  * LUCENE-1550: Added new n-gram based String distance measure for spell checking.
2669     See the Javadocs for NGramDistance.java for a reference paper on why
2670     this is helpful (Tom Morton via Grant Ingersoll)
2671
2672  * LUCENE-1470, LUCENE-1582, LUCENE-1602, LUCENE-1673, LUCENE-1701, LUCENE-1712:
2673     Added NumericRangeQuery and NumericRangeFilter, a fast alternative to
2674     RangeQuery/RangeFilter for numeric searches. They depend on a specific
2675     structure of terms in the index that can be created by indexing
2676     using the new NumericField or NumericTokenStream classes. NumericField
2677     can only be used for indexing and optionally stores the values as
2678     string representation in the doc store. Documents returned from
2679     IndexReader/IndexSearcher will return only the String value using
2680     the standard Fieldable interface. NumericFields can be sorted on
2681     and loaded into the FieldCache.  (Uwe Schindler, Yonik Seeley,
2682     Mike McCandless)
2683
2684  * LUCENE-1405: Added support for Ant resource collections in contrib/ant
2685     <index> task.  (Przemyslaw Sztoch via Erik Hatcher)
2686
2687  * LUCENE-1699: Allow setting a TokenStream on Field/Fieldable for indexing
2688     in conjunction with any other ways to specify stored field values,
2689     currently binary or string values.  (yonik)
2690
2691  * LUCENE-1701: Made the standard FieldCache.Parsers public and added
2692     parsers for fields generated using NumericField/NumericTokenStream.
2693     All standard parsers now also implement Serializable and enforce
2694     their singleton status.  (Uwe Schindler, Mike McCandless)
2695
2696  * LUCENE-1741: User configurable maximum chunk size in MMapDirectory.
2697     On 32 bit platforms, the address space can be very fragmented, so
2698     one big ByteBuffer for the whole file may not fit into address space.
2699     (Eks Dev via Uwe Schindler)
2700
2701  * LUCENE-1644: Enable 4 rewrite modes for queries deriving from
2702     MultiTermQuery (WildcardQuery, PrefixQuery, TermRangeQuery,
2703     NumericRangeQuery): CONSTANT_SCORE_FILTER_REWRITE first creates a
2704     filter and then assigns constant score (boost) to docs;
2705     CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE creates a BooleanQuery but
2706     uses a constant score (boost); SCORING_BOOLEAN_QUERY_REWRITE also
2707     creates a BooleanQuery but keeps the BooleanQuery's scores;
2708     CONSTANT_SCORE_AUTO_REWRITE tries to pick the most performant
2709     constant-score rewrite method.  (Mike McCandless)
2710
2711  * LUCENE-1448: Added TokenStream.end(), to perform end-of-stream
2712     operations.  This is currently used to fix offset problems when
2713     multiple fields with the same name are added to a document.
2714     (Mike McCandless, Mark Miller, Michael Busch)
2715
2716  * LUCENE-1776: Add an option to not collect payloads for an ordered
2717     SpanNearQuery. Payloads were not lazily loaded in this case as
2718     the javadocs implied. If you have payloads and want to use an ordered
2719     SpanNearQuery that does not need to use the payloads, you can
2720     disable loading them with a new constructor switch.  (Mark Miller)
2721
2722  * LUCENE-1341: Added PayloadNearQuery to enable SpanNearQuery functionality
2723     with payloads (Peter Keegan, Grant Ingersoll, Mark Miller)
2724
2725  * LUCENE-1790: Added PayloadTermQuery to enable scoring of payloads
2726     based on the maximum payload seen for a document.
2727     Slight refactoring of Similarity and other payload queries (Grant Ingersoll, Mark Miller)
2728
2729  * LUCENE-1749: Addition of FieldCacheSanityChecker utility, and
2730     hooks to use it in all existing Lucene Tests.  This class can
2731     be used by any application to inspect the FieldCache and provide
2732     diagnostic information about the possibility of inconsistent
2733     FieldCache usage.  Namely: FieldCache entries for the same field
2734     with different datatypes or parsers; and FieldCache entries for
2735     the same field in both a reader, and one of its (descendant) sub
2736     readers.
2737     (Chris Hostetter, Mark Miller)
2738
2739  * LUCENE-1789: Added utility class
2740     oal.search.function.MultiValueSource to ease the transition to
2741     segment based searching for any apps that directly call
2742     oal.search.function.* APIs.  This class wraps any other
2743     ValueSource, but takes care when composite (multi-segment) readers are
2744     passed to not double RAM usage in the FieldCache.  (Chris
2745     Hostetter, Mark Miller, Mike McCandless)
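
Example for LUCENE-1420 (illustrative sketch, not the Similarity
source): the default norm is boost * lengthNorm(numTerms), and
discounting overlaps removes tokens with position increment 0 from
the count. The sketch assumes a 1/sqrt(numTerms) length norm; the
class name and the plain int/float parameters, which stand in for
the values carried by FieldInvertState, are hypothetical.

```java
/** Illustrative norm computation under the stated assumptions. */
final class NormSketch {
    /** Assumed length norm: shorter fields get a larger norm. */
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /**
     * boost * lengthNorm(numTerms). When discountOverlaps is true, tokens
     * that share a position with the previous token are not counted.
     */
    static float computeNorm(float boost, int length, int numOverlap, boolean discountOverlaps) {
        int numTerms = discountOverlaps ? length - numOverlap : length;
        return boost * lengthNorm(numTerms);
    }
}
```

For a field of 20 tokens of which 4 overlap, discounting yields
lengthNorm(16) = 0.25, while counting all 20 tokens gives a
smaller norm.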
2746
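Example for LUCENE-1506 (illustrative sketch, not the
FilteredDocIdSet source): the idea is to wrap an existing sequence
of docIDs and call match() only as the consumer advances, so no
full bitset is ever built. The class below is a hypothetical
stand-in that uses java.util iterators instead of Lucene types.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Illustrative lazy filter: subclasses decide per id via match(). */
abstract class LazyFilteredIds implements Iterable<Integer> {
    private final Iterable<Integer> inner;

    LazyFilteredIds(Iterable<Integer> inner) {
        this.inner = inner;
    }

    /** Returns true to accept the id, false to skip it. */
    protected abstract boolean match(int docId);

    @Override
    public Iterator<Integer> iterator() {
        final Iterator<Integer> it = inner.iterator();
        return new Iterator<Integer>() {
            private Integer pending; // next accepted id, or null if not yet found

            @Override
            public boolean hasNext() {
                // Pull from the inner sequence only until one id is accepted.
                while (pending == null && it.hasNext()) {
                    int candidate = it.next();
                    if (match(candidate)) {
                        pending = candidate;
                    }
                }
                return pending != null;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                Integer result = pending;
                pending = null;
                return result;
            }
        };
    }
}
```

Fetching only the first accepted id from the sequence 1, 2, 3, 4, 6
with an "even ids" match() calls match() twice; the remaining ids
are not examined until the consumer asks for more.
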
2747 Optimizations
2748
2749  * LUCENE-1427: Fixed QueryWrapperFilter to not waste time computing
2750     scores of the query, since they are just discarded.  Also, made it
2751     more efficient (single pass) by not creating & populating an
2752     intermediate OpenBitSet (Paul Elschot, Mike McCandless)
2753
2754  * LUCENE-1443: Performance improvement for OpenBitSetDISI.inPlaceAnd()
2755     (Paul Elschot via yonik)
2756
2757  * LUCENE-1484: Remove synchronization of IndexReader.document() by
2758     using CloseableThreadLocal internally.  (Jason Rutherglen via Mike
2759     McCandless).
2760
2761  * LUCENE-1124: Short circuit FuzzyQuery.rewrite when input token length
2762     is small compared to minSimilarity. (Timo Nentwig, Mark Miller)
2763
2764  * LUCENE-1316: MatchAllDocsQuery now avoids the synchronized
2765     IndexReader.isDeleted() call per document, by directly accessing
2766     the underlying deleteDocs BitVector.  This improves performance
2767     with non-readOnly readers, especially in a multi-threaded
2768     environment.  (Todd Feak, Yonik Seeley, Jason Rutherglen via Mike
2769     McCandless)
2770
2771  * LUCENE-1483: When searching over multiple segments we now visit
2772     each sub-reader one at a time.  This speeds up warming, since
2773     FieldCache entries (if required) can be shared across reopens for
2774     those segments that did not change, and also speeds up searches
2775     that sort by relevance or by field values.  (Mark Miller, Mike
2776     McCandless)
2777
2778  * LUCENE-1575: The new Collector class decouples collect() from
2779     score computation.  Collector.setScorer is called to establish the
2780     current Scorer in-use per segment.  Collectors that require the
2781     score should then call Scorer.score() per hit inside
2782     collect(). (Shai Erera via Mike McCandless)
2783
2784  * LUCENE-1596: MultiTermDocs speedup when set with
2785     MultiTermDocs.seek(MultiTermEnum) (yonik)
2786
2787  * LUCENE-1653: Avoid creating a Calendar in every call to
2788     DateTools#dateToString, DateTools#timeToString and
2789     DateTools#round.  (Shai Erera via Mark Miller)
2790
2791  * LUCENE-1688: Deprecate static final String stop word array and
2792     replace it with an immutable implementation of CharArraySet.
2793     Removes conversions between Set and array.
2794     (Simon Willnauer via Mark Miller)
2795
2796  * LUCENE-1754: BooleanQuery.queryWeight.scorer() will return null if
2797     it won't match any documents (e.g. if there are no required and
2798     optional scorers, or not enough optional scorers to satisfy
2799     minShouldMatch).  (Shai Erera via Mike McCandless)
2800
2801  * LUCENE-1607: To speed up string interning for commonly used
2802     strings, the StringHelper.intern() interface was added with a
2803     default implementation that uses a lockless cache.
2804     (Earwin Burrfoot, yonik)
2805
2806  * LUCENE-1800: QueryParser should use reusable TokenStreams. (yonik)
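
Example for LUCENE-1575 (illustrative sketch, not the Collector
source): because collect() receives only the docID, a collector
that does not need scores never triggers score computation, while
one that does need them calls score() itself. All types below are
hypothetical miniatures of the contract described in the entry.

```java
/** Hypothetical stand-in for a scorer: computes the current hit's score on demand. */
interface SketchScorer {
    float score();
}

/** Hypothetical stand-in for the collector contract. */
abstract class SketchCollector {
    protected SketchScorer scorer;

    void setScorer(SketchScorer scorer) {
        this.scorer = scorer;
    }

    abstract void collect(int doc);
}

/** Counts hits; never asks for a score, so none is computed. */
final class CountingCollector extends SketchCollector {
    int count;

    @Override
    void collect(int doc) {
        count++;
    }
}

/** Tracks the best score; pays for score() once per hit. */
final class MaxScoreCollector extends SketchCollector {
    float max = Float.NEGATIVE_INFINITY;

    @Override
    void collect(int doc) {
        max = Math.max(max, scorer.score());
    }
}

final class CollectorDemo {
    /** Feeds docs 0..scores.length-1 to the collector; returns how often score() ran. */
    static int run(SketchCollector collector, final float[] scores) {
        final int[] current = {0};
        final int[] calls = {0};
        collector.setScorer(new SketchScorer() {
            public float score() {
                calls[0]++;
                return scores[current[0]];
            }
        });
        for (int doc = 0; doc < scores.length; doc++) {
            current[0] = doc;
            collector.collect(doc);
        }
        return calls[0];
    }
}
```

Over three hits, CountingCollector causes zero score() calls and
MaxScoreCollector causes three.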
2807
2808
2809 Documentation
2810
2811  * LUCENE-1908: Scoring documentation improvements in Similarity javadocs.
2812    (Mark Miller, Shai Erera, Ted Dunning, Jiri Kuhn, Marvin Humphrey, Doron Cohen)
2813
2814  * LUCENE-1872: NumericField javadoc improvements
2815     (Michael McCandless, Uwe Schindler)
2816
2817  * LUCENE-1875: Make TokenStream.end javadoc less confusing.
2818     (Uwe Schindler)
2819
2820  * LUCENE-1862: Rectified duplicate package level javadocs for
2821     o.a.l.queryParser and o.a.l.analysis.cn.
2822     (Chris Hostetter)
2823
2824  * LUCENE-1886: Improved hyperlinking in key Analysis javadocs
2825     (Bernd Fondermann via Chris Hostetter)
2826
2827  * LUCENE-1884: massive javadoc and comment cleanup, primarily dealing with
2828     typos.
2829     (Robert Muir via Chris Hostetter)
2830
2831  * LUCENE-1898: Switch changes to use bullets rather than numbers and
2832     update changes-to-html script to handle the new format.
2833     (Steven Rowe, Mark Miller)
2834
2835  * LUCENE-1900: Improve Searchable Javadoc.
2836     (Nadav Har'El, Doron Cohen, Marvin Humphrey, Mark Miller)
2837
2838  * LUCENE-1896: Improve Similarity#queryNorm javadocs.
2839     (Jiri Kuhn, Mark Miller)
2840
2841 Build
2842
2843  * LUCENE-1440: Add new targets to build.xml that allow downloading
2844     and executing the junit testcases from an older release for
2845     backwards-compatibility testing. (Michael Busch)
2846
2847  * LUCENE-1446: Add compatibility tag to common-build.xml and run
2848     backwards-compatibility tests in the nightly build. (Michael Busch)
2849
2850  * LUCENE-1529: Properly test "drop-in" replacement of jar with
2851     backwards-compatibility tests. (Mike McCandless, Michael Busch)
2852
2853  * LUCENE-1851: Change 'javacc' and 'clean-javacc' targets to build
2854     and clean contrib/surround files. (Luis Alves via Michael Busch)
2855
2856  * LUCENE-1854: tar task should use longfile="gnu" to avoid false file
2857     name length warnings.  (Mark Miller)
2858
2859 Test Cases
2860
2861  * LUCENE-1791: Enhancements to the QueryUtils and CheckHits utility
2862     classes to wrap IndexReaders and Searchers in MultiReaders or
2863     MultiSearcher when possible to help exercise more edge cases.
2864     (Chris Hostetter, Mark Miller)
2865
2866  * LUCENE-1852: Fix localization test failures.
2867     (Robert Muir via Michael Busch)
2868
2869  * LUCENE-1843: Refactored all tests that use assertAnalyzesTo() & others
2870     in core and contrib to use a new BaseTokenStreamTestCase
2871     base class. Also rewrote some tests to use these general analysis assert
2872     functions instead of their own (e.g. TestMappingCharFilter).
2873     The new base class also tests tokenization with the TokenStream.next()
2874     backwards layer enabled (using Token/TokenWrapper as attribute
2875     implementation) and disabled (default for Lucene 3.0)
2876     (Uwe Schindler, Robert Muir)
2877
2878  * LUCENE-1836: Added a new LocalizedTestCase as base class for localization
2879     junit tests.  (Robert Muir, Uwe Schindler via Michael Busch)
2880
2881 ======================= Release 2.4.1 =======================
2882
2883 API Changes
2884
2885 1. LUCENE-1186: Add Analyzer.close() to free internal ThreadLocal
2886    resources.  (Christian Kohlschütter via Mike McCandless)
2887
2888 Bug fixes
2889
2890 1. LUCENE-1452: Fixed silent data-loss case whereby binary fields are
2891    truncated to 0 bytes during merging if the segments being merged
2892    are non-congruent (same field name maps to different field
2893    numbers).  This bug was introduced with LUCENE-1219.  (Andrzej
2894    Bialecki via Mike McCandless).
2895
2896 2. LUCENE-1429: Don't throw incorrect IllegalStateException from
2897    IndexWriter.close() if you've hit an OOM when autoCommit is true.
2898    (Mike McCandless)
2899
2900 3. LUCENE-1474: If IndexReader.flush() is called twice when there were
2901    pending deletions, it could lead to later false AssertionError
2902    during IndexReader.open.  (Mike McCandless)
2903
2904 4. LUCENE-1430: Fix false AlreadyClosedException from IndexReader.open
2905    (masking an actual IOException) that takes String or File path.
2906    (Mike McCandless)
2907
2908 5. LUCENE-1442: Multiple-valued NOT_ANALYZED fields can double-count
2909    token offsets.  (Mike McCandless)
2910
2911 6. LUCENE-1453: Ensure IndexReader.reopen()/clone() does not result in
2912    incorrectly closing the shared FSDirectory. This bug would only
2913    happen if you use IndexReader.open() with a File or String argument.
2914    The returned readers are wrapped by a FilterIndexReader that
2915    correctly handles closing of directory after reopen()/clone().
2916    (Mark Miller, Uwe Schindler, Mike McCandless)
2917
2918 7. LUCENE-1457: Fix possible overflow bugs during binary
2919    searches. (Mark Miller via Mike McCandless)
2920
2921 8. LUCENE-1459: Fix CachingWrapperFilter to not throw exception if
2922    both bits() and getDocIdSet() methods are called. (Matt Jones via
2923    Mike McCandless)
2924
2925 9. LUCENE-1519: Fix int overflow bug during segment merging.  (Deepak
2926    via Mike McCandless)
2927
2928 10. LUCENE-1521: Fix int overflow bug when flushing segment.
2929     (Shon Vella via Mike McCandless).
2930
2931 11. LUCENE-1544: Fix deadlock in IndexWriter.addIndexes(IndexReader[]).
2932     (Mike McCandless via Doug Sale)
2933
2934 12. LUCENE-1547: Fix rare thread safety issue if two threads call
2935     IndexWriter commit() at the same time.  (Mike McCandless)
2936
2937 13. LUCENE-1465: NearSpansOrdered returns payloads from first possible match
2938     rather than the correct, shortest match; Payloads could be returned even
2939     if the max slop was exceeded; The wrong payload could be returned in
2940     certain situations. (Jonathan Mamou, Greg Shackles, Mark Miller)
2941
2942 14. LUCENE-1186: Add Analyzer.close() to free internal ThreadLocal
2943     resources.  (Christian Kohlschütter via Mike McCandless)
2944
2945 15. LUCENE-1552: Fix IndexWriter.addIndexes(IndexReader[]) to properly
2946     rollback IndexWriter's internal state on hitting an
2947     exception. (Scott Garland via Mike McCandless)
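
Example for LUCENE-1457 (illustrative sketch, not the patched
code): the usual overflow in a binary search is the midpoint
computation, where low + high can exceed Integer.MAX_VALUE. One
standard remedy is an unsigned right shift; whether the fix used
exactly this form is an assumption, and the class name is
hypothetical.

```java
/** Illustrative midpoint computations for a binary search over int indexes. */
final class SafeMidpoint {
    /** Wraps to a negative value once low + high exceeds Integer.MAX_VALUE. */
    static int naiveMid(int low, int high) {
        return (low + high) / 2;
    }

    /** The unsigned shift reads the wrapped sum as a 32-bit unsigned value. */
    static int safeMid(int low, int high) {
        return (low + high) >>> 1;
    }
}
```

For low = 1500000000 and high = 2000000000, naiveMid returns a
negative index while safeMid returns 1750000000.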
2948
2949 ======================= Release 2.4.0 =======================
2950
2951 Changes in backwards compatibility policy
2952
2953 1. LUCENE-1340: In a minor change to Lucene's backward compatibility
2954    policy, we are now allowing the Fieldable interface to have
2955    changes, within reason, and made on a case-by-case basis.  If an
2956    application implements its own Fieldable, please be aware of
2957    this.  Otherwise, no need to be concerned.  This is in effect for
2958    all 2.X releases, starting with 2.4.  Also note, that in all
2959    likelihood, Fieldable will be changed in 3.0.
2960
2961
2962 Changes in runtime behavior
2963
2964  1. LUCENE-1151: Fix StandardAnalyzer to not mis-identify host names
2965     (eg lucene.apache.org) as an ACRONYM.  To get back to the pre-2.4
2966     backwards compatible, but buggy, behavior, you can either call
2967     StandardAnalyzer.setDefaultReplaceInvalidAcronym(false) (static
2968     method), or, set system property
2969     org.apache.lucene.analysis.standard.StandardAnalyzer.replaceInvalidAcronym
2970     to "false" on JVM startup.  All StandardAnalyzer instances created
2971     after that will then show the pre-2.4 behavior.  Alternatively,
2972     you can call setReplaceInvalidAcronym(false) to change the
2973     behavior per instance of StandardAnalyzer.  This backwards
2974     compatibility will be removed in 3.0 (hardwiring the value to
2975     true).  (Mike McCandless)
2976
2977  2. LUCENE-1044: IndexWriter with autoCommit=true now commits (such
2978     that a reader can see the changes) far less often than it used to.
2979     Previously, every flush was also a commit.  You can always force a
2980     commit by calling IndexWriter.commit().  Furthermore, in 3.0,
2981     autoCommit will be hardwired to false (IndexWriter constructors
2982     that take an autoCommit argument have been deprecated) (Mike
2983     McCandless)
2984
2985  3. LUCENE-1335: IndexWriter.addIndexes(Directory[]) and
2986     addIndexesNoOptimize no longer allow the same Directory instance
2987     to be passed in more than once.  Internally, IndexWriter uses
2988     Directory and segment name to uniquely identify segments, so
2989     adding the same Directory more than once was causing duplicates
2990     which led to problems (Mike McCandless)
2991
2992  4. LUCENE-1396: Improve PhraseQuery.toString() so that gaps in the
2993     positions are indicated with a ? and multiple terms at the same
2994     position are joined with a |.  (Andrzej Bialecki via Mike
2995     McCandless)
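
Example for LUCENE-1396 (illustrative sketch, not the PhraseQuery
source): the rendering rule can be pictured as one slot per
position, with empty slots shown as ? and terms that share a slot
joined with |. The class name, the surrounding quotes and the
single-space separator are assumptions made for illustration.

```java
/** Illustrative rendering of phrase terms by position. */
final class PhraseRendering {
    static String render(String[] terms, int[] positions) {
        int maxPosition = -1;
        for (int p : positions) {
            maxPosition = Math.max(maxPosition, p);
        }
        // One slot per position; terms at the same position share a slot.
        String[] pieces = new String[maxPosition + 1];
        for (int i = 0; i < terms.length; i++) {
            int pos = positions[i];
            pieces[pos] = (pieces[pos] == null) ? terms[i] : pieces[pos] + "|" + terms[i];
        }
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < pieces.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // A slot that no term occupies is a gap.
            sb.append(pieces[i] == null ? "?" : pieces[i]);
        }
        return sb.append('"').toString();
    }
}
```

Terms a, b, c at positions 0, 2, 2 render as "a ? b|c" under these
assumptions.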
2996
2997 API Changes
2998
2999  1. LUCENE-1084: Changed all IndexWriter constructors to take an
3000     explicit parameter for maximum field size.  Deprecated all the
3001     pre-existing constructors; these will be removed in release 3.0.
3002     NOTE: these new constructors set autoCommit to false.  (Steven
3003     Rowe via Mike McCandless)
3004
3005  2. LUCENE-584: Changed Filter API to return a DocIdSet instead of a
3006     java.util.BitSet. This allows using more efficient data structures
3007     for Filters and makes them more flexible. This deprecates
3008     Filter.bits(), so all filters that implement this outside
3009     the Lucene code base will need to be adapted. See also the javadocs
3010     of the Filter class. (Paul Elschot, Michael Busch)
3011
3012  3. LUCENE-1044: Added IndexWriter.commit() which flushes any buffered
3013     adds/deletes and then commits a new segments file so readers will
3014     see the changes.  Deprecate IndexWriter.flush() in favor of
3015     IndexWriter.commit().  (Mike McCandless)
3016
3017  4. LUCENE-325: Added IndexWriter.expungeDeletes methods, which
3018     consult the MergePolicy to find merges necessary to merge away all
3019     deletes from the index.  This should be a somewhat lower cost
3020     operation than optimize.  (John Wang via Mike McCandless)
3021
3022  5. LUCENE-1233: Return empty array instead of null when no fields
3023     match the specified name in these methods in Document:
3024     getFieldables, getFields, getValues, getBinaryValues.  (Stefan
3025     Trcek via Mike McCandless)
3026
3027  6. LUCENE-1234: Make BoostingSpanScorer protected.  (Andi Vajda via Grant Ingersoll)
3028
3029  7. LUCENE-510: The index now stores strings as true UTF-8 bytes
3030     (previously it was Java's modified UTF-8).  If any text, either
3031     stored fields or a token, has illegal UTF-16 surrogate characters,
3032     these characters are now silently replaced with the Unicode
3033     replacement character U+FFFD.  This is a change to the index file
3034     format.  (Marvin Humphrey via Mike McCandless)
3035
3036  8. LUCENE-852: Let the SpellChecker caller specify IndexWriter mergeFactor
3037     and RAM buffer size.  (Otis Gospodnetic)
3038
3039  9. LUCENE-1290: Deprecate org.apache.lucene.search.Hits, Hit and HitIterator
3040     and remove all references to these classes from the core. Also update demos
3041     and tutorials. (Michael Busch)
3042
3043 10. LUCENE-1288: Add getVersion() and getGeneration() to IndexCommit.
3044     getVersion() returns the same value that IndexReader.getVersion()
3045     returns when the reader is opened on the same commit.  (Jason
3046     Rutherglen via Mike McCandless)
3047
3048 11. LUCENE-1311: Added IndexReader.listCommits(Directory) static
3049     method to list all commits in a Directory, plus IndexReader.open
3050     methods that accept an IndexCommit and open the index as of that
3051     commit.  These methods are only useful if you implement a custom
3052     DeletionPolicy that keeps more than the last commit around.
3053     (Jason Rutherglen via Mike McCandless)
3054
3055 12. LUCENE-1325: Added IndexCommit.isOptimized().  (Shalin Shekhar
3056     Mangar via Mike McCandless)
3057
3058 13. LUCENE-1324: Added TokenFilter.reset(). (Shai Erera via Mike
3059     McCandless)
3060
3061 14. LUCENE-1340: Added Fieldable.omitTf() method to skip indexing term
3062     frequency, positions and payloads.  This saves index space, and
3063     indexing/searching time.  (Eks Dev via Mike McCandless)
3064
3065 15. LUCENE-1219: Add basic reuse API to Fieldable for binary fields:
3066     getBinaryValue/Offset/Length(); currently only lazy fields reuse
3067     the provided byte[] result to getBinaryValue.  (Eks Dev via Mike
3068     McCandless)
3069
3070 16. LUCENE-1334: Add new constructor for Term: Term(String fieldName)
3071     which defaults term text to "".  (DM Smith via Mike McCandless)
3072
3073 17. LUCENE-1333: Added Token.reinit(*) APIs to re-initialize (reuse) a
3074     Token.  Also added term() method to return a String, with a
3075     performance penalty clearly documented.  Also implemented
3076     hashCode() and equals() in Token, and fixed all core and contrib
3077     analyzers to use the re-use APIs.  (DM Smith via Mike McCandless)
3078
3079 18. LUCENE-1329: Add optional readOnly boolean when opening an
3080     IndexReader.  A readOnly reader is not allowed to make changes
3081     (deletions, norms) to the index; in exchange, the isDeleted
3082     method, often a bottleneck when searching with many threads, is
3083     not synchronized.  The default for readOnly is still false, but in
3084     3.0 the default will become true.  (Jason Rutherglen via Mike
3085     McCandless)
3086
3087 19. LUCENE-1367: Add IndexCommit.isDeleted().  (Shalin Shekhar Mangar
3088     via Mike McCandless)
3089
3090 20. LUCENE-1061: Factored out all "new XXXQuery(...)" in
3091     QueryParser.java into protected methods newXXXQuery(...) so that
3092     subclasses can create their own subclasses of each Query type.
3093     (John Wang via Mike McCandless)
3094
3095 21. LUCENE-753: Added new Directory implementation
3096     org.apache.lucene.store.NIOFSDirectory, which uses java.nio's
3097     FileChannel to do file reads.  On most non-Windows platforms, with
3098     many threads sharing a single searcher, this may yield sizable
3099     improvement to query throughput when compared to FSDirectory,
3100     which only allows a single thread to read from an open file at a
3101     time.  (Jason Rutherglen via Mike McCandless)
3102
3103 22. LUCENE-1371: Added convenience method TopDocs Searcher.search(Query query, int n).
3104     (Mike McCandless)
3105
3106 23. LUCENE-1356: Allow easy extensions of TopDocCollector by turning
3107     constructor and fields from package to protected. (Shai Erera
3108     via Doron Cohen)
3109
3110 24. LUCENE-1375: Added convenience method IndexCommit.getTimestamp,
3111     which is equivalent to
3112     getDirectory().fileModified(getSegmentsFileName()).  (Mike McCandless)
3113
3114 25. LUCENE-1366: Rename Field.Index options to be more accurate:
3115     TOKENIZED becomes ANALYZED;  UN_TOKENIZED becomes NOT_ANALYZED;
3116     NO_NORMS becomes NOT_ANALYZED_NO_NORMS and a new ANALYZED_NO_NORMS
3117     is added.  (Mike McCandless)
3118
3119 24. LUCENE-1131: Added numDeletedDocs method to IndexReader (Otis Gospodnetic)
3120
3121 Bug fixes
3122
3123  1. LUCENE-1134: Fixed BooleanQuery.rewrite to only optimize a single
3124     clause query if minNumShouldMatch<=0. (Shai Erera via Michael Busch)
3125
3126  2. LUCENE-1169: Fixed bug in IndexSearcher.search(): searching with
3127     a filter might miss some hits because scorer.skipTo() is called
3128     without checking if the scorer is already at the right position.
3129     scorer.skipTo(scorer.doc()) is not a NOOP, it behaves as
3130     scorer.next(). (Eks Dev, Michael Busch)
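
    To illustrate why the position check in LUCENE-1169 matters, here is
    a small sketch of an iterator whose skipTo() always advances at least
    once, as the entry describes. SketchScorer and SkipToGuardSketch are
    hypothetical classes written for this note; they only mimic the
    described contract and are not Lucene's Scorer.

```java
// Hypothetical iterator over sorted doc ids, mimicking the old contract.
class SketchScorer {
    private final int[] docs;
    private int idx = -1;

    SketchScorer(int[] docs) {
        this.docs = docs;
    }

    boolean next() {
        return ++idx < docs.length;
    }

    int doc() {
        return docs[idx];
    }

    // Always advances at least once, so skipTo(doc()) behaves like next().
    boolean skipTo(int target) {
        do {
            if (!next()) {
                return false;
            }
        } while (doc() < target);
        return true;
    }
}

class SkipToGuardSketch {
    // Positions the scorer on the first doc >= target, leaving it in place
    // if it is already there. Assumes next() has been called at least once.
    static boolean advanceIfNeeded(SketchScorer scorer, int target) {
        if (scorer.doc() >= target) {
            return true; // already positioned; calling skipTo would skip a hit
        }
        return scorer.skipTo(target);
    }
}
```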
3131
3132  3. LUCENE-1182: Added scorePayload to SimilarityDelegator (Andi Vajda via Grant Ingersoll)
3133
3134  4. LUCENE-1213: MultiFieldQueryParser was ignoring slop in case
3135     of a single field phrase. (Trejkaz via Doron Cohen)
3136
3137  5. LUCENE-1228: IndexWriter.commit() was not updating the index version and, as
3138     a result, IndexReader.reopen() failed to sense index changes. (Doron Cohen)
3139
3140  6. LUCENE-1267: Added numDocs() and maxDoc() to IndexWriter;
3141     deprecated docCount().  (Mike McCandless)
3142
3143  7. LUCENE-1274: Added new prepareCommit() method to IndexWriter,
3144     which does phase 1 of a 2-phase commit (commit() does phase 2).
3145     This is needed when you want to update an index as part of a
3146     transaction involving external resources (eg a database).  Also
3147     deprecated abort(), renaming it to rollback().  (Mike McCandless)
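
    The two-phase commit idea behind prepareCommit()/commit() can be
    sketched generically as below. The TwoPhaseResource interface, the
    RecordingResource test double and the TwoPhaseCoordinator are all
    hypothetical and written for this note; in a real system one
    participant would wrap an IndexWriter and another an external
    resource such as a database connection.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical participant in a two-phase commit.
interface TwoPhaseResource {
    void prepareCommit() throws Exception; // phase 1: may fail
    void commit();                          // phase 2: expected to succeed after phase 1
    void rollback();                        // undo anything done in phase 1
}

// Test double that records which calls it received.
class RecordingResource implements TwoPhaseResource {
    final List<String> log = new ArrayList<>();
    private final boolean failOnPrepare;

    RecordingResource(boolean failOnPrepare) {
        this.failOnPrepare = failOnPrepare;
    }

    public void prepareCommit() throws Exception {
        log.add("prepare");
        if (failOnPrepare) {
            throw new Exception("prepare failed");
        }
    }

    public void commit() {
        log.add("commit");
    }

    public void rollback() {
        log.add("rollback");
    }
}

class TwoPhaseCoordinator {
    // Commits all resources only if every one of them prepared successfully;
    // otherwise rolls all of them back and reports failure.
    static boolean commitAll(List<? extends TwoPhaseResource> resources) {
        try {
            for (TwoPhaseResource r : resources) {
                r.prepareCommit();
            }
        } catch (Exception e) {
            for (TwoPhaseResource r : resources) {
                r.rollback();
            }
            return false;
        }
        for (TwoPhaseResource r : resources) {
            r.commit();
        }
        return true;
    }
}
```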
3148
3149  8. LUCENE-1003: Stop RussianAnalyzer from removing numbers.
3150     (TUSUR OpenTeam, Dmitry Lihachev via Otis Gospodnetic)
3151
3152  9. LUCENE-1152: SpellChecker fix around clearIndex and indexDictionary
3153     methods, plus removal of IndexReader reference.
3154     (Naveen Belkale via Otis Gospodnetic)
3155
3156 10. LUCENE-1046: Removed dead code in SpellChecker
3157     (Daniel Naber via Otis Gospodnetic)
3158
3159 11. LUCENE-1189: Fixed the QueryParser to handle escaped characters within
3160     quoted terms correctly. (Tomer Gabel via Michael Busch)
3161
3162 12. LUCENE-1299: Fixed NPE in SpellChecker when IndexReader is not null and field is (Grant Ingersoll)
3163
3164 13. LUCENE-1303: Fixed BoostingTermQuery's explanation to be marked as a Match
3165     depending only upon the non-payload score part, regardless of the effect of
3166     the payload on the score. Prior to this, score of a query containing a BTQ
3167     differed from its explanation. (Doron Cohen)
3168
3169 14. LUCENE-1310: Fixed SloppyPhraseScorer to work also for terms repeating more
3170     than twice in the query. (Doron Cohen)
3171
3172 15. LUCENE-1351: ISOLatin1AccentFilter now cleans additional ligatures (Cedrik Lime via Grant Ingersoll)
3173
3174 16. LUCENE-1383: Work around a nasty "leak" in Java's builtin
3175     ThreadLocal, to prevent Lucene from causing unexpected
3176     OutOfMemoryError in certain situations (notably J2EE
3177     applications).  (Chris Lu via Mike McCandless)
3178
3179 New features
3180
3181  1. LUCENE-1137: Added Token.set/getFlags() accessors for passing more information about a Token through the analysis
3182     process.  The flag is not indexed/stored and is thus only used by analysis.
3183
3184  2. LUCENE-1147: Add -segment option to CheckIndex tool so you can
3185     check only a specific segment or segments in your index.  (Mike
3186     McCandless)
3187
3188  3. LUCENE-1045: Reopened this issue to add support for short and bytes.
3189
3190  4. LUCENE-584: Added new data structures to o.a.l.util, such as
3191     OpenBitSet and SortedVIntList. These extend DocIdSet and can
3192     directly be used for Filters with the new Filter API. Also changed
3193     the core Filters to use OpenBitSet instead of java.util.BitSet.
3194     (Paul Elschot, Michael Busch)
3195
3196  5. LUCENE-494: Added QueryAutoStopWordAnalyzer to allow for the automatic removal of frequently occurring terms from a query.
3197     This Analyzer is not intended for use during indexing. (Mark Harwood via Grant Ingersoll)
3198
3199  6. LUCENE-1044: Change Lucene to properly "sync" files after
3200     committing, to ensure on a machine or OS crash or power cut, even
3201     with cached writes, the index remains consistent.  Also added
3202     explicit commit() method to IndexWriter to force a commit without
3203     having to close.  (Mike McCandless)
3204
3205  7. LUCENE-997: Add search timeout (partial) support.
3206     A TimeLimitedCollector was added to allow limiting search time.
3207     It is a partial solution since timeout is checked only when
3208     collecting a hit, and therefore a search for rare words in a
3209     huge index might not stop within the specified time.
3210     (Sean Timm via Doron Cohen)
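
    The "partial" nature of this timeout follows from where the check
    sits: elapsed time is only examined when a hit is collected. A rough
    sketch of that idea, with an injectable clock so it can be exercised
    deterministically, is shown below. TimeLimitSketch, HitSink and
    TimeExceeded are hypothetical names and do not reflect the actual
    TimeLimitedCollector API.

```java
import java.util.function.LongSupplier;

class TimeLimitSketch {
    // Thrown when a hit arrives after the allowed time has elapsed.
    static class TimeExceeded extends RuntimeException {
    }

    // Hypothetical minimal collector interface.
    interface HitSink {
        void collect(int doc);
    }

    // Wraps 'inner' so that each collected hit first checks the clock.
    // If no hits arrive, the clock is never consulted, which is why a
    // search matching few documents may run past the limit.
    static HitSink limit(HitSink inner, LongSupplier clockMillis, long allowedMillis) {
        final long start = clockMillis.getAsLong();
        return doc -> {
            if (clockMillis.getAsLong() - start > allowedMillis) {
                throw new TimeExceeded();
            }
            inner.collect(doc);
        };
    }
}
```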
3211
3212  8. LUCENE-1184: Allow SnapshotDeletionPolicy to be re-used across
3213     close/re-open of IndexWriter while still protecting an open
3214     snapshot (Tim Brennan via Mike McCandless)
3215
3216  9. LUCENE-1194: Added IndexWriter.deleteDocuments(Query) to delete
3217     documents matching the specified query.  Also added static unlock
3218     and isLocked methods (deprecating the ones in IndexReader).  (Mike
3219     McCandless)
3220
3221 10. LUCENE-1201: Add IndexReader.getIndexCommit() method. (Tim Brennan
3222     via Mike McCandless)
3223
3224 11. LUCENE-550:  Added InstantiatedIndex implementation.  Experimental
3225     Index store similar to MemoryIndex but allows for multiple documents
3226     in memory.  (Karl Wettin via Grant Ingersoll)
3227
3228 12. LUCENE-400: Added word based n-gram filter (in contrib/analyzers) called ShingleFilter and an Analyzer wrapper
3229     that wraps another Analyzer's token stream with a ShingleFilter (Sebastian Kirsch, Steve Rowe via Grant Ingersoll)
3230
3231 13. LUCENE-1166: Decomposition tokenfilter for languages like German and Swedish (Thomas Peuss via Grant Ingersoll)
3232
3233 14. LUCENE-1187: ChainedFilter and BooleanFilter now work with new Filter API
3234     and DocIdSetIterator-based filters. Backwards-compatibility with old
3235     BitSet-based filters is ensured. (Paul Elschot via Michael Busch)
3236
3237 15. LUCENE-1295: Added new method to MoreLikeThis for retrieving interesting terms and made retrieveTerms(int) public. (Grant Ingersoll)
3238
3239 16. LUCENE-1298: MoreLikeThis can now accept a custom Similarity (Grant Ingersoll)
3240
3241 17. LUCENE-1297: Allow other string distance measures for the SpellChecker
3242     (Thomas Morton via Otis Gospodnetic)
3243
3244 18. LUCENE-1001: Provide access to Payloads via Spans.  All existing Span Query implementations in Lucene implement this. (Mark Miller, Grant Ingersoll)
3245
3246 19. LUCENE-1354: Provide programmatic access to CheckIndex (Grant Ingersoll, Mike McCandless)
3247
3248 20. LUCENE-1279: Add support for Collators to RangeFilter/Query and Query Parser.  (Steve Rowe via Grant Ingersoll)
3249
3250 Optimizations
3251
3252  1. LUCENE-705: When building a compound file, use
3253     RandomAccessFile.setLength() to tell the OS/filesystem to
3254     pre-allocate space for the file.  This may improve fragmentation
3255     in how the CFS file is stored, and allows us to detect an upcoming
3256     disk full situation before actually filling up the disk.  (Mike
3257     McCandless)
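
    As a small illustration of the pre-allocation call mentioned in
    LUCENE-705, the sketch below sets the length of a file before any
    data is written. PreallocateSketch and its methods are hypothetical
    names; only RandomAccessFile.setLength() is taken from the entry.
    Whether the filesystem actually reserves blocks (rather than creating
    a sparse file) depends on the platform.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

class PreallocateSketch {
    // Sets the file's length up front and returns the length reported afterwards.
    static long preallocate(File file, long size) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(size); // may fail early if the disk cannot hold 'size' bytes
            return raf.length();
        }
    }

    // Runs preallocate() against a temporary file that is removed afterwards.
    static long demo(long size) {
        try {
            File file = File.createTempFile("cfs-sketch", ".tmp");
            try {
                return preallocate(file, size);
            } finally {
                file.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```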
3258
3259  2. LUCENE-1120: Speed up merging of term vectors by bulk-copying the
3260     raw bytes for each contiguous range of non-deleted documents.
3261     (Mike McCandless)
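
    Bulk copying depends on first finding each contiguous run of
    non-deleted documents. One way to compute those runs is sketched
    below; LiveRangeSketch is a hypothetical helper, and the boolean
    array stands in for whatever deletion bitmap the merger consults.

```java
import java.util.ArrayList;
import java.util.List;

class LiveRangeSketch {
    // Returns [start, endExclusive) pairs covering every run of live documents.
    static List<int[]> liveRanges(boolean[] deleted) {
        List<int[]> ranges = new ArrayList<>();
        int i = 0;
        while (i < deleted.length) {
            if (deleted[i]) {
                i++;
                continue;
            }
            int start = i;
            while (i < deleted.length && !deleted[i]) {
                i++;
            }
            // Each range could be copied with a single bulk byte transfer.
            ranges.add(new int[] {start, i});
        }
        return ranges;
    }
}
```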
3262
3263  3. LUCENE-1185: Avoid checking if the TermBuffer 'scratch' in
3264     SegmentTermEnum is null for every call of scanTo().
3265     (Christian Kohlschuetter via Michael Busch)
3266
3267  4. LUCENE-1217: Internal to Field.java, use isBinary instead of
3268     runtime type checking for possible speedup of binaryValue().
3269     (Eks Dev via Mike McCandless)
3270
3271  5. LUCENE-1183: Optimized TRStringDistance class (in contrib/spell) that uses
3272     less memory than the previous version.  (Cédrik LIME via Otis Gospodnetic)
3273
3274  6. LUCENE-1195: Improve term lookup performance by adding a LRU cache to the
3275     TermInfosReader. In performance experiments the speedup was about 25% on
3276     average on mid-size indexes with ~500,000 documents for queries with 3
3277     terms and about 7% on larger indexes with ~4.3M documents. (Michael Busch)
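
    For readers unfamiliar with the technique, an LRU cache can be built
    in a few lines on top of java.util.LinkedHashMap in access order.
    This is only a generic sketch; LruCacheSketch is a hypothetical class
    and says nothing about how the cache inside TermInfosReader is
    actually implemented or sized.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache: the least recently accessed entry is evicted
// once the map grows beyond maxEntries.
class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCacheSketch(int maxEntries) {
        super(16, 0.75f, true); // true = iterate in access order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

    Note that LinkedHashMap is not thread-safe, so a cache shared between
    search threads would need external synchronization or a per-thread
    instance.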
3278
3279 Documentation
3280
3281   1. LUCENE-1236:  Added some clarifying remarks to EdgeNGram*.java (Hiroaki Kawai via Grant Ingersoll)
3282
3283   2. LUCENE-1157 and LUCENE-1256: HTML changes log, created automatically
3284      from CHANGES.txt. This HTML file is currently visible only via the developers page.
3285      (Steven Rowe via Doron Cohen)
3286
3287   3. LUCENE-1349: Fieldable can now be changed without breaking backward compatibility rules (within reason; see the note at
3288      the top of this file and also on Fieldable.java).  (Grant Ingersoll)
3289
3290   4. LUCENE-1873: Update documentation to reflect current Contrib area status.
3291      (Steven Rowe, Mark Miller)
3292
3293 Build
3294
3295   1. LUCENE-1153: Added JUnit JAR to new lib directory.  Updated build to rely on local JUnit instead of ANT/lib.
3296
3297   2. LUCENE-1202: Small fixes to the way Clover is used to work better
3298      with contribs.  Of particular note: a single clover db is used
3299      regardless of whether tests are run globally or in the specific
3300      contrib directories.
3301
3302   3. LUCENE-1353: Javacc target in contrib/miscellaneous for
3303      generating the precedence query parser.
3304
3305 Test Cases
3306
3307  1. LUCENE-1238: Fixed intermittent failures of TestTimeLimitedCollector.testTimeoutMultiThreaded.
3308     Within this fix, "greedy" flag was added to TimeLimitedCollector, to allow the wrapped
3309     collector to collect also the last doc, after the allowed time has passed. (Doron Cohen)
3310
3311  2. LUCENE-1348: relax TestTimeLimitedCollector to not fail due to
3312     timeout exceeded (just because test machine is very busy).
3313
3314 ======================= Release 2.3.2 =======================
3315
3316 Bug fixes
3317
3318  1. LUCENE-1191: On hitting OutOfMemoryError in any index-modifying
3319     methods in IndexWriter, do not commit any further changes to the
3320     index to prevent risk of possible corruption.  (Mike McCandless)
3321
3322  2. LUCENE-1197: Fixed issue whereby IndexWriter would flush by RAM
3323     too early when TermVectors were in use.  (Mike McCandless)
3324
3325  3. LUCENE-1198: Don't corrupt index if an exception happens inside
3326     DocumentsWriter.init (Mike McCandless)
3327
3328  4. LUCENE-1199: Added defensive check for null indexReader before
3329     calling close in IndexModifier.close() (Mike McCandless)
3330
3331  5. LUCENE-1200: Fix rare deadlock case in addIndexes* when
3332     ConcurrentMergeScheduler is in use (Mike McCandless)
3333
3334  6. LUCENE-1208: Fix deadlock case on hitting an exception while
3335     processing a document that had triggered a flush (Mike McCandless)
3336
3337  7. LUCENE-1210: Fix deadlock case on hitting an exception while
3338     starting a merge when using ConcurrentMergeScheduler (Mike McCandless)
3339
3340  8. LUCENE-1222: Fix IndexWriter.doAfterFlush to always be called on
3341     flush (Mark Ferguson via Mike McCandless)
3342
3343  9. LUCENE-1226: Fixed IndexWriter.addIndexes(IndexReader[]) to commit
3344     successfully created compound files. (Michael Busch)
3345
3346 10. LUCENE-1150: Re-expose StandardTokenizer's constants publicly;
3347     this was accidentally lost with LUCENE-966.  (Nicolas Lalevée via
3348     Mike McCandless)
3349
3350 11. LUCENE-1262: Fixed bug in BufferedIndexInput.refill whereby on
3351     hitting an exception in readInternal, the buffer is incorrectly
3352     filled with stale bytes such that subsequent calls to readByte()
3353     return incorrect results.  (Trejkaz via Mike McCandless)
3354
3355 12. LUCENE-1270: Fixed intermittent case where IndexWriter.close()
3356     would hang after IndexWriter.addIndexesNoOptimize had been
3357     called.  (Stu Hood via Mike McCandless)
3358
3359 Build
3360
3361  1. LUCENE-1230: Include *pom.xml* in source release files. (Michael Busch)
3362
3363
3364 ======================= Release 2.3.1 =======================
3365
3366 Bug fixes
3367
3368  1. LUCENE-1168: Fixed corruption cases when autoCommit=false and
3369     documents have mixed term vectors (Suresh Guvvala via Mike
3370     McCandless).
3371
3372  2. LUCENE-1171: Fixed some cases where OOM errors could cause
3373     deadlock in IndexWriter (Mike McCandless).
3374
3375  3. LUCENE-1173: Fixed corruption case when autoCommit=false and bulk
3376     merging of stored fields is used (Yonik via Mike McCandless).
3377
3378  4. LUCENE-1163: Fixed bug in CharArraySet.contains(char[] buffer, int
3379     offset, int len) that was ignoring offset and thus giving the
3380     wrong answer.  (Thomas Peuss via Mike McCandless)
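
    The class of bug fixed in LUCENE-1163 is easy to reproduce: a region
    comparison that indexes the buffer from 0 instead of from the given
    offset. A correct comparison might look like the sketch below.
    RegionMatchSketch is a hypothetical helper, not the CharArraySet
    code.

```java
class RegionMatchSketch {
    // Returns true if buffer[offset .. offset+len) equals 'candidate'.
    static boolean regionEquals(char[] buffer, int offset, int len, String candidate) {
        if (candidate.length() != len) {
            return false;
        }
        for (int i = 0; i < len; i++) {
            // The offset must be applied here; reading buffer[i] instead
            // would compare against the start of the buffer.
            if (buffer[offset + i] != candidate.charAt(i)) {
                return false;
            }
        }
        return true;
    }
}
```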
3381
3382  5. LUCENE-1177: Fix rare case where IndexWriter.optimize might do too
3383     many merges at the end.  (Mike McCandless)
3384
3385  6. LUCENE-1176: Fix corruption case when documents with no term
3386     vector fields are added before documents with term vector fields.
3387     (Mike McCandless)
3388
3389  7. LUCENE-1179: Fixed assert statement that was incorrectly
3390     preventing Fields with empty-string field name from working.
3391     (Sergey Kabashnyuk via Mike McCandless)
3392
3393 ======================= Release 2.3.0 =======================
3394
3395 Changes in runtime behavior
3396
3397  1. LUCENE-994: Defaults for IndexWriter have been changed to maximize
3398     out-of-the-box indexing speed.  First, IndexWriter now flushes by
3399     RAM usage (16 MB by default) instead of a fixed doc count (call
3400     IndexWriter.setMaxBufferedDocs to get backwards compatible
3401     behavior).  Second, ConcurrentMergeScheduler is used to run merges
3402     using background threads (call IndexWriter.setMergeScheduler(new
3403     SerialMergeScheduler()) to get backwards compatible behavior).
3404     Third, merges are chosen based on size in bytes of each segment
3405     rather than document count of each segment (call
3406     IndexWriter.setMergePolicy(new LogDocMergePolicy()) to get
3407     backwards compatible behavior).
3408
3409     NOTE: users of ParallelReader must change back all of these
3410     defaults in order to ensure the docIDs "align" across all parallel
3411     indices.
3412
3413     (Mike McCandless)
3414
3415  2. LUCENE-1045: SortField.AUTO didn't work with long. When detecting
3416     the field type for sorting automatically, numbers used to be
3417     interpreted as int, then as float, if parsing the number as an int
3418     failed. Now the detection checks for int, then for long,
3419     then for float. (Daniel Naber)
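
    The detection order described in LUCENE-1045 (int, then long, then
    float) can be sketched as a chain of parse attempts. AutoTypeSketch
    is a hypothetical helper; the final fallback to "string" is an
    assumption made for this sketch and is not stated in the entry.

```java
class AutoTypeSketch {
    // Tries progressively wider numeric types and reports the first that parses.
    static String detect(String value) {
        try {
            Integer.parseInt(value);
            return "int";
        } catch (NumberFormatException e) {
            // not an int; try long next
        }
        try {
            Long.parseLong(value);
            return "long";
        } catch (NumberFormatException e) {
            // not a long; try float next
        }
        try {
            Float.parseFloat(value);
            return "float";
        } catch (NumberFormatException e) {
            // not numeric at all
        }
        return "string"; // assumed fallback for this sketch
    }
}
```

    With the old order (int, then float), a value such as 4294967296
    would have been parsed as a float and lost precision; checking for
    long first avoids that.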
3420
3421 API Changes
3422
3423  1. LUCENE-843: Added IndexWriter.setRAMBufferSizeMB(...) to have
3424     IndexWriter flush whenever the buffered documents are using more
3425     than the specified amount of RAM.  Also added new APIs to Token
3426     that allow one to set a char[] plus offset and length to specify a
3427     token (to avoid creating a new String() for each Token).  (Mike
3428     McCandless)
3429
3430  2. LUCENE-963: Add setters to Field to allow for re-using a single
3431     Field instance during indexing.  This is a sizable performance
3432     gain, especially for small documents.  (Mike McCandless)
3433
3434  3. LUCENE-969: Add new APIs to Token, TokenStream and Analyzer to
3435     permit re-using of Token and TokenStream instances during
3436     indexing.  Changed Token to use a char[] as the store for the
3437     termText instead of String.  This gives faster tokenization
3438     performance (~10-15%).  (Mike McCandless)
3439
3440  4. LUCENE-847: Factored MergePolicy, which determines which merges
3441     should take place and when, as well as MergeScheduler, which
3442     determines when the selected merges should actually run, out of
3443     IndexWriter.  The default merge policy is now
3444     LogByteSizeMergePolicy (see LUCENE-845) and the default merge
3445     scheduler is now ConcurrentMergeScheduler (see
3446     LUCENE-870). (Steven Parkes via Mike McCandless)
3447
3448  5. LUCENE-1052: Add IndexReader.setTermInfosIndexDivisor(int) method
3449     that allows you to reduce memory usage of the termInfos by further
3450     sub-sampling (over the termIndexInterval that was used during
3451     indexing) which terms are loaded into memory.  (Chuck Williams,
3452     Doug Cutting via Mike McCandless)
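
    Sub-sampling here means keeping only every Nth entry of the term
    index that was written at indexing time. The sketch below shows the
    idea on a plain list; IndexDivisorSketch is a hypothetical helper,
    and which entries are retained (here: the first, then every
    divisor-th) is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

class IndexDivisorSketch {
    // Keeps every 'divisor'-th index term, starting with the first one.
    static List<String> subSample(List<String> indexTerms, int divisor) {
        if (divisor < 1) {
            throw new IllegalArgumentException("divisor must be >= 1");
        }
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < indexTerms.size(); i += divisor) {
            kept.add(indexTerms.get(i));
        }
        return kept;
    }
}
```

    A larger divisor keeps fewer terms in memory, at the price of a
    longer sequential scan between retained entries during term lookup.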
3453
3454  6. LUCENE-743: Add IndexReader.reopen() method that re-opens an
3455     existing IndexReader (see New features -> 8.) (Michael Busch)
3456
3457  7. LUCENE-1062: Add setData(byte[] data),
3458     setData(byte[] data, int offset, int length), getData(), getOffset()
3459     and clone() methods to o.a.l.index.Payload. Also add the field name
3460     as arg to Similarity.scorePayload(). (Michael Busch)
3461
3462  8. LUCENE-982: Add IndexWriter.optimize(int maxNumSegments) method to
3463     "partially optimize" an index down to maxNumSegments segments.
3464     (Mike McCandless)
3465
3466  9. LUCENE-1080: Changed Token.DEFAULT_TYPE to be public.
3467
3468 10. LUCENE-1064: Changed TopDocs constructor to be public.
3469      (Shai Erera via Michael Busch)
3470
3471 11. LUCENE-1079: DocValues cleanup: constructor now has no params,
3472     and getInnerArray() now throws UnsupportedOperationException (Doron Cohen)
3473
3474 12. LUCENE-1089: Added PriorityQueue.insertWithOverflow, which returns
3475     the Object (if any) that was bumped from the queue to allow
3476     re-use.  (Shai Erera via Mike McCandless)
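
    The insert-with-overflow idea can be sketched on top of
    java.util.PriorityQueue as below. OverflowQueueSketch is a
    hypothetical helper, not Lucene's PriorityQueue; in particular,
    returning the new element itself when it is rejected is a choice
    made for this sketch.

```java
import java.util.PriorityQueue;

class OverflowQueueSketch {
    // Keeps at most maxSize of the largest elements in 'queue' (a min-heap).
    // Returns null if the element was added without displacing anything,
    // otherwise the element that did not end up in the queue, which the
    // caller may re-use instead of allocating a new object.
    static <T extends Comparable<T>> T insertWithOverflow(
            PriorityQueue<T> queue, int maxSize, T element) {
        if (queue.size() < maxSize) {
            queue.add(element);
            return null;
        }
        T least = queue.peek();
        if (least != null && element.compareTo(least) > 0) {
            queue.poll();       // bump the current smallest entry
            queue.add(element);
            return least;
        }
        return element;         // rejected: not larger than the current smallest
    }
}
```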
3477
3478 13. LUCENE-1101: Token reuse 'contract' (defined LUCENE-969)
3479     modified so it is token producer's responsibility
3480     to call Token.clear(). (Doron Cohen)
3481
3482 14. LUCENE-1118: Changed StandardAnalyzer to skip too-long (default >
3483     255 characters) tokens.  You can increase this limit by calling
3484     StandardAnalyzer.setMaxTokenLength(...).  (Michael McCandless)
3485
3486
3487 Bug fixes
3488
3489  1. LUCENE-933: QueryParser fixed to not produce empty sub
3490     BooleanQueries "()" even if the Analyzer produced no
3491     tokens for input. (Doron Cohen)
3492
3493  2. LUCENE-955: Fixed SegmentTermPositions to work correctly with the
3494     first term in the dictionary. (Michael Busch)
3495
3496  3. LUCENE-951: Fixed NullPointerException in MultiLevelSkipListReader
3497     that was thrown after a call of TermPositions.seek().
3498     (Rich Johnson via Michael Busch)
3499
3500  4. LUCENE-938: Fixed cases where an unhandled exception in
3501     IndexWriter's methods could cause deletes to be lost.
3502     (Steven Parkes via Mike McCandless)
3503
3504  5. LUCENE-962: Fixed case where an unhandled exception in
3505     IndexWriter.addDocument or IndexWriter.updateDocument could cause
3506     unreferenced files in the index to not be deleted
3507     (Steven Parkes via Mike McCandless)
3508
3509  6. LUCENE-957: RAMDirectory fixed to properly handle directories
3510     larger than Integer.MAX_VALUE. (Doron Cohen)
3511
3512  7. LUCENE-781: MultiReader fixed to not throw NPE if isCurrent(),
3513     isOptimized() or getVersion() is called. Separated MultiReader
3514     into two classes: MultiSegmentReader extends IndexReader, is
3515     package-protected and is created automatically by IndexReader.open()
3516     in case the index has multiple segments. The public MultiReader
3517     now extends MultiSegmentReader and is intended to be used by users
3518     who want to add their own subreaders. (Daniel Naber, Michael Busch)
3519
3520  8. LUCENE-970: FilterIndexReader now implements isOptimized(). Before
3521     a call of isOptimized() would throw a NPE. (Michael Busch)
3522
3523  9. LUCENE-832: ParallelReader fixed to not throw NPE if isCurrent(),
3524     isOptimized() or getVersion() is called. (Michael Busch)
3525
3526 10. LUCENE-948: Fix FNFE exception caused by stale NFS client
3527     directory listing caches when writers on different machines are
3528     sharing an index over NFS and using a custom deletion policy (Mike
3529     McCandless)
3530
3531 11. LUCENE-978: Ensure TermInfosReader and FieldsReader
3532     close any streams they had opened if an exception is hit in the
3533     constructor.  (Ning Li via Mike McCandless)
3534
3535 12. LUCENE-985: If an extremely long term is in a doc (> 16383 chars),
3536     we now throw an IllegalArgumentException saying the term is too
3537     long, instead of cryptic ArrayIndexOutOfBoundsException.  (Karl
3538     Wettin via Mike McCandless)
3539
3540 13. LUCENE-991: The explain() method of BoostingTermQuery had errors
3541     when no payloads were present on a document.  (Peter Keegan via
3542     Grant Ingersoll)
3543
3544 14. LUCENE-992: Fixed IndexWriter.updateDocument to be atomic again
3545     (this was broken by LUCENE-843).  (Ning Li via Mike McCandless)
3546
3547 15. LUCENE-1008: Fixed corruption case when document with no term
3548     vector fields is added after documents with term vector fields.
3549     This bug was introduced with LUCENE-843.  (Grant Ingersoll via
3550     Mike McCandless)
3551
3552 16. LUCENE-1006: Fixed QueryParser to accept a "" field value (zero
3553     length quoted string.)  (yonik)
3554
3555 17. LUCENE-1010: Fixed corruption case when document with no term
3556     vector fields is added after documents with term vector fields.
3557     This case is hit during merge and would cause an EOFException.
3558     This bug was introduced with LUCENE-984.  (Andi Vajda via Mike
3559     McCandless)
3560
3561 19. LUCENE-1009: Fix merge slowdown with LogByteSizeMergePolicy when
3562     autoCommit=false and documents are using stored fields and/or term
3563     vectors.  (Mark Miller via Mike McCandless)
3564
3565 20. LUCENE-1011: Fixed corruption case when two or more machines,
3566     sharing an index over NFS, can be writers in quick succession.
3567     (Patrick Kimber via Mike McCandless)
3568
3569 21. LUCENE-1028: Fixed Weight serialization for a few queries:
3570     DisjunctionMaxQuery, ValueSourceQuery, CustomScoreQuery.
3571     Serialization check added for all queries.
3572     (Kyle Maxwell via Doron Cohen)
3573
3574 22. LUCENE-1048: Fixed incorrect behavior in Lock.obtain(...) when the
3575     timeout argument is very large (eg Long.MAX_VALUE).  Also added
3576     Lock.LOCK_OBTAIN_WAIT_FOREVER constant to never timeout.  (Nikolay
3577     Diakov via Mike McCandless)
3578
3579 23. LUCENE-1050: Throw LockReleaseFailedException in
3580     Simple/NativeFSLockFactory if we fail to delete the lock file when
3581     releasing the lock.  (Nikolay Diakov via Mike McCandless)
3582
3583 24. LUCENE-1071: Fixed SegmentMerger to correctly set payload bit in
3584     the merged segment. (Michael Busch)
3585
3586 25. LUCENE-1042: Remove throwing of IOException in getTermFreqVector(int, String, TermVectorMapper) to be consistent
3587     with other getTermFreqVector calls.  Also removed the throwing of the other IOException in that method to be consistent.  (Karl Wettin via Grant Ingersoll)
3588
3589 26. LUCENE-1096: Fixed Hits behavior when hits' docs are deleted
3590     along with iterating the hits. Deleting docs already retrieved
3591     now works seamlessly. If docs not yet retrieved are deleted
3592     (e.g. from another thread), and then, relying on the initial
3593     Hits.length(), an application attempts to retrieve more hits
3594     than actually exist, a ConcurrentModificationException
3595     is thrown.  (Doron Cohen)
3596
3597 27. LUCENE-1068: Changed StandardTokenizer to fix an issue with it marking
3598     the type of some tokens incorrectly.  This is done by adding a new flag named
3599     replaceInvalidAcronym, which defaults to false (the current, incorrect behavior).  Setting
3600     this flag to true fixes the problem.  This flag is a temporary fix and is already
3601     marked as being deprecated.  3.x will implement the correct approach.  (Shai Erera via Grant Ingersoll)
3602     LUCENE-1140: Fixed NPE caused by 1068 (Alexei Dets via Grant Ingersoll)
3603
3604 28. LUCENE-749: ChainedFilter behavior fixed when logic of
3605     first filter is ANDNOT.  (Antonio Bruno via Doron Cohen)
3606
3607 29. LUCENE-508: Make sure SegmentTermEnum.prev() is accurate (= last
3608     term) after next() returns false.  (Steven Tamm via Mike
3609     McCandless)
3610
3611
3612 New features
3613
3614  1. LUCENE-906: Elision filter for French.
3615     (Mathieu Lecarme via Otis Gospodnetic)
3616
3617  2. LUCENE-960: Added a SpanQueryFilter and related classes to allow for
3618     not only filtering, but knowing where in a Document a Filter matches
3619     (Grant Ingersoll)
3620
3621  3. LUCENE-868: Added new Term Vector access features.  New callback
3622     mechanism allows application to define how and where to read Term
3623     Vectors from disk. This implementation contains several extensions
3624     of the new abstract TermVectorMapper class.  The new API should be
3625     back-compatible.  No changes in the actual storage of Term Vectors
3626     has taken place.
3627  3.1 LUCENE-1038: Added setDocumentNumber() method to TermVectorMapper
3628      to provide information about what document is being accessed.
3629      (Karl Wettin via Grant Ingersoll)
3630
3631  4. LUCENE-975: Added PositionBasedTermVectorMapper that allows for
3632     position based lookup of term vector information.
3633     See item #3 above (LUCENE-868).
3634
3635  5. LUCENE-1011: Added simple tools (all in org.apache.lucene.store)
3636     to verify that locking is working properly.  LockVerifyServer runs
3637     a separate server to verify locks.  LockStressTest runs a simple
3638     tool that rapidly obtains and releases locks.
3639     VerifyingLockFactory is a LockFactory that wraps any other
3640     LockFactory and consults the LockVerifyServer whenever a lock is
3641     obtained or released, throwing an exception if an illegal lock
3642     obtain occurred.  (Patrick Kimber via Mike McCandless)
3643
3644  6. LUCENE-1015: Added FieldCache extension (ExtendedFieldCache) to
3645     support doubles and longs.  Added support into SortField for sorting
3646     on doubles and longs as well.  (Grant Ingersoll)
3647
3648  7. LUCENE-1020: Created basic index checking & repair tool
3649     (o.a.l.index.CheckIndex).  When run without -fix it does a
3650     detailed test of all segments in the index and reports summary
3651     information and any errors it hit.  With -fix it will remove
3652     segments that had errors.  (Mike McCandless)
3653
3654  8. LUCENE-743: Add IndexReader.reopen() method that re-opens an
3655     existing IndexReader by only loading those portions of an index
3656     that have changed since the reader was (re)opened. reopen() can
3657     be significantly faster than open(), depending on the amount of
3658     index changes. SegmentReader, MultiSegmentReader, MultiReader,
3659     and ParallelReader implement reopen(). (Michael Busch)
3660
3661  9. LUCENE-1040: CharArraySet useful for efficiently checking
3662     set membership of text specified by char[]. (yonik)
3663
3664 10. LUCENE-1073: Created SnapshotDeletionPolicy to facilitate taking a
3665     live backup of an index without pausing indexing.  (Mike
3666     McCandless)
3667
3668 11. LUCENE-1019: CustomScoreQuery enhanced to support multiple
3669     ValueSource queries. (Kyle Maxwell via Doron Cohen)
3670
3671 12. LUCENE-1095: Added an option to StopFilter to increase
3672     positionIncrement of the token succeeding a stopped token.
3673     Disabled by default. Similar option added to QueryParser
3674     to consider token positions when creating PhraseQuery
3675     and MultiPhraseQuery. Disabled by default (so by default
3676     the query parser ignores position increments).
3677     (Doron Cohen)
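
    The effect of the StopFilter option can be illustrated with a small
    sketch that prints each surviving token together with its position
    increment. StopPositionSketch is a hypothetical helper operating on
    plain strings; it is not the StopFilter implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class StopPositionSketch {
    // Removes stop words and returns "term:increment" for each remaining token.
    // With the option enabled, a token that follows removed stop words gets
    // an increment of 1 plus the number of stop words skipped before it.
    static List<String> filter(List<String> tokens, Set<String> stopWords,
                               boolean enablePositionIncrements) {
        List<String> out = new ArrayList<>();
        int skipped = 0;
        for (String token : tokens) {
            if (stopWords.contains(token)) {
                skipped++;
                continue;
            }
            int increment = enablePositionIncrements ? 1 + skipped : 1;
            out.add(token + ":" + increment);
            skipped = 0;
        }
        return out;
    }
}
```

    With the option disabled, the gap left by a removed stop word
    disappears, so a phrase query cannot tell "quick brown" apart from
    "quick the brown".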
3678
3679 13. LUCENE-1380: Added TokenFilter for setting position increment in special cases related to the ShingleFilter (Mck SembWever, Steve Rowe, Karl Wettin via Grant Ingersoll)
3680
3681
3682
3683 Optimizations
3684
3685  1. LUCENE-937: CachingTokenFilter now uses an iterator to access the
3686     Tokens that are cached in the LinkedList. This increases performance
3687     significantly, especially when the number of Tokens is large.
3688     (Mark Miller via Michael Busch)
3689
3690  2. LUCENE-843: Substantial optimizations to improve how IndexWriter
3691     uses RAM for buffering documents and to speed up indexing (2X-8X
3692     faster).  A single shared hash table now records the in-memory
3693     postings per unique term and is directly flushed into a single
3694     segment.  (Mike McCandless)
3695
3696  3. LUCENE-892: Fixed extra "buffer to buffer copy" that sometimes
3697     takes place when using compound files.  (Mike McCandless)
3698
3699  4. LUCENE-959: Remove synchronization in Document (yonik)
3700
3701  5. LUCENE-963: Add setters to Field to allow for re-using a single
3702     Field instance during indexing.  This is a sizable performance
3703     gain, especially for small documents.  (Mike McCandless)
3704
3705  6. LUCENE-939: Check explicitly for boundary conditions in FieldInfos
3706     and don't rely on exceptions. (Michael Busch)
3707
3708  7. LUCENE-966: Very substantial speedups (~6X faster) for
3709     StandardTokenizer (StandardAnalyzer) by using JFlex instead of
3710     JavaCC to generate the tokenizer.
3711     (Stanislaw Osinski via Mike McCandless)
3712
3713  8. LUCENE-969: Changed core tokenizers & filters to re-use Token and
3714     TokenStream instances when possible to improve tokenization
3715     performance (~10-15%). (Mike McCandless)
3716
3717  9. LUCENE-871: Speedup ISOLatin1AccentFilter (Ian Boston via Mike
3718     McCandless)
3719
3720 10. LUCENE-986: Refactored SegmentInfos from IndexReader into the new
3721     subclass DirectoryIndexReader. SegmentReader and MultiSegmentReader
3722     now extend DirectoryIndexReader and are the only IndexReader
3723     implementations that use SegmentInfos to access an index and
3724     acquire a write lock for index modifications. (Michael Busch)
3725
3726 11. LUCENE-1007: Allow flushing in IndexWriter to be triggered by
3727     either RAM usage or document count or both (whichever comes
3728     first), by adding symbolic constant DISABLE_AUTO_FLUSH to disable
3729     one of the flush triggers.  (Ning Li via Mike McCandless)
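
    The "whichever comes first" rule from LUCENE-1007 can be sketched as
    a simple predicate. FlushTriggerSketch and its parameter names are
    hypothetical, and the sentinel value -1 used for DISABLE_AUTO_FLUSH
    below is an assumption of this sketch.

```java
class FlushTriggerSketch {
    // Assumed sentinel meaning "this trigger is switched off".
    static final int DISABLE_AUTO_FLUSH = -1;

    // Returns true as soon as either enabled trigger has been reached.
    static boolean shouldFlush(int bufferedDocs, double ramUsedMB,
                               int maxBufferedDocs, double ramBufferSizeMB) {
        boolean byDocCount = maxBufferedDocs != DISABLE_AUTO_FLUSH
                && bufferedDocs >= maxBufferedDocs;
        boolean byRam = ramBufferSizeMB != DISABLE_AUTO_FLUSH
                && ramUsedMB >= ramBufferSizeMB;
        return byDocCount || byRam;
    }
}
```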
3730
3731 12. LUCENE-1043: Speed up merging of stored fields by bulk-copying the
3732     raw bytes for each contiguous range of non-deleted documents.
3733     (Robert Engels via Mike McCandless)
3734
3735 13. LUCENE-693: Speed up nested conjunctions (~2x) that match many
3736     documents, and a slight performance increase for top level
3737     conjunctions.  (yonik)
3738
3739 14. LUCENE-1098: Make inner class StandardAnalyzer.SavedStreams static
3740     and final. (Nathan Beyer via Michael Busch)
3741
3742 Documentation
3743
3744  1. LUCENE-1051: Generate separate javadocs for core, demo and contrib
3745     classes, as well as a unified view. Also add an appropriate menu
3746     structure to the website. (Michael Busch)
3747
3748  2. LUCENE-746: Fix error message in AnalyzingQueryParser.getPrefixQuery.
3749     (Ronnie Kolehmainen via Michael Busch)
3750
3751 Build
3752
3753  1. LUCENE-908: Improvements and simplifications for how the MANIFEST
3754     file and the META-INF dir are created. (Michael Busch)
3755
3756  2. LUCENE-935: Various improvements for the maven artifacts. Now the
3757     artifacts also include the sources as .jar files. (Michael Busch)
3758
3759  3. Added apply-patch target to top-level build.  Defaults to looking for
3760     a patch in ${basedir}/../patches with name specified by -Dpatch.name.
3761     Can also specify any location by -Dpatch.file property on the command
3762     line.  This should be helpful for easy application of patches, but it
3763     is also a step towards integrating automatic patch application with
3764     JIRA and Hudson, and is thus subject to change.  (Grant Ingersoll)
3765
3766  4. LUCENE-935: Defined property "m2.repository.url" to allow setting
3767     the url to a maven remote repository to deploy to. (Michael Busch)
3768
3769  5. LUCENE-1051: Include javadocs in the maven artifacts. (Michael Busch)
3770
3771  6. LUCENE-1055: Remove gdata-server from build files and its sources
3772     from trunk. (Michael Busch)
3773
3774  7. LUCENE-935: Allow deploying maven artifacts to a remote m2 repository
3775     via scp and ssh authentication. (Michael Busch)
3776
3777  8. LUCENE-1123: Allow overriding the specification version for
3778     MANIFEST.MF (Michael Busch)
3779
3780 Test Cases
3781
3782  1. LUCENE-766: Test adding two fields with the same name but different
3783     term vector setting.  (Nicolas Lalevée via Doron Cohen)
3784
3785 ======================= Release 2.2.0 =======================
3786
3787 Changes in runtime behavior
3788
3789 API Changes
3790
3791  1. LUCENE-793: created new exceptions and added them to throws clause
3792     for many methods (all subclasses of IOException for backwards
3793     compatibility): index.StaleReaderException,
3794     index.CorruptIndexException, store.LockObtainFailedException.
3795     This was done to better call out the possible root causes of an
3796     IOException from these methods.  (Mike McCandless)
3797
3798  2. LUCENE-811: make SegmentInfos class, plus a few methods from related
3799     classes, package-private again (they were unnecessarily made public
3800     as part of LUCENE-701).  (Mike McCandless)
3801
3802  3. LUCENE-710: added optional autoCommit boolean to IndexWriter
3803     constructors.  When this is false, index changes are not committed
3804     until the writer is closed.  This gives explicit control over when
3805     a reader will see the changes.  Also added optional custom
3806     deletion policy to explicitly control when prior commits are
3807     removed from the index.  This is intended to allow applications to
3808     share an index over NFS by customizing when prior commits are
3809     deleted. (Mike McCandless)
3810
3811  4. LUCENE-818: changed most public methods of IndexWriter,
3812     IndexReader (and its subclasses), FieldsReader and RAMDirectory to
3813     throw AlreadyClosedException if they are accessed after being
3814     closed.  (Mike McCandless)
3815
3816  5. LUCENE-834: Changed some access levels for certain Span classes to allow them
3817     to be overridden.  They have been marked expert only and not for public
3818     consumption. (Grant Ingersoll)
3819
3820  6. LUCENE-796: Removed calls to super.* from various get*Query methods in
3821     MultiFieldQueryParser, in order to allow sub-classes to override them.
3822     (Steven Parkes via Otis Gospodnetic)
3823
3824  7. LUCENE-857: Removed caching from QueryFilter and deprecated QueryFilter
3825     in favour of QueryWrapperFilter or QueryWrapperFilter + CachingWrapperFilter
3826     combination when caching is desired.
3827     (Chris Hostetter, Otis Gospodnetic)
3828
3829  8. LUCENE-869: Changed FSIndexInput and FSIndexOutput to inner classes of FSDirectory
3830     to enable extensibility of these classes. (Michael Busch)
3831
3832  9. LUCENE-580: Added the public method reset() to TokenStream. This method does
3833     nothing by default, but may be overridden by subclasses to support consuming
3834     the TokenStream more than once. (Michael Busch)
3835
3836 10. LUCENE-580: Added a new constructor to Field that takes a TokenStream as
3837     argument, available as tokenStreamValue(). This is useful to avoid the need of
3838     "dummy analyzers" for pre-analyzed fields. (Karl Wettin, Michael Busch)
3839
3840 11. LUCENE-730: Added the new methods to BooleanQuery setAllowDocsOutOfOrder() and
3841     getAllowDocsOutOfOrder(). Deprecated the methods setUseScorer14() and
3842     getUseScorer14(). The optimization patch LUCENE-730 (see Optimizations->3.)
3843     improves performance for certain queries but results in scoring out of docid
3844     order. This patch reverses this change, so now by default hit docs are scored
3845     in docid order unless setAllowDocsOutOfOrder(true) is explicitly called.
3846     This patch also enables the tests in QueryUtils again that check for docid
3847     order. (Paul Elschot, Doron Cohen, Michael Busch)
3848
3849 12. LUCENE-888: Added Directory.openInput(File path, int bufferSize)
3850     to optionally specify the size of the read buffer.  Also added
3851     BufferedIndexInput.setBufferSize(int) to change the buffer size.
3852     (Mike McCandless)
3853
3854 13. LUCENE-923: Make SegmentTermPositionVector package-private. It does not need
3855     to be public because it implements the public interface TermPositionVector.
3856     (Michael Busch)
3857
3858 Bug fixes
3859
3860  1. LUCENE-804: Fixed build.xml to pack a fully compilable src dist.  (Doron Cohen)
3861
3862  2. LUCENE-813: Leading wildcard fixed to work with trailing wildcard.
3863     Query parser modified to create a prefix query only for the case
3864     that there is a single trailing wildcard (and no additional wildcard
3865     or '?' in the query text).  (Doron Cohen)
3866
3867  3. LUCENE-812: Add no-argument constructors to NativeFSLockFactory
3868     and SimpleFSLockFactory.  This enables all 4 builtin LockFactory
3869     implementations to be specified via the System property
3870     org.apache.lucene.store.FSDirectoryLockFactoryClass.  (Mike McCandless)
3871
3872  4. LUCENE-821: The new single-norm-file introduced by LUCENE-756
3873     failed to reduce the number of open descriptors since it was still
3874     opened once per field with norms. (yonik)
3875
3876  5. LUCENE-823: Make sure internal file handles are closed when
3877     hitting an exception (eg disk full) while flushing deletes in
3878     IndexWriter's mergeSegments, and also during
3879     IndexWriter.addIndexes.  (Mike McCandless)
3880
3881  6. LUCENE-825: If directory is removed after
3882     FSDirectory.getDirectory() but before IndexReader.open you now get
3883     a FileNotFoundException like Lucene pre-2.1 (before this fix you
3884     got an NPE).  (Mike McCandless)
3885
3886  7. LUCENE-800: Removed backslash from the TERM_CHAR list in the queryparser,
3887     because the backslash is the escape character. Also changed the ESCAPED_CHAR
3888     list to contain all possible characters, because every character that
3889     follows a backslash should be considered as escaped. (Michael Busch)
3890
3891  8. LUCENE-372: QueryParser.parse() now ensures that the entire input string
3892     is consumed. Now a ParseException is thrown if a query contains too many
3893     closing parentheses. (Andreas Neumann via Michael Busch)
3894
3895  9. LUCENE-814: javacc build targets now fix line-end-style of generated files.
3896     Now also deleting all javacc generated files before calling javacc.
3897     (Steven Parkes, Doron Cohen)
3898
3899 10. LUCENE-829: close readers in contrib/benchmark. (Karl Wettin, Doron Cohen)
3900
3901 11. LUCENE-828: Minor fix for Term's equals().
3902     (Paul Cowan via Otis Gospodnetic)
3903
3904 12. LUCENE-846: Fixed: if IndexWriter is opened with autoCommit=false,
3905     and you call addIndexes, and hit an exception (eg disk full) then
3906     when IndexWriter rolls back its internal state this could corrupt
3907     the instance of IndexWriter (but, not the index itself) by
3908     referencing already deleted segments.  This bug was only present
3909     in 2.2 (trunk), ie was never released.  (Mike McCandless)
3910
3911 13. LUCENE-736: Sloppy phrase query with repeating terms matches wrong docs.
3912     For example query "B C B"~2 matches the doc "A B C D E". (Doron Cohen)
3913
3914 14. LUCENE-789: Fixed: custom similarity is ignored when using MultiSearcher (problem reported
3915     by Alexey Lef). Now the similarity applied by MultiSearcher.setSimilarity(sim) is being used.
3916     Note that as before this fix, creating a multiSearcher from Searchers for whom custom similarity
3917     was set has no effect - it is masked by the similarity of the MultiSearcher. This is as
3918     designed, because MultiSearcher operates on Searchables (not Searchers). (Doron Cohen)
3919
3920 15. LUCENE-880: Fixed DocumentWriter to close the TokenStreams after it
3921     has written the postings. Then the resources associated with the
3922     TokenStreams can safely be released. (Michael Busch)
3923
3924 16. LUCENE-883: consecutive calls to Spellchecker.indexDictionary()
3925     won't insert terms twice anymore. (Daniel Naber)
3926
3927 17. LUCENE-881: QueryParser.escape() now also escapes the characters
3928     '|' and '&' which are part of the queryparser syntax. (Michael Busch)
3929
3930 18. LUCENE-886: Spellchecker clean up: exceptions are no longer printed to
3931     STDERR and ignored, but are re-thrown. Some javadoc improvements.
3932     (Daniel Naber)
3933
3934 19. LUCENE-698: FilteredQuery now takes the query boost into account for
3935     scoring. (Michael Busch)
3936
3937 20. LUCENE-763: Spellchecker: LuceneDictionary used to skip first word in
3938     enumeration. (Christian Mallwitz via Daniel Naber)
3939
3940 21. LUCENE-903: FilteredQuery explanation inaccuracy with boost.
3941     Explanation tests now "deep" check the explanation details.
3942     (Chris Hostetter, Doron Cohen)
3943
3944 22. LUCENE-912: DisjunctionMaxScorer first skipTo(target) call ignores the
3945     skip target param and ends up at the first match.
3946     (Sudaakeran B. via Chris Hostetter & Doron Cohen)
3947
3948 23. LUCENE-913: Two consecutive score() calls return different
3949     scores for Boolean Queries. (Michael Busch, Doron Cohen)
3950
3951 24. LUCENE-1013: Fix IndexWriter.setMaxMergeDocs to work "out of the
3952     box", again, by moving set/getMaxMergeDocs up from
3953     LogDocMergePolicy into LogMergePolicy.  This fixes the API
3954     breakage (non backwards compatible change) caused by LUCENE-994.
3955     (Yonik Seeley via Mike McCandless)
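
The escaping described in item 17 above (LUCENE-881) can be sketched as
follows. This is not QueryParser's own code: the class QueryEscaper and the
exact set of special characters in SPECIAL are assumptions for illustration;
the entry itself only states that '|' and '&' are now escaped in addition to
the characters handled before.

```java
// Hypothetical stand-in for the escaping described above, not Lucene code.
final class QueryEscaper {

    // Assumed set of query-syntax characters. '|' and '&' are the two
    // characters the changelog entry adds; the rest are an assumption.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    /** Prefixes every special character in s with a backslash. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a&&b||c"));
    }
}
```

Escaping user input this way keeps "&&" and "||" from being read as boolean
operators when the text is meant literally.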
3956
3957 New features
3958
3959  1. LUCENE-759: Added two n-gram-producing TokenFilters.
3960     (Otis Gospodnetic)
3961
3962  2. LUCENE-822: Added FieldSelector capabilities to Searchable for use with
3963     RemoteSearcher, and other Searchable implementations. (Mark Miller, Grant Ingersoll)
3964
3965  3. LUCENE-755: Added the ability to store arbitrary binary metadata in the posting list.
3966     These metadata are called Payloads. For every position of a Token one Payload in the form
3967     of a variable length byte array can be stored in the prox file.
3968     Remark: The APIs introduced with this feature are in experimental state and thus
3969             contain appropriate warnings in the javadocs.
3970     (Michael Busch)
3971
3972  4. LUCENE-834: Added BoostingTermQuery which can boost scores based on the
3973     values of a payload (see #3 above.) (Grant Ingersoll)
3974
3975  5. LUCENE-834: Similarity has a new method for scoring payloads called
3976     scorePayloads that can be overridden to take advantage of payload
3977     storage (see #3 above)
3978
3979  6. LUCENE-834: Added isPayloadAvailable() onto TermPositions interface and
3980     implemented it in the appropriate places (Grant Ingersoll)
3981
3982  7. LUCENE-853: Added RemoteCachingWrapperFilter to enable caching of Filters
3983     on the remote side of the RMI connection.
3984     (Matt Ericson via Otis Gospodnetic)
3985
3986  8. LUCENE-446: Added Solr's search.function for scores based on field
3987     values, plus CustomScoreQuery for simple score (post) customization.
3988     (Yonik Seeley, Doron Cohen)
3989
3990  9. LUCENE-1058: Added new TeeTokenFilter (like the UNIX 'tee' command) and SinkTokenizer which can be used to share tokens between two or more
3991     Fields such that the other Fields do not have to go through the whole Analysis process over again.  For instance, if you have two
3992     Fields that share all the same analysis steps except one lowercases tokens and the other does not, you can coordinate the operations
3993     between the two using the TeeTokenFilter and the SinkTokenizer.  See TeeSinkTokenTest.java for examples.
3994     (Grant Ingersoll, Michael Busch, Yonik Seeley)
3995
3996 Optimizations
3997
3998  1. LUCENE-761: The proxStream is now cloned lazily in SegmentTermPositions
3999     when nextPosition() is called for the first time. This allows using instances
4000     of SegmentTermPositions instead of SegmentTermDocs without additional costs.
4001     (Michael Busch)
4002
4003  2. LUCENE-431: RAMInputStream and RAMOutputStream extend IndexInput and
4004     IndexOutput directly now. This avoids further buffering and thus avoids
4005     unnecessary array copies. (Michael Busch)
4006
4007  3. LUCENE-730: Updated BooleanScorer2 to make use of BooleanScorer in some
4008     cases and possibly improve scoring performance.  Documents can now be
4009     delivered out-of-order as they are scored (e.g. to HitCollector).
4010     N.B. A bit of code had to be disabled in QueryUtils in order for
4011     TestBoolean2 test to keep passing.
4012     (Paul Elschot via Otis Gospodnetic)
4013
4014  4. LUCENE-882: Spellchecker doesn't store the ngrams anymore but only indexes
4015     them to keep the spell index small. (Daniel Naber)
4016
4017  5. LUCENE-430: Delay allocation of the buffer after a clone of BufferedIndexInput.
4018     Together with LUCENE-888 this will allow adjusting the buffer size
4019     dynamically. (Paul Elschot, Michael Busch)
4020
4021  6. LUCENE-888: Increase buffer sizes inside CompoundFileWriter and
4022     BufferedIndexOutput.  Also increase buffer size in
4023     BufferedIndexInput, but only when used during merging.  Together,
4024     these increases yield 10-18% overall performance gain vs the
4025     previous 1K defaults.  (Mike McCandless)
4026
4027  7. LUCENE-866: Adds multi-level skip lists to the posting lists. This speeds
4028     up most queries that use skipTo(), especially on big indexes with large posting
4029     lists. For average AND queries the speedup is about 20%, for queries that
4030     contain very frequent and very unique terms the speedup can be over 80%.
4031     (Michael Busch)
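
The idea behind item 7 above (LUCENE-866) can be illustrated with a simplified
sketch. It is not the on-disk format or the reader that Lucene uses: the class
MultiLevelSkipSketch is hypothetical, the posting list is a plain sorted int
array, and the skip entries are implicit (every interval-th position on level
1, every interval^2-th on level 2, and so on) rather than stored. It only
shows why several skip levels let skipTo() pass over long runs of postings in
a few steps.

```java
// Hypothetical illustration of multi-level skipping, not Lucene code.
final class MultiLevelSkipSketch {
    private final int[] docs;     // sorted doc IDs of one posting list
    private final int interval;   // assumed skip interval, must be >= 2
    private final long topStep;   // step size of the highest skip level

    MultiLevelSkipSketch(int[] sortedDocs, int interval) {
        if (interval < 2) {
            throw new IllegalArgumentException("interval must be >= 2");
        }
        this.docs = sortedDocs;
        this.interval = interval;
        long step = 1;
        while (step * interval < sortedDocs.length) {
            step *= interval;     // largest power of interval below the list length
        }
        this.topStep = step;
    }

    /** Returns the first doc ID >= target, or -1 if there is none. */
    int skipTo(int target) {
        int pos = 0;
        // Walk from the coarsest level down to the finest one. Every doc
        // before pos is known to be smaller than target.
        for (long step = topStep; step >= interval; step /= interval) {
            while (pos + step < docs.length && docs[(int) (pos + step)] < target) {
                pos += step;
            }
        }
        // Finish with a short linear scan inside the last interval.
        while (pos < docs.length && docs[pos] < target) {
            pos++;
        }
        return pos < docs.length ? docs[pos] : -1;
    }

    public static void main(String[] args) {
        int[] docs = new int[100];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i * 3;
        }
        System.out.println(new MultiLevelSkipSketch(docs, 4).skipTo(100));
    }
}
```

With a single level, a far-away target still costs one comparison per
interval; the extra levels reduce that to a few comparisons per level.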
4032
4033 Documentation
4034
4035  1. LUCENE-791 && INFRA-1173: Infrastructure moved the Wiki to
4036     http://wiki.apache.org/lucene-java/   Updated the links in the docs and
4037     wherever else I found references.  (Grant Ingersoll, Joe Schaefer)
4038
4039  2. LUCENE-807: Fixed the javadoc for ScoreDocComparator.compare() to be
4040     consistent with java.util.Comparator.compare(): Any integer is allowed to
4041     be returned instead of only -1/0/1.
4042     (Paul Cowan via Michael Busch)
4043
4044  3. LUCENE-875: Solved javadoc warnings & errors under jdk1.4.
4045     Solved javadoc errors under jdk5 (jars in path for gdata).
4046     Made "javadocs" target depend on "build-contrib" for first downloading
4047     contrib jars configured for dynamic download. (Note: when running
4048     behind firewall, a firewall prompt might pop up) (Doron Cohen)
4049
4050  4. LUCENE-740: Added SNOWBALL-LICENSE.txt to the snowball package and a
4051     remark about the license to NOTICE.TXT. (Steven Parkes via Michael Busch)
4052
4053  5. LUCENE-925: Added analysis package javadocs. (Grant Ingersoll and Doron Cohen)
4054
4055  6. LUCENE-926: Added document package javadocs. (Grant Ingersoll)
4056
4057 Build
4058
4059  1. LUCENE-802: Added LICENSE.TXT and NOTICE.TXT to Lucene jars.
4060     (Steven Parkes via Michael Busch)
4061
4062  2. LUCENE-885: "ant test" now includes all contrib tests.  The new
4063     "ant test-core" target can be used to run only the Core (non
4064     contrib) tests.
4065     (Chris Hostetter)
4066
4067  3. LUCENE-900: "ant test" now enables Java assertions (in Lucene packages).
4068     (Doron Cohen)
4069
4070  4. LUCENE-894: Add custom build file for binary distributions that includes
4071     targets to build the demos. (Chris Hostetter, Michael Busch)
4072
4073  5. LUCENE-904: The "package" targets in build.xml now also generate .md5
4074     checksum files. (Chris Hostetter, Michael Busch)
4075
4076  6. LUCENE-907: Include LICENSE.TXT and NOTICE.TXT in the META-INF dirs of
4077     demo war, demo jar, and the contrib jars. (Michael Busch)
4078
4079  7. LUCENE-909: Demo targets for running the demo. (Doron Cohen)
4080
4081  8. LUCENE-908: Improves content of MANIFEST file and makes it customizable
4082     for the contribs. Adds SNOWBALL-LICENSE.txt to META-INF of the snowball
4083     jar and makes sure that the lucli jar contains LICENSE.txt and NOTICE.txt.
4084     (Chris Hostetter, Michael Busch)
4085
4086  9. LUCENE-930: Various contrib building improvements to ensure contrib
4087     dependencies are met, and test compilation errors fail the build.
4088     (Steven Parkes, Chris Hostetter)
4089
4090 10. LUCENE-622: Add ant target and pom.xml files for building maven artifacts
4091     of the Lucene core and the contrib modules.
4092     (Sami Siren, Karl Wettin, Michael Busch)
4093
4094 ======================= Release 2.1.0 =======================
4095
4096 Changes in runtime behavior
4097
4098  1. 's' and 't' have been removed from the list of default stopwords
4099     in StopAnalyzer (also used by StandardAnalyzer). Having e.g. 's'
4100     as a stopword meant that 's-class' led to the same results as 'class'.
4101     Note that this problem still exists for 'a', e.g. in 'a-class' as
4102     'a' continues to be a stopword.
4103     (Daniel Naber)
4104
4105  2. LUCENE-478: Updated the list of Unicode code point ranges for CJK
4106     (now split into CJ and K) in StandardAnalyzer.  (John Wang and
4107     Steven Rowe via Otis Gospodnetic)
4108
4109  3. Modified some CJK Unicode code point ranges in StandardTokenizer.jj,
4110     and added a few more of them to increase CJK character coverage.
4111     Also documented some of the ranges.
4112     (Otis Gospodnetic)
4113
4114  4. LUCENE-489: Add support for leading wildcard characters (*, ?) to
4115     QueryParser.  Default is to disallow them, as before.
4116     (Steven Parkes via Otis Gospodnetic)
4117
4118  5. LUCENE-703: QueryParser changed to default to use of ConstantScoreRangeQuery
4119     for range queries. Added useOldRangeQuery property to QueryParser to allow
4120     selection of old RangeQuery class if required.
4121     (Mark Harwood)
4122
4123  6. LUCENE-543: WildcardQuery now performs a TermQuery if the provided term
4124     does not contain a wildcard character (? or *), when previously a
4125     StringIndexOutOfBoundsException was thrown.
4126     (Michael Busch via Erik Hatcher)
4127
4128  7. LUCENE-726: Removed the use of deprecated doc.fields() method and
4129     Enumeration.
4130     (Michael Busch via Otis Gospodnetic)
4131
4132  8. LUCENE-436: Removed finalize() in TermInfosReader and SegmentReader,
4133     and added a call to enumerators.remove() in TermInfosReader.close().
4134     The finalize() overrides were added to help with a pre-1.4.2 JVM bug
4135     that has since been fixed, plus we no longer support pre-1.4.2 JVMs.
4136     (Otis Gospodnetic)
4137
4138  9. LUCENE-771: The default location of the write lock is now the
4139     index directory, and is named simply "write.lock" (without a big
4140     digest prefix).  The system properties "org.apache.lucene.lockDir"
4141     and "java.io.tmpdir" are no longer used as the global directory
4142     for storing lock files, and the LOCK_DIR field of FSDirectory is
4143     now deprecated.  (Mike McCandless)
4144
4145 New features
4146
4147  1. LUCENE-503: New ThaiAnalyzer and ThaiWordFilter in contrib/analyzers
4148     (Samphan Raruenrom via Chris Hostetter)
4149
4150  2. LUCENE-545: New FieldSelector API and associated changes to
4151     IndexReader and implementations.  New Fieldable interface for use
4152     with the lazy field loading mechanism.  (Grant Ingersoll and Chuck
4153     Williams via Grant Ingersoll)
4154
4155  3. LUCENE-676: Move Solr's PrefixFilter to Lucene core. (Yura
4156     Smolsky, Yonik Seeley)
4157
4158  4. LUCENE-678: Added NativeFSLockFactory, which implements locking
4159     using OS native locking (via java.nio.*).  (Michael McCandless via
4160     Yonik Seeley)
4161
4162  5. LUCENE-544: Added the ability to specify different boosts for
4163     different fields when using MultiFieldQueryParser (Matt Ericson
4164     via Otis Gospodnetic)
4165
4166  6. LUCENE-528: New IndexWriter.addIndexesNoOptimize() that doesn't
4167     optimize the index when adding new segments, only performing
4168     merges as needed.  (Ning Li via Yonik Seeley)
4169
4170  7. LUCENE-573: QueryParser now allows backslash escaping in
4171     quoted terms and phrases. (Michael Busch via Yonik Seeley)
4172
4173  8. LUCENE-716: QueryParser now allows specification of Unicode
4174     characters in terms via a unicode escape of the form \uXXXX
4175     (Michael Busch via Yonik Seeley)
4176
4177  9. LUCENE-709: Added RAMDirectory.sizeInBytes(), IndexWriter.ramSizeInBytes()
4178     and IndexWriter.flushRamSegments(), allowing applications to
4179     control the amount of memory used to buffer documents.
4180     (Chuck Williams via Yonik Seeley)
4181
4182 10. LUCENE-723: QueryParser now parses *:* as MatchAllDocsQuery
4183     (Yonik Seeley)
4184
4185 11. LUCENE-741: Command-line utility for modifying or removing norms
4186     on fields in an existing index.  This is mostly based on LUCENE-496
4187     and lives in contrib/miscellaneous.
4188     (Chris Hostetter, Otis Gospodnetic)
4189
4190 12. LUCENE-759: Added NGramTokenizer and EdgeNGramTokenizer classes and
4191     their passing unit tests.
4192     (Otis Gospodnetic)
4193
4194 13. LUCENE-565: Added methods to IndexWriter to more efficiently
4195     handle updating documents (the "delete then add" use case).  This
4196     is intended to be an eventual replacement for the existing
4197     IndexModifier.  Added IndexWriter.flush() (renamed from
4198     flushRamSegments()) to flush all pending updates (held in RAM), to
4199     the Directory.  (Ning Li via Mike McCandless)
4200
4201 14. LUCENE-762: Added in SIZE and SIZE_AND_BREAK FieldSelectorResult options
4202     which allow one to retrieve the size of a field without retrieving the
4203     actual field. (Chuck Williams via Grant Ingersoll)
4204
4205 15. LUCENE-799: Properly handle lazy, compressed fields.
4206     (Mike Klaas via Grant Ingersoll)
4207
4208 API Changes
4209
4210  1. LUCENE-438: Remove "final" from Token, implement Cloneable, allow
4211     changing of termText via setTermText().  (Yonik Seeley)
4212
4213  2. org.apache.lucene.analysis.nl.WordlistLoader has been deprecated
4214     and is supposed to be replaced with the WordlistLoader class in
4215     package org.apache.lucene.analysis (Daniel Naber)
4216
4217  3. LUCENE-609: Revert return type of Document.getField(s) to Field
4218     for backward compatibility, added new Document.getFieldable(s)
4219     for access to new lazy loaded fields. (Yonik Seeley)
4220
4221  4. LUCENE-608: Document.fields() has been deprecated and a new method
4222     Document.getFields() has been added that returns a List instead of
4223     an Enumeration (Daniel Naber)
4224
4225  5. LUCENE-605: New Explanation.isMatch() method and new ComplexExplanation
4226     subclass allows explain methods to produce Explanations which model
4227     "matching" independent of having a positive value.
4228     (Chris Hostetter)
4229
4230  6. LUCENE-621: New static methods IndexWriter.setDefaultWriteLockTimeout
4231     and IndexWriter.setDefaultCommitLockTimeout for overriding default
4232     timeout values for all future instances of IndexWriter (as well
4233     as for any other classes that may reference the static values,
4234     ie: IndexReader).
4235     (Michael McCandless via Chris Hostetter)
4236
4237  7. LUCENE-638: FSDirectory.list() now only returns the directory's
4238     Lucene-related files. Thanks to this change one can now construct
4239     a RAMDirectory from a file system directory that contains files
4240     not related to Lucene.
4241     (Simon Willnauer via Daniel Naber)
4242
4243  8. LUCENE-635: Decoupling locking implementation from Directory
4244     implementation.  Added set/getLockFactory to Directory and moved
4245     all locking code into subclasses of abstract class LockFactory.
4246     FSDirectory and RAMDirectory still default to their prior locking
4247     implementations, but now you can mix & match, for example using
4248     SingleInstanceLockFactory (ie, in memory locking) with an
4249     FSDirectory.  Note that now you must call setDisableLocks before
4250     instantiating an FSDirectory if you wish to disable locking
4251     for that Directory.
4252     (Michael McCandless, Jeff Patterson via Yonik Seeley)
4253
4254  9. LUCENE-657: Made FuzzyQuery non-final and inner ScoreTerm protected.
4255     (Steven Parkes via Otis Gospodnetic)
4256
4257 10. LUCENE-701: Lockless commits: a commit lock is no longer required
4258     when a writer commits and a reader opens the index.  This includes
4259     a change to the index file format (see docs/fileformats.html for
4260     details).  It also removes all APIs associated with the commit
4261     lock & its timeout.  Readers are now truly read-only and do not
4262     block one another on startup.  This is the first step to getting
4263     Lucene to work correctly over NFS (second step is
4264     LUCENE-710). (Mike McCandless)
4265
4266 11. LUCENE-722: DEFAULT_MIN_DOC_FREQ was misspelled DEFALT_MIN_DOC_FREQ
4267     in Similarity's MoreLikeThis class. The misspelling has been
4268     replaced by the correct spelling.
4269     (Andi Vajda via Daniel Naber)
4270
4271 12. LUCENE-738: Reduce the size of the file that keeps track of which
4272     documents are deleted when the number of deleted documents is
4273     small.  This changes the index file format and cannot be
4274     read by previous versions of Lucene.  (Doron Cohen via Yonik Seeley)
4275
4276 13. LUCENE-756: Maintain all norms in a single .nrm file to reduce the
4277     number of open files and file descriptors for the non-compound index
4278     format.  This changes the index file format, but maintains the
4279     ability to read and update older indices. The first segment merge
4280     on an older format index will create a single .nrm file for the new
4281     segment.  (Doron Cohen via Yonik Seeley)
4282
4283 14. LUCENE-732: DateTools support has been added to QueryParser, with
4284     setters for both the default Resolution, and per-field Resolution.
4285     For backwards compatibility, DateField is still used if no Resolutions
4286     are specified. (Michael Busch via Chris Hostetter)
4287
4288 15. Added isOptimized() method to IndexReader.
4289     (Otis Gospodnetic)
4290
4291 16. LUCENE-773: Deprecate the FSDirectory.getDirectory(*) methods that
4292     take a boolean "create" argument.  Instead you should use
4293     IndexWriter's "create" argument to create a new index.
4294     (Mike McCandless)
4295
4296 17. LUCENE-780: Add a static Directory.copy() method to copy files
4297     from one Directory to another.  (Jiri Kuhn via Mike McCandless)
4298
4299 18. LUCENE-773: Added Directory.clearLock(String name) to forcefully
4300     remove an old lock.  The default implementation is to ask the
4301     lockFactory (if non null) to clear the lock.  (Mike McCandless)
4302
4303 19. LUCENE-795: Directory.renameFile() has been deprecated as it is
4304     not used anymore inside Lucene.  (Daniel Naber)
4305
4306 Bug fixes
4307
4308  1. Fixed the web application demo (built with "ant war-demo") which
4309     didn't work because it used a QueryParser method that had
4310     been removed (Daniel Naber)
4311
4312  2. LUCENE-583: ISOLatin1AccentFilter fails to preserve positionIncrement
4313     (Yonik Seeley)
4314
4315  3. LUCENE-575: SpellChecker min score is incorrectly changed by suggestSimilar
4316     (Karl Wettin via Yonik Seeley)
4317
4318  4. LUCENE-587: Explanation.toHtml was producing malformed HTML
4319     (Chris Hostetter)
4320
4321  5. Fix to allow MatchAllDocsQuery to be used with RemoteSearcher (Yonik Seeley)
4322
4323  6. LUCENE-601: RAMDirectory and RAMFile made Serializable
4324     (Karl Wettin via Otis Gospodnetic)
4325
4326  7. LUCENE-557: Fixes to BooleanQuery and FilteredQuery so that the score
4327     Explanations match up with the real scores.
4328     (Chris Hostetter)
4329
4330  8. LUCENE-607: ParallelReader's TermEnum fails to advance properly to
4331     new fields (Chuck Williams, Christian Kohlschuetter via Yonik Seeley)
4332
4333  9. LUCENE-610,LUCENE-611: Simple syntax changes to allow compilation with ecj:
4334     disambiguate inner class scorer's use of doc() in BooleanScorer2,
4335     other test code changes.  (DM Smith via Yonik Seeley)
4336
4337 10. LUCENE-451: All core query types now use ComplexExplanations so that
4338     boosts of zero don't confuse the BooleanWeight explain method.
4339     (Chris Hostetter)
4340
4341 11. LUCENE-593: Fixed LuceneDictionary's inner Iterator
4342     (Kåre Fiedler Christiansen via Otis Gospodnetic)
4343
4344 12. LUCENE-641: fixed an off-by-one bug with IndexWriter.setMaxFieldLength()
4345     (Daniel Naber)
4346
4347 13. LUCENE-659: Make PerFieldAnalyzerWrapper delegate getPositionIncrementGap()
4348     to the correct analyzer for the field. (Chuck Williams via Yonik Seeley)
4349
4350 14. LUCENE-650: Fixed NPE in Locale specific String Sort when Document
4351     has no value.
4352     (Oliver Hutchison via Chris Hostetter)
4353
4354 15. LUCENE-683: Fixed data corruption when reading lazy loaded fields.
4355     (Yonik Seeley)
4356
4357 16. LUCENE-678: Fixed bug in NativeFSLockFactory which caused the same
4358     lock to be shared between different directories.
4359     (Michael McCandless via Yonik Seeley)
4360
4361 17. LUCENE-690: Fixed thread unsafe use of IndexInput by lazy loaded fields.
4362     (Yonik Seeley)
4363
4364 18. LUCENE-696: Fix bug when scorer for DisjunctionMaxQuery has skipTo()
4365     called on it before next().  (Yonik Seeley)
4366
4367 19. LUCENE-569: Fixed SpanNearQuery bug, for 'inOrder' queries it would fail
4368     to recognize ordered spans if they overlapped with unordered spans.
4369     (Paul Elschot via Chris Hostetter)
4370
4371 20. LUCENE-706: Updated fileformats.xml|html concerning the docdelta value
4372     in the frequency file. (Johan Stuyts, Doron Cohen via Grant Ingersoll)
4373
4374 21. LUCENE-715: Fixed private constructor in IndexWriter.java to
4375     properly release the acquired write lock if there is an
4376     IOException after acquiring the write lock but before finishing
4377     instantiation. (Matthew Bogosian via Mike McCandless)
4378
4379 22. LUCENE-651: Multiple different threads requesting the same
4380     FieldCache entry (often for Sorting by a field) at the same
4381     time caused multiple generations of that entry, which was
4382     detrimental to performance and memory use.
4383     (Oliver Hutchison via Otis Gospodnetic)
4384
4385 23. LUCENE-717: Fixed build.xml not to fail when there is no lib dir.
4386     (Doron Cohen via Otis Gospodnetic)
4387
4388 24. LUCENE-728: Removed duplicate/old MoreLikeThis and SimilarityQueries
4389     classes from contrib/similarity, as their new home is under
4390     contrib/queries.
4391     (Otis Gospodnetic)
4392
4393 25. LUCENE-669: Do not double-close the RandomAccessFile in
4394     FSIndexInput/Output during finalize().  Besides sending an
4395     IOException up to the GC, this may also be the cause of intermittent
4396     "The handle is invalid" IOExceptions on Windows when trying to
4397     close readers or writers. (Michael Busch via Mike McCandless)
4398
4399 26. LUCENE-702: Fix IndexWriter.addIndexes(*) to not corrupt the index
4400     on any exceptions (eg disk full).  The semantics of these methods
4401     are now transactional: either all indices are merged or none are.
4402     Also fixed IndexWriter.mergeSegments (called outside of
4403     addIndexes(*) by addDocument, optimize, flushRamSegments) and
4404     IndexReader.commit() (called by close) to clean up and keep the
4405     instance state consistent with what's actually in the index.  (Mike
4406     McCandless)
4407
4408 27. LUCENE-129: Change finalizers to do "try {...} finally
4409     {super.finalize();}" to make sure we don't miss finalizers in
4410     classes above us. (Esmond Pitt via Mike McCandless)
4411
4412 28. LUCENE-754: Fix a problem introduced by LUCENE-651, causing
4413     IndexReaders to hang around forever, in addition to not
4414     fixing the original FieldCache performance problem.
4415     (Chris Hostetter, Yonik Seeley)
4416
4417 29. LUCENE-140: Fix IndexReader.deleteDocument(int docNum) to
4418     correctly raise ArrayIndexOutOfBoundsException when docNum is too
4419     large.  Previously, if docNum was only slightly too large (within
4420     the same multiple of 8, ie, up to 7 ints beyond maxDoc), no
4421     exception would be raised and instead the index would become
4422     silently corrupted.  The corruption then only appears much later,
4423     in mergeSegments, when the corrupted segment is merged with
4424     segment(s) after it. (Mike McCandless)
4425
4426 30. LUCENE-768: Fix case where an Exception during deleteDocument,
4427     undeleteAll or setNorm in IndexReader could leave the reader in a
4428     state where close() fails to release the write lock.
4429     (Mike McCandless)
4430
4431 31. Remove "tvp" from known index file extensions because it is
4432     never used. (Nicolas Lalevée via Bernhard Messer)
4433
4434 32. LUCENE-767: Change how SegmentReader.maxDoc() is computed to not
4435     rely on file length check and instead use the SegmentInfo's
4436     docCount that's already stored explicitly in the index.  This is a
4437     defensive bug fix (ie, there is no known problem seen "in real
4438     life" due to this, just a possible future problem).  (Chuck
4439     Williams via Mike McCandless)
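
Illustration for item 29 (LUCENE-140): a minimal sketch of why a
byte-backed bit vector needs an explicit bounds check.  The class
ByteBitVector and its method names are hypothetical and are not Lucene's
implementation; the sketch only assumes that one bit per document is
stored in a byte array that is rounded up to whole bytes.

```java
/** Hypothetical byte-backed bit vector; not Lucene's actual class. */
final class ByteBitVector {
    private final byte[] bits;
    private final int size; // number of valid bits (for example, maxDoc)

    ByteBitVector(int size) {
        this.size = size;
        this.bits = new byte[(size + 7) >> 3]; // round up to whole bytes
    }

    /** Unchecked set: silently accepts indexes that fall in the padding of the last byte. */
    void setUnchecked(int bit) {
        bits[bit >> 3] |= (byte) (1 << (bit & 7));
    }

    /** Checked set: rejects any index outside [0, size). */
    void set(int bit) {
        if (bit < 0 || bit >= size) {
            throw new ArrayIndexOutOfBoundsException(bit);
        }
        setUnchecked(bit);
    }

    public static void main(String[] args) {
        ByteBitVector v = new ByteBitVector(10); // 10 valid bits, 2 bytes allocated
        v.setUnchecked(13);                      // beyond size, but no exception is raised
        try {
            v.set(13);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("rejected out-of-range bit");
        }
    }
}
```

With only the array's own bounds check, an index up to 7 positions past
the valid range lands in the last allocated byte and is accepted, which
matches the "slightly too large" case described in the item.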
4440
4441 Optimizations
4442
4443   1. LUCENE-586: TermDocs.skipTo() is now more efficient for
4444      multi-segment indexes.  This will improve the performance of many
4445      types of queries against a non-optimized index. (Andrew Hudson
4446      via Yonik Seeley)
4447
4448   2. LUCENE-623: RAMDirectory.close now nulls out its reference to all
4449      internal "files", allowing them to be GCed even if references to the
4450      RAMDirectory itself still exist. (Nadav Har'El via Chris Hostetter)
4451
4452   3. LUCENE-629: Compressed fields are no longer uncompressed and
4453      recompressed during segment merges (e.g. during indexing or
4454      optimizing), thus improving performance. (Michael Busch via Otis
4455      Gospodnetic)
4456
4457   4. LUCENE-388: Improve indexing performance when maxBufferedDocs is
4458      large by keeping a count of buffered documents rather than
4459      counting after each document addition.  (Doron Cohen, Paul Smith,
4460      Yonik Seeley)
4461
4462   5. Modified TermScorer.explain to use TermDocs.skipTo() instead of
4463      looping through docs. (Grant Ingersoll)
4464
4465   6. LUCENE-672: New indexing segment merge policy flushes all
4466      buffered docs to their own segment and delays a merge until
4467      mergeFactor segments of a certain level have been accumulated.
4468      This increases indexing performance in the presence of deleted
4469      docs or partially full segments as well as enabling future
4470      optimizations.
4471
4472      NOTE: this also fixes an "under-merging" bug whereby it is
4473      possible to get far too many segments in your index (which will
4474      drastically slow down search, risk exhausting the file descriptor
4475      limit, etc.).  This can happen when the number of buffered docs
4476      at close, plus the number of docs in the last non-ram segment, is
4477      greater than mergeFactor. (Ning Li, Yonik Seeley)
4478
4479   7. Lazy loaded fields unnecessarily retained an extra copy of loaded
4480      String data.  (Yonik Seeley)
4481
4482   8. LUCENE-443: ConjunctionScorer performance increase.  Speed up
4483      any BooleanQuery with more than one mandatory clause.
4484      (Abdul Chaudhry, Paul Elschot via Yonik Seeley)
4485
4486   9. LUCENE-365: DisjunctionSumScorer performance increase of
4487      ~30%. Speeds up queries with optional clauses. (Paul Elschot via
4488      Yonik Seeley)
4489
4490  10. LUCENE-695: Optimized BufferedIndexInput.readBytes() for medium
4491      size buffers, which will speed up merging and retrieving binary
4492      and compressed fields.  (Nadav Har'El via Yonik Seeley)
4493
4494  11. LUCENE-687: Lazy skipping on proximity file speeds up most
4495      queries involving term positions, including phrase queries.
4496      (Michael Busch via Yonik Seeley)
4497
4498  12. LUCENE-714: Replaced 2 cases of manual for-loop array copying
4499      with calls to System.arraycopy instead, in DocumentWriter.java.
4500      (Nicolas Lalevee via Mike McCandless)
4501
4502  13. LUCENE-729: Non-recursive skipTo and next implementation of
4503      TermDocs for a MultiReader.  The old implementation could
4504      recurse up to the number of segments in the index. (Yonik Seeley)
4505
4506  14. LUCENE-739: Improve segment merging performance by reusing
4507      the norm array across different fields and doing bulk writes
4508      of norms of segments with no deleted docs.
4509      (Michael Busch via Yonik Seeley)
4510
4511  15. LUCENE-745: Add BooleanQuery.clauses(), allowing direct access
4512      to the List of clauses and replaced the internal synchronized Vector
4513      with an unsynchronized List. (Yonik Seeley)
4514
4515  16. LUCENE-750: Remove finalizers from FSIndexOutput and move the
4516      FSIndexInput finalizer to the actual file so all clones don't
4517      register a new finalizer. (Yonik Seeley)
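
Illustration for item 12 (LUCENE-714): a minimal sketch of replacing a
manual copy loop with System.arraycopy.  The class and method names are
hypothetical; this is not the code from DocumentWriter.java.

```java
final class ArrayCopySketch {
    /** Element-by-element copy, the style that was replaced. */
    static int[] copyWithLoop(int[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];
        }
        return dst;
    }

    /** Bulk copy using the JDK method. */
    static int[] copyWithArraycopy(int[] src) {
        int[] dst = new int[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    public static void main(String[] args) {
        int[] positions = {3, 7, 11, 42};
        boolean same = java.util.Arrays.equals(
            copyWithLoop(positions), copyWithArraycopy(positions));
        System.out.println(same);
    }
}
```

Both methods produce the same array; the second delegates the copy to
the runtime.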
4518
4519 Test Cases
4520
4521   1. Added TestTermScorer.java (Grant Ingersoll)
4522
4523   2. Added TestWindowsMMap.java (Benson Margulies via Mike McCandless)
4524
4525   3. LUCENE-744 Append the user.name property onto the temporary directory
4526      that is created so it doesn't interfere with other users. (Grant Ingersoll)
4527
4528 Documentation
4529
4530   1. Added style sheet to xdocs named lucene.css and included in the
4531      Anakia VSL descriptor.  (Grant Ingersoll)
4532
4533   2. Added scoring.xml document into xdocs.  Updated Similarity.java
4534      scoring formula.  (Grant Ingersoll and Steve Rowe.  Updates from:
4535      Michael McCandless, Doron Cohen, Chris Hostetter, Doug Cutting.)
4536      Issue 664.
4537
4538   3. Added javadocs for FieldSelectorResult.java. (Grant Ingersoll)
4539
4540   4. Moved xdocs directory to src/site/src/documentation/content/xdocs per
4541      Issue 707.  Site now builds using Forrest, just like the other Lucene
4542      siblings.  See http://wiki.apache.org/jakarta-lucene/HowToUpdateTheWebsite
4543      for info on updating the website. (Grant Ingersoll with help from Steve Rowe,
4544      Chris Hostetter, Doug Cutting, Otis Gospodnetic, Yonik Seeley)
4545
4546   5. Added in Developer and System Requirements sections under Resources (Grant Ingersoll)
4547
4548   6. LUCENE-713 Updated the Term Vector section of File Formats to include
4549      documentation on how Offset and Position info are stored in the TVF file.
4550      (Grant Ingersoll, Samir Abdou)
4551
4552   7. Added in link to Clover Test Code Coverage Reports under the Develop
4553      section in Resources (Grant Ingersoll)
4554
4555   8. LUCENE-748: Added details for semantics of IndexWriter.close on
4556      hitting an Exception.  (Jed Wesley-Smith via Mike McCandless)
4557
4558   9. Added some text about what is contained in releases.
4559      (Eric Haszlakiewicz via Grant Ingersoll)
4560
4561   10. LUCENE-758: Fix javadoc to clarify that RAMDirectory(Directory)
4562       makes a full copy of the starting Directory.  (Mike McCandless)
4563
4564   11. LUCENE-764: Fix javadocs to detail temporary space requirements
4565       for IndexWriter's optimize(), addIndexes(*) and addDocument(...)
4566       methods.  (Mike McCandless)
4567
4568 Build
4569
4570   1. Added in clover test code coverage per http://issues.apache.org/jira/browse/LUCENE-721
4571      To enable clover code coverage, you must have clover.jar in the ANT
4572      classpath and specify -Drun.clover=true on the command line.
4573      (Michael Busch and Grant Ingersoll)
4574
4575   2. Added a sysproperty in common-build.xml per Lucene 752 to map java.io.tmpdir to
4576      ${build.dir}/test just like the tempDir sysproperty.
4577
4578   3. LUCENE-757 Added new target named init-dist that does setup for
4579      distribution of both binary and source distributions.  Called by package
4580      and package-*-src
4581
4582 ======================= Release 2.0.0 =======================
4583
4584 API Changes
4585
4586  1. All deprecated methods and fields have been removed, except
4587     DateField, which will still be supported for some time
4588     so Lucene can read its date fields from old indexes
4589     (Yonik Seeley & Grant Ingersoll)
4590
4591  2. DisjunctionSumScorer is no longer public.
4592     (Paul Elschot via Otis Gospodnetic)
4593
4594  3. Creating a Field with both an empty name and an empty value
4595     now throws an IllegalArgumentException
4596     (Daniel Naber)
4597
4598  4. LUCENE-301: Added new IndexWriter({String,File,Directory},
4599     Analyzer) constructors that do not take a boolean "create"
4600     argument.  These new constructors will create a new index if
4601     necessary, else append to the existing one.  (Dan Armbrust via
4602     Mike McCandless)
4603
4604 New features
4605
4606  1. LUCENE-496: Command line tool for modifying the field norms of an
4607     existing index; added to contrib/miscellaneous.  (Chris Hostetter)
4608
4609  2. LUCENE-577: SweetSpotSimilarity added to contrib/miscellaneous.
4610     (Chris Hostetter)
4611
4612 Bug fixes
4613
4614  1. LUCENE-330: Fix issue of FilteredQuery not working properly within
4615     BooleanQuery.  (Paul Elschot via Erik Hatcher)
4616
4617  2. LUCENE-515: Make ConstantScoreRangeQuery and ConstantScoreQuery work
4618     with RemoteSearchable.  (Philippe Laflamme via Yonik Seeley)
4619
4620  3. Added methods to get/set writeLockTimeout and commitLockTimeout in
4621     IndexWriter. These could be set in Lucene 1.4 using a system property.
4622     This feature had been removed without adding the corresponding
4623     getter/setter methods.  (Daniel Naber)
4624
4625  4. LUCENE-413: Fixed ArrayIndexOutOfBoundsException thrown
4626     when using SpanQueries. (Paul Elschot via Yonik Seeley)
4627
4628  5. Implemented FilterIndexReader.getVersion() and isCurrent()
4629     (Yonik Seeley)
4630
4631  6. LUCENE-540: Fixed a bug with IndexWriter.addIndexes(Directory[])
4632     that sometimes caused the index order of documents to change.
4633     (Yonik Seeley)
4634
4635  7. LUCENE-526: Fixed a bug in FieldSortedHitQueue that caused
4636     subsequent String sorts with different locales to sort identically.
4637     (Paul Cowan via Yonik Seeley)
4638
4639  8. LUCENE-541: Add missing extractTerms() to DisjunctionMaxQuery
4640     (Stefan Will via Yonik Seeley)
4641
4642  9. LUCENE-514: Added getTermArrays() and extractTerms() to
4643     MultiPhraseQuery (Eric Jain & Yonik Seeley)
4644
4645 10. LUCENE-512: Fixed ClassCastException in ParallelReader.getTermFreqVectors
4646     (frederic via Yonik)
4647
4648 11. LUCENE-352: Fixed bug in SpanNotQuery that manifested as
4649     NullPointerException when "exclude" query was not a SpanTermQuery.
4650     (Chris Hostetter)
4651
4652 12. LUCENE-572: Fixed bug in SpanNotQuery hashCode, was ignoring exclude clause
4653     (Chris Hostetter)
4654
4655 13. LUCENE-561: Fixed some ParallelReader bugs: a NullPointerException if the
4656     reader didn't know about the field yet, the reader not tracking whether it
4657     had deletions, and deleteDocument calls that could circumvent synchronization
4658     on the subreaders.  (Chuck Williams via Yonik Seeley)
4659
4660 14. LUCENE-556: Added empty extractTerms() implementation to MatchAllDocsQuery and
4661     ConstantScoreQuery in order to allow their use with a MultiSearcher.
4662     (Yonik Seeley)
4663
4664 15. LUCENE-546: Removed 2GB file size limitations for RAMDirectory.
4665     (Peter Royal, Michael Chan, Yonik Seeley)
4666
4667 16. LUCENE-485: Don't hold commit lock while removing obsolete index
4668     files.  (Luc Vanlerberghe via cutting)
4669
4670
4671 1.9.1
4672
4673 Bug fixes
4674
4675  1. LUCENE-511: Fix a bug in the BufferedIndexOutput optimization
4676     introduced in 1.9-final.  (Shay Banon & Steven Tamm via cutting)
4677
4678 1.9 final
4679
4680 Note that this release is mostly but not 100% source compatible with
4681 the previous release of Lucene (1.4.3). In other words, you should
4682 make sure your application compiles with this version of Lucene before
4683 you replace the old Lucene JAR with the new one.  Many methods have
4684 been deprecated in anticipation of release 2.0, so deprecation
4685 warnings are to be expected when upgrading from 1.4.3 to 1.9.
4686
4687 Bug fixes
4688
4689  1. The fix that made IndexWriter.setMaxBufferedDocs(1) work had negative
4690     effects on indexing performance and has thus been reverted. The
4691     argument for setMaxBufferedDocs(int) must now be at least 2, otherwise
4692     an exception is thrown. (Daniel Naber)
4693
4694 Optimizations
4695
4696  1. Optimized BufferedIndexOutput.writeBytes() to use
4697     System.arraycopy() in more cases, rather than copying byte-by-byte.
4698     (Lukas Zapletal via Cutting)
4699
4700 1.9 RC1
4701
4702 Requirements
4703
4704  1. To compile and use Lucene you now need Java 1.4 or later.
4705
4706 Changes in runtime behavior
4707
4708  1. FuzzyQuery can no longer throw a TooManyClauses exception. If a
4709     FuzzyQuery expands to more than BooleanQuery.maxClauseCount
4710     terms only the BooleanQuery.maxClauseCount most similar terms
4711     go into the rewritten query and thus the exception is avoided.
4712     (Christoph)
4713
4714  2. Changed system property from "org.apache.lucene.lockdir" to
4715     "org.apache.lucene.lockDir", so that its casing follows the existing
4716     pattern used in other Lucene system properties. (Bernhard)
4717
4718  3. The terms of RangeQueries and FuzzyQueries are now converted to
4719     lowercase by default (as has been the case for PrefixQueries
4720     and WildcardQueries before). Use setLowercaseExpandedTerms(false)
4721     to disable that behavior but note that this also affects
4722     PrefixQueries and WildcardQueries. (Daniel Naber)
4723
4724  4. Document frequency that is computed when MultiSearcher is used is now
4725     computed correctly and "globally" across subsearchers and indices, while
4726     before it used to be computed locally to each index, which caused
4727     ranking across multiple indices not to be equivalent.
4728     (Chuck Williams, Wolf Siberski via Otis, bug #31841)
4729
4730  5. When opening an IndexWriter with create=true, Lucene now only deletes
4731     its own files from the index directory (looking at the file name suffixes
4732     to decide if a file belongs to Lucene). The old behavior was to delete
4733     all files. (Daniel Naber and Bernhard Messer, bug #34695)
4734
4735  6. The version of an IndexReader, as returned by getCurrentVersion()
4736     and getVersion(), doesn't start at 0 anymore for new indexes. Instead, it
4737     is now initialized with the system time in milliseconds.
4738     (Bernhard Messer via Daniel Naber)
4739
4740  7. Several default values cannot be set via system properties anymore, as
4741     this has been considered inappropriate for a library like Lucene. For
4742     most properties there are set/get methods available in IndexWriter which
4743     you should use instead. This affects the following properties:
4744     See IndexWriter for getter/setter methods:
4745       org.apache.lucene.writeLockTimeout, org.apache.lucene.commitLockTimeout,
4746       org.apache.lucene.minMergeDocs, org.apache.lucene.maxMergeDocs,
4747       org.apache.lucene.maxFieldLength, org.apache.lucene.termIndexInterval,
4748       org.apache.lucene.mergeFactor,
4749     See BooleanQuery for getter/setter methods:
4750       org.apache.lucene.maxClauseCount
4751     See FSDirectory for getter/setter methods:
4752       disableLuceneLocks
4753     (Daniel Naber)
4754
4755  8. Fixed FieldCacheImpl to use user-provided IntParser and FloatParser,
4756     instead of using Integer and Float classes for parsing.
4757     (Yonik Seeley via Otis Gospodnetic)
4758
4759  9. Expert level search routines returning TopDocs and TopFieldDocs
4760     no longer normalize scores.  This also fixes bugs related to
4761     MultiSearchers and score sorting/normalization.
4762     (Luc Vanlerberghe via Yonik Seeley, LUCENE-469)
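
Illustration for item 1 (FuzzyQuery): a minimal sketch of keeping only
the most similar terms with a bounded min-heap, so that the expansion
never exceeds a clause limit.  The names TopTermsSketch, ScoredTerm and
mostSimilar are hypothetical and the similarity values are made up for
the example; this is not FuzzyQuery's actual rewrite code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

final class TopTermsSketch {
    /** A candidate term with its similarity to the query term (hypothetical type). */
    static final class ScoredTerm implements Comparable<ScoredTerm> {
        final String term;
        final float similarity;

        ScoredTerm(String term, float similarity) {
            this.term = term;
            this.similarity = similarity;
        }

        @Override
        public int compareTo(ScoredTerm other) {
            return Float.compare(similarity, other.similarity);
        }
    }

    /** Returns at most maxClauseCount terms, most similar first. */
    static List<String> mostSimilar(List<ScoredTerm> candidates, int maxClauseCount) {
        PriorityQueue<ScoredTerm> heap = new PriorityQueue<>(); // least similar on top
        for (ScoredTerm candidate : candidates) {
            heap.offer(candidate);
            if (heap.size() > maxClauseCount) {
                heap.poll(); // drop the least similar term instead of failing
            }
        }
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().term);
        }
        Collections.reverse(result);
        return result;
    }

    public static void main(String[] args) {
        List<ScoredTerm> candidates = new ArrayList<>();
        candidates.add(new ScoredTerm("lucene", 0.9f));
        candidates.add(new ScoredTerm("lucent", 0.5f));
        candidates.add(new ScoredTerm("lucenes", 0.7f));
        System.out.println(mostSimilar(candidates, 2));
    }
}
```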
4763
4764 New features
4765
4766  1. Added support for stored compressed fields (patch #31149)
4767     (Bernhard Messer via Christoph)
4768
4769  2. Added support for binary stored fields (patch #29370)
4770     (Drew Farris and Bernhard Messer via Christoph)
4771
4772  3. Added support for position and offset information in term vectors
4773     (patch #18927). (Grant Ingersoll & Christoph)
4774
4775  4. A new class DateTools has been added. It allows you to format dates
4776     in a readable format adequate for indexing. Unlike the existing
4777     DateField class DateTools can cope with dates before 1970 and it
4778     forces you to specify the desired date resolution (e.g. month, day,
4779     second, ...) which can make RangeQuerys on those fields more efficient.
4780     (Daniel Naber)
4781
4782  5. QueryParser now correctly works with Analyzers that can return more
4783     than one token per position. For example, a query "+fast +car"
4784     would be parsed as "+fast +(car automobile)" if the Analyzer
4785     returns "car" and "automobile" at the same position whenever it
4786     finds "car" (Patch #23307).
4787     (Pierrick Brihaye, Daniel Naber)
4788
4789  6. Permit unbuffered Directory implementations (e.g., using mmap).
4790     InputStream is replaced by the new classes IndexInput and
4791     BufferedIndexInput.  OutputStream is replaced by the new classes
4792     IndexOutput and BufferedIndexOutput.  InputStream and OutputStream
4793     are now deprecated and FSDirectory is now subclassable. (cutting)
4794
4795  7. Add native Directory and TermDocs implementations that work under
4796     GCJ.  These require GCC 3.4.0 or later and have only been tested
4797     on Linux.  Use 'ant gcj' to build demo applications. (cutting)
4798
4799  8. Add MMapDirectory, which uses nio to mmap input files.  This is
4800     still somewhat slower than FSDirectory.  However it uses less
4801     memory per query term, since a new buffer is not allocated per
4802     term, which may help applications which use, e.g., wildcard
4803     queries.  It may also someday be faster. (cutting & Paul Elschot)
4804
4805  9. Added javadocs-internal to build.xml - bug #30360
4806     (Paul Elschot via Otis)
4807
4808 10. Added RangeFilter, a more generically useful filter than DateFilter.
4809     (Chris M Hostetter via Erik)
4810
4811 11. Added NumberTools, a utility class for indexing numeric fields.
4812     (adapted from code contributed by Matt Quail; committed by Erik)
4813
4814 12. Added public static IndexReader.main(String[] args) method.
4815     IndexReader can now be used directly at command line level
4816     to list and optionally extract the individual files from an existing
4817     compound index file.
4818     (adapted from code contributed by Garrett Rooney; committed by Bernhard)
4819
4820 13. Add IndexWriter.setTermIndexInterval() method.  See javadocs.
4821     (Doug Cutting)
4822
4823 14. Added LucenePackage, whose static get() method returns java.util.Package,
4824     which lets the caller get the Lucene version information specified in
4825     the Lucene Jar.
4826     (Doug Cutting via Otis)
4827
4828 15. Added Hits.iterator() method and corresponding HitIterator and Hit objects.
4829     This provides standard java.util.Iterator iteration over Hits.
4830     Each call to the iterator's next() method returns a Hit object.
4831     (Jeremy Rayner via Erik)
4832
4833 16. Add ParallelReader, an IndexReader that combines separate indexes
4834     over different fields into a single virtual index.  (Doug Cutting)
4835
4836 17. Add IntParser and FloatParser interfaces to FieldCache, so that
4837     fields in arbitrary formats can be cached as ints and floats.
4838     (Doug Cutting)
4839
4840 18. Added class org.apache.lucene.index.IndexModifier which combines
4841     IndexWriter and IndexReader, so you can add and delete documents without
4842     worrying about synchronization/locking issues.
4843     (Daniel Naber)
4844
4845 19. Lucene can now be used inside an unsigned applet, as Lucene's access
4846     to system properties will not cause a SecurityException anymore.
4847     (Jon Schuster via Daniel Naber, bug #34359)
4848
4849 20. Added a new class MatchAllDocsQuery that matches all documents.
4850     (John Wang via Daniel Naber, bug #34946)
4851
4852 21. Added ability to omit norms on a per field basis to decrease
4853     index size and memory consumption when there are many indexed fields.
4854     See Field.setOmitNorms()
4855     (Yonik Seeley, LUCENE-448)
4856
4857 22. Added NullFragmenter to contrib/highlighter, which is useful for
4858     highlighting entire documents or fields.
4859     (Erik Hatcher)
4860
4861 23. Added regular expression queries, RegexQuery and SpanRegexQuery.
4862     Note the same term enumeration caveats apply with these queries as
4863     apply to WildcardQuery and other term expanding queries.
4864     These two new queries are not currently supported via QueryParser.
4865     (Erik Hatcher)
4866
4867 24. Added ConstantScoreQuery which wraps a filter and produces a score
4868     equal to the query boost for every matching document.
4869     (Yonik Seeley, LUCENE-383)
4870
4871 25. Added ConstantScoreRangeQuery which produces a constant score for
4872     every document in the range.  One advantage over a normal RangeQuery
4873     is that it doesn't expand to a BooleanQuery and thus doesn't have a maximum
4874     number of terms the range can cover.  Both endpoints may also be open.
4875     (Yonik Seeley, LUCENE-383)
4876
4877 26. Added ability to specify a minimum number of optional clauses that
4878     must match in a BooleanQuery.  See BooleanQuery.setMinimumNumberShouldMatch().
4879     (Paul Elschot, Chris Hostetter via Yonik Seeley, LUCENE-395)
4880
4881 27. Added DisjunctionMaxQuery which provides the maximum score across its clauses.
4882     It's very useful for searching across multiple fields.
4883     (Chuck Williams via Yonik Seeley, LUCENE-323)
4884
4885 28. New class ISOLatin1AccentFilter that replaces accented characters in the ISO
4886     Latin 1 character set by their unaccented equivalent.
4887     (Sven Duzont via Erik Hatcher)
4888
4889 29. New class KeywordAnalyzer. "Tokenizes" the entire stream as a single token.
4890     This is useful for data like zip codes, ids, and some product names.
4891     (Erik Hatcher)
4892
4893 30. Copied LengthFilter from contrib area to core. Removes words that are too
4894     long or too short from the stream.
4895     (David Spencer via Otis and Daniel)
4896
4897 31. Added getPositionIncrementGap(String fieldName) to Analyzer.  This allows
4898     custom analyzers to put gaps between Field instances with the same field
4899     name, preventing phrase or span queries crossing these boundaries.  The
4900     default implementation issues a gap of 0, allowing the default token
4901     position increment of 1 to put the next field's first token into a
4902     successive position.
4903     (Erik Hatcher, with advice from Yonik)
4904
4905 32. StopFilter can now ignore case when checking for stop words.
4906     (Grant Ingersoll via Yonik, LUCENE-248)
4907
4908 33. Add TopDocCollector and TopFieldDocCollector.  These simplify the
4909     implementation of hit collectors that collect only the
4910     top-scoring or top-sorting hits.
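
Illustration for item 4 (DateTools): a minimal sketch of formatting
dates as sortable strings at a chosen resolution.  It uses only
java.time and hypothetical names (DateResolutionSketch, toDay, toMonth);
it is not the DateTools implementation, and the exact patterns are an
assumption made for the example.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class DateResolutionSketch {
    private static final DateTimeFormatter DAY =
        DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);
    private static final DateTimeFormatter MONTH =
        DateTimeFormatter.ofPattern("yyyyMM").withZone(ZoneOffset.UTC);

    /** Day resolution: one distinct term per calendar day. */
    static String toDay(Instant t) {
        return DAY.format(t);
    }

    /** Month resolution: one distinct term per calendar month. */
    static String toMonth(Instant t) {
        return MONTH.format(t);
    }

    public static void main(String[] args) {
        Instant before1970 = Instant.parse("1969-07-20T20:17:40Z");
        Instant later = Instant.parse("2006-02-27T00:00:00Z");
        System.out.println(toDay(before1970));
        System.out.println(toMonth(later));
        // String order follows chronological order for these fixed-width terms.
        System.out.println(toDay(before1970).compareTo(toDay(later)) < 0);
    }
}
```

A coarser resolution yields fewer distinct terms for the same set of
dates, which is presumably why range queries over such fields can be
cheaper.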
4911
4912 API Changes
4913
4914  1. Several methods and fields have been deprecated. The API documentation
4915     contains information about the recommended replacements. It is planned
4916     that most of the deprecated methods and fields will be removed in
4917     Lucene 2.0. (Daniel Naber)
4918
4919  2. The Russian and the German analyzers have been moved to contrib/analyzers.
4920     Also, the WordlistLoader class has been moved one level up in the
4921     hierarchy and is now org.apache.lucene.analysis.WordlistLoader
4922     (Daniel Naber)
4923
4924  3. The API contained methods that were declared to throw an IOException
4925     but never did so. These declarations have been removed. If
4926     your code tries to catch these exceptions you might need to remove
4927     those catch clauses to avoid compile errors. (Daniel Naber)
4928
4929  4. Add a serializable Parameter Class to standardize parameter enum
4930     classes in BooleanClause and Field. (Christoph)
4931
4932  5. Added rewrite methods to all SpanQuery subclasses that nest other SpanQuerys.
4933     This allows custom SpanQuery subclasses that rewrite (for term expansion, for
4934     example) to nest within the built-in SpanQuery classes successfully.
4935
4936 Bug fixes
4937
4938  1. The JSP demo page (src/jsp/results.jsp) now properly closes the
4939     IndexSearcher it opens. (Daniel Naber)
4940
4941  2. Fixed a bug in IndexWriter.addIndexes(IndexReader[] readers) that
4942     prevented deletion of obsolete segments. (Christoph Goller)
4943
4944  3. Fix in FieldInfos to avoid the return of an extra blank field in
4945     IndexReader.getFieldNames() (Patch #19058). (Mark Harwood via Bernhard)
4946
4947  4. Some combinations of BooleanQuery and MultiPhraseQuery (formerly
4948     PhrasePrefixQuery) could provoke UnsupportedOperationException
4949     (bug #33161). (Rhett Sutphin via Daniel Naber)
4950
4951  5. Fixed a small bug in skipTo of ConjunctionScorer that caused a NullPointerException
4952     if skipTo() was called without a prior call to next(). (Christoph)
4953
4954  6. Disable Similarity.coord() in the scoring of most automatically
4955     generated boolean queries.  The coord() score factor is
4956     appropriate when clauses are independently specified by a user,
4957     but is usually not appropriate when clauses are generated
4958     automatically, e.g., by a fuzzy, wildcard or range query.  Matches
4959     on such automatically generated queries are no longer penalized
4960     for not matching all terms.  (Doug Cutting, Patch #33472)
4961
4962  7. Getting a lock file with Lock.obtain(long) was supposed to wait for
4963     a given number of milliseconds, but this didn't work.
4964     (John Wang via Daniel Naber, Bug #33799)
4965
4966  8. Fix FSDirectory.createOutput() to always create new files.
4967     Previously, existing files were overwritten, and an index could be
4968     corrupted when the old version of a file was longer than the new.
4969     Now any existing file is first removed.  (Doug Cutting)
4970
4971  9. Fix BooleanQuery containing nested SpanTermQuery's, which previously
4972     could return an incorrect number of hits.
4973     (Reece Wilton via Erik Hatcher, Bug #35157)
4974
4975 10. Fix NullPointerException that could occur with a MultiPhraseQuery
4976     inside a BooleanQuery.
4977     (Hans Hjelm and Scotty Allen via Daniel Naber, Bug #35626)
4978
4979 11. Fixed SnowballFilter to pass through the position increment from
4980     the original token.
4981     (Yonik Seeley via Erik Hatcher, LUCENE-437)
4982
4983 12. Added Unicode range of Korean characters to StandardTokenizer,
4984     grouping contiguous characters into a token rather than one token
4985     per character.  This change also changes the token type to "<CJ>"
4986     for Chinese and Japanese character tokens (previously it was "<CJK>").
4987     (Cheolgoo Kang via Otis and Erik, LUCENE-444 and LUCENE-461)
4988
4989 13. FieldsReader now looks at FieldInfo.storeOffsetWithTermVector and
4990     FieldInfo.storePositionWithTermVector and creates the Field with
4991     correct TermVector parameter.
4992     (Frank Steinmann via Bernhard, LUCENE-455)
4993
4994 14. Fixed WildcardQuery to prevent "cat" matching "ca??".
4995     (Xiaozheng Ma via Bernhard, LUCENE-306)
4996
4997 15. Fixed a bug where MultiSearcher and ParallelMultiSearcher could
4998     change the sort order when sorting by string for documents without
4999     a value for the sort field.
5000     (Luc Vanlerberghe via Yonik, LUCENE-453)
5001
5002 16. Fixed a sorting problem with MultiSearchers that can lead to
5003     missing or duplicate docs due to equal docs sorting in an arbitrary order.
5004     (Yonik Seeley, LUCENE-456)
5005
5006 17. A single hit using the expert level sorted search methods
5007     resulted in the score not being normalized.
5008     (Yonik Seeley, LUCENE-462)
5009
5010 18. Fixed inefficient memory usage when loading an index into RAMDirectory.
5011     (Volodymyr Bychkoviak via Bernhard, LUCENE-475)
5012
5013 19. Corrected term offsets returned by ChineseTokenizer.
5014     (Ray Tsang via Erik Hatcher, LUCENE-324)
5015
5016 20. Fixed MultiReader.undeleteAll() to correctly update numDocs.
5017     (Robert Kirchgessner via Doug Cutting, LUCENE-479)
5018
5019 21. Race condition in IndexReader.getCurrentVersion() and isCurrent()
5020     fixed by acquiring the commit lock.
5021     (Luc Vanlerberghe via Yonik Seeley, LUCENE-481)
5022
5023 22. IndexWriter.setMaxBufferedDocs(1) didn't have the expected effect;
5024     this has now been fixed. (Daniel Naber)
5025
5026 23. Fixed QueryParser when called with a date in local form like
5027     "[1/16/2000 TO 1/18/2000]". This query did not include the documents
5028     of 1/18/2000, i.e. the last day was not included. (Daniel Naber)
5029
5030 24. Removed sorting constraint that threw an exception if there were
5031     not yet any values for the sort field (Yonik Seeley, LUCENE-374)
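
Illustration for item 14 (LUCENE-306): a minimal sketch of wildcard
matching in which '?' must consume exactly one character, so "ca??" does
not match "cat".  The class WildcardSketch is hypothetical and is not
the WildcardQuery term enumeration code; it assumes '?' means exactly
one character and '*' means zero or more characters.

```java
final class WildcardSketch {
    static boolean matches(String pattern, String text) {
        return match(pattern, 0, text, 0);
    }

    private static boolean match(String p, int pi, String t, int ti) {
        if (pi == p.length()) {
            return ti == t.length(); // pattern exhausted: text must be too
        }
        char c = p.charAt(pi);
        if (c == '*') {
            // Try every possible length for '*', including zero characters.
            for (int k = ti; k <= t.length(); k++) {
                if (match(p, pi + 1, t, k)) {
                    return true;
                }
            }
            return false;
        }
        if (ti == t.length()) {
            return false; // '?' or a literal needs one more character of text
        }
        if (c == '?' || c == t.charAt(ti)) {
            return match(p, pi + 1, t, ti + 1);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches("ca??", "cat"));  // false
        System.out.println(matches("ca??", "cats")); // true
    }
}
```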
5032
5033 Optimizations
5034
5035  1. Disk usage (peak requirements during indexing and optimization)
5036     in case of compound file format has been improved.
5037     (Bernhard, Dmitry, and Christoph)
5038
5039  2. Optimize the performance of certain uses of BooleanScorer,
5040     TermScorer and IndexSearcher.  In particular, a BooleanQuery
5041     composed of TermQuery, with not all terms required, that returns a
5042     TopDocs (e.g., through a Hits with no Sort specified) runs much
5043     faster.  (cutting)
5044
5045  3. Removed synchronization from reading of term vectors with an
5046     IndexReader (Patch #30736). (Bernhard Messer via Christoph)
5047
5048  4. Optimize term-dictionary lookup to allocate far fewer terms when
5049     scanning for the matching term.  This speeds searches involving
5050     low-frequency terms, where the cost of dictionary lookup can be
5051     significant. (cutting)
5052
5053  5. Optimize fuzzy queries so the standard fuzzy queries with a prefix
5054     of 0 now run 20-50% faster (Patch #31882).
5055     (Jonathan Hager via Daniel Naber)
5056
5057  6. A version of BooleanScorer (BooleanScorer2) added that delivers
5058     documents in increasing order and implements skipTo. For queries
5059     with required or forbidden clauses it may be faster than the old
5060     BooleanScorer; for BooleanQueries consisting only of optional
5061     clauses it is probably slower. The new BooleanScorer is now the
5062     default. (Patch 31785 by Paul Elschot via Christoph)
5063
5064  7. Use uncached access to norms when merging to reduce RAM usage.
5065     (Bug #32847).  (Doug Cutting)
5066
5067  8. Don't read term index when random-access is not required.  This
5068     reduces time to open IndexReaders and they use less memory when
5069     random access is not required, e.g., when merging segments.  The
5070     term index is now read into memory lazily at the first
5071     random-access.  (Doug Cutting)
5072
5073  9. Optimize IndexWriter.addIndexes(Directory[]) when the number of
5074     added indexes is larger than mergeFactor.  Previously this could
5075     result in quadratic performance.  Now performance is n log(n).
5076     (Doug Cutting)
5077
5078 10. Speed up the creation of TermEnum for indices with multiple
5079     segments and deleted documents, and thus speed up PrefixQuery,
5080     RangeQuery, WildcardQuery, FuzzyQuery, RangeFilter, DateFilter,
5081     and sorting the first time on a field.
5082     (Yonik Seeley, LUCENE-454)
5083
5084 11. Optimized and generalized 32 bit floating point to byte
5085     (custom 8 bit floating point) conversions.  Increased the speed of
5086     Similarity.encodeNorm() anywhere from 10% to 250%, depending on the JVM.
5087     (Yonik Seeley, LUCENE-467)
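
Item 11 above concerns packing a 32 bit float into a custom 8 bit float.
The following is a minimal sketch of one such conversion, assuming a layout
of 3 mantissa bits and 5 exponent bits with an exponent zero point of 15.
That layout and the class name are assumptions for illustration; this entry
does not state which layout Similarity.encodeNorm() uses.

```java
// Hypothetical illustration of an 8 bit floating point format.
class EightBitFloat {
    // Offset of the smallest representable exponent, shifted past 3 mantissa bits.
    private static final int BASE = (63 - 15) << 3;

    // Truncates a float to 8 bits. Negative values and zero map to 0,
    // positive underflow maps to the smallest code, overflow to the largest.
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);
        if (small <= BASE) {
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= BASE + 0x100) {
            return (byte) -1;
        }
        return (byte) (small - BASE);
    }

    // Expands an 8 bit code back to a float.
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        System.out.println(decode(encode(0.9f)));
    }
}
```

The conversion is lossy: only values whose mantissa fits in 3 bits survive a
round trip unchanged, which is acceptable for coarse normalization factors.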
5088
5089 Infrastructure
5090
5091  1. Lucene's source code repository has converted from CVS to
5092     Subversion.  The new repository is at
5093     http://svn.apache.org/repos/asf/lucene/java/trunk
5094
5095  2. Lucene's issue tracker has migrated from Bugzilla to JIRA.
5096     Lucene's JIRA is at http://issues.apache.org/jira/browse/LUCENE
5097     The old issues are still available at
5098     http://issues.apache.org/bugzilla/show_bug.cgi?id=xxxx
5099     (use the bug number instead of xxxx)
5100
5101
5102 1.4.3
5103
5104  1. The JSP demo page (src/jsp/results.jsp) now properly escapes error
5105     messages which might contain user input (e.g. error messages about
5106     query parsing). If you used that page as a starting point for your
5107     own code please make sure your code also properly escapes HTML
5108     characters from user input in order to avoid so-called cross site
5109     scripting attacks. (Daniel Naber)
5110
5111  2. QueryParser changes in 1.4.2 broke the QueryParser API. Now the old
5112     API is supported again. (Christoph)
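
For code derived from the demo page mentioned in item 1 above, escaping
might look like the following sketch. The helper class is hypothetical and
is not the code used in results.jsp; it only shows the general idea of
replacing HTML metacharacters before echoing user input.

```java
// Hypothetical helper, not taken from the Lucene demo.
class HtmlEscaper {
    // Replaces the characters that are significant in HTML text and attributes.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("Cannot parse '<script>alert(1)</script>'"));
    }
}
```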
5113
5114
5115 1.4.2
5116
5117  1. Fixed bug #31241: Sorting could lead to incorrect results (documents
5118     missing, others duplicated) if the sort keys were not unique and there
5119     were more than 100 matches. (Daniel Naber)
5120
5121  2. Memory leak in Sort code (bug #31240) eliminated.
5122     (Rafal Krzewski via Christoph and Daniel)
5123
5124  3. FuzzyQuery now takes an additional parameter that specifies the
5125     minimum similarity that is required for a term to match the query.
5126     The QueryParser syntax for this is term~x, where x is a floating
5127     point number >= 0 and < 1 (a bigger number means that a higher
5128     similarity is required). Furthermore, a prefix can be specified
5129     for FuzzyQuerys so that only those terms are considered similar that
5130     start with this prefix. This can speed up FuzzyQuery greatly.
5131     (Daniel Naber, Christoph Goller)
5132
5133  4. PhraseQuery and PhrasePrefixQuery now allow the explicit specification
5134     of relative positions. (Christoph Goller)
5135
5136  5. QueryParser changes: Fix for ArrayIndexOutOfBoundsExceptions
5137     (patch #9110); some unused method parameters removed; The ability
5138     to specify a minimum similarity for FuzzyQuery has been added.
5139     (Christoph Goller)
5140
5141  6. IndexSearcher optimization: a new ScoreDoc is no longer allocated
5142     for every non-zero-scoring hit.  This makes 'OR' queries that
5143     contain common terms substantially faster.  (cutting)
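
The minimum similarity and prefix described in item 3 above can be sketched
as follows. The similarity formula used here (one minus the edit distance
divided by the length of the shorter string) is an assumption made for this
example and is not necessarily the formula FuzzyQuery uses; the class and
method names are hypothetical.

```java
// Hypothetical illustration of fuzzy matching with a required prefix.
class FuzzyMatch {
    // Classic Levenshtein distance using two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // Assumed similarity measure: 1 means identical, lower means less similar.
    static float similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        if (shorter == 0) {
            return a.length() == b.length() ? 1.0f : 0.0f;
        }
        return 1.0f - (float) editDistance(a, b) / shorter;
    }

    // A candidate must share the first prefixLength characters of the query
    // and reach the minimum similarity.
    static boolean matches(String query, String candidate, float minSimilarity, int prefixLength) {
        int n = Math.min(prefixLength, query.length());
        if (!candidate.startsWith(query.substring(0, n))) {
            return false; // cheap rejection, no edit distance needed
        }
        return similarity(query, candidate) >= minSimilarity;
    }

    public static void main(String[] args) {
        System.out.println(matches("lucene", "lucens", 0.5f, 3));
    }
}
```

The prefix check is what makes the speedup plausible: candidates that do not
share the prefix are rejected without computing an edit distance.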
5144
5145
5146 1.4.1
5147
5148  1. Fixed a performance bug in hit sorting code, where values were not
5149     correctly cached.  (Aviran via cutting)
5150
5151  2. Fixed errors in file format documentation. (Daniel Naber)
5152
5153
5154 1.4 final
5155
5156  1. Added "an" to the list of stop words in StopAnalyzer, to complement
5157     the existing "a" there.  Fix for bug 28960
5158     (http://issues.apache.org/bugzilla/show_bug.cgi?id=28960). (Otis)
5159
5160  2. Added new class FieldCache to manage in-memory caches of field term
5161     values.  (Tim Jones)
5162
5163  3. Added overloaded getFieldQuery method to QueryParser which
5164     accepts the slop factor specified for the phrase (or the default
5165     phrase slop for the QueryParser instance).  This allows overriding
5166     methods to replace a PhraseQuery with a SpanNearQuery instead,
5167     keeping the proper slop factor. (Erik Hatcher)
5168
5169  4. Changed the encoding of GermanAnalyzer.java and GermanStemmer.java to
5170     UTF-8 and changed the build encoding to UTF-8, to make changed files
5171     compile. (Otis Gospodnetic)
5172
5173  5. Removed synchronization from term lookup under IndexReader methods
5174     termFreq(), termDocs() or termPositions() to improve
5175     multi-threaded performance.  (cutting)
5176
5177  6. Fix a bug where obsolete segment files were not deleted on Win32.
5178
5179
5180 1.4 RC3
5181
5182  1. Fixed several search bugs introduced by the skipTo() changes in
5183     release 1.4RC1.  The index file format was changed a bit, so
5184     collections must be re-indexed to take advantage of the skipTo()
5185     optimizations.  (Christoph Goller)
5186
5187  2. Added new Document methods, removeField() and removeFields().
5188     (Christoph Goller)
5189
5190  3. Fixed inconsistencies with index closing.  Indexes and directories
5191     are now only closed automatically by Lucene when Lucene opened
5192     them automatically.  (Christoph Goller)
5193
5194  4. Added new class: FilteredQuery.  (Tim Jones)
5195
5196  5. Added a new SortField type for custom comparators.  (Tim Jones)
5197
5198  6. Lock obtain timed out message now displays the full path to the lock
5199     file. (Daniel Naber via Erik)
5200
5201  7. Fixed a bug in SpanNearQuery when ordered. (Paul Elschot via cutting)
5202
5203  8. Fixed so that FSDirectory's locks still work when the
5204     java.io.tmpdir system property is null.  (cutting)
5205
5206  9. Changed FilteredTermEnum's constructor to take no parameters,
5207     as the parameters were ignored anyway (bug #28858)
5208
5209 1.4 RC2
5210
5211  1. GermanAnalyzer now throws an exception if the stopword file
5212     cannot be found (bug #27987). It now uses LowerCaseFilter
5213     (bug #18410) (Daniel Naber via Otis, Erik)
5214
5215  2. Fixed a few bugs in the file format documentation. (cutting)
5216
5217
5218 1.4 RC1
5219
5220  1. Changed the format of the .tis file, so that:
5221
5222     - it has a format version number, which makes it easier to
5223       back-compatibly change file formats in the future.
5224
5225     - the term count is now stored as a long.  This was the one aspect
5226       of Lucene's file formats which limited index size.
5227
5228     - a few internal index parameters are now stored in the index, so
5229       that they can (in theory) now be changed from index to index,
5230       although there is not yet an API to do so.
5231
5232     These changes are back compatible.  The new code can read old
5233     indexes.  But old code will not be able to read new indexes. (cutting)
5234
5235  2. Added an optimized implementation of TermDocs.skipTo().  A skip
5236     table is now stored for each term in the .frq file.  This only
5237     adds a percent or two to overall index size, but can substantially
5238     speed up many searches.  (cutting)
5239
5240  3. Restructured the Scorer API and all Scorer implementations to take
5241     advantage of an optimized TermDocs.skipTo() implementation.  In
5242     particular, PhraseQuerys and conjunctive BooleanQuerys are
5243     faster when one clause has substantially fewer matches than the
5244     others.  (A conjunctive BooleanQuery is a BooleanQuery where all
5245     clauses are required.)  (cutting)
5246
5247  4. Added new class ParallelMultiSearcher.  Combined with
5248     RemoteSearchable this makes it easy to implement distributed
5249     search systems.  (Jean-Francois Halleux via cutting)
5250
5251  5. Added support for hit sorting.  Results may now be sorted by any
5252     indexed field.  For details see the javadoc for
5253     Searcher#search(Query, Sort).  (Tim Jones via Cutting)
5254
5255  6. Changed FSDirectory to auto-create a full directory tree that it
5256     needs by using mkdirs() instead of mkdir().  (Mladen Turk via Otis)
5257
5258  7. Added a new span-based query API.  This implements, among other
5259     things, nested phrases.  See javadocs for details.  (Doug Cutting)
5260
5261  8. Added new method Query.getSimilarity(Searcher), and changed
5262     scorers to use it.  This permits one to subclass a Query class so
5263     that it can specify its own Similarity implementation, perhaps
5264     one that delegates through that of the Searcher.  (Julien Nioche
5265     via Cutting)
5266
5267  9. Added MultiReader, an IndexReader that combines multiple other
5268     IndexReaders.  (Cutting)
5269
5270 10. Added support for term vectors.  See Field#isTermVectorStored().
5271     (Grant Ingersoll, Cutting & Dmitry)
5272
5273 11. Fixed the old bug with escaping of special characters in query
5274     strings: http://issues.apache.org/bugzilla/show_bug.cgi?id=24665
5275     (Jean-Francois Halleux via Otis)
5276
5277 12. Added support for overriding default values for the following,
5278     using system properties:
5279       - default commit lock timeout
5280       - default maxFieldLength
5281       - default maxMergeDocs
5282       - default mergeFactor
5283       - default minMergeDocs
5284       - default write lock timeout
5285     (Otis)
5286
5287 13. Changed QueryParser.jj to allow '-' and '+' within tokens:
5288     http://issues.apache.org/bugzilla/show_bug.cgi?id=27491
5289     (Morus Walter via Otis)
5290
5291 14. Changed so that the compound index format is used by default.
5292     This makes indexing a bit slower, but vastly reduces the chances
5293     of file handle problems.  (Cutting)
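
Items 2 and 3 above describe skipTo() and why conjunctive queries benefit
when one clause is rare. The sketch below models a postings list as a sorted
int array with an in-memory skip table; the class, the skip interval of 4
and the array representation are assumptions for illustration and do not
reflect the .frq file layout.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of a postings list with a skip table.
class SkippingPostings {
    static final int SKIP_INTERVAL = 4; // assumed value, chosen for the example

    private final int[] docs;     // sorted document numbers
    private final int[] skipDocs; // skipDocs[i] == docs[i * SKIP_INTERVAL]
    private int pos = -1;

    SkippingPostings(int[] sortedDocs) {
        docs = sortedDocs;
        skipDocs = new int[(docs.length + SKIP_INTERVAL - 1) / SKIP_INTERVAL];
        for (int i = 0; i < skipDocs.length; i++) {
            skipDocs[i] = docs[i * SKIP_INTERVAL];
        }
    }

    boolean next() {
        return ++pos < docs.length;
    }

    int doc() {
        return docs[pos];
    }

    // Moves to the first entry beyond the current one whose document
    // number is >= target, jumping over whole blocks where possible.
    boolean skipTo(int target) {
        int start = pos + 1;
        for (int s = start / SKIP_INTERVAL + 1; s < skipDocs.length && skipDocs[s] < target; s++) {
            start = s * SKIP_INTERVAL; // everything before this block is < target
        }
        pos = start;
        while (pos < docs.length && docs[pos] < target) {
            pos++;
        }
        return pos < docs.length;
    }

    // Conjunction of two lists: the list that is behind skips to the other.
    static List<Integer> intersect(SkippingPostings a, SkippingPostings b) {
        List<Integer> out = new ArrayList<>();
        if (!a.next() || !b.next()) {
            return out;
        }
        while (true) {
            if (a.doc() == b.doc()) {
                out.add(a.doc());
                if (!a.next() || !b.next()) {
                    return out;
                }
            } else if (a.doc() < b.doc()) {
                if (!a.skipTo(b.doc())) {
                    return out;
                }
            } else if (!b.skipTo(a.doc())) {
                return out;
            }
        }
    }

    public static void main(String[] args) {
        SkippingPostings common = new SkippingPostings(new int[] {1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21});
        SkippingPostings rare = new SkippingPostings(new int[] {3, 19, 22});
        System.out.println(intersect(common, rare));
    }
}
```

When one list is much shorter than the other, most entries of the long list
are passed over through the skip table rather than visited one by one.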
5294
5295
5296 1.3 final
5297
5298  1. Added catch of BooleanQuery$TooManyClauses in QueryParser to
5299     throw ParseException instead. (Erik Hatcher)
5300
5301  2. Fixed a NullPointerException in Query.explain(). (Doug Cutting)
5302
5303  3. Added a new method IndexReader.setNorm(), that permits one to
5304     alter the boosting of fields after an index is created.
5305
5306  4. Distinguish between the final position and length when indexing a
5307     field.  The length is now defined as the total number of tokens,
5308     instead of the final position, as it was previously.  Length is
5309     used for score normalization (Similarity.lengthNorm()) and for
5310     controlling memory usage (IndexWriter.maxFieldLength).  In both of
5311     these cases, the total number of tokens is a better value to use
5312     than the final token position.  Position is used in phrase
5313     searching (see PhraseQuery and Token.setPositionIncrement()).
5314
5315  5. Fix StandardTokenizer's handling of CJK characters (Chinese,
5316     Japanese and Korean ideograms).  Previously contiguous sequences
5317     were combined in a single token, which is not very useful.  Now
5318     each ideogram generates a separate token, which is more useful.
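
The distinction drawn in item 4 above, between the number of tokens and the
final token position, can be made concrete with a small sketch. The class is
hypothetical; positions are assumed to start at -1 and advance by each
token's position increment, and lengthNorm() is assumed to be
1/sqrt(numTokens).

```java
// Hypothetical illustration of field length versus final position.
class FieldLength {
    // Length as defined after this change: the total number of tokens.
    static int tokenCount(int[] positionIncrements) {
        return positionIncrements.length;
    }

    // Position of the last token, which grows with gaps and stays put
    // for tokens stacked at the same position (increment 0).
    static int finalPosition(int[] positionIncrements) {
        int position = -1;
        for (int increment : positionIncrements) {
            position += increment;
        }
        return position;
    }

    // Assumed normalization: shorter fields get a larger factor.
    static float lengthNorm(int numTokens) {
        return (float) (1.0 / Math.sqrt(numTokens));
    }

    public static void main(String[] args) {
        int[] increments = {1, 5, 1, 0}; // one gap of 5, one stacked token
        System.out.println(tokenCount(increments) + " tokens, final position " + finalPosition(increments));
    }
}
```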
5319
5320
5321 1.3 RC3
5322
5323  1. Added minMergeDocs in IndexWriter.  This can be raised to speed
5324     indexing without altering the number of files, but only using more
5325     memory.  (Julien Nioche via Otis)
5326
5327  2. Fix bug #24786, in query rewriting. (bschneeman via Cutting)
5328
5329  3. Fix bug #16952, in demo HTML parser, skip comments in
5330     javascript. (Christoph Goller)
5331
5332  4. Fix bug #19253, in demo HTML parser, add whitespace as needed to
5333     output (Daniel Naber via Christoph Goller)
5334
5335  5. Fix bug #24301, in demo HTML parser, long titles no longer
5336     hang things. (Christoph Goller)
5337
5338  6. Fix bug #23534, Replace use of file timestamp of segments file
5339     with an index version number stored in the segments file.  This
5340     resolves problems when running on file systems with low-resolution
5341     timestamps, e.g., HFS under MacOS X.  (Christoph Goller)
5342
5343  7. Fix QueryParser so that TokenMgrError is not thrown, only
5344     ParseException.  (Erik Hatcher)
5345
5346  8. Fix some bugs introduced by change 11 of RC2.  (Christoph Goller)
5347
5348  9. Fixed a problem compiling TestRussianStem.  (Christoph Goller)
5349
5350 10. Cleaned up some build stuff.  (Erik Hatcher)
5351
5352
5353 1.3 RC2
5354
5355  1. Added getFieldNames(boolean) to IndexReader, SegmentReader, and
5356     SegmentsReader. (Julien Nioche via otis)
5357
5358  2. Changed file locking to place lock files in
5359     System.getProperty("java.io.tmpdir"), where all users are
5360     permitted to write files.  This way folks can open and correctly
5361     lock indexes which are read-only to them.
5362
5363  3. IndexWriter: added a new method, addDocument(Document, Analyzer),
5364     permitting one to easily use different analyzers for different
5365     documents in the same index.
5366
5367  4. Minor enhancements to FuzzyTermEnum.
5368     (Christoph Goller via Otis)
5369
5370  5. PriorityQueue: added insert(Object) method and adjusted IndexSearcher
5371     and MultiIndexSearcher to use it.
5372     (Christoph Goller via Otis)
5373
5374  6. Fixed a bug in IndexWriter that returned incorrect docCount().
5375     (Christoph Goller via Otis)
5376
5377  7. Fixed SegmentsReader to eliminate the confusing and slightly different
5378     behaviour of TermEnum when dealing with an enumeration of all terms,
5379     versus an enumeration starting from a specific term.
5380     This patch also fixes incorrect term document frequencies when the same term
5381     is present in multiple segments.
5382     (Christoph Goller via Otis)
5383
5384  8. Added CachingWrapperFilter and PerFieldAnalyzerWrapper. (Erik Hatcher)
5385
5386  9. Added support for the new "compound file" index format (Dmitry
5387     Serebrennikov)
5388
5389 10. Added Locale setting to QueryParser, for use by date range parsing.
5390
5391 11. Changed IndexReader so that it can be subclassed by classes
5392     outside of its package.  Previously it had package-private
5393     abstract methods.  Also modified the index merging code so that it
5394     can work on an arbitrary IndexReader implementation, and added a
5395     new method, IndexWriter.addIndexes(IndexReader[]), to take
5396     advantage of this. (cutting)
5397
5398 12. Added a limit to the number of clauses which may be added to a
5399     BooleanQuery.  The default limit is 1024 clauses.  This should
5400     stop most OutOfMemoryErrors caused by prefix, wildcard and fuzzy
5401     queries which run amok. (cutting)
5402
5403 13. Add new method: IndexReader.undeleteAll().  This undeletes all
5404     deleted documents which still remain in the index. (cutting)
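
The clause limit from item 12 above amounts to a guard on clause insertion.
A minimal sketch, using a hypothetical stand-in class rather than
BooleanQuery itself, with clauses modeled as plain strings:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a query that caps its number of clauses.
class BoundedClauses {
    // Unchecked, so callers are not forced to handle it at every add().
    static class TooManyClauses extends RuntimeException {
    }

    private final int maxClauses;
    private final List<String> clauses = new ArrayList<>();

    BoundedClauses(int maxClauses) {
        this.maxClauses = maxClauses;
    }

    // Rejects the clause that would exceed the limit.
    void add(String clause) {
        if (clauses.size() >= maxClauses) {
            throw new TooManyClauses();
        }
        clauses.add(clause);
    }

    int size() {
        return clauses.size();
    }

    public static void main(String[] args) {
        BoundedClauses query = new BoundedClauses(1024);
        try {
            for (int i = 0; i < 2000; i++) {
                query.add("term" + i);
            }
        } catch (TooManyClauses e) {
            System.out.println("stopped at " + query.size() + " clauses");
        }
    }
}
```

Failing fast at the limit bounds the memory an expanding prefix, wildcard or
fuzzy query can consume.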
5405
5406
5407 1.3 RC1
5408
5409  1. Fixed PriorityQueue's clear() method.
5410     Fix for bug 9454, http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9454
5411     (Matthijs Bomhoff via otis)
5412
5413  2. Changed StandardTokenizer.jj grammar for EMAIL tokens.
5414     Fix for bug 9015, http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9015
5415     (Dale Anson via otis)
5416
5417  3. Added the ability to disable lock creation by using disableLuceneLocks
5418     system property.  This is useful for read-only media, such as CD-ROMs.
5419     (otis)
5420
5421  4. Added id method to Hits to be able to access the index global id.
5422     Required for sorting options.
5423     (carlson)
5424
5425  5. Added support for new range query syntax to QueryParser.jj.
5426     (briangoetz)
5427
5428  6. Added the ability to retrieve HTML documents' META tag values to
5429     HTMLParser.jj.
5430     (Mark Harwood via otis)
5431
5432  7. Modified QueryParser to make it possible to programmatically specify the
5433     default Boolean operator (OR or AND).
5434     (Péter Halácsy via otis)
5435
5436  8. Made many search methods and classes non-final, per requests.
5437     This includes IndexWriter and IndexSearcher, among others.
5438     (cutting)
5439
5440  9. Added class RemoteSearchable, providing support for remote
5441     searching via RMI.  The test class RemoteSearchableTest.java
5442     provides an example of how this can be used.  (cutting)
5443
5444  10. Added PhrasePrefixQuery (and supporting MultipleTermPositions).  The
5445      test class TestPhrasePrefixQuery provides the usage example.
5446      (Anders Nielsen via otis)
5447
5448  11. Changed the German stemming algorithm to ignore case while
5449      stripping. The new algorithm is faster and produces more equal
5450      stems from nouns and verbs derived from the same word.
5451      (gschwarz)
5452
5453  12. Added support for boosting the score of documents and fields via
5454      the new methods Document.setBoost(float) and Field.setBoost(float).
5455
5456      Note: This changes the encoding of an indexed value.  Indexes
5457      should be re-created from scratch in order for search scores to
5458      be correct.  With the new code and an old index, searches will
5459      yield very large scores for shorter fields, and very small scores
5460      for longer fields.  Once the index is re-created, scores will be
5461      as before. (cutting)
5462
5463  13. Added new method Token.setPositionIncrement().
5464
5465      This permits, for the purpose of phrase searching, placing
5466      multiple terms in a single position.  This is useful with
5467      stemmers that produce multiple possible stems for a word.
5468
5469      This also permits the introduction of gaps between terms, so that
5470      terms which are adjacent in a token stream will not be matched by
5471      an exact phrase query.  This makes it possible, e.g., to build
5472      an analyzer where phrases are not matched over stop words which
5473      have been removed.
5474
5475      Finally, repeating a token with an increment of zero can also be
5476      used to boost scores of matches on that token.  (cutting)
5477
5478  14. Added new Filter class, QueryFilter.  This constrains search
5479      results to only match those which also match a provided query.
5480      Results are cached, so that searches after the first on the same
5481      index using this filter are very fast.
5482
5483      This could be used, for example, with a RangeQuery on a formatted
5484      date field to implement date filtering.  One could re-use a
5485      single QueryFilter that matches, e.g., only documents modified
5486      within the last week.  The QueryFilter and RangeQuery would only
5487      need to be reconstructed once per day. (cutting)
5488
5489  15. Added a new IndexWriter method, getAnalyzer().  This returns the
5490      analyzer used when adding documents to this index. (cutting)
5491
5492  16. Fixed a bug with IndexReader.lastModified().  Before, document
5493      deletion did not update this.  Now it does.  (cutting)
5494
5495  17. Added Russian Analyzer.
5496      (Boris Okner via otis)
5497
5498  18. Added a public, extensible scoring API.  For details, see the
5499      javadoc for org.apache.lucene.search.Similarity.
5500
5501  19. Fixed return of Hits.id() from float to int. (Terry Steichen via Peter).
5502
5503  20. Added getFieldNames() to IndexReader and Segment(s)Reader classes.
5504      (Peter Mularien via otis)
5505
5506  21. Added getFields(String) and getValues(String) methods.
5507      Contributed by Rasik Pandey on 2002-10-09
5508      (Rasik Pandey via otis)
5509
5510  22. Revised internal search APIs.  Changes include:
5511
5512        a. Queries are no longer modified during a search.  This makes
5513        it possible, e.g., to reuse the same query instance with
5514        multiple indexes from multiple threads.
5515
5516        b. Term-expanding queries (e.g. PrefixQuery, WildcardQuery,
5517        etc.)  now work correctly with MultiSearcher, fixing bugs 12619
5518        and 12667.
5519
5520        c. Boosting BooleanQuery's now works, and is supported by the
5521        query parser (problem reported by Lee Mallabone).  Thus a query
5522        like "(+foo +bar)^2 +baz" is now supported and equivalent to
5523        "(+foo^2 +bar^2) +baz".
5524
5525        d. New method: Query.rewrite(IndexReader).  This permits a
5526        query to re-write itself as an alternate, more primitive query.
5527        Most of the term-expanding query classes (PrefixQuery,
5528        WildcardQuery, etc.) are now implemented using this method.
5529
5530        e. New method: Searchable.explain(Query q, int doc).  This
5531        returns an Explanation instance that describes how a particular
5532        document is scored against a query.  An explanation can be
5533        displayed as either plain text, with the toString() method, or
5534        as HTML, with the toHtml() method.  Note that computing an
5535        explanation is as expensive as executing the query over the
5536        entire index.  This is intended to be used in developing
5537        Similarity implementations, and, for good performance, should
5538        not be displayed with every hit.
5539
5540        f. Scorer and Weight are public, not package protected.  It is now
5541        possible for someone to write a Scorer implementation that is
5542        not in the org.apache.lucene.search package.  This is still
5543        fairly advanced programming, and I don't expect anyone to do
5544        this anytime soon, but at least now it is possible.
5545
5546        g. Added public accessors to the primitive query classes
5547        (TermQuery, PhraseQuery and BooleanQuery), permitting access to
5548        their terms and clauses.
5549
5550      Caution: These are extensive changes and they have not yet been
5551      tested extensively.  Bug reports are appreciated.
5552      (cutting)
5553
5554  23. Added convenience RAMDirectory constructors taking File and String
5555      arguments, for easy FSDirectory to RAMDirectory conversion.
5556      (otis)
5557
5558  24. Added code for manual renaming of files in FSDirectory, since it
5559      has been reported that java.io.File's renameTo(File) method sometimes
5560      fails on Windows JVMs.
5561      (Matt Tucker via otis)
5562
5563  25. Refactored QueryParser to make it easier for people to extend it.
5564      Added the ability to automatically lower-case Wildcard terms in
5565      the QueryParser.
5566      (Tatu Saloranta via otis)
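
Item 22(d) above says that term-expanding queries rewrite themselves into
more primitive queries. The following sketch shows the idea for a prefix
query, with the term dictionary modeled as a sorted set of strings. The
class and method are hypothetical; the real Query.rewrite(IndexReader)
works against an index, not a TreeSet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical model of expanding a prefix into the terms it matches.
class PrefixRewrite {
    // Returns the terms a prefix query would be rewritten to, in order.
    static List<String> expand(SortedSet<String> termDictionary, String prefix) {
        List<String> terms = new ArrayList<>();
        // Terms are sorted, so all matches are contiguous starting at the prefix.
        for (String term : termDictionary.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            terms.add(term);
        }
        return terms;
    }

    public static void main(String[] args) {
        SortedSet<String> dictionary = new TreeSet<>(Arrays.asList("bar", "foo", "food", "fool", "for"));
        System.out.println(expand(dictionary, "foo"));
    }
}
```

Each returned term would become one optional clause of a primitive boolean
query, which is why the expansion depends on the index being searched.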
5567
5568
5569 1.2 RC6
5570
5571  1. Changed QueryParser.jj to have "?" be a special character which
5572     allowed it to be used as a wildcard term. Updated TestWildcard
5573     unit test also. (Ralf Hettesheimer via carlson)
5574
5575 1.2 RC5
5576
5577  1. Renamed build.properties to default.properties and updated
5578     the BUILD.txt document to describe how to override the
5579     default.properties settings without having to edit the file. This
5580     brings the build process closer to Scarab's build process.
5581     (jon)
5582
5583  2. Added MultiFieldQueryParser class. (Kelvin Tan, via otis)
5584
5585  3. Updated "powered by" links. (otis)
5586
5587  4. Fixed instruction for setting up JavaCC - Bug #7017 (otis)
5588
5589  5. Added throwing exception if FSDirectory could not create directory
5590     - Bug #6914 (Eugene Gluzberg via otis)
5591
5592  6. Update MultiSearcher, MultiFieldParse, Constants, DateFilter,
5593     LowerCaseTokenizer javadoc (otis)
5594
5595  7. Added fix to avoid NullPointerException in results.jsp
5596     (Mark Hayes via otis)
5597
5598  8. Changed Wildcard search to find 0 or more char instead of 1 or more
5599     (Lee Mallabone, via otis)
5600
5601  9. Fixed error in offset issue in GermanStemFilter - Bug #7412
5602     (Rodrigo Reyes, via otis)
5603
5604  10. Added unit tests for wildcard search and DateFilter (otis)
5605
5606  11. Allow co-existence of indexed and non-indexed fields with the same name
5607      (cutting/casper, via otis)
5608
5609  12. Add escape character to query parser.
5610      (briangoetz)
5611
5612  13. Applied a patch that ensures that searches that use DateFilter
5613      don't throw an exception when no matches are found. (David Smiley, via
5614      otis)
5615
5616  14. Fixed bugs in DateFilter and wildcardquery unit tests. (cutting, otis, carlson)
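
The wildcard semantics from item 8 above ('*' matching zero or more
characters) can be sketched with a small matcher. The class is hypothetical
and is not how WildcardQuery is implemented; '?' is assumed to match exactly
one character.

```java
// Hypothetical wildcard matcher: '*' = zero or more chars, '?' = exactly one.
class WildcardMatch {
    static boolean matches(String pattern, String text) {
        int p = 0;
        int t = 0;
        int star = -1; // index of the most recent '*' in the pattern
        int mark = 0;  // text position that '*' is currently assumed to end at
        while (t < text.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                star = p++;  // first try matching zero characters
                mark = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == text.charAt(t))) {
                p++;
                t++;
            } else if (star != -1) {
                p = star + 1; // backtrack: let the last '*' absorb one more char
                t = ++mark;
            } else {
                return false;
            }
        }
        // Trailing '*' may match the empty remainder.
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }

    public static void main(String[] args) {
        System.out.println(matches("foo*", "foo"));
    }
}
```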
5617
5618
5619 1.2 RC4
5620
5621  1. Updated contributions section of website.
5622     Add XML Document #3 implementation to Document Section.
5623     Also added Term Highlighting to Misc Section. (carlson)
5624
5625  2. Fixed NullPointerException for phrase searches containing
5626     unindexed terms, introduced in 1.2RC3.  (cutting)
5627
5628  3. Changed document deletion code to obtain the index write lock,
5629     enforcing the fact that document addition and deletion cannot be
5630     performed concurrently.  (cutting)
5631
5632  4. Various documentation cleanups.  (otis, acoliver)
5633
5634  5. Updated "powered by" links.  (cutting, jon)
5635
5636  6. Fixed a bug in the GermanStemmer.  (Bernhard Messer, via otis)
5637
5638  7. Changed Term and Query to implement Serializable.  (scottganyo)
5639
5640  8. Fixed to never delete indexes added with IndexWriter.addIndexes().
5641     (cutting)
5642
5643  9. Upgraded to JUnit 3.7. (otis)
5644
5645 1.2 RC3
5646
5647  1. IndexWriter: fixed a bug where adding an optimized index to an
5648     empty index failed.  This was encountered using addIndexes to copy
5649     a RAMDirectory index to an FSDirectory.
5650
5651  2. RAMDirectory: fixed a bug where RAMInputStream could not read
5652     across more than a single buffer boundary.
5653
5654  3. Fix query parser so it accepts queries with unicode characters.
5655     (briangoetz)
5656
5657  4. Fix query parser so that PrefixQuery is used in preference to
5658     WildcardQuery when there's only an asterisk at the end of the
5659     term.  Previously PrefixQuery would never be used.
5660
5661  5. Fix tests so they compile; fix ant file so it compiles tests
5662     properly.  Added test cases for Analyzers and PriorityQueue.
5663
5664  6. Updated demos, added Getting Started documentation. (acoliver)
5665
5666  7. Added 'contributions' section to website & docs. (carlson)
5667
5668  8. Removed JavaCC from source distribution for copyright reasons.
5669     Folks must now download this separately from metamata in order to
5670     compile Lucene.  (cutting)
5671
5672  9. Substantially improved the performance of DateFilter by adding the
5673     ability to reuse TermDocs objects.  (cutting)
5674
5675 10. Added IndexReader methods:
5676       public static boolean indexExists(String directory);
5677       public static boolean indexExists(File directory);
5678       public static boolean indexExists(Directory directory);
5679       public static boolean isLocked(Directory directory);
5680       public static void unlock(Directory directory);
5681     (cutting, otis)
5682
5683 11. Fixed bugs in GermanAnalyzer (gschwarz)
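
The RAMInputStream fix in item 2 above concerns reads that span several
buffers. A minimal sketch of such a read loop follows; the class, the tiny
buffer size of 4 and the method names are assumptions for illustration and
are not the RAMDirectory code.

```java
// Hypothetical model of a stream whose data lives in fixed-size buffers.
class ChunkedInput {
    static final int BUFFER_SIZE = 4; // deliberately tiny for the example

    private final byte[][] buffers;
    private final long length;
    private long position;

    ChunkedInput(byte[] data) {
        length = data.length;
        int count = (data.length + BUFFER_SIZE - 1) / BUFFER_SIZE;
        buffers = new byte[count][];
        for (int i = 0; i < count; i++) {
            int start = i * BUFFER_SIZE;
            int len = Math.min(BUFFER_SIZE, data.length - start);
            buffers[i] = new byte[BUFFER_SIZE];
            System.arraycopy(data, start, buffers[i], 0, len);
        }
    }

    void seek(long pos) {
        position = pos;
    }

    // Copies len bytes, crossing as many buffer boundaries as needed.
    void readBytes(byte[] dest, int offset, int len) {
        if (position + len > length) {
            throw new IllegalArgumentException("read past end of data");
        }
        int remaining = len;
        while (remaining > 0) {
            int bufferIndex = (int) (position / BUFFER_SIZE);
            int bufferOffset = (int) (position % BUFFER_SIZE);
            // Never copy past the end of the current buffer in one step.
            int chunk = Math.min(remaining, BUFFER_SIZE - bufferOffset);
            System.arraycopy(buffers[bufferIndex], bufferOffset, dest, offset, chunk);
            offset += chunk;
            position += chunk;
            remaining -= chunk;
        }
    }

    public static void main(String[] args) {
        ChunkedInput in = new ChunkedInput(new byte[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9});
        byte[] dest = new byte[9];
        in.seek(1);
        in.readBytes(dest, 0, 9); // spans three buffers
        System.out.println(java.util.Arrays.toString(dest));
    }
}
```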
5684
5685
5686 1.2 RC2:
5687  - added sources to distribution
5688  - removed broken build scripts and libraries from distribution
5689  - SegmentsReader: fixed potential race condition
5690  - FSDirectory: fixed so that getDirectory(xxx,true) correctly
5691    erases the directory contents, even when the directory
5692    has already been accessed in this JVM.
5693  - RangeQuery: Fix issue where an inclusive range query would
5694    include the nearest term in the index above a non-existent
5695    specified upper term.
5696  - SegmentTermEnum: Fix NullPointerException in clone() method
5697    when the Term is null.
5698  - JDK 1.1 compatibility fix: disabled lock files for JDK 1.1,
5699    since they rely on a feature added in JDK 1.2.
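
The RangeQuery fix listed above (an inclusive range must not pick up the
next term beyond a missing upper term) can be sketched against a sorted set
of terms. The class and method are hypothetical and only model the
comparison logic, not the actual term enumeration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical model of enumerating the terms covered by a range.
class RangeTerms {
    static List<String> termsInRange(SortedSet<String> dictionary, String lower,
                                     String upper, boolean inclusive) {
        List<String> out = new ArrayList<>();
        for (String term : dictionary.tailSet(lower)) {
            int cmp = term.compareTo(upper);
            // Stop at the first term above the upper bound, even if the
            // upper term itself does not exist in the dictionary.
            if (cmp > 0 || (cmp == 0 && !inclusive)) {
                break;
            }
            if (!inclusive && term.equals(lower)) {
                continue;
            }
            out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        SortedSet<String> dictionary = new TreeSet<>(Arrays.asList("a", "c", "e"));
        System.out.println(termsInRange(dictionary, "a", "d", true));
    }
}
```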
5700
5701 1.2 RC1 (first Apache release):
5702   - packages renamed from com.lucene to org.apache.lucene
5703   - license switched from LGPL to Apache
5704   - ant-only build -- no more makefiles
5705   - addition of lock files--now fully thread & process safe
5706   - addition of German stemmer
5707   - MultiSearcher now supports low-level search API
5708   - added RangeQuery, for term-range searching
5709   - Analyzers can choose tokenizer based on field name
5710   - misc bug fixes.
5711
5712 1.01b (last Sourceforge release)
5713  . a few bug fixes
5714  . new Query Parser
5715  . new prefix query (search for "foo*" matches "food")
5716
5717 1.0
5718
5719 This release fixes a few serious bugs and also includes some
5720 performance optimizations, a stemmer, and a few other minor
5721 enhancements.
5722
5723 0.04
5724
5725 Lucene now includes a grammar-based tokenizer, StandardTokenizer.
5726
5727 The only tokenizer included in the previous release (LetterTokenizer)
5728 identified terms consisting entirely of alphabetic characters.  The
5729 new tokenizer uses a regular-expression grammar to identify more
5730 complex classes of terms, including numbers, acronyms, email
5731 addresses, etc.
5732
5733 StandardTokenizer serves two purposes:
5734
5735  1. It is a much better, general purpose tokenizer for use by
5736     applications as is.
5737
5738     The easiest way for applications to start using
5739     StandardTokenizer is to use StandardAnalyzer.
5740
5741  2. It provides a good example of grammar-based tokenization.
5742
5743     If an application has special tokenization requirements, it can
5744     implement a custom tokenizer by copying the directory containing
5745     the new tokenizer into the application and modifying it
5746     accordingly.
5747
5748 0.01
5749
5750 First open source release.
5751
5752 The code has been re-organized into a new package and directory
5753 structure for this release.  It builds OK, but has not been tested
5754 beyond that since the re-organization.