lucene-java-3.5.0/lucene/contrib/CHANGES.txt

   1 Lucene contrib change Log
   2
   3 For more information on past and future Lucene versions, please see:
   4 http://s.apache.org/luceneversions
   5
   6 ======================= Lucene 3.5.0 ================
   7
   8 Changes in backwards compatibility policy
   9
  10  * LUCENE-3446: Removed BooleanFilter.finalResult() due to change to
  11    FixedBitSet.  (Uwe Schindler)
  12
  13  * LUCENE-3508: Changed some method signatures in decompounding TokenFilters
  14    to make them no longer use the Token class.  (Uwe Schindler)
  15
  16  * LUCENE-3557: The various SpellChecker.indexDictionary methods were removed,
  17    and consolidated to one:
  18
  19    indexDictionary(Dictionary dict, IndexWriterConfig config, boolean optimize)
  20
  21    Previously, there was no way to specify an IndexWriterConfig, and some
  22    of these methods would sneakily pass 'true' to optimize.  (Robert Muir)
  23
  24  * LUCENE-3558: Moved NRTManager & NRTManagerReopenThread into lucene core
  25    o.a.l.search. (Simon Willnauer)
  26
  27  * LUCENE-2564: WordListLoader is now flagged as @lucene.internal. All methods in
  28    WordListLoader now return CharArraySet/Map and expect Reader instances for
  29    efficiency. Utilities to open Readers from Files, InputStreams or Java
  30    resources were added to IOUtils. (Simon Willnauer, Robert Muir)
  31
  32  * LUCENE-3552: Renamed LuceneTaxonomyReader/Writer to DirectoryTR/TW. (Shai Erera)
  33
  34  * LUCENE-3556: DirectoryTaxonomyWriter's indexWriter is now private and
  35    openIndexWriter() now returns an IndexWriter. (Shai Erera)
  36
  37 New Features
  38
  39  * LUCENE-1824: Add BoundaryScanner interface and its implementation classes,
  40    SimpleBoundaryScanner and BreakIteratorBoundaryScanner, so that FVH's FragmentsBuilder
  41    can find "natural" boundaries to make snippets. (Robert Muir, Koji Sekiguchi)
  42
  43  * LUCENE-1889: Add MultiTermQuery support for FVH. (Mike Sokolov via Koji Sekiguchi)
  44
  45  * LUCENE-3458: Change BooleanFilter to have only a single clauses ArrayList
  46    (so toString() works in order). It now behaves more like BooleanQuery,
  47    implements Iterable<FilterClause>, and allows adding Filters without
  48    creating FilterClause.  (Uwe Schindler)
  49
  50  * LUCENE-3414: Added HunspellStemFilter which uses a provided pure Java implementation of the
  51    Hunspell algorithm. (Chris Male)
  52
  53  * LUCENE-3445: Added SearcherManager, to manage sharing and reopening
  54    IndexSearchers across multiple search threads.  IndexReader's
  55    refCount is used to safely close the reader only once all threads are done
  56    using it.  (Michael McCandless)
  57
  58  * LUCENE-3486: Add SearcherLifetimeManager, to manage retrieving the
  59    same searcher used in a previous search to ensure follow-on actions
  60    (next page, drill down, etc.) use the same searcher as before (Mike
  61    McCandless)
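
   The reference-counting idea behind LUCENE-3445 can be sketched as follows.
   This is a minimal illustration only: RefCountedResource is a hypothetical
   stand-in class, not Lucene's IndexReader or SearcherManager, and the real
   implementation may differ.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a shared reader; not a Lucene class.
final class RefCountedResource {
    // Starts at 1: the reference held by whoever opened the resource.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile boolean closed = false;

    /** Tries to take a reference; returns false once the resource is closed. */
    boolean tryIncRef() {
        int count;
        while ((count = refCount.get()) > 0) {
            if (refCount.compareAndSet(count, count + 1)) {
                return true;
            }
        }
        return false;
    }

    /** Releases one reference; the last release closes the resource. */
    void decRef() {
        int remaining = refCount.decrementAndGet();
        if (remaining == 0) {
            closed = true; // a real reader would release its files here
        } else if (remaining < 0) {
            throw new IllegalStateException("decRef called too many times");
        }
    }

    boolean isClosed() {
        return closed;
    }
}
```

   Each search thread would call tryIncRef() before searching and decRef() in
   a finally block, so the resource is closed exactly once, after the last
   user has released it.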
  62
  63 API Changes
  64
  65  * LUCENE-3431: Deprecated QueryAutoStopWordAnalyzer.addStopWords* since they
  66    prevent reuse.  Stopwords are now to be computed when the Analyzer is instantiated.
  67    If new stopwords are needed, a new Analyzer instance should be created. (Chris Male)
  68
  69  * LUCENE-3434: Deprecated ShingleAnalyzerWrapper.set* since they prevent reuse.  The
  70    Analyzer should be configured at instantiation.  Deprecated PerFieldAnalyzerWrapper.addAnalyzer
  71    since it also prevents reuse.  Analyzers per field should be configured at instantiation.
  72    (Chris Male)
  73
  74  * LUCENE-3579: DirectoryTaxonomyWriter throws AlreadyClosedException if any
  75    of its API methods are called after it was closed. (Shai Erera)
  76
  77  * LUCENE-3573: TaxonomyReader.refresh() signature was modified from void to
  78    boolean, now returning an indication if any change was detected. It
  79    throws a new InconsistentTaxonomyException if the taxonomy was recreated
  80    since TaxonomyReader was last opened or refreshed. (Doron Cohen)
  81
  82 Bug Fixes
  83
  84  * LUCENE-3417: DictionaryCompoundWordFilter did not properly add tokens from the
  85    end of the compound word. (Njal Karevoll via Robert Muir)
  86
  87  * LUCENE-3019: Fix unexpected color tags for FastVectorHighlighter. (Koji Sekiguchi)
  88
  89  * LUCENE-3446: Fix NPE in BooleanFilter when DocIdSet/DocIdSetIterator is null.
  90    Converted code to FixedBitSet and simplified.  (Uwe Schindler, Shuji Umino)
  91
  92  * LUCENE-3484: Fix NPE in TaxonomyWriter: parents array creation was not thread safe.
  93    (Doron Cohen)
  94
  95  * LUCENE-3485: Fix a bug in LuceneTaxonomyReader, where calling decRef() might
  96    close the inner IndexReader, leaving the taxonomy reader in limbo.
  97    (Gilad Barkai via Shai Erera)
  98
  99  * LUCENE-3495: Fix BlockJoinQuery to properly implement getBoost()/setBoost().
 100    (Robert Muir)
 101
 102  * LUCENE-3519: BlockJoinCollector always returned null when you tried
 103    to retrieve top groups for any BlockJoinQuery after the first (Mark
 104    Harwood, Mike McCandless)
 105
 106  * LUCENE-3301: Added a workaround for buggy BreakIterator implementations in
 107    Java that crash on certain inputs containing supplementary characters.
 108    (Robert Muir)
 109
 110  * LUCENE-3501: RandomSample was not random.
 111    Replaced with RandomSampler. For previous behavior use RepeatableSampler.
 112    (Gilad Barkai, Shai Erera, Doron Cohen)
 113
 114  * LUCENE-3508: Decompounders based on CompoundWordTokenFilterBase can now be
 115    used with custom attributes. All those attributes are preserved and set on all
 116    added decompounded tokens.  (Spyros Kapnissis, Uwe Schindler)
 117
 118  * LUCENE-3542: Group expanded query terms to preserve parent boolean operator
 119    in StandardQueryParser. (Simon Willnauer)
 120
 121  * LUCENE-3573: TaxonomyReader.refresh() was broken in case that the taxonomy was
 122    recreated since the taxonomy reader was last refreshed or opened. TR.refresh()
 123    now detects this situation and throws an InconsistentTaxonomyException.
 124    When this exception is thrown, the application should open a new taxonomy
 125    reader. The old taxonomy reader should be closed once it is no longer in use.  (Doron Cohen)
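
   The recovery procedure described for LUCENE-3573 might look roughly like
   the sketch below. All types here (RefreshableTaxonomy,
   TaxonomyRecreatedException, TaxonomyHolder) are hypothetical stand-ins
   written for this illustration; they are not the Lucene facet API, and only
   the refresh-returns-boolean and throw-on-recreate behavior is taken from
   the entry above.

```java
import java.util.function.Supplier;

// Stand-in for the exception described in the entry; not Lucene's class.
final class TaxonomyRecreatedException extends Exception {
    private static final long serialVersionUID = 1L;
}

// Stand-in for a taxonomy reader: refresh() reports whether anything changed.
interface RefreshableTaxonomy {
    boolean refresh() throws TaxonomyRecreatedException;
    void close();
}

final class TaxonomyHolder {
    private final Supplier<RefreshableTaxonomy> opener;
    private RefreshableTaxonomy current;

    TaxonomyHolder(Supplier<RefreshableTaxonomy> opener) {
        this.opener = opener;
        this.current = opener.get();
    }

    /** Returns true if callers now see newer taxonomy data. */
    synchronized boolean refresh() {
        try {
            return current.refresh();
        } catch (TaxonomyRecreatedException e) {
            // The taxonomy was recreated: open a new reader, then retire the old one.
            RefreshableTaxonomy old = current;
            current = opener.get();
            // Simplification: a real application must first make sure that
            // no thread is still using the old reader.
            old.close();
            return true;
        }
    }

    synchronized RefreshableTaxonomy get() {
        return current;
    }
}
```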
 126
 127 API Changes
 128
 129  * LUCENE-3436: Add SuggestMode to the spellchecker, so you can specify the strategy
 130    for suggesting related terms.  (James Dyer via Robert Muir)
 131
 132  * LUCENE-3513: Add SimpleFragListBuilder constructor with margin parameter.
 133    (Kelsey Francis via Koji Sekiguchi)
 134
 135 ======================= Lucene 3.4.0 ================
 136
 137 New Features
 138
 139  * LUCENE-3234: provide a limit on phrase analysis in FastVectorHighlighter for
 140    highlighting speed up. Use FastVectorHighlighter.setPhraseLimit() to set limit
 141    (e.g. 5000). (Mike Sokolov via Koji Sekiguchi)
 142
 143  * LUCENE-3079: a new facet module which provides faceted indexing & search
 144    capabilities. It allows managing a taxonomy of categories and indexing them
 145    with documents. It also provides a search API for aggregating (e.g. count)
 146    the weights of the categories that are relevant to the search results.
 147    (Shai Erera)
 148
 149  * LUCENE-3171: Added BlockJoinQuery and BlockJoinCollector, under the
 150    new contrib/join module, to enable searches that require joining
 151    between parent and child documents.  Joined (children + parent)
 152    documents must be indexed as a document block, using
 153    IndexWriter.add/UpdateDocuments (Mark Harwood, Mike McCandless)
 154
 155  * LUCENE-3233, LUCENE-3375: Added SynonymFilter for applying multi-word synonyms
 156    during indexing or querying (with parsers for wordnet and solr formats).
 157    Removed contrib/wordnet.  (Simon Rosenthal, Robert Muir, Mike McCandless)
 158
 159  * LUCENE-1768: added support for numeric ranges in contrib query parser;
 160    added support for simple numeric queries, such as <age:4>, in contrib
 161    query parser (Vinicius Barros via Uwe Schindler)
 162
 163 Changes in runtime behavior:
 164
 165  * LUCENE-1768: StandardQueryConfigHandler now uses NumericFieldConfigListener
 166    to set a NumericConfig to its corresponding FieldConfig;
 167    StandardQueryTreeBuilder now uses DummyQueryNodeBuilder for
 168    NumericQueryNodes and uses NumericRangeQueryNodeBuilder for
 169    NumericRangeQueryNodes; StandardQueryNodeProcessorPipeline now executes
 170    NumericQueryNodeProcessor followed by NumericRangeQueryNodeProcessor
 171    right after LowercaseExpandedTermsQueryNodeProcessor
 172    (Vinicius Barros via Uwe Schindler)
 173
 174 API Changes
 175
 176  * LUCENE-3296: PKIndexSplitter & MultiPassIndexSplitter now have version
 177    constructors. PKIndexSplitter accepts an IndexWriterConfig for each of
 178    the target indexes. (Simon Willnauer, Jason Rutherglen)
 179
 180  * LUCENE-2979: queryparser configuration API located under
 181    org.apache.lucene.queryParser.core.config has been simplified and
 182    Attribute objects no longer should be used to configure query parsers. Now
 183    any configuration should be done through AbstractQueryConfig's set and get
 184    methods. The old API, which uses Attributes objects, is still in place, however
 185    it has been deprecated and will be removed soon.
 186    (Phillipe Ramalho via Adriano Crestani)
 187
 188  * LUCENE-3400: Deprecated DutchAnalyzer.setStemDictionary since it prevents
 189    TokenStream reuse (Chris Male)
 190
 191  * LUCENE-1768: setNumericConfigMap and getNumericConfigMap were added
 192    to StandardQueryParser; ParametricRangeQueryNode and
 193    oal.queryParser.standard.nodes.RangeQueryNode now implement
 194    oal.queryParser.core.nodes.RangeQueryNode;
 195    oal.queryParser.core.nodes.RangeQueryNode was deprecated and now extends
 196    TermRangeQueryNode, which extends AbstractRangeQueryNode;
 197    ParametricQueryNode was deprecated; FieldQueryNode now implements the
 198    new FieldValueQueryNode<CharSequence>, which in turn extends
 199    FieldableQueryNode and the new ValueQueryNode
 200    (Vinicius Barros via Uwe Schindler)
 201
 202  * LUCENE-3488: Factored out SearcherManager from NRTManager. NRTManager
 203    now manages SearcherManager instances instead of IndexSearcher directly.
 204    Acquiring a SearcherManager is non-blocking unless the caller explicitly
 205    requests a certain SearcherManager generation. (Simon Willnauer)
 206
 207 Optimizations
 208
 209  * LUCENE-3306: Disabled indexing of positions for spellchecker n-gram
 210    fields: they are not needed because the spellchecker does not
 211    use positional queries.  (Robert Muir)
 212
 213 Bug Fixes
 214
 215  * LUCENE-3326: Fixed a bug where MoreLikeThis.like(Reader) would
 216    try to re-analyze the same Reader multiple times, passing different
 217    field names to the analyzer. Additionally, MoreLikeThisQuery would take
 218    your string and encode/decode it using the default charset; it now uses
 219    a StringReader.  Finally, MoreLikeThis's methods that take File, URL, or InputStream
 220    are deprecated; please create the Reader yourself. (Trejkaz, Robert Muir)
 221
 222  * LUCENE-3347: XML query parser did not always incorporate boosts from
 223    UserQuery elements.  (Moogie, Uwe Schindler)
 224
 225  * LUCENE-3382: Fixed a bug where NRTCachingDirectory's listAll() would wrongly
 226    throw NoSuchDirectoryException when all files written so far have been
 227    cached to RAM and the directory still has not yet been created on the
 228    filesystem.  (Robert Muir)
 229
 230 ======================= Lucene 3.3.0 =======================
 231
 232 New Features
 233
 234  * LUCENE-152: Add KStem (light stemmer for English).
 235    (Yonik Seeley via Robert Muir)
 236
 237  * LUCENE-3135: Add suggesters (autocomplete) to contrib/spellchecker,
 238    with three implementations: Jaspell, Ternary Trie, and Finite State.
 239    (Andrzej Bialecki, Dawid Weiss, Mike McCandless, Robert Muir)
 240
 241  * LUCENE-3129: Added BlockGroupingCollector, a single pass
 242    grouping collector which is faster than the two-pass approach, and
 243    also computes the total group count, but requires that every
 244    document sharing the same group was indexed as a doc block
 245    (IndexWriter.add/updateDocuments).  (Mike McCandless)
 246
 247  * LUCENE-2955: Added NRTManager and NRTManagerReopenThread, to
 248    simplify handling NRT reopen with multiple search threads, and to
 249    allow an app to control which indexing changes must be visible to
 250    which search requests.  (Mike McCandless)
 251
 252  * LUCENE-3191: Added SearchGroup.merge and TopGroups.merge, to
 253    facilitate doing grouping in a distributed environment (Uwe
 254    Schindler, Mike McCandless)
 255
 256  * LUCENE-2919: Added PKIndexSplitter, that splits an index according
 257    to a middle term in a specified field.  (Jason Rutherglen via Mike
 258    McCandless, Uwe Schindler)
 259
 260 API Changes
 261
 262  * LUCENE-3141: add getter method to access fragInfos in FieldFragList.
 263    (Sujit Pal via Koji Sekiguchi)
 264
 265  * LUCENE-3099: Allow subclasses to determine the group value for
 266    First/SecondPassGroupingCollector.  (Martijn van Groningen, Mike
 267    McCandless)
 268
 269 Bug Fixes
 270
 271  * LUCENE-3185: Fix bug in NRTCachingDirectory.deleteFile that would
 272    always throw exception and sometimes fail to actually delete the
 273    file.  (Mike McCandless)
 274
 275  * LUCENE-3188: contrib/misc IndexSplitter creates indexes with incorrect
 276    SegmentInfos.counter; added CheckIndex check & fix for this problem.
 277    (Ivan Dimitrov Vasilev via Steve Rowe)
 278
 279 Build
 280
 281  * LUCENE-3149: Upgrade contrib/icu's ICU jar file to ICU 4.8.
 282    (Robert Muir)
 283
 284 ======================= Lucene 3.2.0 =======================
 285
 286 Changes in backwards compatibility policy
 287
 288  * LUCENE-2981: Removed the following contribs: ant, db, lucli, swing. (Robert Muir)
 289
 290 Changes in runtime behavior
 291
 292  * LUCENE-3086: ItalianAnalyzer now uses ElisionFilter with a set of Italian
 293    contractions by default.  (Robert Muir)
 294
 295 Bug Fixes
 296
 297  * LUCENE-3045: fixed QueryNodeImpl.containsTag(String key) that was
 298    not lowercasing the key before checking for the tag (Adriano Crestani)
 299
 300  * LUCENE-3026: SmartChineseAnalyzer's WordTokenFilter threw NullPointerException
 301    on sentences longer than 32,767 characters.  (wangzhenghang via Robert Muir)
 302
 303  * LUCENE-2939: Highlighter should try and use maxDocCharsToAnalyze in
 304    WeightedSpanTermExtractor when adding a new field to MemoryIndex as well as
 305    when using CachingTokenStream. This can be a significant performance bug for
 306    large documents. (Mark Miller)
 307
 308  * LUCENE-3043: GermanStemmer threw IndexOutOfBoundsException if it encountered
 309    a zero-length token.  (Robert Muir)
 310
 311  * LUCENE-3044: ThaiWordFilter didn't reset its cached state correctly; this only
 312    caused a problem if you consumed a tokenstream, then reused it, added different
 313    attributes to it, and consumed it again.  (Robert Muir, Uwe Schindler)
 314
 315  * LUCENE-3113: Fixed some minor analysis bugs: double-reset() in ReusableAnalyzerBase
 316    and ShingleAnalyzerWrapper, missing end() implementations in PrefixAwareTokenFilter
 317    and PrefixAndSuffixAwareTokenFilter, invocations of incrementToken() after it
 318    already returned false in CommonGramsQueryFilter, HyphenatedWordsFilter,
 319    ShingleFilter, and SynonymsFilter.  (Robert Muir, Steven Rowe, Uwe Schindler)
 320
 321 New Features
 322
 323  * LUCENE-3016: Add analyzer for Latvian.  (Robert Muir)
 324
 325  * LUCENE-1421: create new grouping contrib module, enabling search
 326    results to be grouped by a single-valued indexed field.  This
 327    module was factored out of Solr's grouping implementation, but
 328    it cannot group by function queries nor arbitrary queries.  (Mike
 329    McCandless)
 330
 331  * LUCENE-3098: add AllGroupsCollector, to collect all unique groups
 332    (but in unspecified order).  (Martijn van Groningen via Mike
 333    McCandless)
 334
 335  * LUCENE-3092: Added NRTCachingDirectory in contrib/misc, which
 336    caches small segments in RAM.  This is useful, in the near-real-time
 337    case where the indexing rate is lowish but the reopen rate is
 338    highish, to take load off the IO system.  (Mike McCandless)
 339
 340 Optimizations
 341
 342  * LUCENE-3040: Switch all analysis consumers (highlighter, morelikethis, memory, ...)
 343    over to reusableTokenStream().  (Robert Muir)
 344
 345 ======================= Lucene 3.1.0 =======================
 346
 347 Changes in backwards compatibility policy
 348
 349  * LUCENE-2100: All Analyzers in Lucene-contrib have been marked as final.
 350    Analyzers should only act as a composition of TokenStreams; users should
 351    compose their own analyzers instead of subclassing existing ones.
 352    (Simon Willnauer)
 353
 354  * LUCENE-2194, LUCENE-2201: Snowball APIs were upgraded to snowball revision
 355    502 (with some local modifications for improved performance).
 356    Index backwards compatibility and binary backwards compatibility are
 357    preserved, but some protected/public member variables changed type. This
 358    does NOT affect java code/class files produced by the snowball compiler,
 359    but technically is a backwards compatibility break.  (Robert Muir)
 360
 361  * LUCENE-2226: Moved contrib/snowball functionality into contrib/analyzers.
 362    Be sure to remove any old obsolete lucene-snowball jar files from your
 363    classpath!  (Robert Muir)
 364
 365  * LUCENE-2323: Moved contrib/wikipedia functionality into contrib/analyzers.
 366    Additionally the package was changed from org.apache.lucene.wikipedia.analysis
 367    to org.apache.lucene.analysis.wikipedia.  (Robert Muir)
 368
 369  * LUCENE-2581: Added new methods to FragmentsBuilder interface. These methods
 370    are used to set pre/post tags and Encoder. (Koji Sekiguchi)
 371
 372  * LUCENE-2391: Improved spellchecker (re)build time/ram usage by omitting
 373    frequencies/positions/norms for single-valued fields, modifying the default
 374    ramBufferMBSize to match IndexWriterConfig (16MB), making index optimization
 375    an optional boolean parameter, and modifying the incremental update logic
 376    to work well with unoptimized spellcheck indexes. The indexDictionary() methods
 377    were made final to ensure a hard backwards break in case you were subclassing
 378    Spellchecker. In general, subclassing Spellchecker is not recommended.  (Robert Muir)
 379
 380 Changes in runtime behavior
 381
 382  * LUCENE-2117: SnowballAnalyzer uses TurkishLowerCaseFilter instead of
 383    LowerCaseFilter to correctly handle the unique Turkish casing behavior if
 384    used with Version > 3.0 and the TurkishStemmer.
 385    (Robert Muir via Simon Willnauer)
 386
 387  * LUCENE-2055: GermanAnalyzer now uses the Snowball German2 algorithm and
 388    stopwords list by default for Version > 3.0.
 389    (Robert Muir, Uwe Schindler, Simon Willnauer)
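
   The "unique Turkish casing behavior" mentioned in LUCENE-2117 can be
   observed with the JDK alone. The sketch below uses only
   String.toLowerCase(Locale); the TurkishCasing class is a hypothetical
   helper for this illustration and does not show how TurkishLowerCaseFilter
   is implemented.

```java
import java.util.Locale;

// Hypothetical helper; uses only standard JDK casing rules.
final class TurkishCasing {
    private static final Locale TURKISH = Locale.forLanguageTag("tr");

    /** Locale-independent lowercasing: 'I' becomes 'i'. */
    static String lowerRoot(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    /** Turkish lowercasing: 'I' becomes dotless i (U+0131),
     *  and dotted capital I (U+0130) becomes 'i'. */
    static String lowerTurkish(String s) {
        return s.toLowerCase(TURKISH);
    }
}
```

   A generic lowercase step therefore maps Turkish text to different terms
   than a Turkish-aware one, which is why the analyzer choice matters for
   matching at query time.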
 390
 391 Bug fixes
 392
 393  * LUCENE-2855: contrib queryparser was using CharSequence as key in some internal
 394    Map instances, which was leading to incorrect behavior, since some CharSequence
 395    implementors do not override hashcode and equals methods. Now the internal Maps
 396    are using String instead. (Adriano Crestani)
 397
 398  * LUCENE-2068: Fixed ReverseStringFilter which was not aware of supplementary
 399    characters. During reverse the filter created unpaired surrogates, which
 400    will be replaced by U+FFFD by the indexer, but not at query time. The filter
 401    now reverses supplementary characters correctly if used with Version > 3.0.
 402    (Simon Willnauer, Robert Muir)
 403
 404  * LUCENE-2035: TokenSources.getTokenStream() did not assign positionIncrement.
 405    (Christopher Morris via Mark Miller)
 406
 407  * LUCENE-2055: Deprecated RussianTokenizer, RussianStemmer, RussianStemFilter,
 408    FrenchStemmer, FrenchStemFilter, DutchStemmer, and DutchStemFilter. For
 409    these Analyzers, SnowballFilter is used instead (for Version > 3.0), as
 410    the previous code did not always implement the Snowball algorithm correctly.
 411    Additionally, for Version > 3.0, the Snowball stopword lists are used by
 412    default.  (Robert Muir, Uwe Schindler, Simon Willnauer)
 413
 414  * LUCENE-2184: Fixed bug with handling best fit value when the proper best fit value is
 415    not an indexed field.  Note, this change affects the APIs. (Grant Ingersoll)
 416
 417  * LUCENE-2359: Fix bug in CartesianPolyFilterBuilder related to handling of behavior around
 418    the 180th meridian (Grant Ingersoll)
 419
 420  * LUCENE-2404: Fix bugs with position increment and empty tokens in ThaiWordFilter.
 421    For matchVersion >= 3.1 the filter also no longer lowercases. ThaiAnalyzer
 422    will use a separate LowerCaseFilter instead. (Uwe Schindler, Robert Muir)
 423
 424  * LUCENE-2615: Fix DirectIOLinuxDirectory to not assign bogus
 425    permissions to newly created files, and to not silently hardwire
 426    buffer size to 1 MB.  (Mark Miller, Robert Muir, Mike McCandless)
 427
 428  * LUCENE-2629: Fix gennorm2 task for generating ICUFoldingFilter's .nrm file. This allows
 429    you to customize its normalization/folding, by editing the source data files in src/data
 430    and regenerating a new .nrm with 'ant gennorm2'.  (David Bowen via Robert Muir)
 431
 432  * LUCENE-2653: ThaiWordFilter depends on the JRE having a Thai dictionary, which is not
 433    always the case. If the dictionary is unavailable, the filter will now throw
 434    UnsupportedOperationException in the constructor.  (Robert Muir)
 435
 436  * LUCENE-589: Fix contrib/demo for international documents.
 437    (Curtis d'Entremont via Robert Muir)
 438
 439  * LUCENE-2246: Fix contrib/demo for Turkish html documents.
 440    (Selim Nadi via Robert Muir)
 441
 442  * LUCENE-590: Demo HTML parser gives incorrect summaries when title is repeated as a heading
 443    (Curtis d'Entremont via Robert Muir)
 444
 445  * LUCENE-591: The demo indexer now indexes meta keywords.
 446    (Curtis d'Entremont via Robert Muir)
 447
 448  * LUCENE-2874: Highlighting overlapping tokens outputted doubled words.
 449    (Pierre Gossé via Robert Muir)
 450
 451  * LUCENE-2943: Fix thread-safety issues with ICUCollationKeyFilter.
 452    (Robert Muir)
 453
 454  * LUCENE-3087: Highlighter: fix case that was preventing highlighting
 455    of exact phrase when tokens overlap. (Pierre Gossé via Mike
 456    McCandless)
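
   The unpaired-surrogate problem described in LUCENE-2068 can be reproduced
   in a few lines. This is a sketch for illustration, assuming a hypothetical
   CodePointReverse helper; it is not the code used by ReverseStringFilter.

```java
// Hypothetical helper contrasting two ways to reverse a Java string.
final class CodePointReverse {
    /** Naive reversal: swaps UTF-16 code units, which splits surrogate pairs. */
    static String reverseCodeUnits(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0, j = chars.length - 1; i < j; i++, j--) {
            char tmp = chars[i];
            chars[i] = chars[j];
            chars[j] = tmp;
        }
        return new String(chars);
    }

    /** Code-point-aware reversal: each surrogate pair stays in order. */
    static String reverseCodePoints(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = s.length();
        while (i > 0) {
            int cp = s.codePointBefore(i);
            out.appendCodePoint(cp);
            i -= Character.charCount(cp);
        }
        return out.toString();
    }
}
```

   For an input containing a supplementary character such as U+1D11E, the
   naive version emits the low surrogate before the high surrogate, which is
   exactly the kind of unpaired surrogate the entry describes.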
 457
 458 API Changes
 459
 460  * LUCENE-2867: Some contrib queryparser methods that receive a CharSequence as
 461    identifier, such as QueryNode#unsetTag(CharSequence), were deprecated and
 462    will be removed in version 4. (Adriano Crestani)
 463
 464  * LUCENE-2147: Spatial GeoHashUtils now always decode GeoHash strings
 465    with full precision. GeoHash#decode_exactly(String) was merged into
 466    GeoHash#decode(String). (Chris Male, Simon Willnauer)
 467
 468  * LUCENE-2204: Change some package private classes/members to publicly accessible to implement
 469    custom FragmentsBuilders. (Koji Sekiguchi)
 470
 471  * LUCENE-2055: Integrate snowball into contrib/analyzers. SnowballAnalyzer is
 472    now deprecated in favor of language-specific analyzers which contain things
 473    such as stopword lists and any language-specific processing in addition to
 474    stemming. Add Turkish and Romanian stopwords lists to support this.
 475    (Robert Muir, Uwe Schindler, Simon Willnauer)
 476
 477  * LUCENE-2603: Add setMultiValuedSeparator(char) method to set an arbitrary
 478    char that is used when concatenating multiValued data. Default is a space
 479    (' '). It is applied on ANALYZED field only. (Koji Sekiguchi)
 480
 481  * LUCENE-2626: FastVectorHighlighter: enable FragListBuilder and FragmentsBuilder
 482    to be set per-field override. (Koji Sekiguchi)
 483
 484  * LUCENE-2712: FieldBoostMapAttribute in contrib/queryparser was changed from
 485    a Map<CharSequence,Float> to a Map<String,Float>. Per the CharSequence javadoc,
 486    CharSequence is inappropriate as a map key. (Robert Muir)
 487
 488  * LUCENE-1937: Add more methods to manipulate QueryNodeProcessorPipeline elements.
 489    QueryNodeProcessorPipeline now implements the List interface, this is useful
 490    if you want to extend or modify an existing pipeline. (Adriano Crestani via Robert Muir)
 491
 492  * LUCENE-2754, LUCENE-2757: Deprecated SpanRegexQuery. Use
 493    new SpanMultiTermQueryWrapper<RegexQuery>(new RegexQuery()) instead.
 494    (Robert Muir, Uwe Schindler)
 495
 496  * LUCENE-2747: Deprecated ArabicLetterTokenizer. StandardTokenizer now tokenizes
 497    most languages correctly including Arabic.  (Steven Rowe, Robert Muir)
 498
 499  * LUCENE-2830: Use StringBuilder instead of StringBuffer across Benchmark, and
 500    remove the StringBuffer HtmlParser.parse() variant. (Shai Erera)
 501
 502  * LUCENE-2920: Deprecated ShingleMatrixFilter as it is unmaintained and does
 503    not work with custom Attributes or custom payload encoders.  (Uwe Schindler)
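
   The reason LUCENE-2712 (and the related fix LUCENE-2855) moved from
   CharSequence keys to String keys can be shown with plain JDK collections.
   The CharSequenceKeys class and the "title" field name below are
   hypothetical and exist only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; uses only java.util collections.
final class CharSequenceKeys {
    /** StringBuilder inherits identity-based equals/hashCode from Object,
     *  so an equal-looking key is not found: this returns false. */
    static boolean lookupWithCharSequenceKey() {
        Map<CharSequence, Float> boosts = new HashMap<CharSequence, Float>();
        boosts.put(new StringBuilder("title"), 2.0f);
        return boosts.containsKey(new StringBuilder("title"));
    }

    /** String defines value-based equals/hashCode, so this returns true. */
    static boolean lookupWithStringKey() {
        Map<String, Float> boosts = new HashMap<String, Float>();
        boosts.put(new StringBuilder("title").toString(), 2.0f);
        return boosts.containsKey(new StringBuilder("title").toString());
    }
}
```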
 504
 505 New features
 506
 507  * LUCENE-2500: Added DirectIOLinuxDirectory, a Linux-specific
 508    Directory impl that uses the O_DIRECT flag to bypass the buffer
 509    cache.  This is useful to prevent segment merging from evicting
 510    pages from the buffer cache, since fadvise/madvise do not seem.
 511    (Michael McCandless)
 512
 513  * LUCENE-2306: Add NumericRangeFilter and NumericRangeQuery support to XMLQueryParser.
 514    (Jingkei Ly, via Mark Harwood)
 515
 516  * LUCENE-2102: Add a Turkish LowerCase Filter. TurkishLowerCaseFilter handles
 517    Turkish and Azeri unique casing behavior correctly.
 518    (Ahmet Arslan, Robert Muir via Simon Willnauer)
 519
 520  * LUCENE-2039: Add an extensible query parser to contrib/misc.
 521    ExtendableQueryParser enables arbitrary parser extensions based on a
 522    customizable field naming scheme.
 523    (Simon Willnauer)
 524
 525  * LUCENE-2067: Add a Czech light stemmer. CzechAnalyzer will now stem words
 526    when Version is set to 3.1 or higher.  (Robert Muir)
 527
 528  * LUCENE-2062: Add a Bulgarian analyzer.  (Robert Muir, Simon Willnauer)
 529
 530  * LUCENE-2206: Add Snowball's stopword lists for Danish, Dutch, English,
 531    Finnish, French, German, Hungarian, Italian, Norwegian, Russian, Spanish,
 532    and Swedish. These can be loaded with WordListLoader.getSnowballWordSet.
 533    (Robert Muir, Simon Willnauer)
 534
 535  * LUCENE-2243: Add DisjunctionMaxQuery support for FastVectorHighlighter.
 536    (Koji Sekiguchi)
 537
 538  * LUCENE-2218: ShingleFilter supports minimum shingle size, and the separator
 539    character is now configurable. It's also up to 20% faster.
 540    (Steven Rowe via Robert Muir)
 541
 542  * LUCENE-2234: Add a Hindi analyzer.  (Robert Muir)
 543
 544  * LUCENE-2055: Add analyzers/misc/StemmerOverrideFilter. This filter provides
 545    the ability to override any stemmer with a custom dictionary map.
 546    (Robert Muir, Uwe Schindler, Simon Willnauer)
 547
 548  * LUCENE-2399: Add ICUNormalizer2Filter, which normalizes tokens with ICU's
 549    Normalizer2. This allows for efficient combinations of normalization and custom
 550    mappings in addition to standard normalization, and normalization combined
 551    with unicode case folding.  (Robert Muir)
 552
 553  * LUCENE-1343: Add ICUFoldingFilter, a replacement for ASCIIFoldingFilter that
 554    does a more thorough job of normalizing unicode text for search.
 555    (Robert Haschart, Robert Muir)
 556
 557  * LUCENE-2409: Add ICUTransformFilter, which transforms text in a context
 558    sensitive way, either from ICU built-in rules (such as Traditional-Simplified),
 559    or from rules you write yourself.  (Robert Muir)
 560
 561  * LUCENE-2414: Add ICUTokenizer, a tailorable tokenizer that implements Unicode
 562    Text Segmentation. This tokenizer is useful for documents or collections with
 563    multiple languages.  The default configuration includes special support for
 564    Thai, Lao, Myanmar, and Khmer.  (Robert Muir, Uwe Schindler)
 565
 566  * LUCENE-2298: Add analyzers/stempel, an algorithmic stemmer with support for
 567    the Polish language.  (Andrzej Bialecki via Robert Muir)
 568
 569  * LUCENE-2400: ShingleFilter was changed to not output all-filler shingles and
 570    unigrams, and uses a more performant algorithm to build grams using a linked list
 571    of AttributeSource.cloneAttributes() instances and the new copyTo() method.
 572    (Steven Rowe via Uwe Schindler)
 573
 574  * LUCENE-2437: Add an Analyzer for Indonesian.  (Robert Muir)
 575
 576  * LUCENE-2393: The HighFreqTerms tool (in misc) can now optionally
 577    also include the total termFreq.  (Tom Burton-West via Mike McCandless)
 578
 579  * LUCENE-2463: Add a Greek inflectional stemmer. GreekAnalyzer will now stem words
 580    when Version is set to 3.1 or higher.  (Robert Muir)
 581
 582  * LUCENE-1287: Allow usage of HyphenationCompoundWordTokenFilter without dictionary.
 583    (Thomas Peuss via Robert Muir)
 584
 585  * LUCENE-2464: FastVectorHighlighter: add SingleFragListBuilder to return
 586    entire field contents. (Koji Sekiguchi)
 587
 588  * LUCENE-2503: Added lighter stemming alternatives for European languages.
 589    (Robert Muir)
 590
 591  * LUCENE-2581: FastVectorHighlighter: add Encoder to FragmentsBuilder.
 592    (Koji Sekiguchi)
 593
 594  * LUCENE-2624: Add Analyzers for Armenian, Basque, and Catalan, from snowball.
 595    (Robert Muir)
 596
 597  * LUCENE-1938: PrecedenceQueryParser is now implemented with the flexible QP framework.
 598    This means that you can also add this functionality to your own QP pipeline by using
 599    BooleanModifiersQueryNodeProcessor, for example instead of GroupQueryNodeProcessor.
 600    (Adriano Crestani via Robert Muir)
 601
 602  * LUCENE-2791: Added WindowsDirectory, a Windows-specific Directory impl
 603    that doesn't synchronize on the file handle. This can be useful to
 604    avoid the performance problems of SimpleFSDirectory and NIOFSDirectory.
 605    (Robert Muir, Simon Willnauer, Uwe Schindler, Michael McCandless)
 606
 607  * LUCENE-2842: Add analyzer for Galician. Also adds the RSLP (Orengo) stemmer
 608    for Portuguese.  (Robert Muir)
 609
 610  * SOLR-1057: Add PathHierarchyTokenizer that represents file path hierarchies as synonyms of
 611    /something, /something/something, /something/something/else. (Ryan McKinley, Koji Sekiguchi)
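
   The token sequence described for SOLR-1057 can be approximated with a
   small helper that lists every prefix of a path. This is a sketch under
   assumptions: PathPrefixes is a hypothetical class, it only produces the
   prefix strings, and it does not model positions, offsets, or any other
   tokenizer options.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not the PathHierarchyTokenizer implementation.
final class PathPrefixes {
    /** Returns each ancestor prefix of the path, shortest first,
     *  followed by the full path itself. */
    static List<String> prefixes(String path, char delimiter) {
        List<String> out = new ArrayList<String>();
        // Start at index 1 so a leading delimiter does not yield an empty prefix.
        int next = path.indexOf(delimiter, 1);
        while (next != -1) {
            out.add(path.substring(0, next));
            next = path.indexOf(delimiter, next + 1);
        }
        if (!path.isEmpty()) {
            out.add(path);
        }
        return out;
    }
}
```

   With the input "/something/something/else" and '/' as the delimiter, the
   helper returns the three strings listed in the entry, so a query for any
   ancestor directory can match documents stored beneath it.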
 612
 613 Build
 614
 615  * LUCENE-2124: Moved the JDK-based collation support from contrib/collation
 616    into core, and moved the ICU-based collation support into contrib/icu.
 617    (Steven Rowe, Robert Muir)
 618
 619  * LUCENE-2323: Moved contrib/regex into contrib/queries. Moved the
 620    queryparsers under contrib/misc and contrib/surround into contrib/queryparser.
 621    Moved contrib/fast-vector-highlighter into contrib/highlighter.
 622    Moved ChainedFilter from contrib/misc to contrib/queries. contrib/spatial now
 623    depends on contrib/queries instead of contrib/misc.  (Robert Muir)
 624
 625  * LUCENE-2333: Fix failures during contrib builds, when classes in
 626    core were changed without ant clean. This fix also optimizes the
 627    dependency management between contribs by a new ANT macro.
 628    (Uwe Schindler, Shai Erera)
 629
 630  * LUCENE-2797: Upgrade contrib/icu's ICU jar file to ICU 4.6
 631    (Robert Muir)
 632
 633  * LUCENE-2833: Upgrade contrib/ant's jtidy jar file to r938 (Robert Muir)
 634
 635  * LUCENE-2413: Moved the demo out of lucene core and into contrib/demo.
 636    (Robert Muir)
 637
 638 Optimizations
 639
 640  * LUCENE-2157: DelimitedPayloadTokenFilter no longer copies the buffer
 641    over itself. Instead it sets only the length. This patch also optimizes
 642    the logic of the filter and uses NIO for IdentityEncoder. (Uwe Schindler)
 643
 644  * LUCENE-2084: Change IndexableBinaryStringTools to work on byte[] and char[]
 645    directly, instead of Byte/CharBuffers, and modify ICUCollationKeyFilter to
 646    take advantage of this for faster performance.
 647    (Steven Rowe, Uwe Schindler, Robert Muir)
 648
 649  * LUCENE-2194, LUCENE-2201, LUCENE-2288: Snowball stemmers in contrib/analyzers
 650    have been optimized to work on char[] and remove unnecessary object creation.
 651    (Shai Erera, Robert Muir)
 652
 653  * LUCENE-2404: Improve performance of ThaiWordFilter by using a char[]-backed
 654    CharacterIterator (currently from javax.swing).  (Uwe Schindler, Robert Muir)
 655
 656 Test Cases
 657
 658  * LUCENE-2115: Cutover contrib tests to use Java5 generics.  (Kay Kay
 659    via Mike McCandless)
 660
 661 Other
 662
 663  * LUCENE-1845: Updated bdb-je jar from version 3.3.69 to 3.3.93.
 664    (Simon Willnauer via Mike McCandless)
 665
 666  * LUCENE-2415: Use reflection instead of a shim class to access Jakarta
 667    Regex prefix.  (Uwe Schindler)
 668
 669 ================== Release 2.9.4 / 3.0.3 ====================
 670
 671 Bug Fixes
 672
 673  * LUCENE-2277: QueryNodeImpl threw ConcurrentModificationException on
 674    add(List<QueryNode>). (Frank Wesemann via Robert Muir)
 675
 676  * LUCENE-2284: MatchAllDocsQueryNode toString() created an invalid XML tag.
 677    (Frank Wesemann via Robert Muir)
 678
 679  * LUCENE-2278: FastVectorHighlighter: Highlighted term is out of alignment
 680    in multi-valued NOT_ANALYZED field. (Koji Sekiguchi)
 681
 682  * LUCENE-2524: FastVectorHighlighter: use mod for getting colored tag.
 683    (Koji Sekiguchi)
 684
 685  * LUCENE-2616: FastVectorHighlighter: out of alignment when the first value is
 686    empty in multiValued field (Koji Sekiguchi)
 687
 688  * LUCENE-2731, LUCENE-2732: Fix (charset) problems in XML loading in
 689    HyphenationCompoundWordTokenFilter (partial bugfix-only in 2.9 and 3.0,
 690    the full fix will follow in 3.1).
 691    (Uwe Schindler)
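
The "use mod" fix in LUCENE-2524 above can be pictured with a small sketch. It assumes a fixed array of highlight tags that is shorter than the number of distinct terms; the class, method, and tag values are hypothetical and are not taken from FastVectorHighlighter.

```java
// Hypothetical illustration: cycle through a fixed set of tags so that any
// non-negative term index maps to a valid array slot.
class ColoredTagSketch {
    static final String[] PRE_TAGS = {
        "<b class=\"c0\">", "<b class=\"c1\">", "<b class=\"c2\">"
    };

    // Without the modulo, an index >= tags.length would throw
    // ArrayIndexOutOfBoundsException. Assumes termIndex >= 0.
    static String tagFor(String[] tags, int termIndex) {
        return tags[termIndex % tags.length];
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            System.out.println(i + " -> " + tagFor(PRE_TAGS, i));
        }
    }
}
```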
 692
 693 Documentation
 694
 695  * LUCENE-2055: Add documentation noting that the Dutch and French stemmers
 696    in contrib/analyzers do not implement the Snowball algorithm correctly,
 697    and recommending the equivalents in contrib/snowball where possible.
 698    (Robert Muir, Uwe Schindler, Simon Willnauer)
 699
 700  * LUCENE-2653: Add documentation noting that ThaiWordFilter will not work
 701    as expected on all JREs. For example, on an IBM JRE, it does nothing.
 702    (Robert Muir)
 703
 704 ================== Release 2.9.3 / 3.0.2 ====================
 705
 706 No changes.
 707
 708 ================== Release 2.9.2 / 3.0.1 ====================
 709
 710 New features
 711
 712  * LUCENE-2108: Spellchecker now safely supports concurrent modifications to
 713    the spell-index. Threads can safely obtain term suggestions while the spell-
 714    index is rebuilt, cleared or reset. Internal IndexSearcher instances remain
 715    open until the last thread accessing them releases the reference.
 716    (Simon Willnauer)
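
The "remain open until the last thread releases the reference" behaviour described in LUCENE-2108 above is a reference-counting pattern. The sketch below shows one way such a pattern could look; it is an assumption-laden illustration with hypothetical names, not the SpellChecker implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reference-counted resource: it is closed only when the last
// holder (owner or reader thread) releases its reference.
class RefCountedResource {
    private final AtomicInteger refCount = new AtomicInteger(1); // the owner holds one reference
    private volatile boolean closed = false;

    // Returns false if the resource has already been fully released.
    boolean tryAcquire() {
        while (true) {
            int n = refCount.get();
            if (n <= 0) {
                return false;
            }
            if (refCount.compareAndSet(n, n + 1)) {
                return true;
            }
        }
    }

    // Drops one reference; the holder that brings the count to zero closes.
    void release() {
        if (refCount.decrementAndGet() == 0) {
            closed = true; // a real implementation would close the underlying searcher here
        }
    }

    boolean isClosed() {
        return closed;
    }

    public static void main(String[] args) {
        RefCountedResource searcher = new RefCountedResource();
        searcher.tryAcquire();                   // a reader thread starts a lookup
        searcher.release();                      // the owner swaps in a rebuilt index
        System.out.println(searcher.isClosed()); // false: the reader still holds a reference
        searcher.release();                      // the reader finishes
        System.out.println(searcher.isClosed()); // true
    }
}
```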
 717
 718 Bug Fixes
 719
 720  * LUCENE-2144: Fix InstantiatedIndex to handle termDocs(null)
 721    correctly (enumerate all non-deleted docs).  (Karl Wettin via Mike
 722    McCandless)
 723
 724  * LUCENE-2199: ShingleFilter skipped over tri-gram shingles if outputUnigram
 725    was set to false. (Simon Willnauer)
 726
 727  * LUCENE-2211: Fix missing clearAttributes() calls:
 728    ShingleMatrix, PrefixAware, compounds, NGramTokenFilter,
 729    EdgeNGramTokenFilter, Highlighter, and MemoryIndex.
 730    (Uwe Schindler, Robert Muir)
 731
 732  * LUCENE-2207, LUCENE-2219: Fix incorrect offset calculations in end() for
 733    CJKTokenizer, ChineseTokenizer, SmartChinese SentenceTokenizer,
 734    and WikipediaTokenizer.  (Koji Sekiguchi, Robert Muir)
 735
 736  * LUCENE-2266: Fixed offset calculations in NGramTokenFilter and
 737    EdgeNGramTokenFilter.  (Joe Calderon, Robert Muir via Uwe Schindler)
 738
 739 API Changes
 740
 741  * LUCENE-2108: Add SpellChecker.close, to close the underlying
 742    reader.  (Eirik Bjørsnøs via Mike McCandless)
 743
 744  * LUCENE-2165: Add a constructor to SnowballAnalyzer that takes a Set of
 745    stopwords, and deprecate the String[] one.  (Nick Burch via Robert Muir)
 746
 747 ======================= Release 3.0.0 =======================
 748
 749 Changes in backwards compatibility policy
 750
 751  * LUCENE-1257: Change some occurrences of StringBuffer in public/protected
 752    APIs of contrib/surround to StringBuilder.
 753    (Paul Elschot via Uwe Schindler)
 754
 755 Changes in runtime behavior
 756
 757  * LUCENE-1966: Modified and cleaned the default Arabic stopwords list used
 758    by ArabicAnalyzer. You'll need to fully re-index any previously created
 759    indexes.  (Basem Narmok via Robert Muir)
 760
 761 API Changes
 762
 763  * LUCENE-1936: Deprecated RussianLowerCaseFilter, because it transforms
 764    text exactly the same as LowerCaseFilter. Please use LowerCaseFilter
 765    instead, which has the same functionality.  (Robert Muir)
 766
 767  * LUCENE-2051: Contrib Analyzer setters were deprecated and replaced
 768    with ctor arguments / Version number.  Also stop word lists
 769    were unified.  (Simon Willnauer)
 770
 771 Bug fixes
 772
 773  * LUCENE-1781: Fixed various issues with the lat/lng bounding box
 774    distance filter created for radius search in contrib/spatial.
 775    (Bill Bell via Mike McCandless)
 776
 777  * LUCENE-1939: IndexOutOfBoundsException at ShingleMatrixFilter's
 778    Iterator#hasNext method on exhausted streams.
 779    (Patrick Jungermann via Karl Wettin)
 780
 781  * LUCENE-1359: French analyzer did not support null field names.
 782    (Andrew Lynch via Robert Muir)
 783
 784 New features
 785
 786  * LUCENE-1924: Added BalancedSegmentMergePolicy to contrib/misc,
 787    which is a merge policy that tries to avoid doing very large
 788    segment merges to give better search performance in a mixed
 789    indexing/searching environment.  (John Wang via Mike McCandless)
 790
 791  * LUCENE-1959: Add index splitting tools. The IndexSplitter tool works
 792    on multi-segment (non optimized) indexes and it can copy specific
 793    segments out of the index into a new index.  It can also list the
 794    segments in the index, and delete specified segments.  (Jason Rutherglen via
 795    Mike McCandless). MultiPassIndexSplitter can split any index into
 796    any number of output parts, at the cost of doing multiple passes over
 797    the input index. (Andrzej Bialecki)
 798
 799  * LUCENE-1993: Add maxDocFreq setting to MoreLikeThis, to exclude
 800    from consideration terms that match more than the specified number
 801    of documents.  (Christian Steinert via Mike McCandless)
 802
 803 Optimizations
 804
 805  * LUCENE-1965, LUCENE-1962: Arabic-, Persian- and SmartChineseAnalyzer
 806    now load their default stopwords only once, on first access.
 807    Previous versions were loading the stopword files each time a new
 808    instance was created. This might improve performance for applications
 809    creating lots of instances of these Analyzers. (Simon Willnauer)
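
One common way to load a default set once, on first access, is the initialization-on-demand holder idiom. The sketch below illustrates the idea behind the entry above; the class name, counter, and word list are hypothetical and the actual analyzers may be implemented differently.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of load-once default stopwords.
class LazyStopwords {
    // Counts how often the (simulated) stopword file is read.
    static final AtomicInteger LOADS = new AtomicInteger();

    // The JVM initializes Holder on first access, exactly once and thread-safely.
    private static final class Holder {
        static final Set<String> DEFAULT = load();
    }

    private static Set<String> load() {
        LOADS.incrementAndGet();
        // Stand-in for reading a stopword file from the classpath.
        return Collections.unmodifiableSet(new HashSet<>(Arrays.asList("a", "an", "the")));
    }

    static Set<String> defaultStopwords() {
        return Holder.DEFAULT;
    }

    public static void main(String[] args) {
        defaultStopwords();
        defaultStopwords();
        System.out.println("loads: " + LOADS.get()); // the set is built only once
    }
}
```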
 810
 811 Documentation
 812
 813  * LUCENE-1916: Translated documentation in the smartcn hhmm package.
 814    (Patricia Peng via Robert Muir)
 815
 816 Build
 817
 818  * LUCENE-1904: Moved wordnet-based synonym support from contrib/memory
 819    into contrib/wordnet.  (Robert Muir)
 820
 821  * LUCENE-2031: Moved PatternAnalyzer from contrib/memory into
 822    contrib/analyzers/common, under miscellaneous.  (Robert Muir)
 823
 824 ======================= Release 2.9.1 =======================
 825
 826 Changes in backwards compatibility policy
 827
 828  * LUCENE-2002: Add required Version matchVersion argument when
 829    constructing ComplexPhraseQueryParser and default (as of 2.9)
 830    enablePositionIncrements to true to match StandardAnalyzer's
 831    default.  Also added required matchVersion to most of the analyzers.
 832    (Uwe Schindler, Mike McCandless)
 833
 834 Changes in runtime behavior
 835
 836  * LUCENE-1963: ArabicAnalyzer now lowercases before checking the stopword
 837    list. This has no effect on Arabic text, but if you are using a custom
 838    stopword list that contains some non-Arabic words, you'll need to fully
 839    reindex.  (DM Smith via Robert Muir)
 840
 841 Bug fixes
 842
 843  * LUCENE-1953: FastVectorHighlighter: small fragCharSize can cause
 844    StringIndexOutOfBoundsException. (Koji Sekiguchi)
 845
 846  * LUCENE-1929: Highlighter throws exception on NumericRangeQuery and does not
 847    support deprecated RangeQuery.  (Mark Miller)
 848
 849  * LUCENE-2001: Wordnet Syns2Index incorrectly parses synonyms that
 850    contain a single quote. (Parag H. Dave via Robert Muir)
 851
 852  * LUCENE-2003: Highlighter doesn't respect position increments other than 1 with
 853    PhraseQuerys. (Uwe Schindler, Mark Miller)
 854
 855  * LUCENE-1954: InstantiatedIndexWriter: Fixed ClassCastException with
 856    NumericField because of incorrect unchecked cast: Document.getFields()
 857    returns List<Fieldable>.  (Bernd Fondermann via Uwe Schindler)
 858
 859  * LUCENE-2014: SmartChineseAnalyzer did not properly clear attributes
 860    in WordTokenFilter. If enablePositionIncrements is set for StopFilter,
 861    then this could create invalid position increments, causing IndexWriter
 862    to crash.  (Robert Muir, Uwe Schindler)
 863
 864  * LUCENE-2013: SpanRegexQuery does not work with QueryScorer.
 865    (Benjamin Keil via Mark Miller)
 866
 867 ======================= Release 2.9.0 =======================
 868
 869 Changes in runtime behavior
 870
 871  * LUCENE-1505: Local lucene now uses org.apache.lucene.util.NumericUtils for all
 872     number conversion.  You'll need to fully re-index any previously created indexes.
 873     This isn't a break in back-compatibility because local Lucene has not yet
 874     been released.  (Mike McCandless)
 875
 876  * LUCENE-1758: ArabicAnalyzer now uses the light10 algorithm, has a refined
 877     default stopword list, and lowercases non-Arabic text.
 878     You'll need to fully re-index any previously created indexes. This isn't a
 879     break in back-compatibility because ArabicAnalyzer has not yet been
 880     released.  (Robert Muir)
 881
 882
 883 API Changes
 884
 885  * LUCENE-1695: Update the Highlighter to use the new TokenStream API. This issue breaks backwards
 886     compatibility with some public classes. If you have implemented custom Fragmenters or Scorers,
 887     you will need to adjust them to work with the new TokenStream API. Rather than getting passed a
 888     Token at a time, you will be given a TokenStream to init your impl with - store the Attributes
 889     you are interested in locally and access them on each call to the method that used to pass a new
 890     Token. Look at the included updated impls for examples.  (Mark Miller)
 891
 892  * LUCENE-1460: Change contrib TokenStreams/Filters to use the new
 893     TokenStream API. (Robert Muir, Michael Busch)
 894
 895  * LUCENE-1775, LUCENE-1903: Change remaining TokenFilters (shingle, prefix-suffix)
 896     to use the new TokenStream API. ShingleFilter is now much more efficient:
 897     it clones much less often and computes the tokens mostly on the fly.
 898     Also added more tests. (Robert Muir, Michael Busch, Uwe Schindler, Chris Harris)
 899
 900  * LUCENE-1685: The position aware SpanScorer has become the default scorer
 901     for Highlighting. The SpanScorer implementation has replaced QueryScorer
 902     and the old term highlighting QueryScorer has been renamed to
 903     QueryTermScorer. Multi-term queries are also now expanded by default. If
 904     you were previously rewriting the query for multi-term query highlighting,
 905     you should no longer do that (unless you switch to using QueryTermScorer).
 906     The SpanScorer API (now QueryScorer) has also been improved to more closely
 907     match the API of the previous QueryScorer implementation.  (Mark Miller)
 908
 909  * LUCENE-1793: Deprecate the custom encoding support in the Greek and Russian
 910     Analyzers. If you need to index text in these encodings, please use Java's
 911     character set conversion facilities (InputStreamReader, etc) during I/O,
 912     so that Lucene can analyze this text as Unicode instead.  (Robert Muir)
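
For the LUCENE-1793 recommendation above, decoding legacy-encoded text with InputStreamReader might look like the following sketch. The helper name and sample bytes are illustrative, and it assumes the ISO-8859-7 charset is available in the running JRE.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper: convert bytes in a legacy encoding to a Unicode String
// during I/O, so that analysis only ever sees Unicode text.
class LegacyCharsetSketch {
    static String decode(byte[] bytes, String charsetName) {
        Charset charset = Charset.forName(charsetName);
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Greek lowercase alpha, beta, gamma encoded in ISO-8859-7.
        byte[] greek = {(byte) 0xE1, (byte) 0xE2, (byte) 0xE3};
        System.out.println(decode(greek, "ISO-8859-7"));
    }
}
```

The resulting String (or the Reader itself) can then be handed to any analyzer without encoding-specific configuration.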
 913
 914 Bug fixes
 915
 916  * LUCENE-1423: InstantiatedTermEnum#skipTo(Term) throws
 917     ArrayIndexOutOfBoundsException on empty index. (Karl Wettin)
 918
 919  * LUCENE-1462: InstantiatedIndexWriter did not reset pre-analyzed TokenStreams the
 920     same way IndexWriter does. Parts of InstantiatedIndex were not Serializable.
 921     (Karl Wettin)
 922
 923  * LUCENE-1510: InstantiatedIndexReader#norms methods throw NullPointerException on empty index.
 924     (Karl Wettin, Robert Newson)
 925
 926  * LUCENE-1514: ShingleMatrixFilter#next(Token) easily throws a StackOverflowError
 927     due to recursive invocation. (Karl Wettin)
 928
 929  * LUCENE-1548: Fix distance normalization in LevenshteinDistance to
 930     not produce negative distances (Thomas Morton via Mike McCandless)
 931
 932  * LUCENE-1490: Fix latin1 conversion of HALFWIDTH_AND_FULLWIDTH_FORMS
 933     characters to only apply to the correct subset (Daniel Cheng via
 934     Mike McCandless)
 935
 936  * LUCENE-1576: Fix BrazilianAnalyzer to downcase tokens after
 937     StandardTokenizer so that stop words with mixed case are filtered
 938     out.  (Rafael Cunha de Almeida, Douglas Campos via Mike McCandless)
 939
 940  * LUCENE-1491: EdgeNGramTokenFilter no longer stops on tokens shorter than minimum n-gram size.
 941     (Todd Teak via Otis Gospodnetic)
 942
 943  * LUCENE-1683: Fixed JavaUtilRegexCapabilities (an impl used by
 944     RegexQuery) to use Matcher.matches() not Matcher.lookingAt() so
 945     that the regexp must match the entire string, not just a prefix.
 946     (Trejkaz via Mike McCandless)
 947
 948  * LUCENE-1792: Fix new query parser to set rewrite method for
 949     multi-term queries. (Luis Alves, Mike McCandless via Michael Busch)
 950
 951  * LUCENE-1828: Fix memory index to call TokenStream.reset() and
 952     TokenStream.end(). (Tim Smith via Michael Busch)
 953
 954  * LUCENE-1912: Fix fast-vector-highlighter issue when two or more
 955    terms are concatenated (Koji Sekiguchi via Mike McCandless)
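
The difference behind the LUCENE-1683 fix above can be seen with java.util.regex alone. In this sketch the class and method names are hypothetical; only Matcher.lookingAt() and Matcher.matches() are the JDK calls named in the entry.

```java
import java.util.regex.Pattern;

// Contrasts prefix matching (lookingAt) with whole-string matching (matches).
class RegexMatchSketch {
    // True if the regex matches some prefix of the term.
    static boolean prefixMatch(String regex, String term) {
        return Pattern.compile(regex).matcher(term).lookingAt();
    }

    // True only if the regex matches the entire term.
    static boolean fullMatch(String regex, String term) {
        return Pattern.compile(regex).matcher(term).matches();
    }

    public static void main(String[] args) {
        System.out.println(prefixMatch("lucen", "lucene")); // true: "lucen" is a prefix
        System.out.println(fullMatch("lucen", "lucene"));   // false: trailing "e" is unmatched
        System.out.println(fullMatch("lucen.", "lucene"));  // true
    }
}
```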
 956
 957 New features
 958
 959  * LUCENE-1531: Added support for BoostingTermQuery to XML query parser. (Karl Wettin)
 960
 961  * LUCENE-1435: Added contrib/collation, a CollationKeyFilter
 962     allowing you to convert tokens into CollationKeys encoded using
 963     IndexableBinaryStringTools.  This allows for faster RangeQuery when
 964     a field needs to use a custom Collator.  (Steven Rowe via Mike
 965     McCandless)
 966
 967  * LUCENE-1591: EnWikiDocMaker, LineDocMaker, WriteLineDoc can now
 968     read/write bz2 using Apache commons compress library.  This means
 969     you can download the .bz2 export from http://wikipedia.org and
 970     immediately index it.  (Shai Erera via Mike McCandless)
 971
 972  * LUCENE-1629: Add SmartChineseAnalyzer to contrib/analyzers.  It
 973     improves on CJKAnalyzer and ChineseAnalyzer by handling Chinese
 974     sentences properly.  SmartChineseAnalyzer uses a Hidden Markov
 975     Model to tokenize Chinese words in a more intelligent way.
 976     (Xiaoping Gao via Mike McCandless)
 977
 978  * LUCENE-1676: Added DelimitedPayloadTokenFilter class for automatically adding payloads "in-stream" (Grant Ingersoll)
 979
 980  * LUCENE-1578: Support for loading unoptimized readers to the
 981     constructor of InstantiatedIndex. (Karl Wettin)
 982
 983  * LUCENE-1704: Allow specifying the Tidy configuration file when
 984     parsing HTML docs with contrib/ant.  (Keith Sprochi via Mike
 985     McCandless)
 986
 987  * LUCENE-1522: Added contrib/fast-vector-highlighter, a new alternative
 988     highlighter.  (Koji Sekiguchi via Mike McCandless)
 989
 990  * LUCENE-1740: Added "analyzer" command to Lucli, enabling changing
 991     the analyzer from the default StandardAnalyzer.  (Bernd Fondermann
 992     via Mike McCandless)
 993
 994  * LUCENE-1272: Add get/setBoost to MoreLikeThis. (Jonathan
 995     Leibiusky via Mike McCandless)
 996
 997  * LUCENE-1745: Added constructors to JakartaRegexpCapabilities and
 998     JavaUtilRegexCapabilities as well as static flags to support
 999     configuring a RegexCapabilities implementation with the
1000     implementation-specific modifier flags. Allows for callers to
1001     customize the RegexQuery using the implementation-specific options
1002     and fine tune how regular expressions are compiled and
1003     matched. (Marc Zampetti zampettim@aim.com via Mike McCandless)
1004
1005  * LUCENE-1567: Added a new QueryParser framework, that allows
1006     implementing a new query syntax in a flexible and efficient way.
1007     This new QueryParser will be moved to Lucene's core in release
1008     3.0 and will then replace the current core QueryParser, which
1009     has been deprecated with this patch.
1010     (Luis Alves and Adriano Campos via Michael Busch)
1011
1012  * LUCENE-1486: Added ComplexPhraseQueryParser, an extension of QueryParser
1013     that allows a subset of the Lucene query language to be embedded in
1014     PhraseQuerys. Wildcard, Range, and Fuzzy queries, as well as limited
1015     boolean logic, can be used within quote operators with this parser, e.g.:
1016     "(jo* -john) smyth~". (Mark Harwood via Mark Miller)
1017
1018  * Added web-based demo of functionality in contrib's XML Query Parser
1019     packaged as War file (Mark Harwood)
1020
1021  * LUCENE-1406: Added Arabic analyzer.  (Robert Muir via Grant Ingersoll)
1022
1023  * LUCENE-1628: Added Persian analyzer.  (Robert Muir)
1024
1025  * LUCENE-1813: Add option to ReverseStringFilter to mark reversed tokens.
1026     (Andrzej Bialecki via Robert Muir)
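
To show why collation keys (LUCENE-1435 above) matter for range comparisons, this sketch compares two strings by their java.text.CollationKey instead of by raw character order. It uses only the JDK; the class and method names are hypothetical, and the IndexableBinaryStringTools encoding step is not shown.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;

// Compares strings through precomputed collation keys, which follow the
// Collator's locale-specific ordering rather than raw char values.
class CollationKeySketch {
    static int compareByKey(Collator collator, String a, String b) {
        CollationKey ka = collator.getCollationKey(a);
        CollationKey kb = collator.getCollationKey(b);
        return ka.compareTo(kb);
    }

    public static void main(String[] args) {
        Collator english = Collator.getInstance(Locale.ENGLISH);
        // Raw comparison puts "Banana" first because 'B' < 'a' in Unicode order.
        System.out.println("apple".compareTo("Banana") > 0);
        // Collation order puts "apple" first.
        System.out.println(compareByKey(english, "apple", "Banana") < 0);
    }
}
```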
1027
1028 Optimizations
1029
1030  * LUCENE-1643: Re-use the collation key (RawCollationKey) for
1031      better performance, in ICUCollationKeyFilter.  (Robert Muir via
1032      Mike McCandless)
1033
1034  * LUCENE-1794: Implement TokenStream reuse for contrib Analyzers,
1035      and implement reset() for TokenStreams to support reuse.  (Robert Muir)
1036
1037 Documentation
1038
1039  * LUCENE-1876: Added missing package-level documentation for numerous
1040      contrib packages.
1041      (Steven Rowe & Robert Muir)
1042
1043 Build
1044
1045  * LUCENE-1728: Split contrib/analyzers into common and smartcn modules.
1046    Contrib/analyzers now builds an additional lucene-smartcn JAR file. The
1047    smartcn classes are not included in the lucene-analyzers JAR file.
1048    (Robert Muir via Simon Willnauer)
1049
1050  * LUCENE-1829: Fix contrib query parser to properly create javacc files.
1051    (Jan-Pascal and Luis Alves via Michael Busch)
1052
1053 Test Cases
1054
1055
1056 ======================= Release 2.4.0 =======================
1057
1058 Changes in runtime behavior
1059
1060  (None)
1061
1062 API Changes
1063
1066  (None)
1067
1068 Bug fixes
1069
1070  1. LUCENE-1312: Added full support for InstantiatedIndexReader#getFieldNames()
1071     and tests that assert that deleted documents behave as they should (they did).
1072     (Jason Rutherglen, Karl Wettin)
1073
1074  2. LUCENE-1318: InstantiatedIndexReader.norms(String, byte[], int) did not
1075     handle the array offset correctly. (Jason Rutherglen via Karl Wettin)
1076
1077 New features
1078
1079  1. LUCENE-1320: ShingleMatrixFilter, multidimensional shingle token filter. (Karl Wettin)
1080
1081  2. LUCENE-1142: Updated Snowball package, org.tartarus distribution revision 500.
1082     Introducing Hungarian, Turkish and Romanian support, updated older stemmers
1083     and optimized (reflectionless) SnowballFilter.
1084     IMPORTANT NOTICE ON BACKWARDS COMPATIBILITY: an index created using 2.3.2 (or older)
1085     might not be compatible with these updated classes as some algorithms have changed.
1086     (Karl Wettin)
1087
1088  3. LUCENE-1016: TermVectorAccessor, transparent vector space access via stored vectors
1089     or by resolving the inverted index. (Karl Wettin)
1090
1091 Documentation
1092
1093  (None)
1094
1095 Build
1096
1097  (None)
1098
1099 Test Cases
1100
1101  (None)