lucene-java-3.4.0/lucene/contrib/CHANGES.txt

   1 Lucene contrib change Log
   2
   3 For more information on past and future Lucene versions, please see:
   4 http://s.apache.org/luceneversions
   5
   6 ======================= Lucene 3.4.0 =======================
   7
   8 New Features
   9
  10  * LUCENE-3234: provide a limit on phrase analysis in FastVectorHighlighter for
  11    highlighting speed up. Use FastVectorHighlighter.setPhraseLimit() to set limit
  12    (e.g. 5000). (Mike Sokolov via Koji Sekiguchi)
  13
  14  * LUCENE-3079: a new facet module which provides faceted indexing & search
  15    capabilities. It allows managing a taxonomy of categories and indexing them
  16    with documents. It also provides a search API for aggregating (e.g. count)
  17    the weights of the categories that are relevant to the search results.
  18    (Shai Erera)
  19
  20  * LUCENE-3171: Added BlockJoinQuery and BlockJoinCollector, under the
  21    new contrib/join module, to enable searches that require joining
  22    between parent and child documents.  Joined (children + parent)
  23    documents must be indexed as a document block, using
  24    IndexWriter.add/updateDocuments (Mark Harwood, Mike McCandless)
  25
  26  * LUCENE-3233, LUCENE-3375: Added SynonymFilter for applying multi-word synonyms
  27    during indexing or querying (with parsers for wordnet and solr formats).
  28    Removed contrib/wordnet.  (Simon Rosenthal, Robert Muir, Mike McCandless)
  29
  30  * LUCENE-1768: added support for numeric ranges in contrib query parser;
  31    added support for simple numeric queries, such as <age:4>, in contrib
  32    query parser (Vinicius Barros via Uwe Schindler)
  33
  34 Changes in runtime behavior:
  35
  36  * LUCENE-1768: StandardQueryConfigHandler now uses NumericFieldConfigListener
  37    to set a NumericConfig to its corresponding FieldConfig;
  38    StandardQueryTreeBuilder now uses DummyQueryNodeBuilder for
  39    NumericQueryNodes and uses NumericRangeQueryNodeBuilder for
  40    NumericRangeQueryNodes; StandardQueryNodeProcessorPipeline now executes
  41    NumericQueryNodeProcessor followed by NumericRangeQueryNodeProcessor
  42    right after LowercaseExpandedTermsQueryNodeProcessor
  43    (Vinicius Barros via Uwe Schindler)
  44
  45 API Changes
  46
  47  * LUCENE-3296: PKIndexSplitter & MultiPassIndexSplitter now have version
  48    constructors. PKIndexSplitter accepts an IndexWriterConfig for each of
  49    the target indexes. (Simon Willnauer, Jason Rutherglen)
  50
  51  * LUCENE-2979: queryparser configuration API located under
  52    org.apache.lucene.queryParser.core.config has been simplified and
  53    Attribute objects should no longer be used to configure query parsers. Now
  54    any configuration should be done through AbstractQueryConfig's set and get
  55    methods. The old API, which uses Attribute objects, is still in place; however,
  56    it has been deprecated and will be removed soon.
  57    (Phillipe Ramalho via Adriano Crestani)
  58
  59  * LUCENE-3400: Deprecated DutchAnalyzer.setStemDictionary since it prevents
  60    TokenStream reuse (Chris Male)
  61
  62  * LUCENE-1768: setNumericConfigMap and getNumericConfigMap were added
  63    to StandardQueryParser; ParametricRangeQueryNode and
  64    oal.queryParser.standard.nodes.RangeQueryNode now implement
  65    oal.queryParser.core.nodes.RangeQueryNode;
  66    oal.queryParser.core.nodes.RangeQueryNode was deprecated and now extends
  67    TermRangeQueryNode, which extends AbstractRangeQueryNode;
  68    ParametricQueryNode was deprecated; FieldQueryNode now implements the
  69    new FieldValueQueryNode<CharSequence>, which in turn implements
  70    FieldableQueryNode and the new ValueQueryNode
  71    (Vinicius Barros via Uwe Schindler)
  72
  73 Optimizations
  74
  75  * LUCENE-3306: Disabled indexing of positions for spellchecker n-gram
  76    fields: they are not needed because the spellchecker does not
  77    use positional queries.  (Robert Muir)
  78
  79 Bug Fixes
  80
  81  * LUCENE-3326: Fixed a bug where MoreLikeThis.like(Reader) would
  82    try to re-analyze the same Reader multiple times, passing different
  83    field names to the analyzer. Additionally, MoreLikeThisQuery used to
  84    encode/decode your string using the default charset; it now uses
  85    a StringReader.  Finally, MoreLikeThis's methods that take File, URL, or InputStream
  86    are deprecated; please create the Reader yourself. (Trejkaz, Robert Muir)
  87
  88  * LUCENE-3347: XML query parser did not always incorporate boosts from
  89    UserQuery elements.  (Moogie, Uwe Schindler)
  90
  91  * LUCENE-3382: Fixed a bug where NRTCachingDirectory's listAll() would wrongly
  92    throw NoSuchDirectoryException when all files written so far have been
  93    cached to RAM and the directory still has not yet been created on the
  94    filesystem.  (Robert Muir)
  95
  96 ======================= Lucene 3.3.0 =======================
  97
  98 New Features
  99
 100  * LUCENE-152: Add KStem (light stemmer for English).
 101    (Yonik Seeley via Robert Muir)
 102
 103  * LUCENE-3135: Add suggesters (autocomplete) to contrib/spellchecker,
 104    with three implementations: Jaspell, Ternary Trie, and Finite State.
 105    (Andrzej Bialecki, Dawid Weiss, Mike McCandless, Robert Muir)
 106
 107  * LUCENE-3129: Added BlockGroupingCollector, a single pass
 108    grouping collector which is faster than the two-pass approach, and
 109    also computes the total group count, but requires that every
 110    document sharing the same group was indexed as a doc block
 111    (IndexWriter.add/updateDocuments).  (Mike McCandless)
 112
 113  * LUCENE-2955: Added NRTManager and NRTManagerReopenThread, to
 114    simplify handling NRT reopen with multiple search threads, and to
 115    allow an app to control which indexing changes must be visible to
 116    which search requests.  (Mike McCandless)
 117
 118  * LUCENE-3191: Added SearchGroup.merge and TopGroups.merge, to
 119    facilitate doing grouping in a distributed environment (Uwe
 120    Schindler, Mike McCandless)
 121
 122  * LUCENE-2919: Added PKIndexSplitter, that splits an index according
 123    to a middle term in a specified field.  (Jason Rutherglen via Mike
 124    McCandless, Uwe Schindler)
 125
 126 API Changes
 127
 128  * LUCENE-3141: add getter method to access fragInfos in FieldFragList.
 129    (Sujit Pal via Koji Sekiguchi)
 130
 131  * LUCENE-3099: Allow subclasses to determine the group value for
 132    First/SecondPassGroupingCollector.  (Martijn van Groningen, Mike
 133    McCandless)
 134
 135 Bug Fixes
 136
 137  * LUCENE-3185: Fix bug in NRTCachingDirectory.deleteFile that would
 138    always throw exception and sometimes fail to actually delete the
 139    file.  (Mike McCandless)
 140
 141  * LUCENE-3188: contrib/misc IndexSplitter creates indexes with incorrect
 142    SegmentInfos.counter; added CheckIndex check & fix for this problem.
 143    (Ivan Dimitrov Vasilev via Steve Rowe)
 144
 145 Build
 146
 147  * LUCENE-3149: Upgrade contrib/icu's ICU jar file to ICU 4.8.
 148    (Robert Muir)
 149
 150 ======================= Lucene 3.2.0 =======================
 151
 152 Changes in backwards compatibility policy
 153
 154  * LUCENE-2981: Removed the following contribs: ant, db, lucli, swing. (Robert Muir)
 155
 156 Changes in runtime behavior
 157
 158  * LUCENE-3086: ItalianAnalyzer now uses ElisionFilter with a set of Italian
 159    contractions by default.  (Robert Muir)
 160
 161 Bug Fixes
 162
 163  * LUCENE-3045: fixed QueryNodeImpl.containsTag(String key) that was
 164    not lowercasing the key before checking for the tag (Adriano Crestani)
 165
 166  * LUCENE-3026: SmartChineseAnalyzer's WordTokenFilter threw NullPointerException
 167    on sentences longer than 32,767 characters.  (wangzhenghang via Robert Muir)
 168
 169  * LUCENE-2939: Highlighter should try and use maxDocCharsToAnalyze in
 170    WeightedSpanTermExtractor when adding a new field to MemoryIndex as well as
 171    when using CachingTokenStream. This can be a significant performance bug for
 172    large documents. (Mark Miller)
 173
 174  * LUCENE-3043: GermanStemmer threw IndexOutOfBoundsException if it encountered
 175    a zero-length token.  (Robert Muir)
 176
 177  * LUCENE-3044: ThaiWordFilter didn't reset its cached state correctly; this only
 178    caused a problem if you consumed a tokenstream, then reused it, added different
 179    attributes to it, and consumed it again.  (Robert Muir, Uwe Schindler)
 180
 181  * LUCENE-3113: Fixed some minor analysis bugs: double-reset() in ReusableAnalyzerBase
 182    and ShingleAnalyzerWrapper, missing end() implementations in PrefixAwareTokenFilter
 183    and PrefixAndSuffixAwareTokenFilter, invocations of incrementToken() after it
 184    already returned false in CommonGramsQueryFilter, HyphenatedWordsFilter,
 185    ShingleFilter, and SynonymsFilter.  (Robert Muir, Steven Rowe, Uwe Schindler)
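
The kind of bug described in LUCENE-3045 above can be sketched without any
Lucene classes. The example below is a minimal illustration, assuming tags are
stored under lowercased keys; TagMapSketch and its methods are hypothetical
names and not the QueryNodeImpl source.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a node that stores tags under lowercased keys.
final class TagMapSketch {
    private final Map<String, Object> tags = new HashMap<>();

    // Keys are normalized to lower case when a tag is stored.
    public void setTag(String key, Object value) {
        tags.put(key.toLowerCase(Locale.ROOT), value);
    }

    // Buggy lookup: the key is not normalized, so mixed-case keys are missed.
    public boolean containsTagBuggy(String key) {
        return tags.containsKey(key);
    }

    // Fixed lookup: the key is lowercased the same way as in setTag.
    public boolean containsTag(String key) {
        return tags.containsKey(key.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        TagMapSketch node = new TagMapSketch();
        node.setTag("Boost", 2.0f);
        System.out.println(node.containsTagBuggy("Boost")); // false
        System.out.println(node.containsTag("Boost"));      // true
    }
}
```

The point of the sketch is only that storing and looking up must apply the
same key normalization.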
 186
 187 New Features
 188
 189  * LUCENE-3016: Add analyzer for Latvian.  (Robert Muir)
 190
 191  * LUCENE-1421: create new grouping contrib module, enabling search
 192    results to be grouped by a single-valued indexed field.  This
 193    module was factored out of Solr's grouping implementation, but
 194    it cannot group by function queries nor arbitrary queries.  (Mike
 195    McCandless)
 196
 197  * LUCENE-3098: add AllGroupsCollector, to collect all unique groups
 198    (but in unspecified order).  (Martijn van Groningen via Mike
 199    McCandless)
 200
 201  * LUCENE-3092: Added NRTCachingDirectory in contrib/misc, which
 202    caches small segments in RAM.  This is useful, in the near-real-time
 203    case where the indexing rate is lowish but the reopen rate is
 204    highish, to take load off the IO system.  (Mike McCandless)
 205
 206 Optimizations
 207
 208  * LUCENE-3040: Switch all analysis consumers (highlighter, morelikethis, memory, ...)
 209    over to reusableTokenStream().  (Robert Muir)
 210
 211 ======================= Lucene 3.1.0 =======================
 212
 213 Changes in backwards compatibility policy
 214
 215  * LUCENE-2100: All Analyzers in Lucene-contrib have been marked as final.
 216    Analyzers should only act as a composition of TokenStreams; users should
 217    compose their own analyzers instead of subclassing existing ones.
 218    (Simon Willnauer)
 219
 220  * LUCENE-2194, LUCENE-2201: Snowball APIs were upgraded to snowball revision
 221    502 (with some local modifications for improved performance).
 222    Index backwards compatibility and binary backwards compatibility are
 223    preserved, but some protected/public member variables changed type. This
 224    does NOT affect java code/class files produced by the snowball compiler,
 225    but technically is a backwards compatibility break.  (Robert Muir)
 226
 227  * LUCENE-2226: Moved contrib/snowball functionality into contrib/analyzers.
 228    Be sure to remove any old obsolete lucene-snowball jar files from your
 229    classpath!  (Robert Muir)
 230
 231  * LUCENE-2323: Moved contrib/wikipedia functionality into contrib/analyzers.
 232    Additionally the package was changed from org.apache.lucene.wikipedia.analysis
 233    to org.apache.lucene.analysis.wikipedia.  (Robert Muir)
 234
 235  * LUCENE-2581: Added new methods to FragmentsBuilder interface. These methods
 236    are used to set pre/post tags and Encoder. (Koji Sekiguchi)
 237
 238  * LUCENE-2391: Improved spellchecker (re)build time/ram usage by omitting
 239    frequencies/positions/norms for single-valued fields, modifying the default
 240    ramBufferMBSize to match IndexWriterConfig (16MB), making index optimization
 241    an optional boolean parameter, and modifying the incremental update logic
 242    to work well with unoptimized spellcheck indexes. The indexDictionary() methods
 243    were made final to ensure a hard backwards break in case you were subclassing
 244    Spellchecker. In general, subclassing Spellchecker is not recommended.  (Robert Muir)
 245
 246 Changes in runtime behavior
 247
 248  * LUCENE-2117: SnowballAnalyzer uses TurkishLowerCaseFilter instead of
 249    LowerCaseFilter to correctly handle the unique Turkish casing behavior if
 250    used with Version > 3.0 and the TurkishStemmer.
 251    (Robert Muir via Simon Willnauer)
 252
 253  * LUCENE-2055: GermanAnalyzer now uses the Snowball German2 algorithm and
 254    stopwords list by default for Version > 3.0.
 255    (Robert Muir, Uwe Schindler, Simon Willnauer)
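
The Turkish casing behavior referred to in LUCENE-2117 above can be observed
with the JDK alone. The sketch below uses only java.lang.String and
java.util.Locale, not Lucene's TurkishLowerCaseFilter; the class and method
names are hypothetical.

```java
import java.util.Locale;

// Illustration only: shows why locale-insensitive lowercasing is wrong for Turkish.
final class TurkishCasingSketch {

    // Locale-insensitive lowercasing: 'I' becomes 'i'.
    static String lowerRoot(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    // Turkish lowercasing: 'I' becomes dotless i (U+0131),
    // and capital dotted I (U+0130) becomes 'i'.
    static String lowerTurkish(String s) {
        return s.toLowerCase(Locale.forLanguageTag("tr"));
    }

    public static void main(String[] args) {
        String word = "ISPARTA";
        System.out.println(lowerRoot(word));
        System.out.println(lowerTurkish(word));
        // The two results differ in their first character, so a term
        // lowercased one way will not match a term lowercased the other way.
        System.out.println(lowerRoot(word).equals(lowerTurkish(word)));
    }
}
```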
 256
 257 Bug fixes
 258
 259  * LUCENE-2855: contrib queryparser was using CharSequence as key in some internal
 260    Map instances, which was leading to incorrect behavior, since some CharSequence
 261    implementors do not override hashcode and equals methods. Now the internal Maps
 262    are using String instead. (Adriano Crestani)
 263
 264  * LUCENE-2068: Fixed ReverseStringFilter which was not aware of supplementary
 265    characters. During reverse the filter created unpaired surrogates, which
 266    will be replaced by U+FFFD by the indexer, but not at query time. The filter
 267    now reverses supplementary characters correctly if used with Version > 3.0.
 268    (Simon Willnauer, Robert Muir)
 269
 270  * LUCENE-2035: TokenSources.getTokenStream() does not assign positionIncrement.
 271    (Christopher Morris via Mark Miller)
 272
 273  * LUCENE-2055: Deprecated RussianTokenizer, RussianStemmer, RussianStemFilter,
 274    FrenchStemmer, FrenchStemFilter, DutchStemmer, and DutchStemFilter. For
 275    these Analyzers, SnowballFilter is used instead (for Version > 3.0), as
 276    the previous code did not always implement the Snowball algorithm correctly.
 277    Additionally, for Version > 3.0, the Snowball stopword lists are used by
 278    default.  (Robert Muir, Uwe Schindler, Simon Willnauer)
 279
 280  * LUCENE-2184: Fixed bug with handling best fit value when the proper best fit value is
 281    not an indexed field.  Note, this change affects the APIs. (Grant Ingersoll)
 282
 283  * LUCENE-2359: Fix bug in CartesianPolyFilterBuilder related to handling of behavior around
 284    the 180th meridian (Grant Ingersoll)
 285
 286  * LUCENE-2404: Fix bugs with position increment and empty tokens in ThaiWordFilter.
 287    For matchVersion >= 3.1 the filter also no longer lowercases. ThaiAnalyzer
 288    will use a separate LowerCaseFilter instead. (Uwe Schindler, Robert Muir)
 289
 290  * LUCENE-2615: Fix DirectIOLinuxDirectory to not assign bogus
 291    permissions to newly created files, and to not silently hardwire
 292    buffer size to 1 MB.  (Mark Miller, Robert Muir, Mike McCandless)
 293
 294  * LUCENE-2629: Fix gennorm2 task for generating ICUFoldingFilter's .nrm file. This allows
 295    you to customize its normalization/folding, by editing the source data files in src/data
 296    and regenerating a new .nrm with 'ant gennorm2'.  (David Bowen via Robert Muir)
 297
 298  * LUCENE-2653: ThaiWordFilter depends on the JRE having a Thai dictionary, which is not
 299    always the case. If the dictionary is unavailable, the filter will now throw
 300    UnsupportedOperationException in the constructor.  (Robert Muir)
 301
 302  * LUCENE-589: Fix contrib/demo for international documents.
 303    (Curtis d'Entremont via Robert Muir)
 304
 305  * LUCENE-2246: Fix contrib/demo for Turkish html documents.
 306    (Selim Nadi via Robert Muir)
 307
 308  * LUCENE-590: Demo HTML parser gives incorrect summaries when title is repeated as a heading
 309    (Curtis d'Entremont via Robert Muir)
 310
 311  * LUCENE-591: The demo indexer now indexes meta keywords.
 312    (Curtis d'Entremont via Robert Muir)
 313
 314  * LUCENE-2874: Highlighting overlapping tokens outputted doubled words.
 315    (Pierre Gossé via Robert Muir)
 316
 317  * LUCENE-2943: Fix thread-safety issues with ICUCollationKeyFilter.
 318    (Robert Muir)
 319
 320  * LUCENE-3087: Highlighter: fix case that was preventing highlighting
 321    of exact phrase when tokens overlap. (Pierre Gossé via Mike
 322    McCandless)
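
To illustrate the problem described in LUCENE-2068 above: reversing a Java
string one char at a time splits surrogate pairs, while reversing by code
point keeps them intact. This is a plain-JDK sketch under that assumption,
not the ReverseStringFilter implementation; all names are hypothetical.

```java
// Illustration only: char-wise versus code-point-wise reversal.
final class SurrogateReverseSketch {

    // Naive reversal: swaps individual UTF-16 chars, breaking surrogate pairs.
    static String naiveReverse(String s) {
        char[] c = s.toCharArray();
        for (int i = 0, j = c.length - 1; i < j; i++, j--) {
            char tmp = c[i];
            c[i] = c[j];
            c[j] = tmp;
        }
        return new String(c);
    }

    // Code-point-aware reversal: walks backwards one code point at a time.
    static String codePointReverse(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        int i = s.length();
        while (i > 0) {
            int cp = s.codePointBefore(i);
            sb.appendCodePoint(cp);
            i -= Character.charCount(cp);
        }
        return sb.toString();
    }

    // True if the string contains a surrogate that is not part of a valid pair.
    static boolean hasUnpairedSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (Character.isHighSurrogate(ch)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the low half of a valid pair
                } else {
                    return true;
                }
            } else if (Character.isLowSurrogate(ch)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String input = "a\uD834\uDD1Eb"; // 'a', U+1D11E, 'b'
        System.out.println(hasUnpairedSurrogate(naiveReverse(input)));     // true
        System.out.println(hasUnpairedSurrogate(codePointReverse(input))); // false
    }
}
```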
 323
 324 API Changes
 325
 326  * LUCENE-2867: Some contrib queryparser methods that receive a CharSequence as
 327    identifier, such as QueryNode#unsetTag(CharSequence), were deprecated and
 328    will be removed in version 4. (Adriano Crestani)
 329
 330  * LUCENE-2147: Spatial GeoHashUtils now always decodes GeoHash strings
 331    with full precision. GeoHash#decode_exactly(String) was merged into
 332    GeoHash#decode(String). (Chris Male, Simon Willnauer)
 333
 334  * LUCENE-2204: Change some package private classes/members to publicly accessible to implement
 335    custom FragmentsBuilders. (Koji Sekiguchi)
 336
 337  * LUCENE-2055: Integrate snowball into contrib/analyzers. SnowballAnalyzer is
 338    now deprecated in favor of language-specific analyzers which contain things
 339    such as stopword lists and any language-specific processing in addition to
 340    stemming. Add Turkish and Romanian stopwords lists to support this.
 341    (Robert Muir, Uwe Schindler, Simon Willnauer)
 342
 343  * LUCENE-2603: Add setMultiValuedSeparator(char) method to set an arbitrary
 344    char that is used when concatenating multiValued data. Default is a space
 345    (' '). It is applied to ANALYZED fields only. (Koji Sekiguchi)
 346
 347  * LUCENE-2626: FastVectorHighlighter: enable FragListBuilder and FragmentsBuilder
 348    to be overridden per field. (Koji Sekiguchi)
 349
 350  * LUCENE-2712: FieldBoostMapAttribute in contrib/queryparser was changed from
 351    a Map<CharSequence,Float> to a Map<String,Float>. Per the CharSequence javadoc,
 352    CharSequence is inappropriate as a map key. (Robert Muir)
 353
 354  * LUCENE-1937: Add more methods to manipulate QueryNodeProcessorPipeline elements.
 355    QueryNodeProcessorPipeline now implements the List interface; this is useful
 356    if you want to extend or modify an existing pipeline. (Adriano Crestani via Robert Muir)
 357
 358  * LUCENE-2754, LUCENE-2757: Deprecated SpanRegexQuery. Use
 359    new SpanMultiTermQueryWrapper<RegexQuery>(new RegexQuery()) instead.
 360    (Robert Muir, Uwe Schindler)
 361
 362  * LUCENE-2747: Deprecated ArabicLetterTokenizer. StandardTokenizer now tokenizes
 363    most languages correctly including Arabic.  (Steven Rowe, Robert Muir)
 364
 365  * LUCENE-2830: Use StringBuilder instead of StringBuffer across Benchmark, and
 366    remove the StringBuffer HtmlParser.parse() variant. (Shai Erera)
 367
 368  * LUCENE-2920: Deprecated ShingleMatrixFilter as it is unmaintained and does
 369    not work with custom Attributes or custom payload encoders.  (Uwe Schindler)
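
The reason CharSequence is a poor map key (LUCENE-2712 above, and the related
LUCENE-2855 fix) can be shown with the JDK alone: implementations such as
StringBuilder do not override equals and hashCode. The sketch below is an
illustration with hypothetical names, not code from contrib/queryparser.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: CharSequence keys versus String keys in a HashMap.
final class CharSequenceKeySketch {

    // Stores a boost under a CharSequence key and looks it up with another one.
    static Float lookupByCharSequence(CharSequence stored, CharSequence probe) {
        Map<CharSequence, Float> boosts = new HashMap<>();
        boosts.put(stored, 2.0f);
        return boosts.get(probe);
    }

    // Same operation, but both keys are converted to String first.
    static Float lookupByString(CharSequence stored, CharSequence probe) {
        Map<String, Float> boosts = new HashMap<>();
        boosts.put(stored.toString(), 2.0f);
        return boosts.get(probe.toString());
    }

    public static void main(String[] args) {
        CharSequence stored = new StringBuilder("title");
        CharSequence probe = "title";
        // Same characters, but StringBuilder uses identity equality: lookup misses.
        System.out.println(lookupByCharSequence(stored, probe)); // null
        // String keys compare by content: lookup succeeds.
        System.out.println(lookupByString(stored, probe));       // 2.0
    }
}
```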
 370
 371 New features
 372
 373  * LUCENE-2500: Added DirectIOLinuxDirectory, a Linux-specific
 374    Directory impl that uses the O_DIRECT flag to bypass the buffer
 375    cache.  This is useful to prevent segment merging from evicting
 376    pages from the buffer cache, since fadvise/madvise do not seem to help.
 377    (Michael McCandless)
 378
 379  * LUCENE-2306: Add NumericRangeFilter and NumericRangeQuery support to XMLQueryParser.
 380    (Jingkei Ly, via Mark Harwood)
 381
 382  * LUCENE-2102: Add a Turkish LowerCase Filter. TurkishLowerCaseFilter handles
 383    Turkish and Azeri unique casing behavior correctly.
 384    (Ahmet Arslan, Robert Muir via Simon Willnauer)
 385
 386  * LUCENE-2039: Add an extensible query parser to contrib/misc.
 387    ExtendableQueryParser enables arbitrary parser extensions based on a
 388    customizable field naming scheme.
 389    (Simon Willnauer)
 390
 391  * LUCENE-2067: Add a Czech light stemmer. CzechAnalyzer will now stem words
 392    when Version is set to 3.1 or higher.  (Robert Muir)
 393
 394  * LUCENE-2062: Add a Bulgarian analyzer.  (Robert Muir, Simon Willnauer)
 395
 396  * LUCENE-2206: Add Snowball's stopword lists for Danish, Dutch, English,
 397    Finnish, French, German, Hungarian, Italian, Norwegian, Russian, Spanish,
 398    and Swedish. These can be loaded with WordListLoader.getSnowballWordSet.
 399    (Robert Muir, Simon Willnauer)
 400
 401  * LUCENE-2243: Add DisjunctionMaxQuery support for FastVectorHighlighter.
 402    (Koji Sekiguchi)
 403
 404  * LUCENE-2218: ShingleFilter supports minimum shingle size, and the separator
 405    character is now configurable. It's also up to 20% faster.
 406    (Steven Rowe via Robert Muir)
 407
 408  * LUCENE-2234: Add a Hindi analyzer.  (Robert Muir)
 409
 410  * LUCENE-2055: Add analyzers/misc/StemmerOverrideFilter. This filter provides
 411    the ability to override any stemmer with a custom dictionary map.
 412    (Robert Muir, Uwe Schindler, Simon Willnauer)
 413
 414  * LUCENE-2399: Add ICUNormalizer2Filter, which normalizes tokens with ICU's
 415    Normalizer2. This allows for efficient combinations of normalization and custom
 416    mappings in addition to standard normalization, and normalization combined
 417    with unicode case folding.  (Robert Muir)
 418
 419  * LUCENE-1343: Add ICUFoldingFilter, a replacement for ASCIIFoldingFilter that
 420    does a more thorough job of normalizing unicode text for search.
 421    (Robert Haschart, Robert Muir)
 422
 423  * LUCENE-2409: Add ICUTransformFilter, which transforms text in a context
 424    sensitive way, either from ICU built-in rules (such as Traditional-Simplified),
 425    or from rules you write yourself.  (Robert Muir)
 426
 427  * LUCENE-2414: Add ICUTokenizer, a tailorable tokenizer that implements Unicode
 428    Text Segmentation. This tokenizer is useful for documents or collections with
 429    multiple languages.  The default configuration includes special support for
 430    Thai, Lao, Myanmar, and Khmer.  (Robert Muir, Uwe Schindler)
 431
 432  * LUCENE-2298: Add analyzers/stempel, an algorithmic stemmer with support for
 433    the Polish language.  (Andrzej Bialecki via Robert Muir)
 434
 435  * LUCENE-2400: ShingleFilter no longer outputs all-filler shingles and
 436    unigrams, and now uses a more performant algorithm to build grams using a linked list
 437    of AttributeSource.cloneAttributes() instances and the new copyTo() method.
 438    (Steven Rowe via Uwe Schindler)
 439
 440  * LUCENE-2437: Add an Analyzer for Indonesian.  (Robert Muir)
 441
 442  * LUCENE-2393: The HighFreqTerms tool (in misc) can now optionally
 443    also include the total termFreq.  (Tom Burton-West via Mike McCandless)
 444
 445  * LUCENE-2463: Add a Greek inflectional stemmer. GreekAnalyzer will now stem words
 446    when Version is set to 3.1 or higher.  (Robert Muir)
 447
 448  * LUCENE-1287: Allow usage of HyphenationCompoundWordTokenFilter without dictionary.
 449    (Thomas Peuss via Robert Muir)
 450
 451  * LUCENE-2464: FastVectorHighlighter: add SingleFragListBuilder to return
 452    entire field contents. (Koji Sekiguchi)
 453
 454  * LUCENE-2503: Added lighter stemming alternatives for European languages.
 455    (Robert Muir)
 456
 457  * LUCENE-2581: FastVectorHighlighter: add Encoder to FragmentsBuilder.
 458    (Koji Sekiguchi)
 459
 460  * LUCENE-2624: Add Analyzers for Armenian, Basque, and Catalan, from snowball.
 461    (Robert Muir)
 462
 463  * LUCENE-1938: PrecedenceQueryParser is now implemented with the flexible QP framework.
 464    This means that you can also add this functionality to your own QP pipeline, for
 465    example by using BooleanModifiersQueryNodeProcessor instead of GroupQueryNodeProcessor.
 466    (Adriano Crestani via Robert Muir)
 467
 468  * LUCENE-2791: Added WindowsDirectory, a Windows-specific Directory impl
 469    that doesn't synchronize on the file handle. This can be useful to
 470    avoid the performance problems of SimpleFSDirectory and NIOFSDirectory.
 471    (Robert Muir, Simon Willnauer, Uwe Schindler, Michael McCandless)
 472
 473  * LUCENE-2842: Add analyzer for Galician. Also adds the RSLP (Orengo) stemmer
 474    for Portuguese.  (Robert Muir)
 475
 476  * SOLR-1057: Add PathHierarchyTokenizer that represents file path hierarchies as synonyms of
 477    /something, /something/something, /something/something/else. (Ryan McKinley, Koji Sekiguchi)
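
The path expansion described for PathHierarchyTokenizer (SOLR-1057 above) can
be sketched as a plain string operation. The example below only reproduces the
list of prefixes given in the entry; it is not the tokenizer itself, it does
not model token positions or offsets, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: expands a path into its hierarchy of prefixes.
final class PathHierarchySketch {

    // Returns every prefix of the path that ends just before a delimiter,
    // followed by the full path itself.
    static List<String> prefixes(String path, char delimiter) {
        List<String> out = new ArrayList<>();
        // Start at 1 so a leading delimiter does not yield an empty prefix.
        for (int i = 1; i < path.length(); i++) {
            if (path.charAt(i) == delimiter) {
                out.add(path.substring(0, i));
            }
        }
        if (!path.isEmpty()) {
            out.add(path);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String p : prefixes("/something/something/else", '/')) {
            System.out.println(p);
        }
    }
}
```

Indexing every prefix this way lets a query on a parent directory match
documents stored anywhere beneath it.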
 478
 479 Build
 480
 481  * LUCENE-2124: Moved the JDK-based collation support from contrib/collation
 482    into core, and moved the ICU-based collation support into contrib/icu.
 483    (Steven Rowe, Robert Muir)
 484
 485  * LUCENE-2323: Moved contrib/regex into contrib/queries. Moved the
 486    queryparsers under contrib/misc and contrib/surround into contrib/queryparser.
 487    Moved contrib/fast-vector-highlighter into contrib/highlighter.
 488    Moved ChainedFilter from contrib/misc to contrib/queries. contrib/spatial now
 489    depends on contrib/queries instead of contrib/misc.  (Robert Muir)
 490
 491  * LUCENE-2333: Fix failures during contrib builds, when classes in
 492    core were changed without ant clean. This fix also optimizes the
 493    dependency management between contribs by a new ANT macro.
 494    (Uwe Schindler, Shai Erera)
 495
 496  * LUCENE-2797: Upgrade contrib/icu's ICU jar file to ICU 4.6
 497    (Robert Muir)
 498
 499  * LUCENE-2833: Upgrade contrib/ant's jtidy jar file to r938 (Robert Muir)
 500
 501  * LUCENE-2413: Moved the demo out of lucene core and into contrib/demo.
 502    (Robert Muir)
 503
 504 Optimizations
 505
 506  * LUCENE-2157: DelimitedPayloadTokenFilter no longer copies the buffer
 507    over itself. Instead it sets only the length. This patch also optimizes
 508    the logic of the filter and uses NIO for IdentityEncoder. (Uwe Schindler)
 509
 510  * LUCENE-2084: Change IndexableBinaryStringTools to work on byte[] and char[]
 511    directly, instead of Byte/CharBuffers, and modify ICUCollationKeyFilter to
 512    take advantage of this for faster performance.
 513    (Steven Rowe, Uwe Schindler, Robert Muir)
 514
 515  * LUCENE-2194, LUCENE-2201, LUCENE-2288: Snowball stemmers in contrib/analyzers
 516    have been optimized to work on char[] and remove unnecessary object creation.
 517    (Shai Erera, Robert Muir)
 518
 519  * LUCENE-2404: Improve performance of ThaiWordFilter by using a char[]-backed
 520    CharacterIterator (currently from javax.swing).  (Uwe Schindler, Robert Muir)
 521
 522 Test Cases
 523
 524  * LUCENE-2115: Cutover contrib tests to use Java5 generics.  (Kay Kay
 525    via Mike McCandless)
 526
 527 Other
 528
 529  * LUCENE-1845: Updated bdb-je jar from version 3.3.69 to 3.3.93.
 530    (Simon Willnauer via Mike McCandless)
 531
 532  * LUCENE-2415: Use reflection instead of a shim class to access Jakarta
 533    Regex prefix.  (Uwe Schindler)
 534
 535 ================== Release 2.9.4 / 3.0.3 ====================
 536
 537 Bug Fixes
 538
 539  * LUCENE-2277: QueryNodeImpl threw ConcurrentModificationException on
 540    add(List<QueryNode>). (Frank Wesemann via Robert Muir)
 541
 542  * LUCENE-2284: MatchAllDocsQueryNode toString() created an invalid XML tag.
 543    (Frank Wesemann via Robert Muir)
 544
 545  * LUCENE-2278: FastVectorHighlighter: Highlighted term is out of alignment
 546    in multi-valued NOT_ANALYZED field. (Koji Sekiguchi)
 547
 548  * LUCENE-2524: FastVectorHighlighter: use mod for getting colored tag.
 549    (Koji Sekiguchi)
 550
 551  * LUCENE-2616: FastVectorHighlighter: out of alignment when the first value is
 552    empty in multiValued field (Koji Sekiguchi)
 553
 554  * LUCENE-2731, LUCENE-2732: Fix (charset) problems in XML loading in
 555    HyphenationCompoundWordTokenFilter (partial, bugfix-only in 2.9 and 3.0;
 556    the full fix will come later, in 3.1).
 557    (Uwe Schindler)
 558
 559 Documentation
 560
 561  * LUCENE-2055: Add documentation noting that the Dutch and French stemmers
 562    in contrib/analyzers do not implement the Snowball algorithm correctly,
 563    and recommending the equivalents in contrib/snowball where possible.
 564    (Robert Muir, Uwe Schindler, Simon Willnauer)
 565
 566  * LUCENE-2653: Add documentation noting that ThaiWordFilter will not work
 567    as expected on all JREs. For example, on an IBM JRE, it does nothing.
 568    (Robert Muir)
 569
 570 ================== Release 2.9.3 / 3.0.2 ====================
 571
 572 No changes.
 573
 574 ================== Release 2.9.2 / 3.0.1 ====================
 575
 576 New features
 577
 578  * LUCENE-2108: Spellchecker now safely supports concurrent modifications to
 579    the spell-index. Threads can safely obtain term suggestions while the
 580    spell-index is rebuilt, cleared or reset. Internal IndexSearcher instances remain
 581    open until the last thread accessing them releases the reference.
 582    (Simon Willnauer)
 583
 584 Bug Fixes
 585
 586  * LUCENE-2144: Fix InstantiatedIndex to handle termDocs(null)
 587    correctly (enumerate all non-deleted docs).  (Karl Wettin via Mike
 588    McCandless)
 589
 590  * LUCENE-2199: ShingleFilter skipped over tri-gram shingles if outputUnigram
 591    was set to false. (Simon Willnauer)
 592
 593  * LUCENE-2211: Fix missing clearAttributes() calls:
 594    ShingleMatrix, PrefixAware, compounds, NGramTokenFilter,
 595    EdgeNGramTokenFilter, Highlighter, and MemoryIndex.
 596    (Uwe Schindler, Robert Muir)
 597
 598  * LUCENE-2207, LUCENE-2219: Fix incorrect offset calculations in end() for
 599    CJKTokenizer, ChineseTokenizer, SmartChinese SentenceTokenizer,
 600    and WikipediaTokenizer.  (Koji Sekiguchi, Robert Muir)
 601
 602  * LUCENE-2266: Fixed offset calculations in NGramTokenFilter and
 603    EdgeNGramTokenFilter.  (Joe Calderon, Robert Muir via Uwe Schindler)
 604
 605 API Changes
 606
 607  * LUCENE-2108: Add SpellChecker.close, to close the underlying
 608    reader.  (Eirik Bjørsnøs via Mike McCandless)
 609
 610  * LUCENE-2165: Add a constructor to SnowballAnalyzer that takes a Set of
 611    stopwords, and deprecate the String[] one.  (Nick Burch via Robert Muir)
 612
 613 ======================= Release 3.0.0 =======================
 614
 615 Changes in backwards compatibility policy
 616
 617  * LUCENE-1257: Change some occurrences of StringBuffer in public/protected
 618    APIs of contrib/surround to StringBuilder.
 619    (Paul Elschot via Uwe Schindler)
 620
 621 Changes in runtime behavior
 622
 623  * LUCENE-1966: Modified and cleaned the default Arabic stopwords list used
 624    by ArabicAnalyzer. You'll need to fully re-index any previously created
 625    indexes.  (Basem Narmok via Robert Muir)
 626
 627 API Changes
 628
 629  * LUCENE-1936: Deprecated RussianLowerCaseFilter, because it transforms
 630    text exactly the same as LowerCaseFilter. Please use LowerCaseFilter
 631    instead, which has the same functionality.  (Robert Muir)
 632
 633  * LUCENE-2051: Contrib Analyzer setters were deprecated and replaced
 634    with ctor arguments / Version number.  Also stop word lists
 635    were unified.  (Simon Willnauer)
 636
 637 Bug fixes
 638
 639  * LUCENE-1781: Fixed various issues with the lat/lng bounding box
 640    distance filter created for radius search in contrib/spatial.
 641    (Bill Bell via Mike McCandless)
 642
 643  * LUCENE-1939: IndexOutOfBoundsException at ShingleMatrixFilter's
 644    Iterator#hasNext method on exhausted streams.
 645    (Patrick Jungermann via Karl Wettin)
 646
 647  * LUCENE-1359: French analyzer did not support null field names.
 648    (Andrew Lynch via Robert Muir)
 649
 650 New features
 651
 652  * LUCENE-1924: Added BalancedSegmentMergePolicy to contrib/misc,
 653    which is a merge policy that tries to avoid doing very large
 654    segment merges to give better search performance in a mixed
 655    indexing/searching environment.  (John Wang via Mike McCandless)
 656
 657  * LUCENE-1959: Add index splitting tools. The IndexSplitter tool works
 658    on multi-segment (non optimized) indexes and it can copy specific
 659    segments out of the index into a new index.  It can also list the
 660    segments in the index, and delete specified segments.  (Jason Rutherglen via
 661    Mike McCandless). MultiPassIndexSplitter can split any index into
 662    any number of output parts, at the cost of doing multiple passes over
 663    the input index. (Andrzej Bialecki)
 664
 665  * LUCENE-1993: Add maxDocFreq setting to MoreLikeThis, to exclude
 666    from consideration terms that match more than the specified number
 667    of documents.  (Christian Steinert via Mike McCandless)
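
The maxDocFreq cutoff described in LUCENE-1993 can be pictured as a simple filter over per-term document frequencies. The following is only a minimal sketch: the class MaxDocFreqFilter and the method keepRareTerms are hypothetical names for illustration and are not part of MoreLikeThis.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not the MoreLikeThis implementation.
final class MaxDocFreqFilter {
    // Keeps only terms whose document frequency does not exceed maxDocFreq.
    static List<String> keepRareTerms(Map<String, Integer> docFreqs, int maxDocFreq) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            if (e.getValue() <= maxDocFreq) {
                kept.add(e.getKey());
            }
        }
        // Sort so the result does not depend on map iteration order.
        Collections.sort(kept);
        return kept;
    }
}
```

With a cutoff of 50, a term that appears in 1000 documents is dropped while terms appearing in 12 or 40 documents are kept.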
 668
 669 Optimizations
 670
 671  * LUCENE-1965, LUCENE-1962: Arabic-, Persian- and SmartChineseAnalyzer
 672    now load their default stopwords only once, on first access.
 673    Previous versions were loading the stopword files each time a new
 674    instance was created. This might improve performance for applications
 675    creating lots of instances of these Analyzers. (Simon Willnauer)
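
One common way to get "load once, on first access" behavior in Java is the initialization-on-demand holder idiom. The sketch below illustrates that idiom only and makes no claim about how the analyzers implement it; the class LazyStopwords, the counter LOADS, and the placeholder word list are all assumptions.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class, not taken from the Lucene sources.
final class LazyStopwords {
    // Counts how often the load routine runs; for demonstration only.
    static final AtomicInteger LOADS = new AtomicInteger();

    // The JVM initializes this nested class on first use of DEFAULT, and only once.
    private static final class Holder {
        static final Set<String> DEFAULT = load();
    }

    // Stand-in for reading a stopword file; the words are placeholders.
    private static Set<String> load() {
        LOADS.incrementAndGet();
        return Collections.unmodifiableSet(new HashSet<>(Arrays.asList("a", "an", "the")));
    }

    // Every caller shares the same set instance.
    static Set<String> defaultStopwords() {
        return Holder.DEFAULT;
    }
}
```

Repeated calls to defaultStopwords() return the same instance, so creating many analyzer-like objects would not repeat the load.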
 676
 677 Documentation
 678
 679  * LUCENE-1916: Translated documentation in the smartcn hhmm package.
 680    (Patricia Peng via Robert Muir)
 681
 682 Build
 683
 684  * LUCENE-1904: Moved wordnet-based synonym support from contrib/memory
 685    into contrib/wordnet.  (Robert Muir)
 686
 687  * LUCENE-2031: Moved PatternAnalyzer from contrib/memory into
 688    contrib/analyzers/common, under miscellaneous.  (Robert Muir)
 689
 690 ======================= Release 2.9.1 =======================
 691
 692 Changes in backwards compatibility policy
 693
 694  * LUCENE-2002: Add required Version matchVersion argument when
 695    constructing ComplexPhraseQueryParser and default (as of 2.9)
 696    enablePositionIncrements to true to match StandardAnalyzer's
 697    default.  Also added required matchVersion to most of the analyzers
 698    (Uwe Schindler, Mike McCandless)
 699
 700 Changes in runtime behavior
 701
 702  * LUCENE-1963: ArabicAnalyzer now lowercases before checking the stopword
 703    list. This has no effect on Arabic text, but if you are using a custom
 704    stopword list that contains some non-Arabic words, you'll need to fully
 705    reindex.  (DM Smith via Robert Muir)
 706
 707 Bug fixes
 708
 709  * LUCENE-1953: FastVectorHighlighter: small fragCharSize can cause
 710    StringIndexOutOfBoundsException. (Koji Sekiguchi)
 711
 712  * LUCENE-1929: Highlighter throws exception on NumericRangeQuery and does not
 713    support deprecated RangeQuery.  (Mark Miller)
 714
 715  * LUCENE-2001: Wordnet Syns2Index incorrectly parses synonyms that
 716    contain a single quote. (Parag H. Dave via Robert Muir)
 717
 718  * LUCENE-2003: Highlighter doesn't respect position increments other than 1 with
 719    PhraseQuerys. (Uwe Schindler, Mark Miller)
 720
 721  * LUCENE-1954: InstantiatedIndexWriter: Fixed ClassCastException with
 722    NumericField because of incorrect unchecked cast: Document.getFields()
 723    returns List<Fieldable>.  (Bernd Fondermann via Uwe Schindler)
 724
 725  * LUCENE-2014: SmartChineseAnalyzer did not properly clear attributes
 726    in WordTokenFilter. If enablePositionIncrements is set for StopFilter,
 727    then this could create invalid position increments, causing IndexWriter
 728    to crash.  (Robert Muir, Uwe Schindler)
 729
 730  * LUCENE-2013: SpanRegexQuery does not work with QueryScorer.
 731    (Benjamin Keil via Mark Miller)
 732
 733 ======================= Release 2.9.0 =======================
 734
 735 Changes in runtime behavior
 736
 737  * LUCENE-1505: Local Lucene now uses org.apache.lucene.util.NumericUtils for all
 738     number conversion.  You'll need to fully re-index any previously created indexes.
 739     This isn't a break in back-compatibility because local Lucene has not yet
 740     been released.  (Mike McCandless)
 741
 742  * LUCENE-1758: ArabicAnalyzer now uses the light10 algorithm, has a refined
 743     default stopword list, and lowercases non-Arabic text.
 744     You'll need to fully re-index any previously created indexes. This isn't a
 745     break in back-compatibility because ArabicAnalyzer has not yet been
 746     released.  (Robert Muir)
 747
 748
 749 API Changes
 750
 751  * LUCENE-1695: Update the Highlighter to use the new TokenStream API. This issue breaks backwards
 752     compatibility with some public classes. If you have implemented custom Fragmenters or Scorers,
 753     you will need to adjust them to work with the new TokenStream API. Rather than getting passed a
 754     Token at a time, you will be given a TokenStream to init your impl with - store the Attributes
 755     you are interested in locally and access them on each call to the method that used to pass a new
 756     Token. Look at the included updated impls for examples.  (Mark Miller)
 757
 758  * LUCENE-1460: Change contrib TokenStreams/Filters to use the new
 759     TokenStream API. (Robert Muir, Michael Busch)
 760
 761  * LUCENE-1775, LUCENE-1903: Change remaining TokenFilters (shingle, prefix-suffix)
 762     to use the new TokenStream API. ShingleFilter is much more efficient now:
 763     it clones much less often and computes the tokens mostly on the fly.
 764     Also added more tests. (Robert Muir, Michael Busch, Uwe Schindler, Chris Harris)
 765
 766  * LUCENE-1685: The position aware SpanScorer has become the default scorer
 767     for Highlighting. The SpanScorer implementation has replaced QueryScorer
 768     and the old term highlighting QueryScorer has been renamed to
 769     QueryTermScorer. Multi-term queries are also now expanded by default. If
 770     you were previously rewriting the query for multi-term query highlighting,
 771     you should no longer do that (unless you switch to using QueryTermScorer).
 772     The SpanScorer API (now QueryScorer) has also been improved to more closely
 773     match the API of the previous QueryScorer implementation.  (Mark Miller)
 774
 775  * LUCENE-1793: Deprecate the custom encoding support in the Greek and Russian
 776     Analyzers. If you need to index text in these encodings, please use Java's
 777     character set conversion facilities (InputStreamReader, etc) during I/O,
 778     so that Lucene can analyze this text as Unicode instead.  (Robert Muir)
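
The character set conversion recommended in LUCENE-1793 can be done with plain java.io before any text reaches an analyzer. This is a minimal sketch under stated assumptions: the class LegacyEncodingReader and its decode method are hypothetical, and the caller is assumed to know the charset of the input (for example a legacy Greek or Russian encoding).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper, not part of Lucene.
final class LegacyEncodingReader {
    // Decodes the whole stream into a Unicode String using the given charset.
    static String decode(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            // Rethrow unchecked to keep the sketch's signature simple.
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

The resulting String (or the Reader itself) is ordinary Unicode text, so no encoding-specific analyzer support is needed afterwards.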
 779
 780 Bug fixes
 781
 782  * LUCENE-1423: InstantiatedTermEnum#skipTo(Term) throws ArrayIndexOutOfBounds on empty index.
 783     (Karl Wettin)
 784
 785  * LUCENE-1462: InstantiatedIndexWriter did not reset pre analyzed TokenStreams the
 786     same way IndexWriter does. Parts of InstantiatedIndex were not Serializable.
 787     (Karl Wettin)
 788
 789  * LUCENE-1510: InstantiatedIndexReader#norms methods throw NullPointerException on empty index.
 790     (Karl Wettin, Robert Newson)
 791
 792  * LUCENE-1514: ShingleMatrixFilter#next(Token) easily throws a StackOverflowError
 793     due to recursive invocation. (Karl Wettin)
 794
 795  * LUCENE-1548: Fix distance normalization in LevenshteinDistance to
 796     not produce negative distances (Thomas Morton via Mike McCandless)
 797
 798  * LUCENE-1490: Fix latin1 conversion of HALFWIDTH_AND_FULLWIDTH_FORMS
 799     characters to only apply to the correct subset (Daniel Cheng via
 800     Mike McCandless)
 801
 802  * LUCENE-1576: Fix BrazilianAnalyzer to downcase tokens after
 803     StandardTokenizer so that stop words with mixed case are filtered
 804     out.  (Rafael Cunha de Almeida, Douglas Campos via Mike McCandless)
 805
 806  * LUCENE-1491: EdgeNGramTokenFilter no longer stops on tokens shorter than minimum n-gram size.
 807     (Todd Teak via Otis Gospodnetic)
 808
 809  * LUCENE-1683: Fixed JavaUtilRegexCapabilities (an impl used by
 810     RegexQuery) to use Matcher.matches() not Matcher.lookingAt() so
 811     that the regexp must match the entire string, not just a prefix.
 812     (Trejkaz via Mike McCandless)
 813
 814  * LUCENE-1792: Fix new query parser to set rewrite method for
 815     multi-term queries. (Luis Alves, Mike McCandless via Michael Busch)
 816
 817  * LUCENE-1828: Fix memory index to call TokenStream.reset() and
 818     TokenStream.end(). (Tim Smith via Michael Busch)
 819
 820  * LUCENE-1912: Fix fast-vector-highlighter issue when two or more
 821    terms are concatenated (Koji Sekiguchi via Mike McCandless)
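
The difference behind LUCENE-1683 can be seen with java.util.regex alone: Matcher.matches() requires the pattern to cover the entire input, while Matcher.lookingAt() is satisfied by a match at the start. The class RegexMatchModes below is a hypothetical illustration, not the JavaUtilRegexCapabilities code.

```java
import java.util.regex.Pattern;

// Hypothetical demo class contrasting the two Matcher methods.
final class RegexMatchModes {
    // True only if the regex matches the whole term.
    static boolean matchesWhole(String regex, String term) {
        return Pattern.compile(regex).matcher(term).matches();
    }

    // True if the regex matches some prefix of the term.
    static boolean matchesPrefix(String regex, String term) {
        return Pattern.compile(regex).matcher(term).lookingAt();
    }
}
```

For the regex "luc" and the term "lucene", matchesPrefix returns true but matchesWhole returns false; the regex "luc.*" is accepted by both.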
 822
 823 New features
 824
 825  * LUCENE-1531: Added support for BoostingTermQuery to XML query parser. (Karl Wettin)
 826
 827  * LUCENE-1435: Added contrib/collation, a CollationKeyFilter
 828     allowing you to convert tokens into CollationKeys encoded using
 829     IndexableBinaryStringTools.  This allows for faster RangeQuery when
 830     a field needs to use a custom Collator.  (Steven Rowe via Mike
 831     McCandless)
 832
 833  * LUCENE-1591: EnWikiDocMaker, LineDocMaker, WriteLineDoc can now
 834     read/write bz2 using Apache commons compress library.  This means
 835     you can download the .bz2 export from http://wikipedia.org and
 836     immediately index it.  (Shai Erera via Mike McCandless)
 837
 838  * LUCENE-1629: Add SmartChineseAnalyzer to contrib/analyzers.  It
 839     improves on CJKAnalyzer and ChineseAnalyzer by handling Chinese
 840     sentences properly.  SmartChineseAnalyzer uses a Hidden Markov
 841     Model to tokenize Chinese words in a more intelligent way.
 842     (Xiaoping Gao via Mike McCandless)
 843
 844  * LUCENE-1676: Added DelimitedPayloadTokenFilter class for automatically adding payloads "in-stream" (Grant Ingersoll)
 845
 846  * LUCENE-1578: Support for loading unoptimized readers to the
 847     constructor of InstantiatedIndex. (Karl Wettin)
 848
 849  * LUCENE-1704: Allow specifying the Tidy configuration file when
 850     parsing HTML docs with contrib/ant.  (Keith Sprochi via Mike
 851     McCandless)
 852
 853  * LUCENE-1522: Added contrib/fast-vector-highlighter, a new alternative
 854     highlighter.  (Koji Sekiguchi via Mike McCandless)
 855
 856  * LUCENE-1740: Added "analyzer" command to Lucli, enabling changing
 857     the analyzer from the default StandardAnalyzer.  (Bernd Fondermann
 858     via Mike McCandless)
 859
 860  * LUCENE-1272: Add get/setBoost to MoreLikeThis. (Jonathan
 861     Leibiusky via Mike McCandless)
 862
 863  * LUCENE-1745: Added constructors to JakartaRegexpCapabilities and
 864     JavaUtilRegexCapabilities as well as static flags to support
 865     configuring a RegexCapabilities implementation with the
 866     implementation-specific modifier flags. Allows for callers to
 867     customize the RegexQuery using the implementation-specific options
 868     and fine tune how regular expressions are compiled and
 869     matched. (Marc Zampetti zampettim@aim.com via Mike McCandless)
 870
 871  * LUCENE-1567: Added a new QueryParser framework, that allows
 872     implementing a new query syntax in a flexible and efficient way.
 873     This new QueryParser will be moved to Lucene's core in release
 874     3.0 and will then replace the current core QueryParser, which
 875     has been deprecated with this patch.
 876     (Luis Alves and Adriano Campos via Michael Busch)
 877
 878  * LUCENE-1486: Added ComplexPhraseQueryParser, an extension of QueryParser
 879     that allows a subset of the Lucene query language to be embedded in
 880     PhraseQuerys. Wildcard, Range, and Fuzzy queries, as well as limited
 881     boolean logic, can be used within quote operators with this parser, e.g.:
 882     "(jo* -john) smyth~". (Mark Harwood via Mark Miller)
 883
 884  * Added web-based demo of functionality in contrib's XML Query Parser
 885     packaged as War file (Mark Harwood)
 886
 887  * LUCENE-1406: Added Arabic analyzer.  (Robert Muir via Grant Ingersoll)
 888
 889  * LUCENE-1628: Added Persian analyzer.  (Robert Muir)
 890
 891  * LUCENE-1813: Add option to ReverseStringFilter to mark reversed tokens.
 892     (Andrzej Bialecki via Robert Muir)
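
To illustrate why collation keys (LUCENE-1435) are useful for range comparisons, the sketch below compares terms by their java.text.CollationKey instead of by raw String order. It only shows the ordering idea; it does not perform the IndexableBinaryStringTools encoding mentioned above, and the class CollationKeyOrdering is a hypothetical name.

```java
import java.text.CollationKey;
import java.text.Collator;

// Hypothetical helper, not the CollationKeyFilter implementation.
final class CollationKeyOrdering {
    // Compares two terms by their collation keys for the given Collator.
    static int compareByKey(Collator collator, String a, String b) {
        CollationKey ka = collator.getCollationKey(a);
        CollationKey kb = collator.getCollationKey(b);
        return ka.compareTo(kb);
    }
}
```

With an English Collator, "a" sorts before "B" by collation key, whereas String.compareTo places "B" first because uppercase letters have lower code points. Precomputed keys allow this locale-aware order to be checked with plain key comparisons.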
 893
 894 Optimizations
 895
 896  * LUCENE-1643: Re-use the collation key (RawCollationKey) for
 897      better performance, in ICUCollationKeyFilter.  (Robert Muir via
 898      Mike McCandless)
 899
 900  * LUCENE-1794: Implement TokenStream reuse for contrib Analyzers,
 901      and implement reset() for TokenStreams to support reuse.  (Robert Muir)
 902
 903 Documentation
 904
 905  * LUCENE-1876: added missing package level documentation for numerous
 906      contrib packages.
 907      (Steven Rowe & Robert Muir)
 908
 909 Build
 910
 911  * LUCENE-1728: Split contrib/analyzers into common and smartcn modules.
 912    Contrib/analyzers now builds an additional lucene-smartcn Jar file. All
 913    None of the smartcn classes are included in the lucene-analyzers JAR file.
 914    (Robert Muir via Simon Willnauer)
 915
 916  * LUCENE-1829: Fix contrib query parser to properly create javacc files.
 917    (Jan-Pascal and Luis Alves via Michael Busch)
 918
 919 Test Cases
 920
 921
 922 ======================= Release 2.4.0 =======================
 923
 924 Changes in runtime behavior
 925
 926  (None)
 927
 928 API Changes
 929
 932  (None)
 933
 934 Bug fixes
 935
 936  1. LUCENE-1312: Added full support for InstantiatedIndexReader#getFieldNames()
 937     and tests that assert that deleted documents behave as they should (they did).
 938     (Jason Rutherglen, Karl Wettin)
 939
 940  2. LUCENE-1318: InstantiatedIndexReader.norms(String, byte[], int) didn't handle
 941     the array offset correctly. (Jason Rutherglen via Karl Wettin)
 942
 943 New features
 944
 945  1. LUCENE-1320: ShingleMatrixFilter, multidimensional shingle token filter. (Karl Wettin)
 946
 947  2. LUCENE-1142: Updated Snowball package, org.tartarus distribution revision 500.
 948     Introducing Hungarian, Turkish and Romanian support, updated older stemmers
 949     and optimized (reflectionless) SnowballFilter.
 950     IMPORTANT NOTICE ON BACKWARDS COMPATIBILITY: an index created using version 2.3.2 (or older)
 951     might not be compatible with these updated classes as some algorithms have changed.
 952     (Karl Wettin)
 953
 954  3. LUCENE-1016: TermVectorAccessor, transparent vector space access via stored vectors
 955     or by resolving the inverted index. (Karl Wettin)
 956
 957 Documentation
 958
 959  (None)
 960
 961 Build
 962
 963  (None)
 964
 965 Test Cases
 966
 967  (None)