X-Git-Url: https://git.mdrn.pl/pylucene.git/blobdiff_plain/a2e61f0c04805cfcb8706176758d1283c7e3a55c..aaeed5504b982cf3545252ab528713250aa33eed:/lucene-java-3.5.0/lucene/JRE_VERSION_MIGRATION.txt

diff --git a/lucene-java-3.5.0/lucene/JRE_VERSION_MIGRATION.txt b/lucene-java-3.5.0/lucene/JRE_VERSION_MIGRATION.txt
new file mode 100644
index 0000000..5889849
--- /dev/null
+++ b/lucene-java-3.5.0/lucene/JRE_VERSION_MIGRATION.txt
@@ -0,0 +1,36 @@
+If possible, use the same JRE major version at both index and search time.
+When upgrading to a different JRE major version, consider re-indexing.
+
+Different JRE major versions may implement different versions of Unicode,
+which will change the way some parts of Lucene treat your text.
+
+For example: with Java 1.4, LetterTokenizer will split around the character U+02C6,
+but with Java 5 it will not.
+This is because Java 1.4 implements Unicode 3, but Java 5 implements Unicode 4.
+
+For reference, JRE major versions with their corresponding Unicode versions:
+Java 1.4, Unicode 3.0
+Java 5, Unicode 4.0
+Java 6, Unicode 4.0
+Java 7, Unicode 6.0
+
+In general, whether or not you need to re-index largely depends upon the data that
+you are searching, and what was changed in any given Unicode version. For example,
+if you are completely sure that your content is limited to the "Basic Latin" range
+of Unicode, you can safely ignore this.
+
+Special Notes:
+
+LUCENE 2.9 TO 3.0, JAVA 1.4 TO JAVA 5 TRANSITION
+
+* StandardAnalyzer will return the same results under Java 5 as it did under
+Java 1.4. This is because it is largely independent of the runtime JRE for
+Unicode support, (with the exception of lowercasing). However, no changes to
+casing have occurred in Unicode 4.0 that affect StandardAnalyzer, so if you are
+using this Analyzer you are NOT affected.
+
+* SimpleAnalyzer, StopAnalyzer, LetterTokenizer, LowerCaseFilter, and
+LowerCaseTokenizer may return different results, along with many other Analyzers
+and TokenStreams in Lucene's contrib area. If you are using one of these
+components, you may be affected.
+
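
Note (not part of the patch): the U+02C6 example in the added file can be checked on a given JRE with a small probe. The sketch below is only an illustration. It assumes that letter-based tokenization keeps a character in a token exactly when Character.isLetter returns true for it, which approximates LetterTokenizer but does not call Lucene. The class UnicodeProbe and the method letterTokens are hypothetical names introduced here. The code-point methods used require Java 5 or later, so the probe can compare Java 5, 6 and 7 but cannot run on Java 1.4.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: approximates letter-based
// tokenization using only the running JRE's own Unicode tables.
class UnicodeProbe {

    // Splits text into maximal runs of code points for which
    // Character.isLetter returns true; everything else is a separator.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetter(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                // A non-letter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Report the JRE and how it classifies U+02C6.
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("isLetter(U+02C6) = " + Character.isLetter(0x02C6));
        // One token means no split around U+02C6; two tokens means a split.
        System.out.println("tokens in a + U+02C6 + b = "
                + letterTokens("a\u02C6b").size());
    }
}
```

Running the probe under the JRE used at index time and again under the JRE used at search time, and comparing the output, gives a quick hint whether this particular character is treated differently. It does not replace testing with the real analyzers and real content.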