lucene-java-3.4.0/lucene/contrib/analyzers/common/src/test/org/apache/lucene/analysis/de/data.txt

# German special characters are replaced:
häufig  haufig

# here the stemmer works okay and maps related words to the same stem:
abschließen     abschliess
abschließender  abschliess
abschließendes  abschliess
abschließenden  abschliess

Tisch    tisch
Tische   tisch
Tischen  tisch

Haus     hau
Hauses   hau
Häuser   hau
Häusern  hau
# here's a case where overstemming occurs, i.e. a word is
# mapped to the same stem as unrelated words:
hauen    hau

# here's a case where understemming occurs, i.e. two related words
# are not mapped to the same stem. This is the case with basically
# all irregular forms:
Drama   drama
Dramen  dram

# "ß" is replaced with "ss":
Ausmaß  ausmass

# fake words to test if suffixes are cut off:
xxxxxe   xxxxx
xxxxxs   xxxxx
xxxxxn   xxxxx
xxxxxt   xxxxx
xxxxxem  xxxxx
xxxxxer  xxxxx
xxxxxnd  xxxxx
# the suffixes are also removed when combined:
xxxxxetende  xxxxx

# words that are shorter than four characters are not changed:
xxe    xxe
# -em and -er are not removed from words shorter than five characters:
xxem   xxem
xxer   xxer
# -nd is not removed from words shorter than six characters:
xxxnd  xxxnd