2 # This file defines a Japanese stoptag set for JapanesePartOfSpeechStopFilter.
4 # Any token whose part-of-speech tag exactly matches one defined in this
5 # file is removed from the token stream.
7 # Set your own stoptags by uncommenting the lines below. Note that comments are
8 # not allowed on the same line as a stoptag. See LUCENE-3745 for frequency lists,
9 # etc. that can be useful for building your own stoptag set.
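#
# A minimal illustration, not a recommended set. It assumes the tag lines in
# this file use the Japanese tag names that the English names below translate,
# such as 助詞-係助詞 for particle-dependency. A stoptag is active only when it
# stands alone on an uncommented line. The line below is still commented out,
# so it stays inactive here:
#   助詞-係助詞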
11 # The entire possible tagset is provided below for convenience.
14 # noun: unclassified nouns
17 # noun-common: Common nouns or nouns where the sub-classification is undefined
20 # noun-proper: Proper nouns where the sub-classification is undefined
23 # noun-proper-misc: miscellaneous proper nouns
26 # noun-proper-person: Personal names where the sub-classification is undefined
29 # noun-proper-person-misc: names that cannot be divided into surname and
30 # given name; foreign names; names where the surname or given name is unknown.
34 # noun-proper-person-surname: Mainly Japanese surnames.
38 # noun-proper-person-given_name: Mainly Japanese given names.
42 # noun-proper-organization: Names representing organizations.
46 # noun-proper-place: Place names where the sub-classification is undefined
49 # noun-proper-place-misc: Place names excluding countries.
53 # noun-proper-place-country: Country names.
57 # noun-pronoun: Pronouns where the sub-classification is undefined
60 # noun-pronoun-misc: miscellaneous pronouns:
61 # e.g. それ, ここ, あいつ, あなた, あちこち, いくつ, どこか, なに, みなさん, みんな, わたくし, われわれ
64 # noun-pronoun-contraction: Spoken language contraction made by combining a
65 # pronoun and the particle 'wa'.
66 # e.g. ありゃ, こりゃ, こりゃあ, そりゃ, そりゃあ
69 # noun-adverbial: Temporal nouns such as names of days or months that behave
70 # like adverbs. Nouns that represent amounts or ratios and can be used adverbially.
74 # noun-verbal: Nouns that take arguments with case and can appear followed by
75 # 'suru' and related verbs (する, できる, なさる, くださる)
76 # e.g. インプット, 愛着, 悪化, 悪戦苦闘, 一安心, 下取り
79 # noun-adjective-base: The base form of adjectives, words that appear before な ("na")
83 # noun-numeric: Arabic numbers, Chinese numerals, and counters like 何 (回), 数.
84 # e.g. 0, 1, 2, 何, 数, 幾
87 # noun-affix: noun affixes where the sub-classification is undefined
90 # noun-affix-misc: Of adnominalizers, the case-marker の ("no"), and words that
91 # attach to the base form of inflectional words, words that cannot be classified
92 # into any of the other categories below. This category includes indefinite nouns.
93 # e.g. あかつき, 暁, かい, 甲斐, 気, きらい, 嫌い, くせ, 癖, こと, 事, ごと, 毎, しだい, 次第,
94 # 順, せい, 所為, ついで, 序で, つもり, 積もり, 点, どころ, の, はず, 筈, はずみ, 弾み,
95 # 拍子, ふう, ふり, 振り, ほう, 方, 旨, もの, 物, 者, ゆえ, 故, ゆえん, 所以, わけ, 訳,
96 # わり, 割り, 割, ん-口語/, もん-口語/
99 # noun-affix-adverbial: noun affixes that can behave as adverbs.
100 # e.g. あいだ, 間, あげく, 挙げ句, あと, 後, 余り, 以外, 以降, 以後, 以上, 以前, 一方, うえ,
101 # 上, うち, 内, おり, 折り, かぎり, 限り, きり, っきり, 結果, ころ, 頃, さい, 際, 最中, さなか,
102 # 最中, じたい, 自体, たび, 度, ため, 為, つど, 都度, とおり, 通り, とき, 時, ところ, 所,
103 # とたん, 途端, なか, 中, のち, 後, ばあい, 場合, 日, ぶん, 分, ほか, 他, まえ, 前, まま,
107 # noun-affix-aux: noun affixes treated as 助動詞 ("auxiliary verb") in school grammars
108 # with the stem よう(だ) ("you(da)").
109 # e.g. よう, やう, 様 (よう)
112 # noun-affix-adjective-base: noun affixes that can connect to the indeclinable
113 # connection form な (aux "da").
117 # noun-special: special nouns where the sub-classification is undefined.
120 # noun-special-aux: The そうだ ("souda") stem form that is used for reporting news, is
121 # treated as 助動詞 ("auxiliary verb") in school grammars, and attaches to the base
122 # form of inflectional words.
126 # noun-suffix: noun suffixes where the sub-classification is undefined.
129 # noun-suffix-misc: Of the nouns or stem forms of other parts of speech that connect
130 # to ガル or タイ and can combine into compound nouns, words that cannot be classified into
131 # any of the other categories below. In general, this category is more inclusive than
132 # 接尾語 ("suffix") and is usually the last element in a compound noun.
133 # e.g. おき, かた, 方, 甲斐 (がい), がかり, ぎみ, 気味, ぐるみ, (~した) さ, 次第, 済 (ず) み,
134 # よう, (でき)っこ, 感, 観, 性, 学, 類, 面, 用
137 # noun-suffix-person: Suffixes that form nouns and attach to person names more often
142 # noun-suffix-place: Suffixes that form nouns and attach to place names more often
147 # noun-suffix-verbal: Of the suffixes that attach to nouns and form nouns, those that
148 # can appear before スル ("suru").
149 # e.g. 化, 視, 分け, 入り, 落ち, 買い
152 # noun-suffix-aux: The stem form of そうだ (様態) that is used to indicate conditions,
153 # is treated as 助動詞 ("auxiliary verb") in school grammars, and attaches to the
154 # conjunctive form of inflectional words.
158 # noun-suffix-adjective-base: Suffixes that attach to other nouns or the conjunctive
159 # form of inflectional words and appear before the copula だ ("da").
163 # noun-suffix-adverbial: Suffixes that attach to other nouns and can behave as adverbs.
164 # e.g. 後 (ご), 以後, 以降, 以前, 前後, 中, 末, 上, 時 (じ)
167 # noun-suffix-classifier: Suffixes that attach to numbers and form nouns. This category
168 # is more inclusive than 助数詞 ("classifier") and includes common nouns that attach
170 # e.g. 個, つ, 本, 冊, パーセント, cm, kg, カ月, か国, 区画, 時間, 時半
173 # noun-suffix-special: Special suffixes that mainly attach to inflecting words.
174 # e.g. (楽し) さ, (考え) 方
177 # noun-suffix-conjunctive: Nouns that behave like conjunctions and join two words
179 # e.g. (日本) 対 (アメリカ), 対 (アメリカ), (3) 対 (5), (女優) 兼 (主婦)
182 # noun-verbal_aux: Nouns that attach to the conjunctive particle て ("te") and are
183 # semantically verb-like.
184 # e.g. ごらん, ご覧, 御覧, 頂戴
187 # noun-quotation: text that cannot be segmented into words, proverbs, Chinese poetry,
188 # dialects, English, etc. Currently, the only entry for 名詞 引用文字列 ("noun quotation")
192 # noun-nai_adjective: Words that appear before the auxiliary verb ない ("nai") and
193 # behave like an adjective.
194 # e.g. 申し訳, 仕方, とんでも, 違い
198 # prefix: unclassified prefixes
201 # prefix-nominal: Prefixes that attach to nouns (including adjective stem forms)
202 # excluding numerical expressions.
203 # e.g. お (水), 某 (氏), 同 (社), 故 (~氏), 高 (品質), お (見事), ご (立派)
206 # prefix-verbal: Prefixes that attach to the imperative form of a verb or a verb
207 # in conjunctive form followed by なる/なさる/くださる.
208 # e.g. お (読みなさい), お (座り)
211 # prefix-adjectival: Prefixes that attach to adjectives.
212 # e.g. お (寒いですねえ), バカ (でかい)
215 # prefix-numerical: Prefixes that attach to numerical expressions.
220 # verb: unclassified verbs
233 # adjective: unclassified adjectives
239 # adjective-auxiliary:
246 # adverb: unclassified adverbs
249 # adverb-misc: Words that can be segmented into one unit and where adnominal
250 # modification is not possible.
254 # adverb-particle_conjunction: Adverbs that can be followed by の, は, に,
256 # e.g. こんなに, そんなに, あんなに, なにか, なんでも
260 # adnominal: Words that only have noun-modifying forms.
261 # e.g. この, その, あの, どの, いわゆる, なんらかの, 何らかの, いろんな, こういう, そういう, ああいう,
262 # どういう, こんな, そんな, あんな, どんな, 大きな, 小さな, おかしな, ほんの, たいした,
263 # 「(, も) さる (ことながら)」, 微々たる, 堂々たる, 単なる, いかなる, 我が, 同じ, 亡き
267 # conjunction: Conjunctions that can occur independently.
268 # e.g. が, けれども, そして, じゃあ, それどころか
272 # particle: unclassified particles.
275 # particle-case: case particles where the subclassification is undefined.
278 # particle-case-misc: Case particles.
279 # e.g. から, が, で, と, に, へ, より, を, の, にて
282 # particle-case-quote: the "to" that appears after nouns, a person’s speech,
283 # quotation marks, expressions of decisions from a meeting, reasons, judgements,
285 # e.g. ( だ) と (述べた.), ( である) と (して執行猶予...)
288 # particle-case-compound: Compounds of particles and verbs that mainly behave
289 # like case particles.
290 # e.g. という, といった, とかいう, として, とともに, と共に, でもって, にあたって, に当たって, に当って,
291 # にあたり, に当たり, に当り, に当たる, にあたる, において, に於いて, に於て, における, に於ける,
292 # にかけ, にかけて, にかんし, に関し, にかんして, に関して, にかんする, に関する, に際し,
293 # に際して, にしたがい, に従い, に従う, にしたがって, に従って, にたいし, に対し, にたいして,
294 # に対して, にたいする, に対する, について, につき, につけ, につけて, につれ, につれて, にとって,
295 # にとり, にまつわる, によって, に依って, に因って, により, に依り, に因り, による, に依る, に因る,
296 # にわたって, にわたる, をもって, を以って, を通じ, を通じて, を通して, をめぐって, をめぐり, をめぐる,
297 # って-口語/, ちゅう-関西弁「という」/, (何) ていう (人)-口語/, っていう-口語/, といふ, とかいふ
300 # particle-conjunctive:
301 # e.g. から, からには, が, けれど, けれども, けど, し, つつ, て, で, と, ところが, どころか, とも, ども,
302 # ながら, なり, ので, のに, ば, ものの, や ( した), やいなや, (ころん) じゃ(いけない)-口語/,
303 # (行っ) ちゃ(いけない)-口語/, (言っ) たって (しかたがない)-口語/, (それがなく)ったって (平気)-口語/
306 # particle-dependency:
307 # e.g. こそ, さえ, しか, すら, は, も, ぞ
310 # particle-adverbial:
311 # e.g. がてら, かも, くらい, 位, ぐらい, しも, (学校) じゃ(これが流行っている)-口語/,
312 # (それ)じゃあ (よくない)-口語/, ずつ, (私) なぞ, など, (私) なり (に), (先生) なんか (大嫌い)-口語/,
313 # (私) なんぞ, (先生) なんて (大嫌い)-口語/, のみ, だけ, (私) だって-口語/, だに,
314 # (彼)ったら-口語/, (お茶) でも (いかが), 等 (とう), (今後) とも, ばかり, ばっか-口語/, ばっかり-口語/,
315 # ほど, 程, まで, 迄, (誰) も (が)([助詞-格助詞] および [助詞-係助詞] の前に位置する「も」)
318 # particle-interjective: particles with interjective grammatical roles.
322 # particle-coordinate:
323 # e.g. と, たり, だの, だり, とか, なり, や, やら
327 # e.g. かい, かしら, さ, ぜ, (だ)っけ-口語/, (とまってる) で-方言/, な, ナ, なあ-口語/, ぞ, ね, ネ,
328 # ねぇ-口語/, ねえ-口語/, ねん-方言/, の, のう-口語/, や, よ, ヨ, よぉ-口語/, わ, わい-口語/
331 # particle-adverbial/conjunctive/final: The particle "ka" when it is unclear whether it is
332 # adverbial, conjunctive, or sentence-final. For example:
333 # (a) 「A か B か」. Ex:「(国内で運用する) か,(海外で運用する) か (.)」
334 # (b) Inside an adverb phrase. Ex:「(幸いという) か (, 死者はいなかった.)」
335 # 「(祈りが届いたせい) か (, 試験に合格した.)」
336 # (c) 「かのように」. Ex:「(何もなかった) か (のように振る舞った.)」
340 # particle-adnominalizer: The "no" that attaches to nouns and modifies
341 # non-inflectional words.
344 # particle-adverbializer: The "ni" and "to" that appear following nouns and adverbs
345 # that are giongo, giseigo, or gitaigo (onomatopoeic or mimetic words).
349 # particle-special: A particle that does not fit into one of the above classifications.
350 # This includes particles that are used in Tanka, Haiku, and other poetry.
351 # e.g. かな, けむ, ( しただろう) に, (あんた) にゃ(わからん), (俺) ん (家)
359 # interjection: Greetings and other exclamations.
360 # e.g. おはよう, おはようございます, こんにちは, こんばんは, ありがとう, どうもありがとう, ありがとうございます,
361 # いただきます, ごちそうさま, さよなら, さようなら, はい, いいえ, ごめん, ごめんなさい
365 # symbol: unclassified symbols.
368 # symbol-misc: A general symbol not in one of the categories below.
372 # symbol-comma: Commas
376 # symbol-period: Periods and full stops.
380 # symbol-space: Full-width whitespace.
383 # symbol-open_bracket:
387 # symbol-close_bracket:
395 # other: unclassified other
398 # other-interjection: Words that are hard to classify as noun-suffixes or
399 # sentence-final particles.
404 # filler: Aizuchi (backchannel responses) that occur during a conversation, or sounds inserted as filler.
409 # non-verbal: non-verbal sound.
417 # unknown: unknown part of speech.