©2004-2008 The CJK Dictionary Institute, Inc.
Our comprehensive Japanese lexical resources (including bilingual and trilingual dictionaries) currently contain approximately three million entries, covering general vocabulary, technical terminology, proper nouns, company and organization names, and katakana loanwords.

This document describes our Japanese Lexical Database (JLD), a database with a rich set of grammatical attributes fine-tuned for NLP applications such as machine translation (MT), information retrieval (IR), morphological analysis, and tokenization. It contains about 300,000 entries covering general vocabulary, including both free forms and bound forms. The data is available in any encoding (UTF-8, EUC, Shift-JIS) and any format (plain text, Excel, HTML, etc.).

JLD includes a significant number of affixes, particles, auxiliaries, and conjugation patterns to account for all the inflectional, derivational, and lexical morphology of Japanese, enabling recognition of inflected and derived forms. To make JLD robust for IR, we highly recommend supplementing it with our Japanese Orthographical Database (JOD), described in detail in The Challenges of Intelligent Japanese Searching.

| 1 | LEXEME | Japanese word in standard kana-kanji orthography | ||||||
| 2 | HIRAGANA | Reading in hiragana, including two types of okurigana, full okurigana and inflectional okurigana. | ||||||
| 3 | POS | Part of speech code. See jappos.htm for POS code definitions. | ||||||
| 4 | SUBPOS | Sub-part-of-speech code. See jappos.htm for SUBPOS code definitions. | ||||||
| 5 | CONJUG | Conjugation pattern. See jappos.htm for CONJUG code definitions. More details are available on request. | ||||||
| 6 | TYPE | A subclassification that identifies semantic properties of the headword or supplementary information such as grammatical attributes. See cpostype.htm for TYPE code definitions. | ||||||
| 7 | MORPH | A subclassification that identifies additional morphological properties of the headword. See jappos.htm for MORPH code definitions. | ||||||
| 8 | VALENCY | Binding valency, which indicates the degree of binding between a stem/lexeme and an affix. See jappos.htm for code definitions and japaffix.htm for a detailed description of the various morphological attributes. | ||||||
| 9 | RANKING | Zero-padded six-digit number indicating a ranking based on frequency statistics. | ||||||
| 10 | SCRIPT | The type of script used to write the headword. | ||||||
| 11 | BEFORE | An adjacency attribute that indicates the part of speech (POS) of the lexeme, stem or base preceding a suffix or suffix-like element. For example, "NX" for the compounding suffix 員 means that 員 can be preceded by a common noun or verbal noun, as in 研究員. Given only for suffixes. | ||||||
| 12 | AFTER | An adjacency attribute that indicates the part of speech (POS) of the lexeme following a prefix or prefix-like element. For example, "NC" for the adnominal prefix 元 means that 元 can be followed by a common noun, as in 元総理大臣. Given only for prefixes. | ||||||
| 13 | COMPPOS | The part of speech (POS) of the lexeme resulting from affixing a prefix or suffix. For example, "NC" for the adnominal prefix 元 means that prefixing 元 (to a common noun) results in a common noun (元総理大臣). Given only for affixes. | ||||||
| 14 | HEPBURN2 | Reading in modified Hepburn romanization (macrons replaced by vowel repetition). |
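
The fourteen fields above map naturally onto a fixed-width record type. The following Python sketch shows one possible way to load a single record. It is not an official JLD reader: the tab delimiter, the empty string for unused fields, the name `type_code` for the TYPE field, and the names `JLDRecord` and `parse_record` are all assumptions made for illustration. The sample line is assembled from the がま口 entry, with the value `0` placed in the VALENCY column on the assumption that this is where it belongs.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class JLDRecord:
    """One JLD entry, with fields in the order of the definition table."""
    lexeme: str
    hiragana: str
    pos: str
    subpos: str
    conjug: str
    type_code: str   # TYPE field; renamed to avoid shadowing Python's type()
    morph: str
    valency: str
    ranking: int     # stored as an integer; re-pad with f"{ranking:06d}"
    script: str
    before: str
    after: str
    comppos: str
    hepburn2: str


FIELD_NAMES = [f.name for f in fields(JLDRecord)]


def parse_record(line: str, delimiter: str = "\t") -> JLDRecord:
    """Parse one delimited line into a JLDRecord (hypothetical file layout)."""
    values = line.rstrip("\n").split(delimiter)
    if len(values) != len(FIELD_NAMES):
        raise ValueError(
            f"expected {len(FIELD_NAMES)} fields, got {len(values)}"
        )
    data = dict(zip(FIELD_NAMES, values))
    # RANKING is a zero-padded six-digit number, so int() is safe here.
    data["ranking"] = int(data["ranking"])
    return JLDRecord(**data)


# Hypothetical input line: unused fields are represented as empty strings.
sample_line = "\t".join(
    ["がま口", "がまぐち", "NC", "", "", "", "", "0",
     "041445", "J", "", "", "", "gamaguchi"]
)
record = parse_record(sample_line)
print(record.lexeme, record.pos, f"{record.ranking:06d}")
```

Storing RANKING as an integer makes the values easy to sort and compare, and the zero-padded form can be recovered at output time with a format specifier.
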
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| がぶ飲み | がぶのみ | VN | t | 0 | 033273 | J | gabunomi | ||||||
| がましげ | がましげ | FS | M | 1 | 061089 | J | VC | AN | gamashige | ||||
| がましさ | がましさ | WS | 1 | 061089 | J | VC | NC | gamashisa | |||||
| がま口 | がまぐち | NC | 0 | 041445 | J | gamaguchi | |||||||
| がらがら | がらがら | D | 0 | 033273 | J | garagara | |||||||
| がらがら | がらがら | VN | i | 0 | 033273 | J | garagara | ||||||
| がらがら蛇 | がらがらへび | NC | 0 | 061089 | J | garagarahebi | |||||||
| がらくた | がらくた | NC | 0 | 017822 | J | garakuta | |||||||
| がらっと | がらっと | D | 0 | 041445 | J | garatto | |||||||
| がらっぱち | がらっぱち | AN | 0 | 0 | 061089 | J | garappachi | ||||||
| がらっぱち | がらっぱち | NC | 0 | 061089 | J | garappachi | |||||||
| がらみ | がらみ | WS | 1 | 061089 | J | NC | NC | garami | |||||
| がわり | がわり | WS | 1 | 061089 | J | NC | VN | gawari | |||||
| がんがん | がんがん | D | 0 | 033273 | J | gangan | |||||||
| がんがん | がんがん | VN | i | 0 | 033273 | J | gangan | ||||||
| がんじがらめ | がんじがらめ | NC | 0 | 013474 | J | ganjigarame | |||||||
| がんとして | がんとして | D | 0 | 028538 | J | gantoshite | |||||||
| がん遺伝子 | がんいでんし | NC | 0 | 013474 | J | gan'idenshi | |||||||
| がん化 | がんか | VN | 0 | 028538 | J | ganka | |||||||
| がんセンター | がんせんたー | NC | 0 | 025149 | J | gansenta_ | |||||||
| 慣れ | なれ | NC | 0 | 017822 | J | nare | |||||||
| 慣れきる | な.れき-る | V5 | R | 0 | 022662 | J | narekiru | ||||||
| 慣れっこ | なれっこ | AN | 1 | 0 | 020741 | J | narekko | ||||||
| 慣れっこ | なれっこ | NC | 0 | 020741 | J | narekko | |||||||
| 慣れる | な.れ-る | V1 | i | 0 | 002465 | J | nareru | ||||||
| 慣れる | なれる | WS | 1 | 002465 | J | VC | V1 | nareru | |||||
| 慣れ切る | なれき-る | V5 | R | 0 | 033273 | J | narekiru | ||||||
| 慣わし | ならわし | NC | 0 | 033273 | J | narawashi | |||||||
| 慣わす | なら.わ-す | V5 | S | t | 0 | 061089 | J | narawasu | |||||
| 慣わす | ならわす | WS | 1 | 061089 | J | VC | V5 | narawasu | |||||
| 慣行 | かんこう | NC | 0 | 007161 | J | kanko_ | |||||||
| 慣行犯 | かんこうはん | NC | 0 | 061089 | J | kanko_han | |||||||
| 慣手段 | かんしゅだん | NC | 0 | 061089 | J | kanshudan | |||||||
| 慣習 | かんしゅう | NC | 0 | 007457 | J | kanshu_ | |||||||
| 慣習法 | かんしゅうほう | NC | 0 | 061089 | J | kanshu_ho_ | |||||||
| 慣熟 | かんじゅく | VN | i | 0 | 061089 | J | kanjuku | ||||||
| 慣性 | かんせい | NC | 0 | 013474 | J | kansei | |||||||
| 慣性の法則 | かんせいのほうそく | U | U | 061089 | J | kanseinoho_soku | |||||||
| 生 | いき | NC | 0 | 061089 | J | iki | |||||||
| 生 | う | WS | 1 | 061089 | J | NC | NC | u | |||||
| 生 | うまれ | NC | 0 | 061089 | J | umare | |||||||
| 生 | うまれ | WS | 1 | 061089 | J | NC NP | NC | umare | |||||
| 生 | うみ | NC | 0 | 061089 | J | umi | |||||||
| 生 | き | NC | 0 | 061089 | J | ki | |||||||
| 生 | き | WP | 1 | 061089 | J | NC | NC | ki | |||||
| 生 | しょう | NC | 0 | 061089 | J | sho_ | |||||||
| 生 | せい | NR | 0 | 003721 | J | sei | |||||||
| 生 | せい | WS | 1 | 003721 | J | NC | NC | sei | |||||
| 生 | なま | NC | 0 | 010656 | J | nama | |||||||
| 生 | なま | WP | 1 | 010656 | J | NC | NC | nama | |||||
| 生 | なまり | NC | 0 | 061089 | J | namari |
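
The adjacency attributes BEFORE, AFTER and COMPPOS described in the field definitions lend themselves to a simple attachment check. The sketch below illustrates the idea under several assumptions: the cover code "NX" is expanded to "NC" and "VN" (the text says only that NX covers common nouns and verbal nouns; "VN" as the verbal noun code is inferred from the sample rows), multiple codes in one field are assumed to be space-separated as in the "NC NP" value above, and the COMPPOS value "NC" for 員 is a guess, since the text does not give it. The names `Affix`, `allowed_pos` and `attach` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple

# Assumption: "NX" stands for common noun (NC) or verbal noun (VN).
COVER_CODES = {"NX": {"NC", "VN"}}


@dataclass(frozen=True)
class Affix:
    lexeme: str
    before: str   # BEFORE: POS allowed before a suffix (empty for prefixes)
    after: str    # AFTER: POS allowed after a prefix (empty for suffixes)
    comppos: str  # COMPPOS: POS of the resulting word


def allowed_pos(codes: str) -> Set[str]:
    """Expand a space-separated adjacency field into concrete POS codes."""
    result: Set[str] = set()
    for code in codes.split():
        result |= COVER_CODES.get(code, {code})
    return result


def attach(affix: Affix, word: str, word_pos: str) -> Optional[Tuple[str, str]]:
    """Return (derived form, derived POS) if the affix may attach, else None."""
    if affix.before and word_pos in allowed_pos(affix.before):
        return word + affix.lexeme, affix.comppos   # suffix follows the word
    if affix.after and word_pos in allowed_pos(affix.after):
        return affix.lexeme + word, affix.comppos   # prefix precedes the word
    return None


# 元: AFTER and COMPPOS are both "NC", as stated in the field definitions.
moto = Affix(lexeme="元", before="", after="NC", comppos="NC")
# 員: BEFORE is "NX" per the field definitions; COMPPOS "NC" is assumed.
in_suffix = Affix(lexeme="員", before="NX", after="", comppos="NC")

print(attach(moto, "総理大臣", "NC"))
print(attach(in_suffix, "研究", "VN"))
print(attach(moto, "慣れる", "V1"))
```

A real morphological analyzer would also consult VALENCY and the conjugation data, which this sketch ignores.
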