
The CJKI Chinese lexical database currently contains over four million Simplified Chinese (SC) and Traditional Chinese (TC) headwords covering general vocabulary, important technical terms, and proper nouns. Each lexeme is accompanied by one or more pinyin readings and various other attributes (see chinword.htm for details).
Especially noteworthy is that the pinyin readings take into account the differences in pronunciation between Taiwan and the People's Republic of China, as shown in the tables below. Even highly educated native speakers of Chinese are often surprised to discover that such differences exist.
Our pinyin readings have been thoroughly proofread for accuracy and explicitly mark the neutral tone, which conventional dictionaries often ignore. The readings can be provided in all the major transcription systems, such as Yale, Wade-Giles, Zhuyin and IPA, and are especially useful for speech technology applications such as text-to-speech (TTS) software.
| ID | DIFF | SC_HANZI | SC_FREQ | SC_PIN | TC_HANZI | TC_FREQ | TC_PIN |
|---|---|---|---|---|---|---|---|
| I00001 | D | 临期 | 0000029000 | línqī | 臨期 | 0000028800 | línqí |
| I00002 | D | 企 | 0030900000 | qǐ | 企 | 0030900000 | qì |
| I00003 | D | 企业 | 0163000000 | qǐyè | 企業 | 0102000000 | qìyè |
| I00004 | D | 倬雄 | 0000000167 | zhuōxióng | 倬雄 | 0000000167 | zhuóxióng |
| I00005 | D | 危 | 0006720000 | wēi | 危 | 0006720000 | wéi |
| I00006 | D | 危险 | 0022400000 | wēixiǎn | 危險 | 0003080000 | wéixiǎn |
| I00007 | D | 发 | 0235000000 | fà | 髮 | 0006950000 | fǎ |
| I00008 | D | 埒城 | 0000000411 | lièchéng | 埒城 | 0000000411 | lèchéng |
| I00009 | D | 夕日 | 0002020000 | xīrì | 夕日 | 0002020000 | xìrì |
| I00010 | D | 大期 | 0000061500 | dàqī | 大期 | 0000061500 | dàqí |
| I00011 | D | 巍八郎 | 0000000044 | wēibāláng | 巍八郎 | 0000000044 | wéibāláng |
| I00012 | D | 帆柱 | 0000030600 | fānzhù | 帆柱 | 0000030600 | fánzhù |
| I00013 | D | 微 | 0035500000 | wēi | 微 | 0035500000 | wéi |
| I00014 | D | 微笑 | 0018400000 | wēixiào | 微笑 | 0018400000 | wéixiào |
| I00015 | D | 拙夫 | 0000017200 | zhuōfū | 拙夫 | 0000017200 | zhuófū |
| I00016 | S | 无着 | 0000265000 | wúzhuó | 無著 | 0000265000 | wúzhuó |
| I00017 | D | 昔日 | 0004880000 | xīrì | 昔日 | 0004880000 | xírì |
| I00018 | D | 显微镜 | 0003390000 | xiǎnwēijìng | 顯微鏡 | 0000228000 | xiǎnwéijìng |
| I00019 | D | 期待 | 0059100000 | qīdài | 期待 | 0059100000 | qídài |
| I00020 | D | 池穴 | 0000059400 | chíxué | 池穴 | 0000059400 | chíxuè |
| I00021 | D | 理发 | 0002170000 | lǐfà | 理髮 | 0000495000 | lǐfǎ |
| I00022 | D | 隆巴妮 | 0000000137 | lóngbānī | 隆巴妮 | 0000000137 | lóngbāní |
| I00023 | D | 麦卡锡 | 0000058100 | màikǎxī | 麥卡錫 | 0000010400 | màikǎxí |

| ID | DIFF | SC_HANZI | SC_FREQ | SC_PIN | TC_HANZI | TC_FREQ | TC_PIN |
|---|---|---|---|---|---|---|---|
| G00018 | S | 咖啡豆 | 0000779000 | kāfēidòu | 咖啡豆 | 0000779000 | kāfēidòu |
| G00019 | S | 咖啡豆 研磨机 | 0000001780 | kāfēidòuyánmójī | 咖啡豆 研磨機 | 0000001770 | kāfēidòuyánmójī |
| G04348 | S | 咖啡豆象 | 0000000405 | kāfēidòuxiàng | 咖啡豆象 | 0000000405 | kāfēidòuxiàng |
| G04349 | S | 咖啡豆酊 | 0000000028 | kāfēidòudīng | 咖啡豆酊 | 0000000028 | kāfēidòudīng |
| G04340 | S | 咖啡酸 | 0000026200 | kāfēisuān | 咖啡酸 | 0000026200 | kāfēisuān |
| G04342 | S | 咖啡醇 | 0000001910 | kāfēichún | 咖啡醇 | 0000001910 | kāfēichún |
| G04335 | S | 咖啡 锈病 | 0000000208 | kāfēixiùbìng | 咖啡 鏽病 | 0000000023 | kāfēixiùbìng |
| G00022 | S | 咖啡 面包卷 | 0000000110 | kāfēimiànbāojuǎn | 咖啡 面包卷 | 0000000110 | kāfēimiànbāojuǎn |
| G00008 | S | 咖啡馆 | 0002260000 | kāfēiguǎn | 咖啡館 | 0002320000 | kāfēiguǎn |
| G00024 | D | 咖喱 | 0001800000 | gālí | 咖喱 | 0001800000 | kālǐ |
| G04358 | D | 咖喱 牛肉 | 0000072700 | gālíniúròu | 咖喱 牛肉 | 0000072700 | kālǐniúròu |
| G00027 | D | 咖喱粉 | 0000087400 | gālífěn | 咖喱粉 | 0000087400 | kālǐfěn |
| G04356 | D | 咖喱酱 | 0000030000 | gālíjiàng | 咖喱醬 | 0000029900 | kālǐjiàng |
| G04357 | D | 咖喱饭 | 0000122000 | gālífàn | 咖喱飯 | 0000122000 | kālǐfàn |
| G00026 | D | 咖喱鸡 | 0000146000 | gālíjī | 咖喱雞 | 0000146000 | kālǐjī |
| G04360 | S | 咖陶导数 | 0000000000 | kātáodǎoshù | 咖陶導數 | 0000000000 | kātáodǎoshù |
| G04359 | S | 咖马拉 | 0000000002 | kāmǎlā | 咖馬拉 | 0000000002 | kāmǎlā |
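
The sample rows above could be consumed along the following lines. This is a minimal sketch, not the database's actual delivery format or API: it assumes rows arrive as pipe-delimited text exactly as displayed, that the frequency fields are zero-padded integers, and that the DIFF flag means "D" when the SC and TC readings differ and "S" when they are the same (an interpretation inferred from the sample rows, not stated in the text). The names `Entry`, `parse_row`, `reading_for`, and `flag_matches_readings`, as well as the region labels "CN" and "TW", are hypothetical.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One row of the sample table (field names mirror the column headers)."""
    id: str
    diff: str
    sc_hanzi: str
    sc_freq: int
    sc_pin: str
    tc_hanzi: str
    tc_freq: int
    tc_pin: str


def parse_row(line: str) -> Entry:
    """Parse one pipe-delimited data row into an Entry."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    if len(cells) != 8:
        raise ValueError(f"expected 8 cells, got {len(cells)}: {line!r}")
    id_, diff, sc_hanzi, sc_freq, sc_pin, tc_hanzi, tc_freq, tc_pin = cells
    # Assumption: frequencies are zero-padded decimal integers.
    return Entry(id_, diff, sc_hanzi, int(sc_freq), sc_pin,
                 tc_hanzi, int(tc_freq), tc_pin)


def reading_for(entry: Entry, region: str) -> str:
    """Pick the reading for a region; "CN" and "TW" are hypothetical labels."""
    if region == "CN":
        return entry.sc_pin
    if region == "TW":
        return entry.tc_pin
    raise ValueError(f"unknown region: {region!r}")


def flag_matches_readings(entry: Entry) -> bool:
    """Check the assumed meaning of DIFF: "D" if the readings differ, else "S"."""
    # Normalize to NFC so precomposed and decomposed tone marks compare equal.
    sc = unicodedata.normalize("NFC", entry.sc_pin)
    tc = unicodedata.normalize("NFC", entry.tc_pin)
    return (entry.diff == "D") == (sc != tc)


# Two rows copied from the tables above.
SAMPLE_ROWS = [
    "| I00003 | D | 企业 | 0163000000 | qǐyè | 企業 | 0102000000 | qìyè |",
    "| I00016 | S | 无着 | 0000265000 | wúzhuó | 無著 | 0000265000 | wúzhuó |",
]

entries = [parse_row(row) for row in SAMPLE_ROWS]
for entry in entries:
    print(entry.id, reading_for(entry, "CN"), reading_for(entry, "TW"),
          flag_matches_readings(entry))
```

A TTS front end would typically make the region choice once per voice and then call something like `reading_for` for each lexeme, so that a Taiwan voice and a mainland voice read the same headword with their respective tones.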