Chinese-English Database of Place Names


©2008-2017 The CJK Dictionary Institute, Inc.



Basic Specifications
Overview

We maintain the world's largest Chinese-English bilingual bidirectional databases of proper nouns, of which the following data sample is a subset focusing on Chinese and non-Chinese place names. Our proper noun databases are used by some of the world's major IT companies for a wide variety of applications, such as:

  • machine translation (MT)
  • information retrieval (IR)
  • morphological analysis (MA)
  • electronic dictionaries (ED)
  • input method editors (IME)
  • named entity recognition (NER)

See also our Comprehensive Database of Chinese Name Variants, which contains hundreds of thousands of name variants.

Coverage and Fields

Our Chinese-English-Chinese database of Chinese and non-Chinese place names is very comprehensive.

Each entry includes various attributes, such as readings in pinyin, zhuyin fuhao, Cantonese and several other romanization systems, semantic classification codes, locale codes, and frequency rankings and statistics (described in chinfreq.htm). Not all of these fields are shown in the sample below. Please contact Jack Halpern for details and larger samples.

Format and Encoding

The data can be provided in any desired format, such as plain-text files with tab-delimited fields, Excel, Access or HTML, and in any encoding, such as UTF-8, UCS-2, EUC, GB2312 and Big5 (the sample below is encoded in UTF-8).
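As a minimal sketch, assuming a tab-delimited export whose columns follow the field order listed below (ID, ENGLISH, SC, L/O, TC, SC_PIN, ZHUYIN) and contain no quoted fields, such a file could be read with Python's standard csv module, passing the file's encoding name to bytes.decode. The helper names read_records and can_encode are hypothetical, and Python's utf-16 codec stands in for UCS-2, which is equivalent for characters in the Basic Multilingual Plane.

```python
import csv
import io

# Assumed column order, matching the field description in this document.
FIELDS = ["ID", "ENGLISH", "SC", "L/O", "TC", "SC_PIN", "ZHUYIN"]


def read_records(raw: bytes, encoding: str = "utf-8") -> list[dict[str, str]]:
    """Decode a tab-delimited export and return one dict per non-empty row."""
    text = raw.decode(encoding)
    # QUOTE_NONE: treat quote characters as ordinary data and split only on tabs.
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(zip(FIELDS, row)) for row in reader if row]


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True
```

Because GB2312 covers simplified characters and Big5 traditional ones, a single file holding both SC and TC columns generally needs a Unicode encoding; a check such as can_encode(name, "big5") is a quick way to test a column before choosing a legacy encoding.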



Field Description
Field Number  Field Name  Description
1 ID

Record identifier; the prefix "N" stands for "Name"

2 ENGLISH

Common English name of the place

3 SC

Name in Simplified Chinese

4 L/O

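Type of conversion from the SC name to the TC name. Orthographic conversion, marked "O" in the table below, writes the same word with traditional characters; this is not always a simple character-for-character mapping, since 发 becomes 髮 in 头发 (頭髮) but 發 in 出发 (出發). Lexemic conversion, marked "L" in the table below, uses a different word altogether: 'laser', for example, is translated to 激光 in SC but to 雷射 in TC. For details, see Jack Halpern's paper "The Pitfalls and Complexities of Chinese to Chinese Conversion," presented at several international conferences.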

5 TC

Name in Traditional Chinese

6 SC_PIN

Pinyin reading of the SC name, with a tone number after each syllable and hyphens separating syllables

7 ZHUYIN

Reading of the TC name in the Zhuyin fuhao (Bopomofo) transcription system; the first tone is unmarked
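The difference between "L" and "O" conversion can be illustrated with a toy SC-to-TC converter. This is a minimal sketch, not CJKI's conversion method: the three tables are hypothetical miniatures built from examples on this page. A whole name is first looked up in the lexemic table; otherwise the longest matching word in an orthographic word table is used, so that a character such as 发 is resolved by context, before falling back to a character-by-character map.

```python
# Hypothetical miniature tables; a real converter would need large, curated dictionaries.
LEXEMIC = {"激光": "雷射", "新西兰": "紐西蘭", "也门": "葉門"}  # different word in TC
ORTHO_WORDS = {"头发": "頭髮", "出发": "出發"}  # words containing one-to-many characters
ORTHO_CHARS = {"爱": "愛", "尔": "爾", "兰": "蘭", "东": "東", "阳": "陽", "门": "門"}


def sc_to_tc(sc: str) -> tuple[str, str]:
    """Convert an SC name to TC and report the conversion type ("L" or "O")."""
    # Lexemic: the whole name is replaced by a different TC word.
    if sc in LEXEMIC:
        return LEXEMIC[sc], "L"

    out = []
    i = 0
    while i < len(sc):
        # Orthographic, word level: longest dictionary word (two or more
        # characters) starting at position i.
        match = next(
            (sc[i:j] for j in range(len(sc), i + 1, -1) if sc[i:j] in ORTHO_WORDS),
            None,
        )
        if match:
            out.append(ORTHO_WORDS[match])
            i += len(match)
        else:
            # Orthographic, character level: map one character, or keep it
            # unchanged when SC and TC share the same form.
            out.append(ORTHO_CHARS.get(sc[i], sc[i]))
            i += 1
    return "".join(out), "O"
```

Converting 也门 character by character would give 也門, whereas the lexemic table yields 葉門, the TC form shown for Yemen in the sample, which illustrates why the conversion type is worth recording per entry.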



Database of Chinese and non-Chinese Place Names
ID ENGLISH SC L/O TC SC_PIN ZHUYIN
N002657 Aruba 阿鲁巴 L 阿盧巴 a1-lu3-ba1 ㄚㄌㄨˊㄅㄚ
N001635 Azerbaijan 阿塞拜疆 L 亞塞拜然 a1-sai1-bai4-jiang1 ㄧㄚˋㄙㄜˋㄅㄞˋㄖㄢˊ
N081006 Brasilia 巴西利亚 O 巴西利亞 ba1-xi1-li4-ya4 ㄅㄚㄒㄧㄌㄧˋㄧㄚˋ
N016658 Caracas 加拉加斯 L 卡拉卡斯 jia1-la1-jia1-si1 ㄎㄚˇㄌㄚㄎㄚˇㄙ
N014214 Cairo 开罗 O 開羅 kai1-luo2 ㄎㄞㄌㄨㄛˊ
N058842 Chad 乍得 L 查德 zha4-de2 ㄔㄚˊㄉㄜˊ
N087916 Dongyang City 东阳市 O 東陽市 dong1-yang2-shi4 ㄉㄨㄥㄧㄤˊㄕˋ
N078960 Fukuoka 福冈 O 福岡 fu2-gang1 ㄈㄨˊㄍㄤ
N047517 Georgia 乔治亚 O 喬治亞 qiao2-zhi4-ya4 ㄑㄧㄠˊㄓˋㄧㄚˋ
N023778 Guinea 几内亚 O 幾內亞 ji3-nei4-ya4 ㄐㄧˇㄋㄟˋㄧㄚˋ
N031561 Haiyan 海盐 O 海鹽 hai3-yan2 ㄏㄞˇㄧㄢˊ
N036150 Hanyang 汉阳 O 漢陽 han4-yang2 ㄏㄢˋㄧㄤˊ
N032756 Heshan 鹤山 O 鶴山 he4-shan1 ㄏㄜˋㄕㄢ
N032307 Huailai 怀来 O 懷來 huai2-lai2 ㄏㄨㄞˊㄌㄞˊ
N000617 Ireland 爱尔兰 O 愛爾蘭 ai4-er3-lan2 ㄞˋㄦˇㄌㄢˊ
N052916 Jiangning County 江宁县 O 江寧縣 jiang1-ning2-xian4 ㄐㄧㄤㄋㄧㄥˊㄒㄧㄢˋ
N125824 Longjing 龙井 O 龍井 long2-jing3 ㄌㄨㄥˊㄐㄧㄥˇ
N068134 New Zealand 新西兰 L 紐西蘭 xin1-xi1-lan2 ㄋㄧㄡˇㄒㄧㄌㄢˊ
N057306 Sanya City 三亚市 O 三亞市 san1-ya4-shi4 ㄙㄢㄧㄚˋㄕˋ
N36301 Seoul 首尔 O 首爾 shou3-er3 ㄕㄡˇㄦˇ
N054474 Seoul 汉城 O 漢城 han4-cheng2 ㄏㄢˋㄔㄥˊ
N077920 Taierzhuang District 台儿庄区 O 台兒莊區 tai2-er2-zhuang1-qu1 ㄊㄞˊㄦˊㄓㄨㄤㄑㄩ
N062125 Tel Aviv 特拉维夫 O 特拉維夫 te4-la1-wei2-fu1 ㄊㄜˋㄌㄚㄨㄟˊㄈㄨ
N061921 Xiaoshan City 萧山市 O 蕭山市 xiao1-shan1-shi4 ㄒㄧㄠㄕㄢㄕˋ
N004005 Yemen 也门 L 葉門 ye3-men2 ㄧㄝˋㄇㄣˊ
N042437 Yibin 宜宾 O 宜賓 yi2-bin1 ㄧˊㄅㄧㄣ
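Records in the format shown above lend themselves to mechanical checks. The sketch below is a hypothetical validator (the PlaceName class and problems function are illustrative names), and its rules are assumptions inferred from the sample rather than documented constraints: IDs are "N" plus six digits, L/O holds "L" or "O", each pinyin syllable ends in a tone digit, the SC name has one character per pinyin syllable, and orthographic conversion preserves the name's length.

```python
import re
from dataclasses import dataclass

# Assumed rules, inferred from the sample rows only.
ID_RE = re.compile(r"N\d{6}")               # "N" followed by six digits
SYLLABLE_RE = re.compile(r"[a-zü]+[1-5]")   # lowercase syllable plus tone digit


@dataclass
class PlaceName:
    entry_id: str
    english: str
    sc: str
    lo: str
    tc: str
    sc_pin: str
    zhuyin: str


def problems(rec: PlaceName) -> list[str]:
    """Return a list of rule violations for one record (empty if none)."""
    issues = []
    if not ID_RE.fullmatch(rec.entry_id):
        issues.append(f"malformed ID {rec.entry_id!r}")
    if rec.lo not in ("L", "O"):
        issues.append(f"L/O must be 'L' or 'O', got {rec.lo!r}")
    syllables = rec.sc_pin.split("-")
    if not all(SYLLABLE_RE.fullmatch(s) for s in syllables):
        issues.append(f"bad pinyin syllable in {rec.sc_pin!r}")
    if len(syllables) != len(rec.sc):
        issues.append(f"{len(syllables)} syllables for {len(rec.sc)} characters")
    # Orthographic conversion is assumed to map characters one to one.
    if rec.lo == "O" and len(rec.tc) != len(rec.sc):
        issues.append("orthographic conversion changed the name length")
    return issues
```

Run over the sample, the ID rule flags N36301 (Seoul, 首尔), which has five digits where the other IDs have six; whether that reflects a data error or a valid ID would have to be confirmed against the full database.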