Jack Halpern



About CJKI


The CJK Dictionary Institute, Inc. (CJKI) specializes in CJK lexicography. CJKI is headed by Jack Halpern, editor-in-chief of the New Japanese-English Character Dictionary, which has become a standard reference work in Japanese language learning, and of various other CJK dictionaries.

CJKI has become one of the world's prime sources of CJK lexical resources, and contributes to CJK information processing technology by providing high-quality lexical data and consulting services to some of the world's leading software developers and IT companies.


The principal activity of CJKI is the development and continuous expansion of lexical databases covering general vocabulary, proper nouns and technical terms for CJK languages (Chinese, Japanese, Korean), including Chinese dialects such as Cantonese and Hakka. These databases contain millions of entries. We have also developed databases and romanization systems of Arabic proper nouns, a comprehensive Spanish-English dictionary, a Chinese-Vietnamese names dictionary, and various others. In addition, we offer a full range of professional consulting services on CJK linguistics and lexicography.


Advanced computational lexicography methods are used to compile and maintain databases that serve as a source of data for:

  • Natural language processing (NLP) applications such as information retrieval (IR) tools, search engine technology and morphological analyzers.

  • CJK input method editors (IME) and front-end processors (FEP).

  • Machine translation (MT) and online translation tools such as Babylon.

  • Speech technology data, including IPA, tones, Japanese accents and minute allophonic changes.

  • The world's largest databases for Simplified to/from Traditional Chinese conversion.

  • Lexicographic works and dictionaries, including electronic dictionaries.

  • Pedagogical, linguistic and computational lexicography research.


Our comprehensive CJK lexical databases currently contain about 24 million entries, including detailed grammatical, phonological and semantic attributes for general vocabulary, proper nouns, and technical terms. Our database of proper nouns, which has about 1.4 million Japanese and some three million Simplified and Traditional Chinese entries, is without peer both in terms of quantity and quality. Our single-character database covers every aspect of CJK characters, including frequency, phonology, radicals, character codes, and much more, and is ideally suited for mobile platform IMEs.