Jack Halpern



Principal Arabic Lexical Resources

كُلُّ عَام وَأَنْتُم بخَيْر ("May you be well every year")

Arabic, one of the six official languages of the United Nations, is spoken by 246 million people worldwide -- not only in North Africa and the Middle East but also, since it is the language of the Koran, in many other countries.

Though Arabic is playing an increasingly important role in the world today, few lexical and linguistic resources are available for it. The CJK Dictionary Institute has been engaged in the development of comprehensive Arabic lexical databases, with a special focus on proper nouns. These resources, described below, are designed for machine translation and various natural language processing applications such as named entity recognition and anti-money laundering programs.


  • Database of Arab Names (DAN).
    A comprehensive database covering over 6.5 million Arab names and variants, based on authoritative resources and extensively proofread by a team of native Arabic-speaking editors.

  • Arab Name Transcription Engine Demo (ANTE).
    Powered by our DAN and DANA products, this demonstration site showcases the coverage and breadth these two databases offer.

  • The CJKI Arabic Verb Conjugator (CAVE).
    An interactive Arabic-English verb conjugation application for iOS devices that provides conjugation paradigms for over 1,600 Arabic verbs.

  • The CJKI Arabic Learner’s Dictionary (CALD) (.pdf).
    A new concept dictionary that enables learners to gain a full understanding of the core vocabulary of Modern Standard Arabic (MSA). An Arabic summary is available at القاموس العربي الإنجليزي للمتعلمين (.pdf).

  • Database of Arab Names in Arabic (DANA).
    A one-of-a-kind resource of Arab personal names and variants, in the original Arabic script. This database covers several hundred thousand Arabic script variants, along with common spelling mistakes.

  • Comprehensive Word Lists for Arabic (CJKAWORD).
    Comprehensive monolingual word lists for Arabic covering general vocabulary, proper nouns and technical terms. Includes both a lexical database of canonical forms and a full-form lexicon.

  • Database of Arabic Business Names (DABNA).
    A database of Arabic company and organization names, now under development.

  • Expanded OFAC (XOFAC).
    To address the shortcomings of OFAC's SDN List, CJKI has developed a comprehensive "Expanded OFAC" database of full name variants for OFAC entries, the vast majority of which do not appear in the OFAC list itself.

  • Database of Foreign Names in Arabic (DAFNA).
    A database of non-Arab names transcribed to Arabic, including Arabic orthographic variants and common orthographic errors.

  • Dictionary of Arabic Place Names (DAPNA).
    A database of Arabic-English place names, including systematic coverage of orthographic variants and common orthographic errors.

  • ARAN and NANA.
    Systems that automatically transcribe CJK and Latin names to and from Arabic.

  • Arabic Broken Plurals.
    A comprehensive database of Arabic broken plurals (plurals whose forms are unpredictable), given in three versions -- voweled, unvoweled, and transcribed.

  • Arabic Transcription and Transliteration.
    An overview of some linguistic issues related to transliteration and transcription, with special focus on our Arabic transcription technology.

  • Arabic Lexical Database (ALD).
    We are now developing a comprehensive Arabic monolingual lexical database, which contains detailed grammatical and phonological attributes such as POS codes, conjugation patterns and verb transitivity, suitable for such applications as NLP, MT systems and morphological analysis.