The CJK Dictionary Institute

Principal Arabic Lexical Resources

كُلُّ عَام وَأَنْتُم بخَيْر (May you be well every year)

Assalaamu `Alaykum -- welcome to our page for Arabic lexical resources.

Arabic, one of the six official languages of the United Nations, is spoken by 246 million people worldwide -- not only in North Africa and the Middle East, but also in many other countries, since it is the language of the Koran.

Though Arabic is playing an increasingly important role in the world today, few lexical and linguistic resources are available for it. The CJK Dictionary Institute has been engaged in the development of comprehensive Arabic lexical databases, with a special focus on proper nouns. These resources, described below, are designed for machine translation (MT) and various natural language processing (NLP) applications such as named entity recognition (NER) and anti-money laundering (AML) programs.

  1. Database of Arab Names (DAN). A comprehensive database covering approximately 2.4 million Arab names and variants, including names listed by OFAC (the U.S. Office of Foreign Assets Control), based on authoritative resources and extensively proofread by a team of native Arabic-speaking editors.

  2. NEW The CJKI Arabic Learner’s Dictionary (CALD). A new-concept dictionary that enables learners to gain a full understanding of the core vocabulary of Modern Standard Arabic (MSA). An Arabic summary is available at القاموس العربي الإنجليزي للمتعلمين .

  3. UPDATED Database of Arab Names in Arabic (DANA). A one-of-a-kind resource of Arab personal names and variants, in the original Arabic script. This database covers several hundred thousand Arabic script variants, along with common spelling mistakes.

  4. NEW Database of Arabic Business Names (DABNA). A database of Arabic company and organization names, now under development.

  5. EXPANDED Expanded OFAC (XOFAC). To address the shortcomings of OFAC's SDN (Specially Designated Nationals) List, CJKI has developed a comprehensive "Expanded OFAC" database of full-name variants of OFAC-listed names, the vast majority of which do not appear in the SDN List itself.

  6. NEW Database of Foreign Names in Arabic (DAFNA). A database of non-Arab names transcribed to Arabic, including Arabic orthographic variants and common orthographic errors.

  7. Dictionary of Arabic Place Name Variants (DAPNA). A database of Arabic-English place names, with systematic coverage of orthographic variants and common orthographic errors.

  8. The ARAN and NANA systems automatically transcribe CJK and Latin names to/from Arabic.

  9. Dictionary of Arabic Proper Nouns. A database of Arabic-English proper nouns covering surnames, given names, and place names in both vocalized and unvocalized Arabic with romanized transcriptions.

  10. Arabic Broken Plurals (.doc file, 95K). A comprehensive database of Arabic broken plurals (plurals whose form cannot be predicted from the singular), given in three versions -- voweled, unvoweled, and transcription -- with cross-references from plural to singular. Essential for morphological analysis and NLP applications.

  11. Arabic Transcription and Transliteration. An overview of some linguistic issues related to transliteration and transcription, with special focus on our Arabic transcription technology.

  12. Arabic Lexical Database (ALD). We are now developing a comprehensive Arabic monolingual lexical database containing detailed grammatical and phonological attributes, such as POS codes, conjugation patterns, and verb transitivity, suitable for applications such as NLP, MT systems, and morphological analysis.
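Several of the resources above pair voweled (vocalized) forms with unvoweled ones and cross-reference plurals to singulars. The following is only a minimal sketch of how such a lookup might work, not a description of CJKI's actual data format or software: the names `strip_vowel_marks`, `SAMPLE_ENTRIES`, and `singular_of` are hypothetical, and the single sample entry (kutub "books", plural of kitab "book") is supplied here purely for illustration.

```python
# Arabic vowel marks (harakat): fathatan (U+064B) through sukun (U+0652).
HARAKAT = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_vowel_marks(text: str) -> str:
    """Return the unvoweled form of a voweled Arabic string."""
    return "".join(ch for ch in text if ch not in HARAKAT)


# Hypothetical sample data: (voweled plural, voweled singular, transcription).
# This is an illustrative entry, not taken from the CJKI database.
SAMPLE_ENTRIES = [
    (
        "\u0643\u064F\u062A\u064F\u0628",        # kutub (voweled plural)
        "\u0643\u0650\u062A\u064E\u0627\u0628",  # kitab (voweled singular)
        "kutub",
    ),
]

# Index plurals by their unvoweled form, since running text is usually unvoweled.
PLURAL_TO_SINGULAR = {
    strip_vowel_marks(plural): singular
    for plural, singular, _transcription in SAMPLE_ENTRIES
}


def singular_of(plural: str):
    """Look up the singular for a plural given in voweled or unvoweled form."""
    return PLURAL_TO_SINGULAR.get(strip_vowel_marks(plural))


if __name__ == "__main__":
    voweled = "\u0643\u064F\u062A\u064F\u0628"
    print(strip_vowel_marks(voweled))  # the unvoweled spelling
    print(singular_of(voweled))        # the voweled singular
```

Keying the index on the unvoweled form lets both voweled and unvoweled input resolve to the same entry. Unvoweled spellings can be ambiguous, however (the same three letters can also be read as the verb kataba, "he wrote"), which is one reason a resource would list voweled forms and transcriptions alongside them.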
