The CJK Dictionary Institute
CJKI


 

Principal Arabic Lexical Resources

كُلُّ عَام وَأَنْتُم بخَيْر (May you be well every year)


Assalamu `Alaykum -- welcome to our page for Arabic lexical resources.

Arabic, one of the six official languages of the United Nations, has 246 million speakers worldwide -- not only in North Africa and the Middle East, but also in many other countries, since it is the language of the Koran.

Though Arabic is playing an increasingly important role in the world today, few lexical and linguistic resources are available for it. The CJK Dictionary Institute has been engaged in the development of comprehensive Arabic lexical databases, with a special focus on proper nouns. These resources, described below, are designed for machine translation (MT) and various natural language processing (NLP) applications, such as named entity recognition (NER) and anti-money laundering (AML) programs.

  1. NEWS FLASH: Database of Arab Names (DAN). A comprehensive database covering approximately one and a half million Arab name variants, including OFAC names. Of great interest to security agencies, since it enables the identification of names that are spelled differently (some names can be spelled in thousands of ways). See the full announcement for details.


  2. NEWS FLASH: Announcing the ARAN and NANA systems for automatically transcribing CJK and Latin names to/from Arabic. Also see The Challenges and Pitfalls of Arabic Romanization and Arabization (.pdf file, 293K), an academic paper describing these systems.


  3. Dictionary of Arabic Place Name Variants (DAP). A database of Arabic-English place names, including systematic coverage of orthographic variants and common orthographic errors.


  4. Dictionary of Arabic Proper Nouns. A database of Arabic-English proper nouns covering surnames, given names, and place names in both vocalized and unvocalized Arabic with romanized transcriptions.


  5. The Typology of Arabic Proper Nouns. A 50-page report, with several appendixes, giving an in-depth analysis of the etymology, structure, and typology of Arabic proper nouns. Highly useful for Arabic information processing and name recognition.


  6. Arabic Broken Plurals. A comprehensive database of Arabic broken plurals (plurals whose forms are unpredictable), given in three versions -- voweled, unvoweled, and transcription -- with cross-references from plural to singular. Essential for morphological analysis and NLP applications.


  7. Arabic Transcription and Transliteration. An overview of some linguistic issues related to transliteration and transcription, with special focus on our Arabic transcription technology.


  8. Arabic Lexical Database. We are now developing a comprehensive Arabic monolingual lexical database containing detailed grammatical and phonological attributes, such as part-of-speech (POS) codes, conjugation patterns, and verb transitivity. It is suitable for applications such as NLP, MT systems, and morphological analysis.

  9. Arabic Companies and Organizations. A database of Arabic company and organization names is now under development.
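
As a rough illustration of how the voweled and unvoweled versions and the plural-to-singular cross-references described in items 4 and 6 might be used together, here is a minimal Python sketch. It is not CJKI's implementation or data format: the function names, the table layout, and the two sample entries are assumptions made purely for illustration. It relies only on the fact that the Arabic vowel marks are Unicode combining marks (general category Mn).

```python
import unicodedata
from typing import Optional


def strip_vowel_marks(text: str) -> str:
    """Return an unvoweled form by dropping combining marks (category Mn).

    This removes the Arabic short-vowel marks, sukun, and shadda while
    leaving the base letters untouched.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


# Hypothetical sample table: voweled broken plural -> voweled singular.
# These two entries are illustrative and are not taken from the CJKI database.
BROKEN_PLURALS = {
    "كُتُب": "كِتَاب",  # kutub 'books' -> kitab 'book'
    "رِجَال": "رَجُل",  # rijal 'men'  -> rajul 'man'
}

# Index the same entries by their unvoweled plural, since running Arabic
# text is normally written without vowel marks.
UNVOWELED_INDEX = {
    strip_vowel_marks(plural): singular
    for plural, singular in BROKEN_PLURALS.items()
}


def singular_of(plural: str) -> Optional[str]:
    """Look up the singular for a plural given in voweled or unvoweled form."""
    return UNVOWELED_INDEX.get(strip_vowel_marks(plural))
```

Note that an unvoweled form can be ambiguous: the same three letters that spell kutub 'books' also spell the verb kataba 'he wrote'. A lookup of this kind therefore only proposes a candidate singular, and a real morphological analyzer would need context to confirm it.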


 

 
