Principal Arabic Lexical Resources
كُلُّ عَام وَأَنْتُم بخَيْر
Assalamu `Alaykum -- welcome to our page for Arabic lexical resources.
Arabic, one of the six official languages of the United Nations, is spoken by 246 million people worldwide -- not only in North Africa and the Middle East, but also in many other countries, since it is the language of the Koran.
Though Arabic is playing an increasingly important role in the world today, few lexical and linguistic resources are available for it. The CJK Dictionary Institute has been engaged in the development of comprehensive Arabic lexical databases, with a special focus on proper nouns. These resources, described below, are designed for machine translation (MT) and various natural language processing (NLP) applications such as named entity recognition (NER) and anti-money laundering (AML) programs.
- NEWS FLASH: Database of Arab Names (DAN). A comprehensive database covering approximately one and a half million Arab name variants, including OFAC names. Of great interest to security agencies, it enables identification of names that are spelled differently (some names can be spelled in thousands of ways). See the full announcement for details.
- NEWS FLASH: Announcing the ARAN and NANA systems for automatically transcribing CJK and Latin names to/from Arabic. Also see The Challenges and Pitfalls of Arabic Romanization and Arabization (.pdf file, 293K), an academic paper describing these systems.
- Dictionary of Arabic Place Name Variants (DAP). A database of Arabic-English place names, including systematic coverage of orthographic variants and common orthographic errors.
- Dictionary of Arabic Proper Nouns. A database of Arabic-English proper nouns covering surnames, given names, and place names in both vocalized and unvocalized Arabic, with romanized transcriptions.
- The Typology of Arabic Proper Nouns. A 50-page report, with several appendixes, giving an in-depth analysis of the etymology, structure, and typology of Arabic proper nouns. Highly useful for Arabic information processing and name recognition.
- Arabic Broken Plurals. A comprehensive database of Arabic broken plurals, which are unpredictable in form, given in three versions -- voweled, unvoweled, and transcription -- with cross-references from plural to singular. Essential for morphological analysis and NLP applications.
- Arabic Transcription and Transliteration. An overview of some linguistic issues related to transliteration and transcription, with special focus on our Arabic transcription technology.
- Arabic Lexical Database. We are now developing a comprehensive Arabic monolingual lexical database containing detailed grammatical and phonological attributes such as part-of-speech (POS) codes, conjugation patterns, and verb transitivity, suitable for applications such as NLP, MT systems, and morphological analysis.
- Arabic Companies and Organizations. A database of Arabic company and organization names is now under development.
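
To make the structure described under Arabic Broken Plurals more concrete, here is a minimal Python sketch of a plural-to-singular cross-reference in which each form is stored in three versions (voweled, unvoweled, transcription). It is an illustration only and does not reflect the actual database format: the `Form` record, the `ENTRIES` sample, and the helper names are all hypothetical, and the unvoweled form is derived here simply by dropping Unicode nonspacing marks.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Form:
    """One word form in three versions (hypothetical record layout)."""
    voweled: str
    unvoweled: str
    transcription: str


def strip_vowels(text: str) -> str:
    """Drop nonspacing marks (fatha, damma, kasra, sukun, shadda, ...)."""
    # No Unicode decomposition is applied, so letters such as alif with
    # hamza stay intact; only standalone combining marks are removed.
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def make_form(voweled: str, transcription: str) -> Form:
    return Form(voweled, strip_vowels(voweled), transcription)


# Two illustrative (plural, singular) pairs; not taken from the database.
ENTRIES = [
    (make_form("كُتُب", "kutub"), make_form("كِتَاب", "kitāb")),    # books / book
    (make_form("أَوْلَاد", "awlād"), make_form("وَلَد", "walad")),  # boys / boy
]


def build_index(entries):
    """Map every version of each plural to its singular form(s)."""
    index = {}
    for plural, singular in entries:
        for key in (plural.voweled, plural.unvoweled, plural.transcription):
            index.setdefault(key, []).append(singular)
    return index


PLURAL_TO_SINGULAR = build_index(ENTRIES)


def singular_of(plural: str):
    """Return candidate singulars for a plural given in any version."""
    hits = PLURAL_TO_SINGULAR.get(plural)
    if hits is None:
        # Fall back to the unvoweled spelling, e.g. for partly voweled input.
        hits = PLURAL_TO_SINGULAR.get(strip_vowels(plural), [])
    return list(hits)


if __name__ == "__main__":
    for form in singular_of(ENTRIES[0][0].unvoweled):
        print(form.voweled, form.unvoweled, form.transcription)
```

The lookup returns a list rather than a single form because an unvoweled spelling can be ambiguous, which is one reason a resource of this kind keeps the voweled and transcribed versions alongside it.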