Related Databases
Though Arabic has become a world language of critical importance, lexical resources, especially for proper nouns, are either scarce or exist only on a small scale. Because of the important role personal names play in such natural language applications as named entity extraction and machine translation, we are continuously expanding and revising our proper noun resources, which provide systematic coverage of Arabic orthographic variants and common orthographic errors.
The CJKI Dictionary Institute, in an international collaborative effort that includes Arabic name specialists, has developed new techniques for the collection, validation and attestation of non-Arab names written in Arabic, and is now building a comprehensive Database of Foreign Names in Arabic, referred to as DAFNA.
The sample below shows orthographic variants and spelling errors of a common American given name (John) and a common American surname (Davis). The original American name data was obtained from the U.S. Census Bureau. A larger sample is also available.
| ENGLISH | ARABIC | WEB FREQ (English+Arabic) | WEB FREQ (Arabic only) |
|---|---|---|---|
| John | جوون | 36500 | 44500 |
| John | جون | 32700 | 947000 |
| John | جان | 31300 | 2160000 |
| John | جوهان | 224 | 7090 |
| John | جوهن | 173 | 1180 |
| John | دجون | 29 | 1680 |
| John | جهون | 9 | 328 |
| Davis | ديفيس | 613 | 12300 |
| Davis | دافيس | 249 | 1680 |
| Davis | ديفز | 228 | 2300 |
| Davis | ديفس | 157 | 2020 |
| Davis | دايفس | 40 | 652 |
| Davis | دفيس | 34 | 490 |
| Davis | دفيز | 5 | 98 |
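
To illustrate how variant data of this kind might be used in named entity extraction or machine translation, the following minimal Python sketch builds two lookup tables from a subset of the sample rows: one that ranks the Arabic variants of an English name by frequency, and one that maps an Arabic spelling back to its English name. The names `SAMPLE_ROWS` and `build_indexes` are hypothetical and chosen only for this illustration; they do not describe the actual structure or interface of DAFNA. The frequencies are taken from the Arabic-only column.

```python
from collections import defaultdict

# Subset of the sample table: (English name, Arabic variant, Arabic-only web frequency).
SAMPLE_ROWS = [
    ("John", "جون", 947000),
    ("John", "جان", 2160000),
    ("John", "جوهان", 7090),
    ("Davis", "ديفيس", 12300),
    ("Davis", "دافيس", 1680),
    ("Davis", "ديفز", 2300),
]


def build_indexes(rows):
    """Build two lookups from (english, arabic, frequency) rows.

    Returns a pair of dicts:
      - English name -> list of (Arabic variant, frequency), most frequent first
      - Arabic variant -> set of English names it may represent
    """
    by_english = defaultdict(list)
    by_arabic = defaultdict(set)
    for english, arabic, freq in rows:
        by_english[english].append((arabic, freq))
        by_arabic[arabic].add(english)
    # Rank each name's variants by descending frequency.
    for variants in by_english.values():
        variants.sort(key=lambda pair: pair[1], reverse=True)
    return dict(by_english), dict(by_arabic)


by_english, by_arabic = build_indexes(SAMPLE_ROWS)

# Most frequent Arabic spelling of "John" within this subset.
most_common_john = by_english["John"][0][0]

# English name(s) behind a given Arabic spelling.
davis_candidates = by_arabic["ديفز"]
```

The reverse lookup stores a set rather than a single name because one Arabic spelling can, in principle, correspond to more than one foreign name; a real application would need to disambiguate such cases with context.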