Jack Halpern



Linguistic and Technical Documents

This page brings together linguistic and technical documents by Jack Halpern that introduce the CJK (Chinese, Japanese, and Korean) languages and Arabic, with emphasis on the linguistic issues that must be addressed when developing linguistic tools for these languages.

Japanese Information Processing

The Japanese Language

Chinese Information Processing

Korean Information Processing

Arabic Information Processing

Other Languages

  • Is English Segmentation Trivial?
    Describes the principal word-formation processes in English and demonstrates that, contrary to popular belief, word segmentation in English is far from trivial.

  • Criteria for Inclusion of Multiword Lexical Units in Dictionaries
    Coming Soon.

  • European and Semitic Languages
    Coming Soon. A series of reports on the major European and Semitic languages, focusing on orthographic variation and on the linguistic issues to be addressed in developing linguistic tools.
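As a rough illustration of the claim in "Is English Segmentation Trivial?" (this sketch is not taken from that document), the Python snippet below uses a deliberately naive segmenter, the hypothetical function `naive_tokens`, which splits on whitespace and strips surrounding punctuation. Three orthographic variants of the same compound yield three different token sequences, and a multiword unit such as "New York" is broken into two tokens.

```python
# Punctuation stripped from token edges by the naive segmenter (an assumption for illustration).
EDGE_PUNCT = ".,;:!?\"'()"


def naive_tokens(text: str) -> list[str]:
    """Hypothetical naive segmenter: split on whitespace, strip edge punctuation."""
    tokens = []
    for raw in text.split():
        token = raw.strip(EDGE_PUNCT)
        if token:  # drop tokens that were pure punctuation
            tokens.append(token)
    return tokens


# Three orthographic variants of one compound word.
for variant in ["database", "data base", "data-base"]:
    print(f"{variant!r} -> {naive_tokens(variant)}")

# A multiword lexical unit is split into separate tokens.
print(naive_tokens("New York."))
```

A tool that needs to treat all three variants as one lexical item, or "New York" as a single unit, needs more than whitespace splitting, for example a lexicon of compounds, multiword units, and their orthographic variants.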