SM-VN-2014-02

Project Description

This Seed Funding (SF) request aims to establish a working partnership between the Distributed Information Systems Laboratory (LSIR) at EPFL and the Vietnam Institute of Lexicography and Encyclopedia (VIOLE: http://bachkhoatoanthu.vass.gov.vn/Pages/trangchu.aspx) in Hanoi, and to develop a joint project proposal for creating advanced lexical systems and materials for Vietnamese Human Language Technology (HLT). The SF project will involve:

1) A planning meeting of the collaborators, held in Hanoi, at which the lexicographical data model for Vietnamese will be codified, site features localized, the data elicitation process elaborated, and training videos produced.

2) A pilot phase for community building and data collection, intended to produce a trustworthy human- and machine-usable lexical data source of 10,000 terms and their definitions in Vietnamese.

3) A finalization meeting in Hanoi to review the pilot data, write an academic paper discussing the results, and produce a fundable proposal for a full Vietnamese component within Kamusi.

The project will integrate the Kamusi Project's work at LSIR on a broad lexicographical data system with the Vietnamese partners' focus on Vietnamese data and language technology.

Dr. Martin Benjamin
Dr. Hien Pam