Syntactic Annotation of Hebrew CHILDES Corpora

Avishay Gretz, M.Sc. Thesis Seminar
Wednesday, 5.12.2012, 12:30
Taub 601
Prof. A. Itai and Prof. S. Wintner

The CHILDES database is a large collection of child-adult spoken interactions in over 25 languages. Automatic annotation of these data facilitates research on child language development and acquisition by providing researchers with a large amount of accurate data. Recently, the English section of the CHILDES database was automatically annotated with labeled dependency relations using a state-of-the-art approach. We describe a similar endeavor, focusing on the Hebrew section of CHILDES. The process is as follows. First, we design a novel annotation scheme of dependency relations that reflects the constructions of child and child-directed utterances, as well as the special phenomena of the Hebrew language. We then annotate a corpus with these dependency relations and use the manually annotated data to train a parser with which the rest of the corpora can be annotated. Finally, we evaluate the parsing accuracy. We show the adaptability of our annotation scheme to the CHILDES corpora in numerous evaluation scenarios. We also examine different approaches to annotating linguistic issues that are relevant to several languages or unique to Hebrew, as well as the contribution of morphological features to the accuracy of dependency parsing of the Hebrew section of CHILDES. This is the first syntactic parser of spoken Hebrew.
