Introduction

A Language Documenter's Guide to Annotating Text

Shobhana Chelliah and Samson Lotven

The following text and related links are designed to help students and practitioners of language documentation and description collect audio and video of naturally produced interactional speech and create intellectual access to those recordings. The text is written so it can be used as a companion to demonstration workshops or as part of a technology-focused class on language documentation. It focuses mainly on annotating connected text and on archiving source files and annotation files. We use the Computational Resource for South Asian Languages (CoRSAL) metadata schema to teach about metadata and file naming. We use the Summer Institute of Linguistics FieldWorks Language Explorer (FLEx) program to teach about annotation.

We recognize the need for training in audio and video recording, scanning, and digitizing. Because the technology for these activities advances so rapidly, we provide only general guidelines, chosen for their relevance to the populations most likely to use this text: students and practitioners working in the Tibeto-Burman context. We also direct students to relevant, freely available materials that are frequently updated.

The text can be used for 8 or 16 weeks of instruction. If accelerated to day-long sessions, it could be used for a 4-day or 8-day workshop. Students should have materials of their own to process during the course of instruction.

Week 1: Collecting and Adding Words: Install FLEx and review basic features. Practice adding words through “collect words”. Discuss semantic domains and rapid word collection. Add words through “lexicon edit”. Discuss fields and dictionary formats. Illustrate bulk add of words from existing databases into FLEx. Illustrate how to add video, audio, and images. Discuss file naming and data management for these materials.

Week 2: Transcribing Connected Text: Discuss the uses of connected text for language revitalization, pedagogy, and description. Demonstrate installation, setup, and transcription using SayMore. Discuss IPA versus practical orthographies. Install and practice Keyman for font and character entry.

Week 3: Annotating Connected Text: Insert a new text into the FLEx baseline. Discuss what constitutes a clause and what determines a line. Discuss punctuation versus diacritics. Discuss standardization of orthography for analysis, specifically word breaks and representing allomorphy on the baseline.

Week 4: Prepping for Annotation: Identify possible categories and morphology as found in related languages. Review commonly used labels and abbreviations for related languages.

Week 5: Creating First-pass Glossing: Enter lexeme glosses and free translations using the FLEx Gloss tab. Discuss translation quagmires.

Week 6: Discussing Hierarchical Glossing: Enter morpheme glosses using the FLEx Analyze tab. Gloss what you know using two-part glossing [functional/semantic label: specific instance]. Discuss what is ready for archiving, metadata needed, and file naming. Discuss iterative improvements to glossing.

Week 7: Writing a Guide to Your IGT: Apply principles from the Leipzig Glossing Rules to your interlinear glossed text (IGT).

Week 8: Using Your Corpus: Discuss uses of the corpus for dictionary creation and grammatical description. Practice using the concordance feature in FLEx. Illustrate use of the corpus for grammatical discovery. Practice moving examples to a text document. Discuss how to cite a corpus or examples from the corpus.

We acknowledge funding from the National Science Foundation Conference on Standards for Interlinear Glossed Texts in Related Languages (2020-2022 #2015980), which supported work on this project, including discussion among linguists on annotation standards and workshops where we taught the annotation strategies covered here.

License


Introduction by Shobhana Chelliah and Samson Lotven is licensed under a Creative Commons Attribution 4.0 International License, except where otherwise noted.
