1 Creating a Language Collection

Learning Objectives

After successful completion of this module, you will have reviewed:

  • Motivations for creating a collection
  • What to collect and why
  • How to locate resources for creating source material

1.1 Introduction

Let us first think about your source data: the video, audio, or text material you have collected or are setting out to collect. We will begin with an introspective activity. How would you identify yourself? For example, are you a community language documenter, working independently or with a community to revitalize, preserve, teach, or share the language and culture of the community? Are you a student or teacher of linguistics working in a classroom setting, learning about documentation and description and discovering the structure of a language through interviews with speakers? Are you an anthropologist, ethnographer, or folklorist? Next, consider your short-term and long-term goals in creating a collection. A short-term goal may be to highlight the work done by community members to share traditional stories with a larger audience. A long-term goal could be to use those stories in pedagogical materials.

 

Discussion: Where do short- and long-term goals intersect?

Based on your introspection and the class discussion, discuss in small groups how similar or different your backgrounds, motivations, and goals are. Are there ways that the groups can support one another’s goals?

 

To help with this activity, you might refer to the articles and chapters listed at the end of this section. In Why Language Documentation Matters, I (Chelliah) provide case studies of language documentation conducted by academic and non-academic speakers, archivists, and documentary linguists. A simple class activity is to split into groups, with each group reading further about and discussing one case study to discover the motivations behind these inspirational documentation and archiving examples. The three other readings provide standard definitions of language documentation (Woodbury 2011), review the planning and implementation of documentation projects (Chelliah 2018), and describe the uses and formats of materials produced by documentation projects (Good 2011).

Further Reading

Chelliah, S. (2021). Why Language Documentation Matters. Springer Briefs in Linguistics. Dordrecht: Springer Academic Press.

Chelliah, S. (2018). The design and implementation of documentation projects for spoken languages. In K. Rehg and L. Campbell (Eds.), The Oxford Handbook of Endangered Languages (pp. 147-167). Oxford, England: Oxford University Press. DOI: 10.1093/oxfordhb/9780190610029.013.9

Good, J. (2011). Data and language documentation. In P. Austin and J. Sallabank (Eds.), The Cambridge Handbook of Endangered Languages (pp. 212-234). Cambridge, England: Cambridge University Press. Available from https://www.researchgate.net/publication/282733657_Data_and_language_documentation

Woodbury, A. (2011). Language documentation. In P. Austin and J. Sallabank (Eds.), The Cambridge Handbook of Endangered Languages (pp. 159-186). Cambridge, England: Cambridge University Press.

_________________________________________________

At this point, we suggest that you download and install the following free software. We will return to this software in the next chapters. Please contact the instructor or teaching assistants if you have any issues with the download and installation. If you are using a Mac, you will need software such as Parallels Desktop to run these Windows applications. Additional software tailored for use in this course will be available on the CoRSAL website through our expanding CoRSAL software suite. Please check the CoRSAL website for those links.

_____________________________________________________

1.2 What to Collect?

What you decide to collect, record, and curate for your collection depends on your ultimate goals. For example, if your goals are language description, then you may aim to collect language used in many contexts by many speakers. If you are interested in language revitalization, you may focus on traditional food items, plants, birds, religious events, or other culture-specific domains, the knowledge of which is being lost.

Who is the intended audience for this collection? Your intended audience may be an audience of one; that is, you may create a collection to support your own descriptive goals. Or, if your goals are revitalization, your intended audience might be members of your speech community.

As illustrated through case studies in Chelliah (2021), when members of a speaker community play a core role in deciding what needs to be included in a documentary corpus, we have a much better chance of creating a lasting, comprehensive record of the language. Combined or coordinated efforts between collectors lead to rich, diversified language documentation with multiple uses.

Recordings of spoken, naturalistic interactions are a major part of what language documenters collect. Let us discuss as a class the kinds of naturalistic interactions you might want to document. Here are some typical items that documenters collect: photographs, rare text documents about or in the language being documented, songs (often with dance or gestures), traditional stories and personal anecdotes, conversations, public speeches, sermons, political speeches, responses to non-linguistic interview questions, responses to prompts related to language inquiry, and wordlists.

Given your personal access to the types of interactions you want to collect, what challenges and opportunities do you expect to face in achieving your goals?

 

Discussion: Can language documentation be ethically implemented?

Do language documentation projects implemented by linguists employ methods that are extractive (taking data from the speech community) or exploitative (using the data for personal academic gain)? Discuss your answers to this question and propose solutions. This article by Adrienne Tsikewa can help with your discussion: https://muse.jhu.edu/article/840964

Your documentation project will include different types of speech samples. Some of them will be speech used in common interactions (e.g., buying something at the market) or in special linguistic practices (e.g., a blessing ritual). Documentation projects also include tools for understanding the interactions you have collected; for example, wordlists to aid with glossing, word analyses to help with writing a grammar of the language, and sound and word analyses to help with orthography development. In other words, your documentation project will have audio, video, and textual records of the language interactions you want to document, but it will also likely include the results of analyses of that source material.

Here we provide some pointers on where to begin with your language documentation. This is to answer the question we have often heard from groups in India: “Where do we start?” Answer: Let’s start with words!

 

Wordlists

Wordlist data is a good place to start when learning how to elicit, record, and archive speech samples. Wordlists are also valuable in that they can be the start of a lexicon or dictionary that can be used in translating longer speech samples like clauses in traditional narratives.

Activity: Find existing materials on your language

Create an annotated bibliography of the existing materials on your language. Is there a dictionary or grammar? Are there wordlists or grammatical sketches in edited volumes or surveys? Have community members created wordlists that are circulating on social media? What types of audio and video have already been created to support understanding of the collected words? Where are these audio and video files stored and shared? Knowing what is already there will help you decide what to add or build on.

In preparing for wordlist elicitation, choose lists that meet your goals. For example, if you want to do typological research comparing many unrelated languages, you will want to include words commonly found in the world’s languages, such as those on the Swadesh List or the Leipzig-Jakarta List given below. If your goals include historical work comparing related languages, a list tailored to the specific language family or subgroup is more appropriate. Here is the Leipzig-Jakarta List, which will be useful for many purposes, including a first pass at establishing the sounds of your language.

 

Leipzig-Jakarta List
1. fire
2. nose
3. to go
4. water
5. mouth
6. tongue
7. blood
8. bone
9. 2nd person pronouns
10. root
11. to come
12. breast
13. rain
14. 1SG pronoun
15. name
16. louse
17. wing
18. flesh/meat
19. arm/hand
20. fly
21. night
22. ear
23. neck
24. far
25. to do/make
26. house
27. stone/rock
28. bitter
29. to say
30. tooth
31. hair
32. big
33. one
34. who?
35. 3rd person pronouns
36. to hit/beat
37. leg/foot
38. horn
39. this
40. fish
41. yesterday
42. to drink
43. black
44. navel
45. to stand
46. to bite
47. back
48. wind
49. smoke
50. what?
51. child
52. egg
53. to give
54. new
55. to burn
56. not
57. good
58. to know
59. knee
60. sand
61. to laugh
62. to hear
63. soil
64. leaf
65. red
66. liver
67. to hide
68. skin/hide
69. to suck
70. to carry
71. ant
72. heavy
73. to take
74. old
75. to eat
76. thigh
77. thick
78. long
79. to blow
80. wood
81. to run
82. to fall
83. eye
84. ash
85. tail
86. dog
87. to cry/weep
88. to tie
89. to see
90. sweet
91. rope
92. shade/shadow
93. bird
94. salt
95. small
96. wide
97. star
98. in
99. hard
100. to crush/grind

 

It is very helpful to record words in a context or frame. A phonetician studying the acoustic properties of consonants or vowels will be able to analyze these sounds better if they are presented in consistent frames. Common frames include “Say ___ again” or “I like the word ___”. A morphologist analyzing words and their subparts (e.g., roots and affixes) will want to see examples of different roots with the same affix to figure out the predictable patterns of how roots and affixes combine. Do the sounds change? Does the meaning change? For example, the English plural suffix has a different sound but the same meaning in the words books ([s]), boxes ([əz]), and bags ([z]). A lexicographer creating dictionary materials may want natural usage examples, found in texts or provided by speakers, to write culturally accurate definitions.
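The idea of consistent frames can be illustrated with a short Python sketch that turns a wordlist into a session checklist. This is a minimal illustration under stated assumptions, not a CoRSAL tool: the wordlist is assumed to be a plain list of glosses, the two English frames from the paragraph above stand in for frames that would typically be said in the language being documented, and the prompt identifiers follow a hypothetical naming scheme that could be reused for the matching audio files. The `repetitions` parameter is likewise an assumption, included for projects that record each item more than once.

```python
# Carrier frames from the text; "___" marks the slot for the elicited word.
# These English frames are stand-ins: in practice the frame is typically
# said in the language being documented.
FRAMES = ["Say ___ again", "I like the word ___"]

# A few glosses from the Leipzig-Jakarta List.
WORDLIST = ["fire", "nose", "to go"]


def build_prompts(words, frames, repetitions=1):
    """Return (prompt_id, prompt) pairs for every word, frame, and repetition.

    The ID pattern <word>_frame<N>_rep<M> is a hypothetical naming scheme
    that could be reused for the matching audio file names.
    """
    prompts = []
    for word in words:
        slug = word.replace(" ", "-")  # avoid spaces in IDs and file names
        for frame_no, frame in enumerate(frames, start=1):
            for rep in range(1, repetitions + 1):
                prompt_id = f"{slug}_frame{frame_no}_rep{rep}"
                prompts.append((prompt_id, frame.replace("___", word)))
    return prompts


if __name__ == "__main__":
    for prompt_id, prompt in build_prompts(WORDLIST, FRAMES, repetitions=2):
        print(f"{prompt_id}\t{prompt}")
```

Because every word passes through the same frames in the same order, the resulting recordings can be compared item by item, which is the point of using a consistent frame.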

The meaning of a word is more than just its translation into a language of wider communication. A one-word translation is often misleading. Take, for example, the Lamkang (lmk) word pleu, which was glossed ‘shine’ in an older list we were checking. Upon further discussion, we learned from speaker Rex Khullar that this word means ‘glare’ and could be used to describe the glare of sunlight, a super-shiny fabric, or the glare from a car’s headlights. Rather than a one-word definition, we noted all the extra information about context and usage provided by Rex Khullar. One might also note information on appropriate use, such as whether a word is considered polite or familiar, whether it is appropriate in a classroom or a café, and whether it is used in conversation or in folklore.
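To keep this kind of information from being flattened into a one-word gloss, an entry can be stored as a small structured record. The Python sketch below is illustrative only: the class and its field names (`gloss`, `usage_contexts`, `register`, `consultant`, `earlier_glosses`) are our own assumptions rather than a standard dictionary schema, and the pleu entry simply restates the information given above.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalEntry:
    """A hypothetical minimal dictionary entry; the field names are illustrative."""
    headword: str
    language_code: str    # ISO 639-3 code, e.g. "lmk" for Lamkang
    gloss: str            # short translation into a language of wider communication
    usage_contexts: list[str] = field(default_factory=list)
    register: str = ""    # e.g. polite, familiar, classroom, folklore
    consultant: str = ""  # speaker who provided or confirmed the information
    earlier_glosses: list[str] = field(default_factory=list)  # glosses from older sources

    def summary(self) -> str:
        """One-line summary: headword, gloss, and recorded usage contexts."""
        contexts = "; ".join(self.usage_contexts) or "no contexts recorded"
        return f"{self.headword} ({self.language_code}) '{self.gloss}': {contexts}"


# The Lamkang example from the text, restated as a structured entry.
pleu = LexicalEntry(
    headword="pleu",
    language_code="lmk",
    gloss="glare",
    usage_contexts=[
        "the glare of sunlight",
        "a super-shiny fabric",
        "the glare from a car's headlights",
    ],
    consultant="Rex Khullar",
    earlier_glosses=["shine"],
)

print(pleu.summary())
```

Keeping the consultant and the earlier gloss alongside the revised one leaves a trace of where the information came from and how the entry changed.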

 

Activity and Discussion: Watch these community dictionary creators talk about their collection and analysis process. What are some common themes and challenges expressed by these documenters? https://www.youtube.com/playlist?list=PL13Oifva0WH7idRJnS9vMkUX9SZ7C7K3g

 

Conversations and Interviews

At the other end of the complexity scale from wordlists are conversations, which provide a wealth of information on how people actually use language to communicate emotions, needs, information, and the like. Conversations often include less common clause structures and less common meanings of words and phrases.

Discussion: Why record conversations?

For many language documenters, the idea of recording conversations is daunting. Typical questions include:

Q: Won’t there be a lot of extra noise? A: Choosing appropriate microphones and placing them carefully helps capture each individual speaker.

Q: Do I really want to record unrehearsed speech with false starts and partial sentences? A: In conversation, we hear language forms we may not hear otherwise. Think of tags like “ok?” or “right?”. We hear natural interactions with all the necessary intonation and politeness or familiarity markers, and these are important for language revitalization.

Use discussion time to create four more question-answer pairs about why or why not to include conversations in your documentary corpus.

 

Recording conversations takes some skill and practice.

Technical aspect: There is a technical element in setting up recorders and in getting clear recordings of overlapping speech. There are solutions to this; for example, each speaker can be given their own microphone so that each participant’s speech is recorded on its own channel for individual analysis. Another technical skill is noting and managing the metadata, or details of the interaction (the who, what, where, which, when, and how of the event).
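Metadata is easier to manage when the same details are recorded for every session. The following Python sketch assumes one row per recording session in a CSV spreadsheet; the class, its column names, and the example values are all hypothetical, chosen to map onto the who, what, where, which, when, and how questions above, with a `channels` field for the one-microphone-per-speaker setup just described. An archive you deposit with will likely have its own metadata requirements, so check those before settling on fields.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class SessionMetadata:
    """One recording session; field names are illustrative, not an archive standard."""
    session_id: str
    participants: str      # who: speakers, separated by semicolons
    recorder: str          # who: person operating the equipment
    genre: str             # what: e.g. conversation, folktale, wordlist
    topic: str             # what: subject of the interaction
    location: str          # where
    language_variety: str  # which: language or dialect used
    date: str              # when: YYYY-MM-DD
    equipment: str         # how: recorder and microphones
    channels: int          # how: e.g. one channel per speaker's microphone


def missing_fields(session):
    """Names of fields left blank, so they can be filled in soon after recording."""
    return [name for name, value in asdict(session).items() if value in ("", None)]


def to_csv(sessions):
    """Serialize sessions as CSV text, one row per session, with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(SessionMetadata)])
    writer.writeheader()
    for session in sessions:
        writer.writerow(asdict(session))
    return buffer.getvalue()


# Hypothetical example values for a two-person conversation about weaving.
example = SessionMetadata(
    session_id="S001",
    participants="Speaker A; Speaker B",
    recorder="Documenter",
    genre="conversation",
    topic="weaving",
    location="Village X",
    language_variety="Target language, local dialect",
    date="2024-01-15",
    equipment="Field recorder with two lavalier microphones",
    channels=2,
)

if __name__ == "__main__":
    print(missing_fields(example))  # an empty list means every field is filled
    print(to_csv([example]))
```

A spreadsheet was chosen here because it can be opened and edited by collaborators who do not program; the same fields could just as well be kept in a paper log during fieldwork.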

Human aspect: There is also the human element. How are you as the documenter influencing the recording? Are the speakers using a different dialect or more careful speech because they are being recorded, or do they switch to a different variety because you are present? Do the participants know one another? When speakers are more familiar with each other, they will likely use less formal, less careful speech. Are the participants the same age and gender? Differences in age and gender between speakers may result in more formal, careful speech. Are the participants from the same area? Speakers from different dialect areas may not be comfortable speaking with one another in the target language and may instead shift to a different dialect or language to accommodate each other. Are the participants interested in the same topic? People tend to talk more when they are engaged in a topic. If you prompt a weaver to talk about weaving, they are likely to give more detailed information to another member of the community who has some interest or experience in weaving.

 

Discussion: Conversation prompts

Discuss how you might stimulate conversations on specific topics. If you want to know about traditional practices such as cooking or weaving, how would you start a conversation on these topics? What are some ethical concerns in recording conversations? What types of informed consent are needed? What are the daily routines of community members, and when would it make sense to schedule a recording?

 

Monologues

Samples of speech from a single speaker which are rehearsed, memorized, or extemporaneous performances are common sources of records. These include: traditional folktales, speeches, blessings, proverbs, jokes, personal histories, procedures, and instructions (e.g. how to cook a meal, build a house, fish, trap an animal, grow vegetables).

Discussion: Comparing community resource persons

Let us take a minute to review the monologue data you plan to collect. Are there specific traditional stories you are looking to add to your corpus? How will you recruit contributors to this audio or video monologue corpus? Are there specific speakers who are good at cooking a traditional dish, and will they also be able to explain the process clearly, or will you turn to an older or younger speaker?

 

Going beyond words

As the saying goes, a picture is worth a thousand words. We have found that community collections often include photographs of the environment and culture, such as flora and fauna, festivals, and objects like cooking or farming utensils and musical instruments. What photographs have you already collected, and why those? What would you like the younger generation of community members to know about an image? What information will you note down to accompany each photograph?

Further Reading

Chelliah, S. L. (2023, July 3). Making Photographs in Language Archives Maximally Useful: Metadata Guidelines for Community and Academic Depositors. University of North Texas Libraries, UNT Digital Library; crediting UNT College of Information. https://digital.library.unt.edu/ark:/67531/metadc2114301/ (accessed January 12, 2024).

Green, J., Meakins, F., & Turpin, M. (2018). Understanding Linguistic Fieldwork. United Kingdom: Taylor & Francis.

1.3 Technologies for Collection

The use of specialized equipment for audio and video recording is covered in many online resources. We keep links to these resources updated on the CoRSAL website. The linked guides cover audio and video recording, including what equipment to use, how to place microphones, how to take care of the equipment, the advantages of video, and audio formats and quality. There are also links on the procedure for digitizing legacy material (materials that already exist in analog form), including digitizing analog audio recordings and scanning documents.

1.4 Project Activities

The three activities provided here reflect the content of this chapter.

Activity 1: Writing a Report

Write a brief 1-2 page report on existing materials for your language and potential community interest around a language documentation project. What material is still needed to create a more comprehensive record? What are some potential limitations you could encounter? What are some of the things you already have that you can take advantage of?

Activity 2: Creating a Recording

Create an audio or video recording of a relevant speech event. Write a 1-page reflection on how the process went. What were some difficulties you encountered? How did the quality turn out? Were there any surprises?

Activity 3: Scanning a Document

Scan an existing document, ideally one that is relevant to your documentation project (such as handwritten field notes, a letter in the language, etc.). Write a 1-page reflection on how the process went and why the document is important.

 

License


From Source to Analysis: A language documenter's guide to annotating text Copyright © 2024 by University of North Texas is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.
