1 Creating a Language Collection
Learning Objectives
After successful completion of this module, you will have reviewed:
- Motivations for creating a collection
- What to collect and why
- How to locate resources for creating source material
1.1 Introduction
Let us first think about your source data: the video, audio, or text material you have collected or are setting out to collect. We will begin with this introspective activity. How would you identify yourself? For example, are you a community language documenter, working independently or with a community to revitalize, preserve, teach, and share the language and culture of the community? Are you a student or teacher of linguistics working in a classroom setting, learning about documentation and description and discovering the structure of a language through interviews with speakers? Are you an anthropologist, ethnographer, or folklorist? Next, consider your short-term goals and your long-term goals in creating a collection. A short-term goal may be to highlight the work done by community members to share traditional stories with a larger audience. A long-term goal could be to use those stories in pedagogical materials.
Discussion: Where do short-term and long-term goals intersect?
Based on your introspection and class discussion, use the small group format to discuss how similar or different your background, motivation, and goals are. Are there ways that each group can support the other’s goals?
To help with this activity, you might refer to articles or chapters listed at the end of this section. In Why Language Documentation Matters, I (Chelliah) provide case studies of language documentation conducted by academic and non-academic speakers, archivists, and documentary linguists. An easy activity for the class would be to split into groups and discuss and read further about each case study to discover the motivations in those inspirational documentation and archiving examples. The three other readings provide standard definitions of Language Documentation (Woodbury 2011), review planning and implementation of documentation projects (Chelliah 2018), and describe the uses and formats of materials produced from documentation projects (Good 2011).
Further Reading
Chelliah, S. (2021). Why Language Documentation Matters. Springer Briefs in Linguistics. Dordrecht: Springer Academic Press.
Chelliah, S. (2018). The design and implementation of documentation projects for spoken languages. In K. Rehg and L. Campbell (Eds.). The Oxford Handbook of Endangered Languages, (pp. 147-167). Oxford, England: Oxford University Press. DOI: 10.1093/oxfordhb/9780190610029.013.9
Good, J. (2011). Data and Language Documentation. In P. Austin and J. Sallabank (Eds.). The Cambridge Handbook of Endangered Languages, (pp. 212-234). Cambridge, England: Cambridge University Press. Available from https://www.researchgate.net/publication/282733657_Data_and_language_documentation
Woodbury, A. (2011). Language Documentation. In P. Austin and J. Sallabank (Eds.). The Cambridge Handbook of Endangered Languages, (pp. 159-186). Cambridge, England: Cambridge University Press.
_________________________________________________
At this point, we suggest that you download and install the following free software. We will return to this software in the next chapters. Please contact the instructor or teaching assistants if you have any issues with the download and installation. If you are using a Mac, you will need software such as Parallels Desktop to run these Windows applications. Additional software tailored for use in this course will be available at the CoRSAL website through our expanding CoRSAL software suite. Please check the CoRSAL website for those links.
Download this free software for use in this textbook.
_____________________________________________________
1.2 What to Collect?
What you decide to collect, record, and curate for your collection depends on your ultimate goals. For example, if your goals are language description, then you may aim to collect language used in many contexts by many speakers. If you are interested in language revitalization, you may focus on traditional food items, plants, birds, religious events, or other culture-specific domains, the knowledge of which is being lost.
Who is the intended audience for this collection? Your intended audience may be an audience of one, that is, you create a collection to support your descriptive goals. Or, if your goals are revitalization, your intended audience might be members of your speech community.
As illustrated through case studies in Chelliah (2021), when members of a speaker community play a core role in setting out what needs to be included in a documentary corpus, we have a much better chance of creating a lasting, comprehensive record of language. Combined or coordinated efforts between collectors lead to rich, diversified language documentation with multiple uses.
Recordings of spoken, naturalistic interactions are a major part of what language documenters collect. Let us discuss, as a class, the kinds of naturalistic interactions you might want to document. Here are some typical items that documenters collect: photographs, rare text documents about or in the language being documented, songs (often with dance or gestures), traditional stories and personal anecdotes, conversations, public speeches, sermons, political speeches, responses to non-linguistic interview questions, responses to prompts related to language inquiry, and wordlists.
Based on your personal access to the types of interactions you want to collect, what will your challenges and opportunities be in achieving your goals?
Discussion: Can language documentation be ethically implemented?
Do language documentation projects implemented by linguists employ methods that are extractive (getting data from a speech community) or exploitative (using the data for personal academic gains)? Discuss your answers to this question and posit solutions. This article by Adrienne Tsikewa can help with your discussion: https://muse.jhu.edu/article/840964
Your documentation project will include different types of speech samples. Some of them will be speech used in common interactions (e.g., buying something at the market) or special linguistic practices (a blessing ritual). Documentation projects also include tools to understand the interactions you have collected; for example, wordlists to aid with glossing, word analyses to help with writing a grammar of the language, and sound and word analyses to help with orthography development. In other words, your documentation project will have audio, video, and textual records of language interactions you want to document, but also will likely include results of analyses of that source material.
Here we provide some pointers on where to begin with your language documentation. This is to answer the question we have often heard from groups in India: “Where do we start?” Answer: Let’s start with words!
Wordlists
Wordlist data is a good place to start when learning how to elicit, record, and archive speech samples. Wordlists are also valuable in that they can be the start of a lexicon or dictionary that can be used in translating longer speech samples like clauses in traditional narratives.
Activity: Find existing materials on your language
Create an annotated bibliography of the existing materials on your language. Is there a dictionary or grammar? Are there wordlists or grammatical sketches in edited volumes or surveys? Have community members created wordlists that are circulating on social media? What types of audio and video have already been created to support understanding of the collected words? Where are these audio and video files stored and shared? Knowing what is already there will help you decide what to add or build on.
In preparing for wordlist elicitation, choose lists that meet your goals. For example, if you want to do typological research comparing many unrelated languages, you will want to include words commonly found in the world’s languages, such as those on the Swadesh List or the Leipzig-Jakarta List given below. If your goals include historical work comparing related languages, a list tailored to the specific language family or subgroup is more appropriate. Here is the Leipzig-Jakarta List, which will be useful for many purposes, including a first pass at establishing the sounds of your language.
Leipzig-Jakarta List
1. fire | 34. who? | 68. skin/hide |
2. nose | 35. 3rd person pronouns | 69. to suck |
3. to go | 36. to hit/beat | 70. to carry |
4. water | 37. leg/foot | 71. ant |
5. mouth | 38. horn | 72. heavy |
6. tongue | 39. this | 73. to take |
7. blood | 40. fish | 74. old |
8. bone | 41. yesterday | 75. to eat |
9. 2nd person pronouns | 42. to drink | 76. thigh |
10. root | 43. black | 77. thick |
11. to come | 44. navel | 78. long |
12. breast | 45. to stand | 79. to blow |
13. rain | 46. to bite | 80. wood |
14. 1SG pronoun | 47. back | 81. to run |
15. name | 48. wind | 82. to fall |
16. louse | 49. smoke | 83. eye |
17. wing | 50. what? | 84. ash |
18. flesh/meat | 51. child | 85. tail |
19. arm/hand | 52. egg | 86. dog |
20. fly | 53. to give | 87. to cry/weep |
21. night | 54. new | 88. to tie |
22. ear | 55. to burn | 89. to see |
23. neck | 56. not | 90. sweet |
24. far | 57. good | 91. rope |
25. to do/make | 59. knee | 92. shade/shadow |
26. house | 60. sand | 93. bird |
27. stone/rock | 61. to laugh | 94. salt |
28. bitter | 62. to hear | 95. small |
29. to say | 63. soil | 96. wide |
30. tooth | 64. leaf | 97. star |
31. hair | 65. red | 98. in |
32. big | 66. liver | 99. hard |
33. one | 67. to hide | 100. to crush/grind |
It is very helpful to record words in a context or frame. A phonetician looking at various acoustic properties of consonants or vowels will be able to study these sounds better if they are presented in consistent frames. Common frames include “Say ___ again” or “I like the word ____”. A morphologist looking to analyze words and their subparts (e.g., roots and affixes) will want to see examples of different roots with the same affix to figure out the predictable patterns of how roots and affixes combine. Do the sounds change? Does the meaning change? For example, think of the English plural affix, which has a different sound but the same meaning in the words books, boxes, and bags. A lexicographer looking to create dictionary materials may want natural usage examples found in texts or provided by speakers to provide culturally accurate definitions.
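The carrier-frame idea above can be sketched in a few lines of Python. This is a minimal illustration, not part of the course software: the function name build_prompts, the three sample words, and the English frames are assumptions made for the example, and in a real session the frames would be sentences in the target language.

```python
# Hypothetical helper: pair every wordlist item with every carrier frame
# so that each word is recorded in the same, consistent contexts.
FRAMES = ["Say ___ again", "I like the word ___"]


def build_prompts(words, frames=FRAMES, blank="___"):
    """Return (item_number, word, prompt) tuples, one per word-frame pair."""
    prompts = []
    for number, word in enumerate(words, start=1):
        for frame in frames:
            # Substitute the target word into the blank slot of the frame.
            prompts.append((number, word, frame.replace(blank, word)))
    return prompts


# Sample items taken from the start of the wordlist; any list of strings works.
words = ["fire", "nose", "water"]
for number, word, prompt in build_prompts(words):
    print(f"{number:03d}\t{word}\t{prompt}")
```

Keeping the item number with each prompt makes it easier to match a recording back to its place in the wordlist later.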
The meaning of a word is more than just its translation into a language of wider communication. A one-word translation is often misleading. Take, for example, the Lamkang (lmk) word pleu which was glossed ‘shine’ in an older list we were checking. Upon further discussion, we learned from speaker Rex Khullar that this word means ‘glare’ and could be used to describe the glare of sunlight, a super-shiny fabric, or the glare from a car’s headlights. Rather than a one-word definition, we noted all the extra information about the context and usage provided by Rex Khullar. One might also note information on appropriate use, such as what would be considered polite or familiar speech, appropriate for use in a classroom or a café, or whether it is used in conversation or in folklore.
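One way to keep this extra information together with the word is a structured lexicon entry. The sketch below is only an illustration: the class LexicalEntry and its field names are hypothetical and do not follow any particular dictionary software. The values come from the pleu example above, and the register field is left empty because no register information is given for this word.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LexicalEntry:
    """Hypothetical record for one word; field names are illustrative only."""
    form: str
    language_code: str
    gloss: str
    earlier_glosses: List[str] = field(default_factory=list)
    usage_contexts: List[str] = field(default_factory=list)
    register_notes: str = ""  # e.g., polite vs. familiar; empty if unknown
    consultant: str = ""      # speaker who provided the information


entry = LexicalEntry(
    form="pleu",
    language_code="lmk",
    gloss="glare",
    earlier_glosses=["shine"],
    usage_contexts=[
        "glare of sunlight",
        "a super-shiny fabric",
        "glare from a car's headlights",
    ],
    consultant="Rex Khullar",
)


def summary(e: LexicalEntry) -> str:
    """One-line summary that keeps the usage contexts next to the gloss."""
    return f"{e.form} ({e.language_code}) '{e.gloss}': " + "; ".join(e.usage_contexts)


print(summary(entry))
```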
Conversations and Interviews
At the other end of complexity from wordlists are conversations, which provide a wealth of information on how people actually use language to communicate emotions, needs, information, and the like. Conversations often include less common clause structures and less common meanings of words and phrases.
Discussion: Why record conversations?
For many language documenters, the idea of recording conversations is daunting. The types of questions asked are:
Q: Won’t there be a lot of extra noise? A: There are appropriate microphones and placement of microphones to catch the individuals speaking.
Q: Do I really want to record unrehearsed speech with false starts and partial sentences? A: In conversation, we hear language forms we may not hear otherwise. Think of tags like “ok?” or “right?”. We hear natural interactions with all the necessary intonation and politeness or familiarity markers, and these are important for language revitalization.
Use discussion time to create four more question-answer pairs about why or why not to include conversations in your documentary corpus.
Recording conversations takes some skill and practice.
Technical aspect: There is a technical element in setting up recorders, along with the challenge of getting clear recordings of overlapping speech. There are solutions to this; for example, each speaker can be given their own microphone so that each participant’s speech is recorded on its own channel for individual analysis. Another technical skill is noting and managing the metadata, or details of the interaction (the who, what, where, which, when, and how of the event).
Human aspect: There is also the human element. How are you as the documenter influencing the recording? Are the speakers using a different dialect or more careful speech because they are being recorded, or do they switch to a different variety because you are present? Do the participants know one another? When speakers are more familiar with each other, they will likely use less formal, careful speech. Are the participants the same age/gender? Differences in age and gender between speakers may result in more formal, careful speech. Are the participants from the same area? Speakers from different dialect areas may not comfortably speak with one another in the target language, but instead, may be more likely to shift to a different dialect or language to accommodate. Are the participants interested in the same topic? People tend to talk more if they are engaged in a topic. If you prompt a weaver to talk about weaving, they are likely to give more detailed information to another member of the community who has some interest or experience in weaving.
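The who, what, where, which, when, and how of a recording session can be noted in a simple structured record. The following Python sketch is an assumption-laden illustration, not an archive's required schema: the class names, field names, and every sample value are hypothetical placeholders.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class Participant:
    """Hypothetical participant record (the 'who')."""
    label: str                     # name or anonymized label
    role: str                      # e.g., "speaker", "documenter"
    channel: Optional[int] = None  # audio channel, if each person has a microphone


@dataclass
class SessionMetadata:
    """Hypothetical session record; field names are illustrative only."""
    title: str      # what
    date: str       # when (ISO 8601 date string)
    location: str   # where
    genre: str      # which kind of speech event
    equipment: str  # how
    participants: List[Participant] = field(default_factory=list)  # who

    def missing_fields(self) -> List[str]:
        """Return the names of fields that were left empty."""
        return [name for name, value in asdict(self).items()
                if value in ("", [], None)]


# Placeholder values only; replace them with the details of your own session.
session = SessionMetadata(
    title="Conversation about weaving",
    date="2024-01-12",
    location="Community hall",
    genre="conversation",
    equipment="Two lavalier microphones, stereo recorder",
    participants=[
        Participant(label="Speaker A", role="speaker", channel=1),
        Participant(label="Speaker B", role="speaker", channel=2),
    ],
)

print(json.dumps(asdict(session), indent=2))
print("Missing:", session.missing_fields())
```

Checking for empty fields right after a session helps catch details that are hard to recover later.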
Discussion: Conversation prompts
Discuss how you might stimulate conversations on specific topics. If you want to know about traditional practices such as cooking or weaving, how would you start a conversation on these topics? What are some ethical concerns in recording conversations? What types of informed consent are needed? What are the daily routines of the community members where it would make sense to schedule a recording?
Monologues
Samples of speech from a single speaker which are rehearsed, memorized, or extemporaneous performances are common sources of records. These include: traditional folktales, speeches, blessings, proverbs, jokes, personal histories, procedures, and instructions (e.g. how to cook a meal, build a house, fish, trap an animal, grow vegetables).
Discussion: Comparing community resource persons
Let us take a minute to review the monologue data you plan to collect. Are there specific traditional stories you are looking to add to your corpus? How will you recruit contributors to this audio or video monologue corpus? Are there specific speakers who are good at cooking a traditional dish, and will they also be able to explain the process clearly, or will you turn to an older or younger speaker?
Going beyond words
As the saying goes, a picture is worth a thousand words. We have found that community collections include photographs of environment and culture, like flora and fauna, festivals, and objects like utensils for cooking or farming and musical instruments. What photographs have you already collected, and why those? What would you like the younger generation of community members to know about an image? What information about the photograph will you note down to accompany the photograph?
Further Reading
Chelliah, Shobhana Lakshmi. Making Photographs in Language Archives Maximally Useful: Metadata Guidelines for Community and Academic Depositors, article, July 3, 2023; (https://digital.library.unt.edu/ark:/67531/metadc2114301/: accessed January 12, 2024), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu; crediting UNT College of Information.
Green, J., Meakins, F., Turpin, M. (2018). Understanding Linguistic Fieldwork. United Kingdom: Taylor & Francis.
1.3 Technologies for collection
The use of specialized equipment for audio and video recording is covered in many online resources. We keep links to these resources updated on the CoRSAL website. The links provide guides on audio and video recording, including what equipment to use, how to place microphones, how to take care of the equipment, the advantages of video, and guidance on audio formats and quality. There are also links on the procedure for digitizing legacy material (materials that already exist in analog form), including digitizing analog audio recordings and scanning documents.
1.4 Project Activities
The three activities provided here reflect the content of this chapter.
Activity 1: Writing a Report
Write a brief 1-2 page report on existing materials for your language and potential community interest around a language documentation project. What material is still needed to create a more comprehensive record? What are some potential limitations you could encounter? What are some of the things you already have that you can take advantage of?
Activity 2: Creating a Recording
Create an audio or video recording of a relevant speech event. Write a 1-page reflection on how the process went. What were some difficulties you encountered? How did the quality turn out? Were there any surprises?
Activity 3: Scanning a Document
Scan an existing document, ideally one that is relevant to your documentation project (such as handwritten field notes, a letter in the language, etc.). Write a 1-page reflection on how the process went and why the document is important.