Canadian Indigenous languages technology project
We are working on speech- and text-based technologies that aim to assist the stabilization, revitalization and reclamation of Indigenous languages by supporting Indigenous language educators and students, promoting the accessibility of audio recordings, and supporting Indigenous language translators, transcribers and other language professionals.
- Language-independent technology (such as software) will be released to communities as open-source software.
- We will be working under the direction and advice of an Advisory committee, and in close collaboration and partnership with Indigenous community organizations and Indigenous communities across Canada.
- All research done within this project will be compliant with the Tri-Council Research Ethics Policy.
- Budget 2017 invested $89.9M over three years to support Indigenous languages and cultures. We were granted $6M of this funding.
- This project is managed by the NRC's Digital Technologies Research Centre.
- There are thousands of hours of recordings of Indigenous languages from across the country.
- The recordings can be difficult for Indigenous communities to access and make use of because they are not always fully transcribed, and sometimes are missing metadata (information about what languages are being spoken, who is speaking, etc.).
- To create software that will automatically segment and label audio files while they're being recorded (or shortly afterwards).
- To build and test audio-indexation software that makes it possible to search through existing recordings, including recordings made decades ago, to find key words or phrases.
- The complexity of words in Indigenous languages, in which a single long word made up of many small pieces known as morphemes can often express what other languages express with an entire clause, poses difficulties for software applications (including both educational and professional software) that lack language-specific word-handling capabilities.
- Teaching how to form words is a central concern in Indigenous language education.
- Word complexity, and, in some languages, the complexity of the writing systems, mean that writing in accordance with official community standards is difficult for many learners.
- To design, in collaboration with instructors, educational tools that support exploratory learning of word formation.
- To develop tools for spell-checking and grammar-checking, for integration with desktop and mobile applications, to help language users at all levels to follow their community's writing standards.
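To illustrate why a plain word list is not enough for morphologically complex languages, here is a minimal sketch of a checker that accepts a word only if it splits into licensed pieces. The morphemes below are invented placeholders, not real data from any Indigenous language, and the prefix-root-suffix template is an assumption made for illustration. Real tools rely on far richer, language-specific models.

```python
from itertools import product

# Invented placeholder morphemes (hypothetical; not from any real language).
PREFIXES = {"ba", "ti", "mo"}
ROOTS = {"lund", "sek", "tor"}
SUFFIXES = {"a", "is", "um"}


def analyze(word):
    """Return every (prefix, root, suffix) split of `word` allowed by the toy template."""
    analyses = []
    for i in range(1, len(word)):
        for j in range(i + 1, len(word)):
            prefix, root, suffix = word[:i], word[i:j], word[j:]
            if prefix in PREFIXES and root in ROOTS and suffix in SUFFIXES:
                analyses.append((prefix, root, suffix))
    return analyses


def is_well_formed(word):
    """A word passes the check if it has at least one valid analysis."""
    return bool(analyze(word))


def all_forms():
    """Enumerate every word the toy template can generate."""
    return {p + r + s for p, r, s in product(PREFIXES, ROOTS, SUFFIXES)}
```

With only nine stored morphemes, this toy template covers 27 distinct word forms. The gap between stored pieces and possible words grows quickly as more morpheme slots are added, which is why a checker built on word formation scales better than one built on a fixed list of whole words.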
We are taking a "first deep, then broad" approach. Each software tool we build will initially be specialized to one or two Indigenous languages in Canada, but built in a way that allows customization for additional languages.
We are currently working with:
- Kanyen'kéha (Mohawk)
Through thoughtful design, and subsequent testing, we will attempt to ensure that the tools we develop in this way will be adaptable to many different languages after this initial development period.
We are collaborating formally and informally with:
Website: 7000 Languages
Project description: Initiative for Creating Online Indigenous Language Courses (COILC initiative)
The NRC has partnered with the experts at 7000 Languages, a non-profit, non-Indigenous organization based in the United States that creates courses for endangered languages around the world. The NRC will fund selected community teams who wish to create online courses for their languages. Find out more about COILC.
Alberta Language Technology Lab, University of Alberta
Project description: Since 2013, the Alberta Language Technology Lab (ALTLab) at the University of Alberta, headed by Dr. Antti Arppe, has been combining research on language structure with the creation of computational tools for Indigenous languages, starting with Plains Cree. The lab has been building on earlier work by its Norwegian partners on Saami and other threatened Uralic languages of Northern Eurasia which resulted in the Giella linguistic software development infrastructure. This infrastructure allows for the straightforward, rapid creation of end-user applications for morphologically complex languages.
Another section of this webpage describes the NRC's collaboration with the Onkwawenna Kentyohkwa Mohawk-language immersion school to build an educational tool called Kawennón:nis. This tool, which is currently being extended to other Iroquoian languages, was built within the Giella infrastructure. It would have been much more difficult for the NRC team to create Kawennón:nis without the help of the ALTLab team's Giella expertise. An NRC software developer, Eddie Santos, is currently embedded in the ALTLab to enhance the synergy between the two teams.
Canadian Broadcasting Corporation (CBC)
Website: Canadian Broadcasting Corporation
Project description: CBC creates programming by and for Indigenous peoples, providing services in eight Indigenous/Inuit languages. CBC is providing the Computer Research Institute of Montreal (CRIM) with access to East James Bay Cree recordings, as part of the NRC's Indigenous languages technology project, so that CRIM can develop audio segmentation and analysis tools suitable for indexing audio recordings in Indigenous languages. CBC has shared over 1,343 hours of radio programming originally broadcast by CBC North from January 2015 to December 2016. These 1,312 audio files, which contain studio/telephone quality speech as well as music, are highly appreciated by the NRC and CRIM project teams and will be critical to the success of the project.
Project description: Algonquian Dictionaries Project (East Cree and Innu)
The collaboration with the NRC is focused on updating online language lessons developed earlier by the Carleton team, in partnership with Cree Programs and Institut Tshakapesh, aimed at supporting East Cree (2006‑2011) and Innu (2009‑2012) literacy.
The online lessons/games/exercises platform supports the creation of multimedia interactive online lessons with auto‑generated exercises/games. In this platform, users are able to listen to a word or phrase in several dialects. They then play computer‑generated interactive activities that test and enhance their vocabulary, orthography and grammar acquisition. They can also engage in more advanced grammatical and textual activities. Teachers can go online to develop additional lesson plans, and track students' progress. Language experts can access an administrative interface to develop new content.
Unfortunately, the rapid pace of change in the software industry has stranded these educational tools: many of the key functionalities no longer work as intended. The collaboration is aimed at updating the platform to align with current technology. The platform update is also an opportunity to improve the experience of second language learners (these tools were originally developed with first language speakers in mind) and to carry out user testing of the lessons.
Computer Research Institute of Montreal (CRIM)
The Computer Research Institute of Montréal (CRIM) is an applied research and expertise centre in information technology. Its speech and text team has a long and distinguished record of accomplishments in technologies related to speech recognition. Its audio content indexing technology indexes the spoken content of very large audio databases, making such content accessible through search engines. CRIM has applied this technology to the archives of the National Film Board (NFB) and to the collected testimonies of the Bastarache investigative commission. CRIM's speaker recognition technology, which identifies the person who generated a particular segment of speech, is world-class. It has consistently ranked among the top entries in international evaluations of speaker recognition systems, and is now used all over the world.
The NRC's collaboration with CRIM is focused on applying audio indexing and speaker recognition technologies to Indigenous languages. Over the years, hundreds of thousands of hours of speech have been recorded in various Indigenous languages. Unfortunately, these recordings are typically not annotated or indexed. Surprisingly, even speech data being collected now by Indigenous communities and linguists have this problem: because there is a lack of tools for segmenting speech data as they are being recorded, the stock of unannotated speech data in Indigenous languages is constantly growing.
We are tackling two aspects of this problem:
- We are developing simple tools that will segment speech as it is being recorded. The tools will separate audio files into speech and non-speech data, and will label the speech segments by the identity of the current speaker. This should make annotation of speech currently being collected easier, for a variety of languages.
- We also plan to build systems that will make it possible to search for particular words or phrases in audio recordings in some Indigenous languages. This will not be full speech recognition and we will not be creating systems that are able to produce high-quality transcriptions of everything that was said in a recording. Rather, the systems will enable audio keyword search, so that users will be able to search quickly through long audio recordings for particular words or topics. We are currently targeting Inuktut and Cree. The Pirurvik Centre is providing valuable assistance on the Inuktut part of this project.
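The first of the two aspects above, separating speech from non-speech, can be sketched with a simple energy threshold. This is only an illustration under stated assumptions: the function name, the frame length and the threshold are hypothetical, and it is not the method used by the NRC or CRIM. Production systems generally rely on trained statistical models, and labelling segments by speaker requires considerably more than this.

```python
def segment_speech(samples, frame_len, threshold):
    """Return (start, end) sample-index pairs for runs of high-energy frames.

    samples   -- sequence of audio sample values (floats)
    frame_len -- number of samples per analysis frame (assumed parameter)
    threshold -- mean-square energy above which a frame counts as speech
    """
    segments = []
    start = None
    n_frames = len(samples) // frame_len  # any trailing partial frame is ignored
    for f in range(n_frames):
        frame = samples[f * frame_len:(f + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        is_speech = energy > threshold
        if is_speech and start is None:
            start = f * frame_len  # a speech run begins here
        elif not is_speech and start is not None:
            segments.append((start, f * frame_len))  # the run ended at this frame
            start = None
    if start is not None:
        segments.append((start, n_frames * frame_len))  # run lasted to the end
    return segments
```

For example, a signal made of four silent samples, eight loud samples and four silent samples, analysed with `frame_len=4` and `threshold=0.1`, yields the single segment `(4, 12)`.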
First Peoples' Cultural Council
Website: First Peoples' Cultural Council
Project description: News release about Upgrades to FPCC's FirstVoices Language Tutor software
Official Languages, Department of Culture and Heritage, Government of Nunavut
Project description: Coming soon
Onkwawenna Kentyohkwa Language School
Project description: Kawennón:nis verb conjugator
Onkwawenna Kentyohkwa is an immersion school for teaching Kanyen'kéha (the "Mohawk" language) to adult learners. It is located on the Six Nations of Grand River reserve in southwestern Ontario. Onkwawenna Kentyohkwa was established in 1999 by Owennatekha (Brian Maracle) and Onekiyohstha (Audrey Maracle). Owennatekha is the lead instructor at the school. Many of the school's 100 graduates have gone on to teach the Kanyen'kéha language at the pre-school, elementary, secondary, university or community level.
The focus of the NRC's collaboration with Onkwawenna Kentyohkwa is Kawennón:nis, meaning 'wordmaker' in Kanyen'kéha. Kawennón:nis is a verb conjugator meant to assist learners and educators at the school, as well as students of the language wherever they might be. The idea for the tool was suggested by Owennatekha. The creation and extension of this tool involves a number of researchers at the NRC, Owennatekha, and two other educators from Onkwawenna Kentyohkwa. The language model that powers Kawennón:nis is the first of its kind for any Iroquoian language. Kawennón:nis's user interface is closely linked to the school's curriculum, and is being designed collaboratively by students and educators there and NRC researchers. Kawennón:nis will be hosted by the school online and on Android and iOS devices; language-independent technology developed for it will be released with an open-source licence.
Website: Pirurvik Centre
Project description: Pirurvik is a centre of excellence for Inuit language, culture and well-being. It was founded in the fall of 2003 and is based in Nunavut's capital, Iqaluit. The main focus of the NRC's collaboration with Pirurvik is the transcription into written form of audio recordings of spoken Inuktut. The selection criteria favour materials in original language with a depth of vocabulary, rather than speech that reflects 'thinking in English' while speaking Inuktut.
The transcribed Inuktut speech data will subsequently be used by the NRC and one of its other partners, the Computer Research Institute of Montreal, to develop speech recognition tools that will make it possible to search other Inuktut speech recordings using text queries. This will make it easier for people who speak Inuktut to access and navigate audiovisual documents in their language.
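As a rough illustration of searching recordings with text queries, the sketch below assumes that some upstream recognizer has already produced time-stamped keyword detections with confidence scores. The tuple layout, the function names and the default score cut-off are all hypothetical, and the placeholder terms stand in for real vocabulary.

```python
from collections import defaultdict


def build_index(hits):
    """Group detections by term, highest-confidence first.

    hits -- iterable of (recording_id, term, start_sec, score) tuples,
            assumed to come from an upstream keyword detector.
    """
    index = defaultdict(list)
    for recording_id, term, start_sec, score in hits:
        index[term].append((recording_id, start_sec, score))
    for postings in index.values():
        postings.sort(key=lambda p: p[2], reverse=True)
    return index


def search(index, term, min_score=0.5):
    """Return (recording_id, start_sec) pairs where `term` was detected with enough confidence."""
    return [(rec, start) for rec, start, score in index.get(term, [])
            if score >= min_score]
```

A user could then jump directly to the returned time offsets in each recording instead of listening to it from the beginning.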
This list is updated regularly as the project proceeds and collaborations with other organizations are developed.
The following is a list of selected publications by the project team and their collaborators relating to research in Indigenous languages technology.
- Anna Kazantseva, Owennatekha Brian Maracle, Ronkwe'tiyóhstha Josiah Maracle, and Aidan Pine. Kawennón:nis: the Wordmaker for Kanyen'kéha. Proceedings of Workshop on Polysynthetic Languages, pages 53–64. Santa Fe, New Mexico, USA, August 20-26, 2018.
- Patrick Littell, Anna Kazantseva, Roland Kuhn, Aidan Pine, Antti Arppe, Christopher Cox, and Marie-Odile Junker. Indigenous language technologies in Canada: Assessment, challenges, and successes. Proceedings of the 27th International Conference on Computational Linguistics, pages 2620–2632. Santa Fe, New Mexico, USA, August 20-26, 2018.
Our project team
Anna Kazantseva, PhD
Computational linguistics of literature (novels and stories); modeling discourse structure of long informal documents; computational linguistics of Iroquoian languages.
Roland Kuhn, PhD (project lead)
Automatic speech recognition; machine translation.
Patrick Littell, PhD (project advisor)
Computational linguistics of low-resource languages; he has worked with several Indigenous languages, including Kwak'wala/Bak'wamk'ala, Gitksan, and Nłeʔkepmxcín (Thompson River Salish).
Development of software for supporting Indigenous languages; he has developed tools in collaboration with Gitksan & Heiltsuk communities.
We are committed to developing technology in collaboration with Indigenous stakeholders, and have implemented an Indigenous Advisory committee to advise on collaborative methodologies and evaluate project implementations.
Chair of the NRC's Indigenous Languages Technology Project Advisory Committee
Secretary-Treasurer, Prairies to Woodlands Indigenous Language Revitalization Circle
Heather is currently directing a new Master-Apprentice Program in Manitoba and is the Secretary-Treasurer of the Prairies to Woodlands Indigenous Language Revitalization Circle. She holds a Bachelor of Arts from the University of British Columbia and a Master of Education in Indigenous Language Revitalization from the University of Victoria. Heather is reclaiming her heritage language and, in collaboration with Elders, has published educational resources for the Michif language, such as a conversational phrase book and a college-level beginner's course. Heather's interests include the use of the Internet to reach language learners in the diaspora and to create technology-mediated speech communities. She is a citizen of the Métis Nation and a member of the Manitoba Métis Federation.
Youth Ambassador, University of Alberta
Delaney is a fourth-year undergraduate student at the University of Alberta majoring in Computer Science and Math. Since her early teens she has worked with her home community of Lac Ste. Anne documenting culture and history. She is working on an interdisciplinary project to develop a language learning system for the Y-dialect of Cree under the supervision of University of Alberta Computing Science professor Dr. Carrie Demmans Epp and University of Alberta Cree professor Dorothy Thunder.
Youth Ambassador, Nak'azdli Whut'en First Nation
Tessa is a member of the Nak'azdli Whut'en First Nation and an eleventh grade student at DP Todd Secondary School in Prince George, BC. Tessa is also a graduate of the First Nations' Technology Council's "Bridging to Technology" program and runs a language project called Dak'elh K'una which is organizing the creation of a Dak'elh language app and immersion summer camp.
Nathan Thanyehténhas Brinklow
Lecturer, Queen's University
Nathan is an educator of Kanyen'kéha (Mohawk) with years of experience teaching both at the Tsi Tyónnheht Onkwawén:na Language and Cultural Centre (TTO) and at Queen's University. Nathan has a strong interest in how computational methods can be applied to language revitalization and pedagogy and has been involved in the development of Indigenous Language and Mohawk Language and Culture certificates in partnership with TTO and Queen's University.
Oral History and Language Lab Manager, Museum of Anthropology, University of British Columbia
Gerry is a proud member of the Heiltsuk First Nation and manages the Oral History and Language Lab at the UBC Museum of Anthropology. With over 15 years in the field of Information Management and Heritage Digitization, he works to develop practical, scalable resources for Indigenous cultural heritage preservation, and to decolonize information practices. Gerry also acts as the Technology Lead for the innovative UBC Indigitization Program and sits on the Board of Directors for the First Peoples' Cultural Council.
Language Team Lead, University nuhelot'įne thaiyots'į nistameyimâkanak Blue Quills (UnBQ)
Marilyn is a member of the Saddle Lake Cree Nation and has worked in adult education for twenty-seven years, four years in small business and four years in Cree Immersion Head Start programming before devoting her time to language revitalization for both Cree and Dene at UnBQ. While at UnBQ, Marilyn has spearheaded the development of a Bachelor of Arts in Cree and Dene, a Masters in Indigenous Languages, an Elders Senate, and a Language Resource Department that produces audio, video and written resources in both Cree and Dene.
Glenn Karonhiio Morrison
Policy Manager, Department of Canadian Heritage
Glenn is a Policy Manager at Canadian Heritage and chairs the interdepartmental Indigenous Languages Translation/Technology Working Group involving Canadian Heritage, Library and Archives Canada, National Research Council of Canada, Parliamentary Translation Bureau and others. He has a longstanding interest in the revitalization of Indigenous languages and was the "sysop" for the first online presence of a First Nations organization in Canada in 1992, using a telephone-based bulletin board program interfaced through a dial-up connection and third-party software. He completed the first level of Onkwawenna Kentyohkwa's online Mohawk language program and is a band member of the Mohawks of Kahnawá:ke.
Associate Professor, University of Victoria
tânisi kiyawaw (greetings to you all). Onowa is maskékow-ininiw (a Swampy Cree person) and Scottish-Canadian, born and raised in Treaty 6 territory. She has been a grateful visitor in SENĆOŦEN and Lekwungen speaking territories for over twenty years and is an urban nêhiyâwiwin language learner and Indigenous language warrior. Onowa is an Associate Professor of Indigenous Education at the University of Victoria, where she was formerly the Director of Indigenous Education in the Faculty of Education. Onowa is co-lead on a Social Sciences and Humanities Research Council of Canada (SSHRC) Partnership Grant entitled NEȾOLṈEW̱, which is working to build capacity among Indigenous people and maximize Indigenous language revitalization resources in Canada.
Skayda.û, Tina Jules
Director of the Yukon Native Language Centre
Tina is the Director of the Yukon Native Language Centre for the Council of Yukon First Nations. She is of Tlingit, Mountain Slavey and Cree ancestry and is a citizen of the Teslin Tlingit Council. Her Tlingit name is Skayda.û and she belongs to the Dakhlaweidí (Eagle) clan. Tina holds a Bachelor of Education from the University of Regina and is a proud graduate of the Yukon Native Teacher Education Program. Her Master's Degree in Education for curriculum and instruction is from Simon Fraser University. She is a passionate advocate for Indigenous language revitalization and indigenized education.