Rule-based machine translation for Aymara

Dec 4, 2014

Matt Coler

Petr Homola

Abstract

This paper presents the ongoing results of an approach developed jointly by a computational linguist and a field linguist that addresses one of the oft-overlooked keys to language maintenance: the development of modern language-learning tools. Although machine translation isn’t commonly thought of as a language-learning tool, it can be a useful way for learners to better visualize the word-formation process and explore the structure of their own language, particularly in highly agglutinative languages like Aymara and Quechua. Moreover, the availability of translation software could eventually facilitate greater interlinguistic communication between speakers of minority languages and serve to further legitimize written production by such marginalized speakers. We provide an overview of how this software functions, describing the process by which the morphological analyzer (tagger) provides the input for the syntactic analysis (parser). We also give an overview of further possible applications of this work and show how this approach can be easily adapted to account for different varieties within a given language, thereby preserving intervariant and dialectal differences along with the richness of variation. Finally, we show how this model can be replicated for other languages and highlight the possibility of collaborative development with native speakers.

Type

Book Chapter

Publication

Endangered Languages and New Technologies

This chapter challenges the notion that machine translation systems for endangered languages have limited utility. Using Aymara, a polysynthetic, non-configurational indigenous Andean language, as a case study, we demonstrate the development of a rule-based machine translation system that serves dual purposes.

First, it contributes to the advancement of machine translation as a field by incorporating languages with typologically uncommon properties, thereby revealing strengths and weaknesses in methods primarily designed for mainstream languages. Second, it provides a practical tool that supports language preservation efforts while enabling digital communication, which is crucial for endangered-language communities in today’s technology-driven world.

The chapter details the technical implementation of our system, from morphological analysis to syntactic parsing, highlighting how computational approaches can be adapted to account for linguistic variation within a language. Furthermore, we discuss how this methodology can be extended to other endangered languages and potentially involve native speakers in the development process, making a strong case for the role of technology in language revitalization efforts.

Last updated on Dec 4, 2014