Localising the Mozilla Common Voice platform for South Africa’s official languages

Authors

  • Febe de Wet, Department of Electrical and Electronic Engineering, Stellenbosch University & School of Electrical, Electronic and Computer Engineering, North-West University
  • Andiswa Bukula, South African Centre for Digital Language Resources, North-West University
  • Willem Karsten, School of Electrical, Electronic and Computer Engineering, North-West University
  • Martin Puttkammer, Centre for Text Technology (CTexT), North-West University
  • Erwin Schillack, School of Electrical, Electronic and Computer Engineering, North-West University
  • Roné Wierenga, Virtuele Instituut vir Afrikaans
  • Roald Eiselen, Centre for Text Technology, North-West University

DOI:

https://doi.org/10.55492/dhasa.v4i01.4437

Keywords:

under-resourced languages, speech resources, Mozilla Common Voice

Abstract

Despite many attempts to address the situation, South Africa's official languages remain under-resourced in terms of the text and speech data required to implement state-of-the-art language technology. To ensure that no language is left behind, resource development should remain a priority until a strong digital presence has been established for all indigenous languages. This paper provides an overview of previous projects that were specifically aimed at speech resource development and introduces an ongoing initiative to launch South Africa's languages on the Mozilla Common Voice platform.

Published

2023-01-25

How to Cite

Localising the Mozilla Common Voice platform for South Africa’s official languages. (2023). Journal of the Digital Humanities Association of Southern Africa, 4(01). https://doi.org/10.55492/dhasa.v4i01.4437