Creating Bilingual Corpora for isiZulu: A Case Study from the University of KwaZulu-Natal

Thandeka Mbali Gumede; Rooweither Mabuya; Njabulo Hadebe

doi:10.55492/v6i02.6745

Authors

Thandeka Mbali Gumede, University of KwaZulu-Natal
Rooweither Mabuya, South African Centre for Digital Language Resources
Njabulo Hadebe, University of KwaZulu-Natal

Keywords:

Corpora, University of KwaZulu-Natal

Abstract

Although several bilingual resources exist, there is a lack of domain-specific, institutionally verified parallel corpora focusing on academic and administrative texts. Existing datasets such as the Autshumato English–isiZulu corpus, the UNISA English/Zulu Parallel Corpus, and the WebCrawl African Corpus hosted on GitHub provide valuable material but differ in accessibility, domain coverage, and documentation. To complement these initiatives, the University Language Planning and Development Office (ULPDO) at the University of KwaZulu-Natal has developed a curated isiZulu–English Parallel Corpus comprising 10,000 carefully aligned sentence pairs drawn from institutional and academic texts. This paper outlines the corpus compilation process, including data sourcing, cleaning, alignment, and validation, and discusses the key structural and linguistic challenges encountered. The resource contributes to translation studies, terminology development, and multilingual natural language processing, while supporting ongoing efforts to advance the digital presence and intellectualisation of isiZulu.
