Creating Bilingual Corpora for isiZulu: A Case Study from the University of KwaZulu-Natal

Authors

  • Thandeka Mbali Gumede, University of KwaZulu-Natal
  • Rooweither Mabuya, South African Centre for Digital Language Resources
  • Njabulo Hadebe, University of KwaZulu-Natal

DOI:

https://doi.org/10.55492/v6i02.6745

Keywords:

Corpora, University of KwaZulu-Natal

Abstract

Although several bilingual resources exist, domain-specific, institutionally verified parallel corpora focusing on academic and administrative texts are lacking. Existing datasets such as the Autshumato English–isiZulu corpus, the UNISA English/Zulu Parallel Corpus, and the WebCrawl African Corpus hosted on GitHub provide valuable material but differ in accessibility, domain coverage, and documentation. To complement these initiatives, the University Language Planning and Development Office (ULPDO) at the University of KwaZulu-Natal has developed a curated isiZulu–English Parallel Corpus comprising 10,000 carefully aligned sentence pairs drawn from institutional and academic texts. This paper outlines the corpus compilation process, including data sourcing, cleaning, alignment, and validation, and discusses the key structural and linguistic challenges encountered. The resource contributes to translation studies, terminology development, and multilingual natural language processing, and supports ongoing efforts to advance the digital presence and intellectualisation of isiZulu.

Published

2025-12-31

Section

Articles

How to Cite

Gumede, T. M., Mabuya, R., & Hadebe, N. (2025). Creating Bilingual Corpora for isiZulu: A Case Study from the University of KwaZulu-Natal. Journal of the Digital Humanities Association of Southern Africa (DHASA), 6(2). https://doi.org/10.55492/v6i02.6745