Creating Bilingual Corpora for isiZulu: A Case Study from the University of KwaZulu-Natal
DOI:
https://doi.org/10.55492/v6i02.6745
Keywords:
Corpora, University of KwaZulu-Natal
Abstract
Although several bilingual resources exist, there is a lack of domain-specific, institutionally verified parallel corpora focusing on academic and administrative texts. Existing datasets such as the Autshumato English–isiZulu corpus, the UNISA English/Zulu Parallel Corpus, and the WebCrawl African Corpus hosted on GitHub provide valuable material but differ in accessibility, domain coverage, and documentation. To complement these initiatives, the University Language Planning and Development Office (ULPDO) at the University of KwaZulu-Natal has developed a curated isiZulu–English Parallel Corpus comprising 10,000 carefully aligned sentence pairs drawn from institutional and academic texts. This paper outlines the corpus compilation process, including data sourcing, cleaning, alignment, and validation, and discusses the key structural and linguistic challenges encountered. The resource contributes to translation studies, terminology development, and multilingual natural language processing, while supporting ongoing efforts to advance the digital presence and intellectualisation of isiZulu.
License
Copyright (c) 2025 Thandeka Mbali Gumede, Rooweither Mabuya, Njabulo Hadebe

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.