Harnessing Google Translations to Develop a Readability Corpus for Sesotho: An Exploratory Study

Authors

  • Johannes Sibeko

DOI:

https://doi.org/10.55492/dhasa.v5i1.5010

Keywords:

Sesotho, Examination Texts, Machine Translation, Text Readability, Google Translate

Abstract

This article addresses the scarcity of gold-standard annotated corpora for readability assessment in Sesotho, a low-resource language. As a solution, we propose using translated texts to construct a readability-labelled corpus. Specifically, we investigate the feasibility of using Google Translate to translate texts from Sesotho to English and then manually post-editing the output. We evaluate the Google translations by comparing them with the human-post-edited versions using three metrics: BLEU, NIST, and RIBES. We also use the Ghent University readability demo to extract the readability levels of both the Google translations and the human-post-edited translations. The translation evaluation reveals substantial similarity between the machine translations and the corresponding human-post-edited texts. Moreover, the readability assessment and the comparison of text properties show a high level of consistency between the machine translations and the human-post-edited texts. These findings suggest that Google translations show promise for developing readability-labelled parallel datasets in low-resource languages such as Sesotho, and they highlight the potential of machine translation for building translated corpora for such languages. By evaluating Google translations of Sesotho educational texts and demonstrating the feasibility of using machine translation to support readability research in Sesotho, this study contributes to the development of Sesotho text readability measures.
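To make the comparison step in the abstract concrete, the following is a minimal sketch of a sentence-level BLEU score between a machine translation and its post-edited version. It is an illustration only, not the study's implementation: the abstract does not say which tooling was used, the function names `ngram_counts` and `sentence_bleu` are hypothetical, the example sentences are invented, and NIST and RIBES are not covered here.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, hypothesis, max_n=4):
    """Unsmoothed sentence-level BLEU with a single reference.

    reference:  the human-post-edited text (treated as the reference)
    hypothesis: the raw machine translation
    Tokenisation is plain whitespace splitting, which is an assumption.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not hyp:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or clipped == 0:
            # Without smoothing, one empty precision makes the score zero.
            return 0.0
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty: penalise hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    # Invented example sentences, not taken from the study's data.
    post_edited = "the learners read the story in class"
    machine = "the learners read a story in class"
    print(sentence_bleu(post_edited, machine, max_n=2))
```

A score of 1.0 means the machine translation is identical to the post-edited text, so higher scores indicate that less post-editing was needed. Because the sketch is unsmoothed, short sentences can score zero at higher n-gram orders; corpus-level scoring or smoothing would normally be used for real evaluation.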

Published

2024-02-19

How to Cite

Sibeko, J. (2024). Harnessing Google Translations to Develop a Readability Corpus for Sesotho: An Exploratory Study. Journal of the Digital Humanities Association of Southern Africa, 5(1). https://doi.org/10.55492/dhasa.v5i1.5010