Siswati Part of Speech Tagger: A Quantitative Evaluation

Authors

  • Muzi Matfunjwa, South African Centre for Digital Language Resources, South Africa

DOI:

https://doi.org/10.55492/v6i02.6749

Keywords:

Siswati, Part of speech tagger, Recall, Precision, Human Language Technology

Abstract

This article evaluates the performance of the Siswati Text Annotation Tool part-of-speech (STAT POS) tagger using the Recall, Precision and F1 score metrics. A quantitative research design was adopted for the analysis, and data was collected through purposive sampling. Python 3 was used to calculate the Recall and Precision of the STAT POS tagger outputs. The results are as follows: for nouns, Recall 0.761, Precision 0.417 and F1 score 0.54; for verbs, Recall 0.756, Precision 0.798 and F1 score 0.78; for adverbs, Recall 0.571, Precision 0.8 and F1 score 0.67; for possessives, Recall 0.963, Precision 0.813 and F1 score 0.88; for relatives (REL), Recall 0.706, Precision 0.523 and F1 score 0.60; for class-indicating demonstratives, Recall 0.333, Precision 0.25 and F1 score 0.29; for copulatives (COP), Recall 0.75, Precision 0.75 and F1 score 0.75; for conjunctions, Recall 0.85, Precision 0.68 and F1 score 0.76; for pronouns, Recall 0.563, Precision 1.0 and F1 score 0.72; and for adjectives, Recall 0.75, Precision 0.75 and F1 score 0.75. However, question words, interjections and ideophones scored 0.0, highlighting the need to refine the STAT POS tagger.
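The abstract does not show how the metrics were computed, so the following is only a minimal sketch, not the author's script. It assumes that the gold-standard annotation and the tagger output are token-aligned sequences of tags. The tag labels N, V, POSS and QUE, along with the functions `per_tag_scores` and `f1_score`, are hypothetical placeholders rather than the STAT POS tagset or API. For each tag, Recall = TP / (TP + FN), Precision = TP / (TP + FP), and the F1 score is their harmonic mean. When a tag is never predicted correctly, all three metrics evaluate to 0.0.

```python
from collections import Counter


def f1_score(recall: float, precision: float) -> float:
    """Harmonic mean of Recall and Precision; defined as 0.0 when both are 0."""
    if recall + precision == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def per_tag_scores(gold: list[str], predicted: list[str]) -> dict[str, tuple[float, float, float]]:
    """Return {tag: (recall, precision, f1)} for token-aligned gold and predicted tags.

    Hypothetical helper: assumes one gold tag and one predicted tag per token.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted sequences must be token-aligned")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1  # correct tag: a true positive for that tag
        else:
            fn[g] += 1  # the gold tag was missed (false negative)
            fp[p] += 1  # the predicted tag was assigned wrongly (false positive)
    scores = {}
    for tag in set(gold) | set(predicted):
        recall = tp[tag] / (tp[tag] + fn[tag]) if tp[tag] + fn[tag] else 0.0
        precision = tp[tag] / (tp[tag] + fp[tag]) if tp[tag] + fp[tag] else 0.0
        scores[tag] = (recall, precision, f1_score(recall, precision))
    return scores


# Toy, made-up example: six tokens with hypothetical tags.
gold = ["N", "V", "N", "POSS", "QUE", "V"]
predicted = ["N", "V", "V", "POSS", "N", "V"]
for tag, (r, p, f) in sorted(per_tag_scores(gold, predicted).items()):
    print(f"{tag}: R={r:.2f} P={p:.2f} F1={f:.2f}")
# N: R=0.50 P=0.50 F1=0.50
# POSS: R=1.00 P=1.00 F1=1.00
# QUE: R=0.00 P=0.00 F1=0.00
# V: R=1.00 P=0.67 F1=0.80

# F1 recomputed from the reported noun Recall (0.761) and Precision (0.417).
print(round(f1_score(0.761, 0.417), 2))  # 0.54
```

Applying `f1_score` to a reported Recall/Precision pair is a quick consistency check. For example, the noun values 0.761 and 0.417 give 0.54, and the pronoun values 0.563 and 1.0 give 0.72.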

Published

2025-12-31

Section

Articles

How to Cite

Siswati Part of Speech Tagger: A Quantitative Evaluation. (2025). Journal of the Digital Humanities Association of Southern Africa (DHASA), 6(2). https://doi.org/10.55492/v6i02.6749