Developing a Code-Mixed Sentiment Analysis Dataset of Xitsonga-English Music Reviews

Authors

  • Blessing Nkuna
  • Thipe I. Modipa
  • Simon P. Ramalepe

DOI:

https://doi.org/10.55492/dhasa.v5i1.5022

Keywords:

Code-Mixed, Sentiment Analysis, Xitsonga-English Language

Abstract

Sentiment analysis is the process of classifying the emotion expressed in a text as positive, negative, or neutral. Code-mixed sentiment analysis refers to classifying the sentiment of text that contains two or more languages. Few studies have addressed sentiment analysis for South African code-mixed languages, owing to the absence of annotated datasets. The purpose of this study was to collect code-mixed text data for the Xitsonga-English language pair. The study collected Xitsonga-English code-mixed music review comments from a YouTube channel. After the data was collected, it was tokenized using the Natural Language Toolkit (NLTK), a Python library. Subsequently, we analyzed the comments for the presence of code-mixing. The collected Xitsonga-English code-mixed data would be suitable for building a sentiment analysis model.
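
The tokenization and code-mixing check described above can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say which NLTK tokenizer was used, so `wordpunct_tokenize` is an assumption (with an equivalent regular-expression fallback if NLTK is not installed). The two word lists are tiny hypothetical lexicons chosen for illustration only, and the helper names `language_counts` and `is_code_mixed` are likewise invented here.

```python
import re

try:
    # NLTK's regex-based tokenizer; needs no extra model downloads.
    from nltk.tokenize import wordpunct_tokenize
except ImportError:
    # Fallback with the same pattern, so the sketch runs without NLTK.
    def wordpunct_tokenize(text):
        return re.findall(r"\w+|[^\w\s]+", text)

# Hypothetical, deliberately tiny lexicons (assumptions for illustration).
# A real study would need far larger word lists or a language-ID model.
XITSONGA_WORDS = {"ndza", "rhandza", "risimu", "leri", "swinene"}
ENGLISH_WORDS = {"i", "love", "this", "song", "the", "beat", "is", "fire"}


def language_counts(comment):
    """Count tokens of a comment found in each illustrative lexicon."""
    tokens = [tok.lower() for tok in wordpunct_tokenize(comment)]
    return {
        "xitsonga": sum(tok in XITSONGA_WORDS for tok in tokens),
        "english": sum(tok in ENGLISH_WORDS for tok in tokens),
    }


def is_code_mixed(comment):
    """Treat a comment as code-mixed if both lexicons are matched."""
    counts = language_counts(comment)
    return counts["xitsonga"] > 0 and counts["english"] > 0


if __name__ == "__main__":
    example = "Ndza rhandza risimu leri, the beat is fire!"
    print(wordpunct_tokenize(example))
    print(language_counts(example), is_code_mixed(example))
```

A lexicon lookup is the simplest possible criterion. It misses words outside the lists and words shared by both languages, so it serves only as a first-pass filter before manual inspection or annotation.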

Published

2024-02-19

How to Cite

Developing a Code-Mixed Sentiment Analysis Dataset of Xitsonga-English Music Reviews. (2024). Journal of the Digital Humanities Association of Southern Africa, 5(1). https://doi.org/10.55492/dhasa.v5i1.5022