Developing a Code-Mixed Sentiment Analysis Dataset of Xitsonga-English Music Reviews
DOI: https://doi.org/10.55492/dhasa.v5i1.5022
Keywords: Code-Mixed, Sentiment Analysis, Xitsonga-English Language
Abstract
Sentiment analysis is the process of classifying the sentiment expressed in a text as positive, negative or neutral. Code-mixed sentiment analysis applies this classification to text that combines two or more languages. Few studies have addressed sentiment analysis for South African code-mixed languages, largely because annotated datasets are lacking. The purpose of this study was to collect code-mixed text data for the Xitsonga-English language pair. The study collected Xitsonga-English code-mixed comments on music reviews from a YouTube channel. After the data was collected, the comments were tokenized using the Natural Language Toolkit (NLTK), a Python library. We then analyzed the comments for the presence of code-mixing. The collected Xitsonga-English code-mixed data would be suitable for building a sentiment analysis model.
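The abstract names two processing steps: tokenizing each comment and checking it for code-mixing. The sketch below illustrates both under simplifying assumptions and is not the study's pipeline. The regex tokenizer is a standard-library stand-in for NLTK (which the study used). The word lists `XITSONGA` and `ENGLISH` are tiny, hypothetical, illustrative lexicons, not resources from the study. The Code-Mixing Index (CMI) of Das and Gambäck is a commonly used measure that the study does not necessarily report.

```python
import re
from collections import Counter

# Hypothetical, illustrative mini-lexicons; NOT the study's resources.
# A real system would need a proper Xitsonga word list or a trained language-ID model.
XITSONGA = {"inkomu", "risimu", "leri", "swinene", "ndzi", "tsakile"}
ENGLISH = {"this", "song", "is", "very", "nice", "i", "love", "it", "thank", "you", "so", "much"}


def tokenize(text: str) -> list[str]:
    """Lowercase and split into word and punctuation tokens.

    A simplified stdlib stand-in for an NLTK tokenizer.
    """
    return re.findall(r"[\w']+|[^\w\s]", text.lower())


def tag_language(token: str) -> str:
    """Label a token 'ts' (Xitsonga), 'en' (English) or 'other' (punctuation, unknown words)."""
    if token in XITSONGA:
        return "ts"
    if token in ENGLISH:
        return "en"
    return "other"


def code_mixing_stats(text: str) -> dict:
    """Tokenize a comment, tag each token's language and measure code-mixing."""
    tokens = tokenize(text)
    tags = [tag_language(t) for t in tokens]
    counts = Counter(tag for tag in tags if tag != "other")
    n = len(tokens)
    u = n - sum(counts.values())  # language-independent ('other') tokens
    # CMI = 100 * (1 - max_i(w_i) / (n - u)); 0 for monolingual or untagged text.
    cmi = 100 * (1 - max(counts.values()) / (n - u)) if n > u else 0.0
    return {
        "tokens": tokens,
        "tags": tags,
        "is_code_mixed": len(counts) > 1,  # tokens from both languages present
        "cmi": cmi,
    }


if __name__ == "__main__":
    stats = code_mixing_stats("Inkomu, this song is very nice!")
    print(stats["is_code_mixed"], round(stats["cmi"], 1))  # True 16.7
```

Lexicon lookup is used here only because it keeps the example short and transparent. It cannot handle unseen words, spelling variation or words shared by both languages, which are common in YouTube comments, so a word-level language-identification model would be a more realistic choice for a full dataset.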
License
Copyright (c) 2024 Blessing Nkuna, Thipe I Modipa, Simon P. Ramalepe
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.