Stop words in Khoekhoe

Authors

  • Menno Van Zaanen South African Centre for Digital Language Resources
  • Alena Witzlack-Makarevich University of Cologne image/svg+xml

DOI:

https://doi.org/10.55492/v6i02.6746

Keywords:

Stop words, digital language resources, Khoekhoe

Abstract

Stop word lists are useful resources for filtering out words that typically carry little content. Filtering stop words can improve the efficiency and accuracy of data processing. Stop words are typically short and occur very frequently in texts. Stop word lists are language dependent, and many low-resource languages currently lack (accurate) stop word lists. In this article, we investigate how to create a frequency-based stop word list for Khoekhoe, a low-resource language spoken in Southern Africa. Because stop words carry little content, they can be expected to occur consistently across different texts. We therefore compare lists of the most frequent words across texts in different genres and identify which words appear in these lists consistently. For comparison, we examine the overlap of frequent words across English texts and check these words against a known English stop word list; we then compare these results with the overlap of frequent words across Khoekhoe texts. The results show a high overlap between genres for English, but a lower overlap between the Khoekhoe genres. This may be due to the different typological profile of Khoekhoe. Consequently, creating a stop word list for Khoekhoe is more complicated and most likely requires other techniques to produce a useful list.
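
The frequency-and-overlap idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`top_words`, `shared_words`, `jaccard`), the naive whitespace tokenisation, the cut-off `k`, the use of Jaccard similarity as the overlap measure, and the toy English sentences are all assumptions made purely for illustration.

```python
from collections import Counter
from string import punctuation


def top_words(text, k):
    """Return the set of the k most frequent tokens in `text`."""
    # Naive tokenisation (an assumption): lowercase, split on whitespace,
    # and strip ASCII punctuation from the edges of each token.
    tokens = [t.strip(punctuation) for t in text.lower().split()]
    counts = Counter(t for t in tokens if t)
    return {word for word, _ in counts.most_common(k)}


def shared_words(top_sets):
    """Words that appear in every genre's top-k set."""
    return set.intersection(*top_sets)


def jaccard(a, b):
    """Overlap of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy English data, invented for illustration only.
genres = {
    "news": "The cat and the dog and the bird.",
    "fiction": "The king and the queen and the crown.",
}
top = {name: top_words(text, k=2) for name, text in genres.items()}

# Words that are frequent in every genre are stop word candidates.
candidates = shared_words(list(top.values()))
```

Under these assumptions, words that survive the intersection across genres are candidate stop words, and `jaccard(top["news"], top["fiction"])` gives one possible measure of how strongly two genres agree. A low value, as the abstract reports for Khoekhoe, would suggest that frequency alone is not a reliable criterion.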

Published

2025-12-31

Section

Articles

How to Cite

Van Zaanen, M., & Witzlack-Makarevich, A. (2025). Stop words in Khoekhoe. Journal of the Digital Humanities Association of Southern Africa (DHASA), 6(2). https://doi.org/10.55492/v6i02.6746