Exploring Afrikaans word embeddings with analogies and nearest neighbours

Authors

  • Tanja Gaustad, Centre for Text Technology, North-West University
  • Roald Eiselen, South African Centre for Digital Language Resources, North-West University

DOI:

https://doi.org/10.55492/dhasa.v4i01.4443

Keywords:

Text embeddings, Afrikaans, Analogy, Evaluation, Low-resource Languages

Abstract

This paper explores word embeddings for Afrikaans using analogy and nearest-neighbour evaluation. We compare three types of embeddings (fastText, FLAIR and GloVe) on a novel analogy data set for Afrikaans, inspired by the Bigger Analogy Test Set, BATS (Gladkova et al. 2016). Our analysis shows that for Afrikaans, as for English, the type of embedding influences the quality of the analogies found for different linguistic tasks. Our investigation also demonstrates, however, that these Afrikaans embeddings do not encode linguistic relations as clearly as English embeddings do. The exact reason for this is left to future work, but the added morphological complexity and the lack of data most likely play a role.
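To make the two evaluation ideas concrete, the following is a minimal sketch of nearest-neighbour lookup by cosine similarity and of analogy solving with the vector offset method (often called 3CosAdd). It is an illustration only and not necessarily the procedure used in the paper: the three-dimensional vectors in `TOY` are invented for the example and are not taken from any fastText, FLAIR or GloVe model, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_neighbours(query, embeddings, k=3, exclude=()):
    """Return the k words whose vectors are most similar to `query`."""
    scored = [
        (word, cosine(query, vec))
        for word, vec in embeddings.items()
        if word not in exclude
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

def solve_analogy(a, b, c, embeddings, k=1):
    """Answer 'a is to b as c is to ?' with the vector offset b - a + c."""
    target = [
        vb - va + vc
        for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    # The three question words are excluded from the candidate answers.
    return nearest_neighbours(target, embeddings, k=k, exclude={a, b, c})

# Invented toy vectors (assumption): real embeddings have hundreds of dimensions.
TOY = {
    "man": [1.0, 0.0, 0.2],
    "vrou": [0.0, 1.0, 0.2],
    "koning": [1.0, 0.0, 1.0],
    "koningin": [0.0, 1.0, 1.0],
    "appel": [0.5, 0.5, -1.0],
}

if __name__ == "__main__":
    # man : vrou :: koning : ?
    print(solve_analogy("man", "vrou", "koning", TOY))
    # Closest words to "koning" in the toy space
    print(nearest_neighbours(TOY["koning"], TOY, k=2, exclude={"koning"}))
```

With real embeddings, an analogy data set such as the one described here would be scored by counting how often the top-ranked candidate matches the expected answer for each linguistic category.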

Published

2023-01-25

How to Cite

Exploring Afrikaans word embeddings with analogies and nearest neighbours. (2023). Journal of the Digital Humanities Association of Southern Africa, 4(01). https://doi.org/10.55492/dhasa.v4i01.4443