Exploring Afrikaans word embeddings with analogies and nearest neighbours
DOI:
https://doi.org/10.55492/dhasa.v4i01.4443
Keywords:
Text embeddings, Afrikaans, Analogy, Evaluation, Low-resource Languages
Abstract
This paper explores word embeddings for Afrikaans using two evaluation methods: analogies and nearest neighbours. We compare three types of embeddings (fastText, FLAIR and GloVe) on a novel analogy data set for Afrikaans, inspired by the Bigger Analogy Test Set (BATS; Gladkova et al. 2016). Our analysis shows that for Afrikaans, as for English, the type of embedding influences the quality of the analogies found for different linguistic tasks. Our investigation also demonstrates, however, that these Afrikaans embeddings do not encode linguistic information as clearly as English embeddings do. The exact reason for this is left to future work, but the added morphological complexity and the lack of data most likely play a role.
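To illustrate the two evaluation methods named in the abstract, the following is a minimal sketch of a nearest-neighbour lookup and an analogy query over word vectors. It assumes cosine similarity and the widely used vector-offset formulation (b - a + c); the abstract does not state which scoring method the authors used. The function names and the three-dimensional toy vectors are hypothetical and invented purely for illustration; they are not taken from the paper's data set or from any of the three embedding models.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbours(query, embeddings, k=3, exclude=()):
    """Return the k words whose vectors are most similar to `query`."""
    scored = [
        (word, cosine(query, vec))
        for word, vec in embeddings.items()
        if word not in exclude
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def solve_analogy(a, b, c, embeddings, k=1):
    """Answer 'a is to b as c is to ?' with the vector-offset method.

    Assumption: the target is b - a + c, and the three query words
    are excluded from the candidates, as is common practice.
    """
    target = [
        vb - va + vc
        for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    return nearest_neighbours(target, embeddings, k=k, exclude={a, b, c})


# Hypothetical toy vectors, made up for this sketch only.
toy = {
    "man": [1.0, 0.0, 0.2],
    "vrou": [1.0, 1.0, 0.2],
    "koning": [0.2, 0.0, 1.0],
    "koningin": [0.2, 1.0, 1.0],
    "huis": [0.9, 0.1, -0.8],
}

analogy_answer = solve_analogy("man", "vrou", "koning", toy)
neighbours = nearest_neighbours(toy["koning"], toy, k=2, exclude={"koning"})
```

With these toy vectors, the query "man is to vrou as koning is to ?" ranks "koningin" first, and "koningin" is also the nearest neighbour of "koning". Real embeddings have hundreds of dimensions and far larger vocabularies, so in practice the same lookups would be run through the embedding library's own similarity functions.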
License
Copyright (c) 2023 Tanja Gaustad, Roald Eiselen
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.