Algorithm for Assisting Grammarians when Extracting Phonological Conditioning Rules for Nguni languages
DOI: https://doi.org/10.55492/dhasa.v5i1.5013
Keywords: Language Technologies, Low-resource Languages, Data extraction, Phonological Conditioning, Natural Language Generation
Abstract
Text generation models, the core technology underpinning chatbots such as ChatGPT, must model sub-word processes such as phonological conditioning if they are to support morphologically complex African languages. We rely on explicit phonological conditioning rules, identified manually by grammarians, to determine how well such models perform for these languages. There is therefore a need to assist grammarians with computational solutions that increase their coverage of known rules. At present, no algorithms exist to extract the rules for such processes, which would enable the creation of better text generation models. We present a new algorithm for extracting phonological conditioning rules for Nguni languages. All the rules extracted by the algorithm are valid when the input word and its associated morphemes are judged to be valid. The algorithm has the potential to improve the productivity of grammarians and to enable the creation of modern text generation technologies that support and promote under-resourced languages.
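The abstract does not describe how the algorithm works. The following Python is a purely illustrative sketch and not the authors' algorithm. It assumes that a candidate rule can be read off by comparing a word's concatenated morphemes with its surface form, keeping the changed span only when it touches a morpheme boundary. The function name `extract_boundary_rule` and the rule format (`("a+u", "o")`, with `+` marking the boundary) are hypothetical. The examples use isiZulu vowel coalescence, for instance na + umuntu → nomuntu ("with a person").

```python
from typing import List, Optional, Tuple

# Hypothetical rule format: (underlying span with "+" at boundaries, surface span)
Rule = Tuple[str, str]


def extract_boundary_rule(morphemes: List[str], surface: str) -> Optional[Rule]:
    """Illustrative sketch (not the published algorithm): diff the
    concatenated morphemes against the surface word and return the changed
    span if it touches a morpheme boundary, else None."""
    underlying = "".join(morphemes)
    if underlying == surface:
        return None  # identical forms: no conditioning observed

    # Longest common prefix of the underlying and surface forms.
    p = 0
    while p < min(len(underlying), len(surface)) and underlying[p] == surface[p]:
        p += 1

    # Longest common suffix that does not overlap the prefix in either string.
    s = 0
    while (s < min(len(underlying), len(surface)) - p
           and underlying[-1 - s] == surface[-1 - s]):
        s += 1

    start, end = p, len(underlying) - s  # changed span in the underlying form
    surface_span = surface[p:len(surface) - s]

    # Character offsets of the morpheme boundaries in the underlying form.
    boundaries = []
    offset = 0
    for m in morphemes[:-1]:
        offset += len(m)
        boundaries.append(offset)

    # Keep only changes at a morpheme juncture.
    if not any(start <= b <= end for b in boundaries):
        return None

    # Rebuild the underlying span, marking internal boundaries with "+".
    marked = ""
    for i in range(start, end):
        if i in boundaries and i != start:
            marked += "+"
        marked += underlying[i]
    return (marked, surface_span)


print(extract_boundary_rule(["na", "umuntu"], "nomuntu"))  # ('a+u', 'o')
print(extract_boundary_rule(["na", "inja"], "nenja"))      # ('a+i', 'e')
```

The greedy prefix/suffix comparison commits to a single alignment. It cannot tell which of two identical vowels was deleted, as in na + amanzi → namanzi. Any practical extractor would need a principled way to resolve such ambiguities.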
License
Copyright (c) 2024 Zola Mahlaza, Langa Khumalo
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.