New uses for old books
Description of digitised corpora based on the Setswana language collection in the WITS Cullen Africana Collection
DOI: https://doi.org/10.55492/dhasa.v3i03.3819
Keywords: Setswana, Library, digitisation, genres, corpus
Abstract
This paper describes a corpus of 104 books separated from a larger collection of African language books. The books were catalogued using standard library and archival metadata. A subset was digitised and cleaned. The books were then divided into five subsets, which were compared against each other and against the entire Corpus. We also created tables of collocates and word frequencies and computed basic statistics on those words (see the tables in the appendix). We speculate that the Corpus as a whole could serve as a rough approximation of a general language register. We also give some examples of the characteristics of the genre subsets. The paper aims to introduce the Corpus to NLP researchers and to offer it for further research.
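The abstract mentions word-frequency tables, collocate tables and comparisons of genre subsets against the whole Corpus, but does not name the tools or measures used. Purely as an illustration, the following Python sketch shows one common way to compute such figures. The tokeniser, the function names, the ±2-token collocation window and the log-likelihood keyness measure are all assumptions chosen for the example, not the authors' documented method.

```python
import math
import re
from collections import Counter

# Assumption: a simple Unicode-aware tokeniser (letters plus internal
# apostrophes). A real Setswana pipeline may need to treat its disjunctive
# orthography (many short, space-separated words) differently.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def tokenise(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic tokens."""
    return TOKEN_RE.findall(text.lower())


def word_frequencies(tokens: list[str]) -> Counter:
    """Raw frequency of every token type."""
    return Counter(tokens)


def collocates(tokens: list[str], node: str, window: int = 2) -> Counter:
    """Count words occurring within +/- `window` tokens of each `node` hit.

    The window size is an illustrative assumption, not taken from the paper.
    """
    counts: Counter = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts


def log_likelihood(word: str, target: Counter, reference: Counter) -> float:
    """Log-likelihood keyness (G2) of `word` in a target subset vs a reference.

    a, b: frequency of `word` in target / reference;
    c, d: total tokens in target / reference.
    Assumption: this is one widely used keyness measure, shown only as an
    example of comparing a genre subset against the whole Corpus.
    """
    a, b = target[word], reference[word]
    c, d = sum(target.values()), sum(reference.values())
    e1 = c * (a + b) / (c + d)  # expected frequency in target
    e2 = d * (a + b) / (c + d)  # expected frequency in reference
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2
```

In use, each genre subset would be tokenised and passed to `word_frequencies`, and `log_likelihood` would then rank words by how strongly their frequency in that subset departs from the reference counts. Comparing a subset against a reference that already contains it (as "the entire Corpus" does) dampens the scores somewhat; comparing against the remaining subsets is a common alternative.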
License
Copyright (c) 2022 Journal of the Digital Humanities Association of Southern Africa
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.