New uses for old books

Description of digitised corpora-based on the Setswana language collection in the WITS Cullen Africana Collection

Authors

  • Malebogo Rahlao, The University of the Witwatersrand, Johannesburg
  • Nina Lewin, The University of the Witwatersrand, Johannesburg
  • Taariq Surtee, The University of the Witwatersrand, Johannesburg

DOI:

https://doi.org/10.55492/dhasa.v3i03.3819

Keywords:

Setswana, Library, digitisation, genres, corpus

Abstract

This paper describes a corpus of 104 books separated from a larger collection of African-language books. The books were catalogued using standard library and archival metadata. A subset was digitised and cleaned. The books were then divided into five subsets, which were compared against each other and against the entire Corpus. We also created tables of collocates and word frequencies, and computed basic statistics on those words (see the tables in the appendix). We speculate that the Corpus as a whole could serve, roughly, as a general language register. We also give some examples of the characteristics of the genre subsets. The paper aims to introduce the Corpus to NLP researchers and to offer it for further research.
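As a rough illustration of what word-frequency and collocate tables involve, the sketch below counts tokens and their neighbours within a fixed window. It is not the authors' pipeline: the function names, the window size, and the three-sentence sample are assumptions made for illustration, and the sample text is not drawn from the Corpus.

```python
from collections import Counter
import re


def tokenize(text):
    # Lowercase the text and keep runs of Unicode letters, so that
    # accented letters are kept inside a token rather than splitting it.
    return re.findall(r"[^\W\d_]+", text.lower())


def word_frequencies(tokens):
    # Map each distinct token to the number of times it occurs.
    return Counter(tokens)


def collocates(tokens, node, window=2):
    # Count the tokens found within `window` positions on either side
    # of each occurrence of `node`. Sentence boundaries are ignored.
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                counts[tokens[j]] += 1
    return counts


# Toy sample for illustration only (hypothetical, not from the Corpus).
sample = "Ke a leboga. Ke a go rata. Ke a tsamaya."
tokens = tokenize(sample)
freqs = word_frequencies(tokens)
near_ke = collocates(tokens, "ke", window=1)

print(freqs.most_common(3))
print(near_ke.most_common(3))
```

Running the same functions over each genre subset and over the whole Corpus would give tables that can be compared side by side, although a real study would likely need tokenisation rules tuned to Setswana orthography.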

Published

2022-02-24

How to Cite

New uses for old books: Description of digitised corpora-based on the Setswana language collection in the WITS Cullen Africana Collection. (2022). Journal of the Digital Humanities Association of Southern Africa, 3(03). https://doi.org/10.55492/dhasa.v3i03.3819