Lubukusi
Publisher
Harvard Dataverse
Abstract
Primary data were collected from the respective language communities and included indigenous stories and other narratives from student compositions, native-language media stations, and publishers. The collection went beyond conventional religious texts to include other genres, making the corpus more representative of everyday language use in the communities. Text data: 135 texts for Luhya-Lubukusu. Spontaneous speech data: 354 files (30 hr 11 min) for Luhya-Lubukusu.
Description
Acknowledgement of data collectors:
Kiswahili - Rose Felynix, Khalid Kitito, Dr. Benard Okal
Luo - Jotham Ondu Ajiki, Dr. Jackline Okello, Jonathan Muga, Mercy Lavinca Oduoll
Luhya (Logooli) - Salano Odari, Dr. Phillip Lumwamu
Luhya (Bukusu) - Mactilda Nekesa Makana, Mulwale Martin
Luhya (Marachi) - Yonah Weunda
Citation
Wanjawa, B., Wanzare, L., Indede, F., McOnyango, O., Ombui, E., & Muchemi, L. (2023). Kencorpus: A Kenyan language corpus of Swahili, Dholuo and Luhya for natural language processing tasks. Journal for Language Technology and Computational Linguistics, *36*(2), 1–27. https://arxiv.org/abs/12081