Lumarachi

Wanjawa, Barack; Wanzare, Lilian D.A.; Indede, Florence; McOnyango, Owen; Ombui, Edward; Muchemi, Lawrence

dc.contributor.author	Wanjawa, Barack; Wanzare, Lilian D.A.; Indede, Florence; McOnyango, Owen; Ombui, Edward; Muchemi, Lawrence
dc.date.accessioned	2026-03-24T14:42:53Z
dc.date.issued	2024-01-24
dc.description	Acknowledgement of data collectors: Kiswahili: Rose Felynix, Khalid Kitito, Dr. Benard Okal; Luo: Jotham Ondu Ajiki, Dr. Jackline Okello, Jonathan Muga, Mercy Lavinca Oduoll; Luhyia (Logooli): Salano Odari, Dr. Phillip Lumwamu; Luhyia (Bukusu): Mactilda Nekesa Makana, Mulwale Martin; Luhyia (Marachi): Yonah Weunda
dc.description.abstract	Primary data was collected from the respective language communities and included indigenous stories and other narratives drawn from student compositions, native-language media stations, and publishers. The collection went beyond the conventional religious texts to include other genres, making the corpus more representative of everyday language use in the communities. Text data: 483 texts for Luhya-Lumarachi. Spontaneous speech data: 138 files (15 hr 37 min 46 sec) for Luhya-Lumarachi.
dc.identifier.citation	Wanjawa, B., Wanzare, L., Indede, F., McOnyango, O., Ombui, E., & Muchemi, L. (2023). Kencorpus: A Kenyan language corpus of Swahili, Dholuo and Luhya for natural language processing tasks. Journal for Language Technology and Computational Linguistics, 36(2), 1–27. https://arxiv.org/abs/12081
dc.identifier.uri	https://doi.org/10.7910/DVN/6N5V1K
dc.identifier.uri	https://kencorpus.ke/handle/00254/37
dc.publisher	Harvard Dataverse
dc.subject	Datasets
dc.subject	low resource languages
dc.subject	African languages
dc.subject	Dataset curation
dc.subject	Lumarachi
dc.title	Lumarachi
dc.title.alternative	Luhya-Lumarachi
dc.type	Dataset