Dholuo (Luo)

dc.contributor.author: Wanjawa, Barack; Wanzare, Lilian D.A.; Indede, Florence; McOnyango, Owen; Ombui, Edward; Muchemi, Lawrence
dc.date.accessioned: 2026-03-24T15:06:26Z
dc.date.issued: 2024-01-24
dc.description: Acknowledgement of data collectors: Kiswahili - Rose Felynix, Khalid Kitito, Dr. Benard Okal; Luo - Jotham Ondu Ajiki, Dr. Jackline Okello, Jonathan Muga, Mercy Lavinca Oduoll; Luhyia (Logooli) - Salano Odari, Dr. Phillip Lumwamu; Luhyia (Bukusu) - Mactilda Nekesa Makana, Mulwale Martin; Luhyia (Marachi) - Yonah Weunda
dc.description.abstract: Primary data was collected from the respective language communities and included indigenous stories and other narratives from student compositions, native-language media stations, and publishers. The collection went beyond the conventional religious texts to include other genres, making the corpus more representative of everyday language use in the communities. Text data: 546 texts for Dholuo. Spontaneous speech data: 512 files (99 hr 3 min 8 sec) for Dholuo.
dc.identifier.citation: Wanjawa, B., Wanzare, L., Indede, F., McOnyango, O., Ombui, E., & Muchemi, L. (2023). Kencorpus: A Kenyan language corpus of Swahili, Dholuo and Luhya for natural language processing tasks. Journal for Language Technology and Computational Linguistics, *36*(2), 1–27. https://arxiv.org/abs/12081
dc.identifier.uri: https://doi.org/10.7910/DVN/6N5V1K
dc.identifier.uri: https://kencorpus.ke/handle/00254/39
dc.publisher: Harvard Dataverse
dc.subject: Datasets
dc.subject: low resource languages
dc.subject: African languages
dc.subject: Dataset curation
dc.subject: Dholuo
dc.title: Dholuo (Luo)
dc.type: Dataset

Files