Logooli

Wanjawa, Barack; Wanzare, Lilian D.A.; Indede, Florence; McOnyango, Owen; Ombui, Edward; Muchemi, Lawrence

Date

2024-01-24

Publisher

Harvard Dataverse

Abstract

Primary data was collected from the respective language communities and included indigenous stories and other narratives drawn from student compositions, native-language media stations, and publishers. The collection went beyond conventional religious texts to cover other genres, making the corpus more representative of everyday language use in these communities. Text data: 359 texts for Luhya-Logooli. Spontaneous speech data: 44 files (12 hr 26 min 55 s) for Luhya-Logooli.

Description

Acknowledgement of data collectors:
Kiswahili: Rose Felynix, Khalid Kitito, Dr. Benard Okal
Luo: Jotham Ondu Ajiki, Dr. Jackline Okello, Jonathan Muga, Mercy Lavinca Oduoll
Luhya (Logooli): Salano Odari, Dr. Phillip Lumwamu
Luhya (Bukusu): Mactilda Nekesa Makana, Mulwale Martin
Luhya (Marachi): Yonah Weunda

Keywords

Datasets, low resource languages, African languages, Dataset curation, Logooli

Citation

Wanjawa, B., Wanzare, L., Indede, F., McOnyango, O., Ombui, E., & Muchemi, L. (2023). Kencorpus: A Kenyan language corpus of Swahili, Dholuo and Luhya for natural language processing tasks. Journal for Language Technology and Computational Linguistics, 36(2), 1–27. https://arxiv.org/abs/12081

URI

https://doi.org/10.7910/DVN/6N5V1K
https://kencorpus.ke/handle/00254/36

Collections

Logooli
