Common Voice Scripted Speech 25.0 - Kalenjin

Mbogho, A., Awuor, Q., Kipkebut, A., Wanzare, L., & Oloo, V.

Date

2025-01-19

Authors

Mbogho, A., Awuor, Q., Kipkebut, A., Wanzare, L., & Oloo, V.

Publisher

Mozilla Foundation

Abstract

This datasheet is for cv-corpus-25.0-2026-03-09 of the Mozilla Common Voice Scripted Speech dataset for Kalenjin [kln - kln]. The dataset contains 70,042 clips representing 88.12 hours of recorded speech (40.65 hours validated) from 41 speakers, recorded from a text corpus of 29,961 sentences.

Description

This dataset is released under the Creative Commons Zero (CC-0) licence. By downloading this data, you agree not to determine the identity of speakers in the dataset.

Keywords

Kalenjin

Citation

Mbogho, A., Awuor, Q., Kipkebut, A., Wanzare, L., & Oloo, V. (2025). Building low-resource African language corpora: A case study of Kidaw'ida, Kalenjin and Dholuo. arXiv. https://doi.org/10.48550/arXiv.2501.11003

URI

https://datacollective.mozillafoundation.org/datasets/cmn2e84ge01kqmm07esyp21xq
https://kencorpus.ke/handle/00254/46

Collections

No Specified Dialect (Kalenjin)
