A Cookbook for Community-driven Data Collection of Impaired Speech in Low-Resource Languages
Abstract
This study presents an approach for collecting speech samples to build Automatic Speech Recognition (ASR) models for impaired speech, particularly in low-resource languages. It aims to democratize ASR technology and data collection by developing a "cookbook" of best practices and training materials for community-driven data collection and ASR model building. As a proof of concept, this study curated the first open-source dataset of impaired speech in Akan, a widely spoken indigenous language in Ghana.
The study involved participants with speech impairments from diverse backgrounds. The resulting dataset, along with the cookbook and open-source tools, is publicly available to enable researchers and practitioners to create inclusive ASR technologies tailored to the unique needs of speech-impaired individuals. In addition, this study presents initial results from fine-tuning open-source ASR models to better recognize impaired speech in Akan.

Index Terms: automatic speech recognition, impaired speech, low-resource language, community engagement, democratizing AI

1. Introduction

Automatic Speech Recognition (ASR) technology has transformed human-human and human-computer communication. It facilitates understanding through real-time speech captioning [1], [2] and supports hands-free computing (e.g., email dictation, online information retrieval, and automatic language translation).
ASR is also used to control smart-home functions, such as changing television channels, adjusting heating, ventilation, and air conditioning, and controlling lighting. Despite this usefulness, most of these technologies do not cater to speech diversity and are typically optimized for 'standard' or typical speech. Consequently, they fail to benefit individuals with impaired speech, such as those with dysarthria, stammering, or cleft palate, who often experience reduced ASR accuracy. Prior studies have demonstrated the potential benefits of speech recognition technologies in English for various types of impaired speech [2], [3], [4]. While this benefits English speakers, it is imperative to extend similar technologies to low-resource languages (LRLs). LRL communities have limited access to assistive technologies and to speech and language therapy (SLT) services [5], [6], [7]. Hence, the availability of ASR technologies in LRLs will facilitate effective communication for those with speech impairments, especially in sub-Saharan Africa, where speech therapy resources are scarce [7].
This study is part of a larger initiative that seeks to collect, validate, and curate a large corpus of impaired speech in LRLs. It reports the findings of a pilot study on the Akan language of Ghana, discussing the methods, challenges, and lessons learned from the collection, validation, and use of the dataset to adapt ASR models.