Building Common Ground from the Ground Up: Repair Infrastructure for Human–Agent Collaboration in African Languages
ABSTRACT
Theories of distributed teamwork portraying LLMs as remote collaborators are frequently constructed around an unexamined assumption: that collaborators share a natural language. For speakers of the vast majority of the world’s approximately 7,000 languages, this assumption does not hold—the LLM agents are not merely remote but functionally non-communicative as they do not share the languages of the users they supposedly collaborate with. Drawing on three years of work through the Centre for Digital Language Inclusion (CDLI), which has scaled community-driven speech recognition from one to thirteen African languages, we argue that linguistic asymmetry is the defining yet overlooked barrier to human–agent collaboration for the majority world. Communities who collect, validate, and steward speech data are not passive data sources but essential third-party collaborators performing the invisible labor that makes human–agent partnership possible. We extend the workshop’s dyadic model into a triadic one and apply a belonging-oriented design framework—grounded in disability justice and trauma-informed practice—to reframe this community labor as repair infrastructure: the collective work of building communicative common ground where none yet exists.
CCS Concepts: Human-centered computing → Collaborative and social computing; Human-centered computing → Accessibility → Accessibility technologies
Keywords: human–agent collaboration, common ground, belonging, repair infrastructure, low-resource languages, speech recognition, articulation work, digital language inclusion, Africa
1 INTRODUCTION
The advanced conversational capabilities of Large Language Models have shifted how users engage with these technologies, moving beyond the idea of a simple tool to be used towards more interactive dynamics of collaboration [1]. This conceptualization is grounded in CSCW research on distributed teamwork among humans [2]. There are indeed many analogies between how collaboration unfolds between users and LLM agents and how it unfolds among teams of human collaborators: in both cases, communication takes place through digital channels, frequently in text or spoken form; there is limited or no opportunity to share the same physical space; and iterative interaction is used to establish common ground for collaboration.
However, the comparison between these two types of collaboration contains an unexamined assumption: that the collaborators share a language. Clark and Brennan’s foundational account of common ground [3] takes mutual intelligibility as a starting condition—something collaborators have before the process of grounding begins. For remote human collaborators, this is generally true. For the majority of the world’s population attempting to interact with LLM agents, it is not.
Of the world’s approximately 7,000 languages, the vast majority are “low-resource”—lacking the digitised text and speech corpora that underpin both LLM training and automatic speech recognition [4]. For speakers of these languages, the agent is not a remote collaborator with whom common ground must be maintained. It is an entity with whom common ground must be constructed from scratch—and that construction requires collective, community-level effort that precedes and enables any individual interaction.
The question is not only how humans and agents collaborate, but who builds the conditions for that collaboration to become possible, and at what cost. In this position paper, we draw on three years of experience through the Centre for Digital Language Inclusion (CDLI) and on recent work in belonging-oriented design [5] to make two arguments. First, that linguistic asymmetry is the primary barrier to human–agent collaboration for the majority world, and that this barrier is largely absent from current discourse on agentic AI. Second, that the communities who collectively build the linguistic infrastructure for human–agent interaction are performing a form of repair work [6]—constructing the communicative bridge that agent designers failed to build—and that this labor deserves recognition, resourcing, and reciprocity within any account of human–agent collaboration.
2 CONSTRUCTING COMMON GROUND: THE CDLI CASE
The Centre for Digital Language Inclusion (CDLI), based at UCL and supported by Google.org, develops speech recognition technology for African languages [7]. Since its inception, CDLI has scaled from a single language dataset to thirteen languages across multiple countries, including Kenya, Ghana, Uganda, and Rwanda. Critically, the project does not follow a conventional extractive data pipeline. It employs a community-driven methodology in which local entrepreneurs and language communities lead the collection, validation, and curation of speech data [8].
This model was born of necessity. Low-resource languages lack the large-scale digital corpora that enable conventional supervised learning. But CDLI’s experience has revealed something more fundamental: the process of building speech datasets is not merely a technical prerequisite for ASR. It is a form of collaborative infrastructure work in which communities negotiate linguistic norms, dialectal variation, and the boundaries of what “counts” as their language in digital form. When a community in western Kenya debates whether to include Luhya sub-dialects or treat them as distinct datasets, they are not resolving a labelling problem—they are determining the scope of future human–agent communication.
The community is not a passive data source. It is an active agent in constructing the conditions under which human–agent collaboration can occur. The entrepreneur ecosystem model further illustrates this dynamic. CDLI trains and supports local entrepreneurs across Ghana, Kenya, Uganda, and Rwanda who serve as intermediaries between community language practices and the technical requirements of ASR systems [8]. These entrepreneurs recruit contributors, manage quality assurance, negotiate community consent, and sustain engagement over time. They perform what Schmidt and Bannon [9] termed articulation work—the coordinative labour that makes primary work possible but is invisible within the primary activity. In this case, the “primary work” is human–agent interaction.
The articulation work is everything a community must do before that interaction can happen at all.
To date, CDLI has collected over thirty hours of impaired speech data in Akan through its Ghana pilot, with data collection now extending across Kenya, Uganda, and Rwanda. The project partners with local institutions including the University of Ghana, Talking Tipps Africa, and Senses Hub Kenya, training local entrepreneurs and speech and language therapists to lead community-driven data collection in their own languages. Fine-tuning the open-source Whisper model on this community-collected data yielded a median relative word error rate reduction of 21.7% on the impaired speech test set, demonstrating both the feasibility of the approach and the scale of the performance gap that remains when communities are excluded from training.
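To make the reported metric concrete, the sketch below shows one way such an evaluation could be scored: per-speaker word error rates for a baseline and a fine-tuned model are computed with the jiwer library, and the median of the per-speaker relative reductions is taken. This is an illustrative sketch only; the record layout, speaker identifiers, and example transcripts are hypothetical and do not describe CDLI's actual evaluation pipeline.

```python
# Sketch: median relative word error rate (WER) reduction between a baseline
# ASR model and a fine-tuned one. The record layout, speaker IDs, and
# transcripts below are hypothetical placeholders, not CDLI data.
from collections import defaultdict
from statistics import median

import jiwer  # pip install jiwer

# Hypothetical evaluation records: (speaker_id, reference, baseline_hyp, finetuned_hyp)
records = [
    ("spk01", "me din de kofi", "me din de kofi", "me din de kofi"),
    ("spk01", "ɛte sɛn", "ete sen", "ɛte sɛn"),
    ("spk02", "yɛfrɛ me ama", "yefre ma", "yɛfrɛ me amma"),
]

# Group utterances by speaker so each WER is computed over that speaker's full set.
by_speaker = defaultdict(lambda: {"ref": [], "base": [], "ft": []})
for spk, ref, base_hyp, ft_hyp in records:
    by_speaker[spk]["ref"].append(ref)
    by_speaker[spk]["base"].append(base_hyp)
    by_speaker[spk]["ft"].append(ft_hyp)

relative_reductions = []
for spk, group in by_speaker.items():
    wer_base = jiwer.wer(group["ref"], group["base"])
    wer_ft = jiwer.wer(group["ref"], group["ft"])
    if wer_base > 0:
        # Fraction of the baseline's errors removed by fine-tuning for this speaker.
        relative_reductions.append((wer_base - wer_ft) / wer_base)

print(f"Median relative WER reduction: {median(relative_reductions):.1%}")
```

In practice such a script would read model outputs from files rather than in-line lists, and text normalisation choices (casing, diacritics, tone marking) materially affect WER for many African languages, which is one reason community validation of reference transcripts matters.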