A project at the Data Science for Social Impact (DSFSI) group, University of Pretoria, led by Professor Vukosi Marivate, is developing a talking health chatbot in African languages to provide accessible, culturally relevant health information to underserved communities. Central to the initiative is the planned use of health actuality TV and radio programmes produced by the South African Broadcasting Corporation (SABC) as training data, which raises legal and ethical questions about the use of publicly funded broadcast materials in AI. Until South Africa’s pending copyright reforms take effect, the project may be held up, which points to the need for harmonised legal frameworks for AI for Good in Africa.
1. Health Inequality in South Africa and the Role of AI
South Africa has some of the world’s highest levels of health inequality, shaped by historical, economic, and social factors. Urban centres are relatively well resourced, but rural and peri-urban areas face critical shortages of health professionals and infrastructure. Language and literacy barriers widen these disparities, because many people cannot access health information in their mother tongue. Digital health interventions, especially those using artificial intelligence, offer a way to bridge these gaps by delivering accurate, on-demand health information in local languages. An AI-powered chatbot can help users make informed decisions, understand symptoms, and navigate the healthcare system, promoting greater equity in health outcomes.
2. What is Natural Language Processing (NLP)?
Natural Language Processing (NLP) is a subfield of artificial intelligence that enables computers to understand, interpret, and generate human language. NLP powers applications such as chatbots, voice assistants, and automated translation tools, making it crucial for digital inclusion, especially for speakers of underrepresented languages.
3. Project Overview
The DSFSI health chatbot project aims to build an AI-powered conversational agent that delivers reliable health information in multiple African languages. The project’s mission is to address health literacy gaps and promote equitable access to vital information, particularly in communities where language and resource barriers persist.
4. Data Sources and Key Resources
A distinctive feature of the project is its intention to use health actuality programmes broadcast by the SABC as primary training data. These programmes offer authentic dialogues in various African languages and cover a wide range of health topics relevant to local communities. However, the use of SABC broadcast material introduces significant legal and ethical complexities. The DSFSI team has spent years negotiating with the SABC for permission to use these programmes as training data but has yet to receive a definitive answer, leaving the project in a state of legal uncertainty.
5. Legal and Ethical Challenges
Copyright and Licensing
SABC’s health actuality programmes are protected by copyright, with all rights typically reserved by the broadcaster. Using these materials for AI training without explicit permission may constitute copyright infringement, regardless of educational or social impact goals.
Contractual Restrictions
Even if SABC content is publicly accessible, the broadcaster’s terms of use or licensing agreements may explicitly prohibit reuse, redistribution, or data mining.
Absence of Research Exceptions
South African copyright law currently lacks robust exceptions for text and data mining (TDM) or research use. This contrasts with European Union law, which provides TDM exceptions, and United States law, which has the Fair Use doctrine.
Data Privacy and Community Engagement
If the chatbot is later trained on user interactions or collects personal health information, the project must also comply with the Protection of Personal Information Act (POPIA) and ensure meaningful informed consent from all participants.
6. Public Funding and the Public Interest Argument
A significant dimension in negotiations with the SABC is the broadcaster’s funding structure. The SABC operates under a government charter and receives substantial public subsidies, with direct grants and bailouts accounting for about 27% of its 2022/2023 revenue. This strengthens the argument that SABC-produced content should be accessible for public interest projects, particularly those addressing urgent challenges like health inequality and language inclusion. Many in the research and innovation community contend that publicly funded content should be available for projects benefiting the broader public, especially those focused on health literacy and digital inclusion.
7. The WIPO Broadcasting Treaty: A New Layer of Complexity
The international copyright landscape is evolving, with the World Intellectual Property Organization (WIPO) currently negotiating a Broadcasting Treaty. Recent drafts propose granting broadcasters, including public entities like the SABC, new exclusive rights over their broadcast content, independent of the underlying copyright. Some drafts suggest these new rights could override or negate existing copyright exceptions and limitations, including those that might otherwise permit uses for research, education, or public interest projects. If adopted in its current form, the WIPO Broadcasting Treaty could further restrict the ability of researchers and innovators to use broadcast material for AI training, even when the content is publicly funded or serves a vital social function.
8. The Copyright Amendment Bill: Introducing Fair Use in South Africa
A potentially transformative development is the Copyright Amendment Bill, which aims to introduce a Fair Use doctrine into South African law. Modelled on the U.S. system, the doctrine would allow limited use of copyrighted material without permission for research, teaching, and public interest innovation, which are the core activities of the DSFSI health chatbot initiative. If enacted, the Bill would provide a much-needed legal pathway for researchers to use materials like SABC broadcasts for AI training, provided the use is fair and non-commercial and does not undermine the market for the original work. However, the Bill has faced significant opposition and delays, and is currently under review by the Constitutional Court, leaving its future uncertain.
9. Contractual or Policy Barriers
In the absence of clear research exceptions, the project team must review the SABC’s terms of use and, where necessary, negotiate permissions or licences for the intended use of broadcast content. Without such agreements, the project may be forced to exclude valuable data sources or pivot to community-generated content.
10. Cross-Border and Multi-Jurisdictional Issues
If the chatbot expands to use or serve content from other African countries, it will encounter a patchwork of copyright and data protection laws, further complicating compliance and cross-border collaboration.
11. Conclusions
The challenges faced by the DSFSI health chatbot project underscore the urgent need for clearer copyright exceptions and harmonised legal frameworks to support socially beneficial AI research in Africa. Policymakers should consider introducing research and TDM exceptions, while broadcasters and public institutions could play a vital role by making culturally significant content available for responsible research use. The experience also highlights the importance of aligning public funding with public access, especially for projects serving the public good. The evolving international legal landscape, including the WIPO Broadcasting Treaty negotiations and the uncertain fate of the Copyright Amendment Bill, makes it even more urgent for stakeholders to advocate for balanced rights and robust exceptions that enable innovation and public benefit.