Preserving a language is paramount to safeguarding cultural identity, fostering communication and expression, preserving knowledge and embracing linguistic diversity. Language preservation efforts help promote a more inclusive and diverse world. When coupled with digital technology, they can also help counter the current bias of speech technology towards the most widely spoken languages.
In our small but mighty Baltic nation of Estonia, a project is underway to preserve the Estonian language while also improving the accessibility of digital government services. Estonia’s Ministry of Economic Affairs and Communications (MKM) and the State Information System Agency used crowdsourcing to launch the “Donate a Speech” campaign in September 2022. By April 2023, the campaign had already collected over 100 hours of spoken Estonian. These recordings feed into an open-source database. The database will be used to develop speech-technology services such as text-to-speech and speech-to-text, as well as speech-recognition and voice-controlled software.
Ott Velsberg, Chief Data Officer of Estonia, says that Estonian language technology needs constant contributions if it is to support the development of solutions for our society. Currently, the speech recognition accuracy for spontaneous Estonian speech (including accents and dialects) is approximately 85%. As more language material is collected, we expect a real boost to the quality of Estonian speech recognition in the near future, with the goal of raising accuracy to at least 91% in the coming years.
The project was launched as part of the Estonian Language Strategy 2021-2035, which focuses “on developing and strengthening Estonian but also sets strategic goals for foreign language learning” (ECML). Through the “Donate a Speech” campaign, Estonia also hopes to empower companies, public sector institutions and research institutions to develop new services. These services would harness speech technology to make service delivery more efficient.
“The more varied the voice samples we receive, the better we can train Estonian speech recognition systems. This allows us to enhance the services that are partly or fully based on speech recognition,” adds the Chief Data Officer of Estonia.
Better speech recognition systems will give the approximately 1 million Estonian-speaking people better and faster access to both public and private services. This access is particularly important for people with hearing impairments. For example, automatic subtitles based on speech recognition have already been introduced for Estonian National Broadcasting TV shows. As the work continues, the aim is to improve the accuracy of the solution. The aim is also to increase the number of broadcasting channels that offer automatic subtitles. This will further strengthen social inclusion by giving everyone better access to public and private media.
Of course, there are many other fields where Estonian speech recognition is already used or could be used in the future, including meeting transcription and customer-service phone calls. Speech recognition will also be integrated into Bürokratt, the Estonian state’s virtual assistant. Bürokratt is an AI-driven tool that lets citizens communicate with public sector agencies through a single channel. With the help of data collected through “Donate a Speech”, Estonia hopes to make the technology more sophisticated. The goal is to improve voice recognition and service delivery not only for native Estonian speakers, but also for people who speak Estonian as a second language or who speak different Estonian dialects.
The results of the “Donate a Speech” project hold great potential for both the general public and the public sector. The collected speech material and transcriptions will be published on the Estonian Open Data Portal. This makes them freely available to anyone interested in developing language technology, and it supports the preservation of the Estonian language.
When discussing projects that collect language material, it is important to remember that easily accessible services can only be achieved through continuous hard work and co-operation. Language is a living organism that changes together with the people who speak it. So where one project ends, others should begin!
This guest blog is inspired by the Global Trends in Government Innovation 2023 report from the OECD and MBRCGI. The report presents four major trends in government innovation for 2023, including Trend 3, “Preserving identities and strengthening equity”. Through this blog series, we hope to provide a deeper understanding of the current state and future direction of government innovation. We also hope to spark conversations about how governments can continue to improve and innovate in the years ahead.