This website was created by the OECD Observatory of Public Sector Innovation (OPSI), part of the OECD Public Governance Directorate (GOV).

Machine Learning for Peace

The hard-fought gains of democratization have come under attack in many countries. To reverse this trend, policymakers and civil society need new tools to navigate sophisticated forms of democratic erosion. We combine recent advances in machine learning with massive web scraping to produce high-frequency data and forecasts predicting where democratic backsliding will occur and the specific forms it will take. We equip pro-democracy forces with advance warning to guide more strategic responses.

Innovation Summary

Innovation Overview

The third wave of democratization in the 1980s and 1990s saw the adoption of democratic rights and institutions across much of the world. However, this progress has proven delicate in recent years, as politicians, non-state actors, and authoritarian foreign powers have adopted new methods and technologies to increase repression and undermine democratic norms and institutions. For example, recent advances in technology have endowed governments with new repressive tools, including the ability to anticipate the behavior of citizens and civil society and engage in preemptive response. The Machine Learning for Peace Project (MLP) levels the playing field by equipping those working to defend democracy with new tools to navigate increasingly sophisticated attacks on human rights and civil society.

The global trend toward autocratization has brought new attention to the study of political regimes. Policymakers and practitioners now rely heavily on quantitative data to understand changing political conditions and design effective policies and interventions. However, existing data measure these changes annually and release data many months after changes have already transpired. Under the auspices of the United States Agency for International Development (USAID) and the INSPIRES Consortium, DevLab@Penn worked closely with USAID, Internews, the International Center for Not-for-Profit Law, and PartnersGlobal to design an innovative system that fills critical gaps in the information that these public and private institutions use to support democracy around the world.

MLP’s innovation consists of two major advances. First, we provide policymakers and civil society with high-frequency, up-to-date data on the occurrence of 42 types of important political events in more than 40 countries around the world. DevLab spent 4 years building an unprecedented data collection pipeline to generate systematic data on the world’s most politically important events. We use cutting-edge technologies to overcome the challenges that have stymied earlier attempts to leverage the proliferation of online news to track important political events.

We now track events on an unprecedented scale, incorporating locally produced information from each country for the first time and involving local partners in the construction and validation of these tools. Most importantly, we dramatically accelerate the provision of data to stakeholders in three ways: we continuously scrape online news published by international, regional, and domestic news sources; we use recent advances in Natural Language Processing to measure civic space events in near real time; and we publish data, updated on a quarterly basis, for every country and event that we track. To date, MLP has scraped and processed more than 70 million news stories from more than 100 online news sources written in 25 languages.

Second, we combine this massive data repository with high-frequency data on economic conditions and powerful machine learning tools to generate monthly forecasts of future changes in political conditions. Each quarter, we release forecasts predicting how we expect civic space activity broadly, and levels of activity surrounding each of our major political events, to change over the next 6 months. These forecasts are communicated through accessible visualizations meant for consumption by a non-technical audience. Furthermore, we accomplish this using ‘interpretable’ machine learning models that enforce transparency on the model’s decision-making process. For every forecast, we produce visualizations revealing the precise variables that lead a model to predict a future shock. When predicting major events or shocks to civic space, these interpretable models provide a way for practitioners with contextual knowledge to judge how reliable a prediction is likely to be based on the model’s decision-making process.
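As a minimal illustration of what an interpretable forecast can look like, the sketch below uses a simple additive model in which each variable's contribution to the prediction can be read off directly and ranked. This is not MLP's actual model, which the text above does not specify; the function name, variable names, and all numbers are hypothetical.

```python
def explain_forecast(intercept, coefficients, features):
    """Return an additive forecast and each variable's contribution to it.

    Hypothetical sketch: the forecast is intercept + sum(coefficient * value),
    so every variable's share of the prediction is directly visible.
    """
    contributions = {
        name: coefficients[name] * features[name] for name in coefficients
    }
    forecast = intercept + sum(contributions.values())
    # Rank variables by the size of their push on the forecast, up or down.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return forecast, ranked


# Hypothetical inputs for one country and one event type.
coefficients = {
    "protest_articles_last_month": 0.5,
    "inflation_rate": 2.0,
    "arrest_articles_last_month": -0.25,
}
features = {
    "protest_articles_last_month": 8,
    "inflation_rate": 1.5,
    "arrest_articles_last_month": 4,
}

forecast, ranked = explain_forecast(10.0, coefficients, features)
for name, contribution in ranked:
    print(f"{name}: {contribution:+.2f}")
print(f"forecast: {forecast:.2f}")
```

A practitioner with contextual knowledge could inspect the ranked list and judge whether the variables driving a predicted shock are plausible for the country in question.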

Since 2021, MLP has been disseminating our forecasts on a quarterly basis to policymakers at USAID and practitioners within the INSPIRES consortium to inform their decision-making around program implementation and development. In January 2022, MLP made this information publicly available for the first time by launching our website and interactive data dashboards at the Open Government Partnership Global Summit. Since that time, we have been working to increase awareness and use of this tool among policymakers, international NGOs, and local organizations involved in INSPIRES Consortium programming. In the coming months, we will expand the MLP project to include more countries and incorporate more diverse sources of news and information in our data production and forecasting. Ultimately, we hope these data can better equip civil society to counter democratic backsliding around the world.

Innovation Description

What Makes Your Project Innovative?

Our project stands out in three ways. First, we dramatically accelerate the provision of high-quality data to practitioners and policymakers who need to take timely, evidence-based action to counter attacks on human rights and democracy. Second, we collect data from local sources of news and information in more than 20 languages. Past efforts collected data only from highly structured, international sources. We use machine learning translation and a data science team to monitor small, local news sources that require highly adaptive scraping methods but provide unprecedented access to local knowledge. We also conduct a quarterly survey of activists in 19 countries to balance our results with local voices. Third, we go beyond merely tracking important historical events by using big data to produce accessible forecasts about future events that can be used by organizations focused on human rights and democracy-promotion to anticipate and plan for changing conditions.

What is the current status of your innovation?

MLP is currently producing quarterly updates to our data and forecasts for 40 countries and making this information publicly available on interactive data dashboards hosted on our project website. These dashboards allow users to interact with historical data and compare trends in the occurrence of 42 events across countries and over time. The dashboards also allow users to see our forecasts for 15 unique events and inspect the country-specific factors that increase or decrease the probability of changes in event activity. We are actively promoting the use of this information by hosting seminars, tutorials, and public events targeting policymakers and civil society groups. In addition to promoting the existing data products, we are constantly developing new tools and research products and improving the accuracy of our forecasting models. We plan to expand these efforts to include 12 new countries over the next 12 months.

Innovation Development

Collaborations & Partnerships

The MLP project was developed within the INSPIRES Consortium, a partnership between USAID, academic researchers, and international NGOs with deep ties to local civil society around the world. This creates a feedback loop, allowing policymakers and democracy practitioners to inform the development of research products, research products to inform the strategic decisions around policies and programs, and partner NGOs to disseminate research products to organizations on the ground.

Users, Stakeholders & Beneficiaries

Since 2021, our data and forecasts have been one piece of information used by USAID and INSPIRES Consortium partners to make decisions about the deployment of resources and design of programs. Since January 2022, this information has been publicly available on our website. In our last quarterly survey of activists in 19 countries, more than 80% of respondents indicated that they are likely to use this information to inform programming and communicate with stakeholders.

Innovation Reflections

Results, Outcomes & Impacts

Over the past two years, MLP has rapidly scaled our efforts across 40 countries, with funding secured to expand to another 12 countries over the next 12 months. We have also provided advance warning of several major political events, including major protests in Serbia, mass arrests in El Salvador, and political use of defamation cases in Tunisia. While the impact of these tools on decision-making is hard to quantify, in our last quarterly survey of 70 activists living in 19 countries, more than 80% of respondents indicated that they were likely to use MLP forecasts to inform programming decisions and communicate with stakeholders about the future. Our algorithms classify news articles according to the events being reported on with average accuracy above 80% across our 42 categories. Using a measure that summarizes overall civic space activity for each country, our forecasts currently explain an average of more than 70% of monthly variation across countries.
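To make the two headline metrics concrete, the sketch below shows one common way such figures can be computed: classification accuracy averaged over event categories, and the share of variation explained by a forecast (the coefficient of determination, R²). The text above does not state the exact formulas used, so both definitions are assumptions, and the function names and toy data are hypothetical.

```python
from collections import defaultdict


def mean_category_accuracy(true_labels, predicted_labels):
    """Average, over categories, of the share of articles labeled correctly.

    Assumption: per-category accuracy is measured among the articles that
    truly belong to that category.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, guess in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1
    per_category = [correct[category] / total[category] for category in total]
    return sum(per_category) / len(per_category)


def share_of_variation_explained(observed, forecast):
    """R squared: 1 minus residual sum of squares over total sum of squares."""
    mean = sum(observed) / len(observed)
    ss_total = sum((y - mean) ** 2 for y in observed)
    ss_residual = sum((y - f) ** 2 for y, f in zip(observed, forecast))
    return 1 - ss_residual / ss_total


# Hypothetical toy data, not MLP results.
true_labels = ["protest", "protest", "arrest", "arrest", "arrest", "arrest"]
predicted_labels = ["protest", "arrest", "arrest", "arrest", "arrest", "protest"]
accuracy = mean_category_accuracy(true_labels, predicted_labels)

observed_activity = [2, 4, 6, 8]
forecast_activity = [3, 4, 6, 7]
r_squared = share_of_variation_explained(observed_activity, forecast_activity)

print(f"mean category accuracy: {accuracy:.3f}")
print(f"share of variation explained: {r_squared:.3f}")
```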

Challenges and Failures

Building the data collection and processing infrastructure necessary to scrape, store, and process vast quantities of text from dozens of online news sources in more than 20 languages presented numerous challenges and was interrupted by a number of failures. Overcoming these obstacles has required the procurement and maintenance of advanced computation systems, the recruitment of full-time data scientists, and the close involvement of university data security and network specialists. Furthermore, communicating the insights of quantitative forecasts of future conditions and events to non-technical audiences took years of iteration. Producing accessible visualizations required multiple workshops and planning sessions testing different means of summarizing large amounts of information into easily digestible, concise visual representations. Finally, some events that we track have proven too unpredictable to forecast accurately.

Conditions for Success

For MLP’s efforts to be successful, countries must have a sufficiently developed media environment with multiple sources of online news with several years of accessible archives. While most countries in the world meet these criteria, extremely small countries or countries with protracted conflict may not be viable places to implement MLP activities. Furthermore, this project requires close collaboration between researchers with advanced technical skills and policymakers and practitioners with a desire to translate quantitative data into actionable insights.

Replication

Our data production pipeline presents many opportunities for adaptation to new problems. We produce monthly data and forecast 6 months into the future. However, the underlying data can track events at a weekly or even daily level, and new forecasting models could make predictions at different frequencies (e.g., weekly) or predict specific changes in events or combinations of events (e.g., sudden increases in violence or increased use of legal attacks on opponents). More importantly, our corpus of 70+ million news articles and our data processing infrastructure are already being repurposed to track new events or phenomena, such as polarization or misinformation, within our existing system. Furthermore, these articles contain additional information, such as the specific location where events occur (e.g., cities), that can be used to produce even more fine-grained data. We will partner with new organizations and agencies to provide data that helps address their specific missions.
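The point about tracking events at different frequencies can be sketched as follows: if each classified article keeps its publication date, the same records can be counted by day, week, or month. The record layout, event labels, and function names here are hypothetical and are not taken from the MLP pipeline.

```python
from collections import Counter
from datetime import date


def period_key(day, frequency):
    """Map a date to a daily, ISO-weekly, or monthly bin label."""
    if frequency == "daily":
        return day.isoformat()
    if frequency == "weekly":
        iso_year, iso_week = day.isocalendar()[0], day.isocalendar()[1]
        return f"{iso_year}-W{iso_week:02d}"
    if frequency == "monthly":
        return day.strftime("%Y-%m")
    raise ValueError(f"unsupported frequency: {frequency}")


def count_events(records, frequency):
    """Count records per (period, event) cell at the requested frequency."""
    counts = Counter()
    for day, event in records:
        counts[(period_key(day, frequency), event)] += 1
    return counts


# Hypothetical dated event records: (publication date, event label).
records = [
    (date(2022, 3, 7), "violence"),
    (date(2022, 3, 9), "violence"),
    (date(2022, 3, 15), "violence"),
    (date(2022, 3, 15), "legal_action"),
]

weekly = count_events(records, "weekly")
monthly = count_events(records, "monthly")
print(weekly)
print(monthly)
```

Keeping the date on each record, rather than storing only monthly totals, is what lets the same corpus feed models that run at different frequencies.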

Lessons Learned

Incorporating local voices and knowledge into big data has been much more challenging than we anticipated. While scraping data and information from highly structured, international, English-language sources of online news is relatively simple, expanding these efforts to include domestic news sources requires constant adaptation and advanced technical skills. However, our research demonstrates that these local sources provide vastly more information about their respective countries than international sources report. Furthermore, without the novel information provided by local sources, forecasting models are unable to make accurate predictions about the future.

Status:

  • Implementation - making the innovation happen

Date Published:

20 January 2023
