This website was created by the OECD Observatory of Public Sector Innovation (OPSI), part of the OECD Public Governance Directorate (GOV).

Machine Learning Model for Type 2 Diabetes Mellitus Incidence Prediction for the People Medically treated with EDUS

The innovation was developed because the Costa Rican Social Security Fund seeks to work with strategies that allow the social security system to be sustainable given the limitations of public health and the increase in health spending.
The innovation is a machine learning predictive model that estimates the propensity of patients with an EDUS record to develop Type II Diabetes Mellitus.
It is innovative as it is the first predictive model based on clinical data records stored in EDUS.

Innovation Summary

Innovation Overview

The CCSS has the opportunity to take advantage of the information generated by the Unique Digital Health Record (EDUS) implemented at national level and use it to improve the preventive approach to patients, as well as optimize the use of resources for the sustainability of the social security system.

The innovation consists of a predictive machine learning model for analysing the behaviour of chronic non-communicable pathologies, such as Type II Diabetes Mellitus (DM2), using the clinical data records stored in the EDUS. This predictive approach is intended to let the Institution generate more value for the population through prevention and health promotion, improve the quality of life of patients with chronic non-communicable diseases and decrease related costs, so that public health services can be offered in a timely way with greater efficiency and quality.

The objective was to develop a predictive model that would estimate, in the first instance, the probability that a person in the population will suffer from DM2, using the data contained in the EDUS as a reference.

The CCSS benefits from the innovation because it uses the information contained in the EDUS to predict the incidence of DM2 in the undiagnosed population and to offer health services with timely prevention and treatment of chronic non-communicable diseases, starting with DM2, which favours the sustainability of Costa Rica's public health system.
It also benefits people at medium and high risk of developing the disease as they will be alerted early to take actions that decrease the risk and, ideally, prevent the disease.
A third beneficiary is the Ministry of Health of Costa Rica, as it will have predictive information grouped by region, gender and age groups that will allow it to carry out strategic actions aimed at stakeholders.
In the future, the innovation is expected to be integrated with the EDUS so that the doctor receives a risk alert and a suggested treatment plan for at-risk patients, depending on whether the risk is low, medium or high. It can be scaled through reports grouped by region, which support health campaigns in areas with a higher concentration of people at high risk.
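As a minimal sketch of how a predicted probability could be turned into the low, medium or high risk alert described above: the cut-offs (0.30 and 0.60), the function names `risk_band` and `build_alerts`, and the patient identifiers are illustrative assumptions, not values or names taken from the CCSS model.

```python
from typing import Dict

# Hypothetical cut-offs; the source does not state the real thresholds.
MEDIUM_THRESHOLD = 0.30
HIGH_THRESHOLD = 0.60


def risk_band(probability: float) -> str:
    """Map a predicted DM2 probability to a low, medium or high band."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= HIGH_THRESHOLD:
        return "high"
    if probability >= MEDIUM_THRESHOLD:
        return "medium"
    return "low"


def build_alerts(scores: Dict[str, float]) -> Dict[str, str]:
    """Return the band per patient, keeping only medium and high risk."""
    bands = {pid: risk_band(p) for pid, p in scores.items()}
    return {pid: band for pid, band in bands.items() if band != "low"}


# Made-up scores for three fictional patients.
example_scores = {"patient-a": 0.12, "patient-b": 0.45, "patient-c": 0.81}
alerts = build_alerts(example_scores)
print(alerts)
```

In a real deployment the cut-offs would presumably be set with clinicians and validated against observed outcomes rather than fixed in code.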

The creation of this model had the following phases:
1. Examine institutional databases in order to identify those variables available for the implementation of the predictive model.
2. Analyze data quality.
3. Prepare the data necessary for the creation of the Machine Learning model.
4. Model the data through the development of the Machine Learning model.
5. Evaluate and select the model that offers the highest accuracy in predicting the disease.
6. Implement and automate the Machine Learning model for predicting DM2 in the EDUS and create monitoring and maintenance strategies.
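The evaluation and selection phase (step 5) can be sketched as follows. This is only an illustration under stated assumptions: the candidate names `model_a` and `model_b`, the labels and the predictions are invented, and the source does not say which algorithms or metrics the CCSS team actually compared. Selection here follows the source's stated criterion, accuracy; recall is reported alongside it because, for a screening model, missed cases may matter as much as overall accuracy.

```python
from typing import Dict, List


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Dict[str, int]:
    """Count true/false positives and negatives for binary labels (1 = DM2)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label and prediction lists must have equal length")
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1
        elif truth == 0 and pred == 1:
            counts["fp"] += 1
        elif truth == 0 and pred == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def evaluate(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Return accuracy and recall for one candidate model."""
    c = confusion_counts(y_true, y_pred)
    total = sum(c.values())
    positives = c["tp"] + c["fn"]
    return {
        "accuracy": (c["tp"] + c["tn"]) / total if total else 0.0,
        "recall": c["tp"] / positives if positives else 0.0,
    }


# Invented hold-out labels and predictions from two hypothetical candidates.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
candidates = {
    "model_a": [1, 0, 0, 1, 0, 0, 1, 0],
    "model_b": [1, 1, 1, 1, 0, 1, 1, 0],
}

scores = {name: evaluate(y_true, preds) for name, preds in candidates.items()}
best = max(scores, key=lambda name: scores[name]["accuracy"])
print(best, scores[best])
```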

Innovation Description

What Makes Your Project Innovative?

- This is the first time that a machine learning predictive model has been developed based on data from the population served with EDUS in Costa Rica (approximately 5 million people).

- It required organizational change to understand needs, define secure access to information and make extensive use of cloud storage and development services.

- It involved a multidisciplinary team from various business units, who contributed their expertise in each area of work.

What is the current status of your innovation?

Currently, the model has completed the phases of the CRISP-DM methodology for its development and is ready for the Start-Up phase.
Once the visualization tools (information extraction) are available, it will be possible to clinically evaluate the validity of the model against the real results that are obtained from patients identified as high and medium risk, for example.
Technical guidelines are also being developed for clinicians so that they know the correct approach to take with a patient when a predictive alert is generated by artificial intelligence.

Innovation Development

Collaborations & Partnerships

The main participant was the CCSS project team, made up of engineers, statisticians and doctors from different levels, who contributed their knowledge to the decision-making process for the construction of the model.
A private company was involved as a Machine Learning Professional Services provider: it was responsible for guiding the team through the CRISP-DM phases and for requesting the information and clinical validation of results needed to develop the algorithm.

Users, Stakeholders & Beneficiaries

Citizens: by identifying individual risk, it will be possible to provide preventive care, education and early treatment plans for citizens who are at greater risk of developing the disease.
Government officials: the Ministry of Health, through a dashboard with information grouped by region, province, canton, district, age group and gender, will be able to validate the distribution of risk according to the criteria applied and focus public policy on reducing risk in the most critical areas.
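A grouped view of this kind can be sketched as a simple aggregation. The records, the field names (`region`, `age_group`, `gender`, `band`) and the function `high_risk_share` are assumptions made for illustration; the source does not describe the dashboard's data model.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Invented, already-scored records for fictional patients.
records: List[Dict[str, str]] = [
    {"region": "Region A", "age_group": "40-59", "gender": "F", "band": "high"},
    {"region": "Region A", "age_group": "40-59", "gender": "M", "band": "low"},
    {"region": "Region A", "age_group": "60+", "gender": "F", "band": "high"},
    {"region": "Region A", "age_group": "60+", "gender": "M", "band": "medium"},
    {"region": "Region B", "age_group": "40-59", "gender": "F", "band": "low"},
    {"region": "Region B", "age_group": "40-59", "gender": "M", "band": "high"},
    {"region": "Region B", "age_group": "60+", "gender": "F", "band": "low"},
    {"region": "Region B", "age_group": "60+", "gender": "M", "band": "low"},
]


def high_risk_share(
    rows: List[Dict[str, str]], keys: Sequence[str]
) -> Dict[Tuple[str, ...], float]:
    """Share of high-risk people in each group defined by `keys`."""
    totals: Dict[Tuple[str, ...], int] = defaultdict(int)
    high: Dict[Tuple[str, ...], int] = defaultdict(int)
    for row in rows:
        group = tuple(row[k] for k in keys)
        totals[group] += 1
        if row["band"] == "high":
            high[group] += 1
    return {group: high[group] / totals[group] for group in totals}


by_region = high_risk_share(records, ["region"])
by_region_gender = high_risk_share(records, ["region", "gender"])

# Rank regions so that the most critical areas come first.
ranking = sorted(by_region, key=lambda group: by_region[group], reverse=True)
print(ranking)
```

Passing a different list of keys (for example region and age group) gives the other breakdowns without changing the function.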

Innovation Reflections

Results, Outcomes & Impacts

This innovation makes it possible to:
- Provide the necessary components to carry out analyses on the volume of data recorded in the EDUS and other valid sources, to determine patterns and trends that relate the characteristics of population segments to the incidence of DM2.
- Implement visual analysis tools that help determine and measure specific strategies and objectives.
- Provide medical staff with access to specific information on cases at risk, so that they can work with the patient, starting from the consultation, on strategies that help reduce the chances of developing the disease.
- Provide physicians with indicators or alerts for each patient with a propensity for the disease, so that they can work on preventing situations that cause health deterioration.

Challenges and Failures

Access to relevant and diverse data is a primary hurdle. Incomplete or biased datasets can lead to model inaccuracies. Additionally, ensuring data privacy and compliance with regulatory standards poses a persistent challenge.
Model interpretability is crucial in a healthcare context, and creating a balance between accuracy and interpretability is often challenging.
The dynamic nature of health data introduces the risk of model obsolescence. Changes in medical guidelines, evolving diagnostic criteria, or new research findings may necessitate frequent model updates to maintain relevance and accuracy. Moreover, deployment challenges, such as integrating the model into existing healthcare systems, may hinder widespread adoption.
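One generic way to watch for the obsolescence risk mentioned above is to compare the distribution of model scores at training time with the distribution seen later. The sketch below uses the Population Stability Index (PSI), a common drift measure; it is a swapped-in illustration, not the monitoring strategy used by the CCSS, and the bin edges, score lists and function names are assumptions.

```python
import math
from typing import List, Sequence


def bin_shares(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of values falling in each bin delimited by sorted `edges`."""
    if not values:
        raise ValueError("values must not be empty")
    counts = [0] * (len(edges) + 1)
    for value in values:
        index = sum(1 for edge in edges if value >= edge)
        counts[index] += 1
    return [count / len(values) for count in counts]


def psi(
    baseline: Sequence[float],
    current: Sequence[float],
    edges: Sequence[float],
    eps: float = 1e-6,
) -> float:
    """PSI = sum over bins of (current - baseline) * ln(current / baseline)."""
    total = 0.0
    for b, c in zip(bin_shares(baseline, edges), bin_shares(current, edges)):
        b = max(b, eps)  # avoid division by zero and log(0) for empty bins
        c = max(c, eps)
        total += (c - b) * math.log(c / b)
    return total


# Invented risk scores: at training time and in a later period.
baseline_scores = [0.10, 0.20, 0.25, 0.40, 0.50, 0.70]
later_scores = [0.40, 0.50, 0.60, 0.70, 0.80, 0.90]
edges = [0.3, 0.6]  # hypothetical bin boundaries

drift = psi(baseline_scores, later_scores, edges)
print(round(drift, 3))
```

A value of zero means the two distributions match in every bin; larger values indicate a bigger shift. Any alerting cut-off would need to be chosen by the team, since the source gives none.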

Conditions for Success

The success of a machine learning project in predicting diabetes risk relies on a holistic approach that encompasses data quality, model robustness, interpretability, regulatory compliance, and effective integration into the healthcare ecosystem. Addressing these challenges requires interdisciplinary collaboration, continuous monitoring, and a commitment to refining both data and models over time.


Innovative in predicting type II diabetes risk through machine learning, this model has broad replication potential. Organizations, from healthcare providers to governments, can adopt and adapt it for preventive care. Already attracting interest, it optimizes resource allocation within my organization. Its applicability spans diverse agencies, enabling targeted interventions at a population level. Collaboration fosters model robustness, making it valuable for smaller organizations. Ongoing refinement ensures adaptability to evolving data, establishing it as a pivotal tool in global preventive healthcare efforts.

Lessons Learned

The importance of having complete and representative data is fundamental for the project's success.
Collaboration among data experts, healthcare professionals, and domain specialists is crucial for developing relevant and applicable models.
In healthcare settings, a clear understanding of how the model makes decisions is crucial for its acceptance and adoption by medical professionals.
The ability to update the model in response to changes in medical guidelines and new knowledge is essential for its relevance over time.
Rigorous adherence to privacy standards and medical regulations is necessary to ensure trust.
Scalability and Accessibility
Inter-organizational Collaboration

Year: 2023
Level of Government: National/Federal government


  • Evaluation - understanding whether the innovative initiative has delivered what was needed

Date Published:

2 July 2024
