House of the rising data
Blog: Capgemini CTO Blog
By Pierre-Adrien Hanania and Iftikhar Ahmed
Public organizations are becoming more and more data-driven. In 2020, 47% of public sector organizations stated that “decision making” in their “organizations is completely data-driven.”
Brace yourself, data-powered organizations are coming
This number will only increase in the coming years as more public services migrate to digital channels to meet citizen expectations. Whether it be AI judges in Estonia, cancer screening in hospitals, or the detection of bark beetles endangering Swedish forests, AI promises us unprecedented access to new insights. Even though it is exciting to think about the possibilities and opportunities public organizations have thanks to the increasing amount of data, it is equally important to understand their responsibility in using that data. Only if data is used effectively, efficiently, and securely, and the insights gathered are trustworthy and ethically processed, will the journey to fully data-driven governments succeed and eventually lead to the adoption of technologies such as AI.
As of today, only 9% of public organizations claim to have “successfully deployed [AI] use cases in production and continue to scale more throughout multiple business teams.” That’s why data governance needs to be addressed. Data governance includes many different aspects, but it can be broadly understood as the holistic approach to data management throughout an organization, whether it be a central command center for a city authority, a hospital dealing with capacity data, or a welfare agency leveraging case management. A good data governance framework addresses many questions:
- Data availability – Do I have access to the data matching my public service to be delivered?
- Relevance – Do I have the right data for these services?
- Usability – Is the data I collected leverageable?
- Integrity – Is the data non-biased, representative, and complete?
- Security – Is the data safely stored and protected against cyberattacks?
To answer these questions, a strong governance plan is needed, one that ensures the data is collected, stored, and used purposefully and efficiently. This is done by introducing standardized business processes that bring clear policies, procedures, roles, and responsibilities to all aspects of data management within the organization.
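Some of the questions above, in particular usability and integrity, can be partly turned into automated checks. The following is a minimal sketch under stated assumptions, not a description of any specific governance product: the record layout for hospital capacity data (`hospital_id`, `region`, `date`, `beds_total`, `beds_occupied`) and the names `check_records` and `QualityReport` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical required fields for a hospital capacity record (illustrative only).
REQUIRED_FIELDS = ("hospital_id", "region", "date", "beds_total", "beds_occupied")


@dataclass
class QualityReport:
    total: int      # number of records inspected
    complete: int   # records with every required field filled in
    invalid: int    # complete records that fail the plausibility rule

    @property
    def completeness(self) -> float:
        """Share of records with every required field present."""
        return self.complete / self.total if self.total else 0.0


def check_records(records: list[dict]) -> QualityReport:
    """Count complete records and, among those, implausible ones."""
    complete = 0
    invalid = 0
    for rec in records:
        # Usability: skip records with a missing or empty required field.
        if any(rec.get(field) in (None, "") for field in REQUIRED_FIELDS):
            continue
        complete += 1
        # Integrity: occupancy must lie between zero and total capacity.
        if not 0 <= rec["beds_occupied"] <= rec["beds_total"]:
            invalid += 1
    return QualityReport(total=len(records), complete=complete, invalid=invalid)
```

A report like this only covers what can be measured mechanically. Questions such as relevance or bias still need the policies, roles, and responsibilities that the governance plan defines.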
Data on its voyage through life
This quest for data governance impacts the entire data lifecycle. It starts with the discovery and cataloging of data, where data sources are identified and the data from these sources is centralized and aggregated to build a strong foundation for the following steps. If we take the example of a bed availability monitoring system, data on occupancy, weather, emergency helplines, and health agency reports needs to be consolidated.
This enables organizations to capture and understand their data better. Given the federated nature of IT systems in the public sector, activating data effectively and thoroughly is even more important. This happens in the data-enabling processes, where the data is prepared, data formats are standardized, and the source data is enriched so that the necessary insights can be gained. In our bed availability example, this means collecting a complete picture of availability across geographies and hospitals in order to identify the patterns and variables relevant to the situation.
During this step, the data is also normalized, and outliers are identified and processed. Identifying regions where bed availability is an issue, whether due to specific occurrences such as a new COVID-19 wave or to a structural deficit such as a permanent shortage of resources, helps to accurately locate the hot spots causing anomalies. Normalizing the data and handling anomalies improves data quality and ensures that the data is controlled and managed efficiently.
The final step is where real-life advantage is created through data-driven decision making. A hospital facing a pandemic can then make the best use of the gathered data, such as available beds or pandemic patterns, to make the best decisions on resource planning and patient reallocations. Only then, with a resilient and well-governed data playground guaranteeing process efficiency, does it become possible to embrace cutting-edge technologies such as AI.
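The normalization and outlier handling described for the bed availability example can be sketched as follows. This is one simple way to do it, not the method used in any particular platform: raw bed counts are normalized to an occupancy rate so that regions of different size become comparable, and regions whose rate deviates strongly from the mean (a z-score rule) are flagged for review. The region names, the figures in `SAMPLE_REGIONS`, and the threshold of 2.0 are made up for illustration.

```python
from statistics import mean, pstdev

# Made-up sample data: region -> (beds_occupied, beds_total).
SAMPLE_REGIONS = {
    "A": (60, 100),
    "B": (124, 200),
    "C": (87, 150),
    "D": (61, 100),
    "E": (118, 200),
    "F": (147, 150),
}


def occupancy_rates(regions: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Normalize raw counts to an occupancy rate between 0 and 1."""
    return {name: occupied / total for name, (occupied, total) in regions.items()}


def flag_outliers(rates: dict[str, float], threshold: float = 2.0) -> list[str]:
    """Return regions whose rate is more than `threshold` standard deviations from the mean."""
    values = list(rates.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All regions have the same rate, so nothing stands out.
        return []
    return sorted(name for name, rate in rates.items() if abs(rate - mu) / sigma > threshold)


hot_spots = flag_outliers(occupancy_rates(SAMPLE_REGIONS))
```

With this sample data, region F, at 98% occupancy against roughly 60% elsewhere, is the only one flagged. A z-score rule is sensitive to small samples and to the outliers themselves, so a real deployment would likely need a more robust statistic and a domain review of every flagged region.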
AI in the public sector will be governed, or won’t be at all
As in other fields, efficient data governance is key to the adoption of AI in public services. Because the public sector deals with sensitive and personal data and with crucial issues (security, justice, health, etc.), the above-mentioned pillars of data trust, security, and ethics are critical when it comes to the implementation of AI and other data-driven projects.
There is no room for breaches or other errors because, once lost, citizen trust will be very hard to regain. To address these concerns, Capgemini has envisioned the AI & Data Engineering offering. Throughout the data lifecycle (see Graph 1), Capgemini addresses all stages – from the platform foundation to activated data for AI and analytics execution (see Graph 2).
Graph 1: Our approach covers the journey from Data to Action through all stages
Graph 2: The result – the AI & Data Engineering Platform
Platform sweet platform – Data needs a trusted and resilient home
The customizable AI & Data Engineering Platform addresses the concerns that are specific to the public sector and provides a safe harbor for AI implementation in at least four ways:
- Steering centralized digestion of qualitative and quantitative data
Data in the public sector comes from wildly diverse sources with different data quality standards, and this diversity needs to be addressed and managed. To effectively address a use case such as welfare fraud detection, various risk indicators from existing governmental systems such as taxes, health insurance, residence, education, etc. need to be considered. The raw data is collected and stored by different agencies and ministries at state, municipal, and federal levels. Only if the collection and digestion of that data is managed well will trustworthy results be feasible. In the AI & Data Engineering Framework, the effective digestion and processing of different types of data is addressed as part of the “Data Foundation” building block.
- Building a resilient and robust platform
The pandemic saw a spike in cyberattacks against hospitals. In France, for example, the number of such cyberattacks almost quadrupled, underscoring the fact that resilience against such threats is key for the success of any data-driven organization. Such a safety standard can only be ensured if the platform on which the data is processed is secure and offers protection against any kind of cyberattack. The Capgemini Framework addresses and solves these concerns as part of the “Platform Foundation” building block.
- Ensuring the ethical use of data
Ethical AI is not a concern specific to the public sector, but the democratic requirements placed on crucial public services call for special attention. Data collection and processing through public services must be transparent and comprehensible for citizens, especially when the decision-making process is data-driven. Only then can citizen requirements regarding data ethics and explainability be satisfied. Public sector projects, for example an intelligent job matching engine, can only be successful if the general public accepts and supports the process outcome.
Data protection is another pillar of this quest for ethical AI. Ensuring the safety and integrity of citizen data is essential in public services, for example, in health agencies where patient data must be safely processed. With data anonymization techniques such as those addressed in the “Data Trust” block of the framework, embracing AI while meeting GDPR requirements is possible.
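To make the idea of protecting patient data before analysis more concrete, here is a minimal sketch of two common de-identification steps: replacing a direct identifier with a keyed hash and generalizing an exact age into a band. The source does not describe how the “Data Trust” block works internally, so everything here is an assumption for illustration, including the record fields (`patient_id`, `name`, `age`, `diagnosis`) and the function names. Note that keyed hashing is pseudonymization rather than full anonymization, and pseudonymized data is still treated as personal data under GDPR.

```python
import hashlib
import hmac


def pseudonymize_id(patient_id: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed HMAC-SHA256 digest."""
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age into a band such as '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Keep only the fields needed for analysis, in de-identified form.

    Hypothetical input fields: patient_id, name, age, diagnosis.
    The name is dropped entirely; the ID and age are transformed.
    """
    return {
        "patient_ref": pseudonymize_id(record["patient_id"], secret_key),
        "age_band": age_band(record["age"]),
        "diagnosis": record["diagnosis"],
    }
```

Using a secret key means the same patient always maps to the same reference, so records can still be linked across datasets, while someone without the key cannot recompute the mapping from known IDs. The key then becomes sensitive material that has to be stored and governed as carefully as the data itself.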
- Opening the way towards decision-supporting intelligence
In Capgemini research from 2020, 68% of city officials stated that smart city initiatives – which build on a strong digitization of processes – have helped them manage the COVID-19 crisis effectively. Be it for a city, a hospital, or a central government, the pandemic has proven how important it is to have a solid information infrastructure that enables decision makers to take evidence-based administrative action. Only with such an infrastructure and with the help of efficient data governance will it be possible to ensure successful AI projects in the public sector and make data-driven governments a reality. Building on its ability to master data, a data-driven organization is able to better understand the current situation, to predict coming occurrences, and to proactively make decisions based on the gathered and analyzed data (see the stages of data-driven governments in Graph 3).
Graph 3: The stages of data-driven governments
These benefits can be extended to every part of the public sector – from central command centers for smart territories to seamless smart borders for airport and security authorities; from data-driven hospitals gathering medical data for the good of the patient to end-to-end automated administrative processes in organizations dealing with heavy documentation; from efficient identification and control of real-time threats to more user-friendly and effective public services for the citizens.
Strong data platforms will be data’s rising sun moment
Data-driven governments are no longer a utopia; they are fast becoming reality. More and more public organizations are using the data they have access to, to improve their own insights and the services that they offer to citizens.
In Society 5.0, every part of the public sector has potential that depends on secure, robust, relevant, trustworthy, and ethical data. But only through effective digital governance can the data be managed and processed, throughout its whole lifecycle, in a way that fulfills all the requirements that public organizations have for it. It is clear that data-driven governments are the future. However, they will only have an impact if data mastery is introduced holistically throughout the organizational structures.