Data Analytics for efficient healthcare systems, by Ricard Gavaldà, Amalfi Analytics

(On leave from UPC). Received October 19th, 2021.

The context

Europe and the developed world are ageing quickly. About 23% of Europeans are over 65 today, and this share is estimated to reach 40% by 2040. This puts enormous stress on the pillars of the welfare state, in particular retirement pensions, social services for the elderly, and healthcare. Given that healthcare already uses about 20% of public spending, large increases in funding are unlikely.

The outlook is bleak, but there is one hope: there are huge inefficiencies in the healthcare system, and therefore opportunities to get more from the same resources. Avoidable hospitalizations, unnecessary medication and tests, and lack of coordination among healthcare agents are estimated to cost several hundred billion euros yearly in the EU.

The Case for Activity Data Analytics

Technology can help locate and reduce these inefficiencies. Within technology, one promising route is the full exploitation of the data that the system already collects to record its activity.

Amalfi Analytics is a spinoff of the UPC. It originated when the author of this note, a pure academic doing research on Machine Learning, met Dr. Julianna Ribera, a medical doctor. Julianna’s 30+ year career includes many management positions in the Catalan healthcare services, in hospitals, primary care, and planning at the territorial level. She felt that the people in charge of managing healthcare, who have a clinical view but are also aware of resource constraints, did not have adequate tools.

Amalfi wants to provide managers with the data analytics tools that Julianna always wanted to have as a manager: tools to predict needs, anticipate demand, and allocate resources accordingly.

Discovering insights in clinical healthcare records

Consider the ANIS platform we have developed for analyzing clinical healthcare records. It focuses on improving care for patients with chronic diseases. It takes as input the activity data that hospitals use for billing the government or insurance companies. For each visit to the healthcare provider, this data contains the date, basic patient data, and diagnoses and treatments, all in standardized codes.

This information is certainly too basic for precision medicine or personalized diagnosis. But, first, the hospital is already extracting it and, second, it describes population health status accurately enough to decide on health policies, care protocols, and planning.

For example, ANIS contains a powerful clustering algorithm derived from Matteo Ruffini’s Ph.D. thesis [1,2]. It can partition the set of patients of a given pathology into clear subgroups with distinct comorbidities, treatments, medication, and clinical results. Then, instead of treating all these patients uniformly with a single clinical guideline, clinicians can decide to adapt the protocols to specific subgroups and route patients differently within the system. We have seen that this invariably leads to both reduced costs for the system and increased patient safety.

Critical to the practical success of this approach are the special features of the clustering algorithm. It is based on tensor decomposition techniques and on maximizing the likelihood of the data given the clusters. It tends to find more interesting structure in high-dimensional data such as ours. It is not distracted by irrelevant variables. And it does not require an explicit “similarity” or “distance” function to work; we cannot expect a clinician to define such a function when starting a study.
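
To make the "no distance function" point concrete, here is a minimal sketch of likelihood-based clustering on binary code data. It is not the ANIS algorithm: instead of tensor decomposition it uses plain expectation-maximization (EM) on a latent class model (a mixture of independent Bernoulli variables), chosen only because it is short. The function name, the toy data, and the four hypothetical columns (say, two codes typical of one subgroup and two typical of another) are all assumptions made for illustration.

```python
import math
import random

def fit_latent_classes(rows, k, iters=50, seed=0):
    """Fit a k-class Bernoulli mixture to 0/1 rows with EM.

    Returns (weights, probs, labels): class weights, the per-class probability
    of each code being present, and the most likely class for each row.
    No similarity or distance function is needed, only the likelihood.
    """
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    weights = [1.0 / k] * k
    # Random starting point, kept away from 0 and 1.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(d)] for _ in range(k)]
    resp = []
    for _ in range(iters):
        # E-step: posterior probability of each class for each patient.
        resp = []
        for x in rows:
            logp = []
            for c in range(k):
                lp = math.log(weights[c])
                for j in range(d):
                    p = probs[c][j]
                    lp += math.log(p if x[j] else 1.0 - p)
                logp.append(lp)
            m = max(logp)
            unnorm = [math.exp(v - m) for v in logp]
            z = sum(unnorm)
            resp.append([u / z for u in unnorm])
        # M-step: re-estimate weights and code probabilities (smoothed so
        # that no probability reaches exactly 0 or 1).
        for c in range(k):
            total = sum(r[c] for r in resp)
            weights[c] = (total + 1e-9) / (n + k * 1e-9)
            for j in range(d):
                hits = sum(r[c] for r, x in zip(resp, rows) if x[j])
                probs[c][j] = (hits + 0.5) / (total + 1.0)
    labels = [max(range(k), key=lambda c: r[c]) for r in resp]
    return weights, probs, labels

# Toy data (hypothetical): six patients carrying codes 0 and 1,
# six patients carrying codes 2 and 3.
rows = [[1, 1, 0, 0]] * 6 + [[0, 0, 1, 1]] * 6
weights, probs, labels = fit_latent_classes(rows, k=2)
```

On this toy input the two groups of patients end up in different classes, and `probs` can be read directly as the comorbidity profile of each subgroup. EM only finds a local optimum of the likelihood, which is one reason methods with guarantees, such as those in [1,2], are attractive.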

ANIS contains other features, such as algorithms for efficiently mining k-ary associations among diagnoses [3].
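
As an illustration of what a k-ary association is, the sketch below counts how often each set of k codes appears together in the same visit and keeps the sets that reach a minimum support. It is a brute-force stand-in, not the closed frequent itemset miner of [3], and would not scale to real volumes. The function name and the toy visits are assumptions; the codes are ICD-10 categories used only as examples.

```python
from collections import Counter
from itertools import combinations

def frequent_code_sets(visits, k, min_support):
    """Return {code_set: count} for every k-sized set of codes that occurs
    together in at least min_support visits (brute-force counting)."""
    counts = Counter()
    for codes in visits:
        # Sort so that each set of codes has one canonical tuple.
        for combo in combinations(sorted(set(codes)), k):
            counts[combo] += 1
    return {combo: c for combo, c in counts.items() if c >= min_support}

# Toy visits (hypothetical): each visit is the set of diagnosis codes recorded.
visits = [
    {"E11", "I10", "N18"},
    {"E11", "I10"},
    {"E11", "I10", "N18"},
    {"J44", "I10"},
]
pairs = frequent_code_sets(visits, k=2, min_support=2)
triples = frequent_code_sets(visits, k=3, min_support=2)
```

Here `pairs` keeps the three pairs that co-occur at least twice, and `triples` keeps the single triple seen in two visits.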

We find it especially gratifying that the algorithms above all have rigorous mathematical guarantees of performance and also work well in practice. They are fast and, most of the time, they provide interesting insights to the clinicians who use them.

Predictive tools for managing resources

While ANIS is designed for occasional, in-depth analysis, other platforms developed at Amalfi are designed for daily routine use at various parts of the healthcare system.

For example, Emergency Rooms (ER) are one of the hot spots in any hospital. The APIS platform uses historical activity data (plus other data) to train predictive models of activity. It predicts the influx to the ER (how many people will arrive, and with what kind of problems), how many people will occupy each space of the ER, and how many of these patients will have to be hospitalized, with time horizons ranging from hours to weeks. This lets the manager anticipate peaks of activity and congestion, and rearrange staff and resources before chaos occurs.
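
The models behind APIS are not described in this note. As a point of reference, the sketch below shows the kind of simple seasonal baseline that any such predictor would have to beat: the average number of arrivals for each (weekday, hour) slot in the history. The function names, the input format, and the toy counts are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def fit_hour_of_week_means(arrivals):
    """arrivals: list of (datetime at the start of an hour, arrival count).
    Returns the mean count for each (weekday, hour) slot seen in the history."""
    sums = defaultdict(float)
    n = defaultdict(int)
    for ts, count in arrivals:
        slot = (ts.weekday(), ts.hour)
        sums[slot] += count
        n[slot] += 1
    return {slot: sums[slot] / n[slot] for slot in sums}

def forecast(means, start, hours, default=0.0):
    """Predict arrivals for `hours` consecutive hours starting at `start`.
    Slots never seen in the history fall back to `default`."""
    out = []
    for h in range(hours):
        ts = start + timedelta(hours=h)
        out.append((ts, means.get((ts.weekday(), ts.hour), default)))
    return out

# Toy history (hypothetical counts): two consecutive weeks, same weekday.
history = [
    (datetime(2021, 10, 4, 10), 12),
    (datetime(2021, 10, 11, 10), 18),
    (datetime(2021, 10, 4, 11), 8),
    (datetime(2021, 10, 11, 11), 10),
]
means = fit_hour_of_week_means(history)
pred = forecast(means, datetime(2021, 10, 18, 10), hours=3)
```

A real model would add further signals (the "other data" mentioned above) and would predict occupancy and hospitalizations as well as arrivals.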

A major challenge in these platforms is the robustness and autonomy of the predictive models or, in other words, coping with the fact that the distribution of the data arriving in the system is anything but stationary. Some changes occur gradually (e.g., an ageing population), but others occur suddenly (e.g., the hospital changes some protocol, or COVID-19 explodes). The platform needs to check continuously for changes and then revise parts of the model, or throw away the existing model altogether and create a new one. Here our tool has been the algorithmics of machine learning on data streams [4].
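
To illustrate what "continuously checking for changes" can look like, here is a minimal sketch of the Page-Hinkley test, a classical detector for an upward shift in the mean of a stream. It is used here only as an example; this note does not say which detectors the platform actually runs. The class name, the default parameters, and the toy stream are assumptions.

```python
class PageHinkley:
    """Page-Hinkley test for an increase in the mean of a stream.

    delta: magnitude of change that is tolerated without reacting.
    threshold: how much accumulated evidence triggers an alarm.
    """

    def __init__(self, delta=0.5, threshold=20.0):
        self.delta = delta
        self.threshold = threshold
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0     # running mean of the stream
        self.cum = 0.0      # cumulative deviation from the running mean
        self.cum_min = 0.0  # smallest cumulative deviation seen so far

    def update(self, x):
        """Feed one observation; return True if a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold

def first_alarm(stream, **kwargs):
    """Return the index of the first alarm in the stream, or None."""
    detector = PageHinkley(**kwargs)
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None

# Toy stream (hypothetical): the level jumps from 10 to 20 at index 50.
alarm_at = first_alarm([10.0] * 50 + [20.0] * 50)
```

In a deployed system the monitored quantity would more likely be the prediction error of a model, and an alarm would trigger a call to `reset()` together with retraining or replacement of that model.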

In conclusion

There are innumerable problems in healthcare management, large and small, that can benefit from good Machine Learning / data analytic algorithms. Imagination for matching healthcare problems to algorithms is crucial. But if these algorithms are to be deployed in hospitals without continuous supervision by a data scientist, the strengths and limitations of algorithms should be well understood mathematically — something that we feel is still lacking in much of Deep Learning, for example.

Given the highly sensitive nature of healthcare data and the necessary legal restrictions, we are at the moment interested in the practicality of two research areas: Federated Learning, which allows learning from separate sources of data without exchanging them, and Differential Privacy, which allows Machine Learning on data while provably limiting what can be inferred about any individual.

Both are still largely untested in healthcare settings (in fact, in any practical setting). We already have a working implementation of a Federated Learning framework.
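
For readers unfamiliar with Federated Learning, the sketch below shows federated averaging in its simplest form: each site improves a shared model on its own data, and only model parameters, never patient records, travel to the coordinator, which averages them weighted by sample count. This is a generic textbook scheme and not the framework mentioned above. The model (a one-variable linear regression), the function names, and the two toy "hospitals" are assumptions.

```python
def local_update(w, b, data, lr=0.05, steps=20):
    """Run a few gradient steps of least squares on one site's private data
    and return the updated parameters of the model y = w * x + b."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum((w * x + b - y) * x for x, y in data) / n
        grad_b = sum((w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def federated_average(sites, rounds=200):
    """Coordinator loop: send the current model to every site, collect the
    locally updated parameters, and average them weighted by sample count."""
    w, b = 0.0, 0.0
    total = sum(len(d) for d in sites)
    for _ in range(rounds):
        updates = [(local_update(w, b, d), len(d)) for d in sites]
        w = sum(params[0] * n for params, n in updates) / total
        b = sum(params[1] * n for params, n in updates) / total
    return w, b

# Toy data (hypothetical): both sites observe the same relation y = 2x + 1,
# but on different patients; the raw points never leave each site.
hospital_a = [(x, 2 * x + 1) for x in (0.0, 1.0, 2.0)]
hospital_b = [(x, 2 * x + 1) for x in (1.0, 2.0, 3.0, 4.0)]
w, b = federated_average([hospital_a, hospital_b])
```

On this noiseless toy problem the averaged model recovers a slope close to 2 and an intercept close to 1. Sharing parameters instead of records does not by itself guarantee privacy, which is one reason the two research areas are often studied together.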

References

[1] M. Ruffini. Learning Latent Variable Models: Efficient Algorithms and Applications. Ph.D. Thesis, UPC, 2019. Advisor: R. Gavaldà.

[2] M. Ruffini, M. Casanellas, R. Gavaldà. A new method of moments for latent variable models. Machine Learning 107, 8-10 (2018): 1431-1455.

[3] M. Quadrana, A. Bifet, R. Gavaldà. An Efficient Closed Frequent Itemset Miner for the MOA Stream Mining System. AI Communications 28 (2015), 143-158.

[4] A. Bifet, R. Gavaldà, G. Holmes, B. Pfahringer. Machine Learning for Data Streams, with Practical Examples in MOA. MIT Press, 2018.