The determinants of health expenditure: a machine learning approach

Caravaggio, Nicola; Lagravinese, Raffaele; Resce, Giuliano
2026-01-01

Abstract

Accurate prediction of healthcare costs is essential for decision-making, policy design, financial planning, and effective resource management, yet traditional econometric models do not address this policy challenge adequately. This paper uses machine learning (ML) to predict healthcare expenditure in systems with heterogeneous regional needs. The Italian NHS serves as a case study, with administrative data spanning 1996 to 2019. The empirical analysis implements four ML algorithms (Elastic-Net, Gradient Boosting, Random Forest, and Support Vector Regression) and a multivariate regression as a baseline. Gradient Boosting emerges as the best-performing algorithm in out-of-sample prediction; models trained on data up to 2018 also forecast 2019 expenditure robustly. Important predictors of expenditure include temporal factors and technological progress, average family size and the share of public expenditure in the total, regional area, population and the share of foreign residents, GDP per capita and labour activity, and the share of the elderly population (75 years old and over). The model's effectiveness indicates that ML can be used to predict expenditure and, on that basis, to distribute national healthcare funds across areas with heterogeneous needs.
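
The comparison described in the abstract (four ML algorithms plus a regression baseline, trained on data up to 2018 and evaluated on 2019) can be illustrated with a minimal sketch. It is not the authors' pipeline: the panel below is synthetic, the 20 regions, the two predictors (`gdp_per_capita`, `share_elderly`), and the data-generating formula are hypothetical, and every model uses untuned scikit-learn settings. It only shows the shape of a temporal hold-out comparison.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic region-year panel: 24 years (1996-2019) x 20 hypothetical regions.
# These are NOT the paper's data; names and coefficients are assumptions.
years = np.repeat(np.arange(1996, 2020), 20)
n = years.size
gdp_per_capita = rng.normal(size=n)
share_elderly = rng.normal(size=n)
X = np.column_stack([years - 1996, gdp_per_capita, share_elderly])
y = (0.05 * (years - 1996) + 0.5 * gdp_per_capita
     + 0.3 * share_elderly ** 2 + rng.normal(scale=0.1, size=n))

# Temporal hold-out: fit on 1996-2018, evaluate on 2019 only.
train = years <= 2018
test = years == 2019

# Untuned defaults; scale-sensitive models get a StandardScaler in front.
models = {
    "Linear regression (baseline)": LinearRegression(),
    "Elastic-Net": make_pipeline(StandardScaler(), ElasticNet(alpha=0.01)),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Support Vector Regression": make_pipeline(StandardScaler(), SVR()),
}


def evaluate(models, X, y, train, test):
    """Fit each model on the training rows and return its RMSE on the test rows."""
    scores = {}
    for name, model in models.items():
        model.fit(X[train], y[train])
        pred = model.predict(X[test])
        scores[name] = float(np.sqrt(mean_squared_error(y[test], pred)))
    return scores


rmse = evaluate(models, X, y, train, test)
for name, score in sorted(rmse.items(), key=lambda kv: kv[1]):
    print(f"{name}: RMSE = {score:.3f}")
```

The ranking printed here reflects only the synthetic data and default settings, so it says nothing about the paper's results. Note also that tree-based models cannot extrapolate a time trend beyond the training years, which is one reason a forward-in-time test year is a more demanding check than a random split.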

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11695/156909