Part IV Project Management System

Description:

Seasonal infectious diseases such as influenza are a recurring challenge for healthcare systems worldwide. Besides influenza, several other respiratory diseases cause similar symptoms, and these infections are broadly called influenza-like illness (ILI). While most ILI infections result in mild symptoms that the infected person may not even recognize, there are still numerous severe ILI cases every year. The hospitalization burden caused by such severe cases is extremely difficult to foresee, as many factors, such as weather, international travel, and immunization levels in the population, affect the transmission of respiratory diseases. Models that accurately forecast ILI cases could be extremely beneficial, as they would allow hospital management to prepare proactively for higher demand during the winter season. Moreover, policymakers could use forecasts as guidance when deciding on intervention strategies to decrease transmission rates before the health system is pushed beyond its limits.
Machine learning can learn patterns from data to build accurate models for forecasting time series. Such time series can be derived from infectious disease surveillance systems, which are becoming increasingly available in different countries. One of these systems was established in 2010 in the USA to collect ILI cases within different states, providing a dataset that covers more than a decade of ILI cases. Furthermore, because ILI cases are separated by state, the data includes a geospatial component that multivariate algorithms can leverage. However, the data comes at weekly resolution, which yields only 52 time points per year and therefore a small dataset. Small data is challenging for machine learning, and models trained on small samples are often unreliable, especially for long-term prediction, as reported in a study exploring the USA-ILI dataset.
Transfer learning uses pre-trained models to overcome the small-data learning challenge. This concept is implemented in foundation models such as generative pre-trained transformers (GPTs), which have received a lot of attention in recent years because, trained on enormous amounts of data, they have been shown to provide highly accurate results on new tasks. Recently, TimeGPT-1, a GPT for time series forecasting, was released; it could revolutionize forecasting based on small but highly informative disease surveillance datasets. This project aims to investigate the potential of a GPT model to forecast ILI cases in comparison with baseline forecasting algorithms and with more sophisticated machine learning approaches, covering tree-based and artificial neural network models. The benchmarking will be done while implementing a generic pipeline that can be applied to other datasets, potentially including an ILI surveillance dataset from New Zealand.
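
To make the idea of a baseline forecasting algorithm concrete, the following is a minimal sketch of one common choice for weekly data: a seasonal-naive forecast that repeats the values from one season earlier, scored with the mean absolute scaled error (MASE). The project description does not prescribe this baseline or this metric; the function names, the 52-week season length, and the choice of MASE are assumptions made for illustration only.

```python
from typing import List, Sequence

SEASON = 52  # assumption: weekly data, one season treated as 52 weeks


def seasonal_naive(history: Sequence[float], horizon: int,
                   season: int = SEASON) -> List[float]:
    """Forecast each future week with the value observed one season earlier."""
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    last_season = list(history[-season:])
    # Wrap around so horizons longer than one season repeat the same pattern.
    return [last_season[h % season] for h in range(horizon)]


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mase(actual: Sequence[float], predicted: Sequence[float],
         history: Sequence[float], season: int = SEASON) -> float:
    """Forecast MAE divided by the in-sample seasonal-naive MAE.

    Values below 1 mean the forecast beats the seasonal-naive baseline
    as measured on the training history.
    """
    if len(history) <= season:
        raise ValueError("history must be longer than one season")
    scale = sum(abs(history[t] - history[t - season])
                for t in range(season, len(history))) / (len(history) - season)
    if scale == 0:
        raise ValueError("history is perfectly periodic; MASE is undefined")
    return mae(actual, predicted) / scale
```

In a benchmark of this kind, the last weeks of each state's series would be held out, every candidate model would forecast them, and each model's error would be compared with the baseline's. A model that cannot beat the seasonal-naive forecast on held-out weeks is adding little value.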

Type:

Undergraduate

Outcome:

A comprehensive comparison of the results achieved by AutoGluon and TimeGPT on the provided data. A pipeline that automatically preprocesses small datasets and applies comprehensive time series forecasting libraries. Code is ideally written in Python, as both AutoGluon and TimeGPT provide Python packages for integration.
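
As an illustration of what the preprocessing stage of such a pipeline might involve, the sketch below reshapes per-state weekly counts from a wide layout (one column per state) into a long layout (one row per state and week) and holds out the final weeks of each series for evaluation. Everything here is hypothetical: the layout of the real dataset is not specified in this description, the record fields `item_id`, `timestamp`, and `target` are names chosen to resemble the long format that time series libraries commonly expect, and forward filling is only a placeholder choice for handling missing weeks.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

# Hypothetical long-format record: (item_id, timestamp, target)
Record = Tuple[str, date, float]


def wide_to_long(weeks: List[date],
                 cases_by_state: Dict[str, List[Optional[float]]]) -> List[Record]:
    """Turn one column per state into one row per (state, week)."""
    records: List[Record] = []
    for state in sorted(cases_by_state):
        values = cases_by_state[state]
        if len(values) != len(weeks):
            raise ValueError(f"{state}: expected {len(weeks)} values, got {len(values)}")
        last_seen: Optional[float] = None
        for week, value in zip(weeks, values):
            if value is None:
                value = last_seen  # assumption: forward-fill missing weeks
            if value is None:
                continue  # nothing observed yet, so skip leading gaps
            last_seen = value
            records.append((state, week, float(value)))
    return records


def split_holdout(records: List[Record],
                  horizon: int) -> Tuple[List[Record], List[Record]]:
    """Hold out the last `horizon` weeks of every series for evaluation."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    by_item: Dict[str, List[Record]] = {}
    for record in records:
        by_item.setdefault(record[0], []).append(record)
    train: List[Record] = []
    test: List[Record] = []
    for rows in by_item.values():
        rows.sort(key=lambda r: r[1])  # order each series by timestamp
        train.extend(rows[:-horizon])
        test.extend(rows[-horizon:])
    return train, test
```

Keeping the reshaping and the train/test split independent of any particular forecasting library is one way to let the same preprocessed data feed both AutoGluon and TimeGPT, and to reuse the pipeline on another surveillance dataset with a different set of regions.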

Prerequisites

None

Specialisations

Software Engineering

Supervisor

Steffen Albrecht

Co-supervisor

Gill Dobbie

Team

Alex Kim
Joao Madelino

Lab

Computer Science (303S.499, Lab)

Project #77: Can GPTs revolutionize forecasting the hospitalization burden caused by respiratory diseases?