WP 1 - University: TUD
Faculty: Electrical Engineering, Mathematics and Computer Science
Integrated data science with focus on health
In WP1, we are developing a mathematical model that estimates the risk of complications after colorectal surgery. Because healthcare datasets may contain complex dependencies between variables, we use a copula-based model. Specifically, we use vine copulas for their flexibility in describing complex dependencies and their relatively low computational cost for inference. Before applying the model, we investigate whether to use the entire dataset or only the more recent records. This relates to the possibility of dataset shift occurring throughout the year. Lastly, we would like the model to update itself continuously as new data become available.
How has WP1 developed over the past two years?
Over the past two years, our team has been working on patient risk prediction in healthcare. We have fine-tuned predictive models across various domains, including colon cancer, weight-loss surgery, and bowel surgery, to improve accuracy and clarity in forecasting patient outcomes. Our efforts have identified which models best balance performance with ease of interpretation.
A key part of our work has been the rigorous selection of relevant risk factors. By filtering out unnecessary information, we can focus on the crucial elements that influence postoperative outcomes. A first example of our progress is showcased in Gidius van de Kamp's thesis, "Patient Level Predictions in Bowel Surgery: Comparing Variable Selections for Rare Outcome Modeling on Real Surgery Data." His work explores statistical techniques to improve prediction accuracy, specifically for rare postoperative outcomes, using real data from our collaborator, Medisch Spectrum Twente (MST).
In working with MST data, we have also dedicated considerable effort to data cleaning and processing. This includes handling missing values through imputation strategies, ensuring data consistency across sources, removing noise and outliers, and standardizing data formats. These steps are essential to transform raw, real-world data into a reliable basis for statistical analysis and modeling.
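As a rough illustration of two of these steps, the sketch below imputes missing values with the median and then limits extreme values using Tukey fences. It is a minimal example under stated assumptions, not the pipeline used on the MST data: the function names, the choice of median imputation, the choice of clipping rather than removing outliers, and the `bmi_raw` values are all hypothetical.

```python
from statistics import median, quantiles


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = median(observed)
    return [fill if v is None else v for v in values]


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clip_outliers(values, k=1.5):
    """Pull values outside the Tukey fences back onto the nearest fence."""
    lower, upper = tukey_fences(values, k)
    return [min(max(v, lower), upper) for v in values]


# Hypothetical column with one missing entry and one implausible value.
bmi_raw = [24.1, None, 31.5, 27.0, 95.0, 22.8]
bmi_imputed = impute_median(bmi_raw)
bmi_clean = clip_outliers(bmi_imputed)
print(bmi_clean)
```

In practice the imputation strategy and the treatment of outliers would be chosen per variable and together with the clinical partners, since an extreme value may be a data-entry error or a genuinely high-risk patient.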
Another standout area in our work has been using vine copula models in patient risk profiling. These models are excellent for uncovering complex, non-linear relationships in healthcare data, relationships that traditional methods might overlook. With vine copulas, we aim to capture how different health indicators interact, providing deeper insights into patient risk.
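A vine copula assembles a multivariate dependence model from bivariate building blocks, each of which works on data whose marginal distributions have been removed. The sketch below illustrates only that basic ingredient, not a vine copula fit: it converts two variables to rank-based pseudo-observations in (0, 1) and measures their dependence with Kendall's tau, which depends on the ranks alone. The function names and the `age` and `crp` values are hypothetical and chosen purely for illustration.

```python
def pseudo_observations(values):
    """Map a sample to (0, 1) via rank / (n + 1); ties get average ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return [r / (n + 1) for r in ranks]


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)


# Hypothetical measurements for five patients.
age = [54, 61, 47, 72, 66]
crp = [12.0, 8.1, 30.5, 95.0, 41.2]

u = pseudo_observations(age)
v = pseudo_observations(crp)

# Tau is unchanged by the rank transform: it reflects dependence only,
# not the marginal scale of either variable.
print(kendall_tau(age, crp), kendall_tau(u, v))
```

Because the dependence measure is the same before and after the transform, the marginal distributions and the dependence structure can be modeled separately, which is the property copula-based models rely on.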
Our current challenge is integrating diverse data types, from continuous measures to ordinal categories, into our predictive models. Ultimately, we aim to enhance predictive accuracy and make data-driven policy recommendations using these advanced methodologies. We can support more targeted and effective healthcare strategies by distinguishing high-risk patients from those with low or no risk.
What was the status before the WP1?
Prior to the start of RECENTRE, the team members were highly motivated by practical problems, engaged in multidisciplinary projects (or internships), and driven to develop statistical methods tailored to applied settings. The team members were also building a solid theoretical and methodological background relevant to the RECENTRE work. All three members had an affinity with, but were not limited to, health-related applications.
Current status
Currently, we have a PhD student, Victor Ryan, who has been on this project since October last year. At the start of the project, he studied the theory of copulas and vine copulas. Furthermore, because the challenge is to integrate various data types into the model, he explored the limitations and implications of including discrete variables in copula-based models. The next step is to detect dataset shift that may occur throughout the year. We will begin by studying the types of dataset shift and the implications of each type for the dependence between the variables.
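One simple way to probe whether the dependence between two variables differs between an older and a more recent period is a permutation test on the difference in Kendall's tau. The sketch below is only an illustration of that idea under stated assumptions, not the detection method the project will adopt: the function names, the number of permutations, and the synthetic `old` and `recent` samples are hypothetical.

```python
import random


def kendall_tau(pairs):
    """Kendall's tau-a for a list of (x, y) pairs."""
    n = len(pairs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)


def tau_shift_pvalue(old, recent, n_perm=500, seed=0):
    """Permutation p-value for |tau(old) - tau(recent)|.

    Null hypothesis: both periods are drawn from the same joint
    distribution, so the period labels are exchangeable.
    """
    rng = random.Random(seed)
    observed = abs(kendall_tau(old) - kendall_tau(recent))
    pooled = list(old) + list(recent)
    n_old = len(old)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(kendall_tau(pooled[:n_old]) - kendall_tau(pooled[n_old:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Synthetic example: the dependence flips sign between the two periods.
old = [(i, i) for i in range(20)]
recent = [(i, -i) for i in range(20)]

p_shift = tau_shift_pvalue(old, recent)
p_same = tau_shift_pvalue(old, old)
print(p_shift, p_same)
```

A small p-value suggests that the two periods differ, but this check looks at one pair of variables at a time and does not by itself say which type of shift occurred; because the labels are permuted over the pooled data, a change in the marginal distributions alone can also influence the result.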
Collaboration
Our collaborative efforts with partners such as MST, Ziekenhuisgroep Twente (ZGT), and the Dutch ColoRectal Audit (DCRA) have been transformative for our research. By combining our strengths, clinical data, expertise, and insights, we have enriched our modeling efforts and improved the validity of our predictive systems.