Guest author: Simon W.S. Fischer
We make numerous decisions every day. Recommender systems and decision-support systems increasingly inform and enrich our decision-making in various areas, from shopping to finance and law.
In healthcare, clinical decision-support systems (CDSS) aid the decision-maker, here the physician, in making a decision for the patient. CDSS can be used for diagnosis (e.g., skin cancer detection), triage and resource allocation, or for treatment recommendations. In each case, CDSS take data as input and provide a relevant output.
Based on patient information and a specific diagnosis, the system suggests different treatment options and their probable effectiveness. For example, it might suggest that surgery has an 80% probability of success and physiotherapy a 15% probability. The physician can then make a decision based on this prognosis from the CDSS and other case-relevant information. The consequences of these decisions can be life-changing.
Over-reliance and European Regulations
Different studies show that physicians tend to rely too much on CDSS. In mammography screening, for example, two radiologists typically assess each mammogram for cancer. In one study, one of the radiologists was replaced by a decision-support system that was set up to provide wrong recommendations from time to time. The results show that overall decision performance decreased by almost half during human-machine collaboration, because the readers did not recognise the wrong recommendations (Dratsch et al., 2023). Similarly, another study showed that clinicians assessing skin lesions preferred the machine's suggestion even when they initially disagreed, and even in cases where the recommendation was wrong (Tschandl et al., 2020).
Physicians have a professional responsibility to promote the health of the patient, and over-reliance on CDSS can have severe consequences. The European AI Act states in Article 14 on human oversight that people should "remain aware of the possible tendency of automatically relying or over-relying on the output". Similarly, the European High-Level Expert Group on AI lists human agency and oversight as the first of its seven key requirements for trustworthy AI: "Users should be able to make informed autonomous decisions regarding AI systems."
In light of these regulations and of studies indicating a tendency to over-rely on CDSS, the question arises how we can ensure appropriate, calibrated reliance on decision-support systems. We want to utilise the additional information provided by the CDSS instead of simply dismissing the system, yet remain critical of machine recommendations instead of blindly following them.
Explanations Increase Reliance
One common approach to this problem is explainable AI. The idea is to make the workings and computations of the system more transparent to the decision-maker. Instead of merely providing a suggestion, e.g., flu 80%, the system also explains which factors contributed to that prediction, e.g., sneezing and headache. The assumption is that this increased transparency and the additional information allow the decision-maker, i.e., the physician, to make a more informed decision.
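To make this concrete, here is a minimal, purely illustrative sketch in Python of such a factor-based explanation. The symptom names, weights, and model are invented for this example and are not taken from any real clinical system.

```python
import math

# Purely illustrative toy "CDSS": the symptom weights and bias are made up
# for this sketch and do not come from any real clinical model.
SYMPTOM_WEIGHTS = {"sneezing": 0.9, "headache": 0.7, "rash": 0.1}
BIAS = -0.8

def predict_flu(symptoms):
    """Return a pseudo-probability for 'flu' plus each symptom's contribution."""
    contributions = [(name, SYMPTOM_WEIGHTS[name] * present)
                     for name, present in symptoms.items()]
    score = BIAS + sum(c for _, c in contributions)
    probability = 1 / (1 + math.exp(-score))  # logistic squashing of the score
    # Rank factors by how strongly they pushed the prediction upwards.
    ranked = sorted(contributions, key=lambda c: c[1], reverse=True)
    return probability, ranked

prob, factors = predict_flu({"sneezing": 1, "headache": 1, "rash": 0})
print(f"flu: {prob:.0%}")  # roughly "flu: 69%"
for name, contribution in factors:
    print(f"  {name}: +{contribution:.2f}")
```

This mirrors the "flu 80%, because of sneezing and headache" style of explanation described above; real explainable-AI methods, such as feature-attribution techniques, are of course far more sophisticated.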
Different studies show, however, that explanations do not sufficiently address the problem of over-reliance on machine recommendations. In another experimental setup, a decision-support system provided wrong suggestions paired with explanations. The explanations did not help the physicians recognise the wrong recommendations and decide against the CDSS; instead, treatment-selection accuracy decreased (Jacobs et al., 2021). As another study emphasises, explanations increase the likelihood that people accept the machine recommendation, regardless of its correctness (Bansal et al., 2021).
To overcome these shortcomings of explanations, researchers are exploring different types of decision-support systems and forms of interaction with them. One approach, for example, experiments with when to show explanations: delayed, to give the person some time to make up their own mind; on demand; or only once the person has made their decision. While these interventions can reduce over-reliance, the most effective ones were also the least favoured (Bucinca et al., 2021). Another system presents evidence for and against a decision, instead of merely presenting the most likely outcome (Miller, 2019). Although having evidence both for and against did not increase decision accuracy, physicians liked the reflective aspect (Cabitza et al., 2023). Yet another approach frames explanations as questions, which helped improve human judgement (Danry et al., 2023).
Deliberation through Questions
Building on prior work on reducing over-reliance on machine recommendations and stimulating critical reflection, we want to present a question-asking machine (Fischer, 2024; Haselager et al., 2023). The clinical decision-support system remains in place, and an additional system, or a component of the CDSS, poses meaningful questions to the decision-maker. Ideally, these questions allow the physician to reflect on the input data, scrutinise the workings of the system, and put the output into perspective. Compared to explanations, the assumption is that questions are more engaging and actionable. Questions promote proactive care by stimulating the physician's deliberation. In the spirit of the Socratic method, questions can strengthen self-efficacy and decision autonomy.
Relevant questions can address the causal dependency between input and output, e.g., how does outcome Y follow from data point X? Next, instead of presenting only the factors that contributed to a prediction, as explainable AI does, an additional step can question the relevance of those factors: for instance, is a headache really the relevant factor to focus on? Another common approach in explainable AI is counterfactuals. These too can be posed as questions, such as: would you recommend the same treatment if the patient were two years older? In doing so, built-in thresholds of the CDSS can be questioned. It is also possible to ask about data points that are not part of the data set and therefore not available to the decision-support system.
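As a thought experiment, the sketch below shows how such questions might be generated automatically from a prediction. It is a hypothetical illustration, not the system described in Fischer (2024) or Haselager et al. (2023); the question templates and the fields of the prediction object are assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical sketch of a question-asking component. The fields and the
# question templates are assumptions made for illustration only.

@dataclass
class Prediction:
    outcome: str          # e.g. "surgery"
    probability: float    # e.g. 0.80
    top_factor: str       # most influential input, e.g. "headache"
    counterfactual: str   # a perturbation of the input, e.g. "the patient were two years older"

def reflective_questions(p: Prediction) -> list[str]:
    return [
        # Causal dependency between input and output
        f"How does the recommendation '{p.outcome}' ({p.probability:.0%}) follow from '{p.top_factor}'?",
        # Relevance of the highlighted factor
        f"Is '{p.top_factor}' the relevant factor to focus on for this patient?",
        # Counterfactual framed as a question
        f"Would you recommend '{p.outcome}' if {p.counterfactual}?",
        # Information outside the data set
        "Is there anything you know about this patient that the system could not have seen?",
    ]

example = Prediction("surgery", 0.80, "headache", "the patient were two years older")
for question in reflective_questions(example):
    print(question)
```

Which of these questions to ask, and when, is itself a design choice, as discussed below.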
It remains to be seen how often the decision-maker can be asked questions without compromising efficiency too much or becoming a nuisance. Further, questions can be tailored to the expertise of different decision-makers: a novice, for example, might benefit more from questions about causal dependency, whereas an experienced physician might benefit from questions that promote their decision autonomy. It is also conceivable that the CDSS processes the physician's answers and takes them into account in an updated, sequential recommendation.
The overall aim of posing questions is to raise doubt, where necessary, about machine recommendations. The question-asking machine encourages the physician to deliberate while taking context-specific information into account. We hypothesise that this forward-looking approach of question-asking calibrates reliance on decision-support systems.
References
Bansal, G., Wu, T., Zhou, J., Fok, R., Nushi, B., Kamar, E., Ribeiro, M. T., & Weld, D. (2021). Does the Whole Exceed its Parts? The Effect of AI Explanations on Complementary Team Performance. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, 1–16. https://doi.org/10.1145/3411764.3445717
Dratsch, T., Chen, X., Rezazade Mehrizi, M., Kloeckner, R., Mähringer-Kunz, A., Püsken, M., Baeßler, B., Sauer, S., Maintz, D., & Pinto Dos Santos, D. (2023). Automation Bias in Mammography: The Impact of Artificial Intelligence BI-RADS Suggestions on Reader Performance. Radiology, 307(4), e222176. https://doi.org/10.1148/radiol.222176
Fischer, S. W. S. (2024). Questioning AI: Promoting Decision-Making Autonomy Through Reflection (arXiv:2409.10250). arXiv. http://arxiv.org/abs/2409.10250
Haselager, P., Schraffenberger, H., Thill, S., Fischer, S., Lanillos, P., Van De Groes, S., & Van Hooff, M. (2023). Reflection Machines: Supporting Effective Human Oversight Over Medical Decision Support Systems. Cambridge Quarterly of Healthcare Ethics, 1–10. https://doi.org/10.1017/S0963180122000718
Jacobs, M., Pradier, M. F., McCoy, T. H., Perlis, R. H., Doshi-Velez, F., & Gajos, K. Z. (2021). How machine-learning recommendations influence clinician treatment selections: The example of antidepressant selection. Translational Psychiatry, 11(1), 108. https://doi.org/10.1038/s41398-021-01224-x
Tschandl, P., Rinner, C., Apalla, Z., Argenziano, G., Codella, N., Halpern, A., Janda, M., Lallas, A., Longo, C., Malvehy, J., Paoli, J., Puig, S., Rosendahl, C., Soyer, H. P., Zalaudek, I., & Kittler, H. (2020). Human–computer collaboration for skin cancer recognition. Nature Medicine, 26(8), 1229–1234. https://doi.org/10.1038/s41591-020-0942-0
About the author
Simon W.S. Fischer is a PhD candidate at the Donders Institute for Brain, Cognition and Behaviour focusing on the societal implications of AI, in particular on AI-based Decision Support Systems used in healthcare. Website: https://www.ru.nl/en/people/fischer-s