
Meaningful Human-in-the-Loop: How Humans and AI Interface Best in Healthcare

AI can sharpen clinical practice—or quietly erode it. In this 30-minute session, MD Anderson’s Shawn Stapleton explains what “meaningful human-in-the-loop” really looks like in healthcare, how to avoid de-skilling and “never-skilling,” and the skills health systems need to keep humans and AI working safely together at scale.

As AI moves deeper into clinical workflows, the question isn’t if we use it—it’s how and when. Introduced too early, AI risks “never-skilling” new clinicians who don’t have a chance to build core competencies. Introduced too late, health systems miss opportunities to reduce burnout, improve access, and support better decisions at the point of care.

“Human-in-the-loop” is often used as shorthand for safety, but it only works when humans are truly capable of providing oversight. That means clear expectations, the right skills, and governance structures that ensure humans and AI are complementing each other—not quietly amplifying each other’s weaknesses.

Is your organization ready to move from token oversight to meaningful human-in-the-loop?

Join Shawn Stapleton, PhD, Director of AI Lifecycle Management at MD Anderson Cancer Center, for a 30-minute session on building human–AI workflows that protect clinical skills, maintain trust, and scale responsibly across your AI portfolio.

🔍 What You’ll Learn:

  • How health systems can time AI introduction to avoid “de-skilling” and “never-skilling”
  • The three core skill sets that make a capable human-in-the-loop in clinical environments
  • Practical examples of humans and AI working together in documentation, triage, and imaging
  • Common failure modes—over-reliance, automation bias, and hidden uncertainty—and how to mitigate them
  • How lifecycle management and governance can turn human-in-the-loop from a buzzword into an auditable, sustainable practice

Register for the series.

Transcript

1. Introduction to Human in the Loop in Healthcare

Jay Combs: Welcome to the last Good Decisions webinar of 2025. We have a very special guest today who's going to discuss meaningful human-in-the-loop—how humans and AI interface best in health care.

There are several important topics we're going to dive into. We're going to start with a common definition of human-in-the-loop and why that's important in health care. We're going to talk about the capabilities and vulnerabilities of both people and AI, and we're going to talk about the skills that people need to develop and how to avoid de-skilling to effectively work with AI.

We'll dive into some real-world examples of AI and people working well together, and then we'll close out with the importance of AI life cycle management and governance.

2. Introducing the Expert: Shawn Stapleton

So this event is being recorded, and we will share the presentation via email afterwards. We encourage you to bring up important ideas in the chat via the webinar feature.

So without further ado, I'm thrilled to welcome today's special guest, Shawn Stapleton, PhD, director of AI lifecycle management at MD Anderson Cancer Center. Shawn is a nationally recognized leader in bringing AI safely and meaningfully into clinical practice, from pioneering AI in radiology and oncology to shaping how humans and AI collaborate in high-stakes health care environments.

We're excited to have him here and to dive into what makes a meaningful human in the loop. Please give him a warm welcome, and over to you, Shawn.

3. Understanding Human in the Loop

Shawn Stapleton: Thanks so much, Jay. It's a real pleasure to be here. Thanks for inviting me. And it's a really great topic, right — human in the loop.

We hear this more and more popping up in all aspects and elements of artificial intelligence, and we've been asking a lot: what does that mean in health care? Traditionally, when we thought about human in the loop, we were really thinking about how we leverage humans to align the AI algorithms that we train. That might be labeling data, reviewing outputs as part of evaluation, looking at biases in algorithm outputs, and so forth.

But we're starting to realize more and more, as we put these algorithms in real-world situations, that we don't fully understand the implications of how humans and AI have to work together to get a meaningful outcome or a meaningful impact for your organization. So this has been a really interesting, evolving area for us. It's not necessarily new.

We've been putting humans and technology together for decades, especially in health care. What we've learned from that is: expect the unexpected. There are lots of unintended consequences that come when we put humans and technology together without really helping our humans understand what the expectations are of them when they're using that technology.

And as I'll show you today, we're really starting to think through what that means for artificial intelligence.

4. AI’s Role in High-Stakes Healthcare Decisions

AI in health care is used in many different areas, and we're very focused on areas where we're making high-stakes, timely decisions. These are places where we're trying to perform diagnoses and/or treatment plans — not only treatment planning, but actually treatment applications — and AI is becoming increasingly embedded in that workflow.

What we've realized now that we've put AI into these workflows is that an amount of uncertainty gets introduced, whether it's workflow uncertainty or decision support uncertainty. Largely, that comes from the fact that AI outputs often are nondeterministic, but we present them in ways that appear deterministic to our users. We know that they behave very differently across our patients and across medical imaging scanners, for example, when we generate CTs. There are many different ways that we generate CTs, and AI has to perform well across all of those. It's not necessarily the case that it does.

We also build algorithms for generality, but we use them on individual patients, across individual institutions, in individual clinical contexts. They don't necessarily generalize that well. So we really are relying on humans to interpret the outputs, given some of those uncertainties, to make sure the decisions being made are clinically appropriate and aligned with the outcomes we're trying to achieve for our patients.

5. The Evolving Role of Humans in AI Integration

We know as AI capabilities evolve that the human role is really shifting. We don't really want humans to be passive validators of algorithms, just providing feedback like labeling data.

Really, they are central for ensuring that AI is augmenting our clinical reasoning and clinical workflow capabilities, not sitting there as an unchecked dependency for us.

So what we've been starting to dissect is: what are the considerations when we have humans and AI working together? Some might be familiar with high-reliability-organization theory, a framework largely based on James Reason's Swiss cheese model, which dissects the different components of vulnerability that can potentially lead to errors and harms when working in a complex, high-risk scenario. In the Swiss cheese picture, the holes represent vulnerabilities — vulnerabilities of the AI, or technology in general, and vulnerabilities of the human.

There are many different types of vulnerabilities that are distinct between AI and humans. Then we're asking humans not only to overcome their own vulnerabilities, but also to check and balance the vulnerabilities that AI has, unique to itself. We often hypothesize or reason that there's potential for latent errors or harms that might propagate. Imagine an arrow representing an error or harm propagating through these Swiss cheese holes, through the AI and through the human, resulting in some sort of harm to the patient or some sort of medical error that might have occurred.

So what we're really starting to ask is: how do we align humans and AI together? How do we train them so that, synergistically, they work to ensure the value of AI is realized, but the harms and errors don't propagate through? We often say we're trying to limit the emergence of latent harms or the amplification of latent risks.

These are uncertainties, errors, harms that might have existed in the background that, when you're a human doing the work, don't tend to emerge, or they emerge at a very low frequency. As soon as you put AI in that spot, those Swiss cheese holes grow or they shift, or new ones pop up, leading to the emergence or realization of that harm. And what we're really starting to ask now are very simple questions of the human:

Are humans overreliant on AI? Are they simply choosing what the AI says and moving forward with that? Is there a false sense of safety with using the technology? Do we feel we're overreliant on it?

Are we de-skilling our own human natural subject matter expertise — or, let's say, trained subject matter expertise — in a particular domain? Meaning, are we redefining what a radiologist's subject domain expertise really is because artificial intelligence is coming in and taking and helping support a proportion of the work that they do?

And, of course, in health care, we are really focused on auditability and accountability. Who is responsible for that final decision-making? And what does that mean in terms of relying on artificial intelligence to support a decision or support a workflow?

Ultimately, do any vulnerabilities lead to a loss of trust? There is value and there are benefits to using this technology. We want to realize those benefits; we don't want to simply ignore it because we feel the risks are too great.

6. Case Study: AI in Sepsis Detection

So there are many examples of this in health care, and this is probably one of the most studied examples that we've seen — really around the detection of sepsis within patients.

Now, sepsis is a high-risk, time-critical clinical decision that has to be made. It is a life-or-death decision for patients that has to be made within hours. Many AI algorithms have been developed to help support the clinician in making that clinical diagnosis of sepsis so appropriate actions and treatments can be taken very rapidly. What we've discovered here, as part of this, though, is that there was a lot of friction between humans and artificial intelligence when these tools or this AI technology first came out.

One of the big issues was that AI outputs were nondeterministic, and they behave very differently across populations. Clinicians and nurses were often asked to treat these probabilistic scores as deterministic. You get an alert; you're not necessarily acting on the fact that there's a ninety percent probability this patient has sepsis — it's just “above a threshold we need to action.” But when you looked at the statistics for those algorithms, they were actually very poor, only slightly better than a flip of a coin at determining if a patient actually had sepsis, and they tended to overcall, with a large false positive rate.
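To make that overcalling concrete, here is a small illustrative calculation — the sensitivity, specificity, and prevalence below are hypothetical, not the actual sepsis model's figures — showing why even a reasonable-sounding alert threshold produces mostly false alarms when the condition is rare:

```python
# Illustrative only: hypothetical sensitivity/specificity, not any real sepsis model's metrics.
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Probability a flagged patient actually has the condition (Bayes' rule)."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# Suppose 2% of monitored patients develop sepsis in a given window.
ppv = positive_predictive_value(sensitivity=0.65, specificity=0.80, prevalence=0.02)
print(f"PPV: {ppv:.1%}")  # ~6% — roughly 15 false alarms for every true alert
```

A binary "above threshold" alert hides exactly this arithmetic from the person receiving it.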

Because those models were embedded within the EHR, they often were perceived to carry authority. In fact, there were workflows and policies put around the technology that said there needed to be escalation pathways to override the output that was generated.

So the sepsis algorithm predicted the patient would be septic. There was a process for nurses and clinicians to go through to use their clinical judgment. Right? Even if their clinical judgment disagreed, they would have to go through this process, which led to a lot of friction and made it difficult to override, even though clinical judgment might have been much better in these scenarios.

And then, very importantly, there was a timing mismatch. These predictions come at different times, and they're not necessarily at the right time when the clinical decision needs to be made. And sometimes, especially in the case of sepsis, the sepsis warning would come much later than the clinical diagnosis. That was very interesting.

So the clinicians would be able to find, predict, or present a diagnosis much earlier and trigger treatment much earlier, but then would have to sort of put things on hold because they had to wait for the sepsis algorithm to catch up to that clinical diagnosis.

It seems strange, but these are the kinds of interesting patterns that emerge in these use cases.

What we've realized from that and many other studies is we really need humans and AI to align together for better outcomes. And that really means we need more human factors. We need to define what a capable human looks like when working with AI so that we can get meaningful impact and safe impact by using these technologies.

Ultimately, the goal is that we want to strengthen our human skills and empower our care teams and clinicians so that we turn AI insights into much stronger decision-making and improved patient outcomes.

7. Defining Responsible AI in Clinical Settings

So, like I mentioned early on in this presentation, we've spent a lot of time looking at what responsible AI means, focusing on the technology and the algorithms — ensuring that there's some transparency, investigating bias, trying to mitigate uncertainty in the predictions, improving consistency, and describing failure modes for when you should and shouldn't use this technology. And that is very important. We continue to work on those things. But what we're now realizing is we have to start defining our expectations of our clinical teams when using this technology.

And that comes down to a variety of factors. Clinicians and our care teams need clear guidance on the workflows and how those workflows have changed now that AI is there to help support them. Who has the decision authority? How do we create proper trust in these models, and proper pathways to override them?

How do we teach our clinical teams to apply contextual reasoning to the AI outputs in a way that really puts them in the context of what's needed for the patient and the expert clinical judgment that our clinical teams already have?

And building these expectations feels very important for empowering our clinicians today. I'll show you in a few more slides: we actually think this is quite important to prevent de-skilling, mis-skilling, and never-skilling as we start to use AI technology more and more, as its capability continues to increase, and as we start to move it into many more domains where humans traditionally have been the drivers.

8. Accountability in AI-Driven Healthcare

Jay Combs: Shawn, real quick — I mean, Michael raises an interesting question. I know it's a sarcastic question in the chat. But going back to that sepsis case, which I know is, you know, several years old, and I think it was at the University of Michigan — why would somebody embed an algorithm like that, or a model like that, in such a critical workflow? How did that get through? Is it because it was, I guess, maybe an early adoption of AI and people didn't understand the consequences? Or what do you think could have been some of the thinking of the folks who were leveraging that and putting it into production?

Shawn Stapleton: It's several-fold. So in those scenarios, we have clinical indicators, and we've evolved what we broadly call artificial intelligence over time as a general concept. There were early-day expert-system-type technologies that we used. We developed heuristic rules, essentially, and we found that those worked so-so, right? Slightly better than chance, but still not well enough.

And so, as machine learning became prominent and then deep learning became prominent, it's a natural evolution to understand: can we detect sepsis earlier or more efficiently with the information we have at a particular time point? So AI provides some sort of promise that we can do that, or at least a bit of hope that we can do that. And then we tend to test these out.

And in the case of sepsis very specifically — so that's just in general, right? In the case of sepsis very specifically, it turned out we didn't do the evaluation very well. So there was, in a limited controlled setting, a demonstration that this technology worked very, very well. Okay? Much better than chance, and we could control that we would get the predictions at the right time. We would order all the appropriate lab tests so all the information was there for the predictions.

When we started to deploy that to other health systems, it turned out that the generalizability simply wasn't there, for a variety of reasons: algorithmic-related, workflow-related, and staff training/skill-set-related. So the natural evolution is why we did it, right? And then why it ended up failing is, I think, due to very common causes that we understand.

Jay Combs: Yeah, and then another question — and this one will go right into your next slide. This is from Naveen: how should we prevent human in the loop from becoming a rubber stamp? How do we make sure clinicians stay meaningfully engaged? What metrics can we rely on to confirm that humans are actually catching AI failure modes? So go to the next slide on that, because I know there's lots to talk about there.

Shawn Stapleton: Yeah. And this is really an emerging area that we have to figure out, and that we’ll talk about here. There are three primary skill sets that we think about now when we're trying to educate our clinical teams on using AI.

When I say “we,” it's our organization as a whole. It's not just data scientists and engineers like myself; it really is a broad group of stakeholders that are part of this.

What we're trying to grow is universal skill sets around AI literacy — really making sure that when we bring in a solution, whether we purchased it or we built it internally, we provide solution-specific training. So, if we bring in a sepsis algorithm, we need to train very clearly on when and how to use that technology, but also ensure the subject matter experts who are there as a human in the loop — interpreting and contextually using that output of AI — maintain their domain expertise, like their clinical expertise as a radiologist or a pathologist or a radiation oncologist and so forth.

Those three core elements are really, really important to us. And there's a collection of concepts that we want to really drive home that you see on the slide, but this is by no means an exhaustive list of concepts.

Now, how to get that education to our different teams is really an important question. We don't have a slide on this, but what we're starting to leverage is existing frameworks to build universal skill sets and domain-specific skill sets for clinical teams that work with technology. And right now, what we're trying to do is take those frameworks and update them with AI-specific considerations within them.
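One hypothetical way to quantify the "rubber stamp" risk raised in the question above — a minimal sketch with illustrative field names, not a metric described in the webinar — is to track how often clinicians actually override AI-positive alerts; an override rate that sits at zero for months can be a signal of over-reliance rather than genuine agreement:

```python
# Hypothetical sketch: field names and data are illustrative, not from any specific system.
from dataclasses import dataclass

@dataclass
class AlertReview:
    alert_id: str
    ai_positive: bool       # the AI flagged the case
    clinician_agreed: bool  # the clinician's final decision matched the AI

def override_rate(reviews: list[AlertReview]) -> float:
    """Fraction of AI-positive alerts the clinician overrode; persistently ~0 may signal rubber-stamping."""
    flagged = [r for r in reviews if r.ai_positive]
    if not flagged:
        return 0.0
    return sum(not r.clinician_agreed for r in flagged) / len(flagged)

reviews = [
    AlertReview("a1", True, True),
    AlertReview("a2", True, False),  # clinician overrode the AI
    AlertReview("a3", True, True),
]
print(f"Override rate: {override_rate(reviews):.0%}")  # 33%
```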

Jay Combs: Yeah. Just another question that has popped up, I think, is very, very important. It's around accountability and who is ultimately accountable if there is an adverse outcome, especially when there is a human in the loop who's under time pressure. So how do you think about accountability when humans and AI are working together?

Shawn Stapleton: So, very practically, in the health care space, it's the clinician who's accountable for the decision that's being made. That's why we have a human in the loop in these clinical settings to begin with, right? They've always been accountable for decisions being made, whether technology is used in the background or not.

Will that evolve? Probably. There will probably be an evolution of accountability that may shift toward the vendor — the developer of these algorithms. And certainly, from an institutional standpoint, there is accountability from a reputation perspective. We've seen many examples now of what we might call the New York Times test, right? Health systems have leveraged these algorithms, run into what they thought were small problems that turned out to be much larger, and had those problems uncovered by media outlets that are looking quite closely at what we're doing in health care overall and how we're leveraging new technologies like AI.

So there is accountability across all of these different areas. Right now, from a very clear legal standpoint, the accountability sits with the clinical decision-makers at the moment.

Jay Combs: And that's, I think, a great transition to de-skilling, mis-skilling, and never-skilling.

9. The Risks of De-skilling in Healthcare

Shawn Stapleton: And it's a very interesting thing. So when we look at this—I'll take you through this plot a little bit, because it probably appears a bit complicated from the get-go. Along the x-axis, you're really looking at time — a long historical time frame. And along the y-axis, we're looking at a representation of skill set, or capability.

And we've seen this, you know, time and time again. Early on in AI, we had lagging adopters of artificial intelligence. There was a lack of trust, and clinicians didn't quite realize that they are accountable for the decisions being made. If they don't feel comfortable trusting a particular technology like artificial intelligence, they will not choose to use it. Right?

That is changing over time. AI is now becoming more pervasive, and we've seen this kind of pivot point as generative AI has come out. Why? Because we're now using AI in our day-to-day lives. We're starting to feel comfortable with artificial intelligence. We're starting to understand its capabilities and its limitations in our day-to-day lives, and that translates to what we do at work. It's just a natural translation. Right?

And what we've now realized is: okay, folks that have gone through their medical training cycle — they are experts, or some level of experts. They benefit from this technology from a productivity enhancement, but not necessarily a capability enhancement.

But then you're potentially at risk if you are an expert and you continually use these tools, increasingly, for higher-cognitive decision support. There's a real possibility that you de-skill. And we've seen that now in clinical trials. We've seen it in the case of diagnosing adenocarcinomas as part of GI — gastrointestinal — imaging: just two or three months of using technology like AI to support detecting and diagnosing those adenocarcinomas or polyps ends up de-skilling the clinician.

Meaning, after two months, you put the clinician back on trying to detect them manually; their performance has decreased.

So that's quite an interesting issue. That probably will get exacerbated as more trainees come in with the technology in hand — meaning that they're training with the technology, and they're not necessarily building that expert-level skill set.

At least our hypothesis is: if we don't adjust our training, they might not ever get there, which is very important. We actually do want our folks to maintain their skill sets and their expertise in these clinical domains. It might shift a bit, but we, by and large, would love to maintain experts and have AI contributing to driving that expertise.

But we might end up in these scenarios of sort of "no-skilling" or "never-skilling," right? Which is: the AI is getting so good at doing particular tasks that it might not make sense to ever train humans to do those tasks.

We are starting to see some of that today with diabetic retinopathy screening. These are medical images of your eyes that look at vasculature in the back of your eye to determine if you have diabetic retinopathy. AI outperforms clinicians in doing that task, and it's actually much simpler to do that test in everyday settings than it would be to get a referral to a specialist clinic to do that type of imaging.

So that is now a closed-loop system to perform automated diagnostics for diabetic retinopathy, which is quite interesting — the first example that we've seen where we have built autonomous AI. And we are no longer—or I would say, we are still training physicians to detect that, but over time, there's potentially a very large shift there in how we train folks.

Jay Combs: Yeah. And, Shawn, one of the skills you talked about earlier was, you know, bias detection. A question has come up of, you know, what is a bias pattern? Is it based on behavioral decision theory or patterns in the dataset or both? Or, I guess, what are some of the bias detection skills you're trying to implement or teach?

Shawn Stapleton: It is both, right? So it's a general understanding that artificial intelligence will learn bias from data. Okay? And it's always unclear how it learns that bias until you investigate those pieces.

Then there is the usage of that technology in the clinical setting, and the bias can simply come from the usage. It is being applied inappropriately to patients — meaning there's a scope of usage, and we're using it outside of that scope. Right? So that has a potential impact or bias on the outputs that we're generating.

And then, thirdly, it's the interpretation. It's that human bias in interpreting those outputs. For narrow predictive models, binary classifiers, maybe it likely exists but might not be so severe. But as we get into the generative AI age, we are uncertain what that looks like in terms of human bias in relation to the bias in outputs that are generated from things like large language models.
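As one concrete illustration of that first kind of investigation — a hypothetical sketch with made-up subgroup names and numbers, not data from any real deployment — a subgroup audit compares the model's error rates across patient groups:

```python
# Hypothetical subgroup audit: groups, labels, and predictions are illustrative only.
from collections import defaultdict

def sensitivity_by_group(records):
    """Per-subgroup sensitivity (true-positive rate); large gaps between groups suggest learned bias."""
    tp = defaultdict(int)
    pos = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 1:
            pos[group] += 1
            if y_pred == 1:
                tp[group] += 1
    return {g: tp[g] / pos[g] for g in pos if pos[g] > 0}

# (subgroup, ground-truth label, model prediction)
records = [
    ("site_A", 1, 1), ("site_A", 1, 1), ("site_A", 1, 0), ("site_A", 0, 0),
    ("site_B", 1, 0), ("site_B", 1, 0), ("site_B", 1, 1), ("site_B", 0, 1),
]
print(sensitivity_by_group(records))  # site_A ≈ 0.67, site_B ≈ 0.33 — a gap worth investigating
```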

Jay Combs: Got it. And we've only got a few minutes left, so I want to make sure we can get into these real-world examples as use cases to demonstrate some of these points.

Shawn Stapleton: So that just adds on to the use cases, right? We use AI today — we've got dozens, if not more, use cases of AI working in health care today, and they are successful and they are valuable. So I don't want to come and say, “Hey, these things will not work until we get everything right.” Everything continues to be a work in progress.

And two examples that we have, which are really permeating across the entire health care ecosystem — not just oncology-specific — are AI scribes and AI triage tools, specifically for pulmonary embolism detection.

And we have seen success and value in these technologies. For scribes, just to call it out, these are technologies that listen to patient–provider interactions. They transcribe them. So rather than the physician having to write the note from memory of what the interaction was with that patient, the technology records the voices, it transcribes, and it generates the clinical note and the summary of that clinical note, if you choose.

And then, on the pulmonary embolism detection side, we're simply looking within CT — computed tomography — images of the lungs to identify if a patient has a pulmonary embolism, which is a potentially life-threatening situation and needs a very timely response. So we leverage AI to help detect if a patient has a suspected pulmonary embolism and bring it to a radiologist very quickly to do those reads.

We use these quite ubiquitously. But as part of using them, we have noticed vulnerabilities pop up, and we are working with our clinical teams to manage these vulnerabilities day to day. There are still some that are not fully managed — or I would actually argue not fully realized. We don't always spend the time to look at them, although we hypothesize that they are there.

10. Building Trust in AI Systems

Shawn Stapleton: We are actively now working in these areas to help ensure that our humans are really capable humans in the loop.

Jay Combs: That's right. And I think one key point too on trust — and I think there's another question on, if something goes wrong once, how do we prevent that loss of trust over time?

Shawn Stapleton: That's right. So, I mean, just a quick comment on that. This gets back into a universal skill-set type of activity that we perform. And we do this all the time. We have feedback loops that come from a variety of different directions back to our clinical teams — whether it's from a quality office within health care, a patient safety office within health care, or a high-reliability office within health care. Those all feed back to our clinical teams.

And so those are all opportunities for us to learn about the impact AI has had across all of those domains, right? And we try not to build a loss of trust. That's our opportunity to speak about the technology and where it was successful and where it wasn't — and why not.

Jay Combs: Thank you, Shawn. I mean, I know there's a lot to cover, and we can talk about this for hours, but we're about to wrap up here, so I just want to kind of summarize a few things.

As we've discussed, establishing a meaningful human in the loop isn't just about putting a clinician or a person next to an AI system. It's about giving them the right controls and the right context so they can act confidently — and that's where ModelOp’s AI lifecycle management and governance platform is essential.

ModelOp provides an auditable, centralized AI system of record for every AI use case and solution in your enterprise, with dynamic model cards that keep documentation current as systems evolve. Risk tiering ensures each type of AI has the right workflow and governance. Traceability keeps each decision transparent. Policies and changes are automatically enforced, and continuous monitoring validates performance over time.

All this integrates with the systems you already use so people and AI can work together safely, consistently, and meaningfully. The fundamental question is not about choosing between people and AI. It's about how to design systems where each strengthens the other to deliver meaningful outcomes. The capable person remains essential, but the skills, expectations, and oversight required are still evolving across health care — and really all industries.
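As a rough illustration of what an entry in such an AI system of record might capture — a hypothetical sketch only; the field names are illustrative and not ModelOp's actual schema or API — a dynamic model card ties each use case to its risk tier, accountable owner, known limitations, and monitoring evidence:

```python
# Hypothetical sketch of a model-card record; field names are illustrative, not ModelOp's schema.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ModelCard:
    use_case: str
    risk_tier: str                  # e.g. "high" for diagnostic decision support
    clinical_owner: str             # the accountable human in the loop
    intended_use: str
    known_limitations: list[str]
    last_validation: date
    monitoring_metrics: dict[str, float] = field(default_factory=dict)

card = ModelCard(
    use_case="PE triage on chest CT",
    risk_tier="high",
    clinical_owner="Radiology lead (example)",
    intended_use="Prioritize suspected pulmonary embolism studies for radiologist review",
    known_limitations=["Not validated for pediatric scans (example)"],
    last_validation=date(2025, 6, 1),
    monitoring_metrics={"override_rate": 0.12, "alert_volume_per_day": 40.0},
)
print(card.use_case, card.risk_tier)
```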

11. Conclusion: The Future of Human-AI Collaboration

What today's webinar showed is that a meaningful human in the loop only happens when organizations deliberately support both sides: upskilling people's capabilities and governing AI throughout its life cycle. And that's why AI lifecycle management and governance matter. Without them, you can't ensure safety, trust, accountability, or long-term value.

So if you want to learn more and explore how to operationalize meaningful human in the loop in your own organization, please visit us at modelop.com. I'm going to share the webinar recording and presentation over email.

And thank you so much to our special guest, Shawn Stapleton from MD Anderson Cancer Center. Please give him a warm goodbye. And thank you to everybody, and happy holidays. We hope to see you again soon with more webinars in the future. Have a great one, everybody.
