Humans in Loops, Flows, and Dialogues
Ever since machine learning started to become real, we’ve calmed our nerves about it supplanting humans by proposing “humans-in-the-loop” systems. That’s often a good idea, but its nature is changing, and not just because of technical advances. Our relationship to the tech is changing as well.
A classic example of a human in the loop is an AI that can predict which cells in a biopsy are precancerous. In a perfect implementation, not only would a human doctor be shown the machine’s results, but the system would also learn from the identifications the human corrects. In a more perfectly perfect system, follow-up results would be fed back in to further sharpen the system’s accuracy. This would be humans and AI learning from each other. It’s a beautiful thing! Imperfect, yes, but better than what it replaces, and that makes it beautiful.
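To make the shape of that loop concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `HumanInTheLoopClassifier` wrapper and the duck-typed `model` with `fit`/`predict` methods are stand-ins for whatever a real diagnostic system would use. It shows one way clinician corrections and follow-up results might flow back into retraining, not how any actual system is built.

```python
from dataclasses import dataclass, field

@dataclass
class HumanInTheLoopClassifier:
    """Hypothetical wrapper: clinician corrections and follow-up
    outcomes become training data for the next retraining pass."""
    model: object                        # any classifier with fit()/predict()
    review_queue: list = field(default_factory=list)

    def machine_suggestion(self, features):
        # The machine's call, shown to the doctor for review.
        return self.model.predict([features])[0]

    def record_review(self, features, clinician_label):
        # Whatever the clinician decides, corrected or confirmed,
        # is what gets stored as a training example.
        self.review_queue.append((features, clinician_label))

    def add_followup(self, features, outcome_label):
        # Follow-up results (e.g., the biopsy's eventual outcome)
        # are the ground truth that further sharpens the system.
        self.review_queue.append((features, outcome_label))

    def retrain(self):
        # Periodically fold the accumulated human feedback back in.
        if self.review_queue:
            X, y = zip(*self.review_queue)
            self.model.fit(list(X), list(y))
            self.review_queue.clear()
```

The design point is the one the paragraph above makes: the human's corrections, not the machine's original guesses, are what feed the next round of learning.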
There are issues, of course.
First, the human in the loop only works when there’s time for a human consultation. For example, when an electric vehicle is in “self-driving” mode, we don’t want it to preview all its decisions, leading it to say, “I’m thinking of swerving to avoid the full load of pitchforks that truck just dumped 10 feet ahead of us. What do you think?”
Second, you only want humans in the loop where human expertise is up to the task. For example, if the AI has developed a significantly better record at diagnosing some forms of cancer earlier than humans, then maybe we will want to take the humans out of the loop. More controversially, if AI can land an airplane while being sensitive to a hundred different variables that a human couldn’t integrate, the safest course might be to turn off the human pilots’ control. (Why the airplane example is more controversial than the diagnostic one is an interesting question.)
Third, we certainly don’t want to put a human in the loop if doing so actually decreases the accuracy of the system. An MIT paper by Michelle Vaccaro, Abdullah Almaatouq, and Thomas Malone, “When Combinations of Humans and AI Are Useful: A Systematic Review and Meta-Analysis,” found that AI on its own was substantially better at detecting fake hotel reviews than when you put a human in the loop (arxiv.org/abs/2405.06087). A 2025 Stanford study reported by Hanae Armitage found that adding doctors to an AI diagnostic loop didn’t increase its accuracy (med.stanford.edu/news/all-news/2025/02/physician-decision-chatbot.html). At least in these cases, humans degrade the loop. Jakob Nielsen cites other examples in his Substack essay (jakobnielsenphd.substack.com/p/humans-negative-value).
But all of these examples assume that the humans are in the loop to help the loop make a decision. Many, if not most, of us are already finding ourselves in AI loops where the work includes figuring out what a good decision would even be.