Is Your Agentic AI Built on Sand or Bedrock?
You could be using one of the very best large language models (LLMs). Yet you might still get wrong answers, along with some of those nagging hallucinations. Or results that are just plain confusing and make no sense at all. This becomes even more disconcerting as you and your organization aggressively work toward inserting agentic AI into your workflows and even some key decision processes.
You keep scratching your head, wondering, “What’s wrong with the model?” Then you suddenly remember that old saying, “Garbage in, garbage out,” and realize that the problem isn’t the model. Rather, it’s what the model is using for inputs as it crawls across the cybersphere seeking an answer to your complex query. Welcome to the wild, wonderful world of agentic data.
It Always Was, and Still Is, All About the Data
Think about the many different types of data sources that generative AI (GenAI) accesses. Most consist of loosely structured narrative. This includes exabytes of web-based content such as news articles, published papers, and reports. Or massive volumes of mostly uncurated emails, text messages, and social media posts. Not to mention endless streams of images, video clips, and podcasts. And just to make it interesting, it’s all a moving target, changing not only as events unfold and evolve, but also as it moves from one site to another. With each transfer, it’s slightly modified, sometimes with increasing accuracy, sometimes with the exact opposite effect (think Shannon entropy; en.wikipedia.org/wiki/Entropy_(information_theory)).
As speed, volume, and complexity continue to explode, you would think that the AI models would start getting overwhelmed, just as humans do. But they show no such strain. They just keep merrily crunching along, spitting out answers. Hopefully, as a KM’er, you have sharpened your prompt engineering skills to the point where you at least go several rounds back and forth with the LLM, trying to coax it into delivering what seems to be an acceptable response.
But such is often not the case with agentic AI. It takes that dense jungle of external data and uses it to create its own local stores, where AI agents exchange and manage data generated by other AI agents. Along the way, seemingly minor errors can quietly accumulate, bringing with them the possibility of serious downstream consequences.
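To make that concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes, purely for illustration, that every agent-to-agent handoff independently corrupts a record with a small, fixed probability; the 2% figure and the function name are invented for the example and are not measurements of any real agentic system.

# Toy model of error accumulation across agent-to-agent handoffs.
# Assumes each handoff independently corrupts a record with probability p;
# the numbers are illustrative, not measurements of a real system.

def surviving_accuracy(p_error_per_handoff: float, handoffs: int) -> float:
    """Probability that a record is still correct after a chain of handoffs."""
    return (1.0 - p_error_per_handoff) ** handoffs

if __name__ == "__main__":
    p = 0.02  # a "seemingly minor" 2% chance of corruption per exchange
    for n in (1, 5, 10, 20):
        print(f"{n:2d} handoffs -> {surviving_accuracy(p, n):.1%} of records still correct")

Under these toy assumptions, a 2% per-handoff error rate leaves about 98% of records correct after one exchange but only about 67% after 20, which is exactly the kind of slow, quiet accumulation that turns into serious downstream consequences.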
Many examples of data-induced agentic AI failures exist. One of the most highly publicized occurred in July 2025, when an AI-based coding assistant deleted a live production database containing records on 1,206 executives and over 1,196 companies during what was intended to be a test project (eweek.com/news/replit-ai-coding-assistant-failure). It happened during SaaStr founder Jason Lemkin’s 12-day experiment with Replit’s “vibe coding” tool.
Replit CEO Amjad Masad acknowledged the incident publicly, calling it “unacceptable and should never be possible.” The company had to implement emergency safeguards to prevent unauthorized changes, including automatic separation between development and production databases, mandatory documentation access for AI agents, and a planning/chat-only mode. So much for the notion of AI saving time and improving productivity!
What made this particularly unsettling was that the AI agent itself admitted to violating explicit instructions, destroying months of work, and breaking the system during a protection freeze. All of this points to a lack of semantic understanding of operational constraints, such as the difference between development and production databases. This should definitely grab your attention, as overcoming semantic mismatch has long been one of our goals in KM.
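Translated into engineering terms, safeguards such as the ones Replit described come down to making operational constraints explicit to the agent rather than hoping it infers them. The sketch below is a hypothetical illustration of that idea, not Replit’s actual implementation: the Environment enum, the keyword list, and guard_agent_sql are invented names, and a production-grade version would rely on a real SQL parser and policy engine rather than a keyword check.

# Hypothetical guardrail sketch: block destructive statements against a
# production database unless a human has explicitly approved the change.
# This is NOT Replit's implementation; all names here are invented.

from enum import Enum

class Environment(Enum):
    DEVELOPMENT = "development"
    PRODUCTION = "production"

DESTRUCTIVE_KEYWORDS = ("DROP", "DELETE", "TRUNCATE", "ALTER")

def is_destructive(sql: str) -> bool:
    """Crude keyword check; a real system would parse the statement properly."""
    stripped = sql.strip()
    first_word = stripped.split(None, 1)[0].upper() if stripped else ""
    return first_word in DESTRUCTIVE_KEYWORDS

def guard_agent_sql(sql: str, env: Environment, human_approved: bool = False) -> None:
    """Raise unless the statement is safe to run in the given environment."""
    if env is Environment.PRODUCTION and is_destructive(sql) and not human_approved:
        raise PermissionError(
            "Destructive statement against production requires explicit human approval."
        )

# Example: the agent proposes a destructive change during a planning-only session.
try:
    guard_agent_sql("DROP TABLE executives;", Environment.PRODUCTION)
except PermissionError as err:
    print(f"Blocked: {err}")

In effect, the guard encodes the difference between development and production as a hard constraint rather than leaving it to the agent’s semantic judgment, which is precisely the mismatch this incident exposed.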