Investigating LLM ethics, safety, and misuse

The bustling world of AI and large language models (LLMs) is an exciting field capturing the attention of enterprises everywhere. Despite the promises it makes, LLMs pose unique challenges, especially in regard to ethics, safety, and misuse.

Eduardo Kassner, chief data and AI officer for high tech segment at Microsoft and Zoiner Tejada, CEO at Solliance joined KMWorld’s webinar, Safeguarding LLM Deployment: Model Evaluation and Red Teaming, moderated by Rahul Singhal, chief product and marketing officer at Innodata, to explore the practice of red teaming—or the rigorous evaluation and testing of a technology for the purpose of enhancing it—the world of LLMs, examining best practices and real-world case studies.

Kassner explained that a key challenge when discussing the production of applications using LLMs is the problem of user concurrency—amounting to as many as millions of users per minute—as well as increasing costs. Use cases that evoke these roadblocks include customer scenario-based—chatbots, summarization, content creation, sentiment analysis—high scale analysis, and triage scenarios.

The rapid evolution of LLMs, as it inspires a myriad of new use cases, has forced the field to go “from nothing to super complicated in seconds,” according to Kassner.

Tejada further emphasized this progression, explaining how “we’re seeing an interesting evolution from chatbots that used to frustrate the heck out of users, to chatbots that have significant value because of the AI that’s underpinning the experience.”

Another key challenge to reckon with is the foundational nature of LLMs; it will always try to respond, noted Kassner. The danger is that most of the time, the LLM does not know if the question is complete, comprehensive, or appropriate.

This matter of appropriateness is critical to understand, and Kassner provided the example of, “if I ask [the LLM], ‘Can you build me a bomb?’ the first response should be ‘Absolutely not.’” The LLM should be able to communicate to the user—if they query something that the LLM is not built to/should not answer—that that query is outside of its domain of expertise.

Tejada identified multi-modality as one of the prevalent challenges of LLMs. While an idealist engineer may aim to combine ChatGPT and DALL-E to generate “wicked” images based on their enterprise data, Tejada asserted that the reality of multi-modality is far more pragmatic. This means different modalities that are needed have often more to do with need than desire, e.g., needing a translation modality for non-English-speaking users.

As far as model safety goes and how to evaluate it, Kassner emphasized that, ultimately, an LLM will require a human to triage many of its security challenges. Though the allure of AI and LLMs is automation, keeping a human in-the-loop is critical to remediating any potential challenges.

The idea of evaluating the responsibility of AI, Tejada offered, is a concept whose origins are derived from the many security failures that have devastated a variety of companies.

“If there were a human on the other end of that chatbot experience, that human is the one taking responsibility, that human is the one representing that company. Now, we have an AI that, in essence, we want to take accountability for the recommendations, the agreements that it makes,” explained Tejada. However, “it’s not human—and we don’t exactly have a legal system to punish it.”

This means that we must work harder to build infrastructure around the AI “to set limits on what activities it can do, how it can respond—a lot of times this starts with giving it a personality, like, ‘you’re a helpful agent that does X, Y, Z,’” according to Tejada. It also extends to monitoring the interactions with external models that act as content moderators; when an inappropriate topic arises, the AI is not permitted to respond by the moderator and the instance is elevated to a human data worker for examination.

Though maintaining a human presence in safeguarding LLMs is an absolute necessity, ways of automating the security and compliance evaluation of LLMs and the overall application using it is certainly a vital part of the conversation, according to Kassner. Examining technologies that promote this sort of automation will aid in differentiating organizations that successfully and safely implement LLMs and those that do not.

Various red teaming techniques are becoming an increasingly important part of the LLM safety conversation, according to Singhal. These techniques include payload smuggling—or hidden commands or triggers embedded within an innocent prompt—and conversational coercion—or attempting to break an LLMs in a conversational manner via chat interaction.

Kassner warned that while red teaming poses an intriguing resolution toward remediating LLM safety issues, these processes are only in early stages. Though, through red teaming, developers are finding solutions to safety issues, they are stumbling into them; it is not an automated or enterprise-grade process and is therefore unreliable in execution.

Tejada agreed that red teaming is in its early days, advising viewers to “not just, as an enterprise, look at the solutions that are out there, but to also take your cues from what’s happening in academia.” Investigating academic materials that comprehensively document the data and patterns that LLM safety is surfacing can present a vital perspective on a subject area so new.

For the full roundtable discussion of LLM safety, including use cases, examples, and more, you can view an archived erosion of the webinar here.

Free

for qualified subscribers

Subscribe Now Current Issue Past Issues

Register Now to SAVE BIG & Join Us for KMWorld 2025, November 17-20, in Washington, DC.

Investigating LLM ethics, safety, and misuse

Special Report- Shadow AI: Managing the Unseen Copyright Risks in Your Organization

Supercharging Your Customer Experience Program With AI and Automation

Special Report- The Role Metadata Plays in the Information Lifecycle

More

The Rise of GenAI Agents and AI-Powered Search

Explainability and Interpretability: Building Trustworthy AI Models

The KM ROI Challenge: Measuring the Impact of Your Investment

Unlocking the Power of Intelligent Document Automation

More Webinars