Securing Your Internal Knowledge Amidst Shadow AI
Contemporary organizations must protect their internal knowledge from a growing number of threats. Security and data breaches, which can involve both gaining unauthorized access to data and exfiltrating (stealing) it from the organization, result in costly fines, litigation, regulatory penalties, reputational damage, and customer churn.
Users must remain vigilant about both deliberate and inadvertent forms of security breaches. The former include cyberattacks such as malware and ransomware; the latter include falling for phishing attempts, toxic combinations of information, and inappropriate data sharing with partners, suppliers, or customers.
Finally, the growing use of intelligent chatbots, large language models (LLMs), and AI-infused agents means organizations must also remain wary of free versions of these tools, which can appropriate proprietary knowledge to train their underlying language models, resulting in a devastating loss of competitive advantage.
The formula for securing internal knowledge against these and other pitfalls is relatively simple. There's a data discovery and metadata extraction phase to help knowledge managers understand what sensitive data organizations have, how it relates to specific regulations, and which controls might be appropriate for safeguarding it. Organizing that metadata into a knowledge graph makes it easier to discern relationships between it and germane business objects.
This foundation becomes the basis for applying access control policies (according to roles or attributes) and identifying toxic combinations of access to business assets across different user types. Access controls are then administered to securely share that knowledge inside and outside of the enterprise. Lifecycle management features can dynamically update data access policies so that internal knowledge stays safely within organizations or vetted business units.
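To make the attribute-based side of this concrete, here is a minimal sketch of an access check. The policy table, attribute names, and asset names are all hypothetical, for illustration only; production systems evaluate far richer policy languages.

```python
# Hypothetical attribute-based access control (ABAC) check: a policy grants
# access only when every required attribute matches the user's attributes.
POLICIES = {
    "customer_contracts": {"department": "legal", "clearance": "confidential"},
}

def can_access(user_attrs, asset):
    """Return True if the user's attributes satisfy the asset's policy."""
    required = POLICIES.get(asset, {})
    return all(user_attrs.get(k) == v for k, v in required.items())

analyst = {"department": "legal", "clearance": "confidential"}
intern = {"department": "legal", "clearance": "public"}
print(can_access(analyst, "customer_contracts"))  # True
print(can_access(intern, "customer_contracts"))   # False
```

Checking every required attribute, rather than any one, is also how a toxic combination can be blocked: a policy can demand that certain attributes never co-occur before access is granted.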
Still, according to Tony Grout, chief product officer at M-Files, “Shadow AI is the new shadow IT. And it’s 100 times more dangerous, because once this data leaks into an AI LLM, you can’t get it back out. Making sure you have really good AI inside your organization is one of the most critical ways of stopping AI from being used outside your organization.”
Data Discovery and Metadata Extraction
There are some basic ways users can avail themselves of trustworthy AI inside their organizations to avoid the shadow AI phenomenon Grout mentioned. Most data access governance vendors include machine learning (ML) techniques during the data discovery process of ascertaining what sensitive data is where. Rules-based approaches, regular expressions, and taxonomies are also widely used for data discovery.
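The rules-based and regular-expression approaches mentioned above can be sketched in a few lines. The patterns below are deliberately simplistic placeholders; real data discovery tools use far more robust detectors and validation.

```python
import re

# Illustrative patterns only; not production-grade sensitive-data detectors.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def discover_sensitive(text):
    """Return the set of sensitive-data categories found in a document."""
    return {label for label, pattern in PATTERNS.items() if pattern.search(text)}

doc = "Contact jane@example.com; SSN on file: 123-45-6789."
print(sorted(discover_sensitive(doc)))  # ['email', 'ssn']
```

Categories detected this way typically feed classification tags, which in turn drive which controls and regulations apply to each document.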
Progressive vendors employ dynamic agents for this vital first step, which also entails extracting metadata from content for classifications, tagging, and more. Grout described a metadata agent responsible for “the automation of workflows based on metadata extraction from documents. And, that goes into the knowledge graph.”
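The idea of automating workflows based on extracted metadata can be illustrated with a toy extractor and router. Both functions, and the "legal-review" workflow name, are assumptions for the sketch; the metadata agent Grout describes is far more sophisticated.

```python
# Illustrative metadata-driven workflow routing, assuming documents carry
# simple "key: value" header lines. Names here are hypothetical.
def extract_metadata(document):
    """Pull key: value pairs from a document's header lines."""
    meta = {}
    for line in document.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip().lower()] = value.strip()
    return meta

def route_workflow(meta):
    """Pick a workflow based on the extracted metadata."""
    if meta.get("type", "").lower() == "contract":
        return "legal-review"
    return "general-filing"

doc = "Type: Contract\nCustomer: Acme Corp"
print(route_workflow(extract_metadata(doc)))  # legal-review
```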
Metadata Management
Most solutions for protecting enterprise knowledge contain inventories of the metadata obtained from the data discovery and classification stage. There, users can upload additional metadata, including ontologies, subject area models, business glossaries, and other terminology to describe the knowledge assets that require access controls. Knowledge graphs are particularly effective for managing this metadata because they identify relationships between those metadata elements and business concepts such as customers or specific products. "Because you have the knowledge graph with things like the concept of a customer, our metadata agent can look up information from each document and work out if that's related to that customer, because there's a customer object," Grout explained.
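The customer-object lookup Grout describes can be sketched with a toy knowledge graph. The node and edge names below are illustrative assumptions, not M-Files' actual data model.

```python
# A toy knowledge graph as adjacency lists: business objects (customers)
# linked to documents whose extracted metadata mentions them.
graph = {
    ("customer", "Acme Corp"): [("document", "acme-msa.pdf"),
                                ("document", "acme-invoice-042.pdf")],
    ("customer", "Globex"): [("document", "globex-nda.pdf")],
}

def related_documents(customer_name):
    """Follow edges from a customer node to its related documents."""
    return [name for kind, name in graph.get(("customer", customer_name), [])
            if kind == "document"]

print(related_documents("Acme Corp"))  # ['acme-msa.pdf', 'acme-invoice-042.pdf']
```

Once documents hang off business objects this way, an access policy attached to the customer object can cascade to every related document.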