

The Next Generation of Knowledge Management


The ramifications of this development are as varied as they are impactful, and many of them pertain to applications of RAG. “By including everything in the question to the model, you’re removing a need to do RAG at all, in some cases,” Allen observed. “For some implementations, you won’t need as sophisticated of a RAG algorithm.” This development also greatly reduces the burden of chunking, that is, deciding how large each segment of content should be and where to split it before creating vector embeddings.
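To make the chunking decision concrete, here is a minimal sketch of one common approach: fixed-size, word-based chunks with a small overlap between neighbors. The function name `chunk_text` and the default sizes are illustrative assumptions, not anything the interviewees described; production systems often split on sentences or sections instead.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most chunk_size words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in at least one chunk. Both parameters are
    illustrative defaults (assumptions), not recommended values.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Each chunk would then be passed to an embedding model. The tradeoff the paragraph alludes to lives in the two parameters: smaller chunks give more precise retrieval but less context per hit, which is the tuning work that larger context windows can reduce.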

Intelligent Classification

Organizations can also expand the context for their content by implementing different classification methods. The resulting tags and classifications can be augmented with rich metadata descriptions, which help knowledge graphs, humans, language models, and AI agents discern relationships among documents. “For document classification, we’ve gone through a big change around extraction: from properly described machine learning into more machine learning slash, prompting slash LLM stuff,” Smith said. “A lot more of that context can be suggested in curation or management processes.” In some cases, the results of auditing and analyzing how language model responses are used can be written back to a vector store to indicate a document’s applicability for certain use cases. This approach broadens the amount of “attribution to add more context to the documents that are there and ensure that’s used when any answer’s returned,” Lake noted.

Another approach for adding metadata for document classification is to perform a keyword search for documents and index them according to the results. This rudimentary measure becomes more useful when those keywords are connected to a larger taxonomy. “If you do keywords only, you miss all the synonyms,” Aasman commented. By linking keywords to a taxonomy, organizations can add synonyms to their keyword-based document classification. Aasman advocated another option in which organizations generate vector embeddings of taxonomies and their linked terms, then send documents to language models to annotate them with those terms.
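The synonym point can be illustrated with a short sketch. It assumes a tiny, hypothetical taxonomy (the `TAXONOMY` dictionary below is invented for illustration) that maps each preferred concept label to its synonyms, and it tags a document with the concept whenever any linked term appears. This is plain keyword matching, not the embedding-based option Aasman described.

```python
import re

# Hypothetical taxonomy for illustration: preferred label -> synonyms.
TAXONOMY = {
    "retrieval-augmented generation": ["rag", "retrieval augmented generation"],
    "optical character recognition": ["ocr", "text recognition"],
    "knowledge graph": ["semantic graph", "kg"],
}


def classify(text: str, taxonomy: dict[str, list[str]] = TAXONOMY) -> list[str]:
    """Return the sorted concept labels whose label or synonyms occur in text."""
    lowered = text.lower()
    tags = set()
    for concept, synonyms in taxonomy.items():
        for term in [concept, *synonyms]:
            # Word boundaries keep "rag" from matching inside "storage".
            if re.search(rf"\b{re.escape(term)}\b", lowered):
                tags.add(concept)
                break
    return sorted(tags)
```

A document that mentions only "OCR" is still indexed under "optical character recognition", which is the gap a keywords-only index would leave open.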

Intelligent Document Processing

The most progressive method for creating metadata descriptions and tags for enterprise content involves the suggestion (or recommendation) approach Smith described, which is largely facilitated by advanced machine learning. This technology is becoming extremely important when incorporating images and intelligent document processing applications based on vision language models (VLMs). According to Allen, VLMs “provide textual understanding of images. You can give it an image of a document, a PDF, for example, and it can describe what’s in the image.” These models are also adept at segmentation and can discern different parts of a document, such as figures or graphs. Moreover, they can read the information in these constructs, extract what’s desired, and render that into different forms, such as tables.

According to Allen, “If we combine VLMs with LLMs, we can build pipelines to build document databases with much greater fidelity than we’ve been able to before.” With implications for applying segmentation and visual search to video, these models are predicted to eventually displace traditional capture technologies like OCR. For the time being, they support applications in which “you pull out chunks of text without doing OCR, then take those portions of document identified as pure text and feed that into an OCR engine, which can operate more efficiently, at a lower cost, than a general-purpose VLM,” Allen said.
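The hybrid arrangement Allen describes can be sketched as a simple routing step. The sketch assumes a segmentation stage has already labeled each region of a page. The `Region` type, the `kind` labels, and the `ocr` and `vlm` callables are all hypothetical placeholders, since the article names no specific engine or API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Region:
    kind: str     # label from the segmentation step, e.g. "text", "figure", "table"
    image: bytes  # cropped pixels for this region


def process_page(
    regions: list[Region],
    ocr: Callable[[bytes], str],  # hypothetical OCR engine wrapper
    vlm: Callable[[bytes], str],  # hypothetical VLM wrapper
) -> list[tuple[str, str]]:
    """Send pure-text regions to OCR and everything else to the VLM."""
    results = []
    for region in regions:
        # Pure text goes to the lower-cost OCR engine; figures, graphs,
        # and tables go to the general-purpose VLM.
        handler = ocr if region.kind == "text" else vlm
        results.append((region.kind, handler(region.image)))
    return results
```

Passing the engines in as callables keeps the routing logic independent of any particular vendor, so either side can be swapped as the models improve.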
