Making Search Conversational to Improve Knowledge Access
What users can now express in information retrieval tasks vastly exceeds what they could express before the advent of language models. Osborne described a use case in which someone is looking for the “coziest” and “best” restaurant in a certain area. “The model can say, ‘I’m seeing words tagged with emotion, like best and coziest, and I’m also seeing restaurants,’” Osborne said. “So, it’s going to look for all of that in context.”
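To make that idea concrete, here is a minimal sketch of embedding-based semantic search, assuming the open source sentence-transformers library; the model name, query, and restaurant descriptions are purely illustrative:

```python
# Minimal sketch: embedding-based semantic search over restaurant
# descriptions. Model name and sample data are illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

restaurants = [
    "A warm, candlelit bistro with a wood-burning fireplace",
    "A loud sports bar with big screens and cheap pitchers",
    "An intimate neighborhood trattoria praised for its hospitality",
]

# Embed the corpus and the query into the same vector space.
doc_vecs = model.encode(restaurants, normalize_embeddings=True)
query_vec = model.encode(
    "the coziest, best restaurant nearby", normalize_embeddings=True
)

# Cosine similarity (a dot product of normalized vectors) ranks the
# bistro and the trattoria above the sports bar, even though none of
# the descriptions contain the literal words "cozy" or "best".
scores = doc_vecs @ query_vec
for idx in np.argsort(scores)[::-1]:
    print(f"{scores[idx]:.3f}  {restaurants[idx]}")
```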
Vector Retrieval Systems
A consequence of performing semantic search with language models is that organizations don’t always interact directly with the sources. “You’re getting the answers and not necessarily the underlying documents,” Allen said. “Maybe you’ll get references and can click on those.” With vector search engines, users can verify the accuracy of language model responses by referencing the sources those responses are based on. These vector computing platforms are often used for retrieval-augmented generation (RAG), in which prompts are supplemented with relevant enterprise content embedded in a vector database. This form of prompt augmentation reduces inaccurate model responses while making an organization’s data “AI-ready,” Vaidyanathan said. “That’s representing content as vectors, applying intelligent chunking strategies, and using retrieval engines that can surface the most relevant pieces of information.”
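As a rough illustration of the pattern Vaidyanathan describes, the sketch below simulates retrieval-augmented generation with an in-memory index standing in for a vector database; the chunks, source paths, and prompt wording are assumptions, and the final call to a chat model is omitted:

```python
# Minimal RAG sketch with an in-memory index standing in for a
# vector database. Chunks, sources, and prompt wording are
# illustrative; the call to a chat model is omitted.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Enterprise chunks embedded ahead of time, each keeping a source
# reference so answers remain verifiable.
chunks = [
    {"text": "Refunds are processed within 14 business days.",
     "source": "policies/refunds.md"},
    {"text": "Enterprise SSO is configured under Admin > Security.",
     "source": "docs/sso-setup.md"},
]
index = model.encode([c["text"] for c in chunks], normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[dict]:
    """Return the k chunks most similar to the query."""
    q = model.encode(query, normalize_embeddings=True)
    top = np.argsort(index @ q)[::-1][:k]
    return [chunks[i] for i in top]

def build_prompt(query: str) -> str:
    """Augment the prompt with retrieved content and its sources so the
    model's answer is grounded in, and citable to, enterprise data."""
    context = "\n".join(f"[{c['source']}] {c['text']}" for c in retrieve(query))
    return ("Answer using only the context below and cite sources in "
            f"brackets.\n\nContext:\n{context}\n\nQuestion: {query}")

print(build_prompt("How long do refunds take?"))
```

Because each chunk carries its source path into the prompt, the model can cite the documents its answer came from, which is what lets users click through and verify.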
Vector computing platforms are desirable for numerous reasons, including their hybrid search capabilities, which combine keyword search and similarity search. They also make previously dark data, such as video, readily searchable. Bixby mentioned that videos of how-to content for teaching customers to use software, specific features, and advanced functions can be embedded as vectors. Doing so captures not only the audio (which may already have been transcribed) but the visual content as well. “You could ask a question; the answer could be a URL that takes you right to the point in the video where the answer is,” Bixby remarked. With this approach, it’s even possible to video-record someone using a legacy application, vectorize the recording, ask questions about how the application works, and use the answers to help build a low-code application that modernizes the original.
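A sketch of Bixby’s video example follows; the video URL, timestamps, and transcript segments are invented for illustration, and the ?t= deep-link format is one common convention rather than a fixed standard:

```python
# Sketch of the video deep-link idea: embed timestamped transcript
# segments so an answer can link to the exact moment in the video.
# The URL, timestamps, and transcript are illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
VIDEO_URL = "https://example.com/videos/howto-42"  # hypothetical

# (start_seconds, transcript text) pairs for one how-to video.
segments = [
    (15, "First, open the settings panel from the sidebar."),
    (95, "To enable advanced exports, toggle the beta features flag."),
    (210, "Finally, schedule the report under Automations."),
]
seg_vecs = model.encode([t for _, t in segments], normalize_embeddings=True)

def answer_with_deep_link(question: str) -> str:
    """Return a URL that jumps to the start of the transcript segment
    most similar to the question."""
    q = model.encode(question, normalize_embeddings=True)
    best = int(np.argmax(seg_vecs @ q))
    start, _ = segments[best]
    return f"{VIDEO_URL}?t={start}"

print(answer_with_deep_link("How do I turn on advanced exports?"))
```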
Chunking and Hallucinations
Vector retrieval systems offer several benefits: They’re applicable to any type of enterprise content; they support semantic search; they help confine language models’ responses to enterprise content; and they provide conversational search. However, delivering these results at enterprise scale requires a significant amount of upfront effort. Chunking, in which users determine the appropriate size of the content segments to be embedded, is a practical precursor to relying on these repositories for conversational search. According to Osborne, “There is no magical algorithm for chunking your data. Chunking is important because it’s the chunks in your data that get compared to the users’ query. If the chunks are not right, or they’re separated at intervals that break things up too much semantically, the user won’t have a good experience.”
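Since, as Osborne notes, there is no magical algorithm, the following is just one illustrative strategy: a paragraph-aware chunker that packs paragraphs into size-capped chunks so semantically related text stays together (the character limit is arbitrary):

```python
# One illustrative chunking strategy: split on paragraph boundaries,
# then pack paragraphs into chunks capped at max_chars so text is
# never cut mid-thought.
def chunk_paragraphs(text: str, max_chars: int = 1000) -> list[str]:
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Start a new chunk if adding this paragraph would overflow.
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

text = "Intro paragraph.\n\nDetails paragraph.\n\nClosing paragraph."
print(chunk_paragraphs(text, max_chars=40))
```

Splitting on paragraph boundaries is only a heuristic; tables, manuals, and policy documents may each call for different boundaries, which is exactly the content-type point Bixby raises below.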
Organizations can inform their chunking strategy by tagging the content they’re embedding with metadata, which can provide additional context for language models performing vector similarity searches. According to Bixby, for a chunking strategy, “It’s the content type that seems to make the most difference. Are they things that take a tabular format versus rows, versus a learning manual, versus policy data?” Furthermore, the tendency of language models to produce inaccuracies, or hallucinations, can only be minimized, not eliminated. Bixby noted that many inaccuracies can be diminished by properly vetting the data in a vector retrieval system. “Oftentimes, people will say it hallucinated, and then when they check the source provided, they realize the documentation was incorrect,” Bixby added.
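One way to act on both points, sketched below under assumed field names, is to store each chunk with metadata such as content type and source path, filter on that metadata before similarity ranking, and return the source with every answer so users can vet it against the underlying documentation:

```python
# Sketch: metadata-tagged chunks that let retrieval filter by content
# type before ranking by similarity. Field names and the corpus are
# illustrative; a production vector database would index the metadata.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = [
    {"text": "Employees accrue 1.5 vacation days per month.",
     "content_type": "policy", "source": "hr/leave-policy.pdf"},
    {"text": "Step 3: click Export and choose CSV.",
     "content_type": "manual", "source": "docs/export-guide.md"},
    {"text": "Q3 revenue by region, tabular summary.",
     "content_type": "table", "source": "finance/q3.xlsx"},
]
vecs = model.encode([c["text"] for c in chunks], normalize_embeddings=True)

def retrieve(query: str, content_type: str | None = None, k: int = 2):
    """Filter by metadata first, then rank the survivors by similarity.
    Returning each chunk's source lets users check answers against the
    original documents instead of taking the model's word for it."""
    q = model.encode(query, normalize_embeddings=True)
    candidates = [
        (float(vecs[i] @ q), c) for i, c in enumerate(chunks)
        if content_type is None or c["content_type"] == content_type
    ]
    return sorted(candidates, key=lambda x: x[0], reverse=True)[:k]

for score, chunk in retrieve("how many vacation days do I get", "policy"):
    print(f"{score:.3f}  {chunk['source']}: {chunk['text']}")
```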