Making Search Conversational to Improve Knowledge Access
Hybrid Search
The hybrid approach most often paired with conversational vector retrieval systems combines keyword search with similarity search. Keyword search is still favored for situations in which users know exactly what they’re looking for, “a document title, policy number, or legal phrase,” Vaidyanathan indicated. “Keyword search is also foundational for compliance scenarios.”
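A minimal sketch of how a keyword score and a similarity score might be blended into one ranking. The two-dimensional vectors, the `alpha` weight, and every function name here are illustrative assumptions, not any vendor's API; a real system would take its vectors from an embedding model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def keyword_score(query, text):
    """Fraction of query terms that appear verbatim in the text."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0

def hybrid_search(query, query_vec, docs, alpha=0.5):
    """Rank docs by alpha * keyword score + (1 - alpha) * vector similarity."""
    scored = [
        (alpha * keyword_score(query, d["text"])
         + (1 - alpha) * cosine(query_vec, d["vec"]), d["id"])
        for d in docs
    ]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]

# Toy corpus; the vectors are made up for illustration.
docs = [
    {"id": "policy-123", "text": "travel policy 123 reimbursement rules", "vec": [0.9, 0.1]},
    {"id": "faq", "text": "how do I get paid back for trips", "vec": [0.8, 0.3]},
    {"id": "menu", "text": "cafeteria lunch menu", "vec": [0.0, 1.0]},
]
ranking = hybrid_search("policy 123", [1.0, 0.2], docs)
print(ranking)
```

The exact-match document ranks first because it scores on both signals, while the semantically similar FAQ still surfaces through the vector score alone.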
Other search techniques, including the query-based approach Allen mentioned, can also be hybridized or modernized to include a conversational interface. “‘Query-based’ means specifying conditions using logical operators to query the properties of an object,” Allen explained. “So, the metadata of the object. You can select fields like ‘and,’ ‘or,’ or ‘greater than,’ ‘less than.’ This is not keyword search. But you can embed keyword search as part of it.” The closeness of the relationship between query-based search and certain forms of keyword search is underscored by the use of Boolean operators, some of which include the same words Allen described as fields.
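The query-based approach Allen describes can be sketched as composable conditions over an object's metadata. This is an illustrative toy, not the query language of any particular product; the field names (`dept`, `year`, `size_mb`) and the helpers `cond`, `all_of`, and `any_of` are hypothetical.

```python
import operator

# Comparison operators available to a condition.
OPS = {"eq": operator.eq, "gt": operator.gt, "lt": operator.lt}

def cond(field, op, value):
    """Build a predicate that tests one metadata field."""
    return lambda meta: field in meta and OPS[op](meta[field], value)

def all_of(*preds):
    """Logical AND over predicates."""
    return lambda meta: all(p(meta) for p in preds)

def any_of(*preds):
    """Logical OR over predicates."""
    return lambda meta: any(p(meta) for p in preds)

objects = [
    {"name": "a.pdf", "dept": "legal", "size_mb": 12, "year": 2021},
    {"name": "b.pdf", "dept": "hr", "size_mb": 3, "year": 2019},
    {"name": "c.pdf", "dept": "legal", "size_mb": 1, "year": 2018},
]

# (dept == "legal" AND year > 2019) OR size_mb < 2
query = any_of(
    all_of(cond("dept", "eq", "legal"), cond("year", "gt", 2019)),
    cond("size_mb", "lt", 2),
)
matches = [o["name"] for o in objects if query(o)]
print(matches)
```

Note that the query never inspects document text; it only evaluates properties of each object, which is what separates it from keyword search.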
According to Osborne, Boolean operators are useful for “looking for things like ‘customer’ and ‘order’ together.” Query-based search is particularly apt for workflows, some of which might involve retention, lifecycle management, and other use cases in which the actual document is necessary for downstream processes. Allen articulated a legal discovery application pairing vector similarity search and query-based search. “You may do some type of vector search to identify documents that are relevant, then probably fall back to a query-based type of search to retrieve the exact documents and analyze them in more depth,” Allen explained. “Maybe you’ll select a subset of those and put them in a legal hold so they’re exempt from standard lifecycle management processes.”
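The legal discovery workflow Allen outlines might look roughly like the following: a similarity pass to find relevant documents, a metadata query to select an exact subset, and a hold flag that a lifecycle sweep respects. All names, the 0.8 threshold, and the data are assumptions made for illustration.

```python
import math
from datetime import date

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def lifecycle_sweep(docs, today):
    """Keep documents that are unexpired or under legal hold."""
    return [d for d in docs if d["legal_hold"] or d["expires"] > today]

docs = [
    {"id": "d1", "vec": [1.0, 0.0], "custodian": "smith",
     "expires": date(2020, 1, 1), "legal_hold": False},
    {"id": "d2", "vec": [0.9, 0.1], "custodian": "jones",
     "expires": date(2020, 1, 1), "legal_hold": False},
    {"id": "d3", "vec": [0.0, 1.0], "custodian": "smith",
     "expires": date(2020, 1, 1), "legal_hold": False},
]

# Step 1: vector search identifies documents relevant to the matter.
relevant = [d for d in docs if cosine([1.0, 0.0], d["vec"]) >= 0.8]

# Step 2: a query-based filter retrieves the exact subset of interest.
subset = [d for d in relevant if d["custodian"] == "smith"]

# Step 3: place that subset on legal hold.
for d in subset:
    d["legal_hold"] = True

# All three documents have expired, but the held one survives the sweep.
kept = [d["id"] for d in lifecycle_sweep(docs, date(2024, 1, 1))]
print(kept)
```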
Metadata Tags
Allen’s use case reflects several preeminent factors pertaining to contemporary conversational search. First, a vector retrieval engine and, more than likely, language models are involved as the conversational interface. However, other types of search, in this case, a query-based approach, supplement the former. Thus, the steps organizations need to reliably implement conversational search not only involve embedding their content into vectors and implementing appropriate chunking strategies to do so, but also readying that underlying content for optimized results. Even after vector computing platforms are utilized, or perhaps even before, “The real differentiator lies in how that content is enriched with contextual signals, like linking it together, adding semantic relationships, and providing the depth that large language models need to interpret and respond accurately,” Vaidyanathan said.
Tagging vectorized content with metadata, one of the primary means by which vector computing platforms filter data before, after, or during searches, is invaluable. Doing so is also aligned with traditional search practices, reinforcing the utility of hybrid methods. “Often with tags, it makes queries easier to issue, and LLMs work great there,” Allen observed. “We’re working on a classification function that uses LLMs that should help with the tagging. Although tagging’s not necessary to use the query-based approach, it’s certainly helpful.”
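The difference between filtering before and after a search can be shown with a small sketch. It assumes a brute-force top-k search over toy vectors; the function names and tags are hypothetical, and production platforms typically implement filtering inside their index rather than in application code.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, docs, k):
    """Return the k docs most similar to the query vector."""
    return sorted(docs, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)[:k]

def pre_filter_search(query_vec, docs, tag, k):
    """Filter by tag first, then run similarity search on what remains."""
    return [d["id"] for d in top_k(query_vec, [d for d in docs if tag in d["tags"]], k)]

def post_filter_search(query_vec, docs, tag, k):
    """Run similarity search first, then drop results lacking the tag."""
    return [d["id"] for d in top_k(query_vec, docs, k) if tag in d["tags"]]

docs = [
    {"id": "a", "vec": [1.0, 0.0], "tags": {"finance"}},
    {"id": "b", "vec": [0.9, 0.1], "tags": {"finance"}},
    {"id": "c", "vec": [0.7, 0.3], "tags": {"hr"}},
    {"id": "d", "vec": [0.1, 0.9], "tags": {"hr"}},
]

pre = pre_filter_search([1.0, 0.0], docs, "hr", k=2)
post = post_filter_search([1.0, 0.0], docs, "hr", k=2)
print(pre, post)
```

In this toy data, pre-filtering returns two tagged documents, while post-filtering returns none because the two nearest neighbors both carry a different tag. That trade-off is one reason the timing of the filter matters.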
Taxonomies
Metadata-tagged content can enrich search results. It also helps to reinforce the underlying semantics for conversational search systems to produce intelligent responses. Organizations that enhance metadata tags with full-fledged taxonomies can potentially redouble these benefits. Taxonomies are helpful because they identify relationships in the metadata and terminology describing vectorized content. According to Vaidyanathan, expressing those semantic relationships is critical. “This is where technologies like knowledge graphs play a role, helping to organize and connect content across domains so that LLMs can reason over it more effectively.” Incorporating taxonomies and knowledge graphs heightens the capacity to access enterprise-approved knowledge from conversational search. It also circumscribes the propensity for language models to hallucinate.
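As a rough illustration of how a taxonomy adds to flat tags, the sketch below expands a broad term into its narrower terms before matching tagged content. The taxonomy, the tags, and the `expand` helper are invented for the example and do not reflect any specific product.

```python
# Hypothetical taxonomy: each term maps to its narrower (child) terms.
TAXONOMY = {
    "contract": ["nda", "lease"],
    "lease": ["equipment-lease"],
}

def expand(term, taxonomy):
    """Return the term plus every narrower term beneath it."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(taxonomy.get(current, []))
    return seen

docs = [
    {"id": "d1", "tag": "nda"},
    {"id": "d2", "tag": "equipment-lease"},
    {"id": "d3", "tag": "invoice"},
]

# A search for "contract" also matches content tagged with narrower terms.
terms = expand("contract", TAXONOMY)
hits = [d["id"] for d in docs if d["tag"] in terms]
print(hits)
```

Without the taxonomy, a filter on "contract" would match none of these documents, since none carries that literal tag.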