Guiding enterprise search and discovery with open-source technologies at KMWorld 2022
Enterprise Search & Discovery, a co-located event at KMWorld 2022, centers on the next frontier for enterprise search: accurate, quick answers to users’ questions. The technology built to advance these goals must align with user needs and behaviors, as well as with hybrid and remote work environments. Pari Rajaram, search and AI architect at MITRE, elaborated at KMWorld 2022 on the kinds of tools that better achieve search and discovery ambitions.
During his session, “How to Enhance Search and Recommender Applications Using Open Source Tools and Ideas,” Rajaram introduced MITRE’s mission, in which increasing knowledge discovery and reuse, and solving problems for a safer world, guide the organization’s operations.
As a non-profit that builds robust enterprise search services for a variety of use cases, people, projects, and products, MITRE optimizes for platform user engagement, relevant search, timely recommendations, and useful analytics. The organization also focuses on NLP infusion at search and ingestion time with intelligent middleware, and on enriched attribution of metadata through ETL.
Rajaram detailed standard tricks that can aid in boosting relevancy within a search service:
- Query classification: focus on questions such as, “What are the entities in the query?” and, “Can we segment the query phrases?” to improve accuracy.
- Query expansion: employ acronyms and synonyms for wider reach and greater recall.
- Clickthroughs: track clickthroughs and use them to rank documents.
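As a rough illustration of the last two tips, the sketch below expands a query with known acronyms and re-sorts results by recorded clicks. The synonym table, document IDs, and function names are hypothetical and were not presented in the session; a production system would weigh clicks alongside text relevance rather than sort on them alone.

```python
from collections import Counter

# Hypothetical acronym/synonym table; a real deployment would curate or learn this.
SYNONYMS = {
    "nlp": ["natural language processing"],
    "etl": ["extract transform load"],
}


def expand_query(query):
    """Return the lowercased query terms plus any known expansions."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        expanded.extend(SYNONYMS.get(term, []))
    return expanded


def rerank_by_clicks(doc_ids, click_log):
    """Stable re-sort: documents with more recorded clicks move up."""
    clicks = Counter(click_log)
    return sorted(doc_ids, key=lambda doc_id: -clicks[doc_id])


expanded = expand_query("NLP pipeline")
ranked = rerank_by_clicks(["a", "b", "c"], ["c", "c", "b"])
```

Because Python's sort is stable, documents with equal click counts keep the order the search engine originally gave them.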
Delving deeper into these tips, Rajaram discussed particular tools that can support these and other search and discovery practices within an organization's knowledge base.
Elasticsearch, a distributed, RESTful search and analytics engine, offers a Function Score query that gives users finer control over how search results are ranked.
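A Function Score query wraps an ordinary query and adjusts each hit's score with one or more functions. The sketch below only builds the request body as a Python dictionary; the field names `body` and `click_count` are assumptions for illustration, not fields described in the session.

```python
import json


def build_function_score_query(text, click_field="click_count"):
    """Build a request body that adds a click-based bonus to text relevance."""
    return {
        "query": {
            "function_score": {
                # Base relevance: full-text match on a hypothetical "body" field.
                "query": {"match": {"body": text}},
                "functions": [
                    {
                        # Bonus derived from a numeric field stored on each document.
                        "field_value_factor": {
                            "field": click_field,
                            "modifier": "log1p",  # dampen very large click counts
                            "missing": 0,         # documents without the field get no bonus
                        }
                    }
                ],
                # Add the function result to the base relevance score.
                "boost_mode": "sum",
            }
        }
    }


body = build_function_score_query("knowledge discovery")
print(json.dumps(body, indent=2))
```

The resulting dictionary could be sent to an index's `_search` endpoint. Choosing `sum` with `log1p` keeps popular documents from overwhelming text relevance, which is one reasonable default rather than a recommendation from the talk.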
Mallet (MAchine Learning for LanguagE Toolkit) offers implementations that advance document organization. Soft clustering of documents with Mallet’s topic modeling, an unsupervised learning technique that infers related topics across a data corpus, can output a topic distribution for each document and popular keywords per topic.
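Mallet itself is a Java toolkit, so the following is not its API. It is a minimal pure-Python stand-in, a small collapsed Gibbs sampler for LDA, meant only to show the two outputs mentioned above: a topic distribution per document and top keywords per topic. The toy corpus, hyperparameters, and function name are assumptions.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(num_topics)]   # tokens per (topic, word)
    topic_total = [0] * num_topics                        # tokens per topic
    assign = []
    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Smoothed, normalized topic distribution for each document.
    theta = [
        [(n + alpha) / (len(doc) + num_topics * alpha) for n in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    # Up to three most frequent words assigned to each topic.
    keywords = [[w for w, c in tw.most_common(3) if c > 0] for tw in topic_word]
    return theta, keywords


docs = [
    "search ranking query relevance".split(),
    "query search relevance ranking".split(),
    "metadata ingestion pipeline etl".split(),
    "etl pipeline metadata ingestion".split(),
]
theta, keywords = lda_gibbs(docs, num_topics=2)
```

Each row of `theta` is a soft cluster membership: a document can belong partly to several topics, which is what distinguishes this from hard clustering.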
The Word2Vec model from Google aids in generating synonyms for query expansion by converting words to vectors, so that words of similar meanings and contexts sit closer together in a high-dimensional vector space. It relies on a shallow neural network trained with self-supervised learning, and it surfaces groups of similar words through nearest-neighbor vector search.
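The nearest-neighbor step can be sketched without a trained model. The vectors below are hand-made and purely illustrative; real Word2Vec embeddings are learned from a corpus and usually have a hundred or more dimensions.

```python
import math

# Hypothetical 3-dimensional embeddings, invented for this example.
VECTORS = {
    "search": [0.9, 0.1, 0.0],
    "lookup": [0.8, 0.2, 0.1],
    "query": [0.7, 0.3, 0.0],
    "budget": [0.0, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(word, k=2):
    """Return the k words whose vectors are most similar to the given word."""
    target = VECTORS[word]
    scored = [(cosine(target, vec), other)
              for other, vec in VECTORS.items() if other != word]
    return [other for _, other in sorted(scored, reverse=True)[:k]]


candidates = nearest("search")  # candidate expansion terms for "search"
```

In this toy space, `nearest("search")` returns `["lookup", "query"]`, the kind of candidates a query-expansion step might add to the original term.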
Returning to MITRE, Rajaram explored the organization’s own tools for better search and discovery. The MITRE document recommender pushes recommendations to employees via their personalized internal employee UI; based on a recommender algorithm, MITRE can provide collaborative filtering and content-based filtering to established users. Organizing users by their search queries, feedback signals, and historical project IDs aids in suggesting similar users, projects, and departments.
MITRE’s future progress revolves around deep learning. Use cases such as document classification by topic (with the resulting topics leveraged as search facets), classification without training data, and question/answer extraction with transformers form MITRE’s continuing advances in efficient knowledge discovery and collaboration.
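The session did not detail MITRE's recommender algorithm, so the following is only a generic sketch of user-based collaborative filtering: users who share project IDs are treated as similar, and documents that similar users engaged with become recommendations. All user names, project IDs, and document IDs are hypothetical.

```python
# Hypothetical user -> historical project IDs.
USER_PROJECTS = {
    "alice": {"P1", "P2", "P3"},
    "bob": {"P2", "P3", "P4"},
    "carol": {"P7"},
}

# Hypothetical user -> documents the user has already engaged with.
USER_DOCS = {
    "alice": {"doc-a"},
    "bob": {"doc-a", "doc-b"},
    "carol": {"doc-c"},
}


def jaccard(a, b):
    """Overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_users(user, k=1):
    """Return up to k other users with overlapping project history."""
    scored = [(jaccard(USER_PROJECTS[user], projects), other)
              for other, projects in USER_PROJECTS.items() if other != user]
    return [other for score, other in sorted(scored, reverse=True)[:k] if score > 0]


def recommend(user):
    """Suggest documents that similar users engaged with but this user has not."""
    seen = USER_DOCS[user]
    recs = set()
    for other in similar_users(user):
        recs |= USER_DOCS[other] - seen
    return sorted(recs)


suggestions = recommend("alice")
```

Here `recommend("alice")` yields `["doc-b"]` because Bob shares two of Alice's projects, while Carol, who shares no projects with anyone, receives no suggestions. That gap is one reason such recommenders are typically limited to established users or paired with content-based filtering.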
KMWorld returned in person to the J.W. Marriott in Washington, D.C., on November 7-10, with pre-conference workshops held on November 7.
KMWorld 2022 is a part of a unique program of five co-located conferences, which also includes Enterprise Search & Discovery, Office 365 Symposium, Taxonomy Boot Camp, and Text Analytics Forum.