The Paradigm Shift in Information Access
Building Relevancy Models that Work
Explosion. Eruption. Big bang.
Call it what you will, but it’s no secret that data volumes are growing at unprecedented rates. The gap between the rate at which data grows and the number of knowledge workers who can make sense of it widens geometrically. The data typically comes from internal sources, but increasingly, open-source content is driving this disparity even faster. These information-laden laborers now spend up to a third of their time searching and still not finding what they need. To further exacerbate the issue, agencies at the federal level are mandated to share their information. This creates a fundamental business problem: "I can’t find what I’m looking for in my own dataset, or in my agency’s dataset, so how am I ever going to provide conclusive and actionable information across a corpus of data that spans multiple agencies, jurisdictions and even continents?"
IT managers and CIOs face the fundamental challenge of deploying applications that span these multi-realm datastores and deliver accuracy and relevancy as well as the popularity models on the public Internet do. Demanding agency heads are saying, "I want a system inside my organization that works as well as what my children have on their home computer. Furthermore, I need my decision-makers to be aware of events as the world changes, so that we can proactively make policy decisions, execute on key strategies and keep our nation secure." The reality is that with today’s technology, this is completely achievable.
Fundamentally, this problem boils down to three distinct yet non-trivial activities: access, context and delivery.
At the most fundamental level, application developers need to understand where the data lives. What are the structures that bind and define my knowledge? Is the data structured or unstructured? If it is structured, how well does it map to my other data models? And, of course, there is the elephant in the room: how do I keep it secured and ensure that personnel see only what they are entitled to? More often than not, information-sharing mandates do not require a rip-and-replace strategy or a monolithic consolidated datastore. Federated search can be very effective at accessing well-developed systems, exposing their value and building information-access systems that continue to provide return on investment. Effective federation models offer an excellent foundation for application development: a platform that adds significant incremental value to existing systems while enforcing their existing security models.
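The federation model described above can be sketched in a few lines: query each source in parallel, let each source enforce its own security model, then merge and re-rank the results. This is a minimal illustration, not a production connector; the source names, documents, scores and clearance labels below are all invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "sources"; real connectors would wrap agency
# databases or search APIs behind the same interface.
SOURCES = {
    "agency_a": [{"doc": "border report", "score": 0.9, "clearance": "secret"},
                 {"doc": "budget memo", "score": 0.4, "clearance": "public"}],
    "agency_b": [{"doc": "field survey", "score": 0.7, "clearance": "public"}],
}

def search_source(name, query, user_clearances):
    # Each source enforces its own security model before returning anything.
    return [r for r in SOURCES[name]
            if r["clearance"] in user_clearances and query in r["doc"]]

def federated_search(query, user_clearances):
    # Fan the query out to every source in parallel.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(search_source, n, query, user_clearances)
                   for n in SOURCES]
        results = [r for f in futures for r in f.result()]
    # Merge and re-rank across sources by each source's relevancy score.
    return sorted(results, key=lambda r: r["score"], reverse=True)

hits = federated_search("report", {"public", "secret"})
```

Because no data is copied into a central store, the existing systems keep providing value and their security decisions stay where they belong, at the source.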
Call it text exploitation, call it relevancy ranking, call it the semantic Web...whatever you call this endeavor, context is the single most important aspect of your agency’s application model. The key to understanding is to realize there is no silver bullet. There is no magic wand that will make a computer and its operator automatically connect to the data and develop Nobel Prize-worthy realizations. Some applications use taxonomies, others use ontologies, and still others use user tagging, or folksonomies. Formal structures are great for capturing agency nuances and vernacular, including the deluge of acronyms and concepts that rarely change dramatically over time. However, no single methodology is going to work by itself. Furthermore, these structured analytic constructs can be quite costly to implement, deploy and maintain. It is the variety of metadata decoration that will significantly drive relevancy.
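One simple way to picture "variety of metadata decoration" driving relevancy is a score that blends formal taxonomy tags with informal user tags. The weights, field names and tags below are invented for illustration; a real relevancy model would tune these against user behavior.

```python
# Hypothetical weights: trust the curated vocabulary more than ad hoc tags.
TAXONOMY_WEIGHT, FOLKSONOMY_WEIGHT = 2.0, 1.0

def relevancy_score(query_terms, doc):
    # Count query terms found in each metadata layer, then combine.
    formal = len(query_terms & set(doc["taxonomy_tags"]))
    social = len(query_terms & set(doc["user_tags"]))
    return TAXONOMY_WEIGHT * formal + FOLKSONOMY_WEIGHT * social

doc = {"taxonomy_tags": ["counter-narcotics", "maritime"],
       "user_tags": ["boats", "maritime"]}
score = relevancy_score({"maritime", "boats"}, doc)  # 2.0*1 + 1.0*2 = 4.0
```

The point is not the particular weights but the shape of the model: no single metadata layer decides relevancy; each layer contributes evidence.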
Entity extraction will add significant value to your content by placing structure on unstructured data, finding form in the fog, to identify people, places and things. Successful applications perform significant preprocessing: organization identification, geographic annotation, relationship extraction and semantic processing combine to provide a rich dataset that allows your content to be sliced and diced without a priori knowledge of the information that will be derived. Pattern-matching techniques and data cleansing help to elegantly solve name-matching problems as well as foreign-language transliteration challenges. Leveraging these techniques in concert to augment formal structured analysis provides the rich fabric to drive relevancy beyond popularity-based link analysis, toward relevancy models that transform enterprises and create truly actionable intelligence.
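The name-matching and transliteration problem mentioned above can be sketched with simple data cleansing plus fuzzy string comparison. This is a toy illustration using Python's standard-library `difflib`; the entity names, identifiers and threshold are invented, and production systems would rely on curated authority files and far richer matching logic.

```python
from difflib import SequenceMatcher

# Hypothetical alias table mapping canonical names to entity identifiers.
KNOWN_ENTITIES = {"Mohammed al-Rashid": "PERSON-001",
                  "Acme Corporation": "ORG-001"}

def normalize(name):
    # Data cleansing: lowercase and strip punctuation before comparison.
    return "".join(c for c in name.lower()
                   if c.isalnum() or c.isspace()).strip()

def match_entity(mention, threshold=0.8):
    # Fuzzy-match a raw mention (e.g. a transliteration variant)
    # against the known canonical names.
    best_id, best_score = None, 0.0
    for canonical, entity_id in KNOWN_ENTITIES.items():
        score = SequenceMatcher(None, normalize(mention),
                                normalize(canonical)).ratio()
        if score > best_score:
            best_id, best_score = entity_id, score
    return best_id if best_score >= threshold else None

resolved = match_entity("Muhammad al Rashid")  # a transliteration variant
```

Resolving variant spellings to one canonical identity is what lets downstream analytics treat scattered mentions as a single person, place or thing.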
The new paradigm is "zero query-term" access to information. This means that before the knowledge worker types a single keyword, information is automatically pushed to them. It is sitting in their inbox like a shiny Christmas present, wrapped up with a bow and ready to be devoured. It means that the application knows who the user is, their professional context and why they ask the questions they ask. Visualization tools summarize vast quantities of data, finding previously unknown relationships and allowing serendipitous discovery of real information.
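At its core, zero query-term delivery is a routing problem: match each incoming document against standing user profiles and push it to the right inboxes before anyone asks. The profiles, interest terms and user names below are invented for the sketch; a real system would derive profiles from role, clearance and past activity.

```python
# Hypothetical standing profiles of what each analyst cares about.
PROFILES = {
    "analyst_1": {"interests": {"border", "smuggling"}},
    "analyst_2": {"interests": {"budget"}},
}

def route_document(doc_terms, inboxes):
    # "Zero query-term" delivery: push the document to every analyst whose
    # profile overlaps its extracted terms, before anyone types a query.
    for user, profile in PROFILES.items():
        if profile["interests"] & doc_terms:
            inboxes.setdefault(user, []).append(doc_terms)

inboxes = {}
route_document({"border", "incident"}, inboxes)  # lands in analyst_1's inbox
```

The interesting engineering lives upstream of this loop, in the entity extraction and metadata decoration that turn raw documents into the term sets being matched.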
Scale, precision and ease.
Knowledge workers don’t care about the problems we have just discussed. They simply want quick answers that are correct and relevant. They want to spend less time looking and more time delivering significant value from the raw data to their organization. The technology exists today to meet users’ demands and to elevate organizations to new levels of productivity by leveraging both internal and external data sources. The key is putting the right pieces together in a cohesive and simple tool that allows agencies to build relevancy models that work.