Search: power tools that leverage corporate knowledge
The search function in InfoPreserve includes both basic and advanced features, such as fuzzy search, stemming, and custom tags and index fields. "Our document management system can extract content from over 140 different file formats, set up custom tags and index fields, or prefilter information," Chapman explains. "dtSearch's powerful search engine then allows our customers to quickly and easily search on that content."
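The two features named above can be illustrated in miniature. The sketch below is a generic, standard-library demonstration of the concepts, not dtSearch's or InfoPreserve's implementation: a naive suffix-stripping stemmer plus `difflib`-based fuzzy matching, so a misspelled query can still reach related indexed terms.

```python
# Minimal sketch of stemming and fuzzy search using only the Python
# standard library. Illustrates the concepts only -- not dtSearch's
# actual algorithms.
import difflib

def stem(word: str) -> str:
    """Naive suffix-stripping stemmer: 'searching', 'searched' -> 'search'."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def fuzzy_match(query: str, terms: list[str], cutoff: float = 0.75) -> list[str]:
    """Return indexed terms whose stems are close to the query's stem,
    tolerating small misspellings."""
    q = stem(query.lower())
    stems = {stem(t.lower()): t for t in terms}
    hits = difflib.get_close_matches(q, stems.keys(), n=5, cutoff=cutoff)
    return [stems[h] for h in hits]

index_terms = ["claims", "claimed", "physician", "physicians", "hospital"]
# A misspelled query still finds claim-related terms via stem similarity.
print(fuzzy_match("claming", index_terms))
```

A production engine would use a linguistically informed stemmer and an index-level edit-distance search rather than pairwise comparison, but the principle is the same: match on normalized word forms, and tolerate small spelling differences.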
One of InfoPreserve's customers, Medical Resources Management, supports physicians, hospitals and medical facilities by managing insurance claims processing. "The amount of paper that is still being used in hospitals and insurance companies is amazing," says Jason Rosenberg, president of Medical Resources Management. "About 60 percent of insurance companies send their data in paper form."
The company had been looking for a system that could store all the information it receives daily, roughly 500 pieces of mail. "We often have to search the system and verify that we received a particular document," adds Rosenberg. "We had looked at everything from enterprise systems to desktop scanning. The InfoPreserve platform was inexpensive and does a great job. Even if we receive an inquiry without complete information, we can find the document we need."
dtSearch has extended its capabilities as the size of document repositories has increased. "We revised our algorithms to improve our search at the terabyte level," says Elizabeth Thede, director of sales at dtSearch. "In addition, we support 100 percent concurrency, meaning that when multiple searches are being carried out at the same time, everyone gets the same rapid search rate." Advanced weighting features allow fine-tuning of relevance, such as giving one meaning of a word greater weight than another meaning of the same word.
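The weighting idea can be sketched as a scoring function in which each query term carries a boost, so vocabulary associated with one sense of a word can outweigh vocabulary associated with another. This is a generic illustration of boosted scoring, not dtSearch's algorithm; real sense disambiguation is considerably more involved.

```python
# Hedged sketch of term weighting in relevance scoring: each query term
# carries a boost, letting one sense of an ambiguous word dominate.
# Generic illustration only -- not dtSearch's actual scoring.
from collections import Counter

def score(doc_tokens: list[str], weighted_query: dict[str, float]) -> float:
    """Score = sum over query terms of (term frequency in doc) * boost."""
    tf = Counter(doc_tokens)
    return sum(tf[term] * boost for term, boost in weighted_query.items())

doc = "the java virtual machine runs java bytecode on any platform".split()
# Boost terms tied to the programming-language sense of 'java' and
# down-weight terms that would suggest the island sense.
query = {"java": 1.0, "bytecode": 2.0, "island": 0.25}
print(round(score(doc, query), 2))  # 2*1.0 + 1*2.0 + 0*0.25 = 4.0
```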
Knowledge management has much to offer in disaster prediction, response and recovery. Knowledgebases documenting the events around a disaster and the effectiveness of response efforts can yield valuable insights for future improvements. However, developing systems that support decision making under extreme conditions requires a new understanding of how communication can take place in an environment where social dynamics are seriously stressed. In addition, both natural and human-caused disasters tend to unfold in highly complex environments.
An initiative funded by the National Science Foundation (NSF) is supporting research on that topic. Last year, a series of workshops on "Computing for Disasters," co-sponsored by the NSF and the Computing Community Consortium, explored key issues and provided an analysis of challenges in the field and a roadmap for future action.
One phase of the project involves data collection to develop a body of knowledge that can be used for research purposes. "We are now keeping track of 9.2 terabytes in our Internet Archive," says Dr. Edward Fox, professor in the Department of Computer Science at Virginia Tech. The collection includes about 150 million documents in 40 collections for specific events, in addition to 145 Tweet collections.
Exploring, analyzing information
Searching a repository of that size requires a robust search engine. LucidWorks offered to work in partnership with Virginia Tech to support the task of exploring the collected information using LucidWorks Search and a beta version of the LucidWorks Big Data platform. LucidWorks produces commercial search tools based on the Apache Lucene/Solr open source search project. "The life cycle of our information includes collection and ingesting it, converting it into forms that can be manipulated, archiving and analysis," Fox says. "LucidWorks fits into the exploration and analysis portion."
Data discovery includes browsing, searching, clustering and other analytical processes. "One area of research involves campus shootings, such as the one we experienced in 2007," says Fox. "We can use the information we collect to help determine whether a factor such as bullying is the primary cause, and then identify interventions that would help."
Another project focused on one of the hurricanes. A student researcher examined a set of Tweets, created a timeline showing connections to organizations such as the Red Cross, and then georeferenced the data to visualize it on a map. "The ability of LucidWorks' search tools to scale and integrate with data sources is essential to this type of project," Fox says.
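The timeline step of a project like that can be sketched simply: bucket tweets by day and count mentions of an organization. The tweet data below is invented for illustration; the real project worked against the Virginia Tech collections through LucidWorks.

```python
# Hedged sketch of building a timeline from tweets: bucket by day and
# count mentions of an organization. The sample data here is invented.
from collections import Counter
from datetime import datetime

tweets = [
    ("2012-10-29T14:05", "Red Cross shelters open across NJ"),
    ("2012-10-29T18:40", "power out, stay safe"),
    ("2012-10-30T09:12", "donating to the Red Cross today"),
]

# Count 'Red Cross' mentions per calendar day.
mentions = Counter(
    datetime.fromisoformat(ts).date().isoformat()
    for ts, text in tweets
    if "red cross" in text.lower()
)
for day in sorted(mentions):
    print(day, mentions[day])
```

Georeferencing would add a second pass that attaches coordinates to each tweet (from embedded geodata or place-name lookup) before plotting the buckets on a map.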
The role of search has changed considerably as the volume of unstructured information has grown exponentially, according to Paul Doscher, CEO of LucidWorks. "Search has to be more than finding documents," he emphasizes. "It needs to integrate complex content from multiple sources and provide the output as an information resource to decision makers." A prime example is customer support. "If an agent has to check multiple sources while on the phone with a customer, finding the information quickly is difficult," he explains. "When a search engine consolidates the results in a meaningful picture, the information can stay in the original location but be presented in a single interface."
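The consolidation Doscher describes is essentially federated search: query several independent repositories, normalize their incompatible scores, and present one ranked list while each document stays where it lives. The sketch below illustrates that pattern with hypothetical stand-in sources; it is not LucidWorks' implementation.

```python
# Minimal sketch of federated search: query several sources, normalize
# each source's scores to a common 0-1 scale, and merge into one ranked
# list. Sources and scores here are hypothetical stand-ins.
def federated_search(query, sources):
    merged = []
    for name, search_fn in sources.items():
        hits = search_fn(query)  # each source returns [(doc_id, raw_score)]
        top = max((s for _, s in hits), default=1.0) or 1.0
        for doc_id, s in hits:
            merged.append({"source": name, "doc": doc_id, "score": s / top})
    return sorted(merged, key=lambda h: h["score"], reverse=True)

# Two invented sources with different scoring scales, e.g. a CRM and a
# ticket system an agent would otherwise have to check separately.
sources = {
    "crm":     lambda q: [("contact-17", 8.0), ("contact-3", 2.0)],
    "tickets": lambda q: [("ticket-99", 5.0)],
}
for hit in federated_search("billing", sources):
    print(hit["source"], hit["doc"], round(hit["score"], 2))
```

Normalizing against each source's own top score is the simplest way to merge incomparable relevance scales; production systems use more careful score calibration, but the single-interface effect is the same.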
With the emergence of big data, the line between structured and unstructured data is also blurring. "Some organizations are migrating their data out of relational databases that are ineffective in scaling to high volumes and retrieving information using SQL queries," Doscher says. "High-value data can be stored in Hadoop, with user applications built on top of that data store." He adds that in the future, search solutions should know more about individual users, and feed answers to questions they didn't know they had yet.