World-class Data Extraction

In the fields of imagery, geospatial information and remote sensing, Lockheed Martin has helped chart the course of vital national intelligence systems for more than 30 years. Lockheed Martin provides geospatial intelligence systems and solutions to the National Geospatial-Intelligence Agency (NGA), formerly the National Imagery and Mapping Agency (NIMA). Additionally, we have successfully integrated AeroText™ into our geospatial intelligence technologies, bringing capabilities from across the corporation to an even wider range of international, federal, defense and state and local government customers.

The AeroText product suite provides a fast, agile information extraction system for developing knowledge-based content analysis applications. The range of possible applications includes automatic database generation, document routing, document browsing, document summarizing and improved full-text searching. In the field of information extraction, the breadth and depth of possible domains and tasks are so large that a text-processing system must not be limited by the number of trained knowledge engineers and domain experts.

For example, the AeroText Core KB can distinguish between different places that share a name within a document. Using linguistic clues or other heuristics from the document, AeroText can determine that a mention of "Paris" refers to "Paris, France," not "Paris, VA" or "Paris, WI." Once a place has been resolved to a particular point on a map, AeroText can look it up in an external gazetteer and return information such as latitude and longitude, which other tools can then use for further geospatial exploitation.
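The disambiguation step can be illustrated with a minimal Python sketch. It is not AeroText's method or API: the `Place` record, the `resolve_place` function, the clue words and the tiny in-memory gazetteer are all assumptions made for illustration, and the coordinates are approximate.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    name: str
    region: str
    lat: float
    lon: float
    clues: frozenset  # context words that point to this candidate


# Hypothetical mini-gazetteer; coordinates are rounded approximations.
GAZETTEER = [
    Place("Paris", "France", 48.86, 2.35, frozenset({"france", "french", "seine"})),
    Place("Paris", "VA", 39.0, -78.0, frozenset({"virginia"})),
    Place("Paris", "WI", 42.6, -88.0, frozenset({"wisconsin"})),
]


def resolve_place(mention, context, gazetteer=GAZETTEER):
    """Pick the candidate whose clue words overlap most with the context.

    Returns None when the name is unknown or no clue word is present,
    so an ambiguous mention is left unresolved rather than guessed.
    """
    words = set(re.findall(r"[a-z]+", context.lower()))
    candidates = [p for p in gazetteer if p.name.lower() == mention.lower()]
    if not candidates:
        return None
    best = max(candidates, key=lambda p: len(p.clues & words))
    return best if best.clues & words else None


place = resolve_place("Paris", "The delegation met in Paris, on the banks of the Seine.")
if place is not None:
    # The resolved record carries the coordinates for downstream tools.
    print(place.region, place.lat, place.lon)
```

A production system would draw on far richer evidence than word overlap, but the shape is the same: resolve the mention first, then attach gazetteer attributes to the resolved place.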

How it Works
Automatic database population. Applications can populate databases automatically from extracted information. Extracted information is represented as fielded records, each of which can be used as a row in a database. In effect, the process converts unstructured text into a structured database. Once the information is in a structured database, it can support queries, visualization and other analytical processes. This type of application may be easily configured by using the RIT Generic Database Output Target, which maps extraction results onto existing databases.
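As a rough sketch of the idea (not of the RIT Generic Database Output Target, whose configuration is not described here), the following Python example loads fielded records into a table and queries them. The record fields, the `extraction` table and the `populate` function are hypothetical.

```python
import sqlite3

# Hypothetical fielded records, shaped as an extraction engine might emit them.
records = [
    {"doc_id": "d1", "entity_type": "PERSON", "mention": "Jane Doe", "start_char": 17},
    {"doc_id": "d1", "entity_type": "PLACE", "mention": "Paris", "start_char": 42},
    {"doc_id": "d2", "entity_type": "ORGANIZATION", "mention": "Acme Corp", "start_char": 5},
]


def populate(conn, records):
    """Insert each fielded record as one row of the extraction table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS extraction ("
        "doc_id TEXT, entity_type TEXT, mention TEXT, start_char INTEGER)"
    )
    conn.executemany(
        "INSERT INTO extraction VALUES (:doc_id, :entity_type, :mention, :start_char)",
        records,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
populate(conn, records)

# Once structured, the extracted information supports ordinary queries.
places = conn.execute(
    "SELECT doc_id, mention FROM extraction WHERE entity_type = ?", ("PLACE",)
).fetchall()
print(places)
```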

Document routing. Applications can route a stream of documents to different destinations based on the extracted information. In contrast to keyword profiles of full-text search engines, extraction profiles can involve complex categories, such as proper names, entity relationships, events and topics.
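The contrast with keyword profiles can be made concrete with a small sketch. It assumes, purely for illustration, that extraction results arrive as (category, value) pairs and that a profile is the set of pairs a destination requires; the `route` function and the desk names are hypothetical.

```python
def route(extracted, profiles):
    """Return the destinations whose profile is fully satisfied.

    extracted: set of (category, value) pairs found in one document
    profiles:  dict mapping destination -> set of required pairs
    """
    return sorted(
        destination
        for destination, required in profiles.items()
        if required <= extracted  # every required pair was extracted
    )


profiles = {
    "europe_desk": {("PLACE", "Paris, France")},
    "mergers_desk": {("EVENT", "acquisition")},
    "acme_watch": {("ORGANIZATION", "Acme Corp"), ("EVENT", "acquisition")},
    "asia_desk": {("PLACE", "Tokyo, Japan")},
}

extracted = {
    ("PLACE", "Paris, France"),
    ("ORGANIZATION", "Acme Corp"),
    ("EVENT", "acquisition"),
}

print(route(extracted, profiles))
```

Because the profile matches on typed, resolved items, a document that mentions Paris, VA would not reach the hypothetical `europe_desk`, which a bare keyword match on "Paris" could not guarantee.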

Document browsing. Applications can provide displays of extracted information that can be browsed to support on-demand analysis of search results. In addition to providing the list of query results, the system can process the retrieved documents, extract the significant items of interest and present them in a visual display. This display can speed analysis by locating the key information within the search results.

The X2R™ System is an example of such an application. It provides a facility for information analysts to use a knowledge base to help build complex query profiles. The analyst starts with a primitive query, retrieves a set of documents, applies extraction to relevant documents and then chooses extracted terms to add to the query. In an iterative fashion, the analyst can build up a complex query within a graphical user interface, thus reducing the amount of time required.

Document summarization. Applications can provide summaries of retrieved documents based on extracted information. In addition to the query results, the system can process the documents and derive summaries based on significant items of interest. The summaries can speed analysis by distilling the key information within the search results.
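One simple way to derive a summary from extracted items is to keep the sentences that mention the most of them. The sketch below assumes that approach for illustration only; it is not a description of how AeroText builds summaries, and the `summarize` function and sample text are hypothetical.

```python
import re


def summarize(text, items_of_interest, max_sentences=2):
    """Keep the sentences that mention the most extracted items.

    Selected sentences are returned in their original order.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scored = []
    for position, sentence in enumerate(sentences):
        lowered = sentence.lower()
        score = sum(1 for item in items_of_interest if item.lower() in lowered)
        if score > 0:
            scored.append((score, position, sentence))
    # Highest score first; earlier sentences win ties.
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:max_sentences]
    return [sentence for _, _, sentence in sorted(top, key=lambda t: t[1])]


text = (
    "Acme Corp announced a deal on Monday. The weather was mild. "
    "Jane Doe will lead the merged unit in Paris. Analysts expect more news."
)
items = ["Acme Corp", "Jane Doe", "Paris"]

for sentence in summarize(text, items):
    print(sentence)
```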

Enhanced full-text search. Keyword searches lack precision due to the ambiguity of words and the lack of contextual information about the documents. New technologies can improve a full-text search by enhancing the index of the documents. Systems can provide additional terms and term properties for the index, such as proper names and key phrases. They can also provide information about the structure of the indexed document, such as header fields or other zones, such as captions.
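A minimal sketch of an enriched index follows. It assumes each indexed term carries a type (such as a proper-name category) and the zone of the document where it appeared; the `build_index` and `search` functions and the annotation format are hypothetical.

```python
from collections import defaultdict


def build_index(documents):
    """Build an inverted index keyed by (term_type, term).

    documents: dict mapping doc_id -> list of (zone, term_type, term)
    Each posting records the document and the zone where the term occurred.
    """
    index = defaultdict(set)
    for doc_id, annotations in documents.items():
        for zone, term_type, term in annotations:
            index[(term_type, term.lower())].add((doc_id, zone))
    return index


def search(index, term, term_type, zone=None):
    """Find documents containing the typed term, optionally within one zone."""
    postings = index.get((term_type, term.lower()), set())
    return sorted({doc_id for doc_id, z in postings if zone is None or z == zone})


documents = {
    "d1": [("body", "PLACE", "Washington"), ("caption", "PERSON", "Washington")],
    "d2": [("header", "PERSON", "Washington")],
    "d3": [("body", "KEYWORD", "washington")],
}
index = build_index(documents)

print(search(index, "Washington", "PERSON"))
print(search(index, "Washington", "PERSON", zone="caption"))
```

A plain keyword index would return all three documents for "Washington"; the typed, zoned index lets the query separate the person from the place and restrict the match to captions or headers.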

Targeted document search. Applications can use a system such as AeroText for searching small collections of documents for specific information, as illustrated in the figure. In this scenario, the documents are not part of a document repository, so full-text search is not an option. The system can interpret the search request as an extraction profile that is matched against the extracted information from the documents.

Document clustering. A "corpus analyzer" is able to cluster documents based on an analysis of their content. Each cluster will be presented with a list of keywords that are common in the cluster but not in the overall set, making it easy to see what the documents discuss.
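The keyword-labeling step can be sketched as follows. The example assumes the documents have already been grouped into clusters and scores each term by how much more often it appears in a cluster's documents than in the collection as a whole; the scoring formula and the `cluster_keywords` function are illustrative assumptions, not the analyzer's actual method.

```python
import re
from collections import Counter


def doc_terms(text):
    """Return the set of distinct lowercase words in a document."""
    return set(re.findall(r"[a-z]+", text.lower()))


def cluster_keywords(clusters, top_n=3):
    """Label each cluster with terms common inside it but not overall.

    clusters: dict mapping cluster label -> list of document texts
    Score = share of cluster docs with the term - share of all docs with it.
    """
    all_docs = [doc for docs in clusters.values() for doc in docs]
    overall = Counter(term for doc in all_docs for term in doc_terms(doc))
    labels = {}
    for label, docs in clusters.items():
        local = Counter(term for doc in docs for term in doc_terms(doc))

        def distinctiveness(term):
            return local[term] / len(docs) - overall[term] / len(all_docs)

        # Most distinctive first; ties are broken alphabetically.
        ranked = sorted(local, key=lambda term: (-distinctiveness(term), term))
        labels[label] = ranked[:top_n]
    return labels


clusters = {
    "A": ["the satellite image shows the port", "the satellite pass covered the port"],
    "B": ["the bank raised the interest rate", "the bank cut the interest rate"],
}

print(cluster_keywords(clusters, top_n=2))
```

Words such as "the" appear in every document, so their score is zero and they drop out of the labels.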


Headquartered in Gaithersburg, MD, Lockheed Martin's Integrated Systems & Solutions (IS&S) was formed in June 2003 in response to the increasing demand for solutions that promise a comprehensive, real-time information picture for faster, better-informed decisions. Developed with over 20 years of Lockheed Martin experience, AeroText is a high-performance data extraction engine and development environment that companies and governments worldwide use to find and correlate relevant information in text documents.
