Unlocking enterprise data: Metadata holds the key
With sales of about 300,000 vehicles per year and a network of more than 700 car and truck dealers, Mazda North America has major, ongoing requirements for searching its relational databases to support a variety of business initiatives. However, the information systems that underlie the company's sales and service operations for vehicles and parts are numerous and diverse.
For example, a mainframe DB2 database supports transactions, while an SQL Server database supports marketing campaigns and reports. Departmental systems may use Microsoft Access databases. Routine reports can be set up and run automatically, but ad hoc queries are a different matter, particularly if information from multiple databases needs to be integrated in order to get an answer. In addition, sometimes data elements such as identification codes need to be modified, and in order to do so correctly, all of the locations of the elements must be found.
To address that challenge, Mazda North America began using MetaTrieve, an enterprisewide search engine and query tool for relational databases and other structured data sources, from MetaTrieval Software. MetaTrieve is designed to locate relevant metadata on a vendor-neutral basis across multiple, disparate databases using natural language search. Once the sources are located, MetaTrieve can display the data so the user can verify that the desired database has been found. It can also query the database and present the results within the application, or download them into Excel or Access.
MetaTrieve has made the query process much more efficient. Prior to implementing MetaTrieve, users had to resort to a database catalog and browse through metadata in order to find the database and elements needed for the query.
"We save a tremendous amount of time with MetaTrieve," says Gail Hockerman, parts systems manager at Mazda North America. "Our various data systems grew up over time, and the naming conventions across the databases are not the same. But we can enter a field name and MetaTrieve will locate all the similar names, so we can be sure we have found all related fields."
Finding metadata is also critical when modifications are made to databases. For example, Mazda's dealers are grouped by geographic region. In order to carry out some marketing programs, the dealers may need to be regrouped. "MetaTrieve helps us find all the tables where this data is, which lets us know which systems are involved," Hockerman says. Then the regional codes can be changed to support the new campaign.
MetaTrieve's "find" feature lets non-technical users enter a natural language search and locate the tables whose metadata may be relevant to it. The search algorithms also assign a confidence level to each set of results, which can guide the user to the most likely tables. Although the user is then faced with field names that are sometimes cryptic, the tables can be opened and the data examined to see if it contains the desired information. Data in the tables can then be queried, and users do not need to know how to write an SQL query in order to get results.
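The idea of matching a search term against differently named fields and ranking the hits by confidence can be sketched in a few lines of Python. This is not MetaTrieve's algorithm, which the article does not describe; it is a minimal stand-in that uses the standard library's difflib for string similarity. The catalog contents, the find_fields function and the 0.6 threshold are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical catalog: (database, table) -> column names. Illustrative only.
CATALOG = {
    ("db2_sales", "DEALER_MASTER"): ["DLR_CD", "DLR_NAME", "RGN_CD"],
    ("sqlserver_mktg", "Campaign"): ["DealerCode", "RegionCode", "StartDate"],
    ("access_parts", "tblOrders"): ["dealer_code", "part_no", "qty"],
}


def normalize(name):
    """Lowercase and drop separators so DealerCode and dealer_code compare equal."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def find_fields(term, catalog, threshold=0.6):
    """Return (confidence, database, table, column) tuples, best match first."""
    target = normalize(term)
    hits = []
    for (database, table), columns in catalog.items():
        for column in columns:
            # ratio() is a similarity score between 0.0 and 1.0.
            score = SequenceMatcher(None, target, normalize(column)).ratio()
            if score >= threshold:
                hits.append((round(score, 2), database, table, column))
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    for confidence, database, table, column in find_fields("dealer code", CATALOG):
        print(f"{confidence:.2f}  {database}.{table}.{column}")
```

A production tool would likely combine name similarity with other signals, such as data types or sampled values, but even this simple score shows how a ranked list can point a user to the most likely tables first.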
"Existing query tools do not solve the metadata search/data discovery problem; they only allow you, using SQL, to filter through a database table if you already know that the data relevant to your assignment is there," says Mark Reiners, co-founder of MetaTrieval. "But many company database implementations, individual or departmentally distributed, may have thousands of tables. As the size or distribution of databases increases, the complexity of preparing conventional, static intermediaries such as data dictionaries and indexes will also increase, and their adequacy as a solution will decrease."
One emerging technology that could significantly add to the amount of corporate data is radio frequency identification (RFID). RFID is now being used in supply chain management by some large retailers, most notably Wal-Mart. "As RFID rolls out, the amount of information that companies must track and analyze will become enormous," Reiners says. "In addition, business rules associated with RFID could cause changes in the workflow, making databases far more dynamic." Therefore, the discovery of metadata will become both more important and more difficult.
The significance of finding all related metadata extends beyond the query process, though. Federated searches, virtual data warehouses and real-time integration all rely on accurate discovery and marking of the data. So the potential payoff of understanding and being able to locate enterprise metadata is large.
Another approach to managing the metadata that helps locate corporate information assets is to centralize it in a repository. MetaCenter from Data Advantage Group creates a catalog in real time of metadata from distributed IT applications.
"Ideally, the business definitions for each element should be defined at a conceptual level," says Geoff Rayner, CEO of Data Advantage Group. "Then the definitions can be related to a single instance of that element and linked to other instances that might be stored under a different name." MetaCenter can be pointed at different data sources, which it quickly indexes to create its own extensible repository.
Fidelity Investments wanted a way to locate key information sources such as reports and standard business definitions. MetaCenter is used as the reference point for that information. Users access it through a cross-indexed, searchable knowledgebase that is published as the Finance Data Library.
"This system has replaced a lot of spreadsheet documentation, which was not portable, not easily searched and lacked robust security," Rayner says. MetaCenter is also used by other customers, such as Allmerica Financial, to store other types of metadata, such as information about who owns a database, what server is hosting that schema and how business definitions relate to data definitions stored in different systems.
Well-managed metadata opens up many possibilities that would not otherwise exist, Rayner explains. For example, a database table can be linked to a set of business intelligence reports, or HR documents can be linked to Web sites containing supporting information. Because metadata is extensible, it can be customized to a particular company's needs in a way that is not always feasible in traditional software applications. In addition, metadata management systems can be audited to highlight problems, such as incomplete actions, that occur during business processes. Once that information is captured, it can be analyzed to see what other processes may have been affected.
Both MetaTrieve and MetaCenter offer ways of adding value through effective use of metadata to locate enterprise data, although their strategies are different.
"MetaTrieve uses a ‘bottom-up' approach, while MetaCenter employs a ‘top-down' strategy," says Stu Carty, president and founder of Gavilan Research Associates, a research firm specializing in business intelligence, data warehousing and metadata management products. "Most Global 5000 companies are looking for better search and discovery products at the enterprise level," Carty says. Such products enable people to find information assets such as key business reports, corporate data or other information in a self-service manner.
"If you ask people to describe their pain," Carty adds, "they say they want a ‘Google-like' product that can search their corporate information assets, as well as a ‘MapQuest' to see how information is interconnected and interrelated." Metadata management helps achieve both of those objectives.
Organizations will continue to struggle with an increasing volume of data stemming from compliance requirements, business operations or new input from technologies such as RFID. Getting a handle on metadata and using it effectively will become more critical, and a proactive role in exploring options will offer better results than waiting for problems to develop.
A search for information that is based on metadata is only as good as the metadata that is created in the first place. With more documents being stored in XML, the importance of generating meaningful metadata is increasing. XMetaL from Blast Radius captures metadata easily and intuitively in a non-intrusive way as a document is written. "The best person to decide on metadata is the document author, and the best time to capture metadata is when a document is being created," says Michael Fergusson, VP of product strategy at Blast Radius. "In fact, if the metadata is not captured then, chances are it won't ever be."
Some documents require a much more detailed structure than others, with specialized semantics. "Titles and paragraphs might make sense for articles, but in an application for FDA drug approval, the semantics are more specific," Fergusson says. "Metadata that indicates a numbered list may not be enough—it might need to reflect an ordered procedure, which is a specialized case of a list." If users later want to find all the drugs that can be taken after a meal, for example, the search will be impossible if the semantics implied by that query have not been captured. It is the author of the document who understands the context of the document and its contents, and who is in the best position to determine the appropriate metadata.
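The point about captured semantics can be shown with a small XML fragment and Python's standard xml.etree.ElementTree module. The element names, the timing attribute and the drug names below are invented for illustration and do not come from XMetaL or from any FDA schema. The first drug records its dosing timing and an ordered procedure explicitly, so a query such as "after a meal" is possible; had the author used only generic lists, the query would have nothing to match.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup. Element and attribute names are assumptions.
DOC = """
<submission>
  <drug name="Examplamine">
    <administration timing="after-meal"/>
    <procedure>
      <step>Take one tablet.</step>
      <step>Drink a full glass of water.</step>
    </procedure>
  </drug>
  <drug name="Samplitol">
    <administration timing="before-meal"/>
    <list>
      <item>Take one capsule.</item>
    </list>
  </drug>
</submission>
"""


def drugs_by_timing(xml_text, timing):
    """Return names of drugs whose administration element has the given timing."""
    root = ET.fromstring(xml_text)
    return [
        drug.get("name")
        for drug in root.iter("drug")
        if drug.find(f"administration[@timing='{timing}']") is not None
    ]


if __name__ == "__main__":
    print(drugs_by_timing(DOC, "after-meal"))
```

The search works only because the timing was encoded when the document was written, which is Fergusson's argument for capturing metadata at authoring time.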
XMetaL provides tools for subject matter experts to create intelligent documents that have the semantics and structure of their domain encoded in the document or stored in repositories beside them. The company works closely with repository vendors such as EMC Documentum and Interwoven to ensure compatibility between their products and XMetaL.
Judith Lamont is a research analyst with Zentek Corp.