Solving the Metadata Mystery

What is metadata? The simplest way to explain metadata is to envision a scenario without it. Imagine the chaos at your local supermarket if all of the product labels were removed. A trip that should take a couple of minutes would now take hours—you would be forced to open up every individual product and decipher its contents.

Metadata functions like labels. It describes the attributes and contents of separate objects. Without it, grouping similar items or searching for specific objects would be almost impossible.

Using information management terminology, metadata is data about data. Metadata refers to a structured set of elements used to describe an information resource and its intellectual property rights. The elements used should assist in the identification, location and retrieval of information resources by end-users. Metadata allows end-users to locate and determine the usefulness of data without having to access the original data.

With the emergence of the Web as a vast information repository, metadata has grown in importance. Without metadata, Web storage is just an ocean of poorly catalogued information. Among its several functions, metadata acts as a surrogate for a larger whole, so it must characterize the original work well enough for the user to understand its contents, purpose and source.

Standard Structure & Terminology

In order for a metadata system to function successfully, it must establish a standard structure and terminology. Data fields that serve the same function are worthless unless they can be mapped to signify the same concept. One preferred form needs to be established, through either an automatic authority list or a standard controlled vocabulary, and links then need to map alternative forms to this established form.
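The mapping of alternative forms to one established form can be illustrated with a minimal Python sketch of an authority list. The department names and the `normalize` helper are hypothetical examples, not part of any particular product; a real system would load its vocabulary from configuration rather than hard-code it.

```python
# Hypothetical authority list: each preferred (established) term maps to the
# alternative forms that users or applications might enter for the same concept.
AUTHORITY_LIST = {
    "Human Resources": ["HR", "Personnel", "Human Resources Dept"],
    "Finance": ["Accounts", "Finance Dept", "Accounting"],
}

# Invert the list once so every form, compared case-insensitively,
# links back to its established form.
_LINKS = {
    variant.casefold(): preferred
    for preferred, variants in AUTHORITY_LIST.items()
    for variant in [preferred, *variants]
}


def normalize(term: str) -> str:
    """Return the established form of a term, or raise if it is unknown."""
    try:
        return _LINKS[term.strip().casefold()]
    except KeyError:
        raise ValueError(f"{term!r} is not in the authority list") from None


print(normalize("personnel"))   # Human Resources
print(normalize(" ACCOUNTS "))  # Finance
```

Rejecting unknown terms, rather than storing them as typed, is what keeps the vocabulary from drifting as more users enter metadata.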

Metadata plays a vital role in document and records management. It is metadata that gives users of an organization's documents and records the key to understanding the context of a document. Without accurate metadata, the task of managing documents and records becomes hopelessly complicated. Effective electronic document and records management systems should automatically capture metadata (derived from applications and/or the operating system) and then encourage the end-user to add as much metadata as is necessary to accurately describe the document.

When a document is placed in a virtual file/folder that has already been well described (via metadata), little additional metadata is required at the document level, saving the end-user valuable time and effort. Virtual files/folders provide all of the utility and efficiency of their paper predecessors; metadata invested in the file/folder is effectively inherited by all the documents it contains. It is important to note that this inheritance does not mean explicitly copying the file/folder's metadata to each of its documents; doing so would add noise to retrieval results and devalue the metadata of the file/folder. Instead, users can search on the metadata associated with the file and then navigate to its contents, which supports accurate and efficient searching in large information stores.

The speed of information retrieval in an organization depends in part on the accuracy of its metadata. If an information management system ensures that its end-users apply accurate and consistent metadata, the organization can be confident that its users will find relevant information quickly.
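Inheritance without copying can be sketched in Python, assuming a simple in-memory model. The `Folder` and `Document` classes and the `effective_metadata` method are hypothetical illustrations, not any product's API. Each document stores only its own fields, and the folder's metadata is resolved through a `collections.ChainMap` at the moment it is read.

```python
from collections import ChainMap
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Folder:
    """A virtual file/folder carrying its own descriptive metadata."""
    metadata: dict
    documents: list = field(default_factory=list)


@dataclass
class Document:
    """A document stores only the metadata specific to itself."""
    metadata: dict
    # Excluded from repr/eq so folder <-> document links do not recurse.
    folder: Optional[Folder] = field(default=None, repr=False, compare=False)

    def effective_metadata(self) -> ChainMap:
        # Document-level values win; folder values are looked up at read
        # time instead of being copied onto the document.
        parent = self.folder.metadata if self.folder else {}
        return ChainMap(self.metadata, parent)


def file_document(doc: Document, folder: Folder) -> None:
    """Place a document in a folder (hypothetical helper)."""
    doc.folder = folder
    folder.documents.append(doc)


contracts = Folder({"Client": "Acme Ltd", "Classification": "Contracts"})
letter = Document({"Title": "Renewal letter"})
file_document(letter, contracts)

print(letter.effective_metadata()["Client"])  # Acme Ltd
print("Client" in letter.metadata)            # False: nothing was copied

# Retrieve on the folder's metadata, then navigate to its contents.
folders = [contracts]
hits = [f for f in folders if f.metadata.get("Client") == "Acme Ltd"]
print([d.metadata["Title"] for f in hits for d in f.documents])  # ['Renewal letter']
```

Because the folder's values are never duplicated, correcting a folder's metadata immediately corrects what every contained document inherits.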

One of the greatest challenges for organizations implementing a document or record management system is ensuring that metadata are recorded and that their entry stays consistent between all users and applications.

This challenge can be met by using a controlled vocabulary to limit the values users can choose for each field. Information management systems should allow organizations to configure the metadata scheme available to users, and must also include the capability to restrict users' choices within that scheme.
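A configurable scheme that limits users' choices might look like the following minimal Python sketch. The field names, allowed values, and the `validate` function are hypothetical; a real system would keep the scheme in its configuration or database so administrators can change it without code.

```python
# Hypothetical metadata scheme configured by the organization. Each field
# lists the values users may choose from; None means free text is allowed.
SCHEME = {
    "Record Type": {"Contract", "Invoice", "Policy", "Correspondence"},
    "Security Level": {"Public", "Internal", "Confidential"},
    "Title": None,
}


def validate(metadata: dict) -> list[str]:
    """Return a list of problems; an empty list means the entry is acceptable."""
    problems = []
    for name, value in metadata.items():
        if name not in SCHEME:
            problems.append(f"unknown field {name!r}")
        elif SCHEME[name] is not None and value not in SCHEME[name]:
            allowed = ", ".join(sorted(SCHEME[name]))
            problems.append(f"{name}: {value!r} is not one of: {allowed}")
    return problems


print(validate({"Record Type": "Invoice", "Title": "Q3 supplier invoice"}))  # []
print(len(validate({"Record Type": "Bill", "Owner": "Sam"})))              # 2
```

Checking values at entry time, rather than cleaning them up later, is what keeps metadata consistent between users and applications.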

Dublin Core

The Dublin Core Metadata Initiative is one attempt to address the control problems organizations face when recording the metadata associated with web resources, by defining a standard set of elements. The Dublin Core metadata elements are specifically intended to support resource discovery; they represent a broad, interdisciplinary consensus on the core set of elements likely to be needed for that purpose.

The Dublin Core Metadata Initiative began in 1995 to develop conventions for resource discovery on the Web, and the focus of discussion was electronic resources. It was clear at the outset, however, that the semantics of resource discovery should be independent of the medium of the resource, and that there are obvious advantages to using the same semantic model across media. Considerable attention has therefore been invested in making the Dublin Core flexible enough to represent both digital resources and resources in traditional formats, as well as the relationships among them. The idea behind Dublin Core and other metadata standards is to facilitate the transfer of objects between systems: objects described using one metadata format can be easily imported into another system that understands that format.

The 15 Dublin Core Elements are: Title; Creator; Subject; Description; Publisher; Contributor; Date; Type; Format; Identifier; Source; Language; Relation; Coverage; and Rights.
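The element set itself is straightforward to model. Below is a minimal Python sketch, assuming a record is represented as a dictionary from lowercase element names (the form used in the `dc:` XML namespace) to lists of values; the `make_record` helper is hypothetical. Each value is a list because every Dublin Core element is optional and repeatable.

```python
# The 15 elements of the Dublin Core Metadata Element Set, as listed above.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def make_record(**elements: list[str]) -> dict[str, list[str]]:
    """Build a simple Dublin Core record (hypothetical helper).

    Every element is optional and may repeat, so each value is a list.
    Names outside the 15 elements are rejected.
    """
    unknown = set(elements) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    # Drop elements given with no values: "none" is a valid number of values.
    return {name: list(values) for name, values in elements.items() if values}


record = make_record(
    title=["Solving the Metadata Mystery"],
    subject=["Metadata", "Records management"],
    language=["en"],
)
print(len(DC_ELEMENTS))   # 15
print(record["subject"])  # ['Metadata', 'Records management']
```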

These 15 elements may have qualifiers added to them to provide additional information. For example, "date" may be further refined as "date created," "date last modified" or "date published." Qualifiers can increase the specificity or precision of the metadata, but they also introduce complexity that can impair interoperability. Designers of metadata systems are advised to be conservative about deploying qualifiers when interoperability is one of the design objectives. For example, if the plan is to output the data to XML so that it can be used in another application, users need to be careful to use only metadata terms that will be meaningful to the new application. Dublin Core interoperability qualifiers are those that have been approved by the Dublin Core community and are a formal part of the registry of Dublin Core metadata. Designers should select qualifiers from this set to the extent that they meet the functional requirements of a given application. It is worth remembering that any resource may have any number (including none) of these 15 metadata elements.
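The trade-off between qualifiers and interoperability shows up clearly when exporting to XML. The following Python sketch assumes the receiving application understands only the 15 base elements in the `http://purl.org/dc/elements/1.1/` namespace, so each qualified term is replaced by the element it refines (an approach DCMI has described as "dumbing down"). The `REFINES` table is a small illustrative subset, and the `metadata` root element and `to_simple_dc_xml` function are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 base elements, in the lowercase form used by the dc: namespace.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Illustrative subset of qualified terms and the base element each refines.
REFINES = {
    "created": "date",
    "modified": "date",
    "issued": "date",
    "alternative": "title",
    "abstract": "description",
}


def to_simple_dc_xml(record: dict[str, list[str]]) -> str:
    """Serialize a record using only the 15 base elements.

    Qualified terms are replaced by the element they refine, so an
    application that understands only simple Dublin Core can still use them.
    """
    root = ET.Element("metadata")  # hypothetical container element
    for term, values in record.items():
        element = REFINES.get(term, term)
        if element not in DC_ELEMENTS:
            raise ValueError(f"{term!r} has no Dublin Core equivalent")
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


record = {
    "title": ["Quarterly report"],
    "created": ["2024-01-15"],
    "modified": ["2024-02-02"],
}
print(to_simple_dc_xml(record))
```

Both qualified dates come out as plain `dc:date` elements, so the receiving system can no longer tell which date is which: the precision the qualifiers added is exactly what is given up for interoperability.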

The key issue is not how many metadata tags a user puts on a document (although the more information that is registered the easier the resource will be to find). What is most important is that all users use the tags consistently.

No information management system can specifically support every metadata standard; attempting to do so would create a cumbersome database with overlapping functionality. Instead, information management systems must strive to provide enough fields to accommodate the metadata required and allow the organization to label those fields accordingly.

Maximizing Metadata

After an organization clears the hurdle of implementing a standardized system of metadata labeling, the improvements to its workflow will be noticed immediately. Consistently assigning metadata to an organization's documents, folders and other information gives its staff a complete picture of its data sources. Experience in search analysis suggests that most end-users of an information management system search by title, date created and record number (all metadata) as well as by document content. For most users, document content searching is an adjunct to other search methods rather than their primary method.

Structured metadata gathering not only has tangible benefits to the organization, but also ensures the usefulness of metadata searches to the staff, by maximizing the accuracy of the returned results.

Metadata gives organizations the ability to create relationships between information points, support staff mobility, retain information for appropriate time periods, rapidly implement corporate and business rules (for example, classification and retention guidelines) and reduce search and retrieval time, because search parameters can be refined, saved and combined with document content searches. These capabilities translate into improved response times, quicker time to market, compliance with statutory, legislative, legal, quality and audit requirements (for example, the Sarbanes-Oxley Act and the Freedom of Information (FOI) Act) and a better-informed decision-making process.
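Search parameters that are refined, saved and combined with content searches can be sketched in Python as an immutable query object. This is a minimal illustration under stated assumptions: `Record`, `SavedSearch` and their fields are hypothetical, and matching is naive substring comparison rather than a real full-text index.

```python
from dataclasses import dataclass, replace


@dataclass
class Record:
    number: str
    title: str
    date_created: str  # ISO 8601 dates compare correctly as strings
    content: str


@dataclass(frozen=True)
class SavedSearch:
    """Hypothetical saved search: metadata filters plus optional content terms."""
    title_contains: str = ""
    created_from: str = ""
    created_to: str = "9999-12-31"
    content_terms: tuple[str, ...] = ()

    def matches(self, r: Record) -> bool:
        # Metadata filters first: cheap and precise.
        if self.title_contains.casefold() not in r.title.casefold():
            return False
        if not (self.created_from <= r.date_created <= self.created_to):
            return False
        # Content search as an adjunct that narrows the metadata hits.
        text = r.content.casefold()
        return all(term.casefold() in text for term in self.content_terms)

    def refine(self, **changes) -> "SavedSearch":
        # Returns a new search; the saved original is left unchanged.
        return replace(self, **changes)


records = [
    Record("2024/001", "Supplier contract", "2024-01-10", "renewal terms and penalties"),
    Record("2024/002", "Supplier invoice", "2024-02-03", "payment due in 30 days"),
    Record("2024/003", "Staff policy", "2024-02-20", "leave and renewal of contracts"),
]

supplier_2024 = SavedSearch(title_contains="supplier", created_from="2024-01-01")
print([r.number for r in records if supplier_2024.matches(r)])  # ['2024/001', '2024/002']

renewals = supplier_2024.refine(content_terms=("renewal",))
print([r.number for r in records if renewals.matches(r)])  # ['2024/001']
```

Making the search frozen means a saved search can be shared and reused safely, while `refine` derives narrower variants from it.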

Accurate metadata maximizes the use of an organization's information by its staff. Staff spend less time searching for information; they can apply standard terminology to varied data, classify information in context (including its relationships), add abstracts and notes, reuse common search parameters, and assign business processes to activities and documents. Metadata is simply data about data, but used effectively it transforms information into knowledge.

TRIM Context

TOWER Software's electronic records and document management product, TRIM Context, is an information management solution produced by people who understand the difficulties of implementing a metadata scheme.

TRIM Context provides a superset of metadata tags, with no practical limit to the number of user-defined fields available for attaching appropriate metadata to business records. Hence, there is no difficulty in satisfying any metadata standard.

TOWER Software was involved in the development of international standards and keeps up to date on their impact on the document management and record keeping marketplace. These standards include:

US DoD 5015.2 (records management standard for the US Department of Defense); and

Public Record Office Electronic Records in Office Systems Project (UK).

TRIM Context has been certified against these standards. TRIM Context is a complete, off-the-shelf solution for any organization's information management needs. TRIM Context enables:

enterprise-wide electronic document management;

Web-based document management;

knowledge management;

records management;

workflow;

imaging; and

electronic record keeping.

TOWER Software delivers electronic document and records management (EDRM) solutions, empowering organizations to take control of their corporate information assets. TOWER Software's award-winning TRIM Context® solution is a single, integrated platform that manages business information throughout its complete lifecycle. By relying on its proven domain expertise, strong strategic partnerships and powerful solutions, TOWER Software enables organizations to maintain accuracy, maximize efficiency and achieve and maintain standards compliance across industries, resulting in sustained competitive advantage. TOWER Software is a privately held company with operations in North America, Europe and Asia-Pacific. For more information, visit TOWER Software, TOWER Software Asia Pacific, TOWER Software North America (www.towersoft.com), or TOWER Software Europe, Middle East & Africa.
