
Avoiding Pitfalls in Enterprise Taxonomy Projects

According to Gartner, more than 70% of firms that invest in unstructured information management initiatives will not achieve their targeted return on investment due to underinvestment in taxonomy building [1]. The three biggest obstacles encountered when starting a taxonomy project are: (1) expense, including time and resource cost factors; (2) insufficient domain coverage; and (3) working with a taxonomy that does not offer the necessary components for concept tagging. With proper planning, these pitfalls can be avoided.

Expense: Time = Money

Building taxonomies is expensive. A major search engine provider employs more than 200 corporate librarians to build and update its taxonomies. Most organizations cannot afford a large librarian staff, so taxonomy development is typically assigned to an in-house information scientist or to external consultants who are subject matter experts (SMEs) in the business. Regardless of whether an internal or external SME is used, it is estimated that manually creating and training just a single topic in a taxonomy requires roughly two to three hours and can cost as much as $300. Additionally, all enterprise taxonomies require regular maintenance as information and business dynamics evolve. Developing taxonomies manually is a time-intensive and expensive endeavor.
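To put those per-topic figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the estimates quoted above (two to three hours and up to $300 per topic). The topic count of 5,000 is a hypothetical value chosen for illustration, and the function name is likewise an assumption.

```python
# Figures quoted in the article: two to three hours and up to $300 per topic.
HOURS_PER_TOPIC = (2.0, 3.0)
MAX_COST_PER_TOPIC = 300.0


def estimate_manual_build(topic_count: int) -> dict:
    """Return a rough effort range and cost ceiling for building topics by hand."""
    return {
        "hours_low": topic_count * HOURS_PER_TOPIC[0],
        "hours_high": topic_count * HOURS_PER_TOPIC[1],
        "cost_ceiling": topic_count * MAX_COST_PER_TOPIC,
    }


# Hypothetical taxonomy size, chosen only for illustration.
estimate = estimate_manual_build(5000)
print(estimate)
```

Under these assumptions, a 5,000-topic taxonomy built entirely by hand would take 10,000 to 15,000 hours and could cost up to $1.5 million, before any ongoing maintenance.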

Suggestion: Investigate the use of pre-built taxonomies. A number of organizations sell taxonomies, while some are publicly available for download. Some commercially available taxonomies contain several thousand topics and cover a variety of specialized subject areas that may be relevant to your business.

Commercially available taxonomies can provide taxonomy development teams with an excellent starting point and minimize the time and expense of developing them from scratch. Existing taxonomies will likely require some editorial modification, but they can significantly shorten the overall time to completion and expand the breadth of topical coverage. This approach will also reduce the encroachment on the SMEs' regular responsibilities.

Topical Coverage

Many projects break down when the taxonomy domain scope is too small and restrictive to cover the entire business effectively. Marketing, operations, legal, and R&D will all look at the same information through different lenses. A taxonomy should be broad enough to support the various operational dimensions of the knowledge community of the organization. A well-designed taxonomy must also be deep, meaning the subjects are defined in enough detail that the knowledge worker within the enterprise derives real value.

Many taxonomy projects fail when taxonomy teams short-circuit the thorough development of the domain in both depth and breadth, usually because of corporate timeline pressure and resource drain. Typically, this results in performance and usability levels that are well below knowledge worker and management expectations. Studies have shown that the deepest, most granular knowledge-based taxonomies provide the best search and categorization performance.

Suggestion: To ensure adequate topical coverage that supports the entire organization, use multiple taxonomies. Start with a general-domain taxonomy that broadly covers the key areas of the industry, and then layer in additional subject-specific taxonomies that correspond to the various operational viewpoints of the organization. Again, taking advantage of commercially available taxonomies will strongly enhance your ability to cover the required taxonomy areas and allow for the addition of new areas as the business changes.
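The layering idea can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's import process: it assumes each taxonomy is represented as a nested dictionary of category names, and the function name and the sample categories are hypothetical.

```python
def layer_taxonomies(base: dict, *overlays: dict) -> dict:
    """Merge nested category dicts; overlays add branches and never remove base ones."""
    # Copy the base taxonomy recursively so the inputs are left unmodified.
    merged = {name: layer_taxonomies(children) for name, children in base.items()}
    for overlay in overlays:
        for name, children in overlay.items():
            # Merge the overlay's subtree into whatever already exists under this name.
            merged[name] = layer_taxonomies(merged.get(name, {}), children)
    return merged


# Hypothetical general-domain taxonomy plus two departmental viewpoints.
general = {"Pharmaceuticals": {"Drug Development": {}, "Regulation": {}}}
legal = {"Pharmaceuticals": {"Regulation": {"Patent Law": {}}}, "Contracts": {}}
research = {"Pharmaceuticals": {"Drug Development": {"Clinical Trials": {}}}}

combined = layer_taxonomies(general, legal, research)
print(combined)
```

The general taxonomy supplies breadth, while each departmental overlay adds depth under the branches it cares about. A real merge would also need to reconcile differently named categories that mean the same thing, which this sketch does not attempt.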

Taxonomic Content

Make sure you understand what is required for your project's success. With the broad array of commercial offerings, terminology, tools, techniques and approaches, taxonomies can often lead to confusion, intimidation and frustration. The terms "thesauri," "controlled vocabularies," "taxonomies" and "ontologies" are often used interchangeably in any "taxonomy" dialog.

A taxonomy for use within an enterprise should have a hierarchical structure that relates the categories and subcategories of a domain. A good taxonomy will also contain matching rules or pre-developed training sets. These rules are essential for accurately tagging the concepts found in the unstructured data. If your organization participates in industry groups that make available or license industry-standard "taxonomies" such as MeSH, ACT (related to life sciences) and DTIC (related to defense), be sure that these taxonomies will suit your enterprise's needs. Often, industry-driven taxonomies are developed at a high level so as to support a broader group, and they rarely contain the requisite rule sets. Industry-developed taxonomies, nevertheless, may be valuable as one component of your taxonomy collection.
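To make the distinction concrete, the following Python sketch shows a hierarchy in which each topic carries its own matching rule. It is a simplified assumption for illustration only: the "rule" here is a plain list of terms matched as case-insensitive substrings, whereas commercial rule sets are typically far richer. The class, function and sample topics are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    # Simplified matching rule: the topic applies if any of these terms appears.
    terms: list[str] = field(default_factory=list)
    children: list["Topic"] = field(default_factory=list)


def tag(text: str, topic: Topic, path: tuple = ()) -> list[str]:
    """Return the hierarchical path of every topic whose terms appear in the text."""
    here = path + (topic.name,)
    lowered = text.lower()
    hits = []
    if any(term.lower() in lowered for term in topic.terms):
        hits.append(" > ".join(here))
    for child in topic.children:
        hits.extend(tag(text, child, here))
    return hits


# Hypothetical two-level taxonomy with term lists as matching rules.
root = Topic("Life Sciences", children=[
    Topic("Oncology", terms=["tumor", "chemotherapy"]),
    Topic("Cardiology", terms=["arrhythmia", "myocardial"]),
])

tags = tag("Patients received chemotherapy after tumor resection.", root)
print(tags)
```

A taxonomy that ships only the hierarchy (the `name` and `children` fields in this sketch) leaves the `terms` for your own team to write, which is where much of the per-topic cost arises.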

Suggestion: Get the most out of a taxonomy by insisting on matching rules or pre-developed training sets. If you leverage commercially available taxonomies, be sure the taxonomy provider clearly defines its offering. Ask for the provider's definition of "taxonomy" and preview a sample of the matching rules. If possible, find a taxonomy provider who supplies multiple taxonomies to minimize import effort.

Enterprises have made major investments in unstructured information management systems. The key to empowering these systems is implementing broad yet rich taxonomies. Organizations can save time and money by leveraging commercially available taxonomies and allocating SMEs to tailor these taxonomies to the business model. Using a collection of taxonomies will allow for cross-departmental viewpoint coverage. When purchasing taxonomies, understand the industry vernacular and insist on taxonomies with hierarchical structure and matching rules. Proper research and planning will help lead to a successful taxonomy implementation.


Intellisophic is the leading publisher of taxonomic content for understanding unstructured data. Intellisophic has created the largest collection of taxonomies covering several million topics and defined by hundreds of millions of terms. Intellisophic's solutions allow customers to categorize textual data based on meaning and context, adding structured concept attributes to unstructured data for business intelligence applications.

For more information please call 610.251.1077 or visit Intellisophic

[1] Knox, Rita & Logan, Debra (2003, September 10). What Taxonomies Do for the Enterprise. Gartner Research, AV-20-8780.
