Content-Categorization in the Enterprise

In the Roman Empire, all roads led to Rome. Although that's no longer true, it's still a good metaphor for content management systems.

Content management's primary focus is to make things easy to find, most often so you can take action. The main challenge to findability is anticipating how someone might look for information: which "road" they think will lead to the "Rome" they're seeking. That's where categorization comes into play.

The quality of the categorization of each piece of content makes or breaks its findability. Theoretically, good tagging will last the lifetime of the content; do it well initially, and you can forget about it until it's time to retire that piece. Right from the beginning, though, the reality is very different.

Durable Categorization

Many pressures complicate keeping the content house in order. These include:

  • The sheer volume, velocity and variety of internal and outward-facing content needing management;
  • Evolving/emerging regulation and compliance issues, some of which need to be retroactively applied; and
  • The need to limit the company's exposure and to support the strength of its position in any legal activity.

Some managers face the added challenge of integrating content from mergers or acquisitions. The acquired organizations most likely used content management structures, categorization schemes and methodologies that are incompatible with the company's own and of inconsistent quality.

Arguably, then, the most important success factors for good content management are the categorization techniques and processes themselves. Traditionally, a taxonomy system categorizes content using keywords, lexicons, dictionaries, thesauri, etc. This type of taxonomy model poses several problems:

  • Taxonomy quality: it depends on the initial vision and attention to detail, and on whether the taxonomy has been kept current;
  • Term creep: the initial categorization will not always accommodate where and how the content will be used over time, or predict relevancy beyond its original focus;
  • Policy evolution: the taxonomy can't easily apply new or evolving policies, regulations, compliance requirements, etc.; and
  • Cost and complexity: it's difficult and costly, if not practically impossible, to retroactively expand the original categorization of the existing corpus.

Automatic Categorization by Example

Using technology to automatically categorize content certainly is appealing. It applies the rules more consistently than people do. It does it faster. It frees people from having to do the task (and hence, costs less). And, it can actively or retroactively categorize batches or whole collections of documents. So how can you experience these benefits and more without the drawbacks and limitations of traditional taxonomy systems?

One answer is concept-based categorization driven by an analytics engine integrated into the content management environment. These systems mathematically analyze example documents you provide to calculate concepts that can be used to categorize other documents. Identifying hundreds of dimensions per term, they are able to distinguish relevance that escapes keyword and other traditional taxonomy approaches. They are even highly likely to make connections that a person would miss.

Consider 3D printers as an example. They are also known as "materials printers," "fabbers" and "3D fabbers," and the process itself is called "additive manufacturing." If those terms aren't all in the taxonomy, then relevant documents that use one or more of them, but not "3D printer," won't be optimally categorized. People seeking information about 3D printers who aren't aware of the alternative terms would miss related documents of potential significance. This particularly affects outward-facing, for-profit sites, such as those of infomediary companies. Their business depends on fast and easy delivery of accurate and complete information to their customers, even when the customer doesn't know all of the various terms used to describe the target subject.

In contrast, through example-based mathematical analysis and comparison along multiple dimensions, conceptual analytics understands that these documents are all related. They would be automatically categorized and tagged as relevant to 3D printing (as well as all the other terms).

Another difference is that taxonomy systems require someone to enter the newly developed or discovered terms. In conceptual analytics, it's simply a matter of providing additional example documents that automatically add to the system's conceptual understanding.
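The contrast between keyword lookup and categorization by example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `ExampleCategorizer`, `add_example`, `categorize` and `keyword_match`, the stopword list and the sample documents are all hypothetical, and plain term-count vectors compared by cosine similarity stand in for the far richer, many-dimensional analysis a real conceptual analytics engine performs. The sketch connects documents only through shared surrounding vocabulary.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list, not a complete one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for",
             "by", "from", "that", "on", "with", "our", "new"}

def tokenize(text):
    """Lowercase the text, split on non-alphanumerics, drop stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS]

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def keyword_match(text, keywords):
    """Traditional lookup: tag only if a listed term appears verbatim."""
    lowered = text.lower()
    return any(k in lowered for k in keywords)

class ExampleCategorizer:
    """Hypothetical categorizer: a category is the summed term vector
    of the example documents supplied for it."""

    def __init__(self):
        self.profiles = {}

    def add_example(self, category, text):
        # New examples simply extend the category's profile.
        self.profiles.setdefault(category, Counter()).update(tokenize(text))

    def scores(self, text):
        vec = Counter(tokenize(text))
        return {cat: cosine(vec, prof) for cat, prof in self.profiles.items()}

    def categorize(self, text):
        s = self.scores(text)
        return max(s, key=s.get)

categorizer = ExampleCategorizer()
categorizer.add_example(
    "3d_printing",
    "The 3D printer deposits plastic filament layer by layer "
    "to build a part from a digital model.")
categorizer.add_example(
    "3d_printing",
    "Additive manufacturing builds parts layer by layer from a "
    "digital model using plastic or metal powder.")
categorizer.add_example(
    "finance",
    "Quarterly revenue and operating expenses are reported in "
    "the budget forecast.")

# A document that never uses the term "3D printer".
new_doc = "Our new fabber builds each part layer by layer from plastic filament."

keyword_hit = keyword_match(new_doc, ["3d printer"])   # False: term absent
category = categorizer.categorize(new_doc)             # "3d_printing"

# Supplying one more example that uses the new term refines the profile.
score_before = categorizer.scores(new_doc)["3d_printing"]
categorizer.add_example(
    "3d_printing", "A fabber is a desktop machine that fabricates objects.")
score_after = categorizer.scores(new_doc)["3d_printing"]
```

In this toy run the keyword lookup misses the "fabber" document, while the example-based comparison still places it with the 3D printing examples because of the vocabulary they share. Adding one more example document raises its score without anyone editing a term list.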

The days of keeping everything "just in case" are long gone. From cost and exposure viewpoints, companies need to keep only what's necessary, particularly as the volume and variety of content continue to grow. Good categorization and tagging systems are essential to good housekeeping and to controlling expense and exposure.

Outdated documents and tangential chit-chat bloat every company's content repositories. Multiple copies of the same or very similar content are scattered throughout the organization. By some estimates, such material makes up 20% or more of a company's content. Efficiently weeding it out could mean roughly 20% less active and backup storage, bandwidth, cloud storage for offsite disaster recovery, and archive volume. Effective and thorough tagging can identify such material to reduce these costs, and simultaneously reduce the company's cost and exposure related to legal or regulatory activity.
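One common way to flag copies and near-copies is to compare documents by their overlapping word sequences. The following is a minimal sketch of that idea, not the method of any particular product: it uses three-word shingles and Jaccard similarity, and the function names, the 0.6 threshold and the sample documents are all assumptions chosen for illustration.

```python
import re
from itertools import combinations

def shingles(text, size=3):
    """Return the set of overlapping word n-grams (shingles) in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def jaccard(a, b):
    """Share of shingles two documents have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def near_duplicates(docs, threshold=0.6):
    """Return pairs of document ids whose shingle overlap meets the threshold.

    The threshold is an assumed value; a real deployment would tune it.
    """
    sets = {doc_id: shingles(text) for doc_id, text in docs.items()}
    return [(x, y) for x, y in combinations(sorted(sets), 2)
            if jaccard(sets[x], sets[y]) >= threshold]

# Hypothetical sample repository.
docs = {
    "policy_v1": "Employees must submit travel expense reports "
                 "within thirty days of the trip.",
    "policy_copy": "Employees must submit travel expense reports "
                   "within thirty days of the trip end.",
    "memo": "The cafeteria will be closed on Friday for scheduled maintenance.",
}

pairs = near_duplicates(docs)
print(pairs)  # [('policy_copy', 'policy_v1')]
```

Comparing every pair of documents is fine for a small sample like this one, but it would not scale to a full repository. It is shown here only to make the idea of similarity-based weeding concrete.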

The Value Beyond Cost Savings

An effectively managed body of content delivers measurably better cost of ownership and reduced exposure. While this alone is reason enough to implement improvements in categorization, it's only part of the story.

Superior categorization through conceptual analysis also improves operational efficiency by enabling fast, accurate and complete content gathering. This is a significant benefit for any enterprise, because reducing the time it takes to find necessary information leaves more time for actual work. It is of critical importance for infomediary companies, whose revenue depends on their customers quickly and easily finding quality information.

Conceptual analytics delivers two other advantages over traditional taxonomy methods and manual categorization. First, it creates a mathematical index, not prose, so the index is useless to anyone trying to discover private information or clues about the company. Second, it is deterministic and repeatable (it will give the same result every time) and so is defensible in legal or regulatory activity.

Concept-based analysis makes content findable and actionable, regardless of language, by automatically categorizing it based on understanding developed from example documents you provide. Both internally and in its outward-facing content, the company becomes more competitive with one of its most important assets: unstructured information.


Content Analyst's software provides advanced, conceptual-based search, classification and document analysis. For more information on the capabilities and value of advanced analytics, visit ContentAnalyst.com.


