Adopting and adapting existing word lists and taxonomies

In traditional taxonomy construction, you approach the body of knowledge, head for your ivory tower with its closed room, and decide on the general structure of the field or discipline. From that single point, or perhaps multiple points, of knowledge, you are going to design a completely new taxonomy … and every now and then that works.

Alternatively, you might build on previous knowledge: find similar fields and current word lists, preferably drawn from your own data. If you have multiple lists, compare them linguistically, and then choose the terms that the users choose. If you are doing a corporate taxonomy in particular, you may have some distinct sets of users: for instance, the bench chemists and the marketing people, or the human resources people and the engineers. They might be looking at the same body of data, but they use different terms to talk about it. You need to think about whether you need multiple views for those different sets of users, or some other way of accommodating variant vocabularies.
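As a rough illustration of comparing word lists from distinct user groups, here is a minimal Python sketch. The group names come from the example above, but the sample terms, the function names and the simple lowercase-and-trim normalization are all assumptions made for illustration; a real comparison would likely need stemming, spelling variants and human review.

```python
def normalize(term: str) -> str:
    """Lowercase a term and collapse internal whitespace (an assumed, minimal normalization)."""
    return " ".join(term.lower().split())


def compare_word_lists(lists: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each normalized term to the set of user groups whose list contains it."""
    usage: dict[str, set[str]] = {}
    for group, terms in lists.items():
        for term in terms:
            usage.setdefault(normalize(term), set()).add(group)
    return usage


def shared_and_distinct(usage: dict[str, set[str]]) -> tuple[list[str], dict[str, list[str]]]:
    """Split terms into those used by several groups and those used by only one."""
    shared = sorted(term for term, groups in usage.items() if len(groups) > 1)
    distinct: dict[str, list[str]] = {}
    for term, groups in usage.items():
        if len(groups) == 1:
            distinct.setdefault(next(iter(groups)), []).append(term)
    for terms in distinct.values():
        terms.sort()
    return shared, distinct


# Hypothetical sample data: two groups describing the same body of data.
word_lists = {
    "bench chemists": ["Sodium chloride", "titration", "Assay"],
    "marketing": ["table salt", "assay", "product launch"],
}

usage = compare_word_lists(word_lists)
shared, distinct = shared_and_distinct(usage)
```

Terms that land in `shared` are candidates for a common vocabulary, while the per-group lists in `distinct` point to places where a separate view or a synonym relationship may be needed.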

You might have a multinational corporation. Perhaps everyone speaks English, but some of them speak British English and some of them speak American English. Though the words sound similar, they are not always spelled the same, and sometimes entirely different words are used for the same item. You need to decide which language will be the preferred language for the firm, and then make the terms from the others synonyms so that you can serve the entire corporation.

This is more often a problem than you would think. People think … it’s going to be English. Which one? A while back, we were working for a big multinational corporation, and for the most part the English and Spanish parts were pretty easy. The British and the American were not as easy in that particular thesaurus.
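One way to picture the preferred-term-plus-synonyms approach is a lookup that sends every variant to a single preferred term. The Python sketch below is only an illustration: the `Thesaurus` class, its method names, the sample word pairs and the choice of American spellings as preferred are all assumptions, not a description of any particular thesaurus tool.

```python
from typing import Iterable, Optional


class Thesaurus:
    """Toy thesaurus: each variant term points to one preferred term."""

    def __init__(self) -> None:
        # Lowercased term (preferred or variant) -> preferred term
        self._preferred: dict[str, str] = {}

    def add(self, preferred: str, variants: Iterable[str] = ()) -> None:
        """Register a preferred term and any non-preferred variants (synonyms)."""
        self._preferred[preferred.lower()] = preferred
        for variant in variants:
            self._preferred[variant.lower()] = preferred

    def resolve(self, term: str) -> Optional[str]:
        """Return the preferred term for any known variant, or None if unknown."""
        return self._preferred.get(term.lower())


# Assumption for illustration: American English is the preferred language,
# and British spellings and words are entered as synonyms.
thesaurus = Thesaurus()
thesaurus.add("color", ["colour"])
thesaurus.add("aluminum", ["aluminium"])
thesaurus.add("elevator", ["lift"])
```

With this in place, `thesaurus.resolve("Colour")` and `thesaurus.resolve("color")` lead to the same preferred term, so users of either variety of English reach the same content.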

You might adapt an existing thesaurus. If you are looking for one to adapt, consider Knowledge Organization, a journal that comes out quarterly from the International Society for Knowledge Organization (ISKO). It often lists schemes and thesauri.

The University of Toronto Library has a print collection of English language thesauri, the Subject Analysis Systems (SAS) collection, which can be found on the fifth floor of the library.

ASLIB in the UK (which used to be the Association of Special Libraries and Information Bureaux and is now the Association for Information Management) has an Information Resource Center and has published a number of useful books. My favorite is the Aitchison, Gilchrist and Bawden book titled Thesaurus Construction and Use: A Practical Manual.

The American Society for Indexing has a list on their site of thesauri and tools for building thesauri. This is a group that used to be composed mainly of back-of-the-book style indexers, and they are, to some extent, reinventing themselves as database indexers and taxonomists.

Other resources include several terminology registries. One of these allows people to contribute their thesaurus information and even entire thesauri, particularly if they are reusable under a Creative Commons license or something similar. Several information management students have contributed interesting thesauri to it. All kinds of thesauri are listed, from the National Library of Medicine ones to a thesaurus on belly dancing and a thesaurus on ship building. There’s plenty of variety included, and it’s a good place to start looking for something that you might be able to adapt to your needs.

OCLC has a terminologies service and lists several thesauri. Taxonomy Warehouse, sponsored by Synaptica, is another resource that lists quite a few taxonomies that can be used.

If you are in the biology area, you might be interested in BioPortal, an ontology repository that provides viewer access to some very detailed biology-oriented systems.

There are many places to look for inspiration for your own taxonomy or thesaurus. I’ve described just a few of the available resources.