Book Notes: The Accidental Taxonomist

It takes a special kind of nerd to spend one’s own (very limited) free time hunting down a book about taxonomy “for funsies.” That is exactly what I’ve done.

“Taxonomies are chic.” – Tom Koulopoulos

Cover of The Accidental Taxonomist by Heather Hedden

The book’s author, Heather Hedden, explains that the majority of taxonomists are people who stumble into creating taxonomies rather than intentionally entering the field. I remember first learning about taxonomy in biology class as an undergrad and thinking it was really cool that there were people whose sole job was to classify and order information. Since getting into technical writing, I have become more and more interested in how taxonomy works. With the explosion of information, there is increasing interest in ways to organize all of “our stuff.” With that in mind, Hedden does an excellent job of letting us dip our toes into this complex field.

Chapters cover: creating terms; creating relationships; software for taxonomy creation and management; taxonomies for human indexing; taxonomies for automated indexing; taxonomy structures; taxonomy displays; taxonomy planning, design, and creation; taxonomy implementation and evolution; and taxonomy as a profession.

Chapter 1: What Are Taxonomies?

“The term taxonomy is used both in the narrow sense, to mean a hierarchical classification or categorization system, and in the broad sense, in reference to any means of organizing concepts of knowledge.” p.1

“In the broader sense, a taxonomy may also be referred to as a knowledge organization system or knowledge organization structure.” p.1

Knowledge organization systems:
1. Term lists (authority files, glossaries, dictionaries, and gazetteers)
2. Classifications and categories (subject headings, classification schemes, taxonomies, and categorization schemes)
3. Relationship lists (thesauri, semantic networks, and ontologies)

Controlled vocabularies

A controlled vocabulary is a restricted list of words or terms for some specialized purpose, usually for indexing, labeling, or categorizing. Only terms from the list may be used for the subject area covered. If it is used by more than one person, it is also controlled in the sense that there is control over who may add terms to the list and when and how they may do it. The list may grow, but only under defined policies.
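To make the "controlled" part concrete for myself, here is a minimal Python sketch. The class name, the sample terms, and the "editors" rule are all my own assumptions, not anything from the book; the editor check simply stands in for whatever policy governs who may add terms.

```python
class ControlledVocabulary:
    """A restricted term list (hypothetical illustration, not from the book)."""

    def __init__(self, terms, editors):
        # Store terms in lowercase so lookups ignore capitalization.
        self._terms = {t.lower() for t in terms}
        # Only these people may add terms: a stand-in for a defined policy.
        self._editors = set(editors)

    def is_allowed(self, term):
        """Return True if the term is on the approved list."""
        return term.lower() in self._terms

    def add_term(self, term, requested_by):
        """Add a term, but only when the requester is an approved editor."""
        if requested_by not in self._editors:
            raise PermissionError(f"{requested_by} may not add terms")
        self._terms.add(term.lower())


vocab = ControlledVocabulary(["Indexing", "Tagging"], editors=["taxonomist"])
print(vocab.is_allowed("tagging"))
vocab.add_term("Folksonomy", requested_by="taxonomist")
print(vocab.is_allowed("Folksonomy"))
```

An indexer would call `is_allowed` before applying a tag; anything not on the list is rejected until an editor adds it.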

“When implemented in search or browse systems, the controlled vocabulary can help guide the user to where the desired information is. While controlled vocabularies are most often used in indexing or tagging, they are also used in technical writing to ensure the use of consistent language.” p.3

A synonym ring, or synset, is a simple controlled vocabulary in which all equivalent terms (synonyms) of a concept have equal standing and no single term is designated as preferred. A synonym ring is thus not displayed to the end user, and usually there are no other relationships (hierarchical or associative) between concepts.
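A rough Python sketch of how a synonym ring might sit behind a search box. The rings and the function name `expand_query` are hypothetical examples of mine. Each ring is a plain set, so no member is "preferred", and the user never sees the ring itself, only the wider results.

```python
# Each ring is an unordered set: every member has equal standing.
SYNONYM_RINGS = [
    {"car", "automobile", "auto"},
    {"movie", "film", "motion picture"},
]


def expand_query(term):
    """Return every term equivalent to the one searched, including itself."""
    term = term.lower()
    for ring in SYNONYM_RINGS:
        if term in ring:
            return set(ring)
    # No ring contains the term, so search for it alone.
    return {term}


print(expand_query("Film"))
```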

“Sometimes controlled vocabularies are referred to as authority files, especially if they contain just named entities. Named entities are proper-noun terms.” p.4

Hierarchical Taxonomies

“A hierarchical taxonomy is a kind of controlled vocabulary in which each term is connected to a designated broader term (unless it is the top-level term) and one or more narrower terms (unless it is a bottom level term), and all the terms are organized into a single large hierarchical structure.” p.6

“If the taxonomy is displayed as a tree, it is an upside-down tree….[a]nother way to describe such structure is a taxonomy with nested categories. The expression to drill down is often used to describe how a user navigates down through the branches.” p.6
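Here is a small Python sketch of the broader/narrower structure and the "drill down" idea. The sample terms and function names are made up for illustration. Every term records exactly one broader term, except the top-level term, which has none.

```python
# Hypothetical sample data: each term maps to its single broader term.
BROADER = {
    "Animals": None,  # top-level term
    "Mammals": "Animals",
    "Birds": "Animals",
    "Dogs": "Mammals",
    "Cats": "Mammals",
}


def narrower_terms(term):
    """Drill down one level: list the terms directly beneath this one."""
    return sorted(t for t, parent in BROADER.items() if parent == term)


def path_to_top(term):
    """Walk upward from a term to the top-level term."""
    path = [term]
    while BROADER[path[-1]] is not None:
        path.append(BROADER[path[-1]])
    return path


def print_tree(term, depth=0):
    """Print the upside-down tree, indenting each level of nesting."""
    print("  " * depth + term)
    for child in narrower_terms(term):
        print_tree(child, depth + 1)


print_tree("Animals")
```

Storing only the broader term keeps the data small; the narrower terms are derived on demand, which is fine for a toy example but would be slow for a large taxonomy.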

Examples of hierarchical taxonomy:

  • The Linnaean taxonomy (biological organisms, with the hierarchical top-down structure: kingdom, phylum, class, order, family, genus, and species)
  • The Dewey Decimal Classification system (for cataloging books)
  • The Standard Industrial Classification (SIC) and North American Industry Classification System (NAICS) (for classifying industries)

“A collection of controlled vocabulary terms organized into a hierarchical structure…it is a kind of taxonomy that is commonly seen in countless real-world applications. And it is the type of taxonomy that the accidental taxonomist is probably most likely to create.” p.8

Thesauri

“A structured type of controlled vocabulary that provides information about each term and its relationships to other terms within the same thesaurus. Typical relationships are equivalent (use/used for), hierarchical (broader term/narrower term), and associative (related term), and term notes are a common feature. Published standards provide guidance on creating knowledge organization thesauri.” p.396

“In contrast to a hierarchical taxonomy, which is designed for user navigation from the top down, a thesaurus with multiple means of access can more easily contain a greater number of terms. Thus, a thesaurus may be able to support more granular (specific) and extensive indexing than a simple hierarchical taxonomy can, especially if the hierarchical taxonomy lacks non-preferred terms.” p.11
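To see how the three relationship types fit together, I sketched a thesaurus entry in Python. The field names, the sample term, and the `resolve` helper are my own assumptions. The "multiple means of access" shows up as the `USE` index, which sends a non-preferred entry term to its preferred term.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusTerm:
    """One preferred term with its relationships (hypothetical layout)."""
    preferred: str
    used_for: list = field(default_factory=list)   # equivalent: non-preferred terms
    broader: list = field(default_factory=list)    # hierarchical, upward
    narrower: list = field(default_factory=list)   # hierarchical, downward
    related: list = field(default_factory=list)    # associative
    scope_note: str = ""                           # term note


THESAURUS = {
    "Automobiles": ThesaurusTerm(
        preferred="Automobiles",
        used_for=["Cars", "Autos"],
        broader=["Motor vehicles"],
        narrower=["Electric cars"],
        related=["Automobile industry"],
        scope_note="Passenger vehicles only.",
    ),
}

# Build the USE index: non-preferred term -> preferred term.
USE = {
    uf.lower(): term.preferred
    for term in THESAURUS.values()
    for uf in term.used_for
}


def resolve(entry):
    """Map whatever the user typed to a preferred term, or None if unknown."""
    for name in THESAURUS:
        if name.lower() == entry.lower():
            return name
    return USE.get(entry.lower())


print(resolve("cars"))
```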

Ontologies

“An ontology aims to describe a domain of knowledge, a subject area, by both its terms (called individuals or instances) and their relationships and thus supports inferencing. This objective of a more complex and complete representation of knowledge stems from the etymology of the word ontology, which originally meant the study of the nature of being, or existence….The relationships between terms within an ontology are not limited to broader/narrower and related. Rather, there can be any number of domain-specific types of relationship pairs, such as owns/belongs to, produces/is produced by, and has members/is a member of.” p.12

“All of these components of an ontology–semantic relationships, attributes (for each of the terms/instances), and classes–contribute to making an ontology a richer source of information than a mere hierarchical taxonomy or thesaurus.” p.13
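A toy Python sketch of domain-specific relationship pairs, using the pairs named in the quote above. The company and product names are invented, and the "inferencing" here is only the simplest kind (deriving the inverse of each stated relationship), so treat it as a small slice of what a real ontology tool does.

```python
# Relationship pairs from the quoted passage, as relation -> inverse relation.
INVERSE = {
    "owns": "belongs to",
    "produces": "is produced by",
    "has members": "is a member of",
}
# Make the lookup work in both directions.
INVERSE.update({inverse: relation for relation, inverse in list(INVERSE.items())})

# Hypothetical asserted facts as (subject, relationship, object) triples.
ASSERTED = {
    ("Acme Corp", "produces", "Widgets"),
    ("Acme Corp", "owns", "Acme Labs"),
}


def infer_inverses(triples):
    """Return the asserted triples plus the inverse of each one."""
    inferred = set(triples)
    for subject, relation, obj in triples:
        inferred.add((obj, INVERSE[relation], subject))
    return inferred


FACTS = infer_inverses(ASSERTED)


def query(subject, relation):
    """List every object linked to the subject by the given relationship."""
    return sorted(o for s, r, o in FACTS if s == subject and r == relation)


print(query("Widgets", "is produced by"))
```

Nobody stated that Widgets "is produced by" Acme Corp; that fact falls out of the relationship pair, which is the sort of thing a plain hierarchy cannot express.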

“There is also a growing importance of ontologies in semantic search engine deployment in specialized industries, and building ontologies could be a growth area for experienced taxonomists. In 2009, a new organization for supporting ontologies, the International Association for Ontology and Its Applications (www.iaoa.org), was founded.” p.14

Taxonomies serve primarily one of the following three functions:

  1. Indexing support
  2. Retrieval support
  3. Organization and navigation support

1. “Because indexers must always choose the most accurate terms, they often use a more structured thesaurus type of controlled vocabulary. The broader, narrower, and related term relationships guide the indexer to the best term, and scope notes further clarify ambiguous terms.” p.16

“Library of Congress Subject Headings (LCSH; authorities.loc.gov) contains both subjects and names, and covers all subject areas.” p.16

Authority file “is another name for a controlled vocabulary, especially if used just for named entities and restricted to a certain kind of entity (person names, company names, place names, etc.). As such, an authority file lacks the interterm relationships of a thesaurus but may have multiple non-preferred terms for each preferred term. Also called an authority list.” p.381

2. “A taxonomy that serves indexing also serves end-user retrieval.” p.17

“These types of controlled vocabularies are often used with site search engines, enterprise search systems (used internally within a large organization), online databases, and large commercial directories. The format is always electronic, and a form of automated indexing is usually involved.”

A facet is a categorical grouping of terms in a taxonomy that cover a single dimension of a complex query for an item being searched. Multiple terms, one from each facet, are searched in combination to retrieve the most specific data records. A facet is typically its own hierarchy, but not all separate hierarchies are facets. p.385
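Faceted search is easier for me to picture with a few records in hand, so here is a short Python sketch. The records, facet names, and function names are hypothetical. Each facet (type, color, brand) is one dimension, and choosing one term per facet narrows the results.

```python
from collections import Counter

# Hypothetical records, each tagged with one term per facet.
RECORDS = [
    {"title": "Trail Runner", "type": "Shoes", "color": "Blue", "brand": "Acme"},
    {"title": "City Walker", "type": "Shoes", "color": "Black", "brand": "Acme"},
    {"title": "Rain Shell", "type": "Jackets", "color": "Blue", "brand": "Zenith"},
]


def faceted_search(records, **selected):
    """Keep the records that match every selected facet term."""
    return [
        r for r in records
        if all(r.get(facet) == term for facet, term in selected.items())
    ]


def facet_counts(records, facet):
    """Count how many records carry each term within one facet."""
    return Counter(r[facet] for r in records)


hits = faceted_search(RECORDS, type="Shoes", color="Blue")
print([r["title"] for r in hits])
print(facet_counts(RECORDS, "color"))
```

The counts are what shopping sites show next to each filter option; selecting more facets can only shrink the result list, never grow it.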

3. “A taxonomy, as a hierarchy, can provide a categorization or classification system for things or for information. For the organization of information, we often see taxonomies applied in website information architecture (structural design), online information services, intranet content organization, and corporate content management systems.” p.21

“The taxonomy for a website is a lot like a table of contents, organized by topic. It can be reflected in the navigational menu and in the site map. As such, it might be called a navigational taxonomy.” p.22

“The largest (and the only multi-source) directory of taxonomies available for use is Taxonomy Warehouse (www.taxonomywarehouse.com). The database includes hundreds of taxonomies.” p.25

“Formats may vary, but typically, taxonomies or thesauri that are made available for other uses are formatted in some kind of XML whereby all terms, relationships, nonpreferred terms, scope notes, and so forth are retained when they are imported into other taxonomy management systems.” p.25
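As a rough illustration of what "some kind of XML" could look like, here is a Python sketch that reads a tiny taxonomy file. The element and attribute names (`taxonomy`, `term`, `label`, `nonPreferred`, `scopeNote`, `broader`) are invented for this example and do not follow any particular published schema; only the standard-library `xml.etree.ElementTree` calls are real.

```python
import xml.etree.ElementTree as ET

# Hypothetical export format: the element names are assumptions, not a standard.
SAMPLE = """\
<taxonomy>
  <term id="t1">
    <label>Automobiles</label>
    <nonPreferred>Cars</nonPreferred>
    <scopeNote>Passenger vehicles only.</scopeNote>
    <broader ref="t2"/>
  </term>
  <term id="t2">
    <label>Motor vehicles</label>
  </term>
</taxonomy>
"""


def load_terms(xml_text):
    """Parse the XML and keep terms, relationships, and notes together."""
    root = ET.fromstring(xml_text)
    terms = {}
    for node in root.findall("term"):
        terms[node.get("id")] = {
            "label": node.findtext("label"),
            "non_preferred": [n.text for n in node.findall("nonPreferred")],
            "scope_note": node.findtext("scopeNote", default=""),
            "broader": [b.get("ref") for b in node.findall("broader")],
        }
    return terms


TERMS = load_terms(SAMPLE)
print(TERMS["t1"]["label"], "->", TERMS[TERMS["t1"]["broader"][0]]["label"])
```

The point of a structured format is visible even at this size: the non-preferred term, the scope note, and the broader-term link all survive the import instead of being flattened into a bare list of labels.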

“The word thesaurus was first used to refer to a controlled vocabulary for information retrieval purposes by Peter Luhn at IBM in 1957.” p.29

“The emergence and growth of the web in the 1990s was a major contributing factor in the growing interest in taxonomies, for several reasons. The web enabled smaller publishers to offer online information services. Companies started developing intranets that quickly expanded in size and required better navigation and search.” p.30

“Although newer buzzwords, such as folksonomy, social networking, and Web 2.0, have superseded taxonomy in the 2000s, a sustained interest in taxonomy and taxonomists continues.” p.35

Keywords: information organization, classification, indexing, subject headings, cross references (information retrieval), thesauri, title
