Book Notes: The Accidental Taxonomist

It takes a special kind of nerd to spend one’s own (very limited) free time hunting down a book about taxonomy “for funsies.” That is exactly what I’ve done.

“Taxonomies are chic.” – Tom Koulopoulos


The book’s author, Heather Hedden, explains that the majority of taxonomists are people who stumble into creating taxonomies rather than intentionally entering the field. I remember first learning about taxonomy in biology class as an undergrad and thinking it was really cool that there were people whose sole job was to classify and order information. Since I have gotten into technical writing, I have discovered that I’m more and more interested in the way it works. With the explosion of information, there is an increasing interest in ways to organize all of “our stuff.” With that in mind, Hedden does an excellent job of dipping our toes into this complex field.

Chapters cover: creating terms; creating relationships; software for taxonomy creation and management; taxonomies for human indexing; taxonomies for automated indexing; taxonomy structures; taxonomy displays; taxonomy planning, design, and creation; taxonomy implementation and evolution; and taxonomy as a profession.

Chapter 1: What Are Taxonomies?

“The term taxonomy is used both in the narrow sense, to mean a hierarchical classification or categorization system, and in the broad sense, in reference to any means of organizing concepts of knowledge.” p.1

“In the broader sense, a taxonomy may also be referred to as a knowledge organization system or knowledge organization structure.” p.1

Knowledge organization systems:
1. Term lists (authority files, glossaries, dictionaries, and gazetteers)
2. Classifications and categories (subject headings, classification schemes, taxonomies, and categorization schemes)
3. Relationship lists (thesauri, semantic networks, and ontologies)

Controlled vocabularies

A controlled vocabulary is a restricted list of words or terms for some specialized purpose, usually for indexing, labeling, or categorizing. Only terms from the list may be used for the subject area covered. If it is used by more than one person, it is also controlled in the sense that there is control over who may add terms to the list and when and how they may do it. The list may grow, but only under defined policies.

“When implemented in search or browse systems, the controlled vocabulary can help guide the user to where the desired information is. While controlled vocabularies are most often used in indexing or tagging, they are also used in technical writing to ensure the use of consistent language.” p.3

A synonym ring, or synset, is a simple controlled vocabulary in which all equivalent terms (synonyms) of a concept have equal standing and no single term is designated as preferred. A synonym ring is thus not displayed to the end user, and usually there are no other relationships (hierarchical or associative) between concepts.
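
To make the idea concrete for myself, here is a minimal sketch of a synonym ring used for query expansion. The rings, the terms in them, and the function name expand_query are all my own illustrative assumptions, not anything from the book.

```python
# Hypothetical synonym rings: each set holds terms treated as equivalent.
# No term in a ring is marked as preferred.
SYNONYM_RINGS = [
    {"car", "automobile", "auto"},
    {"film", "movie", "motion picture"},
]


def expand_query(term):
    """Return every term equivalent to `term`, including the term itself."""
    normalized = term.lower()
    for ring in SYNONYM_RINGS:
        if normalized in ring:
            return set(ring)
    # Terms outside every ring are searched as entered.
    return {normalized}
```

A search for any one term in a ring would then behave as a search for all of them, which is why the user never needs to see the ring itself.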

“Sometimes controlled vocabularies are referred to as authority files, especially if they contain just named entities. Named entities are proper-noun terms.” p.4

Hierarchical Taxonomies

“A hierarchical taxonomy is a kind of controlled vocabulary in which each term is connected to a designated broader term (unless it is the top-level term) and one or more narrower terms (unless it is a bottom level term), and all the terms are organized into a single large hierarchical structure.” p.6

“If the taxonomy is displayed as a tree, it is an upside-down tree….[a]nother way to describe such structure is a taxonomy with nested categories. The expression to drill down is often used to describe how a user navigates down through the branches.” p.6

Examples of hierarchical taxonomy:

  • The Linnaean taxonomy (biological organisms, with the hierarchical top-down structure: kingdom, phylum, class, order, family, genus, and species)
  • The Dewey Decimal Classification system (for cataloging books)
  • The Standard Industrial Classification (SIC) and North American Industry Classification System (NAICS) (for classifying industries)
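
The "one broader term, any number of narrower terms" structure described above can be sketched in a few lines. This is only an illustration under my own assumptions: the handful of Linnaean-style terms and the names BROADER, narrower_terms, and path_from_top are hypothetical.

```python
# Hypothetical slice of a hierarchy: each term maps to its single broader term.
BROADER = {
    "Animalia": None,  # top-level term, so it has no broader term
    "Chordata": "Animalia",
    "Mammalia": "Chordata",
    "Aves": "Chordata",
    "Primates": "Mammalia",
}


def narrower_terms(term):
    """List the terms one level below `term` (one step of drilling down)."""
    return sorted(t for t, parent in BROADER.items() if parent == term)


def path_from_top(term):
    """Return the chain of terms from the top of the tree down to `term`."""
    path = []
    while term is not None:
        path.append(term)
        term = BROADER[term]
    return list(reversed(path))
```

Calling narrower_terms repeatedly mimics a user drilling down the upside-down tree, and path_from_top gives the kind of trail a site might show as a navigation path.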

“A collection of controlled vocabulary terms organized into a hierarchical structure…it is a kind of taxonomy that is commonly seen in countless real-world applications. And it is the type of taxonomy that the accidental taxonomist is probably most likely to create.” p.8

Thesauri


“A structured type of controlled vocabulary that provides information about each term and its relationships to other terms within the same thesaurus. Typical relationships are equivalent (use/used for), hierarchical (broader term/narrower term), and associative (related term), and term notes are a common feature. Published standards provide guidance on creating knowledge organization thesauri.” p.396

“In contrast to a hierarchical taxonomy, which is designed for user navigation from the top down, a thesaurus with multiple means of access can more easily contain a greater number of terms. Thus, a thesaurus may be able to support more granular (specific) and extensive indexing than a simple hierarchical taxonomy can, especially if the hierarchical taxonomy lacks non-preferred terms.” p.11
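
Here is a rough sketch of how I picture a thesaurus record with the relationship types quoted above (equivalent, hierarchical, associative) plus a scope note. The class, field names, and sample terms are hypothetical and not taken from the book or from any particular standard.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusTerm:
    preferred: str
    used_for: list = field(default_factory=list)   # equivalent (non-preferred) terms
    broader: list = field(default_factory=list)    # hierarchical, upward
    narrower: list = field(default_factory=list)   # hierarchical, downward
    related: list = field(default_factory=list)    # associative
    scope_note: str = ""


# Hypothetical sample entry.
TERMS = [
    ThesaurusTerm(
        preferred="Automobiles",
        used_for=["Cars", "Autos"],
        broader=["Motor vehicles"],
        narrower=["Sports cars"],
        related=["Automobile industry"],
        scope_note="Passenger vehicles only; excludes trucks.",
    ),
]


def build_entry_index(terms):
    """Map every preferred and non-preferred term to its record."""
    index = {}
    for term in terms:
        index[term.preferred.lower()] = term
        for variant in term.used_for:
            index[variant.lower()] = term
    return index


INDEX = build_entry_index(TERMS)


def resolve(entry):
    """Return the preferred term for `entry`, or None if it is not in the thesaurus."""
    term = INDEX.get(entry.lower())
    return term.preferred if term else None
```

The entry index is one way to picture the "multiple means of access" in the quote: an indexer can arrive at the same record through the preferred term or through any non-preferred one.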


Ontologies

“An ontology aims to describe a domain of knowledge, a subject area, by both its terms (called individuals or instances) and their relationships and thus supports inferencing. This objective of a more complex and complete representation of knowledge stems from the etymology of the word ontology, which originally meant the study of the nature of being, or existence….The relationships between terms within an ontology are not limited to broader/narrower and related. Rather, there can be any number of domain-specific types of relationship pairs, such as owns/belongs to, produces/is produced by, and has members/is a member of.” p.12

“All of these components of an ontology–semantic relationships, attributes (for each of the terms/instances), and classes–contribute to making an ontology a richer source of information than a mere hierarchical taxonomy or thesaurus.” p.13
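
The relationship pairs mentioned above lend themselves to a tiny example of inferencing. This is a deliberately simplified sketch: the triples and company names are made up, and real ontologies use dedicated languages and tools rather than a Python set.

```python
# Relationship pairs named in the quote; each maps to its inverse.
INVERSE = {
    "owns": "belongs to",
    "produces": "is produced by",
    "has members": "is a member of",
}

# Hypothetical statements of the form (subject, relationship, object).
TRIPLES = {
    ("Acme Corp", "produces", "Widget"),
    ("Acme Corp", "owns", "Acme Labs"),
}


def with_inverses(triples):
    """Infer the inverse statement for every stated relationship."""
    inverse_of = dict(INVERSE)
    inverse_of.update({v: k for k, v in INVERSE.items()})
    inferred = set(triples)
    for subject, relationship, obj in triples:
        inferred.add((obj, inverse_of[relationship], subject))
    return inferred
```

Nobody stated that the widget is produced by Acme Corp, yet the function can conclude it, which is the simplest form of the inferencing the quote refers to.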

“There is also a growing importance of ontologies in semantic search engine deployment in specialized industries, and building ontologies could be a growth area for experienced taxonomists. In 2009, a new organization for supporting ontologies, the International Association for Ontology and Its Applications, was founded.” p.14

Taxonomies serve primarily one of the following three functions:

  1. Indexing support
  2. Retrieval support
  3. Organization and navigation support

1. “Because indexers must always choose the most accurate terms, they often use a more structured thesaurus type of controlled vocabulary. The broader, narrower, and related term relationships guide the indexer to the best term, and scope notes further clarify ambiguous terms.” p.16

“Library of Congress Subject Headings (LCSH) contains both subjects and names, and covers all subject areas.” p.16

Authority file “is another name for a controlled vocabulary, especially if used just for named entities and restricted to a certain kind of entity (person names, company names, place names, etc.). As such, an authority file lacks the interterm relationships of a thesaurus but may have multiple non-preferred terms for each preferred term. Also called an authority list.” p.381

2. “A taxonomy that serves indexing also serves end-user retrieval.” p.17

“These types of controlled vocabularies are often used with site search engines, enterprise search systems (used internally within a large organization), online databases, and large commercial directories. The format is always electronic, and a form of automated indexing is usually involved.”

A facet is a categorical grouping of terms in a taxonomy that cover a single dimension of a complex query for an item being searched. Multiple terms, one from each facet, are searched in combination to retrieve the most specific data records. A facet is typically its own hierarchy, but not all separate hierarchies are facets. p.385
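
Faceted search is easier to see in code than in prose, so here is a small sketch. The product records, the facet names (product_type, color, audience), and the function faceted_search are all hypothetical examples of my own.

```python
# Hypothetical records, each tagged with one term per facet.
RECORDS = [
    {"title": "Blue trail shoe", "product_type": "Shoes", "color": "Blue", "audience": "Women"},
    {"title": "Blue rain jacket", "product_type": "Jackets", "color": "Blue", "audience": "Women"},
    {"title": "Red trail shoe", "product_type": "Shoes", "color": "Red", "audience": "Men"},
]


def faceted_search(records, **selected):
    """Return titles of records matching every selected facet term."""
    return [
        record["title"]
        for record in records
        if all(record.get(facet) == term for facet, term in selected.items())
    ]
```

Selecting color="Blue" alone returns both blue items, and adding product_type="Shoes" narrows the result to the single matching record, which is the "searched in combination" behavior described above.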

3. “A taxonomy, as a hierarchy, can provide a categorization or classification system for things or for information. For the organization of information, we often see taxonomies applied in website information architecture (structural design), online information services, intranet content organization, and corporate content management systems.” p.21

“The taxonomy for a website is a lot like a table of contents, organized by topic. It can be reflected in the navigational menu and in the site map. As such, it might be called a navigational taxonomy.” p.22

“The largest (and the only multi-source) directory of taxonomies available for use is Taxonomy Warehouse. The database includes hundreds of taxonomies.” p.25

“Formats may vary, but typically, taxonomies or thesauri that are made available for other uses are formatted in some kind of XML whereby all terms, relationships, nonpreferred terms, scope notes, and so forth are retained when they are imported into other taxonomy management systems.” p.25

“The word thesaurus was first used to refer to a controlled vocabulary for information retrieval purposes by Peter Luhn at IBM in 1957.” p.29

“The emergence and growth of the web in the 1990s was a major contributing factor in the growing interest in taxonomies, for several reasons. The web enabled smaller publishers to offer online information services. Companies started developing intranets that quickly expanded in size and required better navigation and search.” p.30

“Although newer buzzwords, such as folksonomy, social networking, and Web 2.0, have superseded taxonomy in the 2000s, a sustained interest in taxonomy and taxonomists continues.” p.35

Keywords: information organization, classification, indexing, subject headings, cross references (information retrieval), thesauri


Style in Writing

I am working my way through Style: Lessons in Clarity and Grace by Joseph M. Williams and Gregory G. Colomb. I would recommend it for anyone who makes writing or editing their profession. It is a thin little book with 264 pages in total (including Appendices, Glossary, and Index) with twelve lessons in five parts that address style, clarity, grace, clarity of form, and ethics. More quotes to come, I’m sure.

“The problem is, we cannot judge our own writing as others will because we respond less to the words on the page than to the thoughts in our minds. We can avoid that solipsistic subjectivity only if we can figure out how what we have put on the page makes our readers feel as they do.” p. 7

“A warning: if you think of the principles offered here as rules to follow as you draft, you may never finish anything. Most experienced writers get something down on paper or up on the screen as fast as they can. Then as they revise that first draft into something clearer, they understand their ideas better. And when they understand their ideas better, they express them more clearly, and the more clearly they express them, the better they understand them…and so it goes, ending only when they run out of energy, interest, or time.” p. 8

“If writers whom we judge to be competent regularly violate some alleged rule and most careful readers never notice, then the rule has no force. In those cases, it is not writers who should change their usage, but grammarians who should change their rules.” p. 18

“You can’t predict good grammar or correct usage by logic or general rule. You have to learn the rules one-by-one and accept the fact that some of them, probably most of them, are arbitrary and idiosyncratic.” p. 23

“We distinguish these two kinds of sentences because readers can respond to them very differently: the one you are now reading for example, is one long punctuated sentence, but it is not as hard to read as many shorter sentences that consist of many subordinate clauses; I have chosen to punctuate as one long sentence what I might have punctuated as a series of shorter ones: that colon, those semicolons, and the comma before that but could have been periods, for example–and that dash could have been a period too.” p. 211

Grammar Rules

Just because we are editors does not mean we are, or should be, “grammar nazis.”

While I do get a laugh at those grammar arguments that sometimes get surprisingly heated, I find that I usually have a side. We can get into a whole descriptivist vs. prescriptivist argument. This quote, from the late Joseph M. Williams, pretty much sums up my feelings:

“If writers whom we judge to be competent regularly violate some alleged rule and most careful readers never notice, then the rule has no force. In those cases, it is not writers who should change their usage, but grammarians who should change their rules.” -Joseph M. Williams, Style: Lessons in Clarity and Grace.

The Compulsive Editor

Excerpt from The Elements of Editing: A Modern Guide for Editors and Journalists by Arthur Plotnik (Macmillan, 1982)

Signs of a Dysfunctional (Editor-Related) Compulsiveness

  • Holding to favorite rules of usage, whatever the effect on communication
  • Musing for fifteen minutes on whether to use a hairline or one-point rule
  • Changing every passive construction to an active one
  • Concentrating on negative rather than positive space in layout

Signs of Functional (Reader-Oriented) Compulsiveness

  • Following up
  • Rewriting every headline that fails to motivate readership
  • Quadruple-checking of page proofs
  • Staring at type specifications a full ten seconds
  • Reading every word in its final context
  • As soon as one issue is put to bed, insisting that work begin on the next

Quotes on Editing

Quotes from Leslie T. Sharpe and Irene Gunther in Editing Fact and Fiction (Cambridge University Press, 1994)

“Editors can do harm primarily in two ways: when they alter an author’s individual style–her voice–or when they change the content or meaning of her prose. Doing no harm when editing a manuscript means doing the minimum necessary to clarify an author’s language or intent.”

“New editors, anxious to prove to their superiors that they have mastered the minutiae of grammar and usage, tend to overedit. But so, at times, do experienced editors, perhaps in an effort to validate the importance of their own function, or simply out of a failure to grasp what a writer is trying to accomplish.”

“Gratuitous editing and unnecessary rewriting are the most common complaints writers make about editors. And justifiably so. The editor’s job is to allow the author’s voice to emerge without coloring it, or replacing it with her own. An editor who wants to write should be a writer.”

“Editors are seen as the arbiters of taste, style, and usage, but good editors cannot be arbitrary. They must be flexible. Essentially what this means is that they listen: They never fail to take into account an author’s feelings.”


Welcome to Puget Sound Editing

Puget Sound Editing seeks to clarify your writing whether it is in print or for the web. Please feel free to click around in the tabs above to find what you’re looking for.

There are three levels of copyediting: light, medium, and heavy. In all three levels, mechanical editing should be consistent (spelling, capitalization, punctuation, hyphenation, abbreviations, format of lists, etc.). What differs between the levels is the language and content editing.

a light copyedit:

  • corrects all errors in grammar, syntax, and usage
  • points out paragraphs that seem wordy, but does not revise
  • ignores minor patches of wordiness, imprecise wording, and jargon
  • asks for clarification of terms that might be new to readers
  • queries factual inconsistencies

a medium copyedit:

  • corrects all errors in grammar, syntax, and usage
  • points out any patches that seem wordy or convoluted, and supplies suggested revisions
  • asks for or supplies definitions of terms likely to be new to readers
  • queries any facts that seem incorrect and uses desktop reference books to verify content
  • queries faulty organization and gaps in logic

a heavy copyedit:

  • corrects all errors and infelicities in grammar, syntax, and usage
  • rewrites any wordy or convoluted patches
  • asks for or supplies definitions of terms likely to be new to readers
  • verifies and revises any facts that are incorrect
  • queries or fixes faulty organization and gaps in logic