Word counts matter for estimating the finished length of a book, checking whether an article will fit in the layout, and budgeting how much time (and money) we’ll need for editing, design, and other production steps. But not all words are counted the same way. What’s more, perhaps lengthy words like sesquipedalian or phosphofructokinase should count for more than cat or it: such words can make a technical text considerably longer than a novel with the same number of words.
What Is a Word?
Dictionaries are full of text strings identified as words, but what a piece of software considers a word might surprise you. Some systems treat words separated by a slash or dash as a single word. Some don’t count words with numbers in them. Footnotes, running heads and footers, captions, and text in comments (or tracked changes) may or may not be counted.
Software defines “a word” using text segmentation rules that specify word boundaries (such as a space) and dividers, but those rules differ enough from program to program to produce large discrepancies.
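To see how much the rules matter, here is a minimal Python sketch of two hypothetical counters. One treats only whitespace as a word boundary; the other also treats slashes, hyphens, en and em dashes, and ampersands as dividers. Neither is meant to reproduce any particular program; they only illustrate how one rule change shifts the count for strings like those in the test document.

```python
import re

# Hypothetical rule A: only whitespace separates words.
def count_whitespace(text: str) -> int:
    return len(text.split())

# Hypothetical rule B: whitespace, slash, hyphen, en dash (U+2013),
# em dash (U+2014), and ampersand all act as dividers.
DIVIDERS = re.compile(r"[\s/\-\u2013\u2014&]+")

def count_with_dividers(text: str) -> int:
    # Discard empty pieces left over when a string starts or ends with a divider.
    return len([piece for piece in DIVIDERS.split(text) if piece])

samples = ["cat&dog", "slash/unspaced", "time-stamp", "or\u2014not", "dash \u2014 spaced"]
for s in samples:
    print(f"{s}: rule A = {count_whitespace(s)}, rule B = {count_with_dividers(s)}")
```

Note that rule A counts a spaced dash as a word of its own, while rule B swallows it as a divider, so neither rule is "more generous" across the board.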
Punctuation Can Create Single Words
Some factors determining whether a string is counted as one or more words include:
- a slash separating/joining words
- a non-breaking space—e.g., 25°C
- a hyphen creating a compound word
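The non-breaking space shows how a single character can flip the result. In Python, for example, `str.split()` with no arguments treats the no-break space (U+00A0) as whitespace, while splitting only on the ordinary space character does not. This sketch simply shows both behaviours on the same string; it isn't how any word processor is implemented.

```python
NBSP = "\u00a0"          # the no-break space character
text = "3" + NBSP + "F"  # "3 F" joined by a non-breaking space

# Rule 1: any Unicode whitespace, including the no-break space, separates words.
words_any_space = text.split()

# Rule 2: only the ordinary space (U+0020) separates words.
words_plain_space = [w for w in text.split(" ") if w]

print(len(words_any_space), len(words_plain_space))  # 2 1
```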
Punctuation Can Cause Multiple Words
- hyphenated compounds and number ranges (hyphen or en-dash)
- thousands and thousandths separators
- units of measurement
- segmentation of URLs
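Going the other way, a counter that defines a word as an unbroken run of letters and digits will split anything containing punctuation. The sketch below assumes that rule (one possible rule among many) and applies it to a few strings from the test document plus a hypothetical number with a thousands separator.

```python
import re

# Assumed rule: a word is any unbroken run of letters or digits.
# [^\W_] matches word characters except the underscore.
WORD = re.compile(r"[^\W_]+")

def count_alnum_runs(text: str) -> int:
    return len(WORD.findall(text))

for s in ["http://www.url.ca", "3.14", "1,000", "8-9", "8\u20139"]:
    print(s, count_alnum_runs(s))
```

Under this rule the URL becomes four words (http, www, url, ca), and every decimal, separated number, or range becomes two.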
Not a Word
- symbols such as math operators, ordinals, em-dashes, or ©
- headers and footers
- captions in fields
- comments in edits (tracked changes or suggestions features)
- equations in MathType or equivalent field
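Some counters skip tokens that contain no letters or digits at all. One way to express that, assuming whitespace-separated tokens, is to keep only tokens with at least one alphanumeric character. Python’s `str.isalnum()` treats ½ as numeric, so it survives, while +, ÷, ©, and ™ do not. This is an illustrative rule, not a description of any specific program.

```python
def count_non_symbol_tokens(text: str) -> int:
    # Split on whitespace, then drop tokens made only of symbols or punctuation.
    tokens = text.split()
    return sum(1 for t in tokens if any(ch.isalnum() for ch in t))

print(count_non_symbol_tokens("math + 10 \u00f7 8 \u00bd"))  # 4
print(count_non_symbol_tokens("symbols \u00a9 \u2122"))      # 1
```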
The Counts
I ran a very short test document through the word counter in each of the most popular word-processing programs. Here is a summary of the results; a longer comparison table follows at the end:
- Word: 50
- Google Doc: 54
- Txt: 56
- Pages: 54
- LaTeX: 64 (via export)
- PDF: 57 (via export)
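To put those totals in perspective, this quick calculation uses only the counts listed above to find the gap between the lowest and highest result for the same document.

```python
counts = {
    "Word": 50,
    "Google Doc": 54,
    "Txt": 56,
    "Pages": 54,
    "LaTeX (export)": 64,
    "PDF (export)": 57,
}

low, high = min(counts.values()), max(counts.values())
spread = high - low
print(f"{low} to {high} words; spread of {spread} ({spread / low:.0%} above the lowest)")
```

That works out to a spread of 14 words, or 28% more than the lowest count, for one short document.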
Alternatives to Counting Words
For works full of really long words, character counts can give more accurate estimates of manuscript length and of how long editing will take. Like word counts, character counts are better than page counts, which vary too much to be practical because of the effect of letter and margin sizes (for starters).
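Here is a rough illustration using two invented sentences with the same word count, one plain and one technical. The sketch counts characters with and without spaces, the two figures word processors typically report. The sentences are hypothetical examples, not part of the test document.

```python
def char_counts(text: str) -> tuple[int, int]:
    # Return (characters including spaces, characters excluding whitespace).
    with_spaces = len(text)
    without_spaces = sum(1 for ch in text if not ch.isspace())
    return with_spaces, without_spaces

plain = "The cat sat on it."
technical = "Phosphofructokinase regulates glycolytic flux rapidly."

for label, text in [("plain", plain), ("technical", technical)]:
    words = len(text.split())
    with_sp, without_sp = char_counts(text)
    print(f"{label}: {words} words, {with_sp} characters with spaces, {without_sp} without")
```

Both sentences are five words long, but the technical one has more than three times as many characters, which is the difference a word count hides.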
Back when counting by hand was the only option, prepositions and conjunctions were routinely skipped. You’d count the words on a few typical lines, then multiply the average by the number of lines. For columnar layouts, that often meant finding the average number of words per column inch and multiplying by the total column inches.
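The by-hand method is simple arithmetic. This sketch uses invented sample figures: the line tallies, line count, words per column inch, and column inches are all hypothetical.

```python
from statistics import mean

def estimate_by_lines(sample_line_counts: list[int], total_lines: int) -> int:
    # Average the words on a few typical lines, then scale up by the line count.
    return round(mean(sample_line_counts) * total_lines)

def estimate_by_column_inches(words_per_inch: float, column_inches: float) -> int:
    # Average words per column inch times total column inches.
    return round(words_per_inch * column_inches)

# Hypothetical tallies from four typical lines of a 320-line manuscript.
print(estimate_by_lines([11, 9, 12, 10], 320))  # 3360

# Hypothetical columnar layout: 32 words per inch across 45 column inches.
print(estimate_by_column_inches(32, 45))         # 1440
```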
What really matters is consistency, not “correctness,” so pick one way to count and stick to it.
Element (counted as) | Word | Google Doc | Txt | Pages | LaTeX* | PDF* | AnyCount
COUNT | 50 | 54 | 56 | 54 | 64 | 57 |
header | 0 | 0 | 0 | – | |||
footer | 0 | 0 | 0 | – | |||
comments | 0 | 0 | 0 | – | |||
— | 1 | 2 | 1 | 1 | |||
time-stamp | 1 | 1 | 2 | 1 | |||
cat&dog | 1 | 2 | 2 | 1 | |||
she’d | 1 | 1 | 1 | 1 | |||
3[nonbreaking]F | 2 | 1 | 2 | 2 | |||
8–9 | 2 | 1 | 2 | 1 | |||
8-9 | 1 | 1 | 2 | 1 | |||
3.14 | 2 | 1 | 1 | ||||
teacher–author | 2 | 1 | 2 | 1 | |||
dash — spaced | 2 | 3 | 2 | 3 | |||
or—not | 2 | 1 | 2 | 1 | |||
slash/unspaced | 1 | 2 | 2 | ||||
slash / spaced | 2 | 2 | 2 | ||||
http://www.url.ca | 1 | 4 | 4 | ||||
—30— | 1 | 1 | 1 | ||||
–#– | 2 | 2 | 0 | ||||
picture | 0 | n/a | n/a | ||||
caption | full, including “figure #” tag | ||||||
symbols © ™ | 1@ | 1@ | 0 | ||||
math + 10 ÷ 8 ½ | 6 | 6 | 4 | ||||
text box | 0 | 0 | yes |
*Content was exported from the PDF and LaTeX documents because those systems don’t have built-in word counters. Since you might have to do the same, it was worth testing.
© This blog and all materials in it are copyright Adrienne Montgomerie on the date of publication. All rights reserved. No portion may be stored or distributed without express written permission. Asking is easy!