If you ask an English teacher marking an essay, then counting words is simple. It is the number
that comes up when you hit the “Word Count” button in whichever program you’re writing
with. For English at least, word counting algorithms generally count a word as any string
of ‘letter’ characters. Each string of those, separated by some other character?
That’s a word.
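To make that rule concrete, here is a minimal sketch in Python of the kind of counter being described. The function name `count_words` is made up for illustration, and real word processors almost certainly do something more elaborate; this one just counts unbroken runs of letter characters.

```python
from itertools import groupby


def count_words(text: str) -> int:
    """Count maximal runs of letter characters (the naive rule sketched above)."""
    # groupby splits the text into consecutive runs of letters and non-letters;
    # every run of letters is counted as one "word".
    return sum(1 for is_letter, _ in groupby(text, key=str.isalpha) if is_letter)


print(count_words("xnopyt aaaaaaajjjjjjjjj"))  # 2
print(count_words("That's a word."))           # 4
```

Notice that the second example comes out as four, not three: the apostrophe counts as "some other character", so the rule splits the contraction in two.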
The trouble is, while that might be good enough for checking that an essay is roughly the
length you want it to be, it’s going to count plenty of things that aren’t words
too -- and it doesn’t get to a more fundamental question: what actually is a word?
First of all, there’s the problem of meaning. If you accept the definition that a “word”
is some letters surrounded by a gap, then “xnopyt”, “aaaaaaajjjjjjjjj” and
“hrrkrkrkrwpfrbrbrbrlablblblblblblwhitoo’ap” are all words, despite being pretty much meaningless.
You certainly can’t rely on dictionaries to define what’s a “real word”, either.
There are plenty of words that people use every day that haven’t made it into standard
dictionaries -- okay, maybe that one isn’t used every day any more -- but they clearly
convey meaning.
So, okay. Let’s say that a word is some characters, surrounded by something else,
that conveys some meaning. That’s not a bad definition on paper. I mean, literally,
that’s not a bad way to define English words when they’re written down on paper. But
language isn’t just about writing.
That definition, spaces around some letters, doesn’t handle contractions well at all.
In everyday English speech, there are far more contractions than you ever see written
down. Depending on your dialect, “I’d-a” is a perfectly good replacement for “I would
have”, as in “I’d-a thought you’d known that.” Is that three words? Two words?
How about “I’m-a”, as in, “I’m-a let you finish”? There are a dozen different
ways to even spell that one right now, and it’s replacing four Standard English
words, “I am going to”. Except there’s also a decent argument that “I’m-a”
isn’t even a contraction of those words, but a completely separate word in itself:
“imma”. Either way, good luck counting the number of words you’ve actually got there.
And to make it worse, the meanings of words, and how they’re presented, change over time.
“Today” used to be hyphenated: “to-day”, and before that, it was two words: “to day”.
When did it become one word, instead of two? How about “first class”? Does that become
a single word if you hyphenate it instead? You’ll find that most people will give two
different answers: to-day is one word, but first-class is two.
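As a rough illustration of how much the answer depends on the rule you pick, here is a small Python sketch. The function `count_tokens` and its `joiners` parameter are hypothetical names invented for this example, and it assumes plain ASCII letters, apostrophes and hyphens; it is not how any particular word processor works.

```python
import re


def count_tokens(text: str, joiners: str = "") -> int:
    """Count runs that start with an ASCII letter and continue with letters or joiners."""
    # `joiners` is a hypothetical knob: extra characters allowed inside a word,
    # such as "'" for contractions or "-" for hyphenated forms.
    pattern = f"[A-Za-z][A-Za-z{re.escape(joiners)}]*"
    return len(re.findall(pattern, text))


phrase = "I'd-a thought you'd known that"
print(count_tokens(phrase))        # 8
print(count_tokens(phrase, "'"))   # 6
print(count_tokens(phrase, "'-"))  # 5

print(count_tokens("to-day first-class"))       # 4
print(count_tokens("to-day first-class", "-"))  # 2
```

The same sentence is eight, six or five words depending on which punctuation you let sit inside a word, and no single setting gives the "one word, two words" answer for the hyphenated pair that most people would.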
That’s while we’re still talking about English. How about synthetic languages, ones
where, in extreme cases, you can put an entire sentence’s worth of meaning into one long
string of characters or sounds? Is that still one word?
And more important than any of that: does it matter?
Word count is a convenient thing to refer to if you’re tracking your progress on a written
project like a dissertation: regardless of how exactly you define a word, it’s clearly
a massive feat to write 50,000 of them coherently. But when you zoom in on language as a system,
it makes the most sense to ignore words entirely and talk about morphemes or phrases.
Words might be the most convenient way of dividing up an English sentence, but they
aren’t really much use to linguists. And also, here’s one question to leave you
with, that no-one’s tackled yet: are emoticons, or emoji, words? They’re symbols, surrounded
by space, that convey meaning: sometimes a literal one, sometimes just as a modifier to a sentence.
If I were to end a written version of this with a smiley face, should that count as a word?