Fun fact: Nabokov was also a lepidopterist.

Famous novelists write in their own particular style. Ernest Hemingway used short sentences; Henry James used much longer ones; Virginia Woolf let them flow free-form from her mind. But say you’re handed a particular piece of text. Could you figure out the author just by examining the words that appear?

That question, and others like it, are the subject of Ben Blatt’s Nabokov’s Favorite Word is Mauve. The book begins with a historical conundrum: who really wrote the Federalist Papers? The authorship of a handful of them was in doubt because both Madison and Hamilton had claimed them. For nearly two centuries, historians argued about who the real author was, teasing out evidence from the political slant of each essay. But the question was finally settled in 1963 by Mosteller and Wallace, a pair of statisticians. They approached the problem systematically: (a) count the frequency of common words in works known to be written by each man; (b) count the frequency of those same words in the disputed essays; (c) compare. In the end, it mostly came down to the use of whilst versus while: Madison used the former and Hamilton the latter. That, along with an array of other common words, confirmed Madison as the author of the disputed essays.
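To make steps (a) through (c) concrete, here is a minimal sketch in Python. It is not Mosteller and Wallace's actual procedure; the summed absolute difference used to "compare" is a stand-in chosen for simplicity, and the function names and toy strings are invented for illustration.

```python
import re
from collections import Counter

def rate_per_thousand(text, words):
    """Occurrences of each marker word per 1,000 tokens of text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {w: 1000 * counts[w] / total for w in words}

def closer_author(disputed, known_a, known_b, words):
    """Return 'a' or 'b': whichever known sample has marker-word rates
    nearer to the disputed text (summed absolute difference, an assumption)."""
    d = rate_per_thousand(disputed, words)   # step (b)
    a = rate_per_thousand(known_a, words)    # step (a), first author
    b = rate_per_thousand(known_b, words)    # step (a), second author
    dist_a = sum(abs(d[w] - a[w]) for w in words)  # step (c)
    dist_b = sum(abs(d[w] - b[w]) for w in words)
    return "a" if dist_a <= dist_b else "b"

# Toy strings, invented for illustration; not quotations from the essays.
sample_a = "whilst the states deliberate, the union waits whilst time passes"
sample_b = "while the states deliberate, the union waits while time passes"
disputed = "the people wait whilst the convention meets"
print(closer_author(disputed, sample_a, sample_b, ["whilst", "while"]))
```

With real texts you would want many marker words and far longer samples, since a single word in a short passage says very little.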

Blatt’s book is a compendium of answers to similar questions via statistical means. The first chapter tackles the cardinal rule of efficient writing: use adverbs sparingly. To analyze this idea, Blatt takes a corpus of novels written by a broad cross-section of famous writers (details of the data set are in the book, but it is expansive and comprehensive) and counts the -ly adverbs in each. The results are what you might expect (data from p. 13 of the book):

No surprise that Papa is the most efficient writer by this metric. But this is just the beginning of the fun in addressing this question. Is each author uniformly efficient, or does it vary from book to book? Is Hemingway really the most efficient or just the most efficient on average? It turns out that William Faulkner wrote three books with a lower adverb rate than Hemingway’s lowest, To Have and Have Not (52): As I Lay Dying (31), The Sound and the Fury (42), and The Unvanquished (46).
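A per-book adverb rate like the ones quoted above could be computed roughly as follows. This is a naive sketch, not Blatt's method: it matches on the -ly suffix alone, the exclusion list is a hypothetical and incomplete one, and the per-10,000-word scale is an assumption on my part.

```python
import re

# Hypothetical, incomplete list of -ly words that are not adverbs.
NOT_ADVERBS = {"family", "holy", "ugly", "belly", "fly",
               "reply", "supply", "bully", "jelly", "lily"}

def ly_adverb_rate(text, per=10_000):
    """Count of -ly words (minus known non-adverbs) per `per` words of text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t.endswith("ly") and t not in NOT_ADVERBS)
    return per * hits / len(tokens)

# A one-sentence toy input; a real run would pass the full text of a novel.
print(ly_adverb_rate("He ran quickly and quietly past the family home."))
```

Running this once per book, instead of once per author, is what lets you ask whether an author is uniformly sparing or only sparing on average.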

Or how about the assertion in the book’s title that Vladimir Nabokov’s favorite word was mauve? What does that even mean? Clearly it is not a common word, so we shouldn’t expect it to be high on the list of word frequencies in Nabokov’s novels. This investigation begins with a question once posed to Ray Bradbury: what is your favorite word? Bradbury’s answer was cinnamon, and it turns out that he does use it more often than you might expect. The way Blatt quantifies this is to take the Corpus of Historical American English, a 385-million-word sample of works from 1810 to 2009, count the frequency of a particular word, and use this as the benchmark against which to judge just how much of a favorite a particular author’s word is. It turns out that Bradbury uses cinnamon about 4.5 times more often than it is used in the Corpus, but he uses spearmint 50 times more often.
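The benchmark comparison boils down to a ratio of two rates. Here is a small sketch; the function name is mine, and the counts in the example are invented purely so the arithmetic is easy to follow (they are not Bradbury's or the Corpus's real counts).

```python
def usage_ratio(author_count, author_total, corpus_count, corpus_total):
    """How many times more often the author uses a word than the benchmark corpus does."""
    if corpus_count == 0:
        raise ValueError("word never appears in the benchmark corpus")
    author_rate = author_count / author_total   # occurrences per word of the author's text
    corpus_rate = corpus_count / corpus_total   # occurrences per word of the corpus
    return author_rate / corpus_rate

# Invented counts: 9 uses in 1,000,000 words by the author,
# 770 uses in a 385,000,000-word corpus.
print(usage_ratio(9, 1_000_000, 770, 385_000_000))
```

Dividing by the total word counts first matters: a prolific author will have a higher raw count of almost every word, so only the rates are comparable.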

Blatt then sets up the following requirements to judge these types of words: (1) it must be used in at least half the author’s books; (2) it must be used at a rate of at least once per 100,000 words; (3) it must be obscure, in the sense that it has a frequency of less than once per million in the Corpus; (4) it must not be a proper noun. Based on these criteria, Bradbury’s actual favorite words are icebox, dammit, and exhaled (sorry, cinnamon). What about Nabokov? He did have synesthesia and he used color words more often than other writers. His favorite word under this metric is indeed mauve.
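The four requirements translate naturally into a filter. The sketch below assumes the per-word statistics have already been tallied; the WordStats record, its field names, and every number in the example are hypothetical. How the surviving words are then ranked is not spelled out above, so the sketch stops at the filter.

```python
from dataclasses import dataclass

@dataclass
class WordStats:
    word: str
    books_used_in: int         # how many of the author's books contain the word
    author_count: int          # total occurrences across all the author's books
    corpus_per_million: float  # benchmark frequency in the reference corpus
    is_proper_noun: bool

def favorite_candidates(stats, total_books, total_words):
    """Return the words that pass all four requirements."""
    keep = []
    for s in stats:
        author_per_100k = 100_000 * s.author_count / total_words
        if (2 * s.books_used_in >= total_books   # (1) in at least half the books
                and author_per_100k >= 1         # (2) at least once per 100,000 words
                and s.corpus_per_million < 1     # (3) obscure in the corpus
                and not s.is_proper_noun):       # (4) not a proper noun
            keep.append(s.word)
    return keep

# Invented statistics for an imaginary author with 10 books and 2,000,000 words.
stats = [
    WordStats("mauve", 8, 40, 0.6, False),
    WordStats("the", 10, 90_000, 50_000.0, False),  # fails (3): far too common
    WordStats("lolita", 6, 300, 0.1, True),         # fails (4): proper noun
    WordStats("zephyr", 2, 30, 0.3, False),         # fails (1): too few books
]
print(favorite_candidates(stats, total_books=10, total_words=2_000_000))
```

Requirement (1) is what keeps a word tied to one book's subject matter from winning, and requirement (3) is what knocks out words every writer uses constantly.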

As a mathematician and as someone who really likes to read, I enjoyed this book quite a bit. Its insights into what makes for good writing, and into what distinguishes the greats from the merely good, are terrific. I recommend it heartily.


