Well.  This is distressing.

I should have caught this word over a year ago, it being absolutely uncommon.  That is to say, the software should have caught it.  We have a little mystery to solve!   “Goblin-imp” is not in the Great Spreadsheet of Doom.  It is definitely not in the Project Gutenberg corpus (I checked).  There is no occurrence of “imp” elsewhere in the text of The Hobbit except as part of “glimpse” or “important” or similar.

Perhaps I can bribe Tech Support with blueberry muffins to chase this further down.

Here it is for now, a tasty treat that Gollum enjoyed a few hours before Chapter 5.

  • 05.087 and caught a small goblin-imp.

Update: this is not a programming error (and I checked the punctuation thing, too).  It seems to be merely a cut-and-paste error such as one has when manipulating over 5,000 lines of spreadsheet.

1951: How do sound words contribute?

Remember the 1951-new paragraphs?  The mountains of uncommon words in bright red above the valleys of pale red?  They are marked by the phrases “Show the nasty little Baggins the way out”, “Curse us and crush us, my precious is lost!”, and “To the back door, that’s it.”


Got them spotted?  OK.  I’m going to take out the pale red 1937 line and put in the 1951 purple sound line.  Ready?

1951.05.UnCo& Sound

Great elephants!  In 1937, the sound words dropped to about a third of the Riddle-Game-climax peak for the remainder of the chapter.  Not so in 1951!

The sound words are not just frequent, they’re punching higher in frequency than they did before, and we see from the shapes of the graphs that the sound words drive the frequency of uncommon words.  These sections – approximately paragraphs [05.080] to [05.132] – are the ones Tolkien added to tie this text forward to The Lord of the Rings as he was writing and discovering that longer, more complex work.

We’ve already listed the words which appeared in the 1951-only paragraphs but not the 1937-only paragraphs and vice versa; that would measure whether the new paragraphs were formed out of the words of the retiring paragraphs.  There is another way to look at those unique-paragraph words, of course – to compare the new paragraphs of 1951 to the entire 1937 chapter.  I’m pleased to report that list is almost the same as the previous.
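For the curious, both comparisons come down to a set difference.  Here is a minimal sketch in Python, assuming each group of paragraphs is simply a list of strings; the function names, the tokenizing pattern, and the toy sentences are all made up for illustration and are not the spreadsheet method actually used here.

```python
import re

def vocabulary(paragraphs):
    """Collect the lowercase word forms used in a list of paragraph strings."""
    words = set()
    for paragraph in paragraphs:
        # Keep hyphenated and apostrophized forms such as "goblin-imp" whole.
        words.update(re.findall(r"[a-z]+(?:[-'’][a-z]+)*", paragraph.lower()))
    return words

def new_words(new_paragraphs, baseline_paragraphs):
    """Words that occur in the new paragraphs but nowhere in the baseline."""
    return sorted(vocabulary(new_paragraphs) - vocabulary(baseline_paragraphs))

# Toy stand-ins, not real text from either edition.
only_1951 = ["He caught a small goblin-imp.", "My precious is lost!"]
only_1937 = ["He caught a small fish."]
whole_1937 = only_1937 + ["My precious, what has it got in its pockets?"]

versus_retired = new_words(only_1951, only_1937)   # 1951-only vs 1937-only
versus_chapter = new_words(only_1951, whole_1937)  # 1951-only vs all of 1937
print(versus_retired)
print(versus_chapter)
```

Comparing against the whole chapter can only shrink the list, since the baseline vocabulary gets bigger; in the toy data, "my" and "precious" drop out once the rest of the chapter is counted.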

Almost all of the new words, naturally, occur from paragraph 80 to 132.  Here they are – the completely new uncommon words from 1951.

  • Words we tagged as sound words (tagged by the OED, or from Gollum’s idiolect, or Gollum’s name and characteristic throat noise): cracking creepsy eyeses goblinses Gollum hates gurgling hissing  losst screech shriek smells sniffed squeaked squeaker ssss tricksy
  • other uncommon words (note how many begin with S): back-door  betterment birthday-present blindly blood-curdling bowstring brooded  crawling  crouched  dursn’t flattened  forefinger  galled  gleamed  gnaw  goblin-imp  groping  hiding-place  humped  leapt  maddened  menacing  menacingly  mouse noser nosey oddments  paddling  palely   panted  peered  pricked  quicker shambling sharpened sharper sheathed shiver side-passages snag  sneaking softer splayed squeezes  stab stiffened swayed tripping  tense tunnel-wall  unlost unmarked

footnote: The scales are the same as last post’s graph lines – the red uncommon words line is on the 0.00 to 0.16 scale as shown and the purple sound line is on 0.00 to 0.09.  

1937: How do sound words contribute?

We know that there are plenty of sound words in Chapter 5.  How much do they contribute to the uncommon words of 1937?


That correspondence looks quite strong – and let me give you more grist for the mill.  Lexos draws each individual graph at a scale that visually fills up the space so that we can see the patterns clearly; it recalculates what the scale should be for every new graph.  As I reminded us in the last post, in looking at the whole book, Lexos draws the purple sound graph at about 33% of the red uncommon graph: where the two lines match, for every 100 red uncommon words, 33 of them are purple sound words.

When we look only at Chapter 5, Lexos draws the purple sound graph at 56%: where the two lines match, for every 100 red uncommon words, 56 of them are purple sound words.  The strength of the sound words’ contribution to our red graph is almost doubled.

Well, then.  Chapter 5 is full to the brim with sound words!  It’s certainly not unexpected; after all, it’s dark in those caverns.  Each sound is magnified, and hearing is the strongest sense Bilbo has working for him to perceive his situation.  Here’s a sample of how the sound words and the other uncommon words work together in a paragraph which is identical in both editions:

[05.007]  ‘Go back?’ he thought.  ‘No good at all!  Go sideways?  Impossible!  Go forward?  Only thing to do!  On we go!’ So up he got, and trotted along with his little sword held in front of him and one hand feeling the wall, and his heart all of a patter and a pitter.

We can also notice that the “scrumptiously crunchable” peak is not particularly driven by sound words.  That may or may not be of particular interest, but it does reassure us that the strength of the sound graph elsewhere is not an error, as we can see that it’s not omnipresent.

After the peak at the climax of the riddle game, we observe that the sound words recede to about a third of that peak.

Footnote: I have debated erasing those y-axis scales on the left-hand side completely, yet I feel an obligation to make my Lexos graphs comparable to those produced by other scholars.  The scale on the sound words, which I described above as “56%”, is from 0.0 to 0.09.  If you need to articulate those number sentences more clearly: at the top of the two scales, “for every hundred words, 16 are uncommon, of which 9 are sound words”.  I don’t have the skills to read the open-source code which the Lexos programmers wrote, but I used to be a statistician in the days of punch cards and carrier pigeons.  If you ask me questions about the numbers, I can probably ask Tech Support to read the code and tell me the details so I can make a coherent explanation.
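For anyone who wants to check the “56%” by hand, here is the arithmetic as a tiny Python sketch.  It assumes only the two axis maxima quoted in these posts (0.16 for the red uncommon line, 0.09 for the purple sound line) and the rough 33% figure for the whole book; the variable names are mine, and nothing here comes from the Lexos code itself.

```python
# Axis maxima as quoted for the Chapter 5 graphs.
uncommon_scale_max = 0.16  # red line: uncommon words per word of text
sound_scale_max = 0.09     # purple line: sound words per word of text

# Where the two lines coincide on the page, the sound proportion is this
# fraction of the uncommon proportion.
chapter_share = sound_scale_max / uncommon_scale_max
print(f"Chapter 5 share: {chapter_share:.0%}")

# Compare with the roughly one-third share quoted for the whole book.
whole_book_share = 0.33
growth = chapter_share / whole_book_share
print(f"Chapter 5 share is about {growth:.1f} times the whole-book share")
```

So 9 out of 16 is 56%, and that is about 1.7 times the whole-book figure, which is what “almost doubled” rests on.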

1937 & 1951: Do the uncommon words differ?

A few weeks ago, we noticed that the longest sustained high frequency of uncommon words takes place in Chapters 3, 4, and 5 – in Rivendell, captured by goblins, and with Gollum.  Surely that’s not the rhetorical peak of the work – what is happening in these chapters?  We examined those special word categories we’ve been tracking – archaic words, food words, and sound words – and found a huge peak:

2015.06.15 Sound & Uncommon Graph

What was Tolkien doing with all those sound words?  Clearly the purple Sound Word graph drives the red Uncommon Words graph.  We do remember that the purple line is on a 1/3 scale: when the purple and red lines match, the number of sound words is about 1/3 of the total number of uncommon words at that point.  Although they’re not identical, the coincidence of the peaks at “a small slimy creature” and the similarity of the shapes of those peaks are suggestive.

Earlier I asked “How did Tolkien do that?”  Today I ask… “How did he do it and what did he do?”  Tolkien’s subtle hand with words operates on multiple levels.  As Blackwelder observed:

We may assume that a reader is following the story and the characters and may sometimes fail to notice the unusual words, phrases, or even passages.

We come out into the sunlight at the end of Chapter 5 breathing a huge sigh of relief… when we were safe at home in our comfortable reading chairs the whole time.  How did he use the words, phrases, and passages to affect us emotionally – subliminally?

Well then, let’s take advantage of the writing and publication history of The Hobbit and take a close look at Chapter 5, the chapter which we know he changed in order to change the facts and the feeling of the story.  Here is the graph of uncommon words of Chapter 5 as it was written in 1937.  For this much smaller sample, I used a rolling average on windows of 200 words.
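If you would like to see what a rolling average over 200-word windows amounts to, here is a minimal sketch in Python.  It assumes the chapter is already split into a list of lowercase words and that the uncommon words are held in a set; the function name is hypothetical, and Lexos may well place or pad its windows differently, so treat this as an illustration of the idea and not as the Lexos calculation.

```python
from collections import deque

def rolling_proportion(words, uncommon, window=200):
    """Proportion of uncommon words in each full run of `window` consecutive words."""
    flags = deque(maxlen=window)  # 1 for an uncommon word, 0 otherwise
    running = 0                   # number of uncommon words currently in the window
    proportions = []
    for word in words:
        if len(flags) == window:
            running -= flags[0]   # the oldest word is about to fall out of the window
        flag = 1 if word in uncommon else 0
        flags.append(flag)
        running += flag
        if len(flags) == window:
            proportions.append(running / window)
    return proportions

# Toy example with a window of 2 words instead of 200.
toy_words = ["a", "b", "imp", "c", "d"]
toy_curve = rolling_proportion(toy_words, {"imp"}, window=2)
print(toy_curve)
```

Each value on the curve is the share of uncommon words in one window, which is why a single unusual word shows up as a short plateau and a cluster of them shows up as a peak.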


I’ve placed a few textual landmarks – I love that “scrumptiously crunchable” is one peak and that the highest frequency is right at the end of the riddle game as Gollum is waiting for Bilbo’s last question.

You can see an artificial valley right as Gollum cannot find the ring, another one from about word 4250 to word 4900, and the largest and last one after Bilbo puts on the ring which stretches until he and Gollum part ways.  I call these “artificial valleys” because at these points the 1951 Chapter 5 has different paragraphs and I inserted the word “and” in each spot enough times to match those 1951 paragraphs’ word count without making a false image of uncommon words.
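The padding trick can be sketched in a few lines of Python.  This assumes the two editions are laid out as paired paragraph slots, with None wherever an edition has no paragraph in that slot; the function names and the toy paragraphs are hypothetical, and the real work was done by hand in a spreadsheet, not with this code.

```python
def filler(paragraph, word="and"):
    """A stand-in with the same word count as `paragraph`, built only from `word`."""
    return " ".join([word] * len(paragraph.split()))

def align(pairs):
    """Pad each edition so both texts have the same number of words in every slot.

    `pairs` holds one (text_1937, text_1951) tuple per slot; exactly one of the
    two may be None (a slot missing from both editions is not handled here).
    """
    aligned_1937, aligned_1951 = [], []
    for old, new in pairs:
        aligned_1937.append(old if old is not None else filler(new))
        aligned_1951.append(new if new is not None else filler(old))
    return " ".join(aligned_1937), " ".join(aligned_1951)

# Toy slots, not real text from either edition.
pairs = [
    ("same words here", "same words here"),  # identical in both editions
    (None, "brand new text of six words"),   # added in 1951
    ("old gone", None),                      # removed after 1937
]
text_1937, text_1951 = align(pairs)
print(text_1937)
print(text_1951)
```

Because “and” is about as common as a word gets, a padded stretch contributes no uncommon words at all, which is exactly why it shows up as a flat valley while keeping the landmarks in the two graphs lined up.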

We’ve looked at which words are new in the 1951 edition (and which were lost from the 1937); are those words evenly distributed through the chapter?  Keep your eyes on those artificial valleys as I show you the 1951 graph overlaid on this 1937.


Look at those valleys!  Over all three 1937 valleys are towering 1951 mountains of uncommon words!  When he wrote those extra paragraphs, Tolkien pulled out the stops.  “Pocketses” and “Curse us and crush us!”  In the “curse us and crush us” spot in 1937, Gollum repeats “bless us and splash us”!  What effect did Tolkien accomplish and how did he do it?  I’m going there in the next blog post.

You can even see the small artificial valleys in the bright red line where I inserted “and” in the 1951 to make up the word count from the extra 1937 paragraphs.  Those 1937 paragraphs which were removed were definitely not peak word moments; they toddled along in a manner that looks pretty average for the rest of the chapter.  Notice that the graphs remain the same shape but become disjoined during the riddle game?  That follows a few spots where 1951 adds just a few words in just a few sentences, pushing that bright red 1951 line slightly rightward.


And so we draw to a close.  Bilbo learns later what leads to the little trough at 92,000.  (Sidenote: I am confused why the Lexos graph ends just before word 93,000 as the last dot is labeled, even though the x-axis label seems to go to word 100,000.  We know from counting that the text file that Lexos read has 96,157 words.  I’ll research and report back, Word Fans.)


[18.024] …But weariness left [the goblins’] enemies with the coming of new hope, and they pursued them closely, and prevented most of them from escaping where they could.

After this, a small rise through leave-takings and the safe, healing, restorative journey home.  Our tale ends where it began, in The Shire.  At the very end of this graph, the proportion of uncommon words is 0.044.  We’ve been here before, of course.  For those of you trying to draw a level line with your eyes, the trough of Chapter 1 measured in at 0.042.

[01.096] ‘Pardon me,’ he said, ‘if I have overheard words that you were saying.

The War Words

This is eerie.

2015.06.14 War Words

I can see the dragon-sickness and the evil it wreaks in this graph from word 75,000 to 84,000.  The exact trough, almost as low as our nadir of helplessness in Mirkwood, and perhaps even more hopeless, is here:

 [15.032] As they stood pointing and speaking to one another Thorin hailed them: ‘Who are you,’ he called in a very loud voice, ‘that come as if in war to the gates of Thorin son of Thrain, King under the Mountain, and what do you desire?’

And what restores hope?  You can see it, too.  Encoded in the frequency of uncommon words.  And what are the lines at word 86,644?

[16.042] … As they passed through the camp an old man, wrapped in a dark cloak, rose from a tent door where he was sitting and came towards them.

[16.043] ‘Well done! Mr. Baggins!’ he said, clapping Bilbo on the back. ‘There is always more about you than anyone expects!’ It was Gandalf.

The line holds fairly steady throughout the face-off, the parleys, the tensions, the goblins and wargs and Dain and the bloodshed, right up to the objectively measured, carefully calculated turning point in the use of uncommon words.

[17.066] ‘The Eagles!’ cried Bilbo once more, but at that moment a stone hurtling from above smote heavily on his helm, and he fell with a crash and knew no more.

Climbing the Lonely Mountain

The low point near 62,000 surprised me.

[10.029] ‘I am Thorin son of Thrain son of Thror King under the Mountain! I return!’

Exactly there.  I had predicted that paragraph would be a high point of uncommon words since in my heart it is a high point of drama.  Instead, it is a turning point.  The words surrounding it have to do with being soggy and smelling of apples, so I understand the low point.  Remembering that our analysis ignores names, we can see that this phrase uses words so important that they are in common use.  Remembering that it’s a five thousand word window, I’m mightily impressed that it is the exact turning point.

2015.06.14 Lonely Mountain

We climb out of the barrels, climb through Lake-town, climb the mountain, solve the thrush mystery, climb into the mountain, riddle with Smaug.  Our average of uncommon words rises with a few local variations (the local peak is [12.021] Thieves! Fire! Murder! Such a thing had not happened since first he came to the Mountain! His rage passes description) to that double-peak surrounding 74,000 words:

[12.093] …Smaug will be coming out at any minute now, …
[13.019] It was the Arkenstone, the Heart of the Mountain.

Is your heart racing?  My heart is racing.