Regular Expressions

Posted July 2006

Note: This post has been updated to reflect the contents of /usr/share/dict/words on OS X Mountain Lion, and use egrep rather than Perl.

A friend mentioned that the word "bookkeeping" was (apparently) the only English word that had three consecutive groups of double letters in it. I decided to test this statement, so I wrote a little regular expression to verify it. Here's the program, and the results of running it:

egrep '(.)\1(.)\2(.)\3' /usr/share/dict/words
bookkeeper
bookkeeping
subbookkeeper

So, it turned out he was right. (At the time this post was written, he was; subbookkeeper was added later, and I'll allow bookkeeping as a variant of bookkeeper.) Then I got curious; what if the groups don't have to be consecutive? That's easy enough to check, so let's do that:

egrep '(.)\1.*(.)\2.*(.)\3' /usr/share/dict/words | wc -l
170

Wow. I wonder if any of those have four groups of double letters in them? (Non consecutive?)

egrep '(.)\1.*(.)\2.*(.)\3.*(.)\4' /usr/share/dict/words
killeekillee
possessionlessness
subbookkeeper
successlessness

Those sure are some funny words…