Tracking the corpus: sporks, eccentric millionaires, and herding cats

An old On Language column on the Oxford English Corpus came across my desk today. The corpus is a huge compilation of text from novels, newspaper and magazine articles, blogs and chatrooms, and even transcripts of speech, all in an ultra-powered searchable database. It lets you see words in context; for example, as the column says, things described as pink are much more likely to also be described as fluffy than as fuzzy. (Also, cats are the third most-often herded animal, at least in language; eccentric is used to describe old rich men surprisingly often; and, as anyone under the age of 25 can tell you, sporks are often used as humorous weapons.)
Most of this stuff seems fairly intuitive, but the OED people hate going by “feel.” So they’re using the corpus to create more accurate dictionary entries scientifically. For example, the 1999 edition of the Concise OED had only one definition for edgy: “tense, nervous, or irritable.” But a search of the corpus confirms what most everyone already “knows”: that another definition is “avant-garde and unconventional.”

There’s some good reading on the AskOxford site for those who are interested. I’d stay away from the demos, which are incredibly boring unless you’re interested in playing with the corpus itself. Which you can’t quite do: the Oxford English Corpus doesn’t seem to be available to the public, but the company that made it offers a 30-day trial of a similar product. Judging from the demos, the software is not terribly user-friendly, but it can do all sorts of things not hinted at on AskOxford. You can look up snowclones very easily, for instance. I’m sure it can do much more than that, but I don’t understand how to use it. (Yet.)

This offers so much potential for fun, mostly time-wasting experiences, that I’m not sure I’ll sign up. (Oh, but how can I resist?)
