Broofa’s Readability Analyzer


Okay, this one goes out to all you hard-core geeks out there! I’d like to introduce Broofa’s Readability Analyzer.

For the Impatient

What is it?
A tool for determining web page readability.
What’s it do?
At the bottom of every web page, you’ll see how “readable” the page is, based on a combination of computed scores.
How do I use it?
Make sure you are using the Firefox browser with the Greasemonkey extension. Then install the Readability Analyzer: either right-click that link, or navigate to the script and click the “Install” button that Greasemonkey should show you.

Background
There are several methods for rating the “Readability” [wikipedia.org] of a document. All of these work in more or less the same way: count the number of syllables, words, and sentences in a representative sample of text, plug them into a formula, and out comes a result that is (usually) the grade level required for a reader to understand the text.
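To make the “plug them into a formula” step concrete, here is a minimal sketch of one well-known readability formula, the Flesch-Kincaid grade level. This post does not say which five formulas the analyzer uses, so treat the choice of formula as an assumption for illustration; the function name is hypothetical too.

```python
def flesch_kincaid_grade(syllables: int, words: int, sentences: int) -> float:
    """Estimate a U.S. school grade level with the Flesch-Kincaid formula.

    Illustration only: this is one published readability formula, not
    necessarily one of the formulas the Readability Analyzer uses.
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# A sample with 100 words, 5 sentences, and 150 syllables:
# 0.39 * 20 + 11.8 * 1.5 - 15.59 = 9.91, i.e. roughly 10th grade.
print(round(flesch_kincaid_grade(syllables=150, words=100, sentences=5), 2))
```

Longer sentences and longer words both push the grade up, which is why shortening either one is the usual advice for making text easier to read.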

Readability calculations are not definitive by any means. But they are useful, especially for writers. They provide a way to gauge the complexity of one’s writing and adjust it accordingly. For example, web documents should usually be targeted at the 6th-9th grade level, by using slightly shorter sentences and a less sophisticated vocabulary. Technical documentation should of course be slightly higher-brow (longer sentences, bigger words).

Readability Systems
One thing the Readability Analyzer makes abundantly clear is that the various systems referenced under the Wikipedia entry above are not consistent. For the same text, different systems can disagree by as much as 4 or 5 years in the reading level they quote. Thus, the overall readability grade the analyzer reports is actually an average of all five (count ’em, five!) algorithms.

Also, the “easy/difficult” rating it reports assumes your “average” reader has the skills of an 8th or 9th grader.
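The averaging and rating steps might look something like the sketch below. The function names, the 8.5 reference grade (the midpoint of “8th or 9th grader”), the two-grade cutoff, and the label wording are all assumptions made for illustration; the analyzer’s actual thresholds are not given in this post.

```python
from statistics import mean


def overall_grade(grades: list[float]) -> float:
    """Average the grade levels reported by several readability formulas."""
    return mean(grades)


def difficulty_label(grade: float, reader_grade: float = 8.5) -> str:
    """Rate a grade level relative to an assumed average reader.

    The two-grade margin and the label text are hypothetical.
    """
    gap = grade - reader_grade
    if gap > 2:
        return "fairly difficult"
    if gap < -2:
        return "fairly easy"
    return "moderate"


# Five made-up scores that disagree by four grades, as described above.
scores = [9.0, 10.0, 11.0, 12.0, 13.0]
grade = overall_grade(scores)
print(f"~ grade {grade:.0f}: {difficulty_label(grade)}")
```

Averaging smooths out the disagreement between formulas, at the cost of hiding how wide the spread was.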

Understanding Web Documents
One of the challenges in rating web page readability is determining what qualifies as “readable” content. Web pages contain a lot of extraneous information (navigation bars, tables of contents, etc.). In theory, one could just find all the paragraph blocks and pull the text out of those. Unfortunately, page authors often use different elements to hold text.

To work around this, the analyzer looks for common text-containing elements (DIV, DT, DL, LI, PRE, etc.) and uses those. It also tries to be smart about throwing away paragraphs that appear anomalous in length (too short) or content (too many non-word characters or too much markup). The result is that it does a “reasonable” job of pulling the most interesting text out of a page.
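As a rough illustration of the “throw away anomalous paragraphs” idea, the sketch below filters text blocks that have already been pulled out of a page. The thresholds (at least 10 words, at most 30% non-word characters) and the function name are assumptions, not the analyzer’s real values, and the markup check is left out because the sketch works on plain text.

```python
import re

# Hypothetical thresholds; the analyzer's real cutoffs are not documented here.
MIN_WORDS = 10
MAX_NON_WORD_RATIO = 0.3


def looks_readable(text: str) -> bool:
    """Return True if a block of text looks like prose worth scoring."""
    words = re.findall(r"[A-Za-z']+", text)
    if len(words) < MIN_WORDS:
        return False  # too short: likely a menu item, caption, or label
    visible = re.sub(r"\s+", "", text)
    non_word = sum(1 for ch in visible if not (ch.isalpha() or ch == "'"))
    return non_word / len(visible) <= MAX_NON_WORD_RATIO


blocks = [
    "Home | About | Contact",
    "Readability formulas estimate the school grade needed to understand a piece of text.",
]
print([looks_readable(block) for block in blocks])
```

A filter like this will sometimes discard legitimate short paragraphs, which is the trade-off behind the “reasonable” in quotes above.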

Counting Syllables
Several of the algorithms used rely on the notion of “complex words”: words of 3 or more syllables (this is reported in the Readability Analyzer as “hardwords”). Others simply use the total syllable count. Either way, this presents a problem for any automated tool because English is a very bizarre language. There are lots of exceptions to every rule, especially where pronunciation is concerned. To avoid an elaborate and time-consuming syllable counting implementation, I instead “borrowed” the fairly simple algorithm used by the WordPress Statistics plugin. It merely counts consonant-vowel pairings, with a couple of extra rules for common-sense exceptions. After adding my own fudge-factor to the mix, the result is a syllable count that is not exact, but probably “good enough” for most purposes.
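For a feel of how rough such a counter can be, here is a minimal sketch of a related, commonly used heuristic: count runs of vowels, then adjust for a silent trailing “e”. It is not the WordPress Statistics plugin’s code and does not include the analyzer’s fudge-factor; the function names and the specific exception rule are assumptions.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate the syllables in an English word by counting vowel groups.

    A common heuristic, used here as a stand-in; it will be wrong for
    plenty of words (for example "rhythm").
    """
    word = re.sub(r"[^a-z]", "", word.lower())
    if not word:
        return 0
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing "e" is usually silent ("make"), except in a
    # consonant + "le" ending, where it forms a syllable ("table").
    if word.endswith("e") and count > 1 and not re.search(r"[^aeiouy]le$", word):
        count -= 1
    return max(count, 1)


def is_hard_word(word: str) -> bool:
    """A "complex" word has 3 or more syllables."""
    return count_syllables(word) >= 3


for w in ["cat", "make", "table", "readable", "readability"]:
    print(w, count_syllables(w), is_hard_word(w))
```

Because the readability formulas only need totals over a whole sample, individual miscounts tend to matter less than they would for a single word.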

In Conclusion …
Automated calculation of readability is an approximation at best. And the Readability Analyzer definitely reflects that. The rating it provides should be treated as a “guesstimate”. It doesn’t tell you the quality of the writing, and it certainly doesn’t tell you the quality of the mind behind the writing, but it does give you a feel for who the target audience might be.

For this post: “Readablity is fairly difficult (~ grade 11) ” 🙂
