Scientific Research

Explaining the news: is Vox top?

There are thousands of news sites out there. But what if there was a way to find out which site is best for giving you a good overview of news stories?

I’ve analysed newspapers before, but this is different. At the recent News Impact Summit (NIS), I heard an interesting talk by the engagement manager for Vox, a newish online news site. But a couple of things seemed off with what we were told.

Vox’s spokeswoman told us that her site’s goal is to set itself apart from other news sites by explaining the news, and to do so in a shareable way. She showed examples of ‘cards’ (Vox’s way of displaying content explaining the news), but I couldn’t help having a problem with their examples of ‘easy-to-understand’ content. Of only 6 sentences shown on screen, at least 2 were longer than 30 words*.

At GOV.UK, where I’m currently freelancing, we wouldn’t have that. No, not at all, for user research shows that anything over 25 words is a reading killer. Similarly, we’re told to avoid unnecessary or uncommon words, such as the “hence” that started sentences in the Vox examples.

On the other hand, Vox’s spokeswoman told us it put tremendous effort into polishing headlines to make more readers want to click.

Me being me, this prompted me to wonder what the truth is.

Comparing the news

I selected several of the top news stories of the past year, ones that Vox had a ‘card’ for:

  • the ebola outbreak
  • Islamic State
  • Malaysia Airlines flight MH17 downing over Ukraine
  • the Ukraine crisis
  • Michael Brown shooting and rioting in Ferguson, USA

I wanted to look at comparable news sources. This doesn’t just mean news sites. I looked at a combination of the most popular news sites in the world (that I could access without subscription), along with other ways we get news. So even though BuzzFeed and Reddit aren’t in the top 10 news sites, they are a significant news source for many. I then divided these into new and old media.

‘Old media’ (organisations established before the internet):

  • BBC News — the UK’s most popular news site. Most of its articles are written by its own journalists
  • The Daily Mail — the world’s most popular news site and, unlike the New York Times, I can access its articles. It uses Associated Press articles along with its own
  • The Guardian — another globally popular website but one that aims to be a bit more highbrow than the Mail. Has many guest authors
  • The Economist — though not as popular, like Vox it seeks to explain the news and not just report it. No author bylines, all articles conform to one style

‘New media’ (organisations set up since the internet became popular):

  • Huffington Post — like Vox, this is a ‘new media’ site and very popular. It too has a range of guest authors
  • BuzzFeed — journalists love to spoof its hyperbolic headlines, but it’s increasingly popular, particularly on Facebook, and its UK editor was interesting at the NIS
  • Reddit — a social site with a range of topics. I looked at its ‘Explain like I’m 5’ sub-reddit (thread) for ‘simplified and layman-accessible explanations’
  • Vox — US news site that features both news stories and more in-depth explanations through its ‘cards’

Not every site had a good summary or explainer, while some had more than 1. You can see the full list of articles here.

Using various analytic tools, including readability analysis programs, word counts, my own splitting, and the LIWC word analysis tool, I ran the articles through several analyses.

What I expected to find and why

Vox says it spends a lot of effort perfecting the headline. Good, for I found in previous research that a good headline — descriptive, inviting, optimised — is vital for getting readers to click.

However, nothing was mentioned of polishing Vox’s content. To be fair to the speaker, she wasn’t a writer, so she may not have had the information. Yet this meant my expectation was that the headlines would be polished but the content could ramble (and not be readable).

As for the other sources… The Guardian is a ‘high brow’ paper so would probably be the least readable of the major sources. The Economist is also high brow but it takes the view that authors should never assume too much prior knowledge of its readers. As a subscriber I listen to its audio edition and the language flows. Like the BBC, then, it is a media firm with a ‘spoken word service’ (so to speak), and this helps it focus on good readability.

The Daily Mail, however, is so popular that it must appeal to the lowest common denominator — easy reading. The Huffington Post was my main uncertainty — I don’t read it, and going by my social networks, no one else seems to (at least in the UK). But a quick look shows that it has a lot of authors and no set tone.

Finally there are two of the newest sites — Reddit and BuzzFeed. BuzzFeed is a joke to many journalists (sorry BuzzFeed staff reading this). But at the NIS, the site more (in)famous for headlines like “Can You Make It Through This Post Without Feeling Sexually Attracted to Food?” and “Emoji Facts That Will Make You 🙂” seemed to be getting the last laugh. Its UK news editor got respect, if grudging, from the more senior hacks there.

In part it’s because BuzzFeed is going beyond cat pictures to do more serious reports. Readers are coming for the memes but staying for the news.

Reddit is slightly different to all the others on this list. It’s a glorified messageboard — anyone can ask a question, anyone can answer. Other users can vote on questions and answers, and as it’s so popular it has a wide range of users, from experts to the average internet commenter. Thanks to the voting of the ‘best’ queries and answers I’ve often found good, clear explanations that go beyond the news article they’re linked to. In particular, the Explain Like I’m 5 sub-Reddit (thread) is dedicated to explaining complex issues (not just the news) and ideas in simple ways.


Data processing


Headline complexity

A good headline will give enough detail to describe, but leave enough out to make the reader want to find out more. Today’s readers are presented with so many headlines on a news site’s homepage, let alone their social and other sites, that it’s vital that headlines stand out. One way of doing that is making sure readers actually understand (or can make a good guess at) what the headline will link to.

I couldn’t measure whether something was clickbait (ie, content doesn’t match the title), and I find headlines are too short to run a readability analysis. Instead I used the complexity of words as a proxy, where a ‘complex word’ is any with 3 or more syllables. In other words, long words.

Though not perfect, it does give us an idea of how snappy a headline is. I didn’t look at length because in this day of search engine optimisation, and in my previous research, I didn’t find a good correlation between clicks and length.
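As a rough illustration of the ‘complex word’ measure (not the tool used for this analysis), the share of words with 3 or more syllables in a headline could be estimated as below. The syllable counter is an assumed vowel-group heuristic and the function names are hypothetical; a dictionary-based counter would be more accurate.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count groups of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent 'e' (as in "make") usually adds no syllable,
    # but "-le" and "-ee" endings (as in "table", "agree") usually do.
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def complex_word_share(headline: str, min_syllables: int = 3) -> float:
    """Share of words in the headline with at least `min_syllables` syllables."""
    words = re.findall(r"[A-Za-z']+", headline)  # numerals are ignored
    if not words:
        return 0.0
    complex_words = [w for w in words if count_syllables(w) >= min_syllables]
    return len(complex_words) / len(words)


if __name__ == "__main__":
    # Only "Ebola" is counted as complex here: 1 word in 7.
    print(complex_word_share("11 things you need to know about Ebola"))
```

Comparing the average of this share across each site’s headlines gives one number per site to rank.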

A good headline then should be long enough to capture the story and capture the reader — no more, no less.

Most complex is the Guardian, followed by BuzzFeed (well it does like words like ‘unbelievable’ and ‘amazing’). Vox, by contrast, has fairly snappy headlines (“11 things you need to know about Ebola”), as do the other new media sites, Reddit and the Huffington Post.

Headline categorisation

While it’s hard to gauge content, the LIWC can give some idea of what the headline is about based on word categories.

Vox says it’s there to explain the news and it does have a high proportion of insight words (“think”, “know”). The Guardian, by contrast, has more causation words (“because”). Now there’s a subtle difference between causation and insight. My view is that words classed as “insight” are more fact-based (“this is what happened”) whereas causation is more about opinion (“this is why this thing happened”). Both give you an overview, but causation suggests that it’s opinion-led.

This is a subtle distinction, but if it holds it suggests that the Guardian (and Huffington Post) are likely to have the more opinionated authors, those who (claim to) know the answer. By contrast, Vox, like the BBC, is more neutral, focusing on the facts.

For touchy feely types, Reddit is about the senses (“We’ve been hearing about ebola…”). Of course the main difference between Reddit and the others is that the question (or headline) is posed by one user and answered by others. This will result in varying questioning styles and answers.

Body copy

Let’s go from the headlines now into the meat of the content. So far Vox seems to be doing what it stated — explaining in a fairly neutral way what’s happening, with fairly polished headlines.


There are different ways to score how easy it is to read an article. These are based on looking at sentence length, complexity (number of syllables) and other factors.

Averaging the outputs I came up with a score, where, like golf, the higher the number the ‘worse’ it is.
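The article doesn’t name the readability formulas that were averaged, so the following is only a sketch under assumptions: it averages three widely published grade-level formulas (Flesch-Kincaid grade level, Gunning fog and the Automated Readability Index), using an assumed vowel-group syllable heuristic. All function names are hypothetical. As with the score described above, a higher number means harder reading.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count groups of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1  # drop a likely silent trailing 'e'
    return max(count, 1)


def readability_scores(text: str) -> dict:
    """Return three grade-level scores for the text (higher = harder)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    n_sentences = max(len(sentences), 1)
    n_words = max(len(words), 1)
    syllables = sum(count_syllables(w) for w in words)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    letters = sum(ch.isalpha() for w in words for ch in w)
    words_per_sentence = n_words / n_sentences
    return {
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * syllables / n_words - 15.59,
        "gunning_fog": 0.4 * (words_per_sentence + 100 * complex_words / n_words),
        "automated_readability_index": 4.71 * letters / n_words
        + 0.5 * words_per_sentence - 21.43,
    }


def combined_score(text: str) -> float:
    """Plain average of the individual scores."""
    scores = readability_scores(text)
    return sum(scores.values()) / len(scores)
```

Because all three formulas report an approximate school grade, averaging them is at least dimensionally sensible; mixing in a 0-100 scale such as Flesch reading ease would need rescaling first.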

As with headlines, the Guardian insists on being complex. Yet Vox isn’t that far off, being the next most complex, in line with my expectations based on those long sentences and non-plain words.

By contrast the BBC is a lot less complex. I did include one article aimed at children on CBBC, but this had a similar readability score to the main BBC News article. The Daily Mail also keeps its writing less complex. Like the BBC it has a broad readership and as such can’t afford to be too complex.

Let’s dig a bit deeper and look at other reasons why the Guardian and others are so complex.

Sentence length distribution

I looked at sentence length partly because of this quote on the GOV.UK blog:

Writing guru Ann Wylie describes research showing that when average sentence length is 14 words, readers understand more than 90% of what they’re reading. At 43 words, comprehension drops to less than 10%.

Cumulative here just means that I keep adding the total in one category to the next. So BuzzFeed has 24% of its sentences in ‘9 and fewer words’, and 51% (24% + 27% for ‘10-14 words’) at 14 words or fewer. The Guardian by contrast (yet again) only has 12% of its sentences as short as 9 words.

Looking at the curves you can see that BuzzFeed has short, punchy sentences and so its curve is steep and peaks early. The Guardian, with long, wordy sentences, gently curves out as it rambles on. Vox is between the two. That can be a good middle path. Short sentences aren’t always best. They can be distracting.

This method isn’t perfect but with enough data it does give a good indicator — BuzzFeed’s sentences are likely to be understood by more people than the Guardian’s. And Vox’s.
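The cumulative sentence-length distribution described above can be sketched as follows. Only the first two bands (‘9 and fewer’ and ‘10-14’) are taken from the article; the remaining band boundaries, the naive sentence splitter and the function names are assumptions for illustration.

```python
import re
from itertools import accumulate

# Upper bound (in words) of each band. The first two match the bands quoted
# in the text; the rest are assumed. Longer sentences go in a final band.
BAND_UPPER_BOUNDS = [9, 14, 19, 24, 29]


def sentence_lengths(text: str) -> list:
    """Word count of each sentence, splitting naively on ., ! and ?."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def band_shares(lengths, bounds=BAND_UPPER_BOUNDS) -> list:
    """Fraction of sentences falling in each length band."""
    counts = [0] * (len(bounds) + 1)
    for n in lengths:
        for i, upper in enumerate(bounds):
            if n <= upper:
                counts[i] += 1
                break
        else:
            counts[-1] += 1  # longer than the last bound
    total = len(lengths)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def cumulative_shares(shares) -> list:
    """Running total of the band shares, ending at 1.0 for non-empty input."""
    return list(accumulate(shares))
```

Plotting `cumulative_shares(band_shares(sentence_lengths(article_text)))` for each site gives one curve per site; a curve that climbs quickly means most sentences are short.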

Long sentence split

There’s another way of looking at sentence length — what’s the overall split between complex and short sentences?

BuzzFeed really stands out for its snappiness, while a third of the Guardian’s sentences are classed as long. Ouch.

Yet despite having a good readability score, the Daily Mail has sentence length proportions approaching the Guardian’s. We need to find out more.

Adverb use

I believe the road to hell is paved with adverbs, and I will shout it from the rooftops.

Stephen King is just one of many authors and style-guide setters who rail against the adverb, seeing it as a sign of poor writing. Adverbs modify verbs, as in “he quickly walked”. A good writer would generally (and this is a generalisation, as there is debate) use a single word rather than add an adverb. For example, rather than “quickly walked”, they’d use “darted”, “dashed” and so on (as long as the single word is still plain English).

As such I use adverb count as a rough measure of how good the writing is. It can also be seen as how good the sub-editing process is (if any, sad to say), balanced against the need to let an author’s voice be heard.
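The adverb counts in this analysis came from a word analysis tool. As a much cruder stand-in, a suffix heuristic can approximate the same idea: treat words ending in “-ly” as adverbs, minus a small exclusion list. The exclusion list and function name below are assumptions, and the heuristic misses adverbs such as “very” or “often”.

```python
import re

# Common "-ly" words that are not adverbs. Illustrative and far from complete.
NOT_ADVERBS = {
    "family", "reply", "supply", "apply", "july", "italy", "ugly", "holy",
    "belly", "bully", "jelly", "rally", "ally", "lily", "silly",
    "friendly", "lovely", "lonely",
}


def adverb_rate(text: str) -> float:
    """Share of words that look like '-ly' adverbs."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    adverbs = [
        w for w in words
        if w.endswith("ly") and len(w) > 3 and w not in NOT_ADVERBS
    ]
    return len(adverbs) / len(words)


if __name__ == "__main__":
    print(adverb_rate("He quickly walked home"))  # "quickly" is 1 word in 4
```

A part-of-speech tagger would be the more reliable route; the suffix rule is only good enough for comparing broad tendencies between sources.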

Reddit has the highest use of adverbs. Not surprising — users aren’t professional writers nor do they have a sub-editor. I’d be surprised if the authors themselves even spent time editing their work. And that’s to be expected, as Reddit is ultimately a messageboard, not a professional publication.

I was surprised at the number of adverbs in the Huffington Post and the Guardian. Having had the chance to ask a former Guardian sub I was told that the paper, while keen to maintain its style, doesn’t want to mask the author’s voice. With many authors not professional writers (and, being news, they have a short time to compose their material) it’s no wonder that adverbs are allowed.

The BBC, by contrast, is in no rush to break news, nor does it have many guest columnists; instead it has professional journalists write most of its content. The Economist is a weekly newspaper so has that increasingly rare luxury of time — time to let writers review and subs sub. It also aims to have a single, consistent style and voice.

This doesn’t explain the Daily Mail, which sits there, in the middle. But of the 3 articles analysed, 2 were by the Associated Press, which tends to go for a neutral style (unlike the Mail).

Subjects and pronoun use

Finally let’s look at who the authors address and how much of this is personal experience.

Now I’ve not accounted for quotations in this, which by their nature are personal experiences and need attributing (he says).

As before, Reddit as the more social of the news sources leads the way with the personal “I”. And with the question being set by another user, it’s natural to respond to them with “you”. I was surprised that the Huffington Post had a similar proportion, but I wasn’t surprised that the traditional news sources lack the first person.
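A minimal sketch of the pronoun comparison, assuming small hand-picked word lists rather than the dictionary categories of the analysis tool used here. The group names and function name are hypothetical. Like the analysis above, it makes no attempt to exclude quotations.

```python
import re
from collections import Counter

# Illustrative word lists only; a real category dictionary would be larger.
PRONOUN_GROUPS = {
    "first_person": {"i", "me", "my", "mine", "myself"},
    "second_person": {"you", "your", "yours", "yourself"},
}


def pronoun_rates(text: str) -> dict:
    """Share of all words that fall in each pronoun group."""
    # Splitting on non-letters breaks contractions ("I've" -> "i", "ve"),
    # so the leading pronoun is still counted.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    total = len(words)
    return {
        group: (sum(counts[w] for w in members) / total if total else 0.0)
        for group, members in PRONOUN_GROUPS.items()
    }


if __name__ == "__main__":
    # "I" and "my" are first person, "you" is second person, out of 7 words.
    print(pronoun_rates("I think you should read my notes."))
```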

What can this tell us? GOV.UK tells its writers to address its subjects as “you”, though I couldn’t find the research to say why this is best. As a writer it does feel more personal using “you” but I can’t say why it’s better for the reader. My research on this at Which?, where I had Google Analytics and Omniture data, didn’t lead to any conclusions about user behaviour and the best form of addressing readers.

Instead it’s more of interest to see how the split goes between different organisations, and the divide between the old and new media.

Passive voice

Style guides warn against the use of passive voice and encourage the active voice (ie, “Freddie Starr ate my hamster”, not “A hamster was eaten by Freddie Starr”).

The BBC is the bastion of impartial and neutral news, and so is the most passive (“it was claimed”). A noble idea, but not always as readable. Vox at the other end is the most direct (“Russia denies it is invading”) along with the Mail and Guardian. BuzzFeed doesn’t do as well here (“Authorities in these nations have scrambled to contain the disease”), but its short sentences seem to carry its overall readability.


Looking at how easy it is to understand a headline, the new media (Vox, Reddit and Huffington Post) win the day. Their headlines were the most polished and appealing to readers, and state clearly that they’ll explain the news.

The new media sites, with the exception of BuzzFeed (“11 Things You Need To Know About The Ebola Epidemic That’s Killing Thousands”), had less complex headlines. Not to say that this meant short headlines — search words have to be crammed in — but shorter words were generally used.

The best overall readability was for the BBC, but in terms of sentence lengths BuzzFeed kept it short and punchy throughout. The Guardian, however, had long headlines and long sentences, hurting its chances of being understood by a wide demographic.

Other observations

Several sites had topic pages, eg the Huffington Post’s MH17 topic, while few had summaries like Vox’s cards or the BBC’s explainers. Topic pages collect all the pages related to a news story in one place. Yet when I tried to use them to find the ‘best’ page or a summary it was a barren search. They seemed more like a technical solution (grouping similar content) to a technical problem than an editorial answer. I preferred the Vox style of an editorial collection summing up the situation.

I ignored images, which the Daily Mail and BuzzFeed have a large number of. I don’t know how this may affect readability. When it comes to online content I’m with Alice (of Wonderland fame), who tired of writing that lacks pictures. I don’t know what effect this has on readership, though I know that images benefit search engine optimisation.

Finally, I didn’t look at overall word length as this would be unfair on Vox. Though this is a good indicator of readability, the way Vox arranged its content meant its multiple pages would count as one according to the analysis programs.


Breaking news

Does all this matter? News sources cater to different audiences so if the Guardian wants a reader base that has to put in a bit of effort to understand what it’s trying to say, then that’s the Guardian’s choice. Me, I prefer to keep things plain.

I also wonder whether poor readability hurts the Guardian’s influence — if readers aren’t clear what’s being said then how can the paper have great influence? How many people enjoy struggling through an article? If there’s a good point to be made, let alone a tricky question to answer, why make it hard to understand?

I have no beef with Vox. It’s interesting what they’re doing and I single them out because they presented a statement to a room of journalists and it’s a journalist’s job to challenge. But compared with newspapers that have already been explaining the news for years, such as the Economist, it has much to learn. It wasn’t surprising then to hear that Vox was set up by bloggers. Blogging is a different beast to journalism, though as shown by Vox’s rapid rise, it has benefits for grabbing online readers.

So in answer to the question in the headline — is Vox top? The answer follows Betteridge’s law — no. Vox has good headlines but its content is so dense that it is unlikely to attract the broad demographic it apparently aims for.

Instead I see BuzzFeed continuing its success due to its easy-to-read sentences (which make it readable by the widest audience). Yet in contrast to its copy, BuzzFeed’s headlines were long, though at least they described the article.

Yet a quick revisit to Vox showed a different story. While headlines to the explanatory cards in Vox were well written, the news headlines caused a bit of a headache when I looked at them. “Europe’s leaders have succeeded in making Greece unimportant” had to be read a couple of times to get the meaning. I wasn’t even sure what I’d get when I clicked on that headline.

Is there a best site, as asked at the start of this article? Horses for courses, but to avoid weaselling out, I’d say that the BBC seems to strike the best balance, while at the more sensational end BuzzFeed is best. Reddit can be good, but I’d prefer to monitor its news summing up before giving a better answer.

Next time

If I did this again I’d also want to look at:

  • passive voice proportions through a new tool — I don’t like the passive voice analysis in here so would want a second opinion
  • verb phrases per sentence, apparently a better predictor of readability — this would mean building a new analysis tool
  • more data — bigger is better, but I couldn’t/didn’t scrape this time as it would have taken longer than doing it manually


I don’t have the traffic data for any of the sites I analysed. Reddit is probably the closest as it gives a score. Of course if anyone working at those sites wants to send me any data I’d gratefully receive it…

Even with this lack of data, I’d still expect:

  • BBC — slower off the mark with news stories as it spends longer polishing them, so it’s worst for breaking news, but it’s the easiest news source to comprehend. Will continue to be a go-to news site of choice, but its CBBC news for children needs to be simplified. If traffic is good its ‘explainers’ may become more popular
  • The Daily Mail — with only one article written by the Mail it’s hard to give a unique distinction for it, but those selected were easy enough to read. Will remain a global news source
  • The Guardian — plodding headlines and plodding pieces mean that if articles are read, I’m not sure how much will truly be retained and understood. I wonder how many readers skip straight to the comments. While those who understand it seem to love it, the high reading level it demands means its demographic will be much narrower than most of the other news sources in this study
  • The Economist — in many ways what Vox is aiming for, each article assumes no prior knowledge and it’ll remain my go-to newspaper for news summaries. If only its headlines were a little more descriptive and its sentences a bit more active, it may become more popular than it is
  • Huffington Post — it’ll continue to be stuck in the middle ground, neither new nor old media. It’s both too impersonal and too distant, so occupies this niche. A niche that’s not enticing to me
  • BuzzFeed — I expect a reasonable click-through rate for its headlines, and as its articles are easy to read, users are likely to share them and to read more of them. Expect the news site to grow in popularity. I’m guessing its complex headlines serve its purposes, and I’d be interested to see what testing they’ve done on them
  • Reddit — it has users who address other users, so don’t expect a polished response (or any response), but it can give an easy-to-digest understanding of the situation (if the article exists). Surprising number of experts on there, from Arnold Schwarzenegger to research scientists. Its future depends on its readers (which reminds me of something)
  • Vox — it will draw readers in with a good click-through for headlines but will have a high bounce (‘exit’) rate and low click-through for the next page due to its hard-to-read format
  • topic pages on news sites — unless top/relevant/best posts are pinned to the top these will mainly serve as useful pages for the authors but be too garbled for the average reader to use

*I blame GOV.UK for being able to spot a complex sentence and counting it in a Rain Man-esque manner