Categories
Research Scientific Research

“Success with Style” part 4 — modern data and just a chapter

When starting this analysis I spotted that the download data was for the past 30 days and that this was used for success or fail categorisation. 

Even if the data was for the lifetime of the book, it’s been nearly 5 years since the original downloads. The best way to test this then was to get the latest data (albeit still for the past 30 days).

The other thought was that the analyses looked at the entire book. But what if readers did not read the entire book, only a certain amount, before making a judgment? When submitting work to an agent or publisher for consideration, for example, often only the first chapter is requested. Based on this I analysed just the first 3,000 words of each book through the Penn and LIWC taggers and, using the 2013 success/fail classifications, repeated the experiments.
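The 3,000-word cut itself is simple; a minimal sketch (my own illustration, not the original extraction code) using a plain whitespace split might look like this:

```python
import re

def first_n_words(text, n=3000):
    """Return roughly the first n whitespace-separated words of a text."""
    words = re.findall(r"\S+", text)
    return " ".join(words[:n])
```

The truncated text can then be fed to the taggers exactly as the full text was.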

Finally I noticed a bias towards punctuation as markers for success or failure in the output and ran the experiments without the punctuation tags to see what the result would be.
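Re-running without punctuation amounts to filtering the tag counts before analysis. A sketch of that filtering step, with a hypothetical punctuation tag set (the actual tags depend on the tagger's output):

```python
# Hypothetical punctuation tag set; adjust to match the tagger's actual output.
PUNCT_TAGS = {",", ".", ":", "``", "''", "(", ")", "#", "$"}

def drop_punct(tag_counts):
    """Remove punctuation tags from a dict of tag -> count before analysis."""
    return {tag: n for tag, n in tag_counts.items() if tag not in PUNCT_TAGS}
```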

Starting hypotheses

H0: There's no difference in the tests which produce significant results between the 2014 and 2018 data
HA: There is a difference in the tests which produce significant results between the 2014 and 2018 data

H0: There's no difference in the tests which produce significant results between the full machine analysis of the book and that of just the first 3,000 words
HB: There is a difference in the tests which produce significant results between the full machine analysis of the book and that of just the first 3,000 words

The hypotheses are fairly simple – if there is no difference in the 2018 data then most of the tests that proved significant with the 2013 data should also do so in 2018.

Likewise, if the first 3,000 words are unimportant, those test results should only be significant at the same level.

3,000 words (3k words) is about 10 pages and is about one chapter’s length although of course there is no hard and fast rule about how long a chapter is.

Data used

Data summary

2018 data download date    2018-07-22
2013 data download date    2013-10-23
Unique books used          759

Difference in 2013 and 2018 success rates

Status / Genre        Count
FAILURE               22
  Adventure           5
  Detective/mystery   3
  Fiction             2
  Historical-fiction  1
  Love-story          1
  Poetry              8
  Short-stories       2
SUCCESS               20
  Adventure           3
  Detective/mystery   4
  Fiction             1
  Historical-fiction  4
  Love-story          3
  Sci-fi              5
Grand Total           42

There were 758 unique books (the remaining 42 of the 800 listed appeared in multiple categories). Separately, 42 books (5.5% of the total) changed success status between 2013 and 2018, and none of those was listed in multiple categories.

The new data was run through both the Perl Lingua tagger (which uses the Penn treebank tag set) plus the Perl readability measure, and the LIWC tagger.

Results for 2013, 2018 and 3,000 word data

Machine learning performance

The most important measure for me is which is the best for making predictions. 

Using all tags including punctuation

Tagger            Accuracy   95% Confidence Interval   Sensitivity   Specificity
Readability 2013  65.62%     57.7-72.9%                69%           63%
Readability 2018  65.00%     57.5-72.8%                68%           63%
Readability 3k    55.62%     47.6-63.5%                68%           44%
LIWC 2013         75.00%     67.6-81.5%                76%           74%
LIWC 2018         71.70%     64.0-78.6%                78%           66%
LIWC 3k           56.25%     48.2-64.0%                53%           60%

According to this, LIWC is still the best tagger, and the 2013 and 2018 data are fairly similar for both readability and LIWC, with each result falling within the other's 95% confidence interval.

Both for readability and LIWC the first 3,000 words (3k) are much worse predictors of overall success and barely better than a 50/50 guess.
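Those confidence intervals can be sanity-checked with a normal-approximation binomial interval. Assuming a test set of roughly 160 books (my inference from the percentages, e.g. 75.00% = 120/160; not stated in the original), a sketch:

```python
import math

def accuracy_ci(p, n, z=1.96):
    """Normal-approximation 95% confidence interval for accuracy p on n test cases."""
    half = z * math.sqrt(p * (1 - p) / n)
    return (p - half, p + half)

# LIWC 2013: accuracy 0.75 on an assumed 160 test books
lo, hi = accuracy_ci(0.75, 160)
```

This gives roughly 68%-82%, close to the reported 67.6%-81.5%; the small gap suggests the original intervals used an exact binomial method rather than the normal approximation.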

Difference in significance in key measures

Punctuation

Overall, omitting punctuation made little difference to which LIWC or Penn tests were significant, although machine learning accuracy dropped by around 5 percentage points across the board.

Readability 

Genre               Significant 2013   Significant 2018   Significant 3k words
Adventure           TRUE               TRUE               TRUE
Detective/mystery   TRUE               TRUE               TRUE
Fiction             FALSE              FALSE              FALSE
Historical-fiction  FALSE              FALSE              FALSE
Love-story          TRUE               TRUE               TRUE
Poetry              FALSE              FALSE              FALSE
Sci-fi              FALSE              FALSE              FALSE
Short-stories       FALSE              FALSE              FALSE

The same genres were significant across all three data sets.

LIWC categories

Test                      Genre               Significant 2013   Significant 2018   Significant 3k words
Clout                     Adventure           TRUE               FALSE              TRUE
                          Detective-mystery   TRUE               TRUE               FALSE
                          Fiction             TRUE               TRUE               FALSE
                          Historical-fiction  FALSE              FALSE              FALSE
                          Love-story          FALSE              FALSE              FALSE
                          Poetry              FALSE              FALSE              FALSE
                          Sci-fi              FALSE              FALSE              FALSE
                          Short-stories       FALSE              FALSE              FALSE
Authenticity              Adventure           FALSE              FALSE              FALSE
                          Detective-mystery   FALSE              FALSE              FALSE
                          Fiction             TRUE               TRUE               FALSE
                          Historical-fiction  FALSE              FALSE              TRUE
                          Love-story          FALSE              FALSE              FALSE
                          Poetry              TRUE               TRUE               FALSE
                          Sci-fi              FALSE              FALSE              FALSE
                          Short-stories       FALSE              FALSE              FALSE
Analytical                Adventure           FALSE              FALSE              FALSE
                          Detective-mystery   FALSE              FALSE              FALSE
                          Fiction             TRUE               TRUE               TRUE
                          Historical-fiction  FALSE              FALSE              FALSE
                          Love-story          FALSE              FALSE              TRUE
                          Poetry              FALSE              FALSE              FALSE
                          Sci-fi              FALSE              FALSE              FALSE
                          Short-stories       FALSE              FALSE              FALSE
6 letter words            Adventure           TRUE               TRUE               TRUE
                          Detective-mystery   FALSE              FALSE              FALSE
                          Fiction             FALSE              FALSE              FALSE
                          Historical-fiction  FALSE              FALSE              FALSE
                          Love-story          TRUE               TRUE               TRUE
                          Poetry              FALSE              FALSE              FALSE
                          Sci-fi              FALSE              FALSE              FALSE
                          Short-stories       FALSE              FALSE              FALSE
Dictionary words          Adventure           FALSE              FALSE              FALSE
                          Detective-mystery   FALSE              TRUE               TRUE
                          Fiction             TRUE               TRUE               FALSE
                          Historical-fiction  FALSE              FALSE              TRUE
                          Love-story          FALSE              FALSE              TRUE
                          Poetry              FALSE              FALSE              FALSE
                          Sci-fi              TRUE               TRUE               TRUE
                          Short-stories       FALSE              FALSE              FALSE
Tone                      Adventure           FALSE              FALSE              FALSE
                          Detective-mystery   TRUE               TRUE               TRUE
                          Fiction             TRUE               TRUE               TRUE
                          Historical-fiction  FALSE              FALSE              FALSE
                          Love-story          TRUE               TRUE               FALSE
                          Poetry              TRUE               TRUE               TRUE
                          Sci-fi              FALSE              FALSE              FALSE
                          Short-stories       TRUE               TRUE               TRUE
Mean words per sentence   Adventure           TRUE               TRUE               TRUE
                          Detective-mystery   FALSE              FALSE              FALSE
                          Fiction             TRUE               TRUE               FALSE
                          Historical-fiction  FALSE              FALSE              FALSE
                          Love-story          FALSE              FALSE              FALSE
                          Poetry              FALSE              FALSE              FALSE
                          Sci-fi              FALSE              FALSE              FALSE
                          Short-stories       FALSE              FALSE              TRUE

Whereas readability was consistent across the different approaches, the LIWC categories show a lot more variety.

Tone is the most consistent test throughout and, as last time, had the most significant genres, even with only 3,000 words. As before, the 2013 and 2018 data tend to match (but not always, as with Clout or Dictionary words), while the 3,000-word results do their own thing.

Parts of speech tags (PoS) with the largest difference

The tables list the top 3 PoS that dominate in successful and unsuccessful books.

Penn data

Successful PoS 2013                  Successful PoS 2018                  Successful PoS 3k
INN – Preposition / Conjunction      INN – Preposition / Conjunction      INN – Preposition / Conjunction
DET – Determiner                     DET – Determiner                     DET – Determiner
NNS – Noun, plural                   NNS – Noun, plural                   NNS – Noun, plural

Unsuccessful PoS 2013                Unsuccessful PoS 2018                Unsuccessful PoS 3k
PRP – Determiner, possessive second  PRP – Determiner, possessive second  RB – Adverb
RB – Adverb                          VB – Verb, infinitive                PRP – Determiner, possessive second
VB – Verb, infinitive                RB – Adverb                          VB – Verb, infinitive

LIWC data

Successful PoS 2013                Successful PoS 2018            Successful PoS 3k
functional – Total function words  functional – Function words    functional – Total function words
prep – Prepositions                prep – Prepositions            prep – Prepositions
article – Articles                 space – Space                  article – Articles

Unsuccessful PoS 2013              Unsuccessful PoS 2018          Unsuccessful PoS 3k
quote – Quotation marks            allpunc – All punctuation      adj – Common adjectives
allpunc – All punctuation          affect – Affective processes   adverb – Common adverbs
affect – Affective processes       posemo – Positive emotion      affect – Affective processes

In the Penn treebank results the same tags dominate successful books across all three data sets – prepositions (for, of, although, that), determiners (this, each, some) and plural nouns (women, books).

For unsuccessful books, determiners also dominate, but in the possessive second person (mine, yours), along with adverbs (often, not, very, here) and infinitive verbs (take, live).

For LIWC it is quite similar. Function words (it, to, no, very) dominate successful books, as do prepositions (to, with, above) and articles (a, an, the).

For unsuccessful books it is all punctuation, quotation marks, social words (mate, talk, they, which include all family references) and affective processes (happy, cried), which cover all emotional terms.

A high rate of quotation marks suggests a high ratio of dialogue to action and description.

What does this tell us?

2013 v 2018 data

Overall there is more similarity than difference in the 2013 and 2018 Penn and readability results. The machine learning performance was also broadly the same, with each other’s overall performance falling within the 95% confidence interval.  

The most successful PoS were also largely the same, as were the top 3 unsuccessful ones.

Likewise the LIWC categories generally matched in significance for both 2013 and 2018 data. The Successful PoS were broadly the same, as were the unsuccessful ones.

This suggests that while the original authors didn’t mention that the data was only from the previous 30 days, their results have largely held true.

The first chapter

Just judging a book by its first 3,000 words was not as accurate as analysing the whole book. The machine learning performance was barely better than a guess. 

However, the readability results did match, and the dominant successful PoS were similar to those of the full data in the 2013 and 2018 studies.

Of all the LIWC categories described in part 3, Tone was both the most significant predictor across genres and the most consistent across the different tests.

Summary

The 2018 results generally match the 2013 results, suggesting the original method holds as a good predictor of these books’ success or failure.

The first-3,000-words results did not match the 2013 or 2018 data, and as their machine learning performance was the weakest, this suggests the opening alone is not an accurate way to predict a book’s success. There may be a ‘sweet spot’ where the first x words correlate closely with the overall rating, but it is more than 3,000 words.

Successful books tend to use prepositions, determiners, plural nouns and function words. Unsuccessful ones skew towards quotation marks, punctuation and positive emotions (which in the LIWC are similar to affective processes).

This suggests that unsuccessful books may use shorter sentences (a high punctuation rate), more dialogue (a high quotation mark rate) and more adverbs, and be more emotional, particularly positively emotional. Writers are frequently told by writing experts to avoid adverbs wherever possible.

Successful books by contrast tend to focus on the action – describing scenes and situations, hence the dominance of functional words, prepositions and articles. This makes them sound rather boring, but suggests that these bread and butter words are necessary to build a good story.

The LIWC data suggests that tone is the most reliable predictor of success. What isn’t answered is whether that is because it predominates in successful or unsuccessful books, or whether positive or negative emotions drive it. This is something to explore, though emotion and affect appearing in the top 3 for unsuccessful books suggests the answer lies there.

Punctuation tags had some use, and machine learning performance was better with them. So even though punctuation tags can be hard to interpret, they are worth including in any machine analysis, although more work is needed to interpret them.

Categories
News Scientific Research

Scrivener: the best tool for organising user research

User research involves a lot of, well, research: a lot of notes, documents, videos, pictures, post-its and more. And they all need organising.

There’s no one solution for the problem of what to do with all this, but after a bit of experimentation I find that using Scrivener has been the best for me for keeping things organised.

Scrivener is often seen as a writing tool, but it’s more than a word processor. Yes, it is a writing tool – from word processing to screenplays – but it is also an organiser. Most importantly, it’s very simple to use, and has more advanced features for those who want them.

Scrivener being used for user research
Scrivener lets you display folders and multiple documents at once

Renaming research in Scrivener

I’ve been using Scrivener for years, and coming to user research from an anthropological and journalistic background I focus on research that’s written up – observations, interviews, transcripts. But I also add photos, plan card sorts, organise thoughts with the card index display, and add spreadsheets, PDFs and presentations. Even if I don’t read the presentations directly in there, being able to search all relevant work in one place helps.

In Scrivener I like how easy it is to organise and rename documents, or duplicate them. Compared with doing this in Finder or Explorer, it is much less of a faff. Likewise documents open immediately rather than take a few seconds in Word or Google Drive (and often aren’t the one I want anyway).

While I still use Google Drive and Dropbox to organise files, particularly video, so much of the research is pure words – transcripts, proposals, documents, insights – that I find Scrivener is the best way to keep it all together.

Tables

I love tables. I like maths, I like spreadsheets. Really.

I like to organise interview questions in tables and use a Dewey-esque numbering system to help reorganise them. So question 101 is the first, but perhaps it needs to come later, so I reorganise it as 103 and sort.
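The renumber-and-sort trick is easy to sketch; here is a tiny illustration with made-up questions (hypothetical sort codes and text, not my real interview guide):

```python
# Hypothetical interview questions keyed by sort code.
questions = {
    101: "How do you currently manage your notes?",
    102: "What tools have you tried before?",
    103: "What would make the process easier?",
}

# Move question 101 later by renumbering it, then read the questions back in code order.
questions[104] = questions.pop(101)
ordered = [questions[code] for code in sorted(questions)]
```

Sorting by code rebuilds the running order, which is exactly what a table sort does in Scrivener or a spreadsheet.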

Likewise when reviewing a transcript I like to have each question in its own cell with thoughts and insights in the cell next to it.

Scrivener could be friendlier with tables – don’t create one at the end of a page or you’ll never get out, and I always have to customise it. But once I created a good, blank table I could copy and paste that.

Sort code  Quote                                                                                                                                Observation
101        I’m not really sure that it’s appropriate                                                                                            User not keen on this
102        Do I really have to give you a dummy quote?                                                                                          Prefers to be in control of speech
250        At this time, a friend shall lose his friend’s hammer and the young shall not know where lieth the things possessed by their fathers Likes Brian?

Good things about using Scrivener for user research

What’s great:

  • Easy to move documents around and organise into folders and rename them
  • Split view makes reviewing transcripts and images easy
  • Colour and icon coding makes it easy to find key files
  • Compiling documents means you can make the output consistent, or just select the ones you need to put into a single PDF or Word report, or output as multiple documents, so you don’t have to worry about formatting until the end
  • Coding for things such as image captions means that you don’t have problems with Word getting confused about auto-numbers
  • Text file syncing – if out in the field you can create text notes and sync them automatically into the project 
  • Great search tool for searching titles or entire files
  • Corkboard views to organise thoughts, observations, insights etc
  • Good way to have a list of priorities and hierarchies
  • Importing documents works pretty well: just drag and drop the Word docs to where you want them and it’ll convert them into a continuous page rather than a multi-page report

What’s not so great:

  • No dictation tool
  • Not always the best way to view documents and tables
  • No Android version, although there is one for iOS; then again, it’s rare that you need the entire project on your phone
  • Adding web links: the field pre-fills the https:// part, but URLs copied from Chrome already include it, so you get ‘broken’ https://https:// links if you forget to remove one
  • Can be fiddly with bullets

User research tools to support Scrivener

OneNote, which isn’t free, is good for:

  • Transcripts – jump to the audio where your notes are as it tracks your writing with recording (although only 15min recording on Android for some unknown reason). It can convert speech to text, though I find that’s a bit less reliable.
  • Optical character recognition – it’s not 100% accurate but it’s good enough for recognising text in images, and the results show up in search
  • Syncs across devices

I also use Trello to track research questions, answers and insights.

Overall Scrivener with its files synced through the cloud (Dropbox, OneDrive etc) has been great for keeping track of research. Scrivener isn’t free, but I feel I got my $45 worth of use long ago, and it’s less than what Microsoft charges for Office 365 (which includes OneNote).

Scrivener hasn’t sponsored or otherwise provided incentives for me to write this (nor has Microsoft, though I’d feel weird if they did), I just want to spread the word for a useful tool.

Categories
Scientific Research Writing

Scraping, screenplays and sexism

In the past couple of days there have been two big data posts that analyse sex and screenplays.

Polygraph’s Hannah Anderson and Matt Daniels scraped and analysed 2,000 screenplays and their dialogue to get data on the division of dialogue according to sex, age and other factors.

The Economist looked at data from USC Annenberg on nudity and ‘sexualised attire’ (aka revealing outfits and the like) in film, along with lead and speaking roles by sex.


Getting screenplay data

Both reports focused on presenting the data and key thoughts rather than delving too deep into interpretation. Analysing Hollywood is a complex business – like William Goldman said “nobody knows anything” when it comes to predicting success, let alone Hollywood and sexism.

The main thing of interest for me is the methods of analysing screenplays. Matt has a long and detailed method with links to script sources, along with the code on Github and a list of where he got the data from.

Potential uses

Both studies used data to explore issues around gender and films, but there is further potential with the data. For example:

  • emotion and sentiment – not a fan due to the drawbacks but possible to trace emotion in scripts, looking at such things as whether beginning, middle or ends are more or less emotional and is there a pattern
  • the split of action and dialogue in a script – do successful scripts have a divide (aka an avoidance of walls of text)
  • are women more confident or not – an extension of their sexism report, but it could be a question of whether female characters tend to ask more questions (or use more emotional language)
  • writing level – what is the typical readability for the dialogue of heroes and villains, along with scripts in general and how does this vary by genre (would The Imitation Game or A Beautiful Mind be more difficult to read, let alone film, than Die Hard?)
  • is good writing important in a successful script – as with the study of readability, does having too many adverbs and other things that Hemingway hates hinder scripts
  • statistical significance – as Matt acknowledges, there are no statistical tests in their report, what tests could be done

Why we need this data

Maybe nothing will come of it, but there is no harm in trying. While I never expect any rules to emerge (Goldman is already laughing), perhaps some very broad principles could come from the data. Even a finding of nothing can be something to report. The only pity is that, due to the grey areas of scraping, we’d have to start from scratch rather than use the script data the teams have already gathered.

But it will be worth it and we can get away from what the Polygraph article calls “all rhetoric and no data, which gets us nowhere in terms of having an informed discussion.”

In the meantime if you want to search the data you can either check out the links or use the Polygraph tool here.

Categories
Scientific Research

Season finales: which shows went out in style?

Season finales, the last show in a series, the end of an era… when a TV programme comes to an end (or season, depending on where you are) there’s a high expectation the writers will make it a classic.

This isn’t always the case. The Sopranos became notorious for its unclear ending over whether the main character, Tony Soprano, died or not. On the other hand, Breaking Bad‘s ending, which resolved the fate of Walter White, Jesse Pinkman and the others, won rave reviews.

Best season endings

The reason I used those two examples is that both The Sopranos and Breaking Bad were generally and consistently well-reviewed, so the endings had a high expectation of being of equally good (and ideally better) quality. Yet how do they compare to other series?

Two Reddit users, PhJulien and ChallengeResponse, have done something clever I wish I’d thought of – getting the data from IMDB and comparing finales with average ratings. IMDB not only lists every episode but also collects user ratings. More importantly, it lets you get at its data.

Here’s what they found.

Finales that topped the series

Series finales that topped or bombed (click image for full size) – via /u/ChallengeResponse/Imgur

What ChallengeResponse did was write a Python program to get the data and then make a chart ranking the difference between the average rating and the finale rating. He ranked by the biggest difference, so Glee, the school where the singing never stops, which got around a 6.8 average, had a finale rated 9.2. I’m reading the charts for these numbers so may be off, but that’s a difference of 2.4 rating points.

At the other end, Dexter, the show about the serial-killer killer, caused a stink with viewers, dropping from its average of 8.9 out of 10 to 4.8 in the finale, a drop of 4.1 rating points.
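The ranking step is a one-liner once you have the ratings. A sketch using the figures quoted above (read off the charts, so approximate):

```python
# (series average, finale rating) as read off the charts in the post.
shows = {
    "Glee": (6.8, 9.2),
    "Breaking Bad": (9.0, 9.9),
    "Dexter": (8.9, 4.8),
}

# Rank by finale rating minus series average, biggest rise first.
ranked = sorted(shows, key=lambda s: shows[s][1] - shows[s][0], reverse=True)
```

Glee tops the list with a +2.4 jump and Dexter props it up with a 4.1-point drop, matching the chart.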

Another, earlier, way to look at this is through PhJulien’s chart, which scatters average rating to finale.

Series average ratings plotted against finale rating – via /u/PhJulien/Imgur

Looking at it this way, Breaking Bad, which had an extremely good average of 9.0 for its series as a whole, went out with a 9.9. So a good show went out almost perfectly, according to the public rating the show on IMDB.

Viewed this way, the majority of shows go out a little better than average (which is what viewers want).

Would this work with British TV?

No British show is in PhJulien’s chart, and only one in ChallengeResponse’s data – The Office (its US version is in PhJulien’s).

Could I repeat this? Yes, but the difference is that US shows offer a much bigger sample size — the US version of The Office ran to 201 episodes, the UK version to just 12 and 3 specials.

When you’re basing data on such small samples it gets a bit trickier, not least because the finale’s rating is included in the series’ overall rating. That’s not a problem when the final episode is 1 out of 201, or 0.5% of all episodes and ratings, while the finale of the UK version accounts for 7% of all ratings.
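Those weights follow directly from the episode counts, and are easy to verify:

```python
us_office_share = 1 / 201        # US Office: finale is 1 of 201 episodes, about 0.5%
uk_office_share = 1 / (12 + 3)   # UK Office: 12 episodes plus 3 specials, about 7%
```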

So could I try this? Yes, but I think the findings would be too shaky. Still, it’s a great idea and one that could be used in other data reviews.

Do it yourself

You can get all ChallengeResponse’s charts and more (ranked by finale, season average and alphabetically) at Imgur.

He also includes links for doing it yourself using IMDbPY, and shows how he visualised the data in IPython using matplotlib.

You can get the source code for the IPython notebook on GitHub.

Categories
Scientific Research

Explaining the news: is Vox top?

There are thousands of news sites out there. But what if there was a way to find out which site is best for giving you a good overview of the news?

I’ve analysed newspapers before, but this is different. At the recent News Impact Summit (NIS), I heard an interesting talk by the engagement manager for Vox, a newish online news site. But a couple of things seemed off with what we were told.

Vox’s spokeswoman told us that her site’s goal is to set itself apart from other news sites by explaining the news, and to do so in a shareable way. She showed examples of ‘cards’ (Vox’s way of displaying content explaining the news), but I couldn’t help having a problem with their examples of ‘easy-to-understand’ content. Of the only 6 sentences shown on screen, at least 2 were longer than 30 words.

At GOV.UK, where I’m currently freelancing, we wouldn’t have that. No, not at all: user research shows that anything over 25 words is a reading killer. Similarly, we’re told to avoid unnecessary or uncommon words, such as the “hence” that started sentences in the Vox examples.

On the other hand, Vox’s spokeswoman told us it put tremendous effort into polishing headlines to make more readers want to click.

Me being me, this prompted me to wonder what the truth is.

Man reading a website as a paper


Comparing the news

I selected several of the top news stories of the past year, ones that Vox had a ‘card’ for:

  • the ebola outbreak
  • Islamic State
  • Malaysian Airways MH17 downing over Ukraine
  • the Ukraine crisis
  • Michael Brown shooting and rioting in Ferguson, USA

I wanted to look at comparable news sources. This doesn’t just mean news sites. I looked at a combination of the most popular news sites in the world (that I could access without subscription), along with other ways we get news. So even though BuzzFeed and Reddit aren’t in the top 10 news sites, they are significant news sources for many. I then divided these into new and old media.

‘Old media’ (organisations established before the internet):

  • BBC News — the UK’s most popular news site. Most of its articles are written by its own journalists
  • The Daily Mail — the world’s most popular news site and, unlike the New York Times, I can access its articles. It uses Associated Press articles along with its own
  • The Guardian — another globally popular website but one that aims to be a bit more highbrow than the Mail. Has many guest authors
  • The Economist — though not as popular, like Vox it seeks to explain the news and not just report it. No author bylines, all articles conform to one style

‘New media’ (organisations set up since the internet became popular):

  • Huffington Post — like Vox, this is a ‘new media’ site and very popular. It too has a range of guest authors
  • BuzzFeed — journalists love to spoof its hyperbolic headlines, but it’s increasingly popular, particularly on Facebook, and its UK editor was interesting at the NIS
  • Reddit — a social site with a range of topics. I looked at its ‘Explain like I’m 5’ sub-reddit (thread) for ‘simplified and layman-accessible explanations’
  • Vox — US news site that features both news stories and more in-depth explanations through its ‘cards’

Not every site had a good summary or explainer, while some had more than 1. You can see the full list of articles here.

Using various analytic tools – readability analysis programs, word counts, my own sentence splitting and the LIWC word analysis tool – I ran the articles through several analyses.

What I expected to find and why

Vox says it spends a lot of effort perfecting the headline. Good, for I found in previous research that a good headline — descriptive, inviting, optimised — is vital for getting readers to click.

However, nothing was mentioned of polishing Vox’s content. To be fair to the speaker, she wasn’t a writer, so she may not have had the information. Yet this meant my expectation was that the headlines would be polished but the content could ramble (and not be readable).

As for the other sources… The Guardian is a ‘highbrow’ paper so would probably be the least readable of the major sources. The Economist is also highbrow, but it takes the view that authors should never assume too much prior knowledge of their readers. As a subscriber I listen to its audio edition and the language flows. Like the BBC, then, it is a media firm with a ‘spoken word service’ (so to speak), which helps it focus on good readability.

The Daily Mail, however, is so popular that it must appeal to the lowest common denominator – easy reading. The Huffington Post was my main uncertainty – I don’t read it, and going by my social networks, no one else seems to (at least in the UK). But a quick look shows that it has a lot of authors and no set tone.

Finally there are two of the newest sites – Reddit and BuzzFeed. BuzzFeed is a joke to many journalists (sorry BuzzFeed staff reading this). But at the NIS, the site more (in)famous for headlines like “Can You Make It Through This Post Without Feeling Sexually Attracted to Food?” and “Emoji Facts That Will Make You 🙂” and its ilk seemed to be getting the last laugh. Its UK news editor got respect, if grudging, from the more senior hacks there.

In part it’s because BuzzFeed is going beyond cat pictures to do more serious reports. Readers are coming for the memes but staying for the news.

Reddit is slightly different from all the other sources here. It’s a glorified messageboard – anyone can ask a question, anyone can answer. Other users can vote on questions and answers, and as it’s so popular it has a wide range of users, from experts to the average internet commenter. Thanks to the voting up of the ‘best’ queries and answers I’ve often found good, clear explanations that go beyond the news article being linked to. In particular, the Explain Like I’m 5 sub-Reddit (thread) is dedicated to explaining complex issues (not just the news) and ideas in simple ways.

Results

Data processing

Headlines

Headline complexity

A good headline will give enough detail to describe, but leave enough out to make the reader want to find out more. Today’s readers are presented with so many headlines on a news site’s homepage, let alone their social and other sites, that it’s vital that headlines stand out. One way of doing that is making sure readers actually understand (or can make a good guess at) what the headline will link to.

I couldn’t measure whether something was clickbait (ie, content doesn’t match the title), and I find headlines are too short to run a readability analysis. Instead I found complexity of words as a good proxy, where a ‘complex word’ is any with 3 or more syllables. In other words, long words.

Though not perfect, it does give us an idea of how snappy a headline is. I didn’t look at length because in this day of search engine optimisation, and on my previous research, I didn’t find a good correlation between clicks and length.

A good headline then should be long enough to capture the story and capture the reader — no more, no less.
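The complex-word proxy can be sketched with a crude syllable counter (a heuristic of my own for illustration, not the exact tool used):

```python
import re

def syllables(word):
    """Rough syllable count: runs of vowels, with a crude silent-'e' adjustment."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)

def complex_share(headline):
    """Share of words in a headline with 3 or more syllables."""
    words = re.findall(r"[A-Za-z]+", headline)
    return sum(syllables(w) >= 3 for w in words) / len(words)
```

Heuristics like this misfire on odd spellings, but averaged over many headlines they give a usable complexity signal.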

Most complex is the Guardian, followed by BuzzFeed (well it does like words like ‘unbelievable’ and ‘amazing’). Vox, by contrast, has fairly snappy headlines (“11 things you need to know about Ebola”), as do the other new media sites, Reddit and the Huffington Post.

Headline categorisation

While it’s hard to gauge content, the LIWC can give some idea of what the headline is about based on word categories.

Vox says it’s there to explain the news and it does have a high proportion of insight words (“think”, “know”). The Guardian, by contrast, has more causation words (“because”). Now there’s a subtle difference between causation and insight. My view is that words classed as insight are more fact-based (“this is what happened”) whereas causation is more about opinion (“this is why this thing happened”). Both give you an overview, but causation suggests that it’s opinion-led.

This is a subtle distinction but, if true, it suggests that the Guardian (and Huffington Post) are likely to have the more opinionated authors, those who (claim to) know the answer. By contrast, Vox, like the BBC, is more neutral, focusing on the facts.

For touchy-feely types, Reddit is about the senses ("We've been hearing about Ebola…"). Of course the main difference between Reddit and the others is that the question (or headline) is posed by one user and answered by others. This results in varying questioning styles and answers.

Body copy

Let’s go from the headlines now into the meat of the content. So far Vox seems to be doing what it stated — explaining in a fairly neutral way what’s happening, with fairly polished headlines.

Readability

There are different ways to score how easy it is to read an article. These are based on looking at sentence length, complexity (number of syllables) and other factors.

Averaging the outputs I came up with a score, where, like golf, the higher the number the ‘worse’ it is.
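The post doesn't name the individual formulas that were averaged, so as an assumption the sketch below combines two standard grade-level measures, Flesch-Kincaid grade and Gunning fog, reusing the same crude syllable heuristic. The averaging idea is the point, not the particular mix:

```python
import re

def count_syllables(word):
    # Crude vowel-group heuristic (an assumption, not the original method).
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)
    n = len(groups)
    if word.endswith("e") and n > 1 and not word.endswith("le"):
        n -= 1
    return max(n, 1)

def combined_readability(text):
    # Average of Flesch-Kincaid grade and Gunning fog:
    # like golf, higher means 'worse' (harder to read).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    wps = len(words) / len(sentences)          # words per sentence
    fk = 0.39 * wps + 11.8 * (syllables / len(words)) - 15.59
    fog = 0.4 * (wps + 100 * complex_words / len(words))
    return (fk + fog) / 2
```

Short, plain sentences score low; long sentences stuffed with polysyllables score high, which is all the comparison between sites needs.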

As with headlines, the Guardian insists on being complex. Yet Vox isn’t that far off, being the next most complex, in line with my expectations based on those long sentences and non-plain words.

By contrast the BBC is a lot less complex. I did include one article aimed at children on CBBC, but this had a similar readability score to the main BBC News article. The Daily Mail also keeps its writing less complex. Like the BBC it has a broad readership and as such can’t afford to be too complex.

Let’s dig a bit deeper and look at other reasons why the Guardian and others are so complex.

Sentence length distribution

I looked at sentence length partly because of this quote on the GOV.UK blog:

Writing guru Ann Wylie describes research showing that when average sentence length is 14 words, readers understand more than 90% of what they’re reading. At 43 words, comprehension drops to less than 10%.

Cumulative here just means that I keep adding the total in one category to the next. So BuzzFeed has 24% of its sentences at '9 and fewer words', and 51% (24% + 27% for '10-14 words') at 14 words or fewer. The Guardian by contrast (yet again) only has 12% of its sentences as short as 9 words.
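Building that cumulative split is straightforward. Only the first two bucket boundaries (9 and 14 words) are named in the text; the rest below are assumptions for illustration:

```python
import re
from bisect import bisect_left

# Upper bounds of each bucket. '9 and fewer' and '10-14' match the post;
# the later boundaries are assumed. Anything past the last bound falls
# into a final catch-all bucket.
BUCKETS = [9, 14, 19, 24, 29, 39]

def cumulative_length_split(text):
    # Returns cumulative percentages per bucket, rounded to whole numbers.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    counts = [0] * (len(BUCKETS) + 1)
    for s in sentences:
        n = len(re.findall(r"[A-Za-z']+", s))
        counts[bisect_left(BUCKETS, n)] += 1
    cumulative, running = [], 0
    for c in counts:
        running += c
        cumulative.append(round(100 * running / len(sentences)))
    return cumulative
```

Plot one curve per site and you get the steep-early BuzzFeed line and the slow Guardian ramble described below.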

Looking at the curves you can see that BuzzFeed has short, punchy sentences and so its curve is steep and peaks early. The Guardian, with long, wordy sentences, gently curves out as it rambles on. Vox is between the two. That can be a good middle path. Short sentences aren't always best. They can be distracting.

This method isn’t perfect but with enough data it does give a good indicator — BuzzFeed’s sentences are likely to be understood by more people than the Guardian’s. And Vox’s.

Long sentence split

There's another way of looking at sentence length: what's the overall split between long and short sentences?

BuzzFeed really stands out for its snappiness, while a third of the Guardian's sentences are classed as long. Ouch.

Yet despite having a good readability score, the Daily Mail has sentence length proportions approaching the Guardian’s. We need to find out more.

Adverb use

I believe the road to hell is paved with adverbs, and I will shout it from the rooftops.

Stephen King is just one of many authors and style-guide setters who rail against the adverb, seeing it as a sign of poor writing. Adverbs modify verbs, as in "he quickly walked". A good writer would generally (and this is a generalisation, as there is debate) use a single stronger verb rather than add an adverb. For example, rather than "quickly walked", they'd use "darted", "dashed" and so on (as long as the single word is still plain English).

As such I use adverb count as a rough measure of how good the writing is. It can also be seen as how good the sub-editing process is (if any, sad to say), balanced against the need to let an author’s voice be heard.
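The proper way to count adverbs is a part-of-speech tagger (NLTK's `pos_tag`, for instance, marks them RB/RBR/RBS). As a dependency-free sketch of the idea, the heuristic below counts '-ly' words while skipping a hand-picked list of common false friends; both the heuristic and that list are my assumptions, not the original method:

```python
import re

# Words ending in '-ly' that are not adverbs (an illustrative,
# far-from-complete list).
NOT_ADVERBS = {"only", "family", "supply", "reply", "early", "likely",
               "friendly", "ugly", "holy", "italy", "july"}

def adverb_rate(text):
    # Share of words that look like '-ly' adverbs.
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    if not words:
        return 0.0
    adverbs = [w for w in words if w.endswith("ly") and w not in NOT_ADVERBS]
    return len(adverbs) / len(words)
```

"He quickly walked away" scores 1 in 4; rewrite it as "He darted away" and the adverb count drops to zero, which is exactly the editing move King recommends.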

Reddit has the highest use of adverbs. Not surprising — users aren’t professional writers nor do they have a sub-editor. I’d be surprised if the authors themselves even spent time editing their work. And that’s to be expected, as Reddit is ultimately a messageboard, not a professional publication.

I was surprised at the number of adverbs in the Huffington Post and the Guardian. Having had the chance to ask a former Guardian sub I was told that the paper, while keen to maintain its style, doesn't want to mask the author's voice. With many authors not professional writers (and, being news, they have a short time to compose their material) it's no wonder that adverbs are allowed.

The BBC, by contrast, is in no rush to break news, nor does it have many guest columnists; instead professional journalists write most of its content. The Economist is a weekly newspaper so it has that increasingly rare luxury of time: time to let writers review and subs to sub. It also aims for a single, consistent style and voice.

This doesn't explain the Daily Mail, which sits in the middle. But of the 3 articles analysed, 2 were by the Associated Press, which tends to go for a neutral style (unlike the Mail).

Subjects and pronoun use

Finally let’s look at who the authors address and how much of this is personal experience.

Now I’ve not accounted for quotations in this, which by their nature are personal experiences and need attributing (he says).

As before, Reddit, as the more social of the news sources, leads the way with the personal "I". And with the question being set by another user, it's natural to respond to them with "you". I was surprised that the Huffington Post had a similar proportion, but I wasn't surprised that the traditional news sources lack the first person.
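Measuring that first- and second-person split is a simple word count. A minimal sketch, with the pronoun lists as my own assumptions about what to include:

```python
import re

# Which pronouns count as first or second person is an assumption;
# a LIWC-style dictionary would be more thorough.
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}
SECOND_PERSON = {"you", "your", "yours"}

def pronoun_split(text):
    # Returns (first-person share, second-person share) of all words.
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    total = len(words) or 1
    first = sum(1 for w in words if w in FIRST_PERSON)
    second = sum(1 for w in words if w in SECOND_PERSON)
    return first / total, second / total
```

Run over each site's body copy, it makes the old-versus-new-media divide below easy to chart.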

What can this tell us? GOV.UK tells its writers to address its subjects as "you", though I couldn't find the research to say why this is best. As a writer it does feel more personal using "you", but I can't say why it's better for the reader. My research on this at Which?, where I had Google Analytics and Omniture data, didn't lead to any conclusions about user behaviour and the best form of addressing readers.

Instead it's more of interest to see how the split varies between organisations, and the divide between old and new media.

Passive voice

Style guides warn against the use of passive voice and encourage the active voice (ie, “Freddie Starr ate my hamster”, not “A hamster was eaten by Freddie Starr”).
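As noted in the next-steps section, I wasn't happy with the passive-voice analysis in the tool I used, so here is only a rough sketch of how such a detector can work: flag a form of "to be" followed by a likely past participle. The '-ed'/'-en' pattern is an assumption and misses irregular participles like "sung":

```python
import re

# A form of 'to be' followed by a word ending in '-ed' or '-en'
# ('was eaten', 'it was claimed'). Crude: it misses irregular
# participles and throws up some false positives.
PASSIVE = re.compile(
    r"\b(am|is|are|was|were|be|been|being)\s+\w+(ed|en)\b",
    re.IGNORECASE,
)

def passive_sentence_rate(text):
    # Share of sentences containing a passive-looking construction.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    passive = sum(1 for s in sentences if PASSIVE.search(s))
    return passive / len(sentences)
```

It correctly flags "A hamster was eaten by Freddie Starr" while letting "Freddie Starr ate my hamster" pass as active.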

The BBC is the bastion of impartial and neutral news, and so is the most passive ("it was claimed"). A noble idea, but not always as readable. Vox at the other end is the most direct ("Russia denies it is invading"), along with the Mail and Guardian. BuzzFeed doesn't do as well here ("Authorities in these nations have scrambled to contain the disease"), but its short sentences seem to carry its overall readability.

Summary

Looking at how easy it is to understand a headline, the new media (Vox, Reddit and Huffington Post) win the day. Their headlines were the most polished and appealing to readers, and state clearly that they’ll explain the news.

The new media sites, with the exception of BuzzFeed ("11 Things You Need To Know About The Ebola Epidemic That's Killing Thousands"), had less complex headlines. That's not to say this meant short headlines (search words have to be crammed in), but shorter words were generally used.

The best overall readability was for the BBC, but in terms of sentence lengths BuzzFeed kept it short and punchy throughout. The Guardian however had complex headlines and long sentences, hurting its chances of being understood by a wide demographic.

Other observations

Several sites had topic pages, eg the Huffington Post's MH17 topic, while few had summaries like Vox's cards or the BBC's explainers. Topic pages collect all the pages related to a news story in one place. Yet when I tried to use them to find the 'best' page or a summary it was a barren search. Instead they seemed more a technical solution (grouping similar content) to a technical problem than an editorial answer. I preferred the Vox style of an editorial collection summing up the situation.

I ignored images, which the Daily Mail and BuzzFeed have a large number of. I don’t know how this may affect readability. When it comes to online content I’m with Alice (of Wonderland fame), who tired of writing that lacks pictures. I don’t know what effect this has on readership, though I know that images benefit search engine optimisation.

Finally, I didn't look at overall word count as this would be unfair on Vox. Though this is a good indicator of readability, the way Vox arranged its content meant its multiple pages would count as one according to the analysis programs.

Conclusion

Breaking news

Does all this matter? News sources cater to different audiences so if the Guardian wants a reader base that has to put in a bit of effort to understand what it’s trying to say, then that’s the Guardian’s choice. Me, I prefer to keep things plain.

I also wonder whether complex readability hurts the Guardian's influence — if readers aren't clear what's being said then how can the paper have a great influence? How many people enjoy struggling through an article? If there's a good point to be made, let alone a tricky question to answer, why make it hard to understand?

I have no beef with Vox. It’s interesting what they’re doing and I single them out because they presented a statement to a room of journalists and it’s a journalist’s job to challenge. But compared with newspapers that have already been explaining the news for years, such as the Economist, it has much to learn. It wasn’t surprising then to hear that Vox was set up by bloggers. Blogging is a different beast to journalism, though as shown by Vox’s rapid rise, it has benefits for grabbing online readers.

So in answer to the question in the headline — is Vox top? The answer follows Betteridge’s law — no. Vox has good headlines but its content is so dense that it is unlikely to attract the broad demographic it apparently aims for.

Instead I see BuzzFeed continuing its success due to its easy-to-read sentences (and so being readable by the widest audience). Yet in contrast to its copy, BuzzFeed's headlines were long, though at least they described the article.

Yet a quick revisit to Vox showed a different story. While the headlines to the explanatory cards in Vox were well written, the news headlines caused a bit of a headache when we looked at them. "Europe's leaders have succeeded in making Greece unimportant" had to be read a couple of times to get the meaning. I wasn't even sure what I'd get when I clicked on that headline.

Is there a best site, as asked at the start of this article? Horses for courses, but to avoid weaselling out, I'd say that the BBC strikes the best balance between them, while at the more sensational end BuzzFeed is best. Reddit can be good, but I'd prefer to monitor its news summaries before giving a better answer.

Next time

If I did this again I’d also want to look at:

  • passive voice proportions through a new tool — I don’t like the passive voice analysis in here so would want a second opinion
  • verb phrases per sentence, apparently a better predictor of readability — this would mean building a new analysis tool
  • more data — bigger is better, but I couldn’t/didn’t scrape this time as it would have taken longer than doing it manually

Predictions

I don’t have the traffic data for any of the sites I analysed. Reddit is probably the closest as it gives a score. Of course if anyone working at those sites wants to send me any data I’d gratefully receive it…

Even with this lack of data, I’d still expect:

  • BBC — slower off the mark with news stories as it spends longer polishing them, so it's the worst for breaking news, but it's the easiest news source to comprehend. Will continue to be a go-to news site of choice, but its CBBC news for children needs to be simplified. If traffic is good its 'explainers' may become more popular
  • The Daily Mail — with only one article written by the Mail it's hard to give a unique distinction for it, but those selected were easy enough to read. Will remain a global news source
  • The Guardian — plodding headlines and plodding pieces mean that if articles are read, I'm not sure how much will truly be retained and understood. I wonder how many readers skip straight to the comments. While those who understand it seem to love it, its high reading-comprehension demands mean its demographics will be much narrower than most of the other news sources in this study
  • The Economist — in many ways what Vox is aiming for, each article assumes no prior knowledge and it'll remain my go-to newspaper for news summaries. If only its headlines were a little more descriptive and its sentences a bit more active, it may become more popular than it is
  • Huffington Post — it'll continue to be stuck in the middle ground, neither new nor old media; it's both too impersonal and too distant so occupies this niche. A niche that's not enticing to me
  • BuzzFeed — I expect a reasonable click-through rate for its headlines but as its articles are easy to read users are likely to share them and to read more of them. Expect the news site to grow in popularity. I’m guessing its complex headlines serve its purposes, and I’d be interested to see what testing they’ve done on them
  • Reddit — it has users who address other users; don't expect a polished (or any) response, but it can give an easy-to-digest understanding of the situation (if the article exists). There's a surprising number of experts on there, from Arnold Schwarzenegger to research scientists. Its future depends on its readers (which reminds me of something)
  • Vox — it will draw readers in with a good click-through for headlines but will have a high bounce ('exit') rate and low click-through for the next page due to its hard-to-read format
  • topic pages on news sites — unless top/relevant/best posts are pinned to the top these will mainly serve as useful pages for the authors but will be too garbled for the average reader to use

*I blame GOV.UK for being able to spot a complex sentence and counting it in a Rain Man-esque manner
