Daily Mail v The Guardian: equally angry?

This week two British media giants, the Daily Mail and the Guardian, got into an inter-title fight about who encourages hate and negativity.

The Press Gazette best sums up the story, which started when the Guardian implied that the Mail and Sun are to blame for the recent attack on a mosque.

The Guardian published a cartoon of a white van outside Finsbury Park mosque, where one person was killed, with ‘Read the Sun and the Daily Mail’ on the vehicle. The Mail took this as implying that it incited the attacker to kill Muslims and fumed, replying with the editorial “Fake news, the fascist Left and the REAL purveyors of hatred”.

In short, both sides accuse the other of peddling noxious opinions, and in particular the Daily Mail effectively says that the Guardian can get off its high horse as its views are just as noxious. Are they?

The Mail has a point

Yes, the Daily Mail has a point. While the Guardian may not typically have immigrants, saboteurs or judges as targets of its wrath, it does use similarly emotive language when describing its own enemies (usually Tories).

What it comes down to is this: the Mail argues that the Guardian’s views may lean left politically, but they are just as negative as the Mail’s own.

This chart shows the average proportion of ‘anger’ words in the body copy and headlines for 12,000 Mail and Guardian opinion pieces spanning the past couple of decades. The two papers are not so different in the average amount of anger and negative words they use in body copy and headlines, and both use more, on average, than other British newspapers.

Negative newspapers?

In 2013 I analysed 60,000 opinion columns from 6 British newspapers — the Daily Express, Mail, Independent, Mirror, Guardian and Telegraph — for a range of measures. These included sentiment and the proportions of emotion words within the text, using LIWC 2007 (Linguistic Inquiry and Word Count).

I was looking at a range of things, including whether the internet had changed the way newspapers wrote — would they become more emotional to target their niches? I chose opinion columns because I took it that an opinion column — editorials, plus pieces by regular and guest columnists and commentators — was the most suitable way to see what a paper really thinks, as opposed to its reporting of a news event.

I split the headlines and body copy out as headlines are often written separately to the body, and can also give an idea of what phrasing the paper thinks will draw readers’ attention.
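The core measurement is simply a category word count over total words. As an illustration only: LIWC 2007 is proprietary and matches word stems rather than exact tokens, so the tiny word list below is a made-up stand-in for the real dictionary.

```python
import re

# Made-up stand-in for LIWC's anger category: the real dictionary is
# proprietary and matches word stems (e.g. "hate*"), not exact tokens.
ANGER_WORDS = {"hate", "angry", "fury", "outrage", "attack"}

def anger_proportion(text):
    """Percentage of tokens that fall in the anger category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in ANGER_WORDS)
    return 100.0 * hits / len(tokens)

print(anger_proportion("An outrage! Readers hate this attack on decency."))
# → 37.5
```

Running the same function separately over headlines and body copy gives the two series used in the charts.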

At the time I vowed to publish each week. In the end I didn’t, partly because I saw no market and partly because I was looking around for someone interested in publishing it; while I got some interest, the question was always “what does this lead to?” This is what it leads to.

Wherever there are two colours, blue is the body copy and red is the headline. The y-axis is the proportion of content meeting that definition. Or just hover over the images for the legend to appear.

Average negative emotion in headlines and body for all newspapers

The following charts make it clearer, but there is a clear difference between newspapers in their negativity, and a similarity between the Mail and the Guardian.

Average anger in headlines and body for all newspapers

The Guardian has angrier content, on average, than the Mail – 0.884 v 0.839.

Most negative content

The Daily Mail is the most negative, but the Guardian isn’t far behind.

Angriest headlines and body (split out)

The Daily Mail has the angriest headlines, but not the angriest content — that’s the Guardian.



Positive message

The Mirror is overall the most positive, although the Guardian is slightly more positive in its message than the Mail.

Negative emotions in headlines and body over time

Before 2006 I have less data, which may explain the variation (and is why the other charts are based on data from 2008 onwards), but while headlines change in tone, the body copy has largely been consistent. Zoom in to 1 or 2-year views and there’s no large change over the months, not even at Christmas.

Change in negativity over time for all papers

All newspapers have largely been consistent over the years. I had been expecting them to become more emotional as they strive to distinguish themselves on the internet.

Mail change in negativity over time

Love it or hate it, the Mail has largely stuck to its tone over the years, perhaps a little more negative of late.

Guardian change in negativity over time

As with the Mail, the Guardian has been roughly consistent in its tone.

Word count over time

This is the only chart that shows a real change over time. Many online style guides suggest keeping body copy short (something I ought to be better at), and you can see that as the internet became more important for revenue, around 2005, the length shortened.

Why does it creep up again? Honest answer: I don’t know, but it could be that readers move on to another article so quickly that length doesn’t matter — if the reader likes a piece, they’ll stick to the end regardless of length (within reason). Or it could be my data set.
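The length trend itself is simple to compute: group articles by year and average the word counts. A minimal sketch with invented numbers standing in for the real data set:

```python
from collections import defaultdict
from statistics import mean

# Invented (year, word_count) pairs standing in for the real articles.
articles = [(2004, 900), (2004, 860), (2005, 700), (2005, 640), (2010, 820)]

# Bucket word counts by year, then average each bucket.
by_year = defaultdict(list)
for year, words in articles:
    by_year[year].append(words)

for year in sorted(by_year):
    print(year, round(mean(by_year[year])))
```

The same grouping works for any of the emotion measures by swapping word count for the LIWC proportion.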

The Daily Mail v Mail Online

Part of the Daily Mail’s beef is that it accuses the Guardian of confusing MailOnline with the Daily Mail, and I use ‘the Mail’ in general terms partly for the reasons in this article. As such I can’t guarantee the data contains only Daily Mail rather than MailOnline articles (they are apparently separate companies, though both owned by DMGT), though if I reviewed it I probably could.

End thoughts

I should carry out significance tests, but as a quick and dirty evaluation (if 60,000 articles can be called that) it makes a point — that the Mail isn’t as wrong as many would like to think.

As this former journalist says, the Daily Mail isn’t all bad, and this wasn’t published to bash it. In fact it was the Guardian accusing others of being so hateful that spurred me on to this data research back in the day.

What can both papers learn? I’ve not seen their sales, link shares, page views or other closed data, which would be the best way to see whether there is a correlation between tone and readership. But both can learn that while the targets of their wrath, their readerships, their fonts and their styles all differ, there are more similarities than some would be comfortable with.

Contact me if you want the data of nearly 60,000 articles, including 5,200 from the Mail and 7,200 from the Guardian, or go to Google Drive, but you must attribute the data if you use it.


Scraping, screenplays and sexism

In the past couple of days there have been two big data posts analysing sex and screenplays.

Polygraph’s Hannah Anderson and Matt Daniels scraped and analysed 2,000 screenplays and their dialogue to get data on the division of dialogue according to sex, age and other factors.

The Economist looked at data from USC Annenberg on nudity and ‘sexualised attire’ (aka revealing outfits and the like) in film, along with lead and speaking roles by sex.


Getting screenplay data

Both reports focused on presenting the data and key thoughts rather than delving too deep into interpretation. Analysing Hollywood is a complex business – as William Goldman said, “nobody knows anything” when it comes to predicting success, let alone Hollywood and sexism.

The main thing of interest for me is the method of analysing screenplays. Matt gives a long and detailed method with links to script sources, along with the code on GitHub and a list of where he got the data.
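The essence of that kind of scrape is turning screenplay formatting into structured data: in standard spec format a character cue is an upper-case line, followed by that character’s dialogue. A rough sketch of the idea — the cue regex and toy excerpt are mine, not Polygraph’s:

```python
import re
from collections import Counter

# Toy excerpt in spec screenplay format: upper-case character cues,
# dialogue beneath them, blank lines separating speeches.
script = """
INT. OFFICE - DAY

JANE
We ship on Friday.

TOM
Friday? You said Monday.

JANE
I changed my mind.
"""

words_by_character = Counter()
current = None
for line in script.splitlines():
    stripped = line.strip()
    # An all-caps line is treated as a character cue; the startswith
    # check is a crude filter against scene headings.
    if re.fullmatch(r"[A-Z][A-Z .']+", stripped) and not stripped.startswith("INT"):
        current = stripped
    elif stripped and current:
        words_by_character[current] += len(stripped.split())
    elif not stripped:
        current = None  # blank line ends the speech

print(words_by_character)  # → Counter({'JANE': 8, 'TOM': 4})
```

Summing words per character like this is the building block for dialogue-by-sex breakdowns, once characters are mapped to cast lists.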

Potential uses

Both studies used data to explore issues around gender and films, but there is further potential with the data. For example:

  • emotion and sentiment – I’m not a fan, given the drawbacks, but it’s possible to trace emotion in scripts, looking at whether beginnings, middles or ends are more or less emotional, and whether there’s a pattern
  • the split of action and dialogue in a script – do successful scripts have a particular divide (i.e. an avoidance of walls of text)?
  • are women more confident or not – an extension of their sexism report; it could be a question of whether female characters tend to ask more questions (or use more emotional language)
  • writing level – what is the typical readability of the dialogue of heroes and villains, and of scripts in general, and how does this vary by genre (would The Imitation Game or A Beautiful Mind be more difficult to read, let alone film, than Die Hard?)
  • is good writing important in a successful script – as with readability, does having too many adverbs and other things Hemingway hated hinder scripts?
  • statistical significance – as Matt acknowledges, there are no statistical tests in their report; what tests could be done?

Why we need this data

Maybe nothing will come of it, but there is no harm in trying, and while I never expect any rules to emerge (Goldman is already laughing), perhaps some very broad principles could come out of the data. Even a finding of nothing is something to report. The only pity is that, given the grey areas around scraping, we’d have to start from scratch rather than reuse the script data the teams have already gathered.

But it would be worth it, and we could get away from what the Polygraph article calls “all rhetoric and no data, which gets us nowhere in terms of having an informed discussion.”

In the meantime, if you want to search the data you can check out the links or use the Polygraph tool here.


Season finales: which shows went out in style?

Season finales, the last show in a series, the end of an era… when a TV programme comes to an end (or season, depending on where you are) there’s a high expectation the writers will make it a classic.

This isn’t always the case. The Sopranos became notorious for its unclear ending of whether the main character, Tony Soprano, died or not. On the other hand, Breaking Bad’s ending, which resolved the fate of Walter White, Jesse Pinkman and the others, won rave reviews.

Best season endings

The reason I used those two examples is that both The Sopranos and Breaking Bad were generally and consistently well-reviewed, so their endings came with a high expectation of being equally good (and ideally better). Yet how do they compare to other series?

Two Reddit users, PhJulien and ChallengeResponse, have done something clever I wish I’d thought of — getting the data from IMDB and comparing finales with average ratings. IMDB not only lists every episode but also collects user ratings. More importantly, it lets you get at its data.

Here’s what they found.

Finales that topped the series

Series finales that topped or bombed (click image for full size) – via /u/ChallengeResponse/Imgur

What ChallengeResponse did was write a Python program to get the data and then make a chart of the difference between the average rating and the finale rating. He ranked this by the biggest difference, so that Glee, the school where the singing never stops, which got around a 6.8 average, had a finale rated 9.2. I’m reading the charts for these numbers so may be off, but that’s a difference of 2.4 rating points.

At the other end, Dexter, the show about the serial-killer killer, caused a stink with viewers, dropping from its average of 8.9 out of 10 to 4.8 in the finale, a drop of 4.1 rating points.
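The calculation behind the ranking is straightforward: finale rating minus the average of the preceding episodes. A sketch with invented ratings loosely echoing the two examples above (ChallengeResponse’s actual script pulls real per-episode data from IMDB):

```python
from statistics import mean

# Invented episode ratings standing in for real IMDB data;
# the finale is the last entry in each list.
shows = {
    "glee_like":   [6.5, 6.8, 7.0, 9.2],
    "dexter_like": [9.0, 8.9, 8.8, 4.8],
}

# Finale rating minus the average of all earlier episodes:
# positive means the show went out on a high, negative means it bombed.
diffs = {name: round(r[-1] - mean(r[:-1]), 2) for name, r in shows.items()}
print(diffs)
```

Sorting `diffs` by value reproduces the ranking idea behind his chart.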

Another, earlier, way to look at this is through PhJulien’s chart, which scatters average rating against finale rating.

Series average ratings plotted against finale rating – via /u/PhJulien/Imgur

Looking at it this way, Breaking Bad, which had an extremely good average of 9.0 for its series as a whole, went out with a 9.9. So a good show went out almost perfectly, according to the public rating the show on IMDB.

Seen this way, the majority of shows go out a little better than their average (which is what viewers want).

Would this work with British TV?

No British show is in PhJulien’s chart, and only one is in ChallengeResponse’s data – The Office (its US version is in PhJulien’s).

Could I repeat this? Yes, but the difference is that US shows offer a much bigger sample size — the US version of The Office ran to 201 episodes, the UK version to just 12 and 3 specials.

When you’re basing data on such small samples it gets a bit trickier, not least because the finale’s rating is included in the series’ overall average. That’s barely a problem when the final episode is 1 of 201 (0.5% of all episodes and ratings), but the finale of the UK version accounts for around 7% of all ratings.
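The weighting problem is simple arithmetic: if each episode attracts a similar number of votes, one finale’s share of the pooled ratings is just 1/n. A quick sketch (the equal-votes assumption is mine):

```python
def finale_weight(n_episodes):
    """Percentage of a series' pooled ratings contributed by the finale,
    assuming each episode draws a similar number of votes."""
    return 100.0 / n_episodes

print(round(finale_weight(201), 1))  # a 201-episode run, US Office scale
print(round(finale_weight(15), 1))   # 12 episodes plus 3 specials, UK Office scale
```

So the finale drags the UK series average towards itself roughly fourteen times more strongly than in the US case.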

Could I try this? Yes, but I think the findings would be too shaky. Still, it’s a great idea and one that could be used in other data reviews.

Do it yourself

You can get all ChallengeResponse’s charts and more (ranked by finale, season average and alphabetically) at Imgur.

He also includes links on how to do it yourself using IMDbPY, and how he visualised it in IPython using matplotlib.

You can get the source code as an IPython notebook on GitHub.


Only fools and sitcoms

Can you help discover why some sitcoms are funnier than others?

In a previous post I wrote about my reasons why some sitcoms are funnier than others and developed a theory to test – that the best comedies have more laughs per page and more reversals overall.

Every theory needs a test, and I’m testing it on the top five British sitcoms as voted for in 2004 — not to find ‘the best’ comedy but to find which episodes are the greatest.

Yet to test it I need your help. I need you to read the scripts, count the funnies and reversals and enter them into the database below. I’ll let you know the results when they’re in – subscribe and follow me on Twitter to do so.


Being a better liar

Scientists say that if you want to write fiction you need to be a good liar.

According to Matthew Newman and James Pennebaker of the University of Texas, liars speak and choose their words differently from those telling the truth, and this affects how we write.


Quantifying comedy and the science of sitcoms

Comedy, like politics and religion, can be a big joke, or no laughing matter, depending on your point of view. And like political and religious views, it’s near impossible to convert doubters that your chosen path is the true one.

Knowing what makes a comedy ‘good’ is a dark art – you may have seen The Importance of Being Earnest more times than a maiden aunt, but your favourite comedy is American Pie.

Likewise, you may love Monty Python’s Life of Brian as the greatest comedy ever made but find the troupe’s The Meaning of Life turgid, yet both are rated four out of five stars on IMDB.

What, then, makes a comedy great? How can a writer know that what they write is funny, and how can you argue your favourite comedy is ‘the best’ when another swears you’re wrong?