Categories
Research Writing

“Success with Style” part 3: using LIWC data

Last time we replicated the original Success with Style output and methods, even though the method was not listed. We managed to get the data to broadly match. Great, but now we are going to look at a different way of analysing the same text.

In part 2 we used the Penn Treebank to analyse the text and its parts of speech (PoS). This time we’re using LIWC (Linguistic Inquiry and Word Count), a tool developed at the University of Texas. Like the Penn Treebank, it categorises words, and some of its categories overlap, such as prepositions.

In part 1 we looked at the original experiment, and in part 2 we recreated it. This time we’ll use the same input data but process it through a different NLP analysis program, LIWC.

Hypotheses

H0: There's no difference in the proportion of LIWC categories in successful and unsuccessful books, regardless of genre
HA: There is a difference in the proportion of LIWC categories in successful and unsuccessful books, and the pattern will depend on genre

H0: There's no difference in the LIWC summary values of successful and unsuccessful books, regardless of the book's genre
HB: There is a difference in the LIWC summary values of successful and unsuccessful books, and the pattern will depend on genre

 

Success with Style LIWC

Method

The data, the measure of success and the method were the same as in part 1, including the adjusted p-value (p<0.05 for significance) and the machine learning algorithm. Likewise, variables with many zeroes were not transformed.

Difference in success

The R code produced a different set of tags from the original. You can find the LIWC definitions at the foot of this page.

Tags per genre

LIWC Difference in proportion function-article – original data

Overall biggest difference

PoS (successful books) | Definition | Diff (largest difference first) | PoS (unsuccessful books) | Definition | Diff (largest difference first)
functional | Total function words | 0.003835 | quote | Quotation marks | -0.001814
prep | Prepositions | 0.001758 | allpunc | All Punctuation* | -0.001350
article | Articles | 0.001199 | affect | Affective processes | -0.001231
ipron | Impersonal pronouns | 0.001198 | social | Social processes | -0.001181
space | Space | 0.001155 | posemo | Positive emotion | -0.001103
relativ | Relativity | 0.000860 | ppron | Personal pronouns | -0.001047
number | Numbers | 0.000623 | apostro | Apostrophes | -0.000999
focuspast | Past focus | 0.000463 | female | Female references | -0.000963
power | Power | 0.000454 | focuspresent | Present focus | -0.000929
cogproc | Cognitive processes | 0.000437 | shehe | 3rd pers singular | -0.000905
period | Periods/fullstop | 0.000403 | verb | Common verbs | -0.000642
comma | Commas | 0.000379 | informal | Informal language | -0.000361
differ | Differentiation | 0.000369 | exclam | Exclamation marks | -0.000323
otherp | Other punctuation | 0.000318 | time | Time | -0.000319
parenth | Parentheses (pairs) | 0.000266 | you | 2nd person | -0.000273
conj | Conjunctions | 0.000266 | percept | Perceptual processes | -0.000236
quant | Quantifiers | 0.000257 | affiliation | Affiliation | -0.000216
semic | Semicolons | 0.000254 | focusfuture | Future focus | -0.000213
interrog | Interrogatives | 0.000233 | sad | Sadness | -0.000202
colon | Colons | 0.000225 | adj | Common adjectives | -0.000190
work | Work | 0.000197 | family | Family | -0.000190
drives | Drives | 0.000163 | nonflu | Nonfluencies | -0.000156
pronoun | Total pronouns | 0.000154 | netspeak | Netspeak | -0.000154
cause | Causation | 0.000136 | discrep | Discrepancy | -0.000140
anger | Anger | 0.000131 | see | See | -0.000133
we | 1st pers plural | 0.000130 | bio | Biological processes | -0.000130
certain | Certainty | 0.000125 | i | 1st pers singular | -0.000121
compare | Comparisons | 0.000125 | negemo | Negative emotion | -0.000111
they | 3rd pers plural | 0.000122 | body | Body | -0.000104
death | Death | 0.000101 | reward | Reward | -0.000098
tentat | Tentative | 0.000078 | friend | Friends | -0.000088
ingest | Ingestion | 0.000060 | risk | Risk | -0.000080
home | Home | 0.000055 | negate | Negations | -0.000073
achieve | Achievement | 0.000038 | auxverb | Auxiliary verbs | -0.000070
money | Money | 0.000016 | motion | Motion | -0.000069
health | Health | 0.000011 | insight | Insight | -0.000067
adverb | Common adverbs | 0.000011 | hear | Hear | -0.000056
leisure | Leisure | 0.000003 | feel | Feel | -0.000049
swear | Swear words | 0.000002 | assent | Assent | -0.000046
 | | | male | Male references | -0.000045
 | | | qmark | Question marks | -0.000035
 | | | sexual | Sexual | -0.000028
 | | | anx | Anxiety | -0.000025
 | | | dash | Dashes | -0.000025
 | | | relig | Religion | -0.000010
 | | | filler | Fillers | -0.000008

A positive (negative) value means that the mean PoS proportion is higher in the more (less) successful books
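To make the sign convention concrete, here is a minimal Python sketch of how a difference in mean proportion could be calculated. It is not the R code used for this post: the function name `proportion_diffs` and the per-book numbers are made up for illustration, and only the tag names come from LIWC.

```python
from statistics import mean

# Hypothetical per-book LIWC proportions. The tag names are real LIWC
# categories, but the numbers are invented for illustration only.
successful = [
    {"prep": 0.14, "quote": 0.02},
    {"prep": 0.12, "quote": 0.01},
]
unsuccessful = [
    {"prep": 0.11, "quote": 0.03},
    {"prep": 0.13, "quote": 0.02},
]


def proportion_diffs(more, less):
    """Mean proportion in the more successful books minus the mean in the less successful ones."""
    tags = more[0].keys()
    return {
        tag: mean(book[tag] for book in more) - mean(book[tag] for book in less)
        for tag in tags
    }


diffs = proportion_diffs(successful, unsuccessful)

# Largest difference first, as in the table above.
ranked = sorted(diffs.items(), key=lambda item: item[1], reverse=True)
for tag, diff in ranked:
    print(f"{tag}: {diff:+.6f}")
```

With these invented numbers, `prep` comes out positive (higher in the successful books) and `quote` negative (higher in the unsuccessful ones).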

Unpaired t-tests

Showing results of PoS tags that have significant adjusted P-values.

PoS Definition adjusted P-value
analytic Analytical thinking 0.017
tone Emotional tone 0
mWoSen Mean Words per Sentence 0
sixletter Six letter words 0
ppron Personal pronouns 0.005
ipron Impersonal pronouns 0
article Articles 0.005
prep Prepositions 0
adj Common adjectives 0.005
number Numbers 0
affect Affective processes 0
posemo Positive emotion 0
negemo Negative emotion 0.045
sad Sadness 0.009
social Social processes 0.044
family Family 0.041
friend Friends 0
female Female references 0.026
feel Feel 0.041
bio Biological processes 0.044
affiliation Affiliation 0.017
power Power 0.017
risk Risk 0.017
focuspresent Present focus 0.02
focusfuture Future focus 0
space Space 0.009
time Time 0
informal Informal language 0
nonflu Nonfluencies 0
colon Colons 0.028
exclam Exclamation marks 0
quote Quotation marks 0.005
apostro Apostrophes 0.017

33 out of 93 tags (including punctuation) of the transformed PoS were significantly different between successful and unsuccessful books. This means that we can reject the null hypothesis (hypothesis 1), since the proportion of more than one PoS was significantly different between more and less successful books.
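When this many tags are tested at once, the raw p-values need adjusting before they are compared with 0.05. The post does not say which adjustment was used, so the sketch below assumes the Benjamini-Hochberg procedure purely as an example. The function name and the raw p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so that adjusted values never
    # increase as the raw p-values get smaller.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for four tags (not the values from this study).
raw = {"ipron": 0.0001, "prep": 0.004, "quote": 0.02, "comma": 0.30}
adj = benjamini_hochberg(list(raw.values()))
significant = [tag for tag, p in zip(raw, adj) if p < 0.05]
print(significant)
```

A tag counts as significant only if its adjusted value is still below 0.05, which is the threshold used in the table above.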

Difference in LIWC summary variables

LIWC has its own summary variables. Some of them are proprietary, so how they’re calculated is not clear, but they rely on the PoS tags. For example, ‘tone’ is overall emotion (both the positive and negative emotion tags). Like the tags, they are expressed as a proportion of the text (ie 0.85 means 85% of the text), apart from mean words per sentence.

Variables Definition
Analytical thinking (Analytic) People low in analytical thinking tend to write and think using language that is more narrative, focusing on the here-and-now and personal experiences. Those high in analytical thinking perform better in college and have higher college board scores.
Clout Clout refers to the relative social status, confidence, or leadership that people display through their writing or talking. The algorithm was developed based on the results from a series of studies where people were interacting with one another.
Authenticity When people reveal themselves in an authentic or honest way, they are more personal, humble, and vulnerable.
Emotional tone (Tone) Although LIWC2015 includes both positive emotion and negative emotion dimensions, the Tone variable puts the two dimensions into a single summary variable. Numbers below 50 suggest a more negative emotional tone.
Measure Successful Unsuccessful P value Significant (p<0.05)?
Six letter words 0.1633 0.1552 0.0004 TRUE
Mean words per sentence 18.3832 17.0184 0.0007 TRUE
Dictionary words 0.8388 0.8410 0.6000 FALSE
Authentic 0.2240 0.2181 0.3900 FALSE
Analytic 0.7240 0.6939 0.0032 TRUE
Clout 0.7417 0.7499 0.3800 FALSE
Tone 0.3892 0.4486 0.0010 TRUE

Results show that mean words per sentence was significantly higher in successful books and comparable to the figures in the original test. Likewise, the proportion of six letter words (or more) was significantly higher in successful books. Tone, however, was lower in successful books. Going by the LIWC definition above, a lower score suggests a more negative emotional tone.
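The two non-proprietary measures are simple enough to approximate by hand. The Python sketch below is an assumption-laden illustration, not LIWC's own algorithm: it splits sentences on full stops, question marks and exclamation marks, treats runs of letters and apostrophes as words, and counts words longer than six letters (the "Words > 6 letters" definition in the LIWC table at the foot of this page). The function name `summary_measures` is made up.

```python
import re


def summary_measures(text):
    """Rough approximations of words per sentence and the six-letter-word proportion."""
    # Assumption: a sentence ends at one or more of . ! ?
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumption: a word is a run of letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    long_words = [w for w in words if len(w) > 6]
    return {
        "wps": len(words) / len(sentences),
        "sixltr": len(long_words) / len(words),
    }


sample = "The detective examined the room. Nothing moved."
measures = summary_measures(sample)
print(measures)
```

LIWC's real tokeniser will differ on abbreviations, hyphenation and numbers, so treat the output as a ballpark figure only.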

Looking further at these categories by genre:

Difference in analytical words (scaled and normalized) between more and less successful books
Difference in authenticity (scaled and normalized) between more and less successful books
Difference in clout (scaled and normalized) between more and less successful books
Difference in Dictionary Words (scaled and normalized) between more and less successful books
Difference in mean words per sentence (scaled and normalized) between more and less successful books
Difference in proportion of 6 letter words (scaled and normalized) between more and less successful books
Difference in tone (scaled and normalized) between more and less successful books

Most important variables

PoS Definition Overall relative importance
ipron Impersonal pronouns 100.00
quote Quotation marks 86.40
otherp Other punctuation 69.99
posemo Positive emotion 68.88
time Time 67.30
space Space 64.90
parenth Parentheses (pairs) 58.40
you 2nd person 56.80
adj Common adjectives 46.73
risk Risk 41.25
sixletter Six letter words 40.70
semic Semicolons 38.60
power Power 35.29
netspeak Netspeak 31.52
number Numbers 30.08
swear Swear words 28.03
period Periods/fullstop 27.75
filler Fillers 25.91
certain Certainty 25.69
death Death 25.56
mWoSen Mean words per sentence 25.03
ppron Personal pronouns 22.95
colon Colons 20.12
focuspast Past focus 19.99
body Body 18.78
tone Emotional tone 18.57
leisure Leisure 17.86
focusfuture Future focus 16.08
home Home 14.88
exclam Exclamation marks 13.08
achieve Achievement 11.90
dicWo Dictionary words 11.72
apostro Apostrophes 9.99
work Work 9.22
ingest Ingestion 7.70
health Health 6.83
relig Religion 5.91
qmark Question marks 3.93
interrog Interrogatives 2.72
hear Hear 1.48

Machine learning performance

Accuracy 95% CI Sensitivity Specificity
75.00% 67.6%-81.5% 76% 74%
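For readers unfamiliar with these measures, the Python sketch below shows how accuracy, sensitivity and specificity fall out of a confusion matrix. The counts are hypothetical, chosen only so that the rates match the table above. Treating "successful" as the positive class is an assumption, and so is the Wilson score interval, which may not be the interval used for the original figures, so the bounds need not match.

```python
import math


def classification_summary(tp, fn, tn, fp, z=1.96):
    """Accuracy, sensitivity, specificity and a Wilson 95% interval for accuracy."""
    n = tp + fn + tn + fp
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)  # share of positive-class books correctly identified
    specificity = tn / (tn + fp)  # share of negative-class books correctly identified
    # Wilson score interval for a binomial proportion.
    denom = 1 + z ** 2 / n
    centre = (accuracy + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(accuracy * (1 - accuracy) / n + z ** 2 / (4 * n ** 2)) / denom
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ci": (centre - half, centre + half),
    }


# Hypothetical counts: 100 successful and 100 unsuccessful test books.
summary = classification_summary(tp=76, fn=24, tn=74, fp=26)
print(summary)
```

The interval narrows as the number of test books grows, which is why the same 75% accuracy can come with quite different confidence bounds.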

Conclusion

  • The mean proportions of 33 PoS tags were significantly different between more successful and less successful books (reject null hypothesis 1)
  • Six letter word proportion, mean words per sentence, analytical words and tone were significantly different between more and less successful books (reject null hypothesis 2). Across these categories all genres except historical fiction had a significant difference, with tone (ie both positive and negative emotion use) being significant for 5 out of the 8 genres. No category in the Penn Treebank analysis had this many significant genres.
  • Six letter words, Mean words per sentence, Dictionary words, Authentic, Analytic, Clout, and Tone can be used to predict the status of the book with an accuracy reaching 75%. This is superior to the 65% achieved with readability, mean words per sentence and mean syllables per word.

Overall, the LIWC analysis performed better than the readability and Penn Treebank analyses.

LIWC definitions

These are taken from the LIWC manual.

Abbreviation Category Examples
WC Word count
Summary Language Variables
Analytic Analytical thinking
Clout Clout
Authentic Authentic
Tone Emotional tone
WPS Words/sentence
Sixltr Words > 6 letters
Dic Dictionary words
Linguistic Dimensions
funct Total function words it, to, no, very
pronoun Total pronouns I, them, itself
ppron Personal pronouns I, them, her
i 1st pers singular I, me, mine
we 1st pers plural we, us, our
you 2nd person you, your, thou
shehe 3rd pers singular she, her, him
they 3rd pers plural they, their, they’d
ipron Impersonal pronouns it, it’s, those
article Articles a, an, the
prep Prepositions to, with, above
auxverb Auxiliary verbs am, will, have
adverb Common Adverbs very, really
conj Conjunctions and, but, whereas
negate Negations no, not, never
Other Grammar
verb Common verbs eat, come, carry
adj Common adjectives free, happy, long
compare Comparisons greater, best, after
interrog Interrogatives how, when, what
number Numbers second, thousand
quant Quantifiers few, many, much
Psychological Processes
affect Affective processes happy, cried
posemo Positive emotion love, nice, sweet
negemo Negative emotion hurt, ugly, nasty
anx Anxiety worried, fearful
anger Anger hate, kill, annoyed
sad Sadness crying, grief, sad
social Social processes mate, talk, they
family Family daughter, dad, aunt
friend Friends buddy, neighbor
female Female references girl, her, mom
male Male references boy, his, dad
cogproc Cognitive processes cause, know, ought
insight Insight think, know
cause Causation because, effect
discrep Discrepancy should, would
tentat Tentative maybe, perhaps
certain Certainty always, never
differ Differentiation hasn’t, but, else
percept Perceptual processes look, heard, feeling
see See view, saw, seen
hear Hear listen, hearing
feel Feel feels, touch
bio Biological processes eat, blood, pain
body Body cheek, hands, spit
health Health clinic, flu, pill
sexual Sexual horny, love, incest
ingest Ingestion dish, eat, pizza
drives Drives
affiliation Affiliation ally, friend, social
achieve Achievement win, success, better
power Power superior, bully
reward Reward take, prize, benefit
risk Risk danger, doubt
TimeOrient Time orientations
focuspast Past focus ago, did, talked
focuspresent Present focus today, is, now
focusfuture Future focus may, will, soon
relativ Relativity area, bend, exit
motion Motion arrive, car, go
space Space down, in, thin
time Time end, until, season
Personal concerns
work Work job, majors, xerox
leisure Leisure cook, chat, movie
home Home kitchen, landlord
money Money audit, cash, owe
relig Religion altar, church
death Death bury, coffin, kill
informal Informal language
swear Swear words fuck, damn, shit
netspeak Netspeak btw, lol, thx
assent Assent agree, OK, yes
nonflu Nonfluencies er, hm, umm
filler Fillers Imean, youknow
allpunc All Punctuation*
period Periods/fullstop .
comma Commas ,
colon Colons :
semic Semicolons ;
qmark Question marks ?
exclam Exclamation marks !
dash Dashes
quote Quotation marks
apostro Apostrophes
parenth Parentheses (pairs) ()
otherp Other punctuation
Categories
Research Writing

An Agile writers’ room: a better way of writing part 2

Last time we looked at the problem around writing and how too few individuals can write well enough consistently to reach the top. But together they may stand a better chance, and Agile methodology would be the way to do this.

That’s quite an assumption, but Agile (in all its forms, more on that later) is geared to testing and adaptation so the best thing is to plan how that would work and try it out in reality.

Agile writing room

Writing for publication is Waterfall but should it be Agile?

Agile is about working as a team to produce something together. Very idealistic, but don’t Waterfall and its related methodologies do the same?

The main difference is that Agile is not about working to produce one big, final, perfect result. Instead Agile is about breaking it down into small units, delivering the minimum needed in short sprints, testing, refining and adapting.

Agile v waterfall
Waterfall compared with Agile (via Agilenutshell)

This doesn’t mean Waterfall is bad. It suits big things that you can’t test, update or move. Things such as building projects… and writing? Certainly when I’ve written professionally or creatively it’s been comparable to this – set a deadline, do some editing and get peer feedback, then submit your best and forget about it once done.

This makes sense at first – if you’re aiming for a deadline you must produce your best and it must be complete and on time. Yet content teams are switching away from this in the non-creative sector due to the benefit of breaking things down into bits. And you can also break the team roles down into bits and split it between members.

The Agile writing team

As the roles are split you’ll need people who can do all these things working together, feeding back and being aware of what others are doing. A mantra of Agile is that the unit of delivery is the team. The best Agile teams may not have the best people at each individual skill, such as the best developer, but they will have the best people at working together to deliver what they need to.

You can be brilliant at your role but if you can’t work with others and adapt to help them then you can’t write in an Agile team.

So writers are all you need in a writing team, right? Yes, of course you can’t have a writing team without writers, but you need more.

Here’s a table looking at the skills you’d need in an Agile writing team and how they’d map to a writers’ room. The roles aren’t all that different in many cases; it’s how they work together that is. This is a big simplification: writing and Agile teams vary, and I’ve taken liberties with both the writers’ room and the Agile team for illustration.

Role | Agile | Writing teams
Deals with the vision and the bigger picture. Works with stakeholders. Decides on priorities and keeps the team informed of them. Works with the backlog and makes decisions in a timely manner. Provides information in a timely manner. | Product owner (aka on-site customer or active stakeholder) | Executive producer, showrunner (depends on the team)
Creates the right environment. Removes blockers and works with the product owner to make the vision happen. Doer of the visionary pairing. | Delivery manager/scrum master. Problem solver and project manager, but not technical planning and scheduling as that is left to the team. Works to hire the team. Has a range of skills to do things properly. Very practical person. | Co-producers, showrunner. Writers’ assistant can help with some of the lower level tasks.
Creator | Content designer, developer | Writers (story editors, staff writers etc)
Researches what the user needs, identifies the users | User researcher | Writers’ assistant (if asked by writer)
Testing and stretch exercises | Team develops this themselves | Team develops this themselves
Specialists with knowledge brought on for key parts | Technical or domain experts with specialist technical knowledge | Consulting producer
Testers | Independent test team, user researchers | External editor, readers
Anyone who is a direct user, indirect user, manager of users, senior manager or staff member. “Gold owner” who funds the project. Representatives of the customer. | Stakeholders (funder/commissioner) | Executive producers, studio

There are many differences though. In Agile, because it’s the team that’s responsible for delivery, the team is also collectively responsible for accepting work, allocating it and producing it.

So while the showrunner has an editorial job, they are less of a tyrant of imaginings. In return for this loss of control there should be a gain in innovation.

An example of how it works

Agile has already transformed other creative ways of working. I’ve mentioned government a lot but other areas have changed too, such as marketing:

“[Before Agile we didn’t have] a clear focus of our tasks and communicating them as a team […] Now, before the start of each quarter we’d meet and decide what our team priorities would be, then each team member would be assigned to the priorities and off we’d go. We’d meet two mornings a week to discuss the progress of our priorities, our KPIs, and our blockers.”

Which Agile do I mean?

Agile experts reading this probably asked this question long ago, even though I said I’d look at the general principles. The three main forms of Agile, as the Harvard Business Review states, are:

  • scrum, which emphasises creative and adaptive teamwork in solving complex problems
  • lean development, which focuses on the continual elimination of waste
  • kanban, which concentrates on reducing lead times and the amount of work in process

My straw poll of Agile experts suggests that kanban would be a good way to start, as it’s about reducing the amount of work in process. But the beauty of Agile is that it can be adapted as needed.

Team writing in Agile is not for everyone, for various reasons. For instance, everyone needs to own a ticket, and this responsibility is not for everyone. Consistency will be tricky. Other questions are for Agile to answer through the doing – there may not be a market, and people may be afraid of ‘idea theft’ (not that that is really an issue). It may be less agile and more plodding.

Final thought: Agile writers, over complicating things?

It’s a fair question – is this overly complicated? My only defence is the William Goldman view of Hollywood – if, as he says, “no one knows anything” then who’s to say they know it won’t work?

Hollywood and TV (which this would be about writing scripts for) would be receptive to anything as long as it gets results. More and more places, including Amazon Studios, accept unsolicited scripts and only care if they tell a good story.

What they want is writers who can meet a specification, make changes as requested (and not be too difficult about pushing back) and do it all on time.

From my time at the BBC, the thing that came up again and again when people asked “how does that person keep getting hired” was that while they might at worst be accused of mediocre scripts, they were never bad, they met the brief and, most important of all, they were on time.

That’s not too high a bar to hit.

Next steps

Theory is one thing but it’s nothing without putting it into action.

That’s what the plan is. It’ll be hard to get going – would this be voluntary or would I hire people; I have a breakdown of resources but will that work in practice?

So many questions, but the only way to answer them is not to speculate but to try.

Be prepared, be prepared to fail, but most importantly be prepared to learn and to develop from that. Success for the project means that it works at all and that we complete an initial script. Surely we can do that?

Categories
Scientific Research Writing

Scraping, screenplays and sexism

In the past couple of days there have been two big data posts that analyse sex and screenplays.

Polygraph’s Hannah Anderson and Matt Daniels scraped and analysed 2,000 screenplays and their dialogue to get data on the division of dialogue according to sex, age and other factors.

The Economist looked at data from USC Annenberg on nudity and ‘sexualised attire’ (aka revealing outfits and the like) in film, along with lead and speaking roles by sex.

script-analysed

Getting screenplay data

Both reports focused on presenting the data and key thoughts rather than delving too deep into interpretation. Analysing Hollywood is a complex business – as William Goldman said, “nobody knows anything” when it comes to predicting success, let alone Hollywood and sexism.

The main thing of interest for me is the methods of analysing screenplays. Matt has a long and detailed method with links to script sources, along with the code on Github and a list of where he got the data from.

Potential uses

Both studies used data to explore issues around gender and films, but there is further potential with the data. For example:

  • emotion and sentiment – not a fan due to the drawbacks but possible to trace emotion in scripts, looking at such things as whether beginning, middle or ends are more or less emotional and is there a pattern
  • the split of action and dialogue in a script – do successful scripts have a divide (aka an avoidance of walls of text)
  • are women more confident or not – an extension of their sexism report, but it could be a question of whether female characters tend to ask more questions (or use emotional language)
  • writing level – what is the typical readability for the dialogue of heroes and villains, along with scripts in general and how does this vary by genre (would The Imitation Game or A Beautiful Mind be more difficult to read, let alone film, than Die Hard?)
  • is good writing important in a successful script – as with the study of readability, does having too many adverbs and other things that Hemingway hates hinder scripts
  • statistical significance – as Matt acknowledges, there are no statistical tests in their report, what tests could be done

Why we need this data

Maybe nothing comes out, but there is no harm in trying. I never expect any rules to come out (Goldman is already laughing), but perhaps some very broad principles could emerge from the data. Even a finding of nothing can be something to report. The only pity is that, due to the grey areas of scraping, we’d have to start from scratch rather than use the script data the teams have already used.

But it will be worth it and we can get away from what the Polygraph article calls “all rhetoric and no data, which gets us nowhere in terms of having an informed discussion.”

In the meantime if you want to search the data you can either check out the links or use the Polygraph tool here.

Categories
Research Writing

Dressing your characters

Describing your character’s dress and appearance can be the sign of poor writing taste – but not if you do it in the right context, as a Harvard Business School study has just confirmed.

When writing a story, having a character know what the norms are and being able to conform or break them, and how others react to this, can help a story. While some dress differently “to communicate that they are different or worthy of attention”, the exact effects have been found in a psychological study.

And it led to interesting results relevant to writers.

Categories
Opinion Writing

Writing the easy option

Samantha Brick’s case shows it’s too easy to take the easy option when writing – and that’s what’s happening more and more. But is her pretty face the future of journalism?

I’ve worried for some time that writing is becoming slacker, and it’s why I set up this site. I still keep meaning to publish my research into change over time at some point, but it needs a fair bit of testing.

So Shu Richmond’s criticism at her site rang true when I read it – journalism is becoming shoddy. But unlike Shu I am an optimist.

Categories
Writing

Site of the week: Script Consultant

Professional level writing is tricky so it’s no surprise there is a glut of sites claiming to help make scriptwriters into pros. For a price.

This is why Script Consultant is the site of the week. Philip Shelley is a freelance script consultant, and yes he charges for pitching and other advice – after all, he has to make a living – but he gives a lot away free of charge.

For example, his latest article is on how to come up with ideas, full of top tips such as reading professional scripts – sounds obvious but I’m surprised at the number of (wannabe) writers who barely look at successful examples of the finished product before putting ink to paper.

His regular email newsletter also has a wealth of advice and you can sign up now from his home page – so what are you waiting for?

Categories
Writing

Mad for opera?

Ever considered writing an opera? I don’t think there are many showing their hands (or likes, follows or whatever we’re meant to do in the 2.0th century).

Perhaps it’s this lack of supply that has prompted the English National Opera to launch Mini Operas, a competition to find new opera-writing talent.