Last time we replicated the original Success with Style output and methods, even though the methods were not listed, and managed to get the data to broadly match. Now we are going to look at a different way of analysing the same text.
In part 2 we used the Penn treebank to analyse the text and its parts of speech (PoS). This time we’re using LIWC, a tool developed at the University of Texas. Like the Penn treebank, it categorises words, and some of its categories, such as prepositions, overlap.
In part 1 we looked at the original experiment, and in part 2 we recreated it. The input data here is the same; only the NLP analysis program changes.
Hypotheses
H0: There's no difference in the proportion of LIWC categories in successful and unsuccessful books, regardless of genre
HA: There is a difference in the proportion of LIWC categories in successful and unsuccessful books, and the pattern will depend on genre
H0: There's no difference in the LIWC summary values of successful and unsuccessful books, regardless of the book's genre
HB: There is a difference in the LIWC summary values of successful and unsuccessful books, and the pattern will depend on genre
Method
The data, the measure of success and the method were the same as in part 1, as were the p-value adjustment (p<0.05 for significance) and the machine learning algorithm. Likewise, variables with many zeroes were not transformed.
Difference in success
The R code managed to create different tags to the original. You can find the LIWC definitions at the foot of this page.
Tags per genre

Overall biggest difference
PoS (successful books) | Definition | Diff (largest difference first) | PoS (Unsuccessful books) | Definition | Diff (largest difference first) |
---|---|---|---|---|---|
functional | Total function words | 0.003835 | quote | Quotation marks | -0.001814 |
prep | Prepositions | 0.001758 | allpunc | All Punctuation* | -0.001350 |
article | Articles | 0.001199 | affect | Affective processes | -0.001231 |
ipron | Impersonal pronouns | 0.001198 | social | Social processes | -0.001181 |
space | Space | 0.001155 | posemo | Positive emotion | -0.001103 |
relativ | Relativity | 0.000860 | ppron | Personal pronouns | -0.001047 |
number | Numbers | 0.000623 | apostro | Apostrophes | -0.000999 |
focuspast | Past focus | 0.000463 | female | Female references | -0.000963 |
power | Power | 0.000454 | focuspresent | Present focus | -0.000929 |
cogproc | Cognitive processes | 0.000437 | shehe | 3rd pers singular | -0.000905 |
period | Periods/fullstop | 0.000403 | verb | Common verbs | -0.000642 |
comma | Commas | 0.000379 | informal | Informal language | -0.000361 |
differ | Differentiation | 0.000369 | exclam | Exclamation marks | -0.000323 |
otherp | Other punctuation | 0.000318 | time | Time | -0.000319 |
parenth | Parentheses (pairs) | 0.000266 | you | 2nd person | -0.000273 |
conj | Conjunctions | 0.000266 | percept | Perceptual processes | -0.000236 |
quant | Quantifiers | 0.000257 | affiliation | Affiliation | -0.000216 |
semic | Semicolons | 0.000254 | focusfuture | Future focus | -0.000213 |
interrog | Interrogatives | 0.000233 | sad | Sadness | -0.000202 |
colon | Colons | 0.000225 | adj | Common adjectives | -0.000190 |
work | Work | 0.000197 | family | Family | -0.000190 |
drives | Drives | 0.000163 | nonflu | Nonfluencies | -0.000156 |
pronoun | Total pronouns | 0.000154 | netspeak | Netspeak | -0.000154 |
cause | Causation | 0.000136 | discrep | Discrepancy | -0.000140 |
anger | Anger | 0.000131 | see | See | -0.000133 |
we | 1st pers plural | 0.000130 | bio | Biological processes | -0.000130 |
certain | Certainty | 0.000125 | i | 1st pers singular | -0.000121 |
compare | Comparisons | 0.000125 | negemo | Negative emotion | -0.000111 |
they | 3rd pers plural | 0.000122 | body | Body | -0.000104 |
death | Death | 0.000101 | reward | Reward | -0.000098 |
tentat | Tentative | 0.000078 | friend | Friends | -0.000088 |
ingest | Ingestion | 0.000060 | risk | Risk | -0.000080 |
home | Home | 0.000055 | negate | Negations | -0.000073 |
achieve | Achievement | 0.000038 | auxverb | Auxiliary verbs | -0.000070 |
money | Money | 0.000016 | motion | Motion | -0.000069 |
health | Health | 0.000011 | insight | Insight | -0.000067 |
adverb | Common Adverbs | 0.000011 | hear | Hear | -0.000056 |
leisure | Leisure | 0.000003 | feel | Feel | -0.000049 |
swear | Swear words | 0.000002 | assent | Assent | -0.000046 |
| | | | male | Male references | -0.000045 |
| | | | qmark | Question marks | -0.000035 |
| | | | sexual | Sexual | -0.000028 |
| | | | anx | Anxiety | -0.000025 |
| | | | dash | Dashes | -0.000025 |
| | | | relig | Religion | -0.000010 |
| | | | filler | Fillers | -0.000008 |
A positive (negative) value means that the mean PoS proportion is higher in the more (less) successful books
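
As a rough illustration of how a table like the one above can be built, the sketch below takes per-book tag proportions for each group, subtracts the group means and sorts with the largest difference first. It is a minimal Python sketch rather than the R code used for this post; the function name `mean_differences` and the sample numbers are made up for the example.

```python
from statistics import mean


def mean_differences(successful, unsuccessful):
    """Return {tag: mean(successful) - mean(unsuccessful)}, largest first.

    Both arguments map a tag name to a list of per-book proportions.
    Only tags present in both groups are compared.
    """
    diffs = {
        tag: mean(successful[tag]) - mean(unsuccessful[tag])
        for tag in successful
        if tag in unsuccessful
    }
    # Sort so that tags more common in successful books come first.
    return dict(sorted(diffs.items(), key=lambda item: item[1], reverse=True))


# Hypothetical proportions for two tags and two books per group.
successful = {"quote": [0.020, 0.022], "prep": [0.130, 0.134]}
unsuccessful = {"quote": [0.023, 0.025], "prep": [0.129, 0.131]}

diffs = mean_differences(successful, unsuccessful)
for tag, diff in diffs.items():
    print(f"{tag}: {diff:+.6f}")
```

A positive value lands in the "successful" half of the table and a negative value in the "unsuccessful" half.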
Unpaired t-tests
Showing results of PoS tags that have significant adjusted P-values.
PoS | Definition | adjusted P-value |
---|---|---|
analytic | Analytical thinking | 0.017 |
tone | Emotional tone | 0 |
mWoSen | Mean Words per Sentence | 0 |
sixletter | Six letter words | 0 |
ppron | Personal pronouns | 0.005 |
ipron | Impersonal pronouns | 0 |
article | Articles | 0.005 |
prep | Prepositions | 0 |
adj | Common adjectives | 0.005 |
number | Numbers | 0 |
affect | Affective processes | 0 |
posemo | Positive emotion | 0 |
negemo | Negative emotion | 0.045 |
sad | Sadness | 0.009 |
social | Social processes | 0.044 |
family | Family | 0.041 |
friend | Friends | 0 |
female | Female references | 0.026 |
feel | Feel | 0.041 |
bio | Biological processes | 0.044 |
affiliation | Affiliation | 0.017 |
power | Power | 0.017 |
risk | Risk | 0.017 |
focuspresent | Present focus | 0.02 |
focusfuture | Future focus | 0 |
space | Space | 0.009 |
time | Time | 0 |
informal | Informal language | 0 |
nonflu | Nonfluencies | 0 |
colon | Colons | 0.028 |
exclam | Exclamation marks | 0 |
quote | Quotation marks | 0.005 |
apostro | Apostrophes | 0.017 |
33 out of 93 tags (including punctuation) of the transformed PoS were significantly different between successful and unsuccessful books. This means that we can reject the null hypothesis (hypothesis 1), since the proportion of more than one PoS was significantly different between more and less successful books.
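
The post does not say which p-value adjustment was used, so the sketch below assumes the Benjamini-Hochberg procedure purely as an example of adjusting many p-values at once and then filtering at 0.05. The function name `adjust_bh`, the tag names and the raw p-values are all hypothetical.

```python
def adjust_bh(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Walk from the largest p-value to the smallest, keeping a running minimum
    # so the adjusted values never increase as the raw p-value decreases.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, index in enumerate(order):
        rank = m - position  # 1-based rank when sorted in ascending order
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values for four tags.
tags = ["prep", "quote", "hear", "ipron"]
raw = [0.01, 0.04, 0.03, 0.005]

adjusted = adjust_bh(raw)
significant = [tag for tag, p in zip(tags, adjusted) if p < 0.05]
print(significant)
```

With a different adjustment method (Bonferroni or Holm, for instance) the number of tags passing the 0.05 threshold could differ.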
Difference in LIWC summary variables
The LIWC has its own summary variables. Some of them are proprietary, so exactly how they’re calculated is not clear, but they rely on the PoS tags. For example, ‘tone’ combines the positive and negative emotion tags. Like the tags, they are expressed as a proportion of the text (ie 0.85 means 85% of the text), apart from mean words per sentence.
Variables | Definition |
---|---|
Analytical thinking (Analytic) | People low in analytical thinking tend to write and think in more narrative ways, focusing on the here-and-now and personal experiences. Those high in analytical thinking perform better in college and have higher college board scores. |
Clout | Clout refers to the relative social status, confidence, or leadership that people display through their writing or talking. The algorithm was developed based on the results from a series of studies where people were interacting with one another. |
Authenticity | When people reveal themselves in an authentic or honest way, they are more personal, humble, and vulnerable. |
Emotional tone (Tone) | Although LIWC2015 includes both positive emotion and negative emotion dimensions, the Tone variable puts the two dimensions into a single summary variable. Numbers below 50 suggest a more negative emotional tone. |
Measure | Successful | Unsuccessful | P value | Significant (p<0.05)? |
---|---|---|---|---|
Six letter words | 0.1633 | 0.1552 | 0.0004 | TRUE |
Mean words per sentence | 18.3832 | 17.0184 | 0.0007 | TRUE |
Dictionary words | 0.8388 | 0.8410 | 0.6000 | FALSE |
Authentic | 0.2240 | 0.2181 | 0.3900 | FALSE |
Analytic | 0.7240 | 0.6939 | 0.0032 | TRUE |
Clout | 0.7417 | 0.7499 | 0.3800 | FALSE |
Tone | 0.3892 | 0.4486 | 0.0010 | TRUE |
Results show that mean words per sentence was significantly higher in successful books and comparable to the figures in the original test. Likewise, the proportion of words of six letters or more is significantly higher in successful books. Tone, however, is lower in successful books: by the LIWC definition above, that suggests a more negative emotional tone. The tag differences also show fewer emotional words, both positive and negative, in successful books.
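
For readers who want to see what sits behind an unpaired comparison like the ones above, here is a minimal sketch of the test statistic. It assumes Welch's version of the unpaired t-test (unequal variances), which may not be the exact variant used here. The function name `welch_t` and the sample values are hypothetical, and turning the statistic into a p-value needs a t-distribution, which the Python standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, degrees of freedom) for Welch's unpaired t-test."""
    # Squared standard error of each group mean.
    se_a = variance(a) / len(a)
    se_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1)
    )
    return t, df


# Hypothetical mean-words-per-sentence values for three books per group.
successful = [18.0, 19.0, 20.0]
unsuccessful = [16.0, 17.0, 18.0]

t, df = welch_t(successful, unsuccessful)
print(f"t = {t:.3f}, df = {df:.1f}")
```

A positive t means the first group has the higher mean, matching the sign convention used for the differences earlier in this post.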
Looking further at these categories by genre:







Most important variables
PoS | Definition | Overall relative importance |
---|---|---|
ipron | Impersonal pronouns | 100.00 |
quote | Quotation marks | 86.40 |
otherp | Other punctuation | 69.99 |
posemo | Positive emotion | 68.88 |
time | Time | 67.30 |
space | Space | 64.90 |
parenth | Parentheses (pairs) | 58.40 |
you | 2nd person | 56.80 |
adj | Common adjectives | 46.73 |
risk | Risk | 41.25 |
sixletter | Six letter words | 40.70 |
semic | Semicolons | 38.60 |
power | Power | 35.29 |
netspeak | Netspeak | 31.52 |
number | Numbers | 30.08 |
swear | Swear words | 28.03 |
period | Periods/fullstop | 27.75 |
filler | Fillers | 25.91 |
certain | Certainty | 25.69 |
death | Death | 25.56 |
mWoSen | Mean words per sentence | 25.03 |
ppron | Personal pronouns | 22.95 |
colon | Colons | 20.12 |
focuspast | Past focus | 19.99 |
body | Body | 18.78 |
tone | Emotional tone | 18.57 |
leisure | Leisure | 17.86 |
focusfuture | Future focus | 16.08 |
home | Home | 14.88 |
exclam | Exclamation marks | 13.08 |
achieve | Achievement | 11.90 |
dicWo | Dictionary words | 11.72 |
apostro | Apostrophes | 9.99 |
work | Work | 9.22 |
ingest | Ingestion | 7.70 |
health | Health | 6.83 |
relig | Religion | 5.91 |
qmark | Question marks | 3.93 |
interrog | Interrogatives | 2.72 |
hear | Hear | 1.48 |
Machine learning performance
Accuracy | 95% CI | Sensitivity | Specificity |
---|---|---|---|
75.00% | 67.6%-81.5% | 76% | 74% |
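
The three headline figures in this table all come from the same confusion matrix. The sketch below shows how they relate, treating "successful" as the positive class, which is an assumption. The counts are invented for illustration because the post does not give the real test-set size, and `classification_metrics` is a hypothetical helper name.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn: successful books predicted as successful/unsuccessful.
    tn/fp: unsuccessful books predicted as unsuccessful/successful.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of successful books found
        "specificity": tn / (tn + fp),  # share of unsuccessful books found
    }


# Hypothetical counts: 50 successful and 50 unsuccessful test books.
metrics = classification_metrics(tp=38, fn=12, tn=37, fp=13)
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
```

When the two classes are the same size, accuracy is simply the average of sensitivity and specificity, which is consistent with 76% and 74% averaging to 75%.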
Conclusion
- The mean proportions of 33 PoS tags were significantly different between more successful and less successful books (reject null hypothesis 1)
- Six letter word proportion, mean words per sentence, analytical thinking and tone were significantly different between more and less successful books (reject null hypothesis 2). Across these categories, every genre except historical fiction had a significant difference, with tone (ie both positive and negative emotion use) being significant for 5 out of the 8 genres. No category in the Penn treebank analysis had this many significant genres.
- Six letter words, Mean words per sentence, Dictionary words, Authentic, Analytic, Clout, and Tone can be used to predict the status of the book with an accuracy reaching 75%. This beats the 65% achieved using readability, mean words per sentence and mean syllables per word.
Overall LIWC analysis has performed better than using readability and Penn treebank analysis.
LIWC definitions
These are taken from the LIWC manual.
Abbreviation | Category | Examples |
---|---|---|
WC | Word count | |
Summary Language Variables | ||
Analytic | Analytical thinking | |
Clout | Clout | |
Authentic | Authentic | |
Tone | Emotional tone | |
WPS | Words/sentence | |
Sixltr | Words > 6 letters | |
Dic | Dictionary words | |
Linguistic Dimensions | ||
funct | Total function words | it, to, no, very |
pronoun | Total pronouns | I, them, itself |
ppron | Personal pronouns | I, them, her |
i | 1st pers singular | I, me, mine |
we | 1st pers plural | we, us, our |
you | 2nd person | you, your, thou |
shehe | 3rd pers singular | she, her, him |
they | 3rd pers plural | they, their, they’d |
ipron | Impersonal pronouns | it, it’s, those |
article | Articles | a, an, the |
prep | Prepositions | to, with, above |
auxverb | Auxiliary verbs | am, will, have |
adverb | Common Adverbs | very, really |
conj | Conjunctions | and, but, whereas |
negate | Negations | no, not, never |
Other Grammar | ||
verb | Common verbs | eat, come, carry |
adj | Common adjectives | free, happy, long |
compare | Comparisons | greater, best, after |
interrog | Interrogatives | how, when, what |
number | Numbers | second, thousand |
quant | Quantifiers | few, many, much |
Psychological Processes | ||
affect | Affective processes | happy, cried |
posemo | Positive emotion | love, nice, sweet |
negemo | Negative emotion | hurt, ugly, nasty |
anx | Anxiety | worried, fearful |
anger | Anger | hate, kill, annoyed |
sad | Sadness | crying, grief, sad |
social | Social processes | mate, talk, they |
family | Family | daughter, dad, aunt |
friend | Friends | buddy, neighbor |
female | Female references | girl, her, mom |
male | Male references | boy, his, dad |
cogproc | Cognitive processes | cause, know, ought |
insight | Insight | think, know |
cause | Causation | because, effect |
discrep | Discrepancy | should, would |
tentat | Tentative | maybe, perhaps |
certain | Certainty | always, never |
differ | Differentiation | hasn’t, but, else |
percept | Perceptual processes | look, heard, feeling |
see | See | view, saw, seen |
hear | Hear | listen, hearing |
feel | Feel | feels, touch |
bio | Biological processes | eat, blood, pain |
body | Body | cheek, hands, spit |
health | Health | clinic, flu, pill |
sexual | Sexual | horny, love, incest |
ingest | Ingestion | dish, eat, pizza |
drives | Drives | |
affiliation | Affiliation | ally, friend, social |
achieve | Achievement | win, success, better |
power | Power | superior, bully |
reward | Reward | take, prize, benefit |
risk | Risk | danger, doubt |
TimeOrient | Time orientations | |
focuspast | Past focus | ago, did, talked |
focuspresent | Present focus | today, is, now |
focusfuture | Future focus | may, will, soon |
relativ | Relativity | area, bend, exit |
motion | Motion | arrive, car, go |
space | Space | down, in, thin |
time | Time | end, until, season |
Personal concerns | ||
work | Work | job, majors, xerox |
leisure | Leisure | cook, chat, movie |
home | Home | kitchen, landlord |
money | Money | audit, cash, owe |
relig | Religion | altar, church |
death | Death | bury, coffin, kill |
informal | Informal language | |
swear | Swear words | fuck, damn, shit |
netspeak | Netspeak | btw, lol, thx |
assent | Assent | agree, OK, yes |
nonflu | Nonfluencies | er, hm, umm |
filler | Fillers | Imean, youknow |
allpunc | All Punctuation* | |
period | Periods/fullstop | . |
comma | Commas | , |
colon | Colons | : |
semic | Semicolons | ; |
qmark | Question marks | ? |
exclam | Exclamation marks | ! |
dash | Dashes | – |
quote | Quotation marks | |
apostro | Apostrophes | |
parenth | Parentheses (pairs) | () |
otherp | Other punctuation | |