“Success with Style” part 4 — modern data and just a chapter

When I started this analysis I spotted that the download data covered only the previous 30 days, and that this figure was used to categorise each book as a success or a failure.

Even if the data had covered the lifetime of each book, it has been nearly 5 years since the original download. The best way to test whether the results still hold was to get the latest data (albeit still for the past 30 days).

My other thought was that the analyses looked at the entire book. But what if readers only read a certain amount before making a judgment? Agents and publishers, for example, often request only the first chapter when considering a submission. Based on this I ran just the first 3,000 words of each book through the Penn and LIWC taggers and repeated the experiments using each book's 2013 success/fail label.

Finally, I noticed that punctuation tags featured heavily as markers of success or failure in the output, so I also ran the experiments without them to see what the result would be.

Starting hypotheses

H0: There's no difference in the tests which produce significant results between the 2013 and 2018 data
HA: There is a difference in the tests which produce significant results between the 2013 and 2018 data

H0: There's no difference in the tests which produce significant results between the full machine analysis of the book and that of just the first 3,000 words
HB: There is a difference in the tests which produce significant results between the full machine analysis of the book and that of just the first 3,000 words

The hypotheses are fairly simple: if nothing has changed by 2018, then most of the tests that proved significant with the 2013 data should also do so with the 2018 data.

Likewise, if restricting the analysis to the first 3,000 words makes no difference, the same tests should be significant as for the full text.

3,000 words (3k words) is about 10 pages, or roughly one chapter's length, although of course there is no hard and fast rule about how long a chapter is.
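Taking the first 3,000 words of a text could be sketched as below. This is a minimal illustration in Python rather than the tooling used for this analysis, and it assumes that a word is simply a whitespace-separated token; the function name first_n_words is hypothetical.

```python
def first_n_words(text: str, n: int = 3000) -> str:
    """Return the first n whitespace-separated words of text."""
    # Assumption: a "word" is any run of non-whitespace characters.
    return " ".join(text.split()[:n])


# Stand-in for a full book: 5,000 repeated words.
sample = "word " * 5000
excerpt = first_n_words(sample)
print(len(excerpt.split()))
```

A text shorter than the limit is returned whole, so short stories or poems would simply be analysed in full.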

Data used

Data summary

2018 data download date	2018-07-22
2013 data download date	2013-10-23
Unique books used	759

Difference in 2013 and 2018 success rates

Row Labels	Count
FAILURE	22
Adventure	5
Detective/mystery	3
Fiction	2
Historical-fiction	1
Love-story	1
Poetry	8
Short-stories	2
SUCCESS	20
Adventure	3
Detective/mystery	4
Fiction	1
Historical-fiction	4
Love-story	3
Sci-fi	5
Grand Total	42

There were 758 unique books (the remaining 42 of the 800 listed appeared in more than one category). The success status of 42 books differed between 2013 and 2018, which is 5.5% of the books used, and none of these was listed in multiple categories.
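The comparison of success status between the two downloads could be sketched roughly as follows. The book titles and the changed_status helper are hypothetical and only there for illustration; the 42 and 758 figures are the ones quoted above.

```python
def changed_status(labels_a: dict, labels_b: dict) -> list:
    """Return the titles whose label differs between two {title: label} dicts."""
    shared = labels_a.keys() & labels_b.keys()
    return sorted(title for title in shared if labels_a[title] != labels_b[title])


# Hypothetical labels, for illustration only.
labels_2013 = {"Book A": "SUCCESS", "Book B": "FAILURE", "Book C": "SUCCESS"}
labels_2018 = {"Book A": "SUCCESS", "Book B": "SUCCESS", "Book C": "FAILURE"}

changed = changed_status(labels_2013, labels_2018)
print(changed)

# Share of books that changed status, using the figures from this post.
share = 42 / 758
print(f"{share:.1%}")
```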

The new data was run through the Perl Lingua tagger (using the Penn treebank tags) together with the Perl readability measure, and through the LIWC tagger.

Results for 2013, 2018 and 3,000 word data

Machine learning performance

For me the most important measure is which approach makes the best predictions.

Using all tags including punctuation	Accuracy	95% Confidence Interval	Sensitivity	Specificity
Readability 2013	65.62%	57.7-72.9%	69%	63%
Readability 2018	65.00%	57.5-72.8%	68%	63%
Readability 3k	55.62%	47.6-63.5%	68%	44%
LIWC 2013	75.00%	67.6-81.5%	76%	74%
LIWC 2018	71.70%	64.0-78.6%	78%	66%
LIWC 3k	56.25%	48.2-64.0%	53%	60%

According to this, LIWC is still the best tagger. The 2013 and 2018 results are fairly similar for both readability and LIWC, with each accuracy falling within the other's 95% confidence interval.

For both readability and LIWC, the first 3,000 words (3k) are much worse predictors of overall success, and barely better than a 50/50 guess.
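The "within each other's confidence interval" comparison can be checked directly from the table. The sketch below uses the accuracy and interval figures reported above; the helper name within_each_other is my own, and this is an informal check rather than a formal test of difference.

```python
# (accuracy %, CI lower %, CI upper %) as reported in the table above.
results = {
    "Readability 2013": (65.62, 57.7, 72.9),
    "Readability 2018": (65.00, 57.5, 72.8),
    "Readability 3k": (55.62, 47.6, 63.5),
    "LIWC 2013": (75.00, 67.6, 81.5),
    "LIWC 2018": (71.70, 64.0, 78.6),
    "LIWC 3k": (56.25, 48.2, 64.0),
}


def within_each_other(a: str, b: str) -> bool:
    """True if each accuracy lies inside the other's 95% confidence interval."""
    acc_a, lo_a, hi_a = results[a]
    acc_b, lo_b, hi_b = results[b]
    return lo_b <= acc_a <= hi_b and lo_a <= acc_b <= hi_a


for tagger in ("Readability", "LIWC"):
    print(tagger, "2013 vs 2018:", within_each_other(f"{tagger} 2013", f"{tagger} 2018"))
    print(tagger, "2013 vs 3k:", within_each_other(f"{tagger} 2013", f"{tagger} 3k"))
```

The 2013 and 2018 pairs pass this check for both taggers, while the 2013 and 3k pairs do not.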

Difference in significance in key measures

Punctuation

Overall, omitting punctuation made little difference to the LIWC or Penn analyses, although the machine learning performance dropped by around 5 percentage points in every case.
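Dropping the punctuation tags before re-running an analysis might look something like the sketch below. The set of tag names is an assumption based on LIWC-style punctuation category names and should be adjusted to whatever the tagger actually outputs; the feature values are hypothetical.

```python
# Assumption: LIWC-style punctuation category names (lowercased).
PUNCTUATION_TAGS = {
    "allpunc", "period", "comma", "colon", "semic", "qmark",
    "exclam", "dash", "quote", "apostro", "parenth", "otherp",
}


def drop_punctuation(features: dict) -> dict:
    """Return a copy of the feature dict without punctuation tags."""
    return {
        tag: value
        for tag, value in features.items()
        if tag.lower() not in PUNCTUATION_TAGS
    }


# Hypothetical per-book tag rates, for illustration only.
book = {"prep": 13.2, "article": 7.9, "quote": 4.1, "allpunc": 21.5, "affect": 5.0}
print(drop_punctuation(book))
```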

Readability

Genre	Significant 2013	Significant 2018	Significant 3k words
Adventure	TRUE	TRUE	TRUE
Detective/mystery	TRUE	TRUE	TRUE
Fiction	FALSE	FALSE	FALSE
Historical-fiction	FALSE	FALSE	FALSE
Love-story	TRUE	TRUE	TRUE
Poetry	FALSE	FALSE	FALSE
Sci-fi	FALSE	FALSE	FALSE
Short-stories	FALSE	FALSE	FALSE

The same genres produced significant tags in all 3 datasets.

LIWC categories

Test	genre	Significant 2013	Significant 2018	Significant 3k words
Clout	Adventure	TRUE	FALSE	TRUE
	Detective-mystery	TRUE	TRUE	FALSE
	Fiction	TRUE	TRUE	FALSE
	Historical-fiction	FALSE	FALSE	FALSE
	Love-story	FALSE	FALSE	FALSE
	Poetry	FALSE	FALSE	FALSE
	Sci-fi	FALSE	FALSE	FALSE
	Short-stories	FALSE	FALSE	FALSE

Authenticity	Adventure	FALSE	FALSE	FALSE
	Detective-mystery	FALSE	FALSE	FALSE
	Fiction	TRUE	TRUE	FALSE
	Historical-fiction	FALSE	FALSE	TRUE
	Love-story	FALSE	FALSE	FALSE
	Poetry	TRUE	TRUE	FALSE
	Sci-fi	FALSE	FALSE	FALSE
	Short-stories	FALSE	FALSE	FALSE

Analytical	Adventure	FALSE	FALSE	FALSE
	Detective-mystery	FALSE	FALSE	FALSE
	Fiction	TRUE	TRUE	TRUE
	Historical-fiction	FALSE	FALSE	FALSE
	Love-story	FALSE	FALSE	TRUE
	Poetry	FALSE	FALSE	FALSE
	Sci-fi	FALSE	FALSE	FALSE
	Short-stories	FALSE	FALSE	FALSE

6 letter words	Adventure	TRUE	TRUE	TRUE
	Detective-mystery	FALSE	FALSE	FALSE
	Fiction	FALSE	FALSE	FALSE
	Historical-fiction	FALSE	FALSE	FALSE
	Love-story	TRUE	TRUE	TRUE
	Poetry	FALSE	FALSE	FALSE
	Sci-fi	FALSE	FALSE	FALSE
	Short-stories	FALSE	FALSE	FALSE

Dictionary words	Adventure	FALSE	FALSE	FALSE
	Detective-mystery	FALSE	TRUE	TRUE
	Fiction	TRUE	TRUE	FALSE
	Historical-fiction	FALSE	FALSE	TRUE
	Love-story	FALSE	FALSE	TRUE
	Poetry	FALSE	FALSE	FALSE
	Sci-fi	TRUE	TRUE	TRUE
	Short-stories	FALSE	FALSE	FALSE

Tone	Adventure	FALSE	FALSE	FALSE
	Detective-mystery	TRUE	TRUE	TRUE
	Fiction	TRUE	TRUE	TRUE
	Historical-fiction	FALSE	FALSE	FALSE
	Love-story	TRUE	TRUE	FALSE
	Poetry	TRUE	TRUE	TRUE
	Sci-fi	FALSE	FALSE	FALSE
	Short-stories	TRUE	TRUE	TRUE

Mean words per sentence	Adventure	TRUE	TRUE	TRUE
	Detective-mystery	FALSE	FALSE	FALSE
	Fiction	TRUE	TRUE	FALSE
	Historical-fiction	FALSE	FALSE	FALSE
	Love-story	FALSE	FALSE	FALSE
	Poetry	FALSE	FALSE	FALSE
	Sci-fi	FALSE	FALSE	FALSE
	Short-stories	FALSE	FALSE	TRUE

Whereas readability was consistent across the different approaches, the LIWC categories show a lot more variety.

As before, the 2013 and 2018 data tend to match (but not always, as with Clout or Dictionary words), while the 3,000 words data, well, does its own thing.

Tone was the most consistent throughout and, as last time, was significant in the most genres, even with 3k.
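One way to put a number on this consistency is to count, for each test, the genres where two datasets agree on significance. The sketch below does this for Clout and Tone using the TRUE/FALSE flags from the LIWC table; the agreement helper is my own naming, and a simple agreement count is just one possible measure.

```python
GENRES = [
    "Adventure", "Detective-mystery", "Fiction", "Historical-fiction",
    "Love-story", "Poetry", "Sci-fi", "Short-stories",
]
T, F = True, False

# Significance flags copied from the LIWC table, in GENRES order.
flags = {
    "Clout": {
        "2013": [T, T, T, F, F, F, F, F],
        "2018": [F, T, T, F, F, F, F, F],
        "3k":   [T, F, F, F, F, F, F, F],
    },
    "Tone": {
        "2013": [F, T, T, F, T, T, F, T],
        "2018": [F, T, T, F, T, T, F, T],
        "3k":   [F, T, T, F, F, T, F, T],
    },
}


def agreement(test: str, a: str, b: str) -> int:
    """Count the genres where datasets a and b give the same flag."""
    return sum(x == y for x, y in zip(flags[test][a], flags[test][b]))


for test in flags:
    print(test, "2013 vs 2018:", agreement(test, "2013", "2018"), "of", len(GENRES))
    print(test, "2013 vs 3k:", agreement(test, "2013", "3k"), "of", len(GENRES))
```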

Parts of speech tags (PoS) with the largest difference

The tables list the top 3 PoS that dominate in successful and unsuccessful books.
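As a rough sketch of how such a top 3 could be derived, the code below ranks tags by the difference in their rate between successful and unsuccessful books. The rates are made-up numbers (occurrences per 1,000 words) chosen purely for illustration, and ranking by simple difference is an assumption rather than a description of the exact method used here.

```python
def top_differences(success: dict, failure: dict, k: int = 3):
    """Return (tags most over-represented in success, tags most over-represented in failure)."""
    tags = success.keys() & failure.keys()
    diffs = {tag: success[tag] - failure[tag] for tag in tags}
    ranked = sorted(diffs, key=lambda tag: diffs[tag], reverse=True)
    return ranked[:k], ranked[::-1][:k]


# Hypothetical mean tag rates per 1,000 words, for illustration only.
success_rates = {"INN": 130, "DET": 100, "NNS": 50, "PRP": 40, "RB": 50, "VB": 30}
failure_rates = {"INN": 110, "DET": 90, "NNS": 45, "PRP": 60, "RB": 65, "VB": 40}

top_success, top_failure = top_differences(success_rates, failure_rates)
print("Successful:", top_success)
print("Unsuccessful:", top_failure)
```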

Penn data

Successful PoS 2013	Successful PoS 2018	Successful PoS 3k
INN – Preposition / Conjunction	INN – Preposition / Conjunction	INN – Preposition / Conjunction
DET – Determiner	DET – Determiner	DET – Determiner
NNS – Noun, plural	NNS – Noun, plural	NNS – Noun, plural

Unsuccessful PoS 2013	Unsuccessful PoS 2018	Unsuccessful PoS 3k
PRP – Determiner, possessive second	PRP – Determiner, possessive second	RB – Adverb
RB – Adverb	VB – Verb, infinitive	PRP – Determiner, possessive second
VB – Verb, infinitive	RB – Adverb	VB – Verb, infinitive

LIWC data

Successful PoS 2013	Successful PoS 2018	Successful PoS 3k
functional – Total function words	functional – Total function words	functional – Total function words
prep – Prepositions	prep – Prepositions	prep – Prepositions
article – Articles	space – Space	article – Articles

Unsuccessful PoS 2013	Unsuccessful PoS 2018	Unsuccessful PoS 3k
quote – Quotation marks	allpunc – All Punctuation*	adj – Common adjectives
allpunc – All Punctuation*	affect – Affective processes	adverb – Common Adverbs
affect – Affective processes	posemo – Positive emotion	affect – Affective processes

For successful books the same Penn treebank tags dominate in all three datasets: prepositions (for, of, although, that), determiners (this, each, some) and plural nouns (women, books).

Unsuccessful books are also dominated by determiners, but in the possessive second person (mine, yours), along with adverbs (often, not, very, here) and infinitive verbs (take, live).

For LIWC it is quite similar. Successful books are dominated by function words (it, to, no, very), prepositions (to, with, above) and articles (a, an, the).

For unsuccessful books it is all punctuation, quotation marks, social words (mate, talk, they, as well as all family references) and affective processes (happy, cried), which include all emotional terms.

A high rate of quotation marks suggests a high ratio of dialogue to action or description.

What does this tell us?

2013 v 2018 data

Overall there is more similarity than difference between the 2013 and 2018 Penn and readability results. The machine learning performance was also broadly the same, with each year's accuracy falling within the other's 95% confidence interval.

The top 3 successful PoS were also largely the same, as were the top 3 unsuccessful ones.

Likewise the LIWC categories generally matched in significance for both the 2013 and 2018 data. The successful PoS were broadly the same, as were the unsuccessful ones.

This suggests that although the original authors didn’t mention that the data covered only the previous 30 days, their results have largely held up.

The first chapter

Judging a book by just its first 3,000 words was not as accurate as analysing the whole book. The machine learning performance was barely better than a guess.

However, the readability results did match, and the dominant successful PoS were similar to those of the full texts in the 2013 and 2018 studies.

Of all the LIWC categories described in part 3, Tone was both the most significant predictor across genres and the most consistent across the different tests.

Summary

The 2018 results generally match the 2013 results, which suggests the original method holds up as a good predictor of the success or failure of those books.

The results for the first 3,000 words did not match the 2013 or 2018 data, and their machine learning performance was the weakest, which suggests that this is not an accurate way to predict a book’s success. There may be a ‘sweet spot’ where the first x words correlate closely with the overall rating, but if so it is more than 3,000 words.

Successful books tend to use prepositions, determiners, nouns and function words. Unsuccessful ones skew towards quotation marks, punctuation and positive emotions (which in LIWC are closely related to affective processes).

This suggests that unsuccessful books may use shorter sentences (high punctuation rate), more dialogue (high quotation mark rate) and more adverbs, and are more emotional, particularly in positive emotions. Writers are frequently told by writing experts to avoid adverbs wherever possible.

Successful books by contrast tend to focus on the action – describing scenes and situations, hence the dominance of functional words, prepositions and articles. This makes them sound rather boring, but suggests that these bread and butter words are necessary to build a good story.

The LIWC data suggests that tone is the most reliable predictor of success. What isn’t answered is whether that is because it predominates in successful or in unsuccessful books, and whether positive or negative emotions are responsible. This is something to explore, though emotion and affect appearing in the top 3 for unsuccessful books suggests it lies with the unsuccessful books.

Having punctuation tags had some use, and machine learning performance was better with them. So even though they can be hard to interpret, punctuation tags are worth including in any machine analysis, but more work is needed to understand them.