Reviews of reviews, reviewed

I’m very pleased to have passed my data analysis and statistical inference course. It’s just a shame that what I learned shows one of my previous posts was wrong.

First, why I’ve written this: it’s about statistical inference. To give a very short, very simplified view, statistical inference is a way of drawing conclusions about a larger population from a smaller sample of it.

Say we wanted to work out the probability of a drug test giving a false positive, or the proportion of university-educated men who think a woman’s place is solely in the home. You could test or ask everyone, or you could ask a few and use their answers to make a prediction about (or infer) the wider picture.
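As a rough illustration of what that inference looks like in practice, here is a minimal Python sketch (not taken from the course, which used R) that estimates a population proportion from a sample and puts a 95% confidence interval around it. It assumes a simple random sample and uses the basic normal-approximation (Wald) interval; the function name and the survey numbers are hypothetical, invented purely for illustration.

```python
import math


def proportion_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a population proportion.

    z = 1.96 gives roughly 95% coverage. The approximation is only reasonable
    when n * p_hat and n * (1 - p_hat) are both at least about 10.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n                    # sample proportion: our best guess for the population
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of that guess
    return p_hat - z * se, p_hat + z * se


# Hypothetical survey: 60 of 400 people sampled answer "yes".
low, high = proportion_ci(60, 400)
print(f"estimate 0.150, 95% CI roughly {low:.3f} to {high:.3f}")
```

With 60 ‘yes’ answers out of 400, the sample proportion is 0.15 and the interval runs from about 0.115 to 0.185: a statement about everyone, made from a few hundred people.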

That’s what I tried to do with a script analysis. Unfortunately, in my enthusiasm for the course, I did something very wrong: I went with what seemed to work rather than what I could prove.

I wanted to analyse scripts to make an inference, and Excel has statistical tools, so it seemed a simple case of feeding the data in and getting an answer out. Except only one tool gave a good answer. But it was called ‘correlation test’, so surely it would show how things matched, or ‘correlated’?

Not really. Now that I’ve worked with R, a statistical programming environment better suited to these tests than Excel, and passed my course with distinction (I earned it, so I’m going to brag), I know what I should have done.

I used too small a sample, and my tests were too arbitrary. I might have got away with a small sample, or even an arbitrary test, but I used too many poor techniques at once. So I won’t repeat the experiment.

But it’s a good start. More importantly, it’s shown me what can be analysed, which makes it worthwhile. I’ll be starting that analysis over Christmas.

And if you want to learn this for yourself, I highly recommend the Duke University online course on Coursera. You can sign up for the next run of the data analysis and statistical inference course.