Comments on The UQ Psyc Blog - A Day in the Life of Psychology@UQ: It's only funny when it happens to someone else.

They are both intended as words of encouragement. ...

2011-10-19T16:28:49.140+10:00

They are both intended as words of encouragement. The first point is just that your adviser has a lot more experience than you, and has learned from it. S/he probably also didn't initially know about not using crap data, or how to "clean" data, until someone pointed it out to him or her. So don't be hard on yourself.

The second point is more about innovation. Young brains may think of some way of looking at data that hasn't been thought about before. Much of your post is about data analysis, so I was just pointing out that there's always room for innovation there.

Good post. Don't worry about making mistakes -...

2011-10-19T16:25:44.786+10:00

Good post. Don't worry about making mistakes - we all do it at some point. The main thing is to change your habits so that it can't happen again. For example, statisticians encourage making a habit of ALWAYS eyeballing all of your data (descriptive stats, histograms, stem plots, scatter plots, coplots, etc.) across all dimensions before running ANY statistical tests.
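
As a rough illustration of that habit, here is a minimal Python sketch that prints descriptive stats and a crude text histogram before any test is run. The reaction times, the 9999 "bad trial", and the helper names `describe` and `text_histogram` are all made up for the example.

```python
import random
import statistics

def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    ordered = sorted(values)
    q1, q2, q3 = statistics.quantiles(ordered, n=4)
    return {
        "n": len(ordered),
        "min": ordered[0],
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": ordered[-1],
        "mean": statistics.fmean(ordered),
        "sd": statistics.stdev(ordered),
    }

def text_histogram(values, bins=10, width=40):
    """Build a crude text histogram; returns (lines, counts per bin)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant data
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / span * bins), bins - 1)
        counts[idx] += 1
    peak = max(counts)
    lines = []
    for i, c in enumerate(counts):
        left = lo + span * i / bins  # left edge of this bin
        bar = "#" * round(width * c / peak)
        lines.append(f"{left:8.1f} | {bar} {c}")
    return lines, counts

random.seed(1)
# Made-up reaction times in ms, plus one obviously bad trial.
rts = [random.gauss(500, 60) for _ in range(200)] + [9999.0]

summary = describe(rts)
hist_lines, hist_counts = text_histogram(rts)
for key, value in summary.items():
    print(f"{key:>6}: {value:.1f}")
print("\n".join(hist_lines))
```

The lone bad trial squashes every other value into the first bin and drags the mean well above the median, which is exactly the kind of thing you want to see before a stats package hands you a p-value.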

I think what Bart is saying is that basing exploratory data analyses on finding statistically significant differences between the means or medians of samples relies upon making a LOT of assumptions about your data and the nature of the underlying process/relationship you're investigating (e.g. that it comes from a normal distribution).

Quite often, the shape of the probability distribution is actually more interesting than the actual value of the mean/median - i.e. is it a bell curve (normal distribution), exponential, uniform, or some other distribution? Is it a skewed variant of a typical distribution? Some distributions, such as the Cauchy, don't even have a defined mean - you can still calculate a "mean" from data, but it'd be meaningless, because it never settles down however much data you collect.
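
The Cauchy point is easy to check for yourself. The sketch below (Python, standard library only) draws standard Cauchy values with the inverse-CDF trick and compares the sample mean with the sample median as the sample grows. The seed and sample sizes are arbitrary choices, and `cauchy_sample` is just a helper name for this example.

```python
import math
import random
import statistics

def cauchy_sample(n, rng):
    """Draw n standard Cauchy values via the inverse-CDF method."""
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

rng = random.Random(42)
results = {}
for n in (100, 10_000, 100_000):
    xs = cauchy_sample(n, rng)
    # Store (sample mean, sample median) for each sample size.
    results[n] = (statistics.fmean(xs), statistics.median(xs))
    print(f"n={n:>7}  mean={results[n][0]:10.3f}  median={results[n][1]:7.3f}")
```

The median should home in on 0 (the centre of the distribution) as n grows, while the "mean" tends to jump around from run to run no matter how large the sample is, since a single extreme draw can dominate the whole sum.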

Here's a simple example - take the bimodal distribution. It's basically like a normal/bell curve, but with two peaks instead of one. You might get bimodal data if there are two ways some event/process can occur (e.g. a quick simple way versus a long complex way). The interesting thing is that the mean, median, and standard deviation are all "wrong" as summaries! The mean and median can land in the dip between the two peaks, where there is hardly any data. Tests that rely upon an assumption of normality (e.g. the two-sample t-test) are invalid for this type of data (even though you can easily throw it into a stats package without thinking and it'd happily calculate a p-value for you).

http://en.wikipedia.org/wiki/Bimodal_distribution
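
To make the bimodal example concrete, here is a small Python sketch. It assumes a hypothetical 50/50 mix of a "quick" strategy (around 300 ms) and a "slow" one (around 900 ms), and the numbers are invented purely for illustration. It checks how much of the data actually sits near the overall mean, and compares that with what a single bell curve with the same mean and standard deviation would predict.

```python
import random
import statistics

random.seed(7)
# Hypothetical mixture: a quick strategy (~300 ms) and a slow one (~900 ms).
quick = [random.gauss(300, 40) for _ in range(500)]
slow = [random.gauss(900, 40) for _ in range(500)]
data = quick + slow

mean = statistics.fmean(data)
sd = statistics.stdev(data)

# Observed share of data within 100 ms of the overall mean.
near_mean = sum(abs(x - mean) <= 100 for x in data) / len(data)

# Share a single normal curve with the same mean and sd would predict.
fitted = statistics.NormalDist(mean, sd)
expected_if_normal = fitted.cdf(mean + 100) - fitted.cdf(mean - 100)

print(f"mean={mean:.0f} ms, sd={sd:.0f} ms")
print(f"share within 100 ms of the mean: observed={near_mean:.3f}, "
      f"normal model={expected_if_normal:.3f}")
```

The mean comes out near 600 ms, a value almost no individual trial ever takes, whereas a normal model with the same mean and standard deviation would put roughly a quarter of the data there.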

By the way, a quick nitpick... "eliminating outliers above and below certain thresholds" is probably not the best way of explaining what you're doing for data cleaning (I hope not, anyway). Outliers should never be summarily deleted, since doing so changes the shape of the data (it's a bit like cherry picking). I'm sure what you're doing is fine - but it's worth being careful with semantics and terminology.
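
Here is a rough sketch of why a blanket threshold is risky, assuming made-up skewed data in which every value is legitimate. The "mean + 2 SD" cutoff is only one common rule of thumb, picked for illustration, and is not necessarily what was used in the post.

```python
import random
import statistics

random.seed(3)
# Made-up skewed data (e.g. waiting times); every value is legitimate.
data = [random.expovariate(1.0) for _ in range(2000)]

mean = statistics.fmean(data)
sd = statistics.stdev(data)
cutoff = mean + 2 * sd  # illustrative rule of thumb, not a recommendation

# Flag rather than delete, so the decision stays visible and reversible.
flagged = [x > cutoff for x in data]
kept = [x for x, is_flagged in zip(data, flagged) if not is_flagged]

print(f"flagged {sum(flagged)} of {len(data)} values above {cutoff:.2f}")
print(f"mean: all={mean:.2f}, after trimming={statistics.fmean(kept):.2f}")
print(f"sd:   all={sd:.2f}, after trimming={statistics.stdev(kept):.2f}")
```

Nothing in this data is an error, yet the cutoff removes the long right tail and shrinks both the mean and the spread. Keeping a flag column instead of deleting rows at least lets you report the results both ways.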

Not sure what your first point means; I made all t...

2011-10-18T19:32:35.886+10:00

Not sure what your first point means; I made all the errors I mentioned here, not him. He was the one who picked them up.

And I'm not sure what your second point is trying to say. Could you re-word it? I'll admit it's been a long day and my brain is halfway out my ears at present...

Nice post. Two thoughts. If after two decades yo...

2011-10-18T18:54:35.220+10:00

Nice post. Two thoughts. If after two decades your supervisor doesn't know about some basic issue of data "cleaning" (yes, we have to deal with completely unmotivated tools that only add noise), then you should get a new supervisor.

That's what I figured after being told to go do so.

Second, and more generally, your mentor may not go back to basics and always consider different, unpublished ways of looking at data and thinking about the patterns in it. This is why it's fun to have motivated students, because the job of new blood is to have new ideas, or at least stimulate them. Most people plot means or medians and throw out all of the information about the shape (the higher order moments). That's a lot of lost information, and it matters because chances are you're never actually looking at normally distributed data (gasp!). What you can do is actually plot all of your data, and look at the patterns it does (or doesn't) form. Then you think about how to account for the structure in your data, and what different analyses are appropriate, which depends on which aspects of the data you're trying to explain.
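
As a small illustration of what gets thrown away, the Python sketch below compares two invented samples that share roughly the same mean and standard deviation but have very different shapes. The helper `shape_summary` is a name made up for this example, and it uses the simple moment-based definitions of skewness and excess kurtosis (other definitions with small-sample corrections exist).

```python
import random
import statistics

def shape_summary(values):
    """Mean, sd, and moment-based skewness and excess kurtosis."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    z = [(v - m) / sd for v in values]  # standardised values
    skew = statistics.fmean([t ** 3 for t in z])
    excess_kurtosis = statistics.fmean([t ** 4 for t in z]) - 3.0
    return {"mean": m, "sd": sd, "skew": skew, "excess_kurtosis": excess_kurtosis}

random.seed(11)
n = 20_000
# Two made-up samples, both with mean ~1 and sd ~1.
bell = [random.gauss(1.0, 1.0) for _ in range(n)]
skewed = [random.expovariate(1.0) for _ in range(n)]

bell_shape = shape_summary(bell)
skewed_shape = shape_summary(skewed)
for name, s in (("bell", bell_shape), ("skewed", skewed_shape)):
    print(f"{name:>6}: mean={s['mean']:.2f} sd={s['sd']:.2f} "
          f"skew={s['skew']:.2f} excess kurtosis={s['excess_kurtosis']:.2f}")
```

A plot of means with error bars would show these two samples as near enough identical. The third and fourth moments (and, better still, a plain histogram) show that they are not.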

Thinking about data can be a lot more interesting than what is taught in most of the psych stat courses.