The Grand Locus / Life for statistical sciences

the Blog

## Does science need statistical tests?

Some time ago, my colleague John asked for help with the statistics for one of his manuscripts.

“We have this situation where we knocked out a gene with CRISPR and I want to test if it affects viability. I know that you are supposed to use a non-parametric test when the sample is small, but I have heard that you can still use the t test if the variables are Gaussian. So now I am genuinely confused. Which test should I use?”

“I agree. It’s confusing. Why do you want to make a statistical test by the way?”

“Same as everyone. I want to know if the effect is significant. Plus, I’m hundred percent sure that the reviewers will ask for it.”

“I see. I will rephrase my question then. What decision do you have to make?”

“I can give you all the details of our experiments if you want, but I’m surprised. Nobody has ever asked me that before and I thought that experimental details do not really matter so much for a statistical test. So what kind of details do you need?”

“Nothing in particular. I just want to know whether you...

## One or two tails?

Here is a discussion that I recently had with my colleague John. He approached me with the following request:

“I sent a manuscript to Nature and it is going quite well. Actually the reviewers are rather positive, but one of them asks us to justify better why we used a one-tailed t test to find the main result. What should I write in the methods section?”

“It depends. Why did you use a one-tailed t test?”

“Well, we first tried the standard t test, but it was borderline significant. My student realized that if we used the one-tailed t test, the result was significant so we settled for this variant. We specified this clearly in the text, and I am now surprised that I have to justify it. Isn’t it just an accepted variant of the t test?”

“To be honest, I understand your confusion. The guidelines are rather ill-defined. Actually, Nature journals make it worse by requesting this information for every test, even for those that are only one-tailed like the chi-square.”

“OK, but what should I do now? For instance, how do you justify using a one-tailed t...

## Did Mendel fake his results?

You went to high school and you learned genetics. You heard about a certain Gregor Mendel who crossed peas and came up with the idea that there is a dominant and a recessive allele. You did not particularly like the guy because there would always be a question about peas with recessive and dominant alleles at the exam. But you grew up, became wiser and just as you started to like him, you heard from someone that he faked his data. You felt disoriented for a while, why annoy you with this stuff at school if it is wrong? But then you came to the conclusion that he just got lucky and that he was right for the wrong reasons. After all, he was just a monk on gardening duties, why would you expect him to understand anything about real science?

### Gregor Mendel

Gregor Mendel was a monk, but he was also a trained scientist. He studied assiduously for twelve years (including about seven years on physics and mathematics), to then become a teacher of physics and natural sciences at the gymnasium of Brno. He prepared his most famous experiment for two years, meticulously checking and choosing his...

## (Mis)using the KS test for p-hacking

Update: I have published a more academic version of this story in GigaScience, under the title The signed Kolmogorov-Smirnov test: why it should not be used. The reviewers (Garrett Jenkinson and Desmond Campbell) have pointed out that the t-test is more appropriate than the Wilcoxon-Mann-Whitney test as a replacement of the signed Kolmogorov-Smirnov test. They also mentioned that the signed Kolmogorov-Smirnov is short on power, which is yet another reason not to use it. A great thing about GigaScience is that reviews are open, so you can access the discussion in the pre-publication history of the article.

A colleague of mine (let’s call him John) recently put me in a difficult situation. John is a very good immunologist who, as nearly everybody in the field, had to embrace the “omics” revolution. Spirited and curious, he has taken the time to look more closely into statistics and he now has an understanding of most popular parametric and nonparametric tests. One day, he came to me with the following situation.

“I have this gene expression data, you see... I know that a gene is up-regulated, but it is just not significant...