## Does science need statistical tests?

By Guillaume Filion, filed under statistics, p-hacking.

• 14 October 2022 •

Some time ago, my colleague John asked for help with the statistics for one of his manuscripts.

“We have this situation where we knocked out a gene with CRISPR and I want to test if it affects viability. I know that you are supposed to use a non-parametric test when the sample is small, but I have heard that you can still use the *t* test if the variables are Gaussian. So now I am genuinely confused. Which test should I use?”

“I agree. It’s confusing. Why do you want to run a statistical test, by the way?”

“Same as everyone. I want to know if the effect is significant. Plus, I’m a hundred percent sure that the reviewers will ask for it.”

“I see. I will rephrase my question then. What decision do you have to make?”

“I can give you all the details of our experiments if you want, but I’m surprised. Nobody has ever asked me that before and I thought that experimental details did not really matter so much for a statistical test. So what kind of details do you need?”

“Nothing in particular. I just want to know whether you have to make a decision at all. It is a genuine question. Businessmen and gamblers have to decide where to put their money. MDs have to decide how to treat their patients. Once they have made a decision, they cannot undo it. But you are a scientist, so what is it that you will not be able to undo?”

“Wait! Statistical tests have been the standard in the field for decades now. We use them to decide whether an effect is significant. Everybody does this all the time.”

“I know. And would you say that science is stuck forever with the result of a test after it is published?”

“Of course not. Thank God! Mistakes happen and the beauty of academic research is that we can correct them. The authors or other researchers publish conflicting results and the consensus rules.”

“That’s exactly my thought. Now, how is this different from estimation?”

“What do you mean?”

“If scientific research is about building knowledge that is forever refined, how does it differ from the process of estimation?”

“I suppose it’s very similar... Yes. But I don’t see where you are going.”

“I just mean that what you are searching for is an estimator, not a test.”

“...”

“You are interested in the viability upon CRISPR knockout. Why not measure it and give some confidence bounds?”

“How will I be able to ascertain that the viability is not affected then?”

“That’s not what a test would allow you to do anyway. But isn’t that what the difference with the control would tell you?”

“I mean, how would I know that the difference is not *statistically significant*?”

“I don’t think it is possible to answer this question. Either your estimate of the viability is clearly close to the control, or it is not. You are the expert, so you must know what is close enough.”

“This seems completely *ad hoc*. Are you sure it is rigorous?”

“Do you mean conventional or rigorous?”

### Science is an estimation process

I am old enough to remember a time when statistical tables were still widely used in research. Since then, statistical software has become available, allowing the routine computation of p-values. This was an improvement, as we were no longer stuck with significance levels of 5%, 1% and 0.1%. But p-values have since gotten out of control.

The American Statistical Association had to make a statement in 2016 about the meaning of p-values (and more importantly about what they do not mean), recognizing that they are often misused. Brilliant people have expressed their opinions, in this post on Cross Validated for instance, but there is no better solution at the moment.

It occurred to me recently that I have a bigger issue with tests than with p-values. A statistical test at the 5% level is a protocol that guarantees that you make the wrong decision no more than 5% of the time when your null hypothesis is true. How does that matter if you can always change your mind when you get more data?
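To make that guarantee concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration: the function name `false_positive_rate`, Gaussian measurements with known unit variance, five replicates per group, and a two-sided z-test instead of whichever test one would use in practice. Both groups are drawn from the same distribution, so the null hypothesis is true by construction and every rejection is a wrong decision.

```python
import random
from statistics import NormalDist, mean


def false_positive_rate(n_experiments=10_000, n_per_group=5, alpha=0.05, seed=0):
    """Fraction of simulated null experiments in which a two-sided z-test rejects."""
    rng = random.Random(seed)
    # Critical value of a two-sided test at level alpha (about 1.96 for 5%).
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error of the difference of two means, assuming known unit variance.
    se = (2 / n_per_group) ** 0.5
    rejections = 0
    for _ in range(n_experiments):
        # Control and knockout come from the same distribution: the null is true.
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        knockout = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z = (mean(knockout) - mean(control)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_experiments


rate = false_positive_rate()
print(f"Rejected a true null in {rate:.1%} of simulated experiments")
```

The rejection rate should land close to the chosen level. That is all the protocol promises: it controls how often you are wrong when there is no effect, and says nothing about how large an effect is when there is one.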

Science is much more an estimation process than a decision process. This is the way physicists seem to understand it for instance, where the emphasis is on measurements and their uncertainties. So why not leave tests to practitioners and use estimation methods more often in scientific research?
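As a sketch of what reporting an estimate could look like in John's case, the snippet below computes the difference in mean viability between knockout and control, with a percentile bootstrap interval around it. The viability numbers are made up for illustration, the function name `bootstrap_diff_ci` is hypothetical, and the percentile bootstrap is only one of several ways to get confidence bounds. With so few replicates it is a crude one, so take the interval as indicative.

```python
import random
from statistics import mean


def bootstrap_diff_ci(control, knockout, level=0.95, n_resamples=10_000, seed=0):
    """Difference in means (knockout - control) with a percentile bootstrap interval."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping the group sizes.
        c = rng.choices(control, k=len(control))
        k = rng.choices(knockout, k=len(knockout))
        diffs.append(mean(k) - mean(c))
    diffs.sort()
    # Cut the sorted bootstrap differences at the lower and upper percentiles.
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    estimate = mean(knockout) - mean(control)
    return estimate, diffs[lo_idx], diffs[hi_idx]


# Made-up viability fractions (live cells / total cells), for illustration only.
control = [0.91, 0.88, 0.93, 0.90, 0.89, 0.92]
knockout = [0.84, 0.80, 0.87, 0.79, 0.85, 0.82]

estimate, low, high = bootstrap_diff_ci(control, knockout)
print(f"Change in viability: {estimate:+.3f} (95% interval {low:+.3f} to {high:+.3f})")
```

The output is a number with its uncertainty, not a verdict. Whether a change of that size matters is left to the person who knows the biology.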

Computers are here to stay. If they allowed the spread of p-values, we may as well use them to spread estimation methods.
