Mini Project 5: Advantages and Drawbacks of Using p-values

I think the authors mean that as the “statistical significance” rule for drawing conclusions is used less, it will become more important to effectively convey and understand why researchers are getting the results they are. All the research and experimentation won’t be summed up in a single value. Given their hypothesis and their thoughts on what is happening, researchers might use a p-value as a measurement of what happened, which they can then explain, but not as the explanation for a conclusion. In my mind, statistical thinking pertains more to using the data to explain what happened and to support a conclusion. That’s a very vague statement, but I mean that instead of saying the p-value is large so the results are not statistically significant, we can look at this value alongside the actual results and data and make a judgement that acknowledges the uncertainty and the context of the data, rather than just dismissing what we find because the p-value says it’s not significant.
They talk a lot about treating p-values as continuous values. Although the whole point of the editorial is that p-values shouldn’t be a judge of “significance,” the p-value is still a quantitative metric that can help give a sense of where the null hypothesis stands relative to the data. By dichotomizing the p-value into significance and insignificance, we essentially ignore what the data is actually saying and narrow everything down to a single boundary. The cut-off between what’s deemed statistically significant and insignificant is so hard that it leaves no wiggle room, and fractions of a decimal point can lead to completely opposite conclusions. The dichotomization can also lead to ignoring results that are still meaningful. If a p-value misses the cut-off by 0.001, it is deemed insignificant and not in support of the research question, but in reality there is only a marginal difference between it and a p-value 0.001 below the threshold, which would be deemed significant.
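To make the point about the hard cut-off concrete, here is a minimal sketch in Python. The two studies, their estimates, and their standard errors are hypothetical numbers chosen only for illustration, and the sketch assumes a simple two-sided z-test with a normal null distribution. It shows how two results that barely differ can land on opposite sides of the 0.05 threshold.

```python
import math

def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a z statistic under a standard normal null."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal CDF Phi.
    return math.erfc(abs(z) / math.sqrt(2))

ALPHA = 0.05  # the conventional (and arbitrary) cut-off

# Hypothetical studies: same standard error, nearly identical estimates.
studies = {"study_a": (1.97, 1.0), "study_b": (1.95, 1.0)}

p_values = {}
for name, (estimate, std_error) in studies.items():
    p = two_sided_p_from_z(estimate / std_error)
    p_values[name] = p
    label = "significant" if p < ALPHA else "not significant"
    print(f"{name}: estimate={estimate}, p={p:.4f} -> {label}")
```

The two estimates differ by only 0.02, yet the dichotomous labels are opposite. Reading the p-values as continuous quantities makes it clear that the two studies are telling almost the same story.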
I agree. In the case of “insignificance,” where the p-value misses the arbitrary threshold, it doesn’t mean that the results of the project were not meaningful or important. Also, for all we know there is “significance” and it’s just this p-value saying there isn’t. Relying on the threshold likely oversimplifies whatever actually happened. When deciding what results to present or highlight in publishing, I think the decision should be based more on how the results are explained and interpreted. Even in the case where the p-value supports the hypothesis, you should still have to explain what you found before attributing it to your hypothesis, rather than attributing it just because the number said so. When only “significant” results are published, other people can’t learn from whatever else was found. Results that (allegedly) don’t support your hypothesis are still results.
I agree. Research and data have different contexts, so they can’t all be looked at in the same way. Different areas of research (e.g., pharmaceuticals, psychology, economics) may have different boundaries for what should be considered data “in support” of your hypothesis, and even within these disciplines there will surely be variation. Different disciplines ask different research questions and deal with different data, so it makes sense that they should be handled in different ways. A one-size-fits-all approach would likely cause some loss of information or be too broad a generalization to be the best system.
I think statistical thoughtfulness means thinking carefully about what results mean and analyzing the data in context and with purpose, not just blindly using a p-value to determine your final conclusion. Even before looking at results, it can mean acknowledging limitations or variation in the research and using that in interpreting the results. Statistical thoughtfulness can be demonstrated in an analysis by being deliberate in your choices, such as how you design the research/experiment, how you pre-determine the type of results you’re looking for (i.e., what would support your hypothesis), and how you communicate the results. For example, relating back to the p-value, explaining your level of confidence in your results, instead of just saying that the p-value is small and therefore we were right, shows that you understand the results and can interpret them in context in a way that makes sense and is transparent.
I think the authors believe the problem is that these words sound somewhat deterministic. For example, saying we’re x% “confident” that the true value is in this interval may be misleading because the interval was created using a specific method on a specific set of data. Using words like significance and confidence makes it seem like we’re fairly certain that this is actually what is happening at the general level, and these words are often used without context. Instead of drawing hard conclusions (e.g., is significant/insignificant), you can say that, based on the data and the model, the interval/p-value/… is compatible with a quantitative effect of x. Using the word compatible encourages more of that statistical thinking over making dichotomous conclusions with a p-value. In general, I think that a terminology change would not be an immediate fix, but it may steer researchers towards more thought-out and contextualized conclusions.
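As a rough illustration of the “compatible” wording, here is a small Python sketch. The estimate and standard error are hypothetical numbers, the function name `compatibility_interval` is made up for this example, and the sketch assumes a normal model for the estimate. It computes the usual 95% interval but reports it as a range of effect sizes compatible with the data, given the model, rather than as a significant/insignificant verdict.

```python
from statistics import NormalDist

def compatibility_interval(estimate: float, std_error: float, level: float = 0.95):
    """Interval of effect sizes most compatible with the data under a normal model."""
    # Critical value of the standard normal, e.g. about 1.96 for level=0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error

# Hypothetical result: estimated effect 1.95 with standard error 1.0.
estimate, std_error = 1.95, 1.0
low, high = compatibility_interval(estimate, std_error)

print(
    f"Given the data and the model, effects from {low:.2f} to {high:.2f} "
    f"are most compatible with the observed estimate of {estimate}."
)
```

The interval here includes zero, but most of it lies well above zero. Describing the whole range says more about what the data can and cannot rule out than a single “not significant” label would.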
Section 5, paragraph 2: Why is eliminating the use of p-values as a truth arbiter so hard? “The basic explanation is neither philosophical nor scientific, but sociologic; everyone uses them,” says Goodman (2019). “It’s the same reason we can use money. When everyone believes in something’s value, we can use it for real things; money for food, and p-values for knowledge claims, publication, funding, and promotion. It doesn’t matter if the p-value doesn’t mean what people think it means; it becomes valuable because of what it buys.”

This section stood out to me because of the comparison to money. If you think about it, money is just a made-up concept that we use to trade for things. Objectively, money has no value. It doesn’t do anything on its own, but you can use it to do things you want because of its alleged value. It’s become this huge construct that basically runs the world because we follow the system. Way back before currency was a thing, people just traded directly for things, and then one day there was this coin being used instead. I’m sure it didn’t go smoothly the first time someone tried to get something of value using some random coins, since a metal disk doesn’t do anything. But wait, the seller can use those coins to get something else. How? Because we’re saying they’re valuable. The p-value has become accepted as this benchmark for whether your research “proved” what it was trying to prove because we’ve given it that value. As the whole editorial says, a p-value doesn’t actually mean that some hypothesis has been “proved,” but since we’ve given it that power, it is valuable. I can’t see society suddenly giving up money, so if we want to take power away from the p-value, it will probably have to be a gradual change.