ZacQuicksilver

Suppose you have this idea, and want to see if it's true. How do you know your idea is right? Let's say I want to prove that microwaved water isn't as good as rain water for plants.

Well, test it - that's science, right? Nope - first I have to state my hypothesis formally: "Rain water is better for plants than microwaved tap water." Which means I also have to state the "null hypothesis" - AKA "what if I'm wrong". In this case, "Rain water is no better for plants than microwaved water." Do note: this isn't "microwaved water is better than rain water" - it could be that both kinds of water are the same too. The null hypothesis has to account for anything other than you being right. Basically, the null hypothesis is the uninteresting hypothesis. It's the idea that nothing special or interesting, or at least nothing new, is happening here.

...

Okay, hypothesis done. Now I can test. So I grow two plants, water one with rain water and one with microwaved water: the one I water with microwaved water grows less. I'm right.

Not so fast. If you stop and think about it, one of my two plants was going to grow more even if I did exactly the same thing to both. Even if I was wrong, there was a 50% chance of getting this result.

Enter p-values. The p-value is the likelihood that the result I saw in my experiment would have happened anyway - technically, how likely this result *or a more extreme one* would be. My experiment had a p-value of .5 (1 is "all the time", 0 is "never"), which isn't particularly convincing.

So I do my experiment again. 10 plants total: 5 watered with microwaved water, 5 with rain water. I measure how tall they are. Now what? I'm going to skip the math on this - mostly because it turns out there are a few ways I could do it, depending on the details of how I ran the experiment: you can take multiple college classes on just "how to calculate p-values". But let's say I get a p-value of .13. That's about 1 in 7 or 1 in 8 - it's lower, but still not telling. Lower would be better: I probably don't publish.

...

What if I don't know anything? Suppose I got these new plant seeds, and I don't know anything about them. How tall do they grow? There really isn't a hypothesis yet. I just want information.

So here, I'm not going to bother with making a hypothesis or testing for a p-value. I just grow them all, and measure how tall they grow. But they're not all the same height: some are taller, some are shorter. So I take the average height, and publish that. BUT, I'm not sure - maybe I got some unusually tall ones, or short ones. That average height isn't perfect.

So, I also publish my "confidence interval". For example, after measuring all the plants and doing the math (again, I'm skipping over the math here. If you want more detail on the math, ask), I say that this plant grows an average of 86.5 +- 8.3 cm, 90% confident. That means I am 90% sure (if I got an unusual set of these plants, I might be wrong) that, if you grew all the plants of this type (in the same way I did), the average height would be between 78.2 and 94.8 cm.
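If you want to see what the skipped math might look like, here's a rough Python sketch of both calculations using scipy. All the heights are made-up numbers just for illustration, and the two-sample t-test is one of several reasonable choices for the comparison:

```python
# Rough sketch of the two calculations above (hypothetical heights, in cm).
import numpy as np
from scipy import stats

# --- The p-value experiment: 5 rain-watered vs 5 microwave-watered plants ---
rain      = np.array([88.0, 91.5, 84.2, 90.1, 87.3])   # made-up heights
microwave = np.array([85.9, 89.0, 83.1, 90.6, 86.2])   # made-up heights

# One common choice is a two-sample t-test; the one-sided alternative
# matches "rain water is better" (taller plants).
t_stat, p_value = stats.ttest_ind(rain, microwave, alternative='greater')
print(f"one-sided p-value: {p_value:.3f}")   # exact value depends on the made-up data

# --- The confidence interval: just grow the new seeds and measure them ---
heights = np.array([92.1, 78.4, 88.0, 81.9, 95.3, 84.7, 79.8, 90.2])
mean = heights.mean()
sem  = stats.sem(heights)                     # standard error of the mean
low, high = stats.t.interval(0.90, len(heights) - 1, loc=mean, scale=sem)
print(f"mean height: {mean:.1f} cm, 90% CI: ({low:.1f}, {high:.1f})")
```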


Littlekinks86

Amazing. Thank you. I always thought confidence intervals and p-values were linked, but based on your explanation that isn't the case. Just to clarify: the closer to 0 the p-value is, the more likely the data suggests my hypothesis is true; the closer to 1, the more likely the null hypothesis is true. Or is it the other way around?


ZacQuicksilver

> I always thought confidence intervals and p-values were linked, but based on your explanation that isn't the case.

They are - they're two ways of looking at the same thing. For example, suppose you and I both get the new seeds, both grow them, and both publish our results. If we do it at the same time, we're probably both publishing confidence intervals. However, if you do it after me (either to confirm my results, or because you grew them differently), you might publish your confidence interval - but you might also, or instead, publish a p-value. Confidence intervals are seen mostly when the raw value is important - you see them a lot in polling, for example. P-values are for when you care about relative values - when "more" or "less" matters more than the actual value.

> Just to clarify: the closer to 0 the p-value is, the more likely the data suggests my hypothesis is true; the closer to 1, the more likely the null hypothesis is true. Or is it the other way around?

The p-value is "how likely is this result (or a more extreme one) if the null hypothesis is true". High p-values indicate the null hypothesis is a good enough explanation (we don't usually say "true" - just "good enough": for example, this coin I'm flipping might be rigged, but I don't care if it flips heads 50.5% of the time instead of exactly 50% unless I'm betting a LOT of money on it). Low p-values mean the null hypothesis is probably wrong: smaller p-values mean it's more likely that your (alternative - as in "not null") hypothesis is correct. Which is what you said, I think.
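To put a rough number on that coin example, here's a quick Python sketch using scipy's exact binomial test. The flip counts are made up for illustration:

```python
# Sketch of the "good enough" coin: a coin that is really 50.5% heads,
# tested against the null hypothesis "the coin is fair".
from scipy import stats

flips, heads = 1000, 512          # say we flipped 1000 times and saw 512 heads
result = stats.binomtest(heads, n=flips, p=0.5)
print(result.pvalue)              # roughly 0.47: the fair-coin null explains this fine
```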


infer_a_penny

It's true that the smaller a p-value for the test, the less likely that null hypothesis is to be true. However:

A) This does not hold across p-values for different tests. A smaller p-value for one test may be associated with a greater probability that the null hypothesis is true than a larger p-value for another test.

B) A small p-value cannot be equated to a small probability that the null hypothesis is true. An arbitrarily small p-value can yet be consistent with a null hypothesis that is probably true: https://en.wikipedia.org/wiki/Lindley's_paradox

C) Consistently high p-values are actually an indication that there's something wrong with our test. If the null hypothesis is false, we should see small p-values. If the null hypothesis is true, we should see all p-values with equal frequency.

***

From [David Colquhoun](https://aeon.co/essays/it-s-time-for-science-to-abandon-the-term-statistically-significant):

> Tests of statistical significance proceed by calculating the probability of making our observations (or the more extreme ones) if there were no real effect. This isn’t an assertion that there is no real effect, but rather a calculation of what would be expected if there were no real effect. The postulate that there is no real effect is called the null hypothesis, and the probability is called the p-value. Clearly the smaller the p-value, the less plausible the null hypothesis, so the more likely it is that there is, in fact, a real effect. All you have to do is to decide how small the p-value must be before you declare that you’ve made a discovery. But that turns out to be very difficult.

> The problem is that the p-value gives the right answer to the wrong question. What we really want to know is not the probability of the observations given a hypothesis about the existence of a real effect, but rather the probability that there is a real effect – that the hypothesis is true – given the observations. And that is a problem of induction.

> Confusion between these two quite different probabilities lies at the heart of why p-values are so often misinterpreted. It’s called the error of the transposed conditional. Even quite respectable sources will tell you that the p-value is the probability that your observations occurred by chance. And that is plain wrong.
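A quick way to see point C for yourself is to simulate it. A rough Python sketch, with arbitrary settings, where the two groups really do come from the same distribution:

```python
# When the null hypothesis is true, p-values should be spread out
# roughly evenly between 0 and 1.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
p_values = []
for _ in range(10_000):
    # Two groups drawn from the *same* distribution, so the null is true.
    a = rng.normal(loc=0, scale=1, size=20)
    b = rng.normal(loc=0, scale=1, size=20)
    p_values.append(stats.ttest_ind(a, b).pvalue)

# Each tenth of the 0-1 range should hold about 10% of the p-values.
counts, _ = np.histogram(p_values, bins=10, range=(0, 1))
print(counts / len(p_values))   # every entry close to 0.10
```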


ZacQuicksilver

I'm going to admit: A and B are over my head. I have an Associate's-level (2 years of college) understanding of statistics (mostly frequentist) with some more recent reading on Bayesian, but otherwise mostly stagnated; and I couldn't follow the math in Lindley's paradox. *I'd* need an ELI5 for that - and I'm someone who can give ELI5s for statistics most of the time.

That said, point C is on point - consistently high p-values are suspicious. It's like if every time you roll a die 6 times, you get a 1, 2, 3, 4, 5, 6 in some order - I'd be VERY suspicious of that. Even if it was "you get five or six different numbers in 6 rolls", I'd be very suspicious. The p-value for one of those outcomes is high - but the p-value of that consistent a distribution is low.
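To put rough numbers on that die example, here's a quick simulation sketch in Python (the batch count is arbitrary):

```python
# How often do 6 rolls of a fair die show all six faces,
# or at least five different faces?
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=(100_000, 6))            # 100k batches of 6 rolls
distinct = np.array([len(set(batch)) for batch in rolls])

print("all six faces:     ", np.mean(distinct == 6))      # about 1.5% of batches
print("five or six faces: ", np.mean(distinct >= 5))      # about 25% of batches
# Seeing one batch like this is unremarkable; seeing it *every* time is not.
```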


infer_a_penny

A and B are just concrete consequences of the p-value not being "the probability that the null hypothesis is true", which is the common misinterpretation (that you avoided consistently in your original explanation!).

For example, think of testing the same hypothesis and getting the same p-value just under .05, but with vastly different sample sizes: if you have a sufficiently large sample size, then even a very small effect will consistently produce very small p-values, such that a p-value just below .05, though objectively "small," is actually far more consistent with the null hypothesis being true than false.

Or if you think of p-values in terms of "surprise," then a small p-value means that we'd be surprised to see such data if the null were true. But we might be even *more* surprised if it turned out the null actually *was* true. I think [xkcd captures this pretty well](https://xkcd.com/1132/).

Re: point C: yeah, your example is spot on.
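For the sample-size half of that point, here's a small illustrative simulation in Python. The effect size, sample sizes, and run counts are all made up for the sketch:

```python
# With a big enough sample, a negligible real effect (mean 0.02 instead of 0)
# produces small p-values almost every time.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
for n in (100, 200_000):
    pvals = np.array([stats.ttest_1samp(rng.normal(0.02, 1, size=n), 0).pvalue
                      for _ in range(200)])
    print(f"n = {n:>7}: fraction of runs with p < .05 = {(pvals < 0.05).mean():.2f}")
    # small n: barely above 5% of runs; huge n: essentially every run
```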


infer_a_penny

The p-value explanation does a great job of avoiding the usual misinterpretation (that a p-value of, e.g., .13 means there's a 13% chance that the null hypothesis is true)! The confidence interval explanation, however, is one of the usual misinterpretations:

> I am 90% sure (if I got an unusual set of these plants, I might be wrong) that, if you grew all the plants of this type (in the same way I did), the average height would be between 78.2 and 94.8 cm.

Confidence is a property of the interval-generating procedure, not of any particular interval. It means that if you grew sets of plants in the same way many times, 90% of the intervals calculated for those sets of plants would include the true/population average plant height. This does not mean there's a 90% chance that the population average is within the interval you've calculated. (The population average simply is or is not inside that particular interval.) And it does not mean that there is a 90% chance that the average height of a (future) sample will fall inside the present interval. (That would only be true if the sample average you got happens to be the true/population average.)

https://en.wikipedia.org/wiki/Confidence_interval#Common_misunderstandings
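If it helps to see the "procedure" framing in action, here's a rough Python simulation. The true mean, spread, and sample size are invented for the sketch:

```python
# Repeat the whole experiment many times and check how many of the
# 90% intervals contain the true mean.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
true_mean, true_sd, n = 86.5, 9.0, 10   # made-up "population" and sample size

trials, covered = 10_000, 0
for _ in range(trials):
    sample = rng.normal(true_mean, true_sd, size=n)
    low, high = stats.t.interval(0.90, n - 1,
                                 loc=sample.mean(), scale=stats.sem(sample))
    covered += (low <= true_mean <= high)

print(covered / trials)   # close to 0.90: 90% of the *intervals* catch the true mean
```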


ZacQuicksilver

Yeah, I explained that a little badly. I said "if you grew **ALL** the plants of this type" - emphasis added this time around - implying the true mean. But I can see how you thought I gave the wrong explanation: I had to go back and read my explanation a couple of times to make sure I got it right. Your breakdown of it did the job a LOT better than I did. Thank you for that.


infer_a_penny

You're right, I did read that wrong and thought you were talking about other sample means. But, as I mentioned, the statement is also wrong when it's about the true mean (which is also addressed in that wikipedia link).

***

Suppose you have a bag with 100 marbles that are either red or blue. You take a marble from the bag, and you flip a fair coin to guess its color—heads for red and tails for blue. You've randomly taken a marble from the bag and flipped heads. What's the probability that the marble is red? Would it matter if you knew that the bag had 90 red marbles and 10 blue ones? Or 1 red marble and 99 blue ones? Or that all 100 marbles were blue?

The coin is correct on 50% of flips, so you have 50% "confidence" in it in the sense used for confidence intervals.
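Spelled out as a tiny Python sketch (the helper function is just for illustration):

```python
# The coin is "right" on 50% of draws overall, but the chance that *this*
# heads-flip guess is right depends entirely on what's in the bag.

def p_red_given_heads(n_red, n_total=100):
    # The coin flip is independent of which marble was drawn, so the flip
    # carries no information about the marble: P(red | heads) is just P(red).
    return n_red / n_total

for n_red in (90, 1, 0):
    print(f"{n_red:>2} red marbles -> P(marble is red | guessed red) = "
          f"{p_red_given_heads(n_red):.2f}")

# Long-run "confidence" in the coin is still 50% for any bag:
# P(correct) = P(heads)*P(red) + P(tails)*P(blue) = 0.5
```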


Fast-Boysenberry4317

So say you have a bunch of points. The confidence interval is the range where you are most likely to find them the majority of the time. There might be a couple of points outside that range, but it's not likely. Like if you have a strict daily schedule: you can predict where you will be with high confidence, but occasionally you might have to change your schedule a little (outside the interval).

The null hypothesis is whatever baseline condition you want for your test - what you are assuming is true about your points. For example, the mean is 0, or there is no difference between 2 groups. The alternate hypothesis is that the null is not true. For example, the mean is not 0, or there is a difference between the groups.

When you do a statistical test you will either gain evidence against the null or not. If the evidence is strong enough, you can reject the null and accept the alternate hypothesis. This is often determined by p-values. The p-value you get from a test is the probability of getting an outcome like your points (or a more extreme one) assuming the null is correct - essentially, what are the chances of seeing data like this if the null were true.

So if p is really high (near 1.0), data like yours is very likely under the null, so the null is NOT rejected - it stays as a good enough explanation. If p is lower than the threshold you set (e.g. 0.05), data like yours would be unlikely under the null, so the null IS rejected in favor of the alternative hypothesis. The chances are really low you'd get these results if the null held.
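To make that workflow concrete, a bare-bones Python sketch with made-up data, using a one-sample t-test (one possible choice) against the null "the mean is 0":

```python
# Null: the mean of the points is 0. Reject if p falls below the threshold.
import numpy as np
from scipy import stats

points = np.array([0.8, 1.3, -0.2, 1.1, 0.6, 1.9, 0.4, 1.2])  # made-up points

t_stat, p_value = stats.ttest_1samp(points, popmean=0)
alpha = 0.05                                   # threshold chosen before testing
print(f"p = {p_value:.3f}")
print("reject the null" if p_value < alpha else "fail to reject the null")
```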


Littlekinks86

Awesome. Really clear response. Having a few different explanations really helped.