Intuition: False Discovery Rate (with animations)

14. January 2023

I am currently setting up a lecture on multiple comparison correction (for related posts see here or here). In a nutshell: if you apply a statistical test that allows for 5% false positives (a ‘wrong’ significant finding) many, many times, say 100 times, you are more or less guaranteed to find at least one significant effect, because (assuming independent tests) $p(\text{at least one positive}) = 1 - (1-0.05)^{100} \approx 0.994$, i.e. 99.4%.
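A quick numerical check of this arithmetic, as a sketch in Python with NumPy. It assumes the 100 tests are independent and, anticipating the next section, that p-values are uniformly distributed when no effect exists. The seed and the number of simulated experiments are arbitrary choices.

```python
import numpy as np

alpha = 0.05    # per-test false-positive rate
n_tests = 100   # number of tests we run

# Analytic: probability that at least one of n independent H0 tests comes out significant
p_any = 1 - (1 - alpha) ** n_tests
print(f"analytic:  {p_any:.3f}")  # analytic:  0.994

# Monte Carlo check: under H0 each p-value is drawn from Uniform(0, 1)
rng = np.random.default_rng(0)
n_experiments = 10_000
pvals = rng.uniform(size=(n_experiments, n_tests))  # one row per experiment
p_any_sim = np.mean((pvals < alpha).any(axis=1))    # share of experiments with >= 1 false positive
print(f"simulated: {p_any_sim:.3f}")
```

The simulated share lands close to the analytic 0.994: almost every batch of 100 tests contains at least one false positive.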

False Discovery Rate (FDR) control is one way to deal with this. In this post I will give you a visual intuition for it rather than walk you through the math.

Note that FDR has a different goal than, e.g., Bonferroni correction: Bonferroni keeps the probability of getting even a single false positive across all tests at e.g. 5%, whereas an FDR-corrected set of p-values aims for e.g. 5% false positives among the significant ones only. Thus, in some sense, it is a less stringent correction.
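To make the difference concrete, here is a small sketch that, given p-values where we know which ones come from $H_0$ (only possible in a simulation), reports both quantities at a fixed threshold. The function `error_summary` and its arguments are hypothetical, made up for this illustration; with real data you never know `is_null`, which is why the corrections have to bound or estimate these quantities.

```python
import numpy as np

def error_summary(pvals, is_null, threshold):
    """Hypothetical helper: error summary at a fixed threshold, given simulated ground truth."""
    pvals = np.asarray(pvals, dtype=float)
    is_null = np.asarray(is_null, dtype=bool)
    significant = pvals <= threshold
    n_sig = int(significant.sum())
    n_false = int((significant & is_null).sum())  # significant although H0 is true
    return {
        # Bonferroni targets the probability of this being True (family-wise error)
        "any_false_positive": n_false > 0,
        # FDR targets the average of this proportion
        "false_discovery_proportion": n_false / n_sig if n_sig else 0.0,
    }

pvals   = [0.001, 0.002, 0.03, 0.04, 0.6]
is_null = [False, False, False, True, True]
print(error_summary(pvals, is_null, threshold=0.05))               # uncorrected
# {'any_false_positive': True, 'false_discovery_proportion': 0.25}
print(error_summary(pvals, is_null, threshold=0.05 / len(pvals)))  # Bonferroni
# {'any_false_positive': False, 'false_discovery_proportion': 0.0}
```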

Small excursion: p-value distributions

Imagine you simulate 1000 data sets without any effect in them and run a statistical test on each. If we take all 1000 p-values and visualize them in a histogram, it looks like this:

Each frame of the GIF is one run of the simulation. Note how all p-values are equally likely. The colors have no meaning.

You can see that the p-values are uniformly distributed (i.e. all p-values are equally probable) if no effect exists.
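The simulation behind such a histogram can be sketched as follows. The specific choices are assumptions for illustration, not necessarily what was used for the GIF: normally distributed data with 20 samples per data set, a one-sample t-test against 0 via SciPy's `scipy.stats.ttest_1samp`, and a crude text histogram with 10 bins.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_datasets, n_samples = 1000, 20

# 1000 data sets without any effect: the true mean is 0, so H0 holds everywhere
data = rng.normal(loc=0.0, scale=1.0, size=(n_datasets, n_samples))
# one one-sample t-test per data set (row), testing H0: mean == 0
pvals = stats.ttest_1samp(data, popmean=0.0, axis=1).pvalue

# crude text histogram: 10 bins of width 0.1, each should hold roughly 100 p-values
counts, edges = np.histogram(pvals, bins=10, range=(0, 1))
for lo, c in zip(edges[:-1], counts):
    print(f"{lo:.1f}-{lo + 0.1:.1f} {'#' * (int(c) // 5)}")
print("share below 0.05:", np.mean(pvals < 0.05))
```

The bars come out roughly equally long, and about 5% of the p-values fall below 0.05: exactly the false-positive rate the test promises.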

What happens if we introduce an effect? We would expect many more p-values smaller than 0.05, but still some greater than 0.05, just due to chance (how many depends on your noise and power).

P-values given a strong effect. Note the strong bias towards small p-values. This makes sense: since we simulated an effect, the data should be unlikely under $H_0$.

This is exactly what we see here.
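The same kind of sketch with an effect added. The true mean of 0.8 noise standard deviations and the 20 samples per data set are arbitrary assumptions; lowering either one reduces power and leaves more p-values above 0.05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_datasets, n_samples = 1000, 20
effect = 0.8  # assumed true mean, in units of the noise standard deviation

# every data set now contains a real effect, so H0 (mean == 0) is false
data = rng.normal(loc=effect, scale=1.0, size=(n_datasets, n_samples))
pvals = stats.ttest_1samp(data, popmean=0.0, axis=1).pvalue

counts, edges = np.histogram(pvals, bins=10, range=(0, 1))
print("p-values per 0.1-bin:", counts)                   # piled up in the first bin
print("share below 0.05:", np.mean(pvals < 0.05))        # the power of the test
print("share at or above 0.05:", np.mean(pvals >= 0.05)) # effects we miss by chance
```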

Let’s get back to FDR

Next, what happens if we mix the two, i.e. we get some false and some true positives? This is exactly the situation we started with! We want to control the proportion of false positives among our significant results, where some p-values stem from a true effect and others are members of the $H_0$ team. What we do in FDR is:

Estimate how many p-values can be attributed to the $H_0$ distribution. This is where the orange p-value set comes into play: those p-values arguably give us an $H_1$-uncontaminated calibration set to estimate the “height” of the uniform $H_0$ p-value distribution. In other words, we find out how many p-values we expect to be smaller than our threshold ($\alpha$) if no effect exists.
Estimate how many p-values can be attributed to the $H_1$ distribution. These are the ones over and beyond what we would expect under the “pure” $H_0$.
We allow for a certain proportion (usually 5%) of $H_0$ p-values among all significant p-values. Because $H_1$ p-values are expected to be on the small side, we can simply lower the threshold until this ratio fits.
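Written as formulas (my own notation, and assuming, as for the orange line in the figure, that $H_1$ p-values practically never exceed 0.5), with $R(t)$ the number of p-values at or below a threshold $t$:

$$\hat m_0 = \frac{\#\{i : p_i > 0.5\}}{1 - 0.5}, \qquad \widehat{\mathrm{FP}}(t) = \hat m_0 \, t, \qquad \widehat{\mathrm{FDR}}(t) = \frac{\widehat{\mathrm{FP}}(t)}{R(t)}$$

Here $\hat m_0$ is the estimated number of $H_0$ p-values (the height of the uniform part times its width of 1), $\widehat{\mathrm{FP}}(t)$ is the expected number of those below $t$, and $R(t) - \widehat{\mathrm{FP}}(t)$ is the estimated number of $H_1$ p-values below $t$. We then lower $t$ until $\widehat{\mathrm{FDR}}(t) \le 0.05$. This $\hat m_0$ is essentially Storey's $\pi_0$ estimate with $\lambda = 0.5$, multiplied by the number of tests.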

The estimated false discovery rate is the ratio of the estimated $H_0$ p-values below the threshold to all p-values below the threshold. The orange line is estimated here from the p-values > 0.5. Note how we need to reduce our $\alpha$ threshold from the typical 0.05 down to ~0.003 to reach 5% false positives in our set of significant p-values.
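A sketch of this threshold search on simulated data. The mix of 800 $H_0$ and 200 $H_1$ data sets, the effect size and the grid of candidate thresholds are all assumptions, so the threshold it finds will differ from the ~0.003 in the figure. Because the data are simulated, we can also compare the estimated false discovery rate with the true share of $H_0$ p-values among the significant ones.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_null, n_effect, n_samples = 800, 200, 20  # assumed mix of H0 and H1 data sets
means = np.r_[np.zeros(n_null), np.full(n_effect, 0.8)]
is_null = means == 0  # ground truth, only known because we simulate
data = rng.normal(loc=means[:, None], scale=1.0, size=(means.size, n_samples))
pvals = stats.ttest_1samp(data, popmean=0.0, axis=1).pvalue

# height of the uniform H0 part, estimated from p > 0.5 (the "orange line")
m0_hat = np.sum(pvals > 0.5) / (1 - 0.5)

def est_fdr(t):
    """Expected H0 p-values below t divided by all p-values below t."""
    n_sig = np.sum(pvals <= t)
    return m0_hat * t / n_sig if n_sig else 0.0

def true_fdp(t):
    """Actual share of H0 p-values among those below t (uses the ground truth)."""
    sig = pvals <= t
    return np.sum(sig & is_null) / np.sum(sig) if sig.any() else 0.0

print(f"t=0.05: estimated FDR {est_fdr(0.05):.3f}, true {true_fdp(0.05):.3f}")

# lower the threshold from 0.05 until the estimated FDR is at most 5%
thresholds = np.linspace(0.05, 0.0001, 500)
chosen = next((t for t in thresholds if est_fdr(t) <= 0.05), None)
if chosen is not None:
    print(f"t={chosen:.4f}: estimated FDR {est_fdr(chosen):.3f}, "
          f"true {true_fdp(chosen):.3f}, significant: {np.sum(pvals <= chosen)}")
```

Scanning down from 0.05 and stopping at the first grid point that passes is just the "lower the threshold until it fits" idea from above, not the procedure used in practice.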

This is not how FDR is calculated in practice (the standard approach is the Benjamini-Hochberg procedure); for that, have a look at this blog post by Matthew Brett. But at least it allowed me to visualize FDR in a way I could better intuit.
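For comparison, a minimal sketch of the Benjamini-Hochberg step-up procedure in Python/NumPy: sort the $m$ p-values, find the largest rank $k$ with $p_{(k)} \le \frac{k}{m} q$, and call the $k$ smallest p-values significant. The function name `benjamini_hochberg` and the toy p-values are my own; for real analyses, established implementations exist, e.g. `statsmodels.stats.multitest.multipletests(pvals, method="fdr_bh")` or, in recent SciPy versions, `scipy.stats.false_discovery_control`.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Return a boolean mask of the p-values that BH declares significant at FDR level q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                        # indices that sort the p-values
    ranked = p[order]
    critical = np.arange(1, m + 1) / m * q       # k/m * q for rank k = 1..m
    passing = np.nonzero(ranked <= critical)[0]  # 0-based ranks at or below their line
    significant = np.zeros(m, dtype=bool)
    if passing.size:
        k = passing.max()                        # step-up: take the largest passing rank...
        significant[order[: k + 1]] = True       # ...and everything with a smaller p-value
    return significant

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals).sum())                 # BH: 2
print(np.sum(np.asarray(pvals) <= 0.05 / len(pvals)))  # Bonferroni: 1
```

On this toy list, BH keeps 2 p-values, Bonferroni only 1, and an uncorrected 0.05 threshold would keep 5. Note the step-up rule: in `[0.02, 0.021, 0.03]` the smallest p-value alone misses its line ($0.02 > 0.05/3$), yet all three are significant because the largest one passes.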
