Statistics: Cluster Permutation Test
I sometimes give improvised lectures to a small circle of students. These are some of the explanations I use.
The goal
Find time-points in a time series that differ between two conditions over subjects without performing single independent tests for each time-point.
Step by Step Guide
In this tutorial, we apply the cluster-permutation test to a single time-series. This could be a single channel, an average over channels, or one “source/component” from an ICA decomposition.
We can make use of the neighbour-relation of time: $t$ has the neighbors $t-1$ and $t+1$. We don’t expect the time points to be independent, but that they have a relationship which we can exploit (for multi-channel data, we could include the neighbour-relation between channels as well, but let’s not get ahead of ourselves).
In our example we use ICA component activation because a) this was the student’s problem, b) we do not need to take into account several channels which were recorded at the same time (but we could!). Of course the algorithm is independent of the signal: We could use ERPs, Bandpass-Filtered-Frequency Amplitudes, Image-Pixel-Grayvalues or many others.
1. Extract the component activation over time
We take the EEG data (time x sensor) and multiply it with a mixing matrix (the component map we extracted through ICA). We receive a one dimensional IC activation profile over time. In our case, we have trials of two conditions (A and B) shown as a continuous signal (only one trial of each condition is shown here, usually you have more than two trials).
2. Get the trials for multiple subjects
Repeat Step 1. for each subject. We calculate the difference between the two conditions A and B (a within subject comparison). Thus, we get difference values for each subject over time (purple).
3. Test-Statistics (T-Value)
As a test statistics, we could use the mean, but we prefer the t-statistic because it punishes high variance between subjects. t-values are defined by:
$$t = \frac{\bar{x}}{\frac{\sigma}{sqrt(n)}}$$
Where $t$ is the t-value, $\bar{x}$ is the mean difference, $\sigma$ the standard deviation and $n$ the number of subjects. Intuitively, the more subjects we have or the bigger the difference (marked in blue), the bigger the t-value. The smaller the variance, the bigger the t-value (marked in green). Very colloquial: a scale-free measure of how sure we are there is an effect.
A visual help is in the next figure.
4. Clusters over time
We would rather not do a statistical test for each time-point individually because we would need to correct for multiple comparison for all time points. Instead, we use a trick: Let’s define clusters by an arbitrary threshold and test whether these clusters are larger clusters that occur by chance.
The arbitrary threshold we use is at p=0.05 (which directly corresponds to two t-values when given the number of subjects, e.g. for 15 subjects the t-values corresponding to p=0.05 are 2.14 and -2.14). In our example, this leads to a single positive cluster, but of course, multiple cluster of different sizes could have formed depending on the IC activation over time.
As a statistic, we could use the number of samples the cluster extends, the summed t-value, or many other statistics. We use cluster-mass, which is the sum of the t-values.
5. Permutation of data
We now want to estimate how big clusters would be, if there would be no differences between the conditions A and B. This would mean that the clusters formed just by chance (this is our $H_0$ distribution of cluster sizes). To do this, we shuffle the condition-label for each subject. The idea is that if there is no difference between the conditions, the labels are meaningless and therefore shuffling them would yield similar results as before. Note that they are similar, not identical. We thus try to estimate how big the variability of these similar results are, and whether our observed value falls into the variability, or whether i is special.
Note that we actually do not need to go back to the two conditions, but we could just flip (multiply by -1) randomly every subject-difference curve.
So we shuffle and calculate the cluster size.
And shuffle and calculate the cluster size(s) again (take the largest one if there are multiple)
6. Check the tail
We now check whether our observed cluster mass ( Step 4.) is greater than 95% of what we would expect by chance ( Step 5. ). The exact value gives us the p-value of the cluster, the probability that cluster-mass with the observed (or more extreme) size would have occurred when there was no actually difference between the conditions. If we had initially observed multiple clusters, we can check each against the same distribution.
We have two exemplary distributions here: in the left one, we would accept the notion, that the observed cluster mass could have appeared by chance ( p > 0.05). In the second case, we would reject $H_0$, the observed cluster mass is unlikely to come from random chance alone.
And that’s how you do cluster permutation statistics.
For some references, see:
Fieldtrip Cluster Permutation Talk
Fieldtrip FAQ, read this!
Maris: Statistical testing in electrophysiological studies
Maris, Oostenveld: Nonparametric statistical testing of EEG- and MEG-data.
Thanks to Tim C Kietzmann & José Ossandon for teaching me this some (long ;)) time ago. Thanks to Silja Timm for preparing the graphics and Anna Lisa Gert for comments on the text!