New lab in Stuttgart & two 100% PhD positions

I will be starting a new lab on Computational Cognitive Science, next month at University of Stuttgart. I am looking for two PhD students working on the connection of EEG and Eye-Tracking. and methods development.

SimTech Building, three brains (not to scale) are arranged in the ponzo illusion
SimTech Building in Stuttgart

What can I offer?

  • A 100% TVL-13 position (calculate your monthly income here, increases after one year and after three years) for 3.5 years
  • A small lab with direct supervision
  • An “extended” lab with Andreas Bulling (e.g. we hold group meetings together)
  • High scientific rigor
  • Family-friendly and all-welcoming lab athmosphere
  • Travel money to at least 1 large conference and one summer school/year (depends on actual costs and my actual budget available, but this is what I plan/aim for)
  • Childcare budget for conferences
  • Flexibility for own projects (a plan exists, but we can deviate based on your own ideas)
  • Flexibility for a summerstay/internship in another lab (in Germany or abroad)
  • After 2 years: An honest discussion whether and how to prepare for academia or industry in the remaining time
  • Contact to students I formerly supervised
  • Some cool equipment relating to VR, 3D printing, motion tracking etc.
  • Flexible starting date, September at the earliest (please indicate in your cover letter when you would like to start)

What should you offer?

This is a list of what you *should* offer, but there is no need to offer all of those skills – there is lot of time to learn them during your PhD 🙂

  • A genuine interest in scientific advances
  • Scientific integrity, confidence to question authority (as “science is the belief in the ignorance of experts”), and the ability to stand up for good science
  • Basic coding skills (Julia/Python/R/Matlab preferred over C/C++/Java etc.) – we will improve those in your PhD!
  • Best case: Timeseries data-analysis experience (EEG, Eye-Tracking preferred over SCR, fMRI, fNIRS etc.)
  • Best case: Statistic skills (knowledge of multiple regression & mixed models is a plus!)
  • Background in neuroscience, cognitive science, statistics or related field
  • You should be prepared to do some teaching (~1 course/semester) and potentially some BSc/MSc supervision
  • Excellent command of English in writing and speach. German is not required, but a plus if working with participants
  • Being kind and open

What is the procedure?

  • You apply with
    • CV
    • Transcript
    • Master’s thesis (when the thesis is not completed yet, please send me another relevant and recent report/paper/essay and tell me when you will finish your thesis)
    • Link to code example (e.g. github link to an analysis you have done, a program you wrote, a pull-request you made)
    • Short cover letter (max. 1 page)
  • Please send PDFs (e.g. lastname_cv.pdf, lastname_transcript.pdf) in a single email to Please indicate starting date and which position (“Physiology”, “Analysis” or both) you’d like to apply for
  • If I see you as a potential candidate, I will schedule a jitsi online interview
  • I might require a small coding challenge (depending on how illuminating the provided code examples are)
  • After some time, I will decide for a shortlist and invite you back for a presentation on your thesis-topic
  • If all goes well, we will start a journey together 🙂

Position “Physiology”

We will be looking into modelling of EEG activity from eye-movement related parameters (extending on Additionally we will combine model-based analyses with EEG data. If you have interests and experience with Eye-Tracking, EEG, data collection, multiple regression analysis, then these skills are certainly helpful for this position.

Position “Analysis”

We will push the multiple regression framework, which is gaining much traction lately (“unfold toolbox”, to the next level. If you have interests and experience in EEG data analysis, multiple regression, GAMs, statistics, mixed models, then these skills are certainly helpful for this position

The physiology position will focus on the application of methods, while the analysis position will focus on the development of methods.

The projects will be in collaboration with other senior researchers e.g. Andreas Bulling (Stuttgart) or Olaf Dimigen (Berlin).

EEG recording chamber

I recently asked on twitter whether people can recommend recording chambers to seat the subject in psychological experiments. I had a tough time googling it, terms that could be helpful in case you are in search for the same thing: Testing chamber, subject booth, audiology.

I got a lot of answers and for the sake of “google-ability” will summarize them here:

  • Steve Luck recommends a separated chamber, but highlights importance of air-conditioning due to sweating artefacts
  • Aina Puce recommends no chamber, but to sit 2-3m behind the subject and use white noise generators

Regarding actual chambers several commercial vendors were thrown in the ring:

  • Studiobricks*
  • Whisper Rooms*
  • Desone
  • Eckel
  • IAC Acoustics

* no Farraday cage directly available as far as I know. But check this tweet for a custom solution

I haven’t asked all vendors for a price estimate, but as far as I can tell, with climate control & lighting a ~4m² room costs around 8.000€ – 12.000€ without a Farraday Cage. With a cage I would guesstimate +10.00-15.000€ but I actually don’t really know.

Comparing Manual and Atlas-based retinotopies; my journey through fmri-surface-land

PS: For this project I moved from EEG to fMRI, and in this post I will sometimes explain terms that might be very basic to fMRI people, but maybe not for EEG people.

I want to investigate cortical area V1. But I don’t want to spend time on retinotopy during my recording session. Thus I looked a bit into automatic methods to estimate it from segmented (segment = split up in WhiteMatter/GrayMatter+extract 3D-surfaces from voxel-MRI and also inflate them) brains. I used the freesurfer/label/lh.V1 labels and the neurophythy/Benson et al tools . The manual retinotopy was performed by Sam Lawrence using MrVista. And here the trouble begins:

The manual retinotopy was available only as a volume (voxel-file, maybe due to my completly lacking mrVista skills. I should look into whether I can extract the mrVista mesh-files somehow), while the other outputs I have as freesurfer vertex values, ready to be plotted against the different surfaces freesurfer calculated (e.g. white matter, pial (gray matter), inflated). Thus I had to map the volume to surface. Sounds easy – something that is straight forward – or so I thought.

After a lot of trial&error and bugging colleagues at the Donders, I settled for the nipype call to mri_vol2surf from freesurfer. But it took me a long time to figure out what the options actually mean. This answer by Doug Greve was helpful (the answer is 12 years old, nobody added it to the help :() (see also this answer):

It should be in the help (reprinted below). Smaller delta is better
but takes longer. With big functional voxels, I would not agonize too
much over making delta real small as you'll just hit the same voxel
multiple times. .25 is probably sufficient.


   --projfrac-avg min max delta
   --projdist-avg min max delta

     Same idea as --projfrac and --projdist, but sample at each of the
     points    between min and max at a spacing of delta. The samples are then
     averaged    together. The idea here is to average along the normal.

The problem is that you have to map each vertex to a voxel. So in this approach you take the normal vector of the surface (e.g. from white matter surface), check where it hits the gray matter, sample ‘delta’ steps between WM (min) and GM (max), and check which voxels are closest to these steps. The average value of the voxels is then assigned to this vertex.

I will first show a ‘successful subject before I dive into some troubles along the way.

red=freesurfer label, orange = benson label restricted to <10deg visual angle, purple = manual based on 10deg retinotopy data

Overall a good match I would say, generally benson & freesurfer have a good alignment (reasonable), the manual retinotopy is larger in most subjects. This might also be due to the projection method (see below)

Initially I tried projection withour smoothing, see the results below. I then changed to a smooth of 5mm kernel with subsequent thresholding (for sure there is probably a smarter way).

Without smoothing
With 5mm smoothing (red=freesurfer label, orange = benson label, purple = manual)

It is pretty clear that in this example the fit of manual with automatic tools is not very good. My trouble is now that I don’t know if this is because of actual difference or because of the projection.

Next steps would be to double check everything in voxel land, i.e. project the surface-labels back to voxels and investigate the voxel-by-voxel ROIs.

EEG/ERP rounding event latencies

22.10.2019 Edit: Thanks Matt Craddock, I understand the source of the problem better. He mentioned that this should not occur if the amplifiers record the triggers as trigger channels (before converting it to events). And mentions that this could happen through downsampling. Indeed after checking in the dataset I used it was downsampled from 1024 to 512Hz. This made many eventlatencies ~ X.50001, which will be uprounded with round and floored with floor. This gives some context to the problem.
Full discussion on twitter

TLDR; EEGlab allows for non-integer event latencies (in units of samples). Eeglab chose floor(latency), while others e.g. unfold & fieldtrip choose round(latency) to round the latency to samples. This leads to differences between toolboxes, in my example of up to 1.5µV (or ~25% ERP magnitude). Importantly, this probably does not introduce bias between conditions

This is an ERP, actually its two ERPs. One is calculated using the “unfold” toolbox and one using eeglab’s pop_epoch function

Elec: PO8, average reference, 1280 trials, 512Hz, very typical experiment with single stimulus presentation, no task

If you look very closely, you can see that they are not identical, even though they should be. So – whats the difference?

It turns out that EEGlab saves event latencies in samples (e.g. stimulus is starting at sample 213), but also allows non-integer latencies (e.g. stimulus is starting between 212 and 213, to be exact: at sample 212.7). This makes sense, i.e. if your EEG sampling resolution is 100Hz you might know your stimulus onset with higher precision and not in 10ms bins. But in order to get ERPs we have to “cut” the signal at the event onset. EEGlab uses “floor” for this and rounds the stimulus onset from 212.7 to sample 212. Other toolboxes (unfold / fieldtrip) use “round”, thus the event would start at sample 213.

It turns out that in the example you see above, this introduces a difference between the two ERPs of 0.5µV (!) thats around 8% of the magnitude.

difference “floor” vs “round” for 512Hz data

This is just a random example I stumbled upon. With lower sampling rates, this effect should increase. Indeed, downsampling to 128 Hz gives us a whopping difference of 1.5µV.

Difference “floor”-“round” for 128Hz sampling rate

Floor vs. round (vs. others?)

The benefit of floor, at least the one I can think of, is that it would never shift the onset of a stimulus in the future. That is, it is causal. Possibly there are other benefits I am not aware of.

The benefit of round is that it more accurately reflects the actual stimulus onset. Possibly there are other benefits I am not aware of

Given that we mostly use acausal filtering anyway, I think the causal benefit is not very strong.

There is yet another alternative: a weighted average between samples. We could “split” the event onset to two samples, i.e. if we want the instantaneous stim-onset response, and stim onset is at sample 12.3, then sample 12 should be weighted 30% and sample 13 by 70%. I have to explore this idea a bit more, but I think its very easy to implement in unfold and test. But this for a new blogpost.

The big picture

In the fMRI community there are papers from time to time reporting that different analysis tools (or versions) lead to different results. I am not aware of any such paper in the EEG community (if you know one, let me know please!) but I think it would be nice if somebody would do such comparisons.

I currently do not forsee if such an event-latency-rounding difference could possibly introduce bias in condition differences. But I forsee that changing it will be difficult for the EEGlab developers, as “floor” has been around for a very long time in eeglab.


Note that I did not use simulation here (but could have, it should be straight) but I cannot publicly share the data at this point in time.

%EEG = pop_resample(EEG,128);
timelimits = [-.3 .5];
EEG = uf_designmat(EEG,'eventtypes','stimulus','formula','y~1');
%deconv based "round" analysis
EEG = uf_timeexpandDesignmat(EEG,'timelimits',timelimits);
EEG = uf_glmfit(EEG,'channel',63);

%eeglab based "floor" analysis
EEGe = pop_epoch(EEG,{'stimulus'},timelimits);

ufresult = uf_condense(EEGe);
%% ERP plot
plot(ufresult.times,mean(,:,:),3)) %floor
hold on
plot(ufresult.times,ufresult.beta(63,:,1)) %round
xlabel('time [s]')
ylabel('ERP [µV]')
box off
export_fig erp.png -m3 -transparent
%% difference plot
title('Diff "floor"-"round"')
plot(ufresult.times,mean(,:,:),3)-ufresult.beta(63,:,1)) %floor-round
xlabel('time [s]')
ylabel('ERP [µV]')
box off
export_fig diff.png -m3 -transparent

EEG / ERP Lecture

I just gave my lecture at the Donders Toolkit. Many cool and interesting questions! I wish I had two hours, I had to skip some things I was excited about in favour of solid basics.

Find the slides here as pdf (5 mb) or pptx (23 mb).

In case you are interested in other EEG slides, here are slides on overlap correction (deconvolution) and non-linear modeling (pptx, 8mb), an introduction to linear models (pptx, 50mb), and slides on multiple comparison corrections (pptx, 5mb)

Can’t give a proper license unfortunately, as some slides are based on old Donder Toolkit Slides, handed down from the years. All other authors I stole slides should be acknowledge, hope I did not forget something. Do as you wish with the slides. May the copywrite gods see favourable on our educator souls

Electrode drift in EEG

I coudn’t find an image of electrode drift for my slides, so here I quickly generated one. The only fancy thing is the usage of datetime to have minutes on the x-axis (I also made this post so I don’t forget this trick ;))

DC-Amplifier (REFA-2, tmsi) plot of three electrodes over ~45min recording data. Strong DC-Offsets and drifts are observed (this is typical & normal)
After filtering at 0.1Hz the strong drifts and offsets are gone (as intended), the y-axis has a very different axis now!
Obviously a cutoff-freq of 0.05Hz in the filter will still allow for some slow moving drifts. These offsets are usually accomodated for by baseline corrections

Thanks to Anna Lisa Gert for this dataset

% Load Data
EEG = pop_loadeep_v4('subj23.cnt');
% Load filtered data (takes 35min to filter...)
EEG_filt = pop_loadset('2_subj23_lowpass_resample_deblank.set');
% Convert time to actual time
timesnew = datetime(EEG.times/1000,'ConvertFrom','epochtime','Epoch','2000-01-01');
% select random channels
chix = [5,63,27];

% Plot the unfiltered data
% Make the plot beautiful
datetick('x','MM','keeplimits','keepticks') % only show minutes
xlabel('time [min]') 
ylabel('voltage [µV]')
box off
title('EEG Electrode Drift (DC Amplifier, avg ref)')
set(gca,'fontsize', 14) % for a presentation
set(gca, 'FontName', 'HelveticaNeueLT Pro 45 Lt')
% Convert again because data have been resampled
timesnew = datetime(EEG_filt.times/1000,'ConvertFrom','epochtime','Epoch','2000-01-01');


xlabel('time [min]')
ylabel('voltage [µV]')
box off
title('EEG Electrode Drift (avg ref, 0.1Hz filter)')
set(gca,'fontsize', 14) % for a presentation
set(gca, 'FontName', 'HelveticaNeueLT Pro 45 Lt')

New Paper: How to compare and choose eye trackers

A (short) laypersons summary

participant performing in the experiment

TLDR; Using our data/paper you can find out how well your eye tracker is doing in measuring your favourite eye movement

This newly published paper is open access and I share first co-author ship with the amazing Katharina Groß! In addition this project was done together with Inga Ibs and Peter König: A new comprehensive eye-tracking test battery concurrently evaluating the Pupil Labs glasses and the EyeLink 1000

We move our eyes about four times per second, which is more often than our heart beats! Many studie show that these eye movements are a window into our minds (e.g. König et al. 2017) and are commonly used for basic research, marketing research and clinical assesments.

Even though eyetrackers are so commonly used and so powerful, they are rarely tested for how well they perform besides the manufacturers. Even more critical, no test battery existed before we set out on this project. Together with Katharina Groß (and Inga Ibs and Peter König) we developed a new eye tracker test battery which included most of the typically studied eye movements.

We chose to test two popular eye trackers concurrently: The Eyelink 1000 and the Pupil Labs glasses. The former is the working horse of eye movement research (release 2005), the latter the open hardware/source innovator (release 2014).

Our study reveals some strengths and some weakenesses for both eye trackers, but also of our newly proposed test battery. The results are numerours, so I will only depict two tasks:

Free Viewing

Here you can see where a participant is looking on an image (C & D) with the task to simply explore the image. You can clearly see that the eye trackers agree on the coarse scale (blue & red dots kind of overlay), but not in the specific details (e.g. disagreements that worse/better at some locations. Also the red (pupil labs) eye-tracker has much higher variability during eye movements .

Pupil Dilation

Our pupil shrinks when it gets bright and opens up when it gets dark. We tested how well the eye tracker can measure this by presenting differently bright images interleaved by black images (so the pupil can go back (or near) to baseline). We found that on the average, both eye tracker give the same results (both eyetrackers are depicted on top of each other, barely differentiable in figure C) – but calculating the difference between eye trackers for each subject they differ quite a bit (green lines in E are not at 0%)!

Take home

  • Eye tracking scientists need information about the reliability and performance of their equipment to make informed decisions
  • Every eye movement has their own parameters to check for, e.g. pupil dilation has different requirements to an eye tracker than microsaccade research
  • Eyelink did not reach the promised accuracy (we really tried, see comments in this link)

That concludes this short laypersons summary. If you have any question feel free to comment here, write me, Katharina Groß, or any of my other co-authors. (paper doi: )

Brain Art

Cerebrum Triptychon

I submitted this triptychon to the OHBM brain-art competition 2019. I used mouse myelin stains from the Brain Architecture Project and generated Rohrschach-Like creatures. I quite like how it turned out and I definitely want to do more in the future. I especially found the Allen Brain Atlas’ developmental mouse stains very cool – lots of potential there.

If you are interested in other art pieces related to neuroscience feel free to check out my thesis-art collection – one piece for each student I supervised.

Thanks to Ella Bosch & Anna Gert for latin & design advice!

Threshold Free Cluster Enhancement explained

The multiple comparison problem

In neuroimaging analysis one often is confronted with many electrodes/voxels and many timepoints, and often performs a statistical test on each of these electrode/timepoint combinations. This leads to a massive multiple comparison problem, as the probability to find a false positive is greatly enhanced. In the following example we assume independence of all data points . For instance with only 10 electrodes/voxels and 10 timepoints and an alpha of 0.05, the probability for a false positive is:

$$ p(significance^*|H_0) = 1-(1-0.05)^{10*10}=99.4 $$

*significance of at least one sample

But electrodes/voxels and timepoints usually are not independent. Contrary, data is rather smooth over electrodes, voxels and time. Therefore, by combining data points close in space (electrodes / voxels) and time using so-called cluster tests, one can try to partially circumvent this problem.

For the IICCSSS Summer School I made some slides explaining multiple comparison correction (link to slides, pptx), including Threshold Free Cluster Enhancement (TFCE). I explained cluster permutation tests in an earlier blog post, which you might want to read before this one.

Why TFCE and not cluster permutation tests?

In order to use cluster permutation tests, one typically first calculates some kind of statistics, in our example student t-values for each datapoint over subjects (or trials for single subject analyses). The one has to specify a cluster-threshold which defines the clusters. This threshold might miss broad but “weak” clusters, and focus only on “strong” but peaky clusters.

As an example I depict students t-value of an effect over time. In this figure, the blue cluster will be detected (the thresholded cluster mass is depicted by the light blue area), but the broader green cluster would be missed

Threshold Free Cluster Enhancement

The intuition of TFCE is that we are going to try out all possible thresholds and see whether a given time-point belongs to a significant cluster under any of our set of cluster-thresholds. Instead of using cluster mass, we will use a weighted average between the cluster extend (e, how broad is the cluster, i.e. how many connected samples) and the cluster height (h, how high is the cluster, i.e. how large is the t-value / the evidence for an effect) according to the formula:

$$ TFCE = \int_h e(h)^Eh^Hdh$$

for this blogpost, I will put the weights for the extend E and for the height H to 1 (therefore height ‘counts’ the same as width). The usual defaults are E=0.5 and H=2.

Note that the weights E and H of the TFCE formula $e^E$ and $h^H$ are set to 1 and therefore ommitted in the gif. A non-gif version can be found in this powerpoint presentation: (CC-By)

As you can see we use a discrete sum, approximating the integral from above. Another difference between TFCE and cluster permutations, is that you generate a TFCE value for each sample.

Signal before (t-values, shallow) and after (TFCE-values, peaky) TFCE enhancement

For instance a hypothetical t-value of 3 (red square in the above animation) is boosted by belonging to a cluster and might receive the TFCE value of 10. The resulting TFCE values can be thought of as a local scaling according to the “clusterdness” of a sample. Note that local minima and maxima stay at the same spot, this is different to a smoothing operation which could move the location of maxima and minima in time or space.

Because we calculated a TFCE value for each sample, we can also calculate a p-value for each sample. In order to get the p-values, we use the same trick wie used with cluster permutation test: the permutation part. We permute conditions (building the $H_0$), calculate the TFCE values for the permutated set, and take the max(TFCE) over all time points and electrodes/voxels. Our observed TFCE value then is either likely or unlikely given our empirical distribution of max TFCE values (under the $H_0$). But note that the interpretation is not that of a typical p-value at an electrodes/voxel!.

Interpretation of significant TFCE

I admit I made the following mistake before, it is a very convenient and easy mistake to make: As an example, let’s observe a significant sample e.g. at 100ms, using the TFCE procedure. This does not mean that the sample at 100ms shows a significant effect. It only means, that there exist at least one cluster-threshold (remember we tested all of them), where this sample belongs to a significant cluster. In other words, samples can be pushed to significance solely by being close to a “truely significant” cluster, without showing evidence by themselves to be significant.

I found this pretty confusing. But in practice it is important: Because we don’t know which samples make a cluster a significant one (all of them? half of them? only a single sample?) we cannot say much about the single sample, only about the cluster.

So, in practice what we do is that we look and report the p-values, but in addition make a descriptive statement on the cluster extend. For instance, you could argue that the t-values that you put in TFCE (or cluster permutation) are very much compatible with an effect from X ms to Y ms. Similar statements are also recommended on the fieldtrip site or in this recent paper by Jona Sassenhagen 2018.

Don’t write: “We found a significant cluster starting from 100ms to 200ms with a median effect of 5µV [3.5, 4.7µV].” or even “Conditions differed significantly from 100ms to 200ms (multiple comparison corrected)”.
Write: “We found a significant difference between conditions. The difference was driven by an effect from 100ms to 200ms with a median effect of 5µV [3.5, 4.7µV] .” or “We found a significant cluster, most compatible with an effect from 100ms to 200ms with a median effect of 5µV [3.5, 4.7µV] “.

Dont write: “At t=125ms the conditions differed significantly (TFCE correction for multiple comparisons) with a median effect of 5µV [3.5, 4.7µV] “
Write: “We found significant difference between conditions (TFCE correction for multiple comparisons). This difference was driven by a cluster starting at 125ms with a median effect of 5µV [3.5, 4.7µV] .”

These messages are much less snappy, sexy, short or easy to understand. The important bit is to signal to the reader that the cluster permutation test does not state significance about a single timepoint or electrode/voxel, but only indicates a significant difference somewhen/re between your conditions.

This problem has been recently discussed on twitter. One proposed alternative is All-Resolution-Inference. There is a barebone R-implementation in the hommel package and I would be interested in translating it to matlab to be readily usable with cluster permutation for EEG data.

Thanks for personal discussions (these are not endorsements, all mistakes in this blogpost are mine!) with Eelke Spaak, Robert Oostenveld, René Scheeringa, Olaf Dimigen & Phillip Alday + twitter interactions with Guillame A Rousselet, Cyril Pernet, Thomas Nichols and Martin Hebart. Thanks to Anna Lisa Gert for critical comments on this blogpost.

Simulating EEG/ERP Data with SEREEGA & multiple comparison corrections

For the $i^2 c^2 s^3$ summer school I simulated quite a bit of data and analyzed them with several common multiple comparison methods. I used the SEREEGA toolbox for the simulation. All the MatLab-code can be found at the end of this post. In a follow-up blogpost I will extend the toolbox to continuous data that we can analyze with the unfold toolbox.

First, I simulated data based on three effects: Two early dipoles representing the P100, one right lateralized for the N170 and a deep one for the P300.

Two symmetric sources for the P1 in the occipital cortex, one lateralized to the right cortex (third panel right most) for the N170 and a deep one, central for the P300.

I added brown and white noise to the simulated epochs. An exemplary eegplot shows that it kind of looks like EEG data.

5 trials in the default pop_eegplot of eeglab

I generate two conditions with two different condition-differences, one on the N170 and one on the P300 . The same data are depicted in the following three plots:

Data in butterfly plot & time-aligned points, plotted using the eegvis toolbox

In the first row we see a butterfly plot of activation in all channels (the colored channels are depicted in the next plot). In the first topoplot-row we have the average activation and in the last row we have the condition differences.

The three types of components: a short but relatively strong (P100, red), a similar effect, short but strong (N170, condition difference) and a long effect (P300, with weak condition effect). Highlighted gray areas are approximate regions where the components showed condition differences. The noise was a mix between brown and white noise.

The red line is the P100, occipital effect. No difference between conditions. The green one is the N170, temporal effect. Only visible in one condition. The blue one is a P300 like, deep effect. Difference in amplitude between conditions.

A third way to look at the same data. Color indicates activation (blue => negative µV, yellow=> positive µV). Each row is one channel, each column one time point. Using such a plot, one looses spatial relations between the channels , but gains an nice overview of all the data. Note that the colorscales are different between the two panels.

Now we are ready to do some statistical testing and multiple comparison correction. I choose to compare uncorrected, bonferroni-holms, FDR (Benjamini-Hochberg), clustermass cluster permutation testing (LIMO toolbox) and threshold free cluster enhancement (EPT-TFCE toolbox).

I addapted a display from fMRI depicting t-value vs. significance based on the fmri slice display toolbox. Plot can be made using the eegvis toolbox

In this instance, uncorrected p-value did not so bad, Bonferroni-Holms is, as expected, quite conservative. FDR hat troubles with the elongated cluster and TFCE/Cluster-permutation with the short one.

Note: This is only a single simulation. In order to move from these anectotal findings to proper statements, one would need to repeat the simulation 1000 times and see how often which samples were deemed significant (see Groppe 2011).

Thanks to Anna Lisa Gert for help with writing this post.

This is the code I used, it is a bit messy because it was programmed in a tutorial-style and I kept my debug-statements in there.

%% Adding toolboxes
if ~exist('pop_loadset')
% Threshold Free cluster enhancement
%% SEREEGA Data Generation
cfg = struct('debug',0)
%% load headmodel 
lf = lf_generate_fromnyhead('montage', 'S64');

if cfg.debug

%% set seed

%% noise
noise_brown = struct( ...
    'type', 'noise', ...
    'color', 'brown', ...
    'amplitude', 2);
noise_brown = utl_check_class(noise_brown);

noise_white = struct( ...
    'type', 'noise', ...
    'color', 'white', ...
    'amplitude', 2);
noise_white = utl_check_class(noise_white);

%% visual Cortex N170
vis = [];

vis.signal{1} = struct();
vis.signal{1}.peakLatency = 170;      % in ms, starting at the start of the epoch
vis.signal{1}.peakWidth = 100;        % in ms
vis.signal{1}.peakAmplitude = -5;      % in microvolt
vis.signal{1} = utl_check_class(vis.signal{1}, 'type', 'erp');
vis.signal{2} = noise_brown;

% right visual cortex
vis.source= lf_get_source_nearest(lf, [50 -40 -25]);
if cfg.debug
    plot_source_location(vis.source, lf, 'mode', '3d','shrink',0);

% vis.orientation = utl_get_orientation_pseudotangential(vis.source,lf); 
vis.orientationDv = [0 0 0]; % orientation does not change between trials
vis.orientation = [0.5,-0.46,1]; % I looked at the maps and found that this prientation most closely looked like a N170

if cfg.debug
    %% This Code I used to generate random orientations and inspect the leadfield projections
    for k = 1:10
        vis.orientation = utl_get_orientation_random(1)
        plot_source_projection(vis.source, lf, 'orientation', vis.orientation,'orientedonly',1);
        %     title(vis.orientation)
%% Generate P100, take from SEREEGA Toolbox
p100 = utl_get_component_fromtemplate('visual_p100_erp',lf);
if cfg.debug
    % Visualize the response 
    plot_signal_fromclass(vis.signal{1}, epochs);

%% Generate P300
p3 = [];

p3.signal{1} = struct();
p3.signal{1}.peakLatency = 350;      % in ms, starting at the start of the epoch
p3.signal{1}.peakWidth = 400;        % in ms
p3.signal{1}.peakAmplitude = -3;      % in microvolt
p3.signal{1} = utl_check_class(p3.signal{1}, 'type', 'erp');
p3.signal{2} = noise_brown;

% somewhere deep in the brain
p3.source= lf_get_source_nearest(lf, [0 -40 -25]);
if cfg.debug
    %% visualize it using SEREEGA
    plot_source_location(p3.source, lf, 'mode', '3d');

p3.orientation = utl_get_orientation_pseudotangential(p3.source,lf);
p3.orientationDv = [0 0 0];

p3.orientation = [0.03,-0.16,1];
if cfg.debug
    %% again find an orientation that somehow matches P300topography
    for k = 1:10
        p3.orientation = utl_get_orientation_random(1)
        plot_source_projection(p3.source, lf, 'orientation', p3.orientation,'orientedonly',1);

%% generate 25 random noise sources and combine all components
sources = lf_get_source_spaced(lf, 10, 25);
noise_brown.amplitude = 5;
comps = utl_create_component(sources, noise_brown, lf);
comps = utl_add_signal_tocomponent(noise_white,comps);
[comps1, comps2] = deal(comps);

% add P100 to noise components
comps1(end+1:end+2) = p100; %bilateral
comps2(end+1:end+2) = p100;

% N170 with condition difference
vis.signal{1}.peakAmplitude = -3;
comps1(end+1) = vis;
vis.signal{1}.peakAmplitude = -1;
comps2(end+1) = vis;

% broad, weak P300, with condition difference
p3.signal{1}.peakAmplitude = -10;
comps1(end+1) = p3;
p3.signal{1}.peakAmplitude = -12;
comps2(end+1) = p3;
%% Generate The EEG Data
epochs = struct();
epochs.n = 50;             % the number of epochs to simulate
epochs.srate = 250;        % their sampling rate in Hz
epochs.length = 500;

data1 = generate_scalpdata(comps1, lf, epochs);
data2 = generate_scalpdata(comps2, lf, epochs);
EEG1 = utl_create_eeglabdataset(data1, epochs, lf, 'marker', 'event1');
EEG2 = utl_create_eeglabdataset(data2, epochs, lf, 'marker', 'event2');
EEG = utl_reorder_eeglabdataset(pop_mergeset(EEG1, EEG2));

%% First inspection, this should match the intended topographies
pop_topoplot(EEG, 1, [0 100 170,300], '', [1 8]);
%% ERP-Like Figure
findtype = @(type)cellfun(@(x)strcmp(type,x),{EEG.event.type});

ploterp = @(chan)plot(EEG.times,...
hold on,ploterp(63),title(EEG.chanlocs(63).labels)
hold on,ploterp(30),title(EEG.chanlocs(30).labels)

%% Within-Subject T-Test using LIMO

[m,dfe,ci,sd,n,t,p] = limo_ttest(2,,:,findtype('event1')),,:,findtype('event2')),0.05);
%% simple imagesc plots
figure,subplot(2,1,1),imagesc(EEG.times,1:64,mean(,3));box off
subplot(2,1,2),imagesc(EEG.times,1:64,m),box off
% Show data

%% Prepare statistics

ept_tfce_nb = ept_ChN2(EEG.chanlocs,0); % the 1 to plot
tfce_res = ept_TFCE(...
    permute(,:,findtype('event1')),[3 1 2]),...

t_h0 = nan([size(,1),size(,2),800]);
for b = 1:800
    r_perm      = randperm(epochs.n*2); % Consider using Shuffle mex here (50-85% faster)...
    nData       =,:,r_perm);
    sData{1}    = nData(:,:,1:epochs.n);
    sData{2}    = nData(:,:,(epochs.n+1):(epochs.n*2));
    t_h0(:,:,b) = (mean(sData{1},3)-mean(sData{2},3))./sqrt(var(sData{1},[],3)/epochs.n+var(sData{2},[],3)/epochs.n);
p_h0 = tpdf(t_h0,199);
%% Generate Cluster Permutation
limo_nb = limo_neighbourdist(EEG,50);
[limomask,cluster_p,max_th] = limo_clustering(t.^2,p,t_h0.^2,p_h0,struct('data',struct('chanlocs',EEG.chanlocs','neighbouring_matrix',limo_nb)),2,0.05,0);
cluster_p(isnan(cluster_p(:))) = 1;
% F_tfce = limo_tfce(2,t.^2,limo_nb);

%% show significance methods
for k = 0:4
    switch k
        case 0
            plot_p = p;
            mask = plot_p<0.05;
        case 1
            plot_p = bonf_holm(double(p),0.05);
            mask = plot_p<0.05;
        case 2
            [~,~,~,plot_p] = fdr_bh(p,0.05,'pdep','no'); % see groppe 2011 why pdep is fine
            mask = plot_p<0.05;
        case 3
            plot_p = cluster_p;
            mask = plot_p<0.05;
        case 4
            plot_p = tfce_res.P_Values;
            mask = plot_p<0.05;
    plot_data= plot_p;
    plot_data = log10(plot_data);
        'contour',1,'clustermask',mask,'figure',0,'xlabel',0,'colorbar',k == 4)
%     caxis([-7,0])
%     if k ~=4
%         axis off
%     end
%     box off
    plot_data(~mask) = 1;
    box off

hA = plot_topobutter(mean(,3),EEG.times,EEG.chanlocs,'colormap',{'div'},'quality',40,'n_topos',12);

hA = plot_topobutter(cat(3,mean(,3),m),EEG.times,EEG.chanlocs,'quality',32,'individualcolorscale','row','highlighted_channel',[44,63]);

%% Additional Blog Figures
plot_source_location([p100.source,vis.source,p3.source], lf, 'mode', '2d');

1 of 5