1 Introduction

1.1 What will I learn

  • Mixed Models
  • Testing for main effects / interactions in LMM
  • Interpretation of Variance Random Coefficients
  • Testing Assumptions of LMM

1.2 The experiment

This dataset contains real data, published by Ehinger, Kaufhold & König (2016).

Caption (Ehinger et al.): A) Experimental Procedure. Subjects foveate a single bubble for the forced fixation time. […] The next step removes the bubble and simultaneously displays up to five new bubbles. The dependent variable is the time until saccade initiation. After the selection of one bubble, and most often before the saccade ends, the other bubbles are removed. The newly foveated stimulus is shown for a newly sampled forced fixation time. B) Example Stimulus Urban/Pink Noise. C) The sampling points for bubbles following the pseudo-random Poisson disc sampling algorithm. D) A single bubble as seen by the subject.

The following subset of fields is available:

Variable        Description
subject         participant ID (not numbered consecutively)
choicetime      dependent variable: the time to move to a new fixation location
forcedFixtime   independent variable: the time the last bubble was shown
NumOfBubbles    independent variable: the number of bubbles subjects are able to choose from
Image           name of the image that was used as a backdrop
stimulus_type   image type (Noise vs. Urban image)

1.3 The research questions

  1. Is there a decreasing or increasing relationship between choicetime (reaction time) and the number of possible targets?
  2. Is there a decreasing relationship between the forced fixation time and the choicetime?

2 Analysis of Number of Bubbles

2.1 Load the data

library(ggplot2)
library(lme4)
library(plyr)
d =  read.csv('fixdur.csv')

We start with only a single continuous predictor: Number of bubbles (ranging from 1 to 5).

2.2 Plotting

  1. Use ggplot to plot number of bubbles against choicetime. Use stat_summary().
ggplot(d,aes(x=XXX,y=XXX))+stat_summary()
  2. Why are the standard errors increasing?

Hint: Calculate the number of trials for each condition.

summary(factor(d$NumOfBubbles))

2.3 Mixed model

  1. Build a simple linear mixed model with NumOfBubbles and a random intercept and random slope for the random grouping variable subject. We need the random coefficients to account for the correlation of observations within subjects (aka repeated measures).
m1 = (lmer(data=XXX,formula=XXX~1+XXX+(XXX+XXX|XXX)))
  2. Inspect the model. Compared to a model without the fixed coefficient, is the log-likelihood significantly increased?
m0 = (lmer(data=XXX,formula=XXX~1+(XXX+XXX|XXX)))
anova(XXX,XXX)

2.4 Interpretation

Fit the following model: choicetime ~ 1 + NumOfBubbles + (1 + NumOfBubbles | subject).

2.4.1 Fixed coefficients

  1. What does the intercept represent here?

  2. What is the average increase from NoB=1 to NoB=5?
fixef(m1)[XXX] * 4 # fixef returns a named vector, so index with [ ]; NoB=1 to NoB=5 is a change of 4 units

2.5 Random Coefficients

2.5.1 Intercept

What is the spread of the intercept over subjects?

Bonus: Calculate the values between which 95% of subjects' intercepts would lie (hint: qnorm is your friend), assuming the estimated parameters are correct. A sketch follows after the code below.

m1 = lmer(data=d,formula=choicetime ~ 1 + NumOfBubbles + (1 + NumOfBubbles | subject))
summary(m1)
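One possible solution for the bonus (a sketch, assuming m1 is fitted as above and the normality assumption holds; attr(..., "stddev") is one way to extract the random-intercept SD):

# Interval that should contain 95% of subjects' intercepts,
# assuming the estimated parameters are correct
int.mean <- fixef(m1)["(Intercept)"]                            # population-level intercept
int.sd   <- attr(VarCorr(m1)$subject, "stddev")["(Intercept)"]  # random-intercept SD
qnorm(c(0.025, 0.975), mean = int.mean, sd = int.sd)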

2.5.2 Slope

We want a plot with NumOfBubbles on X, predicted choicetime on Y for each subject. We can get \(\hat{y}\) using predict.

d$predict = predict(XXX)
ggplot(d,aes(x=XXX,y=XXX,group=factor(XXX))) + 
  stat_summary(geom='line')

Bonus: You can easily add the individual data and the individual fits with ggplot. You could also add the average slope using aes(group=NULL); see the sketch below.
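A sketch of this bonus, assuming d$predict has been filled with predict(m1) as above (the linewidth parameter requires ggplot2 >= 3.4):

ggplot(d, aes(x = NumOfBubbles, y = predict, group = factor(subject))) +
  stat_summary(geom = 'line', alpha = .4) +                        # one fitted line per subject
  stat_summary(aes(y = choicetime), geom = 'point', alpha = .4) +  # individual condition means
  stat_summary(aes(group = NULL), geom = 'line', linewidth = 1.5)  # the average slope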

2.5.3 Plot the 2D Gaussian underlying the random coefficients

We currently have three estimated random-coefficient parameters: the intercept variance, the NumOfBubbles slope variance, and the correlation between the two. We want to visualize them similarly to the lecture.

  1. You need the package mvtnorm for this exercise; use install.packages('mvtnorm') to install it
  2. Complete the following code
  3. Be sure you understand what gaus2d.cov represents! You could play around with the numbers and see how they affect the resulting plot.
# Get the fixed coefficients
gaus2d.mean  <- fixef(XXX)
# Get the estimated Random Coefficient Covariance Matrix
gaus2d.cov   <- VarCorr(XXX)$XXX

# make a grid of points from 100:300 and 0:20 to evaluate the gaussian function on
data.grid <- expand.grid(x1 = seq(100, 300, length.out=200), x2 = seq(0, 20, length.out=200))

# Generate a further column 'prob' with the density of a multivariate norm.
q.samp <- cbind(data.grid, prob = mvtnorm::dmvnorm(data.grid, mean = gaus2d.mean, sigma = gaus2d.cov))

# Plot it!
ggplot(q.samp, aes(x=XXX, y=XXX, z=XXX)) + 
    geom_contour()+
    geom_point(inherit.aes=F, data=data.frame(coef(XXX)$XXX), aes(x=XXX, y=XXX)) # bonus, but worth it: plot the single-subject estimates in here as well

3 To log or not to log

We want to empirically compare a linear fit with a logarithmic fit (which theoretically makes more sense, as it generalizes better to, say, 25 bubbles). Fit a third model with the log-transformed predictor log(NumOfBubbles) as an additional coefficient and do a model comparison.

  1. Is it empirically better to take log(NumOfBubbles)?

Hint: We might run into convergence issues. One strategy to deal with this is to find the most parsimonious model (the biggest model still supported by the data). Try removing the correlation between NumOfBubbles and log(NumOfBubbles). Does this help? You can continue removing correlation parameters from the model until you are sure the model converged.

Note that Barr et al. (2013) show that not modeling random slopes increases Type-I errors. Maybe it is more useful in this case to choose either NoB or log(NoB) instead of incorporating both.

m2  = lmer(data=XXX,formula=XXX~XXX+ log(XXX)+(1+XXX+XXX|XXX))
m2a = lmer(data=XXX,formula=XXX~XXX+ log(XXX)+(1+XXX+XXX||XXX))
m2b = lmer(data=XXX,formula=XXX~XXX+ log(XXX)+(1+XXX|XXX)+ (0+XXX|XXX))
anova(XXX,XXX)
  2. Are you allowed to do likelihood-ratio model comparisons between the models ct ~ NumOfBubbles and ct ~ log(NumOfBubbles) directly?
  3. Do you know other model comparison techniques you could apply here? One option is sketched below.
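One option, sketched here: information criteria allow a descriptive comparison even of non-nested models. The model names m.lin and m.log are placeholders, not from the original exercise:

# Fit the two competing (non-nested) models with the same random-effects structure
m.lin <- lmer(choicetime ~ NumOfBubbles      + (1 + NumOfBubbles      | subject), data = d, REML = FALSE)
m.log <- lmer(choicetime ~ log(NumOfBubbles) + (1 + log(NumOfBubbles) | subject), data = d, REML = FALSE)
AIC(m.lin, m.log)  # smaller is better
BIC(m.lin, m.log)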

4 Checking assumptions

4.1 First Level

  1. Plot the residuals of the LMM. Are they normally distributed?

Hint: Use qqnorm and plot the residuals. It might also help to plot a histogram of the residuals: are they shaped like a normal distribution?

  2. If you find problems, think of transformations that could improve the shortcomings.

  3. Now plot the residuals against NumOfBubbles. Are there any (nonlinear) dependencies left over? A sketch of these checks follows below.
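A minimal sketch of these first-level checks, assuming m1 is the model fitted above:

r1 <- resid(m1)
qqnorm(r1); qqline(r1)    # points on the line => approximately normal residuals
hist(r1, breaks = 50)     # is the histogram shaped like a normal distribution?
plot(d$NumOfBubbles, r1)  # any leftover (nonlinear) structure over the predictor?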

4.2 Second Level

In a mixed model, you estimate an effect size for each subject (whether these count as parameters is, for some people, up for discussion; from a Bayesian point of view they clearly do). If you want nicer plots, you could try the function sjp.lmer from the sjPlot package.

This estimate is called the BLUP (Best Linear Unbiased Predictor). You can extract the BLUP estimates using the following code:

ranef(m1)$subject$`(Intercept)`

If you want to add the fixed coefficients, use coef instead of ranef (if you don't understand what that means: calculate the mean of the BLUPs over subjects once for coef and once for ranef and compare; see the sketch below).
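A sketch of that comparison: ranef returns deviations centered at zero, while coef adds the fixed coefficient back in.

mean(ranef(m1)$subject$`(Intercept)`)  # approximately 0 by construction
mean(coef(m1)$subject$`(Intercept)`)   # approximately fixef(m1)["(Intercept)"]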

  1. Plot a histogram or qqplot (not ggplot ;)) of these BLUP estimated values. Are they following the assumed distribution?
ranef.intercept = ranef(m1,condVar=T)$subject$`(Intercept)`
qqnorm(ranef.intercept)
qqline(ranef.intercept)
  2. Do the same for the random slope.

Note: Based on these values (and because these are reaction times, which cannot be smaller than 0 and have a right tail), it is clear that the data should be transformed; a log transform seems a good candidate. But remember that there could be other factors heavily influencing the residuals. Maybe things get better if we include them? Also, for practical conclusions, we expect no difference between the transformed and non-transformed data: our effects are quite large, we have lots of data, and discrepancies occur only in the upper and lower 5% quantiles. If we were closer to noise level, things might be very different.

5 Multivariate / forcedFixtime

5.1 The model

  1. Fit a model with the continuous predictors log(NumOfBubbles) and log(forcedFixtime) and their interaction.
m3 = lmer(data=d,choicetime ~ log(XXX) * log(XXX) + (XXX | subject),REML=F)

5.2 Is the interaction significant?

You will most likely run into convergence problems here (and the model takes quite some time to fit). Try standardizing your predictors (removing the mean should be sufficient, but you can also divide by their respective SD; in the former case only the intercept changes its interpretation, in the latter the slopes change theirs as well). A sketch follows below.
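A sketch of one way to center the predictors before fitting; the column names logNoB.c / logFF.c and the minimal random-effects structure are illustrative, not from the original:

# Center the log-transformed predictors (as.numeric drops the matrix attributes scale() returns)
d$logNoB.c <- as.numeric(scale(log(d$NumOfBubbles),  scale = FALSE))
d$logFF.c  <- as.numeric(scale(log(d$forcedFixtime), scale = FALSE))
m3c <- lmer(choicetime ~ logNoB.c * logFF.c + (1 | subject), data = d, REML = FALSE)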

5.3 Interpreting the log-log interaction

A very good idea is to visualize your effects. Use predict in combination with expand.grid to produce a fully balanced data set. Plot choicetime against forcedFixtime and split by factor(NumOfBubbles). To get a feeling for the size of the interaction, compare the predicted difference between 100 ms and 1000 ms at NumOfBubbles==1 with the difference between 100 ms and 1000 ms at NumOfBubbles==5 (a sketch follows after the plotting code).

# Generate a fully balanced new data set
predictdata = data.frame(expand.grid(forcedFixtime = seq(XXX,XXX,length.out = 100),
                                     NumOfBubbles=seq(XXX,XXX),
                                     subject=unique(d$subject)))
predictdata$predict = predict(XXX,newdata=XXX) # take the model prediction

ggplot(predictdata,aes(x=XXX,y=XXX,color=factor(XXX)))+
  stat_summary(geom="line")+ # we are not interested in different subjects
  scale_x_log10()            # it is easier to see the interaction on a log scale,
                             # but feel free to remove it to understand what's really going on
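To quantify the interaction as described above, a sketch assuming m3 has been fitted; re.form = NA gives population-level predictions, so the subject column is only needed pro forma:

nd <- expand.grid(forcedFixtime = c(100, 1000),
                  NumOfBubbles  = c(1, 5),
                  subject       = d$subject[1])
nd$p <- predict(m3, newdata = nd, re.form = NA)  # ignore the random effects
diff(nd$p[nd$NumOfBubbles == 1])  # effect of 100 ms -> 1000 ms at 1 bubble
diff(nd$p[nd$NumOfBubbles == 5])  # the same effect at 5 bubbles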

6 Multiple Random Effects

We used 128 different images (64 urban, 64 noise).

  1. Run a linear mixed model with log(FF) as a fixed effect and as a random slope grouped by subject, and additionally image as a random grouping variable with a random intercept.

  2. Use summary, qqnorm + qqline to investigate the new random variable image for normality

Hint: You can get the random effects similarly to the code above using ranef(model)$image$`(Intercept)`.

  3. Run a model with log(FF) and stimulus_type as a categorical factor. Include both appropriate random coefficients and random variables. Is the fixed effect stimulus_type still significant, even after controlling for image variability?

Hint: Keep the same random-effects structure for a model with and one without the fixed effect stimulus_type and do an LRT model comparison using the anova function. A sketch follows below.
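A sketch of that comparison; the model names are placeholders and the image column is assumed to be called Image, as in the variable table:

m.full <- lmer(choicetime ~ log(forcedFixtime) + stimulus_type +
                 (1 + log(forcedFixtime) | subject) + (1 | Image),
               data = d, REML = FALSE)
m.red  <- update(m.full, . ~ . - stimulus_type)  # same random effects, fixed effect removed
anova(m.red, m.full)                             # LRT for stimulus_type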