This dataset is real data, published by Ehinger, Kaufhold König (2016).
Caption (Ehinger et al.): A) Experimental Procedure. Subjects foveate a single bubble for the forced fixation time. […] The next step removes the bubble and simultaneously displays up to five new bubbles. The dependent variable is the time until a saccade initiation. After the selection of one bubble and most often before the saccade ends, the others bubbles are removed. The newly foveated stimulus is shown for a newly sampled forced fixation time. B) Example Stimulus Urban/Pink Noise. C) The sampling points for bubbles following the pseudo-random Poisson disc sampling algorithm. D) A single bubble as seen by the subject.
The following subset of fields are available:
Variables | Description |
---|---|
subject | participant ID (not numbered consecutively) |
choicetime | Dependent variable, the time to move to a new fixation location |
forcedFixtime | Independent variable, the time the last bubble was shown |
NumOfBubbles | Independent variable, the number of bubbles subjects are able to choose from |
Image | Name of the Image that was used as a backdrop |
stimulus_type | Image Type (Noise vs. Urban Image) |
library(ggplot2)
library(lme4)
library(plyr)
library(magrittr)
d = read.csv('fixdur.csv')
We start with only a single continuous predictor: Number of bubbles (ranging from 1 to 5)
ggplot
to plot number of bubbles
against choicetime
. Use stat_summary()
.ggplot(d,aes(x=NumOfBubbles,y=choicetime))+stat_summary()
## No summary function supplied, defaulting to `mean_se()
Hint: Calculate the number of trials for each condition.
## 1 2 3 4 5
## 19335 9216 4513 2137 1091
Build a simple linear mixed model with NumOfBubbles and a random intercept and random slope for the random grouping variable subject. We need the random coefficients to account for the correlation between subjects (aka repeated measures).
Inspect the model. Compared to a model without the fixed coefficient, is the log-likelihood significantly increased?
m0 = (lmer(data=d,formula=choicetime~1+(NumOfBubbles+1|subject)))
m1 = (lmer(data=d,formula=choicetime~1+NumOfBubbles+(NumOfBubbles+1|subject)))
anova(m0,m1)
## refitting model(s) with ML (instead of REML)
## Data: d
## Models:
## m0: choicetime ~ 1 + (NumOfBubbles + 1 | subject)
## m1: choicetime ~ 1 + NumOfBubbles + (NumOfBubbles + 1 | subject)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## m0 5 411296 411339 -205643 411286
## m1 6 411249 411300 -205618 411237 49.7 1 1.792e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Fit the following model choicetime ~ 1 + NumOfBubbles + (1 + NumOfBubbles | subject)
. ### Fixed coefficients 1. What does the intercept represent here?
NoB=1
to NoB=5
?What is the spread of the intercept over subjects?
Bonus: Calculate the values where 95% of subject’s intercepts would lay (hint: qnorm
is your friend) assuming the estimated parameters are correct.
## Linear mixed model fit by REML ['lmerMod']
## Formula: choicetime ~ 1 + NumOfBubbles + (1 + NumOfBubbles | subject)
## Data: d
##
## REML criterion at convergence: 411231.8
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.4552 -0.3797 -0.1367 0.1499 25.3407
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## subject (Intercept) 187.85 13.706
## NumOfBubbles 17.61 4.196 0.40
## Residual 4857.77 69.698
## Number of obs: 36292, groups: subject, 35
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 155.4246 2.4278 64.02
## NumOfBubbles 8.1865 0.7899 10.36
##
## Correlation of Fixed Effects:
## (Intr)
## NumOfBubbls 0.232
We want a plot with NumOfBubbles on X, predicted choicetime on Y for each subject. We can get \(\hat{y}\) using predict
.
Bonus: You can add the individual data + individual fit easily with ggplot. You could also add the average slope using aes(group=NULL)
d$predict = predict(m1)
ggplot(d,aes(x=NumOfBubbles,y=predict,group=factor(subject))) +
stat_summary(geom='line')+
stat_summary(geom='line',size=2,aes(group=NULL))+
stat_summary(aes(y=choicetime),geom='point',color='red')+facet_wrap(~subject)
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
## No summary function supplied, defaulting to `mean_se()
We currently have three random coefficients, Intercept
, NumOfBubbles
and the correlation between the two. We want to visualize them similar to the lecture.
mvtnorm
for this exercise; use install.packages('mvtnorm')
to install it# Get the fixed coefficients
gaus2d.mean <- fixef(XXX)
# Get the estimated Random Coefficient Covariance Matrix
gaus2d.cov <- VarCorr(XXX)$XXX
# make a grid of points from 100:300 and 0:20 to evaluate the gaussian function on
data.grid <- expand.grid(x1 = seq(100, 300, length.out=200), x2 = seq(0, 20, length.out=200))
# Generate a further column 'prob' with the density of a multivariate norm.
q.samp <- cbind(data.grid, prob = mvtnorm::dmvnorm(data.grid, mean = gaus2d.mean, sigma = gaus2d.cov))
# Plot it!
ggplot(q.samp, aes(x=XXX, y=XXX, z=XXX)) +
geom_contour()+
geom_point(inherit.aes=F,data=data.frame(coef(XXX)$XXX),aes(x=XXX,y=XXX)) # this is a bonus, but worth it, plot the single subject estimates in here as well.
gaus2d.mean <- fixef(m1)#[c(2,4)]
gaus2d.cov <- VarCorr(m1)$subject#[c(2,4),c(2,4)]
data.grid <- expand.grid(s.1 = seq(100, 300, length.out=200), s.2 = seq(0, 20, length.out=200))
#data.grid <- expand.grid(s.1 = seq(-40,150, length.out=200), s.2 = seq(-40, 20, length.out=200))
q.samp <- cbind(data.grid, prob = mvtnorm::dmvnorm(data.grid, mean = gaus2d.mean, sigma = gaus2d.cov))
ggplot() +
geom_tile(data=q.samp,aes(x=s.1, y=s.2, fill=prob))+
geom_point(data=data.frame(coef(m1)$subject),aes(x=X.Intercept.,y=NumOfBubbles),color="red")
#geom_point(inherit.aes=F,data=data.frame(coef(m1)$subject),aes(x=log.NumOfBubbles.,y=log.NumOfBubbles..log.forcedFixtime.))
We want to empirically compare a linear fit with a logarithmic fit (which theoretically makes more sense, as it generalizes better to, say, 25 bubbles). Fit a third model with the log-transformed predictor NumOfBubbles as an additional coefficient and do a model comparison.
log(NumOfBubbles)
?Hint: We might run into convergence issues. One strategy to deal with it is to find the most parsimonious model (the biggest model supported by the data). Try removing the correlation between NumOfBubbles
and log(NumOfBubbles)
. Does this help? You can continue removing correlation parameters from the model until you are sure the model converged.
Note that Barr et al show that not modeling random slopes increases Type-I errors. Maybe it is more useful in this case to choose either NoB
or log(NoB)
instead of incorporating both.
m2 = (lmer(data=d,formula=choicetime~1+log(NumOfBubbles) +NumOfBubbles+(log(NumOfBubbles) + NumOfBubbles+1|subject)))
m2a = (lmer(data=d,formula=choicetime~1+log(NumOfBubbles)+NumOfBubbles+(log(NumOfBubbles) + NumOfBubbles+1||subject)))
m2b = (lmer(data=d,formula=choicetime~1+log(NumOfBubbles)+NumOfBubbles+(log(NumOfBubbles) + 1|subject)+(0+NumOfBubbles||subject)))
anova(m2b,m1)
## refitting model(s) with ML (instead of REML)
## Data: d
## Models:
## m1: choicetime ~ 1 + NumOfBubbles + (1 + NumOfBubbles | subject)
## m2b: choicetime ~ 1 + log(NumOfBubbles) + NumOfBubbles + (log(NumOfBubbles) +
## m2b: 1 | subject) + ((0 + NumOfBubbles | subject))
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## m1 6 411249 411300 -205618 411237
## m2b 8 411245 411313 -205615 411229 7.3428 2 0.02544 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ct ~ NumOfBubbles
and ct ~ log(NumOfBubbles)
directly?'no, use AIC'
Plot the residuals of the LMM. Are they normally distributed?
If you find problems, think of transformations that could improve the shortcomings.
Now plot the residuals against NumOfBubbles
. Are there any (nonlinear) dependencies left over?
m1 = (lmer(data=d,formula=choicetime~NumOfBubbles+(1+NumOfBubbles|subject)))
qqnorm(residuals(m1))
qqline(residuals(m1))
plot(residuals(m1))
d2 = d
d2$resid = residuals(m1)
ggplot(d2,aes(x=NumOfBubbles,y=resid))+stat_summary()
## No summary function supplied, defaulting to `mean_se()
In a mixed model, you estimate an effect size for each subject (whether this counts as a parameter or not is for some people up to discussion, from a Bayesian point of view they clearly do). If you want nicer plots you could try the package sjp.lmer
This estimate is called the BLUP (Best Linear Unbiased Predictor). You can receive the BLUP model estimates using the following code:
ranef(m1)$subject$`(Intercept)`
If you want to add the fixed coefficients, use coeff
instead of ranef
(if you don’t understand what that means: calculate the mean of the BLUP over subjects once for coeff
and ranef
and compare)
qqnorm
of these BLUP estimated values. Are they following the assumed distribution?Note: Based on these values (and due to it being reaction times, i.e. cannot be smaller than 0 and right tail) it is clear that the data should be transformed, log
transform seems a good candidate. But remember that there could be other factors heavily influencing the residuals. Maybe things get better if we include them? Also, for practical conclusions, we expect no difference between the transformed and non-transformed data: our effects are quite large, we have lots of data and discrepancies only occur in the upper & lower 5% quantiles. If we are closer to noise-level things might be very different.
log(NumOfBubbles)
, log(forcedFixtime)
and their interaction.d2 = d%>%transform(standLogNoB = log(NumOfBubbles)-mean(log(NumOfBubbles)))
d2 = d2%>%transform(standLogFF = log(forcedFixtime)-mean(log(forcedFixtime)))
m3 = lmer(data=d2,choicetime ~ standLogNoB*standLogFF + (1+standLogNoB*standLogFF | subject),REML=F)
m3b = lmer(data=d2,choicetime ~ standLogNoB+standLogFF + (1+standLogNoB*standLogFF | subject),REML=F)
anova(m3,m3b)
## Data: d2
## Models:
## m3b: choicetime ~ standLogNoB + standLogFF + (1 + standLogNoB * standLogFF |
## m3b: subject)
## m3: choicetime ~ standLogNoB * standLogFF + (1 + standLogNoB * standLogFF |
## m3: subject)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## m3b 14 410359 410478 -205166 410331
## m3 15 410352 410479 -205161 410322 9.4199 1 0.002146 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
You will most likely run into convergence problems here (and the model take quite some time to fit). Try to standardize your parameters (removing the mean should be sufficient, but you can also try to divide by their respective SD. In the former case only the intercept changes its interpretation, in the latter also the slope changes its interpretation).
A very good idea is to visualize your effects. Use predict
in combination with expand.grid
to produce a fully balanced data set. Plot choicetime
against forcedFixtime
and split by factor(NumOfBubbles)
. To get a feeling for the size of the interaction predict the difference between 100ms and 1000ms at NumOfBubbles==1 with the difference between 100ms and 1000ms at NumOfBubbles==5.
predictdata = data.frame(expand.grid(standLogFF = log(seq(1,1000,length.out = 100))-log(500),
standLogNoB=seq(1,5)-log(2.5),
subject=unique(d$subject)))
predictdata$predict = predict(m3,newdata=predictdata)
ggplot(predictdata,aes(x=standLogFF,y=predict,color=factor(standLogNoB)))+
stat_summary(geom="line")+
scale_x_log10()
## Warning in self$trans$transform(x): NaNs wurden erzeugt
## Warning: Transformation introduced infinite values in continuous x-axis
## Warning: Removed 8750 rows containing non-finite values (stat_summary).
## No summary function supplied, defaulting to `mean_se()
We used 128 different images (64 urban, 64 noise).
Run a linear mixed model with log(FF) as random slope + fixed effect grouped in subject and additional image
as a random variable with a random intercept.
Use summary
, qqnorm
+ qqline
to investigate the new random variable image
for normality
Run a model with log(FF)
and stimulus_type
as a categorical factor. Include both appropriate random coefficients and random variables. Is the fixed effect stimulus_type
still significant, even after controlling for image
-variability?
m4 = lmer(data=d2,sqrt(choicetime) ~ standLogFF+stimulus_type + (1+standLogFF+stimulus_type | subject)+(1|image),REML=F)
m4b = lmer(data=d2,sqrt(choicetime) ~ standLogFF + (1+standLogFF+stimulus_type | subject)+(1|image),REML=F)
anova(m4,m4b)
## Data: d2
## Models:
## m4b: sqrt(choicetime) ~ standLogFF + (1 + standLogFF + stimulus_type |
## m4b: subject) + (1 | image)
## m4: sqrt(choicetime) ~ standLogFF + stimulus_type + (1 + standLogFF +
## m4: stimulus_type | subject) + (1 | image)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## m4b 10 150182 150267 -75081 150162
## m4 11 150168 150261 -75073 150146 16.277 1 5.472e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1