Slides & my Presentation style

Click this link to add notes to the slides

this adds ?showNotes=true to the file

You can download the pdf or print them yourself:

click this link for pdf printable (use google chrome)

click this link for pdf printable + notes

HTML is best viewed in chrome

Overview

Repeated measures
An example to guide you through the steps
The mixed model
Inference
Assumption checking
Shrinkage
Terminology
Convergence Problems
Multiple random variables

Repeated measures

It looks like there is no difference between groups?

But what if we have data from within the same subjects?

Repeated measures

Individual differences

Between-Subject variability can be removed by repeated measures.

Never analyze repeated measures data without taking repeated measures into account (Disaggregation, Pseudoreplication)

Accouting for repeated measures?

Black = mean of the subject.

\(n_S=5\) subjects

Sampling distributions

Multiple trials per subject

with \(n_T =15\) trials

Sampling distributions

This sampling distribution reflects these exact 5 subjects. But does not generalize!

Sampling distributions

(pink for aggregate, purple for disaggregate). There is a strong allure to choose the purple one, easier to get significant results!

\[n_{subject}(aggregate) = df+1 = 5\] \[n_{"subject"}(pseudoreplication) = df+1= 5*5 = 25\]

DONT DO THIS, YOU KNOW (now) BETTER

Always necessary to aggregate?

\[\sigma_{within} >> \sigma_{between}\]

Subjects are very similar

Not so necessary to aggregate first

A single parameter summarizes subject-means well

\[\sigma_{within} << \sigma_{between}\]

Subjects are very different

Totally necessary to aggregate first

\(n_{subject}\) parameters summarizes subject-means well

’

Should we trust all subjects equally?

Question: Estimate the mean of the population

Should we trust all subjects equally?

Question: Estimate the mean of the population

Should we trust all subjects equally?

We should take into account sample-size and variance into our estimate of the population mean

We should trust those subjects more, where we most sure about their central tendency

Summary

Why repeated measures
- More power to find effects that exist!
Don’t forget about repeated measures!
- Else your type I error will be very high
Should we trust all subjects equally?
- No, we should punish for small within-subject sample-size and punish high variance

At this point it should be said, that this dependency is NOT special to subjects. Imagine you measure how well different classes of a school do. Now agian, you have dependent measures because you can test in each class multiple times (because there are students in there). The first conclusion is that dependencies can arise from many different things. usually it is helpful to think hierarchical. The school example could be increase in both dimensions: Questions - Students - Class - School - District - Nation etc. We will soon learn how to incorporate all this knowledge in the model.

A second point is, repeated measures allow us to remove variance which is great! But standard approaches do not work anymore and might make us TOO confident. Nevertheless if the experimental design allows it, always try to do repeated measures.

Overview

Repeated measures
An example to guide you through the steps
The mixed model
Inference
Assumption checking
Shrinkage
Terminology
Convergence Problems
Multiple random variables

The example

EEG-Experiment: Contrast vs. P100 size

=> Very noisy subjects (babys?), between 4 and 29 trials

The data

one model for all subjects

\[\hat{y} = \beta_0 + \beta_1 * contrast \]

one model for each subject

\[\hat{y_j} = \beta_{0,j} + \beta_{1,j} * contrast \]

Excourse: two-stage statistics

This slide represents is a small excourse because it is very common in fMRI research and somewhat common in EEG.

The left plot shows all individual subject fits in one single plot.

It is statistically possible and accurate (but not as powerful as mixed models), to first summarize each subject (fit a model for each subject) and then test the model-parameters for systematic effects. The idea is, that if there is no effect of contrast (i.e. on the parameter slope), then the distribution of slope-parameters should be around zero (indicated by the black line in the right panel). If we find a deviation from 0 (right panel, blue dashed line) we can conclude that the subjects show a non-zero slope. Note that the first model (the individual fit) is usually referred to as the 1st level. The test of the parameters against 0 referred to as the 2nd-level (or group level).

One solution

Linear Mixed Model aka Hierarchical Model aka Multilevel Model

We estimate the population distribution directly. Subject-means with high uncertainty are less influential for the model fit

A compromise

Small second level variance (limit \(0\))

Parameters/DF assumed to be n-trials

Large second level variance (limit \(\infty\))

Parameters/DF assumed to be subjects

shrinkageometer

Overview

Repeated measures
An example to guide you through the steps
The mixed model
Inference
Assumption checking
Shrinkage
Terminology
Convergence Problems
Multiple random variables

Formal definition of the linear mixed model

A word of warning

Encouraging psycholinguists to use linear mixed-effects models ‘was like giving shotguns to toddlers’ (Barr)

Because mixed models are more complex and more flexible than the general linear model, the potential for confusion and errors is higher (Hamer)

Many ongoing riddles:

Book:Richly Parameterized Linear Models

Chp. 9: Puzzles from Analyzing Real Data sets
Chp.10: A Random Effect Competing with a Fixed Effect
Chp.12: Competition between Random Effects
Chp.14: Mysterious, Inconvenient or Wrong Results from Real Datasets

The mixed model in R

\(\hat{y} = \beta_0 + \beta_1 * contrast + u_{0,j} + u_{1,j} * contrast\)

\(\hat{y}_j = \beta X_j + b_jZ\)

Fitting this in R using the package lme4

response ~ beta + (beta|group)

In our example:

response ~ 1 + contrast + (1|subject) + (0+contrast|subject)

Mixed models in R

summary(lmer(p100~1 + contrast + (1 + contrast||subject),data=d))

## Linear mixed model fit by REML ['lmerMod']
## Formula: p100 ~ 1 + contrast + ((1 | subject) + (0 + contrast | subject))
##    Data: d
## 
## REML criterion at convergence: 2271.4
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.71978 -0.62658 -0.01064  0.65739  2.91110 
## 
## Random effects:
##  Groups    Name        Variance Std.Dev.
##  subject   (Intercept) 10.16741 3.1886  
##  subject.1 contrast     0.01353 0.1163  
##  Residual              54.45510 7.3794  
## Number of obs: 323, groups:  subject, 20
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  4.03422    1.10536   3.650
## contrast     0.09536    0.03045   3.131
## 
## Correlation of Fixed Effects:
##          (Intr)
## contrast -0.324

gray: one model, color: individual model, black: mixed model

Overview

Repeated measures
An example to guide you through the steps
The mixed model
Inference
- REML vs ML
Assumption checking
Shrinkage
Terminology
Convergence Problems
Multiple random variables

Model-Comparisons aka p-values

Unresolved problem!

Read (three links):how to get p values, problems with stepwise regression, different ways to get p-values

One way (good and common, but not the best*): Model comparison for fixed effects:

Be sure to keep random structure equal

anova(mres0,mres1)

“refitting model(s) with ML (instead of REML)”

## Data: d
## Models:
## mres0: p100 ~ 1 + (1 + contrast | subject)
## mres1: p100 ~ 1 + contrast + (1 + contrast | subject)
##       Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)   
## mres0  5 2286.5 2305.3 -1138.2   2276.5                            
## mres1  6 2280.1 2302.7 -1134.0   2268.1 8.4095      1   0.003733 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

* currently bootstrapping or mcmc recommended

REML VS ML

Intuition:

When estimating the variance parameter using maximum likelihood (ML) \(\sigma\) is biased because \(\mu\) is not known at the same time

Solution:

Fit all fixed effects \(\mu\), then refit the variance parameters on the residing estimates.

This is called restricted maximum likelihood (REML). If \(n\) is large, no difference between ML and REML exist. If \(n\) is small REML gives better predictive results.

If you do model comparison and fixed effects (should) differ (usual case) you have to use ML

For further details see Zuur 2009 (“Mixed Effects Models and Extensions in Ecology with R”, page 116). In practice you do not need to worry too much about the difference. The R anova function will automatically refit your model, if you accidentally used REML instead of ML.

Bootstrapping

resample from the data with replacement (or parametrically from \(H_0\)-Model) 1000 times
calculate statistic of interest
observe spread of statistic

b1 <- bootMer(mres1, FUN = function(x) as.numeric(logLik(x)), nsim = nboot,seed = 1)
b2 <- bootMer(mres0, FUN = function(x) as.numeric(logLik(x)), nsim = nboot,seed = 1)
bootLL =  2 * b1$t - 2 * b2$t # grab the values & plot
quantile(bootLL, probs = c(0.025, 0.975))

pbkrtest

Better & easier: Likelihood-Ratio using R-Package pbkrtest

summary(pbkrtest::PBmodcomp(mres1,mres0,nsim=nboot))

## Parametric bootstrap test; time: 30.27 sec; samples: 200 extremes: 3;
## large : p100 ~ 1 + contrast + (1 + contrast | subject)
## small : p100 ~ 1 + (1 + contrast | subject)
##            stat     df    ddf  p.value   
## PBtest   8.3917               0.019900 * 
## Gamma    8.3917               0.011404 * 
## Bartlett 7.0365 1.0000        0.007986 **
## F        8.3917 1.0000 12.385 0.013029 * 
## LRT      8.3917 1.0000        0.003769 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The main con against bootstrapping is the high computational time (use at least 1000 not 200 as in this example)

Overview

Repeated measures
An example to guide you through the steps
The mixed model
Inference
Assumption checking
- Level 1
- Level 2
Shrinkage
Terminology
Convergence Problems
Multiple random variables

Checking Assumptions Level I

No strong non-linearity visible in residuals

qqplot indicates normality of residuals

Checking assumptions Level II

In addition: test whether normality for random-variable is true:

QQ-Plots indicate normality of both random effects.

Overview

Repeated measures
The mixed model
An example to guide you through the steps
Inference
Assumption checking
Shrinkage
Terminology
Convergence Problems
Multiple random variables

Shrinkage

The individual estimates with high variance / far off the population are shrinked inwards

Shrinkage is also sometimes referred to as partial-pooling. Complete pooling would mean ignoring subject/group structure => all subjects are the same, no pooling would mean each subjects is fitted individually.

A very useful thing of shrinkage gets obvious, when we think outside of the context of experiments. Imagine you try to estimate the deadliness of a certain cancer for all districts of Germany. Because the cancer is quite rare, in some less-populated districts it can happen that there is no cancer and thus by necessity no-one died. In other districts it might be that one person got cancer and died. Take the mean of Germany to be 1% rate and 0.2% died. These first two districts have a 0% death-rate and a 100% death-rate respectively. It seems fair to include the information on rate / death rate of all other districts (Germany) to make better predictions on the actual (not the direct observed) rate.

Shrinkage-plot

gray: one model, color: individual model, black: mixed model (with shrinkage)

Shrinkage over multiple parameters

Simulated data: Subjects with faster reaction time (intercept) show larger effects for difficulty (slope)

In reverse: If I know somebody is generally slow, he should also have a smaller slope!

The multivariate normal I

The multivariate normal II

\(\rho = 0\), knowing one dimension does not tell you about the other

\[\sum = \left[ {\begin{array}{*{20}{c}} 1&0 \\ 0 &1 \end{array}} \right]\]

\(\rho = 0.6\), knowing one dimension tells you something about the other

\[\sum = \left[ {\begin{array}{*{20}{c}} 1&0.6 \\ 0.6 &1 \end{array}} \right]\]

Back to mixed models

\[\sum = \left[ {\begin{array}{*{20}{c}} 1&\rho \\ \rho &1 \end{array}} \right]\]

Overview

Repeated measures
The mixed model
An example to guide you through the steps
Inference
Assumption checking
Shrinkage
Terminology
Convergence Problems
Multiple random variables

Naming is a bit messy

Random/Fixed Coefficients

Random intercept
- (1|subject): Different offset for each subject
Random slope
- (predictor|subject): Different predictor-slope for each subject
Random correlation
- (1+predictor|subject) or (X+Y|subject): Knowing about X tells you something about Y
Fixed coefficient
- exactly the same for all subjects

Random/Fixed Variables

Random variable
- Instance \(\in\) population, e.g. subjects, images, words
Fixed variable
- limited number of possible values or no generalization to population needed, e.g. 3 different experimental conditions

Random effect also used for either random coefficient or random variable

Fixed effect usually refers to the estimator of the population mean

source and another 5 different definitions here

Overview

Repeated measures
The mixed model
An example to guide you through the steps
Inference
Assumption checking
Shrinkage
Terminology
Convergence Problems
Multiple random variables

A more complex model

d$contrastSquare = d$contrast^2
lmer(p100~1+contrast+contrastSquare+(1+contrast+contrastSquare|subject),data = d)



Warning messages:
1: Some predictor variables are on very different scales: consider rescaling 
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  unable to evaluate scaled gradient
3: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge: degenerate  Hessian with 2 negative eigenvalues

Convergence problems

Indicator

Convergence warning (always a problem)
random effect correlation is either 0 or 1 (very seldom correct)
model takes very long time to fit (could be correct though)

Reasons

misspecified or overspecified models (specified fixed variable as random variable?)
too many parameters and not enough data,
data have big outliers/are noisy

Your possiblities:

clean data
center predictors (remove mean) and scale (divide by SD of independent variable)
use Bayesian fit and put a soft prior on the covariance structure (MCMCglmm,rstanarm,brms)
simplify the model (parsimoneous models; Bates 2015)

The ideal case

\[ \sum = \left[ {\begin{array}{*{20}{c}} 1&{{\rho_{ab}}}&{{\rho_{ac}}}&{{\rho_{ad}}}\\ {{\rho_{ab}}}&1&{{\rho_{bc}}}&{{\rho_{bd}}}\\ {{\rho_{ac}}}&{{\rho_{bc}}}&1&{{\rho_{cd}}}\\ {{\rho_{ad}}}&{{\rho_{bd}}}&{{\rho_{cd}}}&1 \end{array}} \right]\]

y ~ 1 + A + B + C + (1 + A + B + C | group)

If you have enough data, try to estimate all correlations.

The simplest structure

\[\Sigma = \left[ {\begin{array}{*{20}{c}} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&0&1 \end{array}} \right]\]

y ~ 1 + X1 + X2 + X3 + (1|group) + (0+X1|group) + (0+X2|group) + (0+X3|group)

shorthand:

y ~ 1 + X1 + X2 + X3 + (1+X1+X2+X3||group)

note the || instead if |

\[\Sigma = \left[ {\begin{array}{*{20}{c}} 1&{{\rho _a}}&{{\rho _a}}&{{\rho _a}}\\ {{\rho _a}}&1&{{\rho _a}}&{{\rho _a}}\\ {{\rho _a}}&{{\rho _a}}&1&{{\rho _a}}\\ {{\rho _a}}&{{\rho _a}}&{{\rho _a}}&1 \end{array}} \right]\]

Compound symmetry \(\simeq\) Repeated measures ANOVA

y ~ 1 + X1 + (1|group) + (1|group:X1)

Which one to choose?

1: You can estimate the full (enough data or put a lkj-prior if you are Bayes)
2: Prior assumption on shape and you don’t have to change it (non-contingent on data)
3: Simplify in a data-driven way

Ongoing debate “Keep it maximal”, Barr et al. 2013 vs. “Parsimoneous mixed models”, Bates et al. 2015

Barr promotes full models. Bates promotes a step wise reduction if effects do not significantly explain variance

Overview

Repeated measures
The mixed model
An example to guide you through the steps
Inference
Assumption checking
Shrinkage
Terminology
Convergence Problems
Multiple random variables

Multiple random variables

What a nice effect! Write paper and end of story?

But very high stimulus variance!

But very high stimulus variance

a high variance can lead to sporadic effects

You should include it in your model

Often it is necessary to add multiple random effects! Examples are: subjects, stimuli (images, words, etc)

y ~ 1 + condA (1+condA|subject) + (1+condA|stimulus)

When to include what

Some guidance:

Predictor is a subset of a population?
- => random/grouping variable e.g. subjects, images, words
Predictor is within subject/grouping?
- => random coefficients
Predictor is between subject/grouping or aggregated?
- => only fixed coefficient

Summary

We learned how to analyze repeated measures data
We learned how to structure random effects
We learned how to check assumptions of Mixed Models
We learned about shrinkage
We learned how to include multiple random variables

Mixed Models / Hierarchical Models

Benedikt Ehinger

March 22nd, 2018

Slides & my Presentation style

Overview

Repeated measures

Repeated measures

Individual differences

Accouting for repeated measures?

Sampling distributions

Multiple trials per subject

Sampling distributions

Sampling distributions

Always necessary to aggregate?

Should we trust all subjects equally?

Should we trust all subjects equally?

Should we trust all subjects equally?

Summary

Overview

The example

The data

one model for all subjects

one model for each subject

Excourse: two-stage statistics

One solution

A compromise

Overview

Formal definition of the linear mixed model

A word of warning

The mixed model in R

Mixed models in R

Overview

Model-Comparisons aka p-values

REML VS ML

Intuition:

Solution:

Bootstrapping

pbkrtest

Overview

Checking Assumptions Level I

Checking assumptions Level II

Overview

Shrinkage

Shrinkage-plot

Shrinkage over multiple parameters

The multivariate normal I

The multivariate normal II

Back to mixed models

Overview

Naming is a bit messy

Random/Fixed Coefficients

Random/Fixed Variables

Overview

A more complex model

Convergence problems

Indicator

Reasons

Your possiblities:

The ideal case

The simplest structure

Which one to choose?

Overview

Multiple random variables

But very high stimulus variance!

But very high stimulus variance

You should include it in your model

When to include what

Some guidance:

Summary

Papers & Books