Mixed Models / Hierarchical Models

Benedikt Ehinger
March 2017

Overview

  • Repeated measures
  • Mixed Models

Repeated measures


It looks like there is no difference between the groups.

But what if we have data from within the same subjects?

Repeated measures


Between-subject variability can be removed by a repeated-measures design.

Never analyze repeated-measures data without taking the repeated measures into account (disaggregation, pseudoreplication).

Why you have to account for repeated measures

(Figure: black = mean of the subject)

\( n_S=5 \) subjects

Sampling distributions


Multiple trials per subject

(Figure: \( n_T = 15 \) trials per subject)

Sampling distributions


This sampling distribution reflects these exact 5 subjects, but it does not generalize!

Sampling distributions

(Figure: sampling distributions; pink = aggregated, purple = disaggregated)

There is a strong allure to choose the purple one: it is easier to get significant results!

\[ n_{subject}(\text{aggregated}) = df + 1 = 5 \] \[ n_{\text{"subject"}}(\text{pseudoreplication}) = df + 1 = 5 \cdot 5 = 25 \]

DON'T DO THIS, YOU KNOW BETTER (now)
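A minimal simulation sketch of the problem (hypothetical data: 5 subjects, 15 trials each; numbers chosen only for illustration). A t-test on the subject means has the correct degrees of freedom, while a t-test on all trials pseudoreplicates:

set.seed(1)
nS <- 5; nT <- 15
subj_mean <- rnorm(nS, mean = 0, sd = 2)                     # true per-subject means
d_sim <- data.frame(subject = rep(1:nS, each = nT),
                    y = rep(subj_mean, each = nT) + rnorm(nS * nT, sd = 1))
agg <- aggregate(y ~ subject, data = d_sim, FUN = mean)      # aggregate to one value per subject
t.test(agg$y)      # correct: df = nS - 1 = 4
t.test(d_sim$y)    # pseudoreplication: df = nS*nT - 1 = 74, far too liberal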

Always necessary to aggregate?

\[ \sigma_{within} \gg \sigma_{between} \]


Subjects are very similar

Not so necessary to aggregate first

A single parameter summarizes subject-means well

\[ \sigma_{within} \ll \sigma_{between} \]

Subjects are very different

Totally necessary to aggregate first

\( n_{subject} \) parameters summarize the subject means well

Should we trust all subjects equally?


Question: Estimate the mean of the population

Should we trust all subjects equally?


Question: Estimate the mean of the population

Should we trust all subjects equally?


We should take sample size and variance into account in our estimate of the population mean.

We should trust those subjects more where we are most sure about their central tendency.
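One way to formalize this intuition is to weight each subject mean by its precision (the inverse of its squared standard error); a minimal sketch with hypothetical per-subject summaries:

subj <- data.frame(m  = c(1.2, 0.8, 2.5, 0.9, 1.6),   # hypothetical subject means
                   sd = c(0.5, 0.4, 3.0, 0.6, 1.0),   # within-subject SDs
                   n  = c(20, 18, 4, 25, 10))          # trials per subject
se2 <- subj$sd^2 / subj$n                 # squared standard error of each subject mean
weighted.mean(subj$m, w = 1 / se2)        # precision-weighted estimate of the population mean
mean(subj$m)                              # naive unweighted mean for comparison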

Summary

  • Why repeated measures
    • More power to find effects that exist!
  • Don't forget about repeated measures!
    • Else your type I error will be very high
  • Should we trust all subjects equally?
No, we should penalize small within-subject sample sizes and high variance

Overview

  • An example to guide you through the steps
  • The mixed model
  • Inference
    • REML vs ML
  • Assumption checking
  • Shrinkage
  • Terminology
  • Convergence Problems
  • Simplifying your model

The example

EEG-Experiment: Contrast vs. P100 size

=> Very noisy subjects (babies?), between 4 and 29 trials

The data


one model for all subjects

\[ \hat{y} = \beta_0 + \beta_1 * contrast \]

one model for each subject

\[ \hat{y}_j = \beta_{0,j} + \beta_{1,j} * contrast \]

two-stage statistics


Note: contrast slope scaled by a factor of 10 for easier comparison.
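A minimal sketch of the two-stage approach, assuming the example data frame d with columns p100, contrast and subject:

# stage 1: fit one regression per subject and keep its contrast slope
slopes <- sapply(split(d, d$subject),
                 function(ds) coef(lm(p100 ~ contrast, data = ds))["contrast"])
# stage 2: test the subject-level slopes against zero
t.test(slopes)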

Overview

  • An example to guide you through the steps
  • The mixed model
  • Inference
    • REML vs ML
  • Assumption checking
  • Shrinkage
  • Terminology
  • Convergence Problems
  • Simplifying your model

One solution

Linear Mixed Model aka Hierarchical Model aka Multilevel Model

We estimate the population distribution directly. Subject means with high uncertainty are less influential on the model fit.

Formal definition

The mixed model in R

\[ \hat{y}_j = \beta_0 + \beta_1 * contrast + u_{0,j} + u_{1,j} * contrast \]

\[ \hat{y}_j = X_j\beta + Z_j b_j \]

Fitting this in R using the package lme4

response ~ beta + (beta|group)

In our example:

p100 ~ 1 + contrast + (1 | subject) + (0 + contrast | subject)

Mixed models in R

summary(lmer(p100~contrast+(1+contrast||subject),data=d))
Linear mixed model fit by REML ['lmerMod']
Formula: p100 ~ contrast + ((1 | subject) + (0 + contrast | subject))
   Data: d

REML criterion at convergence: 2271.4

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.71978 -0.62658 -0.01064  0.65739  2.91110 

Random effects:
 Groups    Name        Variance Std.Dev.
 subject   (Intercept) 10.16741 3.1886  
 subject.1 contrast     0.01353 0.1163  
 Residual              54.45510 7.3794  
Number of obs: 323, groups:  subject, 20

Fixed effects:
            Estimate Std. Error t value
(Intercept)  4.03422    1.10536   3.650
contrast     0.09536    0.03045   3.131

Correlation of Fixed Effects:
         (Intr)
contrast -0.324

(Figure: gray = one overall model, color = individual models, black = mixed model)

Direct comparison of 2-stage vs mixed model


two-stage (the second-stage intercept estimates the mean contrast slope):

            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  0.06858    0.02474   2.772   0.0122 *

mixed-model:

            Estimate Std. Error t value
contrast     0.09526    0.02990   3.187

true: slope=0.1

Overview

  • The mixed model
  • An example to guide you through the steps
  • Inference
    • REML vs ML
  • Assumption checking
  • Shrinkage
  • Terminology
  • Convergence Problems
  • Simplifying your model

Model-Comparisons aka p-values

Unresolved problem!

Read: how to get p-values, problems with stepwise regression, different ways to get p-values

One way (not the best*): Model comparison for fixed effects:

Be sure to keep the random-effects structure identical.

anova(mres0,mres1)

"refitting model(s) with ML (instead of REML)"

Data: d
Models:
mres0: p100 ~ 1 + (1 + contrast | subject)
mres1: p100 ~ 1 + contrast + (1 + contrast | subject)
      Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)   
mres0  5 2286.5 2305.3 -1138.2   2276.5                            
mres1  6 2280.1 2302.7 -1134.0   2268.1 8.4095      1   0.003733 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • currently bootstrapping or MCMC recommended

REML vs ML


Intuition:

When estimating the variance parameter with maximum likelihood (ML), the estimate of \( \sigma \) is biased (too small) because \( \mu \) has to be estimated from the same data.

Solution:

Fit all fixed effects \( \mu \), then estimate the variance parameters on the resulting residuals.

This is called restricted maximum likelihood (REML). If \( n \) is large, there is no difference between ML and REML. If \( n \) is small, REML gives better predictive results.


If you do model comparisons and the fixed effects differ (the usual case), you have to use ML.
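In lme4 the fitting criterion is controlled by the REML argument; a minimal sketch using the example model:

library(lme4)
# ML fit: use this when comparing models that differ in their fixed effects
m_ml   <- lmer(p100 ~ 1 + contrast + (1 + contrast | subject), data = d, REML = FALSE)
# REML fit (the default): use this for reporting variance components
m_reml <- lmer(p100 ~ 1 + contrast + (1 + contrast | subject), data = d, REML = TRUE)
# note: anova() on REML fits automatically refits them with ML before comparing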

Excursus: Bootstrapping

  • resample from the data with replacement (or parametrically from \( H_0 \)-Model) 1000 times
  • calculate statistic of interest
  • observe spread of statistic
b1 <- bootMer(mres1, FUN = function(x) as.numeric(logLik(x)), nsim = nboot,seed = 1)
b2 <- bootMer(mres0, FUN = function(x) as.numeric(logLik(x)), nsim = nboot,seed = 1)
bootLL =  2 * b1$t - 2 * b2$t # grab the values & plot
quantile(bootLL, probs = c(0.025, 0.975))
     2.5%     97.5% 
-7.358778  1.979873 


pbkrtest

Better & easier: likelihood-ratio test using the R package pbkrtest

summary(pbkrtest::PBmodcomp(mres1,mres0,nsim=nboot))
Parametric bootstrap test; time: 112.23 sec; samples: 200 extremes: 0;
large : p100 ~ 1 + contrast + (1 + contrast | subject)
small : p100 ~ 1 + (1 + contrast | subject)
           stat     df    ddf  p.value   
PBtest   8.3917               0.004975 **
Gamma    8.3917               0.005508 **
Bartlett 7.4900 1.0000        0.006204 **
F        8.3917 1.0000 18.614 0.009379 **
LRT      8.3917 1.0000        0.003769 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The main argument against bootstrapping is the high computational time (use at least 1000 samples, not 200!).

Overview

  • The mixed model
  • An example to guide you through the steps
  • Inference
    • REML vs ML
  • Assumption checking
    • QQ plots
    • Level 1
    • Level 2
  • Shrinkage
  • Terminology
  • Convergence Problems
  • Simplifying your model

Checking Assumptions: Normality using QQ plots


Source: Sean Kross


Checking Assumptions Level I


No strong non-linearity visible in residuals


qqplot indicates normality of residuals

Checking assumptions Level II

In addition, test whether the normality assumption holds for the random effects (level II):

QQ-Plots indicate normality of both random effects.
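A minimal sketch of how such checks can be produced in base R for the fitted model mres1 from the model-comparison slide:

# level I: residuals vs. fitted values and a residual QQ-plot
plot(fitted(mres1), resid(mres1)); abline(h = 0, lty = 2)
qqnorm(resid(mres1)); qqline(resid(mres1))
# level II: QQ-plots of the estimated random effects (intercept and slope per subject)
re <- ranef(mres1)$subject
qqnorm(re[, "(Intercept)"]); qqline(re[, "(Intercept)"])
qqnorm(re[, "contrast"]);    qqline(re[, "contrast"])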

Overview

  • The mixed model
  • An example to guide you through the steps
  • Inference
    • REML vs ML
  • Assumption checking
  • Shrinkage
  • Terminology
  • Convergence Problems
  • Simplifying your model

Shrinkage

Individual estimates with high variance, or far away from the population mean, are shrunk towards the population estimate.

(Figure: gray = one overall model, color = individual models, black = mixed model)
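One way to see the shrinkage is to compare each subject's OLS slope with its conditional (BLUP) slope from the mixed model; a minimal sketch, assuming the example data d and the model from above:

# per-subject OLS slopes
ols <- sapply(split(d, d$subject),
              function(ds) coef(lm(p100 ~ contrast, data = ds))["contrast"])
# conditional per-subject slopes from the mixed model (fixed effect + random deviation)
m <- lmer(p100 ~ 1 + contrast + (1 + contrast | subject), data = d)
blup <- coef(m)$subject[, "contrast"]
plot(ols, blup, xlab = "per-subject OLS slope", ylab = "mixed-model slope")
abline(a = 0, b = 1, lty = 2)   # deviations from the identity line show shrinkage towards the population slope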

Shrinkage-plot


Shrinkage over multiple parameters

Simulated data: subjects with faster reaction times (intercept) show larger effects of difficulty (slope).

Conversely: if I know somebody is generally slow, they should also have a larger slope!

A multivariate normal

\( \rho = 0 \), knowing one dimension does not tell you about the other

\[ \Sigma = \left[ {\begin{array}{*{20}{c}} 1&0 \\ 0&1 \end{array}} \right] \]

\( \rho = 0.6 \), knowing one dimension tells you something about the other

\[ \Sigma = \left[ {\begin{array}{*{20}{c}} 1&0.6 \\ 0.6&1 \end{array}} \right] \]
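A minimal simulation sketch of the two cases, using MASS::mvrnorm:

library(MASS)
Sigma0 <- matrix(c(1, 0,   0,   1), nrow = 2)   # rho = 0
Sigma6 <- matrix(c(1, 0.6, 0.6, 1), nrow = 2)   # rho = 0.6
x0 <- mvrnorm(1000, mu = c(0, 0), Sigma = Sigma0)
x6 <- mvrnorm(1000, mu = c(0, 0), Sigma = Sigma6)
par(mfrow = c(1, 2))
plot(x0, main = "rho = 0"); plot(x6, main = "rho = 0.6")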

Back to mixed models

\[ \Sigma = \left[ {\begin{array}{*{20}{c}} 1&\rho \\ \rho&1 \end{array}} \right] \]


Overview

  • The mixed model
  • An example to guide you through the steps
  • Inference
    • REML vs ML
  • Assumption checking
  • Shrinkage
  • Terminology
  • Convergence Problems
  • Simplifying your model

Naming is a bit messy

Random/Fixed Coefficients

  • Random intercept
    • (1|subject): Different offset for each subject
  • Random slope
    • (predictor|subject): Different predictor-slope for each subject
  • Random correlation
    • (1+predictor|subject) or (X+Y|subject): Knowing about X tells you something about Y
  • Fixed coefficient
    • exactly the same for all subjects

Random/Fixed Variables

  • Random variable
    • Instance \( \in \) population, e.g. subjects, images, words
  • Fixed variable
    • limited number of possible values or no generalization to population needed, e.g. 3 different experimental conditions

“Random effect” is used for either a random coefficient or a random variable.

Fixed effect usually refers to the estimator of the population mean

source and another 5 different definitions here

A more complex model

d$contrastSquare = d$contrast^2
lmer(p100~1+contrast+contrastSquare+(1+contrast+contrastSquare|subject),data = d)
Linear mixed model fit by REML ['lmerMod']
Formula: 
p100 ~ 1 + contrast + contrastSquare + (1 + contrast + contrastSquare |  
    subject)
   Data: d
REML criterion at convergence: 2327.55
Random effects:
 Groups   Name           Std.Dev. Corr       
 subject  (Intercept)    3.945982            
          contrast       1.070810  0.46      
          contrastSquare 0.009451 -0.46 -1.00
 Residual                7.183229            
Number of obs: 323, groups:  subject, 20
Fixed Effects:
   (Intercept)        contrast  contrastSquare  
     4.2065113       0.1213563      -0.0002988  
fit warnings:
Some predictor variables are on very different scales: consider rescaling
convergence code 0; 2 optimizer warnings; 0 lme4 warnings 


Warning messages:
1: Some predictor variables are on very different scales: consider rescaling 
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  unable to evaluate scaled gradient
3: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge: degenerate  Hessian with 2 negative eigenvalues

Overview

  • The mixed model
  • An example to guide you through the steps
  • Inference
    • REML vs ML
  • Assumption checking
  • Shrinkage
  • Terminology
  • Convergence Problems
  • Simplifying your model

Convergence problems

Indicator

  • Convergence warning (always a problem)
  • random effect correlation is either 0 or 1 (very seldom correct)
  • model takes very long time to fit (could be correct though)

Reasons

  • misspecified or overspecified models (specified fixed variable as random variable?)
  • too many parameters and not enough data,
  • data have big outliers/are noisy

Your possibilities:

  • clean the data
  • center predictors (remove the mean) and scale them (divide by the SD of the independent variable); see the sketch after this list
  • use a Bayesian fit and put a soft prior on the covariance structure (MCMCglmm, rstanarm, brms)
  • simplify the model (parsimonious models; Bates 2015)
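A minimal sketch of centering and scaling the predictors before refitting the quadratic model from above (contrast and contrast² live on very different scales); the column names contrast_z and contrastSquare_z are just for illustration, and this often, but not always, removes the warnings:

d$contrast_z       <- as.numeric(scale(d$contrast))     # center (subtract mean) and scale (divide by SD)
d$contrastSquare_z <- as.numeric(scale(d$contrast^2))
lmer(p100 ~ 1 + contrast_z + contrastSquare_z + (1 + contrast_z + contrastSquare_z | subject), data = d)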

The ideal case

\[ \Sigma = \left[ {\begin{array}{*{20}{c}} 1&{{\rho_{ab}}}&{{\rho_{ac}}}&{{\rho_{ad}}}\\ {{\rho_{ab}}}&1&{{\rho_{bc}}}&{{\rho_{bd}}}\\ {{\rho_{ac}}}&{{\rho_{bc}}}&1&{{\rho_{cd}}}\\ {{\rho_{ad}}}&{{\rho_{bd}}}&{{\rho_{cd}}}&1 \end{array}} \right] \]

y ~ 1 + A + B + C + (1 + A + B + C | group)

If you have enough data, try to estimate all correlations.

The simplest structure

\[ \Sigma = \left[ {\begin{array}{*{20}{c}} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&0&1 \end{array}} \right] \]

y ~ 1 + X1 + X2 + X3 + (1|group) + (0+X1|group) + (0+X2|group) + (0+X3|group)

shorthand:

y ~ 1 + X1 + X2 + X3 + (1+X1+X2+X3||group)

note the ||

\[ \Sigma = \left[ {\begin{array}{*{20}{c}} 1&{{\rho _a}}&{{\rho _a}}&{{\rho _a}}\\ {{\rho _a}}&1&{{\rho _a}}&{{\rho _a}}\\ {{\rho _a}}&{{\rho _a}}&1&{{\rho _a}}\\ {{\rho _a}}&{{\rho _a}}&{{\rho _a}}&1 \end{array}} \right] \]

Compound symmetry \( \simeq \) Repeated measures ANOVA

y ~ 1 + X1 + (1|group) + (1|group:X1)
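A minimal sketch of this correspondence, assuming a hypothetical data frame d2 with a response y, a factor X1 and a grouping factor group:

# compound-symmetry mixed model
m_cs <- lmer(y ~ 1 + X1 + (1 | group) + (1 | group:X1), data = d2)
# classical repeated-measures ANOVA on the same data, for comparison
summary(aov(y ~ X1 + Error(group/X1), data = d2))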

Which one to choose?

  • #1 Best: You can estimate the full covariance matrix (enough data, or put an LKJ prior on it)
  • #2 Best: A prior assumption about the structure that you do not have to change (not contingent on the data)
  • #3 Ok: Simplify in a data-driven way

Ongoing debate: “Keep it maximal”, Barr et al. 2013 vs. “Parsimonious mixed models”, Bates et al. 2015

Barr promotes full models; Bates promotes stepwise reduction if effects do not significantly explain variance.

A word of warning

encouraging psycholinguists to use linear mixed-effects models 'was like giving shotguns to toddlers' (Barr)


“Because mixed models are more complex and more flexible than the general linear model, the potential for confusion and errors is higher.” (Hamer)

Many ongoing riddles:

Book: Richly Parameterized Linear Models

  • Ch. 9: Puzzles from Analyzing Real Datasets
  • Ch. 10: A Random Effect Competing with a Fixed Effect
  • Ch. 12: Competition between Random Effects
  • Ch. 14: Mysterious, Inconvenient or Wrong Results from Real Datasets

Multiple random variables


Formula: y ~ condition + (1 + condition | subject)

 Random effects:
  Groups   Name        Variance Std.Dev. Corr
  subject  (Intercept)   8.832   2.972       
           condition     1.450   1.204   0.07

 Fixed effects:
             Estimate Std. Error t value
 (Intercept)  -0.7093     0.6159  -1.152
 condition     4.0558     0.4644   8.733

What a nice effect! Write paper and end of story?

But very high stimulus variance!


A high stimulus variance can lead to spurious effects.

You should include stimulus as a random variable in your model.


Formula: y ~ condition + (1 + condition | subject) + (1 | stimulus)

 Random effects:
  Groups   Name        Variance Std.Dev. Corr 
  subject  (Intercept)   8.870   2.978        
           condition     1.458   1.208   -0.04
  stimulus (Intercept) 126.563  11.250        

 Fixed effects:
             Estimate Std. Error t value
 (Intercept)  -0.6771     2.9562  -0.229
 condition     4.7118     4.1153   1.145

Often it is necessary to add multiple random variables! Examples: subjects, stimuli (images, words, etc.).

When to include what

Some guidance (a formula sketch follows the list):

  • Predictor is a subset of a population?
    • => random/grouping variable e.g. subjects, images, words
  • Predictor is within subject/grouping?
    • => random coefficients
  • Predictor is between subject/grouping or aggregated?
    • => only fixed coefficient
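Putting this together, a hypothetical sketch (variable names are made up): condition varies within subjects, stimuli are sampled from a larger population of stimuli, and age is a between-subject covariate and therefore only gets a fixed coefficient:

lmer(y ~ 1 + condition + age +
       (1 + condition | subject) +   # random intercept and condition slope per subject
       (1 | stimulus),               # random intercept per stimulus
     data = d3)                      # d3: hypothetical data frame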

Summary

  • We learned how to analyze repeated measures data
  • We learned how to structure random effects
  • We learned how to check assumptions of Mixed Models
  • We learned about shrinkage
  • We learned how to include multiple random variables

Papers & Books
