# MATH 6643 Summer 2012 Applications of Mixed Models/Snijders and Bosker: Discussion Questions

(Difference between revisions)
Jump to: navigation, search
 Revision as of 00:04, 14 June 2012 (view source)Georges (Talk | contribs)m (→Ryan)← Older edit Revision as of 00:08, 14 June 2012 (view source)Smithce (Talk | contribs) Newer edit → Line 200: Line 200: In example 7.1, we know that how to calculate the explained variance of a level one variable when this variable has a fixed effect only. I want to know how to calculate the explained variance of a level one variable when this variable has a fixed and random effect in the model? In example 7.1, we know that how to calculate the explained variance of a level one variable when this variable has a fixed effect only. I want to know how to calculate the explained variance of a level one variable when this variable has a fixed and random effect in the model? - === Carrie === + === Interpreting $R^2$ as an Effect Size === + A client fits a multilevel model and comes up with several significant predictors.  The client is pleased with themselves, but remembers learning that significance alone isn't good enough these days, and needs help producing a measure of effect size.  You compute the Level 1 R^2 and come up with a very small value, say 0.01.  Is the model then worthless, even if the magnitude of the predicted change in the outcome is substantively meaningful? + ===  Explained variance === ===  Explained variance === In  the example provided on page  110, it show that the residual variance at level two increases as within-group deviation is added as an explanatory variable to the model in balanced  as well as in the  unbalanced case. Is this always the case or it is only for this particular example? In  the example provided on page  110, it show that the residual variance at level two increases as within-group deviation is added as an explanatory variable to the model in balanced  as well as in the  unbalanced case. Is this always the case or it is only for this particular example? + === Estimates of $R^2$=== === Estimates of $R^2$=== On page 113, "it is observed that an estimated value for $R^2$ becomes smaller by the addition of a predictor variable, or larger by the deletion of a predictor variable, there are two possibilities: either this is a chance fluctuation, or the larger model is misspecified." The authors then say that whether the first or second possibility is more likely depends on the size of the change in $R^2$. Can you give an example of when this occurs based on the size of change in $R^2$? On page 113, "it is observed that an estimated value for $R^2$ becomes smaller by the addition of a predictor variable, or larger by the deletion of a predictor variable, there are two possibilities: either this is a chance fluctuation, or the larger model is misspecified." The authors then say that whether the first or second possibility is more likely depends on the size of the change in $R^2$. Can you give an example of when this occurs based on the size of change in $R^2$? - === Predicted $R^2$ === + === Predicted $R^2$ === After predicting values for random intercepts and slopes using Bayesian methods it is possible to form composite values, $\hat{Y}_{ij}$, to predict the observed dependent values, $Y_{ij}$. Obviously these estimates will be sub-optimal as they will suffer from 'shrinkage' effects, but they may be useful for computing a '$R^2_{Predicted}$'. Discuss situations where knowledge of the predicted slopes and intercepts could be important, and whether an $R^2_{Predicted}$ could be a useful description. After predicting values for random intercepts and slopes using Bayesian methods it is possible to form composite values, $\hat{Y}_{ij}$, to predict the observed dependent values, $Y_{ij}$. Obviously these estimates will be sub-optimal as they will suffer from 'shrinkage' effects, but they may be useful for computing a '$R^2_{Predicted}$'. Discuss situations where knowledge of the predicted slopes and intercepts could be important, and whether an $R^2_{Predicted}$ could be a useful description.

## Chapter 5

### Matthew

At the bottom of page 83, Snijders and Bosker outline the process for probing interactions between two level one variables, and how there can be four possibilities for how to model it. If a researcher was to include all four, discuss how each would be interpreted. What might a good selection strategy be if our model had substantially more than two variables?

### Qiong

If we do not have any information about the data set, how to choose a level - two variable to predict the group dependent regression coefficients? After we choose the level - two variable z, how to explain the cross - level interaction term.

### Carrie

A client arrives with a random slope and intercept model using IQ as a predictor. IQ was measured on the traditional scale with a mean of 100 and standard deviation of 15. What should the client keep in mind about the interpretation of the variance of the intercept and covariance of the slope-intercept?

This raises the interesting question of how the variance of the random intercept and the covariance of the random intercept with the random slope are changed under a recentering of IQ. Let
$Var\left[ \begin{matrix} {{u}_{0j}} \\ {{u}_{1j}} \\ \end{matrix} \right]=\left[ \begin{matrix} \tau _{0}^{2} & {{\tau }_{01}} \\ {{\tau }_{10}} & \tau _{1}^{2} \\ \end{matrix} \right]$ for raw IQ.
If we recenter IQ with: $\tilde{\text{IQ}}=\text{IQ}-c$ then:
$\left[ \begin{matrix} {{{\tilde{u}}}_{0j}} \\ {{{\tilde{u}}}_{1j}} \\ \end{matrix} \right]=\left[ \begin{matrix} 1 & c \\ 0 & 1 \\ \end{matrix} \right]\left[ \begin{matrix} {{u}_{0j}} \\ {{u}_{1j}} \\ \end{matrix} \right]$
and
$Var\left[ \begin{matrix} {{{\tilde{u}}}_{0j}} \\ {{{\tilde{u}}}_{1j}} \\ \end{matrix} \right]=\left[ \begin{matrix} 1 & c \\ 0 & 1 \\ \end{matrix} \right]\left[ \begin{matrix} \tau _{0}^{2} & {{\tau }_{01}} \\ {{\tau }_{10}} & \tau _{1}^{2} \\ \end{matrix} \right]\left[ \begin{matrix} 1 & 0 \\ c & 1 \\ \end{matrix} \right]=\left[ \begin{matrix} \tau _{0}^{2}+2c{{\tau }_{01}}+{{c}^{2}}\tau _{1}^{2} & {{\tau }_{01}}+c\tau _{1}^{2} \\ {{\tau }_{10}}+c\tau _{1}^{2} & \tau _{1}^{2} \\ \end{matrix} \right]$
If $c=-{{\tau }_{01}}/\tau _{1}^{2}$, then the variance of the intercept is minimized:
$Var\left[ \begin{matrix} {{{\tilde{u}}}_{0j}} \\ {{{\tilde{u}}}_{1j}} \\ \end{matrix} \right]=\left[ \begin{matrix} \tau _{0}^{2}-\tau _{01}^{2}/\tau _{1}^{2} & 0 \\ 0 & \tau _{1}^{2} \\ \end{matrix} \right]=\left[ \begin{matrix} \tau _{0}^{2}\left( 1-\rho _{01}^{2} \right) & 0 \\ 0 & \tau _{1}^{2} \\ \end{matrix} \right]$

### Gilbert

In chapter 5, they talk about hierarchical linear model where fixed effects and random effects are taken into consideration. Discuss a clear simple example in class which shows both effects and give interpretations of each of the coefficients and their use in real life.

### Daniela

In chapter 5, they talked about mostly about the two-level nesting structure. Can we have a bigger example with at least 4 levels that includes the random intercept and slope and how to apply this into R coding?

In 'lme', multilevel nesting is handled with. e.g.
        fit <- lme( Y ~ X * W, dd, random = ~ 1 | idtop/idmiddle/idsmall)

Contextual variables present an ambiguity. Assuming that the id variables 'idsmall' and 'idmiddle' are coded uniquely overall, then the the higher level, say 'idmiddle', contextual variables could be coded as either:
        cvar(X,idmiddle)

or
        capply( dd , ~ id, with, mean( c(tapply( X, idsmall, mean))))

Here is a table prepared for SPIDA showing how to handle multilevel nesting and crossed structures in a selection of R functions:
Function Notes
lme

in package nlme

Linear mixed effects: normal response

G side and R side modelling
Model syntax:

 Y ~ X * W, random = ~ 1 + X | id


For nested effect:

 Y ~ X * W, random = ~ 1 + X | higher/lower


or, to have different models at different levels:

 Y ~ X * W, random = list(higher = ~ 1, lower = ~ 1 + X )

lmer

in package lme4

Linear mixed models for gaussian response with Laplace approximation

G side modeling only, R = σ2I
Model syntax:

 Y ~ X * W +(1+X|id)


For nested effect:

 Y ~ X * W +(1+X|higher) + (1+X|higher:lower)


For crossed effect:

 Y ~ X * W +(1+X|id1) + (1+X|id2)

glmer

in package lme4

Generalized linear mixed models with adaptive Gaussian quadrature
• family: binomial, Gamma, inverse.gaussian, poisson, gaussian

G side only, no R side
Model syntax:

 Y ~ X * W +(1+X|id)


For nested effect:

 Y ~ X * W +(1+X|higher) + (1+X|higher:lower)


For crossed effect:

 Y ~ X * W +(1+X|id1) + (1+X|id2)

glmmPQL

in packages MASS/nlme

Generalized linear mixed models with Penalized Quasi Likelihood
• family: binomial, Gamma, inverse.gaussian, poisson, gaussian

G side and R side as in lme
Model syntax:

 Y ~ X * W, random = ~ 1 + X | id


For nested effect:

 Y ~ X * W, random = ~ 1 + X | higher/lower

MCMCglmm

in package MCMCglmm

Generalized linear mixed models with MCMC
• family: poisson, categorical, multinomial, ordinal, exponential, geometric, cengaussian, cenpoisson,

cenexponential, zipoisson, zapoisson, ztpoisson, hupoisson, zibinomial (cen=censored, zi=zero-inflated, za=zero-altered, hu=hurdle
G side and R side, R side different from 'lme': no autocorrelation but can be used for multivariate response
Note: 'poisson' potentially overdispersed by default (good), 'binomial' variance for binary variables is unidentified.
Model syntax:

 Y ~ X * W, random = ~ us(1 + X):id  [Note: id should be a factor, us=unstructured]


For nested effect:

 Y ~ X * W, random =  ~us(1 + X):higher + us(1 + X):higher:lower


For crossed effect:

 Y ~ X * W, random =  ~us(1 + X):id1+ us(1 + X):id2


### Phil

Exluding fixed effects that are non-significant is common practice in regression analyses, and Snijders and Bosker follow this practice when simplifying the model in Table 5.3 to the model found in Table 5.4. While this practice is used to help make the model more parsimonious it can ignore the joint effect that these variables have on the model as a whole. Discuss alternative criteria that one should explore when determining whether a predictor should be excluded from the model.

### Ryan

When using random slopes it is generally the case that the level 1 model contains a fixed effect for what will also be the level 2 random effect. The random effect is then an estimate of the group/cluster/individual departure from the fixed effect. However, non-significant level 1 variables are not determinable as different from zero. Are there cases where a non-significant fixed effect can be excluded from the model while retaining the random effect at level 2. What would be the consequence of this and what might it reveal about the level 1 variable? Would this help control for the error of excluding a non-significant but confounding variable?

This is a very interesting question. It would be interesting to create a simulated data set illustrating the issue so we could consider the consequences of having random effects for a confounding factors whose within cluster effect changes sign from cluster to cluster. Can we think of a confounding factor that would do that?

## Chapter 6

### What is random?

At the beginning of the chapter, S&J present two models (Table 6.1). They note that "The variable with the random slope is in both models the grand-mean centered variable IQ". In R: would the random side look like: random = ~ (IQ - IQbar)|school, even though (IQ - IQbar) isn't in the fixed part of the second model? How is this different in interpretation from: random = ~ (group centered IQ - IQbar)|school)?

• NOTE: This question is not necessarily about how to specify a model using nlme, but rather about the terms included in the random part of the model. As a test, I ran two models:
IQ_dev <- mlbook_red$IQ_verb - mlbook_red$sch_iqv

mlb612a <- nlme:::lme(langPOST ~ IQ_dev + sch_iqv + ses, random =~ IQ_verb|schoolnr, data = mlbook_red, method="ML")
summary(mlb612a)

mlb612b <- nlme:::lme(langPOST ~ IQ_dev + sch_iqv + ses, random =~ IQ_dev|schoolnr, data = mlbook_red, method="ML")
summary(mlb612b)


Note that the basic IQ_verb variable has been grand mean centered. According to the chapter, it sounds like using IQ_verb in the random portion is OK since it is a linear combination of IQ_dev and sch_iqv (the school means of IQ_verb). However, if I compare it against a second model using IQ_dev in the random part, pretty much all of the coefficients change. Is this expected?

--Msigal 12:07, 13 June 2012 (EDT)

### Qiong

The author introduce the t - test to test fixed parameters. We can use summary(model) to get the p - value directly in R. In practice, we use wald test to test fixed parameter. On page 95, they mentioned that "for 30 or more groups, the Wald test of fixed effects using REML standard errors have reliable type I error rates." Is it the only reason we use the wald test in practice?

### Carrie

On page 105 in their discussion of modeling within-group variability the authors warn to "keep in mind that including a random slope normally implies inclusion of the fixed effect!". What is an example of a model where one might include random slope without a fixed effect??

### Gilbert

On page 104 , the author discuss different approaches for model selection including working upward from level one, joint consideration of both level one and two. Are there other methods to be used If both methods are not providing a satisfactory model?

### Daniela

On page 97 and 99, the authors showed us how the tests for random intercept and random slope independently. What test would you use if you wanted to test for both random slope and intercept in the same model? and what would you test it against? ... the linear model or a model with just a random slope or a random intercept?

### Phil

Multi-parameter tests are possible for fixed effects, but can they also be applied to predicted random effects? If so what would be the analog to $\hat{\gamma}^' \hat{\Sigma}^{-1}_\gamma \hat{\gamma}$, used to find the Wald statistic, and how do we find an appropriate df denominator term for an F test?

Question: Just to be sure, I presume you mean a BLUP?

### Ryan

In Chapter 6 Snijders and Bosker suggest that using deviance tests to evaluate fixed effects in a multilevel model is inappropriate if the estimation method is REML. What is the characteristic of REML vs ML that makes this type of model evaluation incorrect?

Comment: Likelihood has the form L(data|theta) and is used for inference on theta, for example, by comparing the altitude of the log-likelihood at theta.hat (the summit) compared with the altitude at a null hypothetical value theta.0. This is the basis of deviance tests: Deviance = 2*(logLik(data|theta.hat) - logLik(data|theta.0)).
With ML the log-likelihood is logLik( y | beta, G, R) and we can use the likelihood for inference on all parameters. With REML, the data is not the original y, but the residual of y on X, say, e. And the likelihood is only a function of G and R: logLik( e | G, R). 'beta' does not appear in the likelihood, thus the likelihood cannot be used to answer any questions about 'beta' since is does not appear in the likelihood.

## Chapter 7

### Explained Variance in Random Slope Models

Looking at the proportion of variance explained by a model in a traditional ANOVA/multiple regression framework is something clients are often extremely interested in. In Chapter 7, Snijders and Bosker discuss how we might approach the issue in MLM. Near the end of the chapter, the authors insinuate that getting an estimate of the amount of explained variance in a random effects (intercept and slope) model is a somewhat tedious endeavour.

The claim is that random slopes don't change prediction very much so if we re-estimate the model using only random intercepts (no random slopes), this will "normally yield [predicted] values that are very close to values for the random slope models" (p. 114). This statement doesn't quite ring true for me, as in our examples the differences in slope between schools has been fairly striking/substantial.

Is the authors' statement justifiable? Is obtaining an R2 as important/interesting in MLM as it is in other models?

### Explained variance in three-level models

In example 7.1, we know that how to calculate the explained variance of a level one variable when this variable has a fixed effect only. I want to know how to calculate the explained variance of a level one variable when this variable has a fixed and random effect in the model?

### Interpreting R2 as an Effect Size

A client fits a multilevel model and comes up with several significant predictors. The client is pleased with themselves, but remembers learning that significance alone isn't good enough these days, and needs help producing a measure of effect size. You compute the Level 1 R^2 and come up with a very small value, say 0.01. Is the model then worthless, even if the magnitude of the predicted change in the outcome is substantively meaningful?

### Explained variance

In the example provided on page 110, it show that the residual variance at level two increases as within-group deviation is added as an explanatory variable to the model in balanced as well as in the unbalanced case. Is this always the case or it is only for this particular example?

### Estimates of R2

On page 113, "it is observed that an estimated value for R2 becomes smaller by the addition of a predictor variable, or larger by the deletion of a predictor variable, there are two possibilities: either this is a chance fluctuation, or the larger model is misspecified." The authors then say that whether the first or second possibility is more likely depends on the size of the change in R2. Can you give an example of when this occurs based on the size of change in R2?

### Predicted R2

After predicting values for random intercepts and slopes using Bayesian methods it is possible to form composite values, $\hat{Y}_{ij}$, to predict the observed dependent values, Yij. Obviously these estimates will be sub-optimal as they will suffer from 'shrinkage' effects, but they may be useful for computing a '$R^2_{Predicted}$'. Discuss situations where knowledge of the predicted slopes and intercepts could be important, and whether an $R^2_{Predicted}$ could be a useful description.

### The Size and Direction of R2 Change As a Diagnostic Criteria

The suggestion has been made that changes in R2 where the addition or deletion of a variable creates an unexpected and opposing directional change, can serve as a diagnostic toward determining where the flaw in the model resides. However, the authors do not actually indicate which scenarios the size and increase/decrease information obtained from the R2 estimate determines the source of the flaw. 'Wrong' directions provide evidence of model misspecification, but what then of the magnitude component mentioned just prior? (p. 113)