MATH 6643 Summer 2012 Applications of Mixed Models/Snijders and Bosker: Discussion Questions

Chapter 5

Matthew

At the bottom of page 83, Snijders and Bosker outline the process for probing interactions between two level one variables, and how there can be four possibilities for how to model it. If a researcher was to include all four, discuss how each would be interpreted. What might a good selection strategy be if our model had substantially more than two variables?

Qiong

If we do not have any information about the data set, how to choose a level - two variable to predict the group dependent regression coefficients? After we choose the level - two variable z, how to explain the cross - level interaction term.

Carrie

A client arrives with a random slope and intercept model using IQ as a predictor. IQ was measured on the traditional scale with a mean of 100 and standard deviation of 15. What should the client keep in mind about the interpretation of the variance of the intercept and covariance of the slope-intercept?

This raises the interesting question of how the variance of the random intercept and the covariance of the random intercept with the random slope are changed under a recentering of IQ. Let
$Var\left[ \begin{matrix} {{u}_{0j}} \\ {{u}_{1j}} \\ \end{matrix} \right]=\left[ \begin{matrix} \tau _{0}^{2} & {{\tau }_{01}} \\ {{\tau }_{10}} & \tau _{1}^{2} \\ \end{matrix} \right]$ for raw IQ.
If we recenter IQ with: $\tilde{\text{IQ}}=\text{IQ}-c$ then:
$\left[ \begin{matrix} {{{\tilde{u}}}_{0j}} \\ {{{\tilde{u}}}_{1j}} \\ \end{matrix} \right]=\left[ \begin{matrix} 1 & c \\ 0 & 1 \\ \end{matrix} \right]\left[ \begin{matrix} {{u}_{0j}} \\ {{u}_{1j}} \\ \end{matrix} \right]$
and
$Var\left[ \begin{matrix} {{{\tilde{u}}}_{0j}} \\ {{{\tilde{u}}}_{1j}} \\ \end{matrix} \right]=\left[ \begin{matrix} 1 & c \\ 0 & 1 \\ \end{matrix} \right]\left[ \begin{matrix} \tau _{0}^{2} & {{\tau }_{01}} \\ {{\tau }_{10}} & \tau _{1}^{2} \\ \end{matrix} \right]\left[ \begin{matrix} 1 & 0 \\ c & 1 \\ \end{matrix} \right]=\left[ \begin{matrix} \tau _{0}^{2}+2c{{\tau }_{01}}+{{c}^{2}}\tau _{1}^{2} & {{\tau }_{01}}+c\tau _{1}^{2} \\ {{\tau }_{10}}+c\tau _{1}^{2} & \tau _{1}^{2} \\ \end{matrix} \right]$
If $c=-{{\tau }_{01}}/\tau _{1}^{2}$, then the variance of the intercept is minimized:
$Var\left[ \begin{matrix} {{{\tilde{u}}}_{0j}} \\ {{{\tilde{u}}}_{1j}} \\ \end{matrix} \right]=\left[ \begin{matrix} \tau _{0}^{2}-\tau _{01}^{2}/\tau _{1}^{2} & 0 \\ 0 & \tau _{1}^{2} \\ \end{matrix} \right]=\left[ \begin{matrix} \tau _{0}^{2}\left( 1-\rho _{01}^{2} \right) & 0 \\ 0 & \tau _{1}^{2} \\ \end{matrix} \right]$

Gilbert

In chapter 5, they talk about hierarchical linear model where fixed effects and random effects are taken into consideration. Discuss a clear simple example in class which shows both effects and give interpretations of each of the coefficients and their use in real life.

Daniela

In chapter 5, they talked about mostly about the two-level nesting structure. Can we have a bigger example with at least 4 levels that includes the random intercept and slope and how to apply this into R coding?

In 'lme', multilevel nesting is handled with. e.g.
        fit <- lme( Y ~ X * W, dd, random = ~ 1 | idtop/idmiddle/idsmall)

Contextual variables present an ambiguity. Assuming that the id variables 'idsmall' and 'idmiddle' are coded uniquely overall, then the the higher level, say 'idmiddle', contextual variables could be coded as either:
        cvar(X,idmiddle)

or
        capply( dd , ~ id, with, mean( c(tapply( X, idsmall, mean))))

Here is a table prepared for SPIDA showing how to handle multilevel nesting and crossed structures in a selection of R functions:
Function Notes
lme

in package nlme

Linear mixed effects: normal response

G side and R side modelling
Model syntax:

 Y ~ X * W, random = ~ 1 + X | id


For nested effect:

 Y ~ X * W, random = ~ 1 + X | higher/lower


or, to have different models at different levels:

 Y ~ X * W, random = list(higher = ~ 1, lower = ~ 1 + X )

lmer

in package lme4

Linear mixed models for gaussian response with Laplace approximation

G side modeling only, R = σ2I
Model syntax:

 Y ~ X * W +(1+X|id)


For nested effect:

 Y ~ X * W +(1+X|higher) + (1+X|higher:lower)


For crossed effect:

 Y ~ X * W +(1+X|id1) + (1+X|id2)

glmer

in package lme4

• family: binomial, Gamma, inverse.gaussian, poisson, gaussian

G side only, no R side
Model syntax:

 Y ~ X * W +(1+X|id)


For nested effect:

 Y ~ X * W +(1+X|higher) + (1+X|higher:lower)


For crossed effect:

 Y ~ X * W +(1+X|id1) + (1+X|id2)

glmmPQL

in packages MASS/nlme

Generalized linear mixed models with Penalized Quasi Likelihood
• family: binomial, Gamma, inverse.gaussian, poisson, gaussian

G side and R side as in lme
Model syntax:

 Y ~ X * W, random = ~ 1 + X | id


For nested effect:

 Y ~ X * W, random = ~ 1 + X | higher/lower

MCMCglmm

in package MCMCglmm

Generalized linear mixed models with MCMC
• family: poisson, categorical, multinomial, ordinal, exponential, geometric, cengaussian, cenpoisson,

cenexponential, zipoisson, zapoisson, ztpoisson, hupoisson, zibinomial (cen=censored, zi=zero-inflated, za=zero-altered, hu=hurdle
G side and R side, R side different from 'lme': no autocorrelation but can be used for multivariate response
Note: 'poisson' potentially overdispersed by default (good), 'binomial' variance for binary variables is unidentified.
Model syntax:

 Y ~ X * W, random = ~ us(1 + X):id  [Note: id should be a factor, us=unstructured]


For nested effect:

 Y ~ X * W, random =  ~us(1 + X):higher + us(1 + X):higher:lower


For crossed effect:

 Y ~ X * W, random =  ~us(1 + X):id1+ us(1 + X):id2


Phil

Exluding fixed effects that are non-significant is common practice in regression analyses, and Snijders and Bosker follow this practice when simplifying the model in Table 5.3 to the model found in Table 5.4. While this practice is used to help make the model more parsimonious it can ignore the joint effect that these variables have on the model as a whole. Discuss alternative criteria that one should explore when determining whether a predictor should be excluded from the model.

Ryan

When using random slopes it is generally the case that the level 1 model contains a fixed effect for what will also be the level 2 random effect. The random effect is then an estimate of the group/cluster/individual departure from the fixed effect. However, non-significant level 1 variables are not determinable as different from zero. Are there cases where a non-significant fixed effect can be excluded from the model while retaining the random effect at level 2. What would be the consequence of this and what might it reveal about the level 1 variable? Would this help control for the error of excluding a non-significant but confounding variable?

This is a very interesting question. It would be interesting to create a simulated data set illustrating the issue so we could consider the consequences of having random effects for a confounding factors whose within cluster effect changes sign from cluster to cluster. Can we think of a confounding factor that would do that?

Chapter 6

What is random?

At the beginning of the chapter, S&J present two models (Table 6.1). They note that "The variable with the random slope is in both models the grand-mean centered variable IQ". In R: would the random side look like: random = ~ (IQ - IQbar)|school, even though (IQ - IQbar) isn't in the fixed part of the second model? How is this different in interpretation from: random = ~ (group centered IQ - IQbar)|school)?

• NOTE: This question is not necessarily about how to specify a model using nlme, but rather about the terms included in the random part of the model. As a test, I ran two models:
IQ_dev <- mlbook_red$IQ_verb - mlbook_red$sch_iqv

mlb612a <- nlme:::lme(langPOST ~ IQ_dev + sch_iqv + ses, random =~ IQ_verb|schoolnr, data = mlbook_red, method="ML")
summary(mlb612a)

mlb612b <- nlme:::lme(langPOST ~ IQ_dev + sch_iqv + ses, random =~ IQ_dev|schoolnr, data = mlbook_red, method="ML")
summary(mlb612b)


Note that the basic IQ_verb variable has been grand mean centered. According to the chapter, it sounds like using IQ_verb in the random portion is OK since it is a linear combination of IQ_dev and sch_iqv (the school means of IQ_verb). However, if I compare it against a second model using IQ_dev in the random part, pretty much all of the coefficients change. Is this expected?

--Msigal 12:07, 13 June 2012 (EDT)

Qiong

The author introduce the t - test to test fixed parameters. We can use summary(model) to get the p - value directly in R. In practice, we use wald test to test fixed parameter. On page 95, they mentioned that "for 30 or more groups, the Wald test of fixed effects using REML standard errors have reliable type I error rates." Is it the only reason we use the wald test in practice?

Carrie

On page 105 in their discussion of modeling within-group variability the authors warn to "keep in mind that including a random slope normally implies inclusion of the fixed effect!". What is an example of a model where one might include random slope without a fixed effect??

Gilbert

On page 104 , the author discuss different approaches for model selection including working upward from level one, joint consideration of both level one and two. Are there other methods to be used If both methods are not providing a satisfactory model?

Daniela

On page 97 and 99, the authors showed us how the tests for random intercept and random slope independently. What test would you use if you wanted to test for both random slope and intercept in the same model? and what would you test it against? ... the linear model or a model with just a random slope or a random intercept?

Phil

Multi-parameter tests are possible for fixed effects, but can they also be applied to predicted random effects? If so what would be the analog to $\hat{\gamma}^' \hat{\Sigma}^{-1}_\gamma \hat{\gamma}$, used to find the Wald statistic, and how do we find an appropriate df denominator term for an F test?

Question: Just to be sure, I presume you mean a BLUP? Answer: Yes I do :)

Ryan

In Chapter 6 Snijders and Bosker suggest that using deviance tests to evaluate fixed effects in a multilevel model is inappropriate if the estimation method is REML. What is the characteristic of REML vs ML that makes this type of model evaluation incorrect?

Comment: Likelihood has the form L(data|theta) and is used for inference on theta, for example, by comparing the altitude of the log-likelihood at theta.hat (the summit) compared with the altitude at a null hypothetical value theta.0. This is the basis of deviance tests: Deviance = 2*(logLik(data|theta.hat) - logLik(data|theta.0)).
With ML the log-likelihood is logLik( y | beta, G, R) and we can use the likelihood for inference on all parameters. With REML, the data is not the original y, but the residual of y on X, say, e. And the likelihood is only a function of G and R: logLik( e | G, R). 'beta' does not appear in the likelihood, thus the likelihood cannot be used to answer any questions about 'beta' since is does not appear in the likelihood.

Chapter 7

Explained Variance in Random Slope Models

Looking at the proportion of variance explained by a model in a traditional ANOVA/multiple regression framework is something clients are often extremely interested in. In Chapter 7, Snijders and Bosker discuss how we might approach the issue in MLM. Near the end of the chapter, the authors insinuate that getting an estimate of the amount of explained variance in a random effects (intercept and slope) model is a somewhat tedious endeavour.

The claim is that random slopes don't change prediction very much so if we re-estimate the model using only random intercepts (no random slopes), this will "normally yield [predicted] values that are very close to values for the random slope models" (p. 114). This statement doesn't quite ring true for me, as in our examples the differences in slope between schools has been fairly striking/substantial.

Is the authors' statement justifiable? Is obtaining an R2 as important/interesting in MLM as it is in other models?

Explained variance in three-level models

In example 7.1, we know that how to calculate the explained variance of a level one variable when this variable has a fixed effect only. I want to know how to calculate the explained variance of a level one variable when this variable has a fixed and random effect in the model?

Interpreting R2 as an Effect Size

A client fits a multilevel model and comes up with several significant predictors. The client is pleased with themselves, but remembers learning that significance alone isn't good enough these days, and needs help producing a measure of effect size. You compute the Level 1 R^2 and come up with a very small value, say 0.01. Is the model then worthless, even if the magnitude of the predicted change in the outcome is substantively meaningful?

Explained variance

In the example provided on page 110, it show that the residual variance at level two increases as within-group deviation is added as an explanatory variable to the model in balanced as well as in the unbalanced case. Is this always the case or it is only for this particular example?

Estimates of R2

On page 113, "it is observed that an estimated value for R2 becomes smaller by the addition of a predictor variable, or larger by the deletion of a predictor variable, there are two possibilities: either this is a chance fluctuation, or the larger model is misspecified." The authors then say that whether the first or second possibility is more likely depends on the size of the change in R2. Can you give an example of when this occurs based on the size of change in R2?

Predicted R2

After predicting values for random intercepts and slopes using Bayesian methods it is possible to form composite values, $\hat{Y}_{ij}$, to predict the observed dependent values, Yij. Obviously these estimates will be sub-optimal as they will suffer from 'shrinkage' effects, but they may be useful for computing a '$R^2_{Predicted}$'. Discuss situations where knowledge of the predicted slopes and intercepts could be important, and whether an $R^2_{Predicted}$ could be a useful description.

The Size and Direction of R2 Change As a Diagnostic Criteria

The suggestion has been made that changes in R2 where the addition or deletion of a variable creates an unexpected and opposing directional change, can serve as a diagnostic toward determining where the flaw in the model resides. However, the authors do not actually indicate which scenarios the size and increase/decrease information obtained from the R2 estimate determines the source of the flaw. 'Wrong' directions provide evidence of model misspecification, but what then of the magnitude component mentioned just prior? (p. 113)

Chapter 8

You can sign your contributions with --~~~~ --Georges 07:50, 14 June 2012 (EDT)

"Correlates of diversity"

Provide an example illustrating how level-two variables are considered being associated with level-one heteroscedasticity. --Gilbert8 11:26, 16 June 2012 (EDT)

"Modeling Heteroscedasticity"

When Snijders and Bosker say they are "modeling heteroscedasticity", is this simply incorporating more random slopes into the model? For instance, on page 127, they added a fixed effect for SA-SES (the school average of SES) and a random slope for it. What kind of plots would let us see if these inclusions are necessary? --Msigal 11:26, 18 June 2012 (EDT)

The level-one residual variance can be expressed by a linear or quadratic function of some variables. How to decide the function form? Can we say that if the variables have a random effect, then we use a quadratic form. Otherwise, we use a linear form? Is it the same thing for the intercept residual variance? --~~~~ --Qiong Li 12:25, 18 June 2012 (EDT)

On a practical note - How is this done??

It appears that in order to fit a model with a linear/quadratic function for the variance the authors had to use MLwiN. Are there other ways to accomplish this? Could we talk a little about what their demo code is accomplishing? | S&B ExCode --Smithce 14:15, 19 June 2012 (EDT)

Variable centering

Since fixed effects variables can be included in the R matrix to model systematic heteroscedasticity discuss the effects of centering variables in this context. Does centering affect the estimation results or numerical stability? --Rphilip2004 13:42, 19 June 2012 (EDT)

What happens to the variance function when there are more than two levels? Do we still only have to choose from linear and quadratic forms or dose it become more complicated? --Dusvat 18:43, 18 June 2012 (EDT)

Generic regression question: Treating a factor as continuous for the interaction

On page 126 Models 3 (described on page 124) the authors treat SES as a factor for main effects, but then to keep the number of interactions around they treat it as numeric in the interaction with other variables. This seems like it could come in useful, are there any caveats we should be aware of in using this technique? --Smithce 14:15, 19 June 2012 (EDT)

Generalized Linear Models and Heteroscedasticiy

Given a linear model that has level - 1 heteroscedasticity related to multiple level - 1 predictors, does this not mean that the heteroscedasticity can be thought of as related to the overall mean response. The residuals of a heteroscedastic model would then become functions of the mean response. Generalized linear models often model the variance as a function of the mean response (ex., Poisson, Gamma, Negative Binomial, etc.). When might it be appropriate to abandon a direct linear relationship in favour of a generalized linear model (which at times retains the additive linear properties desired (in the linear predictor)) to deal with heteroscedastic issues? Is this even possible? --Rbarnhar 14:24, 19 June 2012 (EDT)

Chapter 9

"Imputation"

1.The chapter discuss imputation as a way of filling out missing data to form a complete data set. Are there any other method which can be used to achieve the same goal? Provide few examples.

2. Discuss an example with missing data set . Provide R code using multiple imputation to complete data set. --Gilbert8 11:38, 18 June 2012 (EDT)

Patterns of Missingness

Snijders and Bosker go to some lengths to explain the difference between MCAR, MAR, and MNAR. However, I felt they somewhat glossed over a definition of monotone missingness. What is monotone missingness? How would one check for it, especially in terms of a multilevel model? --Msigal 08:47, 20 June 2012 (EDT)

If we use a missingness indicator and predict this using a logistic regression model, does this mean that significant predictors should be kept in the imputation model and non-significant predictors can be omitted from the imputation model? --Smithce 09:58, 21 June 2012 (EDT)

Missingness Assumption

Rubin defined three types of missingness in 1976. When we use methods to handling incomplete data, what information can help us to make a reasonable assumption? What is the key point whether missingness is MCAR or MAR? --~~~~ --Qiong Li 12:31, 20 June 2012 (EDT)

Full Maximum Likelihood vs. Imputation

Could you give us an example using both maximum likelihood and imputation methods in R and then compare? How are the methods similar/ different? Is one method computationally better than the other?--Dusvat 17:27, 20 June 2012 (EDT)

Dancing on the Bay(esian)

Imputation seems to be a very Bayesian practice, and the authors mention the intimate connection to Gibbs sampling when imputing data in the univariate case. I wonder, however, that if we are willing to impute data in this Bayesian manner why we don't just jump ship and move to a more complete Bayesian methodology? What are the benefits/downsides of initially dancing with the idea of being Bayesian to get our complete data, then being frequentists to fit our models? --Rphilip2004 09:18, 21 June 2012 (EDT)

Don't Throw the Kitchen Sink - ICE

While Snijders and Bosker promote the idea of a complex model for imputation using, as many feasible variables as possible, is this always true? Should we not consider parsimony as we would in any other form of regression? Should it also possibly reflect the amount of missing data we are trying to impute? --Rbarnhar 09:59, 21 June 2012 (EDT)

Chapter 10

Influence of level-two units

Provide a detailed explanation of how deletion diagnostics is performed and provide a practical example to illustrate it.--Gilbert8 18:22, 24 June 2012 (EDT)

Incorporating Descriptive Statistics

One aside that Snijders and Bosker make in this chapter is about the inclusion of the standard deviation for each group of a relevant level one variable as a fixed effect in the model. This was mentioned within the section on adding contextual variables (p. 155). This strikes me as an interesting prospect. Does modeling the standard deviation have any interpretative benefits over simply using group size? Are there other descriptive statistics pertaining to groups that would be meaningful to add to a model? What would their interpretation be? --Msigal 10:12, 25 June 2012 (EDT)

Orders of the model checks

In Chapter 10, the auther introduced a number of things we need to do when we build a mixed model, such as "include contextual effects", "check random effects", "specification of the fixed part", "specification of the random part" and "check the distributional assumption". When I deal with real data, I always confused that which things I should do first, which things I should do next. No rules or there is a better order to do these things? --~~~~ --Qiong Li 2:33, 25 June 2012 (EDT)

Assumption violations

In Ch. 10, the authors talk about having the random slope and random intercept uncorrelated with all explanatory variables. If this assumption is incorrect you can just add relevant explanatory variables to the model. What happens if you have a real world example where there are not quantifiable relevant explanatory variables to be added to the model? How would you go about fixing the incorrect assumption? What happens if more than one assumption is violated and you cannot just include other 'descriptive' variables to the model?--Dusvat 17:45, 25 June 2012 (EDT)

Checking for Random Slopes

In this chapter (page 155-156) S&B advocate checking for a random slope for each level-one variable in the fixed part. Because the process can be time consuming, they further suggest using one-step methods to obtain provisional estimates and then checking the t-ratio. We have learned that t-tests of this kind are problematic for variance parameters and not to be trusted. Should we then use LRTs to test all possible random slopes, possibly with simulation?