# MATH 6643 Summer 2012 Applications of Mixed Models/Snijders and Bosker: Discussion Questions

## Chapter 5

### Matthew

At the bottom of page 83, Snijders and Bosker outline the process for probing interactions between two level one variables, and how there can be four possibilities for how to model it. If a researcher was to include all four, discuss how each would be interpreted. What might a good selection strategy be if our model had substantially more than two variables?

### Qiong

If we do not have any information about the data set, how to choose a level - two variable to predict the group dependent regression coefficients? After we choose the level - two variable z, how to explain the cross - level interaction term.

### Carrie

A client arrives with a random slope and intercept model using IQ as a predictor. IQ was measured on the traditional scale with a mean of 100 and standard deviation of 15. What should the client keep in mind about the interpretation of the variance of the intercept and covariance of the slope-intercept?

This raises the interesting question of how the variance of the random intercept and the covariance of the random intercept with the random slope are changed under a recentering of IQ. Let
$Var\left[ \begin{matrix} {{u}_{0j}} \\ {{u}_{1j}} \\ \end{matrix} \right]=\left[ \begin{matrix} \tau _{0}^{2} & {{\tau }_{01}} \\ {{\tau }_{10}} & \tau _{1}^{2} \\ \end{matrix} \right]$ for raw IQ.
If we recenter IQ with: $\tilde{\text{IQ}}=\text{IQ}-c$ then:
$\left[ \begin{matrix} {{{\tilde{u}}}_{0j}} \\ {{{\tilde{u}}}_{1j}} \\ \end{matrix} \right]=\left[ \begin{matrix} 1 & c \\ 0 & 1 \\ \end{matrix} \right]\left[ \begin{matrix} {{u}_{0j}} \\ {{u}_{1j}} \\ \end{matrix} \right]$
and
$Var\left[ \begin{matrix} {{{\tilde{u}}}_{0j}} \\ {{{\tilde{u}}}_{1j}} \\ \end{matrix} \right]=\left[ \begin{matrix} 1 & c \\ 0 & 1 \\ \end{matrix} \right]\left[ \begin{matrix} \tau _{0}^{2} & {{\tau }_{01}} \\ {{\tau }_{10}} & \tau _{1}^{2} \\ \end{matrix} \right]\left[ \begin{matrix} 1 & 0 \\ c & 1 \\ \end{matrix} \right]=\left[ \begin{matrix} \tau _{0}^{2}+2c{{\tau }_{01}}+{{c}^{2}}\tau _{1}^{2} & {{\tau }_{01}}+c\tau _{1}^{2} \\ {{\tau }_{10}}+c\tau _{1}^{2} & \tau _{1}^{2} \\ \end{matrix} \right]$
If $c=-{{\tau }_{01}}/\tau _{1}^{2}$, then the variance of the intercept is minimized:
$Var\left[ \begin{matrix} {{{\tilde{u}}}_{0j}} \\ {{{\tilde{u}}}_{1j}} \\ \end{matrix} \right]=\left[ \begin{matrix} \tau _{0}^{2}-\tau _{01}^{2}/\tau _{1}^{2} & 0 \\ 0 & \tau _{1}^{2} \\ \end{matrix} \right]=\left[ \begin{matrix} \tau _{0}^{2}\left( 1-\rho _{01}^{2} \right) & 0 \\ 0 & \tau _{1}^{2} \\ \end{matrix} \right]$

### Gilbert

In chapter 5, they talk about hierarchical linear model where fixed effects and random effects are taken into consideration. Discuss a clear simple example in class which shows both effects and give interpretations of each of the coefficients and their use in real life.

### Daniela

In chapter 5, they talked about mostly about the two-level nesting structure. Can we have a bigger example with at least 4 levels that includes the random intercept and slope and how to apply this into R coding?

In 'lme', multilevel nesting is handled with. e.g.
        fit <- lme( Y ~ X * W, dd, random = ~ 1 | idtop/idmiddle/idsmall)

Contextual variables present an ambiguity. Assuming that the id variables 'idsmall' and 'idmiddle' are coded uniquely overall, then the the higher level, say 'idmiddle', contextual variables could be coded as either:
        cvar(X,idmiddle)

or
        capply( dd , ~ id, with, mean( c(tapply( X, idsmall, mean))))

Here is a table prepared for SPIDA showing how to handle multilevel nesting and crossed structures in a selection of R functions:
Function Notes
lme

in package nlme

Linear mixed effects: normal response

G side and R side modelling
Model syntax:

 Y ~ X * W, random = ~ 1 + X | id


For nested effect:

 Y ~ X * W, random = ~ 1 + X | higher/lower


or, to have different models at different levels:

 Y ~ X * W, random = list(higher = ~ 1, lower = ~ 1 + X )

lmer

in package lme4

Linear mixed models for gaussian response with Laplace approximation

G side modeling only, R = σ2I
Model syntax:

 Y ~ X * W +(1+X|id)


For nested effect:

 Y ~ X * W +(1+X|higher) + (1+X|higher:lower)


For crossed effect:

 Y ~ X * W +(1+X|id1) + (1+X|id2)

glmer

in package lme4

Generalized linear mixed models with adaptive Gaussian quadrature
• family: binomial, Gamma, inverse.gaussian, poisson, gaussian

G side only, no R side
Model syntax:

 Y ~ X * W +(1+X|id)


For nested effect:

 Y ~ X * W +(1+X|higher) + (1+X|higher:lower)


For crossed effect:

 Y ~ X * W +(1+X|id1) + (1+X|id2)

glmmPQL

in packages MASS/nlme

Generalized linear mixed models with Penalized Quasi Likelihood
• family: binomial, Gamma, inverse.gaussian, poisson, gaussian

G side and R side as in lme
Model syntax:

 Y ~ X * W, random = ~ 1 + X | id


For nested effect:

 Y ~ X * W, random = ~ 1 + X | higher/lower

MCMCglmm

in package MCMCglmm

Generalized linear mixed models with MCMC
• family: poisson, categorical, multinomial, ordinal, exponential, geometric, cengaussian, cenpoisson,

cenexponential, zipoisson, zapoisson, ztpoisson, hupoisson, zibinomial (cen=censored, zi=zero-inflated, za=zero-altered, hu=hurdle
G side and R side, R side different from 'lme': no autocorrelation but can be used for multivariate response
Note: 'poisson' potentially overdispersed by default (good), 'binomial' variance for binary variables is unidentified.
Model syntax:

 Y ~ X * W, random = ~ us(1 + X):id  [Note: id should be a factor, us=unstructured]


For nested effect:

 Y ~ X * W, random =  ~us(1 + X):higher + us(1 + X):higher:lower


For crossed effect:

 Y ~ X * W, random =  ~us(1 + X):id1+ us(1 + X):id2


### Phil

Exluding fixed effects that are non-significant is common practice in regression analyses, and Snijders and Bosker follow this practice when simplifying the model in Table 5.3 to the model found in Table 5.4. While this practice is used to help make the model more parsimonious it can ignore the joint effect that these variables have on the model as a whole. Discuss alternative criteria that one should explore when determining whether a predictor should be excluded from the model.

### Ryan

When using random slopes it is generally the case that the level 1 model contains a fixed effect for what will also be the level 2 random effect. The random effect is then an estimate of the group/cluster/individual departure from the fixed effect. However, non-significant level 1 variables are not determinable as different from zero. Are there cases where a non-significant fixed effect can be excluded from the model while retaining the random effect at level 2. What would be the consequence of this and what might it reveal about the level 1 variable? Would this help control for the error of excluding a non-significant but confounding variable?

This is a very interesting question. It would be interesting to create a simulated data set illustrating the issue so we could consider the consequences of having random effects for a confounding factors whose within cluster effect changes sign from cluster to cluster. Can we think of a confounding factor that would do that?

## Chapter 6

### What is random?

At the beginning of the chapter, S&J present two models (Table 6.1). They note that "The variable with the random slope is in both models the grand-mean centered variable IQ". In R: would the random side look like: random = ~ (IQ - IQbar)|school, even though (IQ - IQbar) isn't in the fixed part of the second model? How is this different in interpretation from: random = ~ (group centered IQ - IQbar)|school)?

• NOTE: This question is not necessarily about how to specify a model using nlme, but rather about the terms included in the random part of the model. As a test, I ran two models:
IQ_dev <- mlbook_red$IQ_verb - mlbook_red$sch_iqv

mlb612a <- nlme:::lme(langPOST ~ IQ_dev + sch_iqv + ses, random =~ IQ_verb|schoolnr, data = mlbook_red, method="ML")
summary(mlb612a)

mlb612b <- nlme:::lme(langPOST ~ IQ_dev + sch_iqv + ses, random =~ IQ_dev|schoolnr, data = mlbook_red, method="ML")
summary(mlb612b)


Note that the basic IQ_verb variable has been grand mean centered. According to the chapter, it sounds like using IQ_verb in the random portion is OK since it is a linear combination of IQ_dev and sch_iqv (the school means of IQ_verb). However, if I compare it against a second model using IQ_dev in the random part, pretty much all of the coefficients change. Is this expected?

--Msigal 12:07, 13 June 2012 (EDT)

### Qiong

The author introduce the t - test to test fixed parameters. We can use summary(model) to get the p - value directly in R. In practice, we use wald test to test fixed parameter. On page 95, they mentioned that "for 30 or more groups, the Wald test of fixed effects using REML standard errors have reliable type I error rates." Is it the only reason we use the wald test in practice?

### Carrie

On page 105 in their discussion of modeling within-group variability the authors warn to "keep in mind that including a random slope normally implies inclusion of the fixed effect!". What is an example of a model where one might include random slope without a fixed effect??

### Gilbert

On page 104 , the author discuss different approaches for model selection including working upward from level one, joint consideration of both level one and two. Are there other methods to be used If both methods are not providing a satisfactory model?

### Daniela

On page 97 and 99, the authors showed us how the tests for random intercept and random slope independently. What test would you use if you wanted to test for both random slope and intercept in the same model? and what would you test it against? ... the linear model or a model with just a random slope or a random intercept?

### Phil

Multi-parameter tests are possible for fixed effects, but can they also be applied to predicted random effects? If so what would be the analog to $\hat{\gamma}^' \hat{\Sigma}^{-1}_\gamma \hat{\gamma}$, used to find the Wald statistic, and how do we find an appropriate df denominator term for an F test?

Question: Just to be sure, I presume you mean a BLUP? Answer: Yes I do :)

### Ryan

In Chapter 6 Snijders and Bosker suggest that using deviance tests to evaluate fixed effects in a multilevel model is inappropriate if the estimation method is REML. What is the characteristic of REML vs ML that makes this type of model evaluation incorrect?

Comment: Likelihood has the form L(data|theta) and is used for inference on theta, for example, by comparing the altitude of the log-likelihood at theta.hat (the summit) compared with the altitude at a null hypothetical value theta.0. This is the basis of deviance tests: Deviance = 2*(logLik(data|theta.hat) - logLik(data|theta.0)).
With ML the log-likelihood is logLik( y | beta, G, R) and we can use the likelihood for inference on all parameters. With REML, the data is not the original y, but the residual of y on X, say, e. And the likelihood is only a function of G and R: logLik( e | G, R). 'beta' does not appear in the likelihood, thus the likelihood cannot be used to answer any questions about 'beta' since is does not appear in the likelihood.

## Chapter 7

### Explained Variance in Random Slope Models

Looking at the proportion of variance explained by a model in a traditional ANOVA/multiple regression framework is something clients are often extremely interested in. In Chapter 7, Snijders and Bosker discuss how we might approach the issue in MLM. Near the end of the chapter, the authors insinuate that getting an estimate of the amount of explained variance in a random effects (intercept and slope) model is a somewhat tedious endeavour.

The claim is that random slopes don't change prediction very much so if we re-estimate the model using only random intercepts (no random slopes), this will "normally yield [predicted] values that are very close to values for the random slope models" (p. 114). This statement doesn't quite ring true for me, as in our examples the differences in slope between schools has been fairly striking/substantial.

Is the authors' statement justifiable? Is obtaining an R2 as important/interesting in MLM as it is in other models?

### Explained variance in three-level models

In example 7.1, we know that how to calculate the explained variance of a level one variable when this variable has a fixed effect only. I want to know how to calculate the explained variance of a level one variable when this variable has a fixed and random effect in the model?

### Interpreting R2 as an Effect Size

A client fits a multilevel model and comes up with several significant predictors. The client is pleased with themselves, but remembers learning that significance alone isn't good enough these days, and needs help producing a measure of effect size. You compute the Level 1 R^2 and come up with a very small value, say 0.01. Is the model then worthless, even if the magnitude of the predicted change in the outcome is substantively meaningful?

### Explained variance

In the example provided on page 110, it show that the residual variance at level two increases as within-group deviation is added as an explanatory variable to the model in balanced as well as in the unbalanced case. Is this always the case or it is only for this particular example?

### Estimates of R2

On page 113, "it is observed that an estimated value for R2 becomes smaller by the addition of a predictor variable, or larger by the deletion of a predictor variable, there are two possibilities: either this is a chance fluctuation, or the larger model is misspecified." The authors then say that whether the first or second possibility is more likely depends on the size of the change in R2. Can you give an example of when this occurs based on the size of change in R2?

### Predicted R2

After predicting values for random intercepts and slopes using Bayesian methods it is possible to form composite values, $\hat{Y}_{ij}$, to predict the observed dependent values, Yij. Obviously these estimates will be sub-optimal as they will suffer from 'shrinkage' effects, but they may be useful for computing a '$R^2_{Predicted}$'. Discuss situations where knowledge of the predicted slopes and intercepts could be important, and whether an $R^2_{Predicted}$ could be a useful description.

### The Size and Direction of R2 Change As a Diagnostic Criteria

The suggestion has been made that changes in R2 where the addition or deletion of a variable creates an unexpected and opposing directional change, can serve as a diagnostic toward determining where the flaw in the model resides. However, the authors do not actually indicate which scenarios the size and increase/decrease information obtained from the R2 estimate determines the source of the flaw. 'Wrong' directions provide evidence of model misspecification, but what then of the magnitude component mentioned just prior? (p. 113)

## Chapter 8

You can sign your contributions with --~~~~ --Georges 07:50, 14 June 2012 (EDT)

### "Correlates of diversity"

Provide an example illustrating how level-two variables are considered being associated with level-one heteroscedasticity. --Gilbert8 11:26, 16 June 2012 (EDT)

### "Modeling Heteroscedasticity"

When Snijders and Bosker say they are "modeling heteroscedasticity", is this simply incorporating more random slopes into the model? For instance, on page 127, they added a fixed effect for SA-SES (the school average of SES) and a random slope for it. What kind of plots would let us see if these inclusions are necessary? --Msigal 11:26, 18 June 2012 (EDT)

### Linear or quadratic variance functions

The level-one residual variance can be expressed by a linear or quadratic function of some variables. How to decide the function form? Can we say that if the variables have a random effect, then we use a quadratic form. Otherwise, we use a linear form? Is it the same thing for the intercept residual variance? --~~~~ --Qiong Li 12:25, 18 June 2012 (EDT)

### On a practical note - How is this done??

It appears that in order to fit a model with a linear/quadratic function for the variance the authors had to use MLwiN. Are there other ways to accomplish this? Could we talk a little about what their demo code is accomplishing? | S&B ExCode --Smithce 14:15, 19 June 2012 (EDT)

### Variable centering

Since fixed effects variables can be included in the R matrix to model systematic heteroscedasticity discuss the effects of centering variables in this context. Does centering affect the estimation results or numerical stability? --Rphilip2004 13:42, 19 June 2012 (EDT)

What happens to the variance function when there are more than two levels? Do we still only have to choose from linear and quadratic forms or dose it become more complicated? --Dusvat 18:43, 18 June 2012 (EDT)

### Generic regression question: Treating a factor as continuous for the interaction

On page 126 Models 3 (described on page 124) the authors treat SES as a factor for main effects, but then to keep the number of interactions around they treat it as numeric in the interaction with other variables. This seems like it could come in useful, are there any caveats we should be aware of in using this technique? --Smithce 14:15, 19 June 2012 (EDT)

### Generalized Linear Models and Heteroscedasticiy

Given a linear model that has level - 1 heteroscedasticity related to multiple level - 1 predictors, does this not mean that the heteroscedasticity can be thought of as related to the overall mean response. The residuals of a heteroscedastic model would then become functions of the mean response. Generalized linear models often model the variance as a function of the mean response (ex., Poisson, Gamma, Negative Binomial, etc.). When might it be appropriate to abandon a direct linear relationship in favour of a generalized linear model (which at times retains the additive linear properties desired (in the linear predictor)) to deal with heteroscedastic issues? Is this even possible? --Rbarnhar 14:24, 19 June 2012 (EDT)

## Chapter 9

### "Imputation"

1.The chapter discuss imputation as a way of filling out missing data to form a complete data set. Are there any other method which can be used to achieve the same goal? Provide few examples.

--Gilbert8 11:38, 18 June 2012 (EDT)


### Patterns of Missingness

Snijders and Bosker go to some lengths to explain the difference between MCAR, MAR, and MNAR. However, I felt they somewhat glossed over a definition of monotone missingness. What is monotone missingness? How would one check for it, especially in terms of a multilevel model? --Msigal 08:47, 20 June 2012 (EDT)

If we use a missingness indicator and predict this using a logistic regression model, does this mean that significant predictors should be kept in the imputation model and non-significant predictors can be omitted from the imputation model? --Smithce 09:58, 21 June 2012 (EDT)

### Missingness Assumption

Rubin defined three types of missingness in 1976. When we use methods to handling incomplete data, what information can help us to make a reasonable assumption? What is the key point whether missingness is MCAR or MAR? --~~~~ --Qiong Li 12:31, 20 June 2012 (EDT)

### Full Maximum Likelihood vs. Imputation

Could you give us an example using both maximum likelihood and imputation methods in R and then compare? How are the methods similar/ different? Is one method computationally better than the other?--Dusvat 17:27, 20 June 2012 (EDT)

### Dancing on the Bay(esian)

Imputation seems to be a very Bayesian practice, and the authors mention the intimate connection to Gibbs sampling when imputing data in the univariate case. I wonder, however, that if we are willing to impute data in this Bayesian manner why we don't just jump ship and move to a more complete Bayesian methodology? What are the benefits/downsides of initially dancing with the idea of being Bayesian to get our complete data, then being frequentists to fit our models? --Rphilip2004 09:18, 21 June 2012 (EDT)

### Don't Throw the Kitchen Sink - ICE

While Snijders and Bosker promote the idea of a complex model for imputation using, as many feasible variables as possible, is this always true? Should we not consider parsimony as we would in any other form of regression? Should it also possibly reflect the amount of missing data we are trying to impute? --Rbarnhar 09:59, 21 June 2012 (EDT)

## Chapter 10

### Influence of level-two units

Provide a detailed explanation of how deletion diagnostics is performed and provide a practical example to illustrate it.--Gilbert8 18:22, 24 June 2012 (EDT)

### Incorporating Descriptive Statistics

One aside that Snijders and Bosker make in this chapter is about the inclusion of the standard deviation for each group of a relevant level one variable as a fixed effect in the model. This was mentioned within the section on adding contextual variables (p. 155). This strikes me as an interesting prospect. Does modeling the standard deviation have any interpretative benefits over simply using group size? Are there other descriptive statistics pertaining to groups that would be meaningful to add to a model? What would their interpretation be? --Msigal 10:12, 25 June 2012 (EDT)

### Orders of the model checks

In Chapter 10, the auther introduced a number of things we need to do when we build a mixed model, such as "include contextual effects", "check random effects", "specification of the fixed part", "specification of the random part" and "check the distributional assumption". When I deal with real data, I always confused that which things I should do first, which things I should do next. No rules or there is a better order to do these things? --~~~~ --Qiong Li 2:33, 25 June 2012 (EDT)

### Assumption violations

In Ch. 10, the authors talk about having the random slope and random intercept uncorrelated with all explanatory variables. If this assumption is incorrect you can just add relevant explanatory variables to the model. What happens if you have a real world example where there are not quantifiable relevant explanatory variables to be added to the model? How would you go about fixing the incorrect assumption? What happens if more than one assumption is violated and you cannot just include other 'descriptive' variables to the model?--Dusvat 17:45, 25 June 2012 (EDT)

### Checking for Random Slopes

In this chapter (page 155-156) S&B advocate checking for a random slope for each level-one variable in the fixed part. Because the process can be time consuming, they further suggest using one-step methods to obtain provisional estimates and then checking the t-ratio. We have learned that t-tests of this kind are problematic for variance parameters and not to be trusted. Should we then use LRTs to test all possible random slopes, possibly with simulation?

### Cluster Size and Model Assumptions

In the practical nuts and bolts of application, one at times encounters situations where the number of clusters in a requested multilevel model does not support the testable or readily examinable assumption that the random intercepts are normally distributed. How important is it that the assumption of normally distributed intercepts holds?--Rbarnhar 01:22, 26 June 2012 (EDT)

### Model Residuals

The authors emphasize the importance of using OLS estimation for determining unbiased diagnostics. However, it may be useful to use model implied residuals such as $r = y - X \hat{\beta}$ and $r.c = y - X \hat{\beta} - Z \hat{\gamma}$. Describe how these model implied residuals can be used to evaluate influential observations at different design levels. --Rphilip2004 08:33, 26 June 2012 (EDT)

## Chapter 11

### Unequal cluster sample sizes

Usually, we choose the same number n as the sample size of the micro-units and the same number N as the sample size of the macro-units. I want to know whether we can improve the power for testing or small standard errors through choosing the different number as the sample size in different groups? --~~~~ --Qiong Li 12:11, 27 June 2012 (EDT)

Comment: The main situation where it seems obvious to me that one would consider unequal cluster sizes by design would be to estimate a correlation parameter in the R matrix. --Georges 18:31, 28 June 2012 (EDT)

### Allocating treatment to groups or individuals

The author mentions that multisite trial are difficult to implement as the reseercher has to make sure that they will be no contamination between the two treatment conditions within the same site or cluster randomized control may lead to selection bias. The author propose the pseudo-cluster randomized trial.Explain how this method is performed. Is this used often in practice? If this technique fails, are there other methods to resolve those situations? --Gilbert8 15:04, 27 June 2012 (EDT)

### Treatment Allocation, Continued

Building upon Gilbert's question, discuss the differences between cluster randomized trials, multisite trials, and pseudo-cluster randomized trials. Do any of these strategies match the Schizophrenia dataset that we have been working with during Lab 2? --Msigal 18:51, 27 June 2012 (EDT)

Continuing Matt's question, about the differences between cluster randomized trials, multisite trials, and pseudo-cluster randomized trials, can you give us an example of when to pick one over the other along with the cost of each trial is broken down by the cost function. Which trial is more costly? --Dusvat 23:17, 27 June 2012 (EDT)

### Power Expecting Missingness/Drop Out

I think an interesting addition to the power analyses presented would be if we could work in estimates of drop-out/non-response within cluster. For example, if I estimated that there was only a 80% chance that I would actually get useful/complete data from each level-1 unit (lower if using the URPP ;) How might this be built into these analyses? Are there authors who have done work in this with mixed models? Is it really worth bothering with, given that power analysis is hard enough (e.g. coming up 'guesses' for estimates) as it is? --Smithce 09:40, 28 June 2012 (EDT) ...now that I think about it, if you approximate using constant loss across all clusters rather than trying to fool around with unbalance this is pretty easy. So, never mind.

Comment: Perhaps an easy answer but nevertheless a very good point.

### An Unknown Value of the Intraclass Correlation Coefficient

The Authors acknowledge that the ICC is an unknown quantity, but suggest that for the Social Sciences the value tends to lie within 0.0 to 0.4. These two values have very different properties and this is made clear in the plot on the page following (p.189). The question is, not as easily answered as plotting them all as we can see from the graph the values follow different patterns of divergence. An assumed value can lead to very different optimal estimates, especially if one is wrong at the extremes. Are there any better ways to estimate the ICC a priori in order to avoid issues related to optimizing the sample size. --Rbarnhar 09:55, 28 June 2012 (EDT)

### Normal Distribution

Relating to a similar question in the past, how important is it that the various levels are normally distributed when computing power estimates? --Rphilip2004 10:08, 28 June 2012 (EDT)

Comment: Have a look at this attempt to simulate normal errors and errors with a t distribution with 5 degrees of freedom. One conceptual problem is the concept of effect size. The t distribution with 5 dfs has a standard deviation of about 1.22. The problem is that, with its high kurtosis, your estimate of the standard deviation will tend to be lower, not in the sense of 'expectation' but in the sense of the 'typical' standard deviation. The question, then, is whether to define 'effect size' in terms of the standard deviation for the t or in the original metric. This script uses the standard deviation of the t. A quick look suggests that there isn't much change, just a slight drop in power at higher effect sizes. --Georges 18:24, 28 June 2012 (EDT)
Reply: Thanks! Also here is the same code utilizing multiple cores, in this case 4.

## Chapter 12

### An Application of the BIC

As mentioned in this chapter, the BIC is a good indicator of model fit, which sets a penalty based upon the number of parameters in the model. It also seems easy to calculate based upon the typical summary objects from nlme and lme4. We learned in the last chapter about dealing with influential observations.

What I would like to know is: might it be appropriate to compare BICs from two models where the only difference between them is the removal of a set of observations deemed influential or problematic? Since there is no difference in the number of parameters, other measures of model fit seem inappropriate. In a practical application (working with a client), what would be the best way of approaching this situation? --Msigal 14:59, 30 June 2012 (EDT)

### Mixtures to Normal

Latent class mixture models are a non-parametric way to avoid, or lesson, the assumption of normality for the random coefficients, and can approximate any distribution as the number of classes is increased. How effective can arbitrary distributions be modeled, and should this modeling technique be used verify that the normality of the random coefficients assumptions hold? --Rphilip2004 19:14, 1 July 2012 (EDT)

### Sandwich estimators for standard errors

It is mentioned that the researcher works with a misspecified model and a few reasons are given for why they do so. The Sandwich method is used to estimate standard error. Is the method always applicable for misspecified model? If the model is not misspecified, is the Sandwich estimator still used?--Gilbert8 14:02, 2 July 2012 (EDT)

In the sense that 'all models are wrong, but some are useful', are we not always using 'misspecified models'? When S&B talk about using a misspecified model intentionally, are they referring to cases in which either through statistical tests or diagnostics, we have evidence that the model fails in some regard? --Smithce 22:15, 2 July 2012 (EDT)

### BIC vs. AIC

The authors talk about how BIC is a good indicator of model fit. However we have seen in the R code the AIC is also very close to the BIC method and similar in numbers. Though the authors don't talk about AIC,I would like to know if there is a difference when comparing models with random parts, as opposed to the simple models. Which method is better AIC or BIC? Why?--Dusvat 21:37, 2 July 2012 (EDT)

### Alternative method GEE

The authors mentioned that GEE is an alternative method to handle the multilevel data. Some people prefer GEEs because they like to have a procedure that estimating parameters in the absence of assumptions for how the coefficients vary. However, from the GLM course, I know that GEEs can used to estimate the parameters of generalized linear model with a possible unknown correlation between outcomes. Thus, can we use GEEs to handle a wild class of dataset including multilevel data? --~~~~ --Qiong Li 9:52, 2 July 2012 (EDT)