## Comments on section 6.5: Mixed models: BLUES and BLUPS

There are only two issues in this section. The first is that the BLUP involves shrinkage towards the mixed-model generalized-least-squares (GLS) estimate instead of shrinkage towards the pooled estimate. There are relationships among the within, the pooled, the between and the mixed-model (GLS) estimates, but they would be much more complex to describe. The second, a notational issue, involves the use of G to refer to two related but different matrices. In (29) it refers to the variance of the random effects in a single cluster. In (31), it is the 'global' G that has the former G's on its diagonal.

An easy solution that avoids awkward notation is to exploit the fact that G does not have to be the same in every cluster. In models incorporating heteroskedasticity, it is explicitly modeled to vary.

Here are my suggestions for minimal modifications that I believe will make everything correct.

1) Add a subscript i to the G in (29) and to the G in the second line following (29).

2) Define the 'global' G and R used in (31):

where $G = \mathrm{diag}(G_1,\ldots,G_m)\!$, $R = \mathrm{diag}(R_1,\ldots,R_m)\!$ and m is the number of clusters.

[continue with the existing paragraph:] The variance ... reduces to the standard linear model.

3) To present the BLUP as the solution to a problem, we first need to be interested in the within-cluster $\beta_i\!$. This requires moving the reference to $Z_i = X_i\!$ up a bit so that $\beta_i\!$ can be introduced. The following would replace the next 3 paragraphs:

We now consider the case in which $Z_i = X_i\!$ and we wish to predict $\beta_i = \beta + u_i\!$, the vector of parameters for the ith cluster. At one extreme, we could simply ignore clusters and use the common mixed-model generalized-least-squares estimate,

$\hat{\beta}^{gls} = (X^T V^{-1} X)^{-1}X^T V^{-1} y$ (32)

whose sampling variance is $Var(\hat{\beta}^{gls}) = (X^T V^{-1} X)^{-1}$. It is an unbiased predictor of $\beta_i\!$ since $E(\hat{\beta}^{gls} - \beta_i) = 0$. With moderately large m, the sampling variance may be small relative to $G_i\!$, so that $Var( \hat{\beta}^{gls} - \beta_i) \approx G_i$.

At the other extreme, we ignore the fact that clusters come from a common population and we calculate the separate BLUE estimate within each cluster,

$\hat{\beta}_i^{blue} = (X_i^T X_i)^{-1}X_i^T y_i$ with $Var(\hat{\beta}_i^{blue}|\beta_i) = S_i = \sigma^2 (X_i^T X_i)^{-1}$ (33)

Both extremes have drawbacks: whereas the overall GLS estimate ignores variation between clusters, the within-cluster BLUE ignores the common population and makes clusters appear to differ more than they actually do.
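[Reviewer's aside, not part of the proposed text:] The claim that the within-cluster BLUEs exaggerate differences between clusters can be made explicit with the law of total variance. This is only a sketch, assuming the $X_i$ are treated as fixed and $Var(\beta_i) = G_i$ as in (29):

$Var(\hat{\beta}_i^{blue}) = E[Var(\hat{\beta}_i^{blue}|\beta_i)] + Var(E[\hat{\beta}_i^{blue}|\beta_i]) = S_i + G_i$

so the BLUEs are more dispersed than the true $\beta_i$ by the sampling variance $S_i$, which is largest in small clusters.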

This dilemma led to the development of BLUPs (best linear unbiased predictors) in models with random effects (Henderson, 1975; Robinson, 1991; Speed, 1991). In the case considered here, each BLUP is an inverse-variance-weighted average of the mixed-model GLS estimate and the BLUE. The BLUP is then:

$\tilde{\beta}_i^{blup} = (S_i^{-1} + G_i^{-1})^{-1} (S_i^{-1}\hat{\beta}_i^{blue} + G_i^{-1}\hat{\beta}^{gls})$ (34)
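[Reviewer's aside, not part of the proposed text:] As a numerical sanity check, the following sketch compares (34) with the usual textbook form of the BLUP, $\hat{\beta}^{gls} + G_i X_i^T V_i^{-1}(y_i - X_i\hat{\beta}^{gls})$. Everything in it is an illustrative assumption rather than something taken from the paper: the simulated data, the parameter values, a common $G_i = G$, $R_i = \sigma^2 I$ with known variance components, and the helper names `blup_weighted` and `blup_henderson`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: m clusters, p coefficients, known variance components.
m, p, sigma2 = 8, 2, 1.0
G = np.array([[1.0, 0.3], [0.3, 0.5]])   # common G_i (assumed equal across clusters)
beta = np.array([2.0, -1.0])
n = rng.integers(3, 12, size=m)          # unequal cluster sizes n_i

X, y = [], []
for ni in n:
    Xi = np.column_stack([np.ones(ni), rng.normal(size=ni)])
    ui = rng.multivariate_normal(np.zeros(p), G)
    X.append(Xi)
    y.append(Xi @ (beta + ui) + rng.normal(scale=np.sqrt(sigma2), size=ni))

# V_i = X_i G X_i^T + sigma^2 I, using Z_i = X_i and R_i = sigma^2 I.
V_inv = [np.linalg.inv(Xi @ G @ Xi.T + sigma2 * np.eye(len(Xi))) for Xi in X]

# GLS estimate (32), accumulated cluster by cluster because V is block diagonal.
A = sum(Xi.T @ Wi @ Xi for Xi, Wi in zip(X, V_inv))
b = sum(Xi.T @ Wi @ yi for Xi, Wi, yi in zip(X, V_inv, y))
beta_gls = np.linalg.solve(A, b)

def blup_weighted(Xi, yi):
    """Eq. (34): precision-weighted average of the BLUE and the GLS estimate."""
    S_inv = Xi.T @ Xi / sigma2                         # S_i^{-1}
    beta_blue = np.linalg.solve(Xi.T @ Xi, Xi.T @ yi)  # Eq. (33)
    G_inv = np.linalg.inv(G)
    return np.linalg.solve(S_inv + G_inv, S_inv @ beta_blue + G_inv @ beta_gls)

def blup_henderson(Xi, yi, Wi):
    """Textbook form: beta_gls + G X_i^T V_i^{-1} (y_i - X_i beta_gls)."""
    return beta_gls + G @ Xi.T @ Wi @ (yi - Xi @ beta_gls)

blups_34 = np.array([blup_weighted(Xi, yi) for Xi, yi in zip(X, y)])
blups_h = np.array([blup_henderson(Xi, yi, Wi) for Xi, yi, Wi in zip(X, y, V_inv)])
max_abs_diff = np.max(np.abs(blups_34 - blups_h))
```

Under these assumptions the two forms should agree to floating-point precision, which supports writing the BLUP as shrinkage towards $\hat{\beta}^{gls}$ rather than towards the pooled estimate.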

This "partial pooling" optimally combines the information from cluster i with the information from all clusters, shrinking $\hat{\beta}_i^{blue}$ towards $\hat{\beta}^{gls}$. Shrinkage for a given parameter $\beta_{ij}\!$ is greater when the sample size $n_i\!$ is small or when the variance of the corresponding random effect, $g_{ijj}\!$, is small.
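[Reviewer's aside, not part of the proposed text:] The dependence of the shrinkage on sample size and random-effect variance is easiest to see in a scalar special case. As an illustration only, assume a random-intercept model in which $X_i$ is a column of ones, so that $\hat{\beta}_i^{blue} = \bar{y}_i$, $S_i = \sigma^2/n_i$ and $G_i = g$. Then (34) reduces to

$\tilde{\beta}_i^{blup} = w_i \bar{y}_i + (1 - w_i)\hat{\beta}^{gls}$ with $w_i = \frac{g}{g + \sigma^2/n_i}$

The weight $w_i$ on the cluster's own mean tends to 1 as $n_i$ grows and to 0 as $g$ shrinks, so small clusters and small random-effect variances both pull the prediction towards $\hat{\beta}^{gls}$.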

[The last paragraph, "Eqn. (34) .....", is unchanged.]