MATH 6643 Summer 2012 Applications of Mixed Models
From Wiki1
Thought of the Week
“... but what did they want to learn? ...” – Nancy Reid possibly commenting on a presentation of a statistical consultation.
News
 NEW: July 10: Prepare two questions, suitable as exam questions (requiring 10 to 15 minutes to answer) on the material of the course. Try to prepare one on the earlier material and one on the later material. You can post your question in /Submitted sample exam questions. Please post them before July 17 so we can have a look at them in class.
 May 17: We decided that, from now on, instead of preparing an exercise question on the current chapter of Snijders and Bosker before each class, we would each prepare a 'discussion question' to be placed in the same file so it can easily be viewed and discussed. The file is Snijders and Bosker: Discussion Questions.
 July 2: spidadev update for OS X and Linux. For *unix users wanting to install the update, simply download http://blackwell.math.yorku.ca/R/spidadev.tar.gz to your desktop. Then, in X11 or Terminal, go to your desktop directory and run: R CMD INSTALL spidadev.tar.gz
Quick Links
 Snijders and Bosker: Discussion Questions
 Demo Code (at the bottom of the page)
 Links for the course (add abundantly)
 Statistics in the News (add to this too!)
 SPIDA 2012 home page and labs:
 R Scripts
 Power by simulation: Normal versus t with 5 dfs
Course Schedule
Tuesday Thursday May 8: Class 1
10: Class 2
15: Class 3
17: Class 4
22
No class: DLSPH/SORA/TABA workshop
24
No class: SPIDA
29
No class: SPIDA
31
No class: SPIDA
June 5
No class: SSC 2012 meetings in Guelph
7: Class 5
12: Class 6
14: Class 7
19: Class 8
21: Class 9
26: Class 10
28: Class 11
July 3: Class 12
5: Class 13
10: Class 14
12: Class 15
17: Class 16
19: Class 17
24: Class 18
26: Final exam
Things we'll do
 Read Snijders and Bosker (2012) Multilevel Analysis, 2nd. ed., Sage
 This is one of the best books I've read on the concepts and on the 'art' of mixed models. Almost every sentence is rich with meaning. The authors very rarely skip an important point. I would venture to say that it is the best book written yet on applications of mixed models. The first edition in 1999 held the title for a number of years.
 We will discuss a new chapter and the beginning of each class. Someone will be chosen at random to lead the discussion.
 Since the book doesn't have exercises, you will prepare exercises for each chapter. Before the chapter is discussed, you will prepare 'simple, straightforward' exercises. After the chapter is discussed you will prepare deeper more conceptual and challenging exercises. You can use various sources of multilevel data for the exercises.
 Conveniently, the book has 18 chapters and we have 18 classes. We will begin with chapter 2 in class 2 and continue. You should prepare a simple exercises and post it on your 'student' page before the class. Be prepared to discuss the chapter in class. Each day someone will be chosen at random to lead off the discussion with a fiveminute summary of the key points in the chapter. Post a deeper question on the same chapter before the next class.
 Write a paper. Maybe publish it.
 This will be done in teams of two or, exceptionally, three. If you wish to be 'randomly' matched to others who wish to be randomly matched let me know. The idea is stolen from Gary King (2006). Read Gary King's article to find out how to proceed. You need to find an article with data that lends itself to longitudinal or multilevel analysis. You will find many examples of King's students replication papers including their data and Rcode of the Harvard Dataverse. Choose the article before May 22 and send me an email message. During the 'break' created by SPIDA and the SSC meeting, you should work on replicating the original analysis.
 Complete a few assignments early in the course intended to cover the 'math' of multilevel models.
 Write a function or a small package in R that does something useful and that can be added to the 'spidadev' package. Some ideas. Due at class 17.
 Write a final exam.
Some Data Sources with Publication Listings
 The Longitudinal Study of American Youth ([1]).
 The University of Wisconsin's BADGIR (Better Access to Data for Global Interdisciplinary Research, email registration is easy to gain access. If you decide to use the National Survey of Families and Households data I have a reduced longitudinal dataset created just ask me (Ryan) for it, it might be useful[2]).
 The Wisconsin Longitudinal Study (easy email sign up [3]).
Grades
It would be so much more fun just to learn for its own sake! But...
 [30%] Snijders and Bosker has 18 chapters. We will discuss and do exercises for 17 of them. That's a total of 17 discussions and 34 exercises. Each discussion and exercise is graded out of 10. I will discard the 11 lowest grades of the 51. That gives you a maximum possible score of 400.
 [30%] Replication, reanalysis and paper.
 [10%] Assignments
 [15%] R function or package.
 [20%] Final exam
Who and where I am
 Georges Monette, Ph.D., P.Stat. (http://www.ssc.ca/en/accreditation/briefsummaryaccreditation)
 N626 Ross
 mailto:georges+mixed@yorku.ca (Note adding '+mixed' increases the priority of your message in my mailbox)
 http://www.math.yorku.ca/~georges
 Office hours: After class or by appointment
Who you are
Things to learn
Applying a statistical methodology requires being aware of many things:
 Some basic theory behind the method
 What assumptions the theory implies, how they fail and when they're reasonable although wrong
 How to turn substantive research questions into statistical questions
 How to formulate and explore those questions statistically
 How to avoid too common but execrable errors: e.g. interpreting main effects with interactions, interpreting lists of pvalues from regression output, interpreting multiple pvalues for dummy variables of a factor
 How to program
 How to produce informative and stimulating graphics
 Numerical savvy: avoiding X'X, using svd and qr instead
We will try to weave these themes throughout the course
Places to look
 Links for the course (please add abundantly)
Class 1: May 8
Links
 Hans Rosling Ted Talks: [4] [5] [6]
 Screen capture of Class 1 (Password protected: User: frisch Password: waugh (this information will disappear soon ... well not really  this is a wiki))
 R scripts
 Notes on Linear Algebra
 Statistical Issues
Assignment 1
Due: May 15 before class
You must work individually but you can use any written source provided you cite it. Hint: you might like to refresh your memory about the Spectral Decomposition Theorem for symmetric matrices.
 1. [10] Prove the ShermanMorrison identity (state appropriate assumptions). Note that U and V need not be square matrices.
 (A + UDV)^{ − 1} = A^{ − 1} − A^{ − 1}U(D^{ − 1} + VA^{ − 1}U)^{ − 1}VA^{ − 1}
 Hint: It might be easier to first prove a special case for (I + UV)^{ − 1} and then use basic facts about inverses and products, e.g. (AB)^{ − 1} = B^{ − 1}A^{ − 1}
 (A + UDV)^{ − 1} = A^{ − 1} − A^{ − 1}U(D^{ − 1} + VA^{ − 1}U)^{ − 1}VA^{ − 1}
 2. [10] Consider a linear regression model Y = Xβ + ε where X is a matrix whose first column consists of 1's, β = (β_{0},β_{1},...,β_{k})' and Var(ε) = σ^{2}I. Let Σ_{XX} be the variance matrix of the predictor variables and let s_{E} be the residual standard error. Find and prove an expression for
 3. [10] Let Σ be symmetric. Show that Σ is positivedefinite if and only there exists a nonsingular matrix A such that Σ = AA'.
 4. [10] Show that a symmetic matrix Σ is a variance matrix if and only if there exists a matrix A such that Σ = AA'.
 5. [10] Let A and B be square matrices such that AA' = BB' = Σ with Σ positive definite. Show that there exists an orthogonal matrix Γ such that A = BΓ.
 6. [50] Retrieve the "Arrests" data set from "library(effects)" in R. You can get information about the variables in the data set in the usual way with
> ?Arrests> install.packages('effects')
 Note that you might have to download the library first with:
 The Toronto Star has published some stories claiming that this data set reveals a pattern of discrimination in police behaviour. You have been hired by the National Post to study the data set and produce an independent opinion. Your opinion may agree, disagree or otherwise qualify the claim that this data shows a conclusive pattern of discrimination. Write a report with suitable graphs and include the details of your analysis with appropriate graphs as an appendix.
Class 2: May 10
 Slides:
 Notes:
 Notes on
 Linear transformations of random vectors: mean and variance
 Multivariate Normal Distribution
 Marginal and conditional distributions
 Visualizing the bivariate normal
 Notes on
 Screen capture of Class 2 (sorry  I forgot to turn it back on after the break)
Class 3: May 15
Assignment 2
Due June 7 before class
You must write up your own work based on your own understanding but you can do anything you want to develop your understanding.
 1. [10] A basis for that is a conjugate basis with respect to a positive definite matrix M is a sequence of vectors x_{1},x_{2},...,x_{p} in such that x'_{i}Mx_{i} = 1 and x'_{i}Mx_{j} = 0 if . Show that the columns of a nonsingular matrix A form a conjugate basis with respect to Σ^{ − 1} if Σ = AA'. Note that a conjugate basis is merely an orthogonal basis (actually, an 'orthonormal' basis) with respect to the metric defined by   x   ^{2} = x'Σ^{ − 1}x.
 2. [10] We will call a "square root" of a square matrix M any square matrix A such that M = AA'. Show that a square matrix has a square root if and only if it is a variance matrix.
 3. [10] Write a function in R that computes a square root of a variance matrix M. Use the 'eigen' function. [Bonus: 2] Get your function to give an informative error message if M does not have a square root for some reason.
 4. [10] Using the function in 3, write a multivariate normal random number generator. Write it to parallel the univariate 'rnorm'. The univariate 'rnorm' takes three arguments: n, mean and sd. Consider writing your 'rmvnorm' so the third argument, if given, must be named either 'var' or 'sd' (depending on whether the user is giving a variance or the square root of a variance as input) to avoid confusion with the univariate generator. The default could be the identity  which doesn't need to be distinguished as 'var' or as 'sd'.
 5. [10] Write a simple 'lmfit' function that calculates least squares regression coefficients using an algorithm based on the svd. Ideally, design the function so it takes a formula and a data frame as arguments, e.g. lmfit( y ~ x1 + x2, dd). You can generate the model matrix using the 'model.matrix' function and extract the response using the first column of the model.frame command.
 6. [10] Consider a variance matrix for a random vector . Verify that the Cholesky matrix is a square root of Σ.
 Show that the Cholesky matrix can be written as where β_{21} is the regression coefficient of Y_{2} on Y_{1}.
 Draw a concentration (or data) ellipse and indicate the interpretation of the vectors defined by the columns of C relative to the ellipse.
 7. [10] Show that a nonsingular variance matrix, Σ can be factored so that Σ = AA' with A an upper triangular matrix [in contrast with problem 6 where the matrix is lower triangular]. Explain the interpretation of the elements of this matrix as in question 6.
 8. [20] Generate 100 observations for three variables Y, X and Z so that in the regression of Y on both X and Z neither regression coefficient is significant (at the 5% level) but a test of the hypothesis that both coefficients are 0 is rejected at the 1% level. Explain your strategy in generating the data. How should the data be generated to produce the required result? Show a data ellipse for X and Z and appropriate confidence ellipses for their two regression coefficients. What does this example illustrate about the appropriatenes of scanning regression output for significant pvalues and concluding that nothing is happening if none of the pvalue achieve significance?
 9. [20] Generate 100 observations for three variables Y, X and Z so that in the separate simple regressions of Y on each of X and Z neither regression coefficient is significant (at the 5% level) but a test of the hypothesis that both coefficients are 0 in a multiple regression of Y on both X and Z is rejected at the 5% level. Explain your strategy in generating the data. How should the data be generated to produce the required result? Show a data ellipse for X and Z and appropriate confidence ellipses for their two regression coefficients. Explain the relationship between the ellipses and the phenomenon exhibited in this problem. What does this example illustrate about the appropriatenes of forward stepwise regression to identify a suitable model to predict Y using both X and Z?
Class 4: May 17
Class Notes
 Class notes:
 Learning statistics: The How? vs The When?
 What does adding do to your model and why. [Ask me why the addedvariable plot explains everything  almost].
 Compositional effect (aka between effect) = Contextual effect + Within effect
 The 'empty' random intercept model
 Generating data
 Estimating β_{0j} with a BLUE
 Three ways of estimating γ_{00}
 Overall mean
 Mean of means
 Inverse variance weighted mean of means (= mixed model approach)
 Back to estimating β_{0j}  with a BLUP (= empirical Bayes estimator = shrinkage estimator)
 Exchangeability: when to shrink and when not to shrink
To read
 Visualizing regression: This material is relevant for questions 7 and 8 of assignment 2. I propose to look at it quickly and to let you explore it at your leisure.
 We will continue with From Hierarchical to Mixed Models
 Screen capture of Class 4
Class 5: June 7
Class 6: June 12
 Screen capture of Class 6 (sorry I messed up with the audio again and your comments are barely audible  I quite sure I will do better on Thursday)
 /Testing fixed and random effects in R
 Script that produces likelihood surfaces
Class 7: June 14
 Screen capture of Class 7
 Two labs: Lab 1 and Lab 2, with annotated examples using R to fit mixed models.
 Hierarchical Models to Mixed Models discusses theory and application of mixed models.
Assignment 3
Due: June 21 The purpose of this assignment is to give you a chance to try out what we're learning by working on a dataset and to have the sobering experience of splithalf validation. Using the 'hsfull' data set in 'spida', choose a random sample of half the schools. To make everything replicable, first choose a random seed at random (an odd integer between 1 and 2^321), record the number and use 'set.seed' with the number just before selecting the random sample. Using the half sample, explore the role ses, Sex, Minority and Sector in their relationship with math achievement. Discuss your strategy as you go along. At some point(s) in the analysis, illustrate that you can
 Correctly test the significance of interactions and of effects with multiple degrees of freedom.
 Avoid incorrect interpretations of pvalues in regression output (you should do this at every point)
 Test hypotheses involving the random effects structure of the model
 Choose between various random effects structures
 Carry out a test involving setting up a linear hypothesis matrix with more than one row
 Estimate and plot conditional effects of a factor that interacts with ses. You should first plot an 'effect' plot showing the estimated response for each level as a function of ses and, next, graph the estimate gap with standard error bands.
 Run one of your final models on the other half of the data. Comment on differences. Create the 'gap plot' with the other half of the data and comment on the differences.
Treasure Hunt
Bonus: Find examples of good and bad reports of mixed model and add links with comments here: [7]
Effect sizes in mixed models
 An interesting message on an R mailing list: "... in psychology, reviewers will bludgeon you for an effect size..." but with some cause:
 Quoting from Roberts (2006), the APA Publication Manual (2001):

The general principle to be followed, however, is to provide the reader not only with information about statistical significance but also with enough information to assess the magnitude of the observed effect or relationship. (p. 26)
 See also the comments on effect sizes in Wilkinson and the Task Force on Statistical Inference (1999). In part:
Effect sizes. Always present effect sizes for primary outcomes. If the units of measurement are meaningful on a practical level (e.g., number of cigarettes smoked per day), then we usually prefer an unstandardized measure (regression coefficient or mean difference) to a standardized measure (r or d). It helps to add brief comments that place these effect sizes in a practical and theoretical context.
Class 8: June 19
Splitting a Dataframe by Cluster
Phil and I wrote a function that partitions a dataframe into a training and a test set. The function (and instructions) are found here [8]. Msigal 10:39, 27 June 2012 (EDT).
Heteroskedasticity in its many forms
Here's the start of a wiki page and script discussing heteroskedasticity in its various forms.
Class 9: June 21
Class 10: June 26
Short assignment 4
Don't spend more than 90 minutes on this if you attended class on June 26. Assuming that you've read Chapter 10, if you missed class on June 26 then you might have to spend 90 + 120 = 210 minutes! If you haven't read Chapter 10 then ...
Go though Lab 2 on the SPIDA page. Compare the diagnostics in Lab 2 with those in Snijders & Bosker. Identify diagnostics suggested in Snijders and Bosker that are missing in Lab 2.
Send your 'assignment' to georges@yorku.ca. Include '6643 Assignment 4' in the subject line.
Deadline: July 3.
Longer voluntary assignment: Program some missing diagnostics.
Class 11: June 28
R scripts
 Function opt3 for finding the optimal allocation of a fixed budget for cluster samples using three levels.
 Simulation for Power.R
 Simulation using 'parallel' package (Linux and OSX)
 Simulation using 'snow' package for parallel processing on multiple cores (All platforms)
Class 12: July 3
 Screen capture of Class 12
 SPIDA Slides on Longitudinal Data Analysis [9].
Class 13: July 5
Class 14: July 10
Class 15: July 12
Class 16: July 17
Some materials on MCMC
 MCMCglmm Course Notes by Jarrod Hadfield, March 20, 2012
 MCMCglmm Tutorial
 MCMCglmm manual
 MCMCglmm Overview (published in the Journal of Statistical Software)
 Jarrod D. Hadfield (2012) MCMC Methods for MultiResponse Generalized Linear Mixed Models: The MCMCglmm R Package, Journal of Statistical Software
 Kass et. al. (1998) Markov Chain Monte Carlo in Practice: A Rountable Discussion
 Interpreting the DIC (from the BUGS project
Interesting link
Class 17: July 19
 Screen capture of Class 17
 R scripts on glmmPQL and MCMCglmm:
Generating G and R matrices in various packages
Function Notes lme
in package nlme
Linear mixed effects: normal response
G side and R side modelling
Model syntax:
Y ~ X * W, random = ~ 1 + X  id
For nested effect:
Y ~ X * W, random = ~ 1 + X  higher/lower
or
Y ~ X * W, random = list(higher = ~ 1, lower = ~ 1 + X )
lmer
in package lme4
Linear mixed models for gaussian response with Laplace approximation
G side modeling only, R = σ^{2}I
Model syntax:
Y ~ X * W +(1+Xid)
For nested effect:
Y ~ X * W +(1+Xhigher) + (1+Xhigher:lower)
For crossed effect:
Y ~ X * W +(1+Xid1) + (1+Xid2)
glmer
in package lme4
Generalized linear mixed models with adaptive Gaussian quadrature
 family: binomial, Gamma, inverse.gaussian, poisson, gaussian
G side only, no R side
Model syntax:
Y ~ X * W +(1+Xid)
For nested effect:
Y ~ X * W +(1+Xhigher) + (1+Xhigher:lower)
For crossed effect:
Y ~ X * W +(1+Xid1) + (1+Xid2)
glmmPQL
in packages MASS/nlme
Generalized linear mixed models with Penalized Quasi Likelihood
 family: binomial, Gamma, inverse.gaussian, poisson, gaussian
G side and R side as in lme
Model syntax:
Y ~ X * W, random = ~ 1 + X  id
For nested effect:
Y ~ X * W, random = ~ 1 + X  higher/lower
MCMCglmm
in package MCMCglmm
Generalized linear mixed models with MCMC
 family: poisson, categorical, multinomial, ordinal, exponential, geometric, cengaussian, cenpoisson,
cenexponential, zipoisson, zapoisson, ztpoisson, hupoisson, zibinomial (cen=censored, zi=zeroinflated, za=zeroaltered, hu=hurdle
G side and R side, R side different from 'lme': no autocorrelation but can be used for multivariate response
Note: 'poisson' potentially overdispersed by default (good), 'binomial' variance for binary variables is unidentified.
Model syntax:
Y ~ X * W, random = ~ us(1 + X):id [Note: id should be a factor, us=unstructured]
For nested effect:
Y ~ X * W, random = ~us(1 + X):higher + us(1 + X):higher:lower
For crossed effect:
Y ~ X * W, random = ~us(1 + X):id1+ us(1 + X):id2
 family: binomial, Gamma, inverse.gaussian, poisson, gaussian
Class 18: July 24
Exam: July 26
Interesting Links
 Kaggle.com: Data Analysis as a sport (thanks to Ryan for sending this)