User contributions [en], Smithce, retrieved 2020-01-20 (Wiki1, MediaWiki 1.16.1)
SCS Reads Nominations, revision of 2013-06-28 by Smithce
<hr />
<div>== SCS Reads Nominations, 2013 ==<br />
== '''JUNE 28: NEW ONLINE POLL: [https://docs.google.com/forms/d/1oBf-VR5pcUUB38wsCxUH24QFnH0qPoCxkjhYSwq05l4/viewform PLEASE VOTE HERE]''' ==<br />
<br />
*Murnane and Willett, 2012, ''Methods Matter''<br />
*:The Table of Contents and other details are available here: http://books.google.ca/books/about/Methods_Matter_Improving_Causal_Inferenc.html?id=lA0qSsQk_AgC&redir_esc=y<br />
<br />
* ''The BUGS Book: A Practical Introduction to Bayesian Analysis'' by David Lunn, Christopher Jackson, Nicky Best, Andrew Thomas and David Spiegelhalter, CRC Press/Chapman and Hall, 2012.<br />
*:BUGS stands for "Bayesian inference Using Gibbs Sampling." We recall using BUGS open source software when we read together Andrew Gelman and Jennifer Hill, ''Data Analysis Using Regression and Multilevel/Hierarchical Models'', 2007.<br />
*:[http://www.crcpress.com/product/isbn/9781584888499 ''Book info at the publisher’s (CRC Press) website'']<br />
*:[http://www.mrc-bsu.cam.ac.uk/bugs/thebugsbook/ ''Book info at the BUGS Project website'']<br />
*:[http://www.amazon.com/The-BUGS-Book-Introduction-Statistical/dp/1584888490/ref=sr_1_1?ie=UTF8&S&qid=1341934995&sr=8-1#reader_1584888490 ''‘Look Inside’ the book at Amazon.com'']<br />
<br />
*''Latent Class and Latent Transition Analysis: With Applications in the Social, Behavioral, and Health Sciences''<br />
*:Linda M. Collins, Stephanie T. Lanza, Wiley, 2010<br />
*: Here is a link to the book: http://www.amazon.ca/Latent-Class-Transition-Analysis-Applications/dp/0470228393<br />
<br />
* [http://bayes.cs.ucla.edu/BOOK-2K/ Judea Pearl (2009) ''Causality'' 2nd ed., Cambridge University Press]<br />
*: Some other papers by Pearl: [http://ftp.cs.ucla.edu/pub/stat_ser/r355.pdf] [http://escholarship.org/uc/item/1sv6k47m#page-1] [http://www.degruyter.com/view/j/jci.2013.1.issue-1/jci-2013-0003/jci-2013-0003.xml?format=INT]<br />
<br />
== SCS Reads Nominations, 2012 ==<br />
*[[SCS Reads 2012]]<br />
<br />
== '''SEPT. 11: NEW ONLINE POLL: [http://www.surveymonkey.com/s/R87G886 PLEASE VOTE HERE]''' ==<br />
<br />
* 'Little Green Books' from Sage. <br />
*:You can see the list at http://srmo.sagepub.com/browse?doctype=qass. The idea would be to select a few books to read over the year, covering various topics. (This was nominated by Matt and Carrie if I remember correctly)<br />
<br />
* Jim Albert (2009) [http://bayes.bgsu.edu/bcwr/ ''Bayesian Computation with R''], 2nd ed., Springer<br />
*: A relatively small book in the ''Use R!'' series. What seems nice about this book is that it would provide an introduction to Bayesian analysis, MCMC, Gibbs sampling, convergence diagnostics, etc. in a context in which we can learn the methods by using them. We could supplement the book with other readings on Bayesian methods. Or, since it is relatively short we could devote time to other topics. (nominated by Georges)<br />
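*: Since Gibbs sampling is the book's computational workhorse, here is a minimal base-R sketch (my own toy example, not code from the book) of a Gibbs sampler for a standard bivariate normal with correlation rho, alternating draws from the two univariate full conditionals:<br />

```r
# Gibbs sampler for a standard bivariate normal with correlation rho:
# each full conditional is univariate normal, so we alternate draws.
gibbs_bvn <- function(n_iter = 10000, rho = 0.8) {
  out <- matrix(NA_real_, n_iter, 2)
  x <- y <- 0                               # arbitrary starting values
  s <- sqrt(1 - rho^2)                      # conditional standard deviation
  for (i in seq_len(n_iter)) {
    x <- rnorm(1, mean = rho * y, sd = s)   # x | y ~ N(rho * y, 1 - rho^2)
    y <- rnorm(1, mean = rho * x, sd = s)   # y | x ~ N(rho * x, 1 - rho^2)
    out[i, ] <- c(x, y)
  }
  out
}
set.seed(1)
draws <- gibbs_bvn()
cor(draws[, 1], draws[, 2])  # should settle near 0.8
```

*: With rho = 0.8 the sampled correlation lands close to 0.8, which is a quick sanity check that the conditionals are right.<br />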
<br />
*[http://www.amazon.com/Structural-Equation-Modeling-Concepts-Applications/dp/0803953186 ''Structural Equation Modeling: Concepts, Issues, and Applications''] by Rick Hoyle (nominated by Constance)<br />
*: Each chapter is written by a different contributor (e.g., Bentler and Hu)<br />
*: We haven't done an SEM book since I've been at York (so at least 4 years). Given its popularity among applied researchers, it would be great to have some discussion of this modeling technique! We could supplement with a few more recent papers on SEM as well.<br />
<br />
*[http://www.amazon.com/Introduction-Statistical-Mediation-Multivariate-Applications/dp/0805864296 ''Introduction to Statistical Mediation Analysis''] by David MacKinnon (nominated by Constance)<br />
*: It could be fascinating to read this book and compare its approach with<br />
*:* the, presumably Rubinesque, approach of the [http://cran.r-project.org/web/packages/mediation/index.html mediation] package in R. [GM]<br />
*:* the graphical model approach, e.g. [http://ftp.cs.ucla.edu/pub/stat_ser/r363.pdf Pearl, J. (2011) "The Mediation Formula: A guide to the assessment of causal pathways in nonlinear models"]<br />
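*: As a point of reference for that comparison, the classical product-of-coefficients estimate of the indirect effect, which MacKinnon's book discusses at length, can be sketched in a few lines of base R (the data here are simulated purely for illustration):<br />

```r
# Simulate x -> m -> y with a direct path; the true indirect effect is
# a * b = 0.5 * 0.7 = 0.35.
set.seed(1)
n <- 5000
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)               # a-path: x -> m
y <- 0.7 * m + 0.3 * x + rnorm(n)     # b-path: m -> y, plus a direct effect of x
a_hat <- coef(lm(m ~ x))["x"]         # estimate of the a-path
b_hat <- coef(lm(y ~ m + x))["m"]     # estimate of the b-path
indirect <- unname(a_hat * b_hat)     # product-of-coefficients estimate, near 0.35
```

*: The mediation package and Pearl's mediation formula generalize this beyond linear models; in the linear case above all three viewpoints agree.<br />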
<br />
*[http://www.crcpress.com/product/isbn/9781439813263 ''Multivariate Generalized Linear Mixed Models Using R''] by Damon M. Berridge and Robert Crouchley. CRC Press, 2011. (nominated by Hugh)<br />
*: The extension of Generalized Linear Mixed Models to handle multivariate dependent variables is, of course, a very valuable addition to our tools for multi-level modelling. This book uses the SabreR package in R.<br />
<br />
== SCS Reads Nominations, 2011 ==<br />
* [[Archived_list_of_books_considered|Books nominated last year]] (feel free to renominate!)<br />
* Judea Pearl (2010) "The Foundations of Causal Inference" ''Sociological Methodology'' 40, 75-149. Also available as a [http://ftp.cs.ucla.edu/pub/stat_ser/r355.pdf technical report] (suggested by --[[User:Georges|Georges]] 20:03, 3 April 2011 (EDT)).<br />
*: This extensive article discusses the SEM graphical-model approach to causality as well as the counterfactual approach, contrasting the two with a preference for the former. The article might merit a few months of seminars. We could invite speakers such as Helene Massam to discuss portions of it. Pearl's generous acknowledgement to a reviewer on p. 79 is worth noting.<br />
** Also: [http://www.bepress.com/cgi/viewcontent.cgi?article=1322&context=ijb Judea Pearl (2011) "Principal Stratification — a Goal or a Tool?" ''The International Journal of Biostatistics'' 7, 1-13.]<br />
** Added by Dave Flora:<br />
***Keith A. Markus (2010). "Structural Equations and Causal Explanations: Some Challenges for Causal SEM." ''Structural Equation Modeling: A Multidisciplinary Journal'', 17(4), 654-676. http://dx.doi.org/10.1080/10705511.2010.510068<br />
***Donna L. Coffman (2011). "Estimating Causal Effects in Mediation Analysis Using Propensity Scores." ''Structural Equation Modeling: A Multidisciplinary Journal'', 18(3), 357-369. http://dx.doi.org/10.1080/10705511.2011.582001</div>Smithce
MATH 6643 Summer 2012 Applications of Mixed Models/Students/smithce, revision of 2012-08-20 by Smithce
<hr />
<div>==About Me==<br />
I am a PhD student in Psychology in the Quantitative Methods Area. I completed my Master's degree in Psychology at York, studying depth perception, and my undergraduate degree at the University of Toronto in Engineering Science, Aerospace option. As you can see, I have quite a varied (and some might say strange) educational background! This year I had the privilege of working as a consultant with the Statistical Consulting Service, which was a fabulous experience. <br />
<br />
I have experience working with R, MATLAB, and SPSS. I am especially fond of R.<br />
<br />
== '''Something Useful''' ==<br />
The results of plot for simulate.lme always felt lackluster to me, so I edited the existing function to create a version that provides an approximate simulated p-value as text output, a graph that scales automatically when limits are not supplied as arguments, and a few other improvements.<br />
I prepared a script with my new function and demo code at the bottom [[/SomethingUseful|here]].<br />
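For context, a typical call sequence producing the object that function plots might look like the following sketch (the models are the standard Orthodont example from the nlme documentation; plotSimulateLME is my function from the linked page, so it is not called here):<br />

```r
library(nlme)  # ships with R; provides lme() and the simulate() method for lme fits

# Null model: random intercepts only; alternative: random intercepts and slopes.
fm1 <- lme(distance ~ age, data = Orthodont, random = ~ 1 | Subject)
fm2 <- update(fm1, random = ~ age | Subject)

# Refit both models to data simulated under the null, collecting log-likelihoods.
sim <- simulate(fm1, m2 = fm2, nsim = 20, seed = 42)
```

The resulting object can then be passed to my function, e.g. plotSimulateLME(sim, pobs = 0.05).<br />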
<br />
== '''Discussion Questions''' ==<br />
=== Chapter 2 ===<br />
* I once attended an HLM workshop at a large Education conference. A fellow attendee was skeptical of the entire enterprise. In the example provided, the within-group, between-group, and pooled effects of socio-economic status were all positively and statistically significantly related to the outcome measure. His opinion was that, since all three effects were in the same direction and significant, there was no reason to bother with the extra complexity, which would simply confuse the pants off his trustees anyway. How would you respond?<br />
* '''Post:''' Consider the macro-micro-micro-macro causal chain (pg. 12, Figure 2.7). What would happen if one were to model this simply as a macro-macro (W -> Z) model, omitting the intermediate variables? What if the true relationship was micro-macro-macro-micro? The potential for errors in causal assumptions has always bothered me, especially in SEM-type analyses, and it applies here as well. In Psychology we attribute some meaning to a variable, but it could easily be read another way. Here's a convoluted example: the researcher assumes Teacher Disciplinarianism -> Student Behaviour -> Student Success -> Teacher Stress, but perhaps the pathways differ, and in fact Student Behaviour -> Teacher Disciplinarianism -> Teacher Stress -> Student Success.<br />
<br />
=== Chapter 3 ===<br />
* In the course we use the language of "contextual", "compositional", "between", "within", and "pooled" effects. Identify each in the example on pages 28-29 and in Figure 3.4.<br />
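To make the exercise concrete, here is a base-R sketch with made-up simulated data (not the book's example) in which group-mean centering separates the within effect from the between effect; the contextual effect is their difference:<br />

```r
# Two-level data: true within effect = 0.3, contextual effect = 0.4,
# so the between (group-mean) effect is roughly 0.3 + 0.4 = 0.7.
set.seed(7)
G <- 50; n <- 10
g <- rep(1:G, each = n)
xbar <- rnorm(G)                          # true group means of x
x <- xbar[g] + rnorm(G * n)
y <- 1 + 0.3 * x + 0.4 * xbar[g] + rnorm(G * n)
xm <- ave(x, g)                           # observed group means of x
fit <- lm(y ~ I(x - xm) + xm)             # within effect and between effect
coef(fit)                                 # slopes near 0.3 (within) and 0.7 (between)
coef(lm(y ~ x))["x"]                      # pooled effect lies between the two
```

The contextual effect is the between coefficient minus the within coefficient, here about 0.4 by construction.<br />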
<br />
=== Chapter 4 ===<br />
* Describe the relative advantages and disadvantages of REML and ML estimation. When should you choose one over the other?<br />
* Write an R script to simulate appropriate data and fit Models 3 and 4 from page 70.<br />
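As a starting point for such a script, here is a generic two-level sketch with invented parameters (not Models 3 and 4 from the book), fitting the same model by ML and by REML with nlme:<br />

```r
library(nlme)  # ships with R
set.seed(42)
G <- 30; n <- 20                          # 30 groups of 20 observations each
g <- rep(1:G, each = n)
u <- rnorm(G, sd = 1)                     # group-level random intercepts
x <- rnorm(G * n)
y <- 2 + 0.5 * x + u[g] + rnorm(G * n)    # true fixed slope = 0.5
d <- data.frame(y = y, x = x, g = factor(g))

fit_ml   <- lme(y ~ x, random = ~ 1 | g, data = d, method = "ML")    # ML: needed for likelihood-ratio tests of fixed effects
fit_reml <- lme(y ~ x, random = ~ 1 | g, data = d, method = "REML")  # REML: less biased variance components
fixef(fit_ml)["x"]                        # both estimates land near the true 0.5
```

ML and REML give nearly identical fixed-effect estimates here; the difference shows up in the estimated variance components, which is the trade-off the first question asks about.<br />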
<br />
<br />
=== SPIDA Models ===<br />
*[[/Model1|Model 1 - fit]]<br />
*[[/Model2|Model 2 with Contextual Variable - fitc]]<br />
*[[/Model3|Model 3 Centered Within Group and Contextual Variable - fitcd]]<br />
*[[/Model4|Model 4 Centered Within Group RE - fitca]]<br />
*[[/Model5|Model 5 Minority and ses - fit]]<br />
*[[/A2]]</div>Smithce
MATH 6643 Summer 2012 Applications of Mixed Models/Students/smithce/SomethingUseful, revision of 2012-08-20 by Smithce (page created)
<hr />
<div> plotSimulateLME <-<br />
function(x, form = y ~ x | df * method, df = attr(x, "df"), weights,<br />
xlab = "Empirical p-value",<br />
ylab = "Nominal p-value", <br />
pobs = NULL,<br />
alpha = 0.05, dalpha = 0.005, digits = 4,<br />
xlim = NULL,<br />
ylim = NULL, ...)<br />
{<br />
ML <- !is.null(x$null$ML)<br />
if(ML) {<br />
if (is.null(x$alt$ML))<br />
stop("Plot method only implemented for comparing models")<br />
okML <- x$null$ML[, "info"] < 8 & x$alt$ML[, "info"] < 8<br />
}<br />
REML <- !is.null(x$null$REML)<br />
if(REML) {<br />
if (is.null(x$alt$REML))<br />
stop("Plot method only implemented for comparing models")<br />
okREML <- x$null$REML[, "info"] < 8 & x$alt$REML[, "info"] < 8<br />
}<br />
<br />
if (is.null(df)) {<br />
stop("No degrees of freedom specified")<br />
}<br />
if ((ldf <- length(df)) > 1) {<br />
df <- sort(unique(df))<br />
if (missing(weights)) {<br />
weights <- rep.int(1/ldf, ldf)<br />
} else {<br />
if (!identical(weights,FALSE) && length(weights) != ldf)<br />
stop("Degrees of freedom and weights must have the same length")<br />
}<br />
} else {<br />
weights <- FALSE<br />
}<br />
useWgts <- (length(weights) != 1)<br />
<br />
if (any(df < 0)) {<br />
stop("Negative degrees of freedom not allowed")<br />
} else {<br />
if ((ldf == 1) && (df == 0)) {<br />
stop("More than one degree of freedom is needed when one of them is zero.")<br />
}<br />
}<br />
if (ML) {<br />
MLstat <-<br />
rev(sort(2 * pmax(0, x$alt$ML[okML, "logLik"] - x$null$ML[okML,"logLik"])))<br />
MLy <- lapply(df,<br />
function(df, x) {<br />
if (df > 0) 1 - pchisq(x, df) else 1*(x == 0)<br />
}, x = MLstat)<br />
dfC <- paste("df",df,sep="=")<br />
if (useWgts) { # has weights<br />
if (ldf == 2) { # will interpolate<br />
MLy <-<br />
c(MLy[[1]], weights[1] * MLy[[1]] + weights[2] * MLy[[2]], MLy[[2]])<br />
MLdf <- rep(c(dfC[1], paste("Mix(",df[1],",",df[2],")",sep=""),<br />
dfC[2]), rep(length(MLstat), ldf + 1))<br />
} else {<br />
aux <- weights[1] * MLy[[1]]<br />
auxNam <- paste("Mix(",df[1],sep="")<br />
for(i in 2:ldf) {<br />
aux <- aux + weights[i] * MLy[[i]]<br />
auxNam <- paste(auxNam, ",", df[i],sep="")<br />
}<br />
auxNam <- paste(auxNam, ")",sep="")<br />
MLy <- c(unlist(MLy), aux)<br />
MLdf <- rep(c(dfC, auxNam), rep(length(MLstat), ldf + 1))<br />
}<br />
MLx <- rep((1:length(MLstat) - 0.5)/length(MLstat), ldf + 1)<br />
} else {<br />
MLy <- unlist(MLy)<br />
MLdf <- rep(dfC, rep(length(MLstat), ldf))<br />
MLx <- rep((1:length(MLstat) - 0.5)/length(MLstat), ldf)<br />
}<br />
auxInd <- MLdf != "df=0"<br />
meth <- rep("ML", length(MLy))<br />
Mdf <- MLdf<br />
} else {<br />
MLy <- MLdf <- MLx <- auxInd <- meth <- Mdf <- NULL<br />
}<br />
if (REML) {<br />
REMLstat <- rev(sort(2 * pmax(0, x$alt$REML[okREML, "logLik"] -<br />
x$null$REML[okREML, "logLik"])))<br />
REMLy <- lapply(df,<br />
function(df, x) {<br />
if (df > 0) {<br />
1 - pchisq(x, df)<br />
} else {<br />
val <- rep(0, length(x))<br />
val[x == 0] <- 1<br />
val<br />
}<br />
}, x = REMLstat)<br />
dfC <- paste("df",df,sep="=")<br />
if (useWgts) { # has weights<br />
if (ldf == 2) { # will interpolate<br />
REMLy <-<br />
c(REMLy[[1]], weights[1] * REMLy[[1]] + weights[2] * REMLy[[2]], REMLy[[2]])<br />
REMLdf <- rep(c(dfC[1], paste("Mix(",df[1],",",df[2],")",sep=""),<br />
dfC[2]), rep(length(REMLstat), ldf + 1))<br />
} else {<br />
aux <- weights[1] * REMLy[[1]]<br />
auxNam <- paste("Mix(",df[1],sep="")<br />
for(i in 2:ldf) {<br />
aux <- aux + weights[i] * REMLy[[i]]<br />
auxNam <- paste(auxNam, ",", df[i],sep="")<br />
}<br />
auxNam <- paste(auxNam, ")",sep="")<br />
REMLy <- c(unlist(REMLy), aux)<br />
REMLdf <- rep(c(dfC, auxNam), rep(length(REMLstat), ldf + 1))<br />
}<br />
REMLx <- rep((1:length(REMLstat) - 0.5)/length(REMLstat), ldf + 1)<br />
} else {<br />
REMLy <- unlist(REMLy)<br />
REMLdf <- rep(dfC, rep(length(REMLstat), ldf))<br />
REMLx <- rep((1:length(REMLstat) - 0.5)/length(REMLstat), ldf)<br />
}<br />
auxInd <- c(auxInd, REMLdf != "df=0")<br />
meth <- c(meth, rep("REML", length(REMLy)))<br />
Mdf <- c(Mdf, REMLdf)<br />
} else {<br />
REMLy <- REMLdf <- REMLx <- NULL<br />
}<br />
<br />
meth <- meth[auxInd]<br />
Mdf <- Mdf[auxInd]<br />
Mdf <- ordered(Mdf, levels = unique(Mdf))<br />
frm <- data.frame(x = c(MLx, REMLx)[auxInd], y = c(MLy, REMLy)[auxInd],<br />
df = Mdf, method = meth) <br />
<br />
MLresult <- frm[frm$method=="ML",]<br />
REMLresult <- frm[frm$method=="REML",]<br />
<br />
if( nrow(MLresult)>0 ){<br />
message("Maximum Likelihood \n")<br />
<br />
pML.Emp.to.Nom <- MLresult[MLresult$x>(alpha-dalpha)&MLresult$x<(alpha+dalpha),1:2]<br />
colnames(pML.Emp.to.Nom) <- c("Empirical p-value", "Nominal p-value") <br />
if (nrow(pML.Emp.to.Nom) == 0) {<br />
 message("p-value not obtained at desired resolution. Increase 'dalpha' or re-run the simulation with more iterations.")<br />
} else print(pML.Emp.to.Nom[,2:1], row.names=F, digits=digits)<br />
if(is.numeric(pobs)){<br />
message(paste("Nominal p-value", pobs, <br />
"approximately equal to empirical p-value", MLresult$x[which.min(abs(MLresult$y - pobs))]))<br />
}<br />
message("")<br />
<br />
pML.Nom.to.Emp <- MLresult[MLresult$y>(alpha-dalpha)&MLresult$y<(alpha+dalpha),1:2]<br />
colnames(pML.Nom.to.Emp) <- c("Empirical p-value", "Nominal p-value") <br />
}<br />
<br />
if( nrow(REMLresult)>0 ){<br />
message("REML \n")<br />
<br />
pREML.Emp.to.Nom <- REMLresult[REMLresult$x>(alpha-dalpha)&REMLresult$x<(alpha+dalpha),1:2]<br />
colnames(pREML.Emp.to.Nom) <- c("Empirical p-value", "Nominal p-value") <br />
if (nrow(pREML.Emp.to.Nom) == 0) {<br />
 message("p-value not obtained at desired resolution. Increase 'dalpha' or re-run the simulation with more iterations.")<br />
} else print(pREML.Emp.to.Nom[,2:1], row.names=F, digits=digits)<br />
<br />
if(is.numeric(pobs)){<br />
message(paste("Nominal p-value", pobs, <br />
"approximately equal to empirical p-value", REMLresult$x[which.min(abs(REMLresult$y - pobs))]))<br />
}<br />
}<br />
<br />
if(nrow(MLresult)>0 & nrow(REMLresult)>0) {<br />
if(is.null(xlim)) {<br />
xlim <- c(0,<br />
max(c(MLresult$x[which.min(abs(MLresult$x - alpha))],<br />
MLresult$x[which.min(abs(MLresult$y - alpha))],<br />
MLresult$x[which.min(abs(MLresult$y - pobs))],<br />
REMLresult$x[which.min(abs(REMLresult$x - alpha))],<br />
REMLresult$x[which.min(abs(REMLresult$y - alpha))],<br />
REMLresult$x[which.min(abs(REMLresult$y - pobs))])) + dalpha) }<br />
if(is.null(ylim)) {<br />
ylim <- c(0,<br />
max(c(MLresult$y[which.min(abs(MLresult$x - alpha))],<br />
MLresult$y[which.min(abs(MLresult$y - alpha))],<br />
MLresult$y[which.min(abs(MLresult$y - pobs))],<br />
REMLresult$y[which.min(abs(REMLresult$x - alpha))],<br />
REMLresult$y[which.min(abs(REMLresult$y - alpha))],<br />
REMLresult$y[which.min(abs(REMLresult$y - pobs))])) + dalpha)}<br />
op <- par(mfrow = c(1,2))<br />
plot(MLresult$x, MLresult$y, type='l',<br />
main = "Maximum Likelihood",<br />
ylab = "Nominal p-value",<br />
xlab = "Empirical p-value",<br />
xlim = xlim,<br />
ylim = ylim)<br />
# Empirical to Nominal cut-off alpha<br />
lines(x=c(xlim[1],MLresult$x[which.min(abs(MLresult$x - alpha))],MLresult$x[which.min(abs(MLresult$x - alpha))]),y=c(MLresult$y[which.min(abs(MLresult$x - alpha))],MLresult$y[which.min(abs(MLresult$x - alpha))],0),lty="dashed",col="red")<br />
# Nominal to Empirical cut-off alpha (indicative of degree to which conservative)<br />
lines(x=c(xlim[1],MLresult$x[which.min(abs(MLresult$y - alpha))],MLresult$x[which.min(abs(MLresult$y - alpha))]),y=c(MLresult$y[which.min(abs(MLresult$y - alpha))],MLresult$y[which.min (abs(MLresult$y - alpha))],0),lty="dashed",col="pink")<br />
# Observed nominal to empirical (closest approximation)<br />
if(!is.null(pobs)) { <br />
lines(x=c(xlim[1],MLresult$x[which.min(abs(MLresult$y - pobs))],MLresult$x[which.min(abs(MLresult$y - pobs))]),y=c(MLresult$y[which.min(abs(MLresult$y - pobs))],MLresult$y[which.min(abs (MLresult$y - pobs))],0),lty="dashed",col="blue")<br />
}<br />
<br />
plot(REMLresult$x, REMLresult$y, type='l',<br />
main = "REML",<br />
ylab = "Nominal p-value",<br />
xlab = "Empirical p-value",<br />
xlim = xlim,<br />
ylim = ylim)<br />
# Empirical to Nominal cut-off alpha<br />
lines(x=c(xlim[1],REMLresult$x[which.min(abs(REMLresult$x - alpha))],REMLresult$x[which.min(abs(REMLresult$x - alpha))]),y=c(REMLresult$y[which.min(abs(REMLresult$x - alpha))],REMLresult$y[which.min(abs(REMLresult$x - alpha))],0),lty="dashed",col="red")<br />
# Nominal to Empirical cut-off alpha (indicative of degree to which conservative)<br />
lines(x=c(xlim[1],REMLresult$x[which.min(abs(REMLresult$y - alpha))],REMLresult$x[which.min(abs(REMLresult$y - alpha))]),y=c(REMLresult$y[which.min(abs(REMLresult$y - alpha))],REMLresult$y[which.min(abs(REMLresult$y - alpha))],0),lty="dashed",col="pink")<br />
# Observed nominal to empirical (closest approximation)<br />
if(!is.null(pobs)) { <br />
lines(x=c(xlim[1],REMLresult$x[which.min(abs(REMLresult$y - pobs))],REMLresult$x[which.min(abs(REMLresult$y - pobs))]),<br />
y=c(REMLresult$y[which.min(abs(REMLresult$y - pobs))],REMLresult$y[which.min(abs(REMLresult$y - pobs))],0),<br />
lty="dashed",col="blue")}<br />
par(op)<br />
}else if(nrow(MLresult)>0) {<br />
if(is.null(xlim)) {<br />
xlim <- c(0,<br />
max(c(MLresult$x[which.min(abs(MLresult$x - alpha))],<br />
MLresult$x[which.min(abs(MLresult$y - alpha))],<br />
MLresult$x[which.min(abs(MLresult$y - pobs))])) + dalpha) }<br />
if(is.null(ylim)) {<br />
ylim <- c(0,<br />
max(c(MLresult$y[which.min(abs(MLresult$x - alpha))],<br />
MLresult$y[which.min(abs(MLresult$y - alpha))],<br />
MLresult$y[which.min(abs(MLresult$y - pobs))])) + dalpha) }<br />
plot(MLresult$x, MLresult$y, type='l',<br />
main = "Maximum Likelihood",<br />
ylab = "Nominal p-value",<br />
xlab = "Empirical p-value",<br />
xlim = xlim,<br />
ylim = ylim)<br />
# Empirical to Nominal cut-off alpha<br />
lines(x=c(xlim[1],MLresult$x[which.min(abs(MLresult$x - alpha))],MLresult$x[which.min(abs(MLresult$x - alpha))]),<br />
      y=c(MLresult$y[which.min(abs(MLresult$x - alpha))],MLresult$y[which.min(abs(MLresult$x - alpha))],0),<br />
      lty="dashed",col="red")<br />
# Nominal to Empirical cut-off alpha (indicative of degree to which conservative)<br />
lines(x=c(xlim[1],MLresult$x[which.min(abs(MLresult$y - alpha))],MLresult$x[which.min(abs(MLresult$y - alpha))]),<br />
      y=c(MLresult$y[which.min(abs(MLresult$y - alpha))],MLresult$y[which.min(abs(MLresult$y - alpha))],0),<br />
      lty="dashed",col="pink")<br />
# Observed nominal to empirical (closest approximation)<br />
if(!is.null(pobs)) {<br />
  lines(x=c(xlim[1],MLresult$x[which.min(abs(MLresult$y - pobs))],MLresult$x[which.min(abs(MLresult$y - pobs))]),<br />
        y=c(MLresult$y[which.min(abs(MLresult$y - pobs))],MLresult$y[which.min(abs(MLresult$y - pobs))],0),<br />
        lty="dashed",col="blue")<br />
}<br />
}else if(nrow(REMLresult)>0){<br />
if(is.null(xlim)) {<br />
xlim <- c(0,<br />
max(c(REMLresult$x[which.min(abs(REMLresult$x - alpha))],<br />
REMLresult$x[which.min(abs(REMLresult$y - alpha))],<br />
REMLresult$x[which.min(abs(REMLresult$y - pobs))])) + dalpha) }<br />
if(is.null(ylim)) {<br />
ylim <- c(0,<br />
max(c(REMLresult$y[which.min(abs(REMLresult$x - alpha))],<br />
REMLresult$y[which.min(abs(REMLresult$y - alpha))],<br />
REMLresult$y[which.min(abs(REMLresult$y - pobs))])) + dalpha)}<br />
plot(REMLresult$x, REMLresult$y, type='l',<br />
main = "REML",<br />
ylab = "Nominal p-value",<br />
xlab = "Empirical p-value",<br />
xlim = xlim,<br />
ylim = ylim)<br />
# Empirical to Nominal cut-off alpha<br />
lines(x=c(xlim[1],REMLresult$x[which.min(abs(REMLresult$x - alpha))],REMLresult$x[which.min(abs(REMLresult$x - alpha))]),<br />
      y=c(REMLresult$y[which.min(abs(REMLresult$x - alpha))],REMLresult$y[which.min(abs(REMLresult$x - alpha))],0),<br />
      lty="dashed",col="red")<br />
# Nominal to Empirical cut-off alpha (indicative of degree to which conservative)<br />
lines(x=c(xlim[1],REMLresult$x[which.min(abs(REMLresult$y - alpha))],REMLresult$x[which.min(abs(REMLresult$y - alpha))]),<br />
      y=c(REMLresult$y[which.min(abs(REMLresult$y - alpha))],REMLresult$y[which.min(abs(REMLresult$y - alpha))],0),<br />
      lty="dashed",col="pink")<br />
# Observed nominal to empirical (closest approximation)<br />
if(!is.null(pobs)) {<br />
  lines(x=c(xlim[1],REMLresult$x[which.min(abs(REMLresult$y - pobs))],REMLresult$x[which.min(abs(REMLresult$y - pobs))]),<br />
        y=c(REMLresult$y[which.min(abs(REMLresult$y - pobs))],REMLresult$y[which.min(abs(REMLresult$y - pobs))],0),<br />
        lty="dashed",col="blue")<br />
}<br />
}<br />
}<br />
<br />
## Example ##<br />
library(spidadev)<br />
library(nlme)<br />
dd <- hs1<br />
str(dd)<br />
<br />
fit1 <- lme( mathach ~ ses, data=dd,<br />
random = ~ 1|school, <br />
na.action = na.exclude )<br />
fit2 <- lme( mathach ~ ses, data=dd,<br />
random = ~ 1 + ses|school, # Model 2 has more complex random part<br />
na.action = na.exclude )<br />
<br />
anova( fit1, fit2 ) # Test significance of improvement of random slope<br />
# However this test is conservative<br />
# Simulate to adjust p-values<br />
<br />
system.time( sim.out <- simulate( fit1, m2 = fit2, nsim = 1000) )<br />
plotSimulateLME( sim.out , dalpha = .001 )<br />
plotSimulateLME( sim.out , dalpha = .005, pobs = 0.1021 )<br />
<br />
system.time( sim.out <- simulate( fit1, m2 = fit2, nsim = 1000, method = "ML") )<br />
plotSimulateLME( sim.out , dalpha = .0025, pobs = 0.1021 )<br />
<br />
system.time( sim.out <- simulate( fit1, m2 = fit2, nsim = 1000, method = "REML") )<br />
plotSimulateLME( sim.out , dalpha = .0025, pobs = 0.1021 )</div>Smithcehttp://scs.math.yorku.ca/index.php/MATH_6643_Summer_2012_Applications_of_Mixed_Models/Students/smithceMATH 6643 Summer 2012 Applications of Mixed Models/Students/smithce2012-08-20T04:32:56Z<p>Smithce: </p>
<hr />
<div>==About Me==<br />
I am a PhD student in Psychology in the Quantitative Methods Area. I completed my Master's degree in Psychology at York, studying depth perception, and my undergraduate degree at the University of Toronto in Engineering Science, Aerospace option. As you can see, I have quite a varied (and some might say strange) educational background! This year I had the privilege of working as a consultant with the Statistical Consulting Service, which was a fabulous experience. <br />
<br />
I have experience working with R, MATLAB and SPSS. I am especially fond of R.<br />
<br />
== '''Something Useful''' ==<br />
The results of plot for simulate.lme always felt lackluster to me, so I created a new and improved version.<br />
I prepared a script with my new function, with demo code at the bottom, [[/SomethingUseful|here]].<br />
<br />
== '''Discussion Questions''' ==<br />
=== Chapter 2 ===<br />
* I once attended an HLM workshop at a large Education conference. A fellow attendee was skeptical of the entire enterprise. In the example provided, the within-group, between-group and pooled effects for socio-economic status were all positively and statistically significantly related to the outcome measure. His opinion was that, since all three effects were in the same direction and significant, why bother with the extra complexity, which would simply confuse the pants off his trustees anyway? How would you respond?<br />
* '''Post:''' Consider the macro-micro-micro-macro causal chain (pg. 12, Figure 2.7). What would happen if one were to model this simply as a macro-macro (W -> Z) model, omitting the intermediate variables? What if the true relationship were micro-macro-macro-micro? The potential for errors in causal assumptions has always bothered me, especially in SEM-type analyses, but it still applies here. In Psychology we attribute some meaning to a variable, but it could easily be read another way. Here's a convoluted example: the researcher assumes Teacher Disciplinarianism -> Student Behaviour -> Student Success -> Teacher Stress. But perhaps the pathways are different, and in fact: Student Behaviour -> Teacher Disciplinarianism -> Teacher Stress -> Student Success.<br />
<br />
=== Chapter 3 ===<br />
* In the course we use the language of "contextual", "compositional", "between", "within" and "pooled" effects. Identify each in the example on pages 28-29 and on the graph in Figure 3.4.<br />
<br />
=== Chapter 4 ===<br />
* Describe the relative advantages and disadvantages of REML and ML estimation. When should you choose one over the other?<br />
* Write an R script to simulate appropriate data and fit Models 3 and 4 from page 70.<br />
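*:As a starting point for the second bullet, here is a minimal hedged sketch of this kind of simulation using nlme. The group count, cluster size and variance components are hypothetical choices of mine, not the actual Models 3 and 4 from page 70:<br />

```r
library(nlme)

set.seed(6643)
J <- 40                              # number of groups (hypothetical)
n <- 50                              # observations per group (hypothetical)
g <- rep(1:J, each = n)              # group indicator
x <- rnorm(J * n)                    # individual-level predictor
u <- rnorm(J, sd = 2)[g]             # random intercepts, tau = 2 (hypothetical)
y <- 10 + 3 * x + u + rnorm(J * n, sd = 6)   # fixed intercept 10, slope 3
dd <- data.frame(y = y, x = x, g = factor(g))

fit.ri <- lme(y ~ x, data = dd, random = ~ 1 | g)      # random intercept
fit.rs <- lme(y ~ x, data = dd, random = ~ 1 + x | g)  # random slope
anova(fit.ri, fit.rs)
```

*:Since the data were generated without a random slope, the anova comparison would be expected to favour the simpler model here; replacing u with a group-varying slope lets you check power for the richer model.<br />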
<br />
<br />
=== SPIDA Models ===<br />
*[[/Model1|Model 1 - fit]]<br />
*[[/Model2|Model 2 with Contextual Variable - fitc]]<br />
*[[/Model3|Model 3 Centered Within Group and Contextual Variable - fitcd]]<br />
*[[/Model4|Model 4 Centered Within Group RE - fitca]]<br />
*[[/Model5|Model 5 Minority and ses - fit]]<br />
*[[/A2]]</div>Smithcehttp://scs.math.yorku.ca/index.php/MATH_6643_Summer_2012_Applications_of_Mixed_Models/Submitted_sample_exam_questionsMATH 6643 Summer 2012 Applications of Mixed Models/Submitted sample exam questions2012-07-24T16:22:22Z<p>Smithce: </p>
<hr />
<div>'''Question 1:'''<br />
<br />
Consider the following output:<br />
::<br />
<pre><br />
> head(hs)<br />
school mathach ses Sex Minority Size Sector PRACAD DISCLIM<br />
1 1317 12.862 0.882 Female No 455 Catholic 0.95 -1.694<br />
2 1317 8.961 0.932 Female Yes 455 Catholic 0.95 -1.694<br />
3 1317 4.756 -0.158 Female Yes 455 Catholic 0.95 -1.694<br />
4 1317 21.405 0.362 Female Yes 455 Catholic 0.95 -1.694<br />
5 1317 20.748 1.372 Female No 455 Catholic 0.95 -1.694<br />
6 1317 18.362 0.132 Female Yes 455 Catholic 0.95 -1.694<br />
> fit <- lme( mathach ~ ses * cvar(ses,school), hs, <br />
+ random = ~ 1 + ses|school)<br />
> summary(fit)<br />
Linear mixed-effects model fit by REML<br />
Data: hs <br />
AIC BIC logLik<br />
12846.85 12891.54 -6415.423<br />
<br />
Random effects:<br />
Formula: ~1 + ses | school<br />
Structure: General positive-definite, Log-Cholesky parametrization<br />
StdDev Corr <br />
(Intercept) 1.6293867 (Intr)<br />
ses 0.6614903 -0.469<br />
Residual 6.1109156 <br />
<br />
Fixed effects: mathach ~ ses * cvar(ses, school) <br />
Value Std.Error DF t-value p-value<br />
(Intercept) 12.681917 0.3054760 1935 41.51526 0.0000<br />
ses 2.243374 0.2416545 1935 9.28339 0.0000<br />
cvar(ses, school) 3.687892 0.7699000 38 4.79009 0.0000<br />
ses:cvar(ses, school) 0.873953 0.5771829 1935 1.51417 0.1301<br />
Correlation: <br />
(Intr) ses cv(,s)<br />
ses -0.188 <br />
cvar(ses, school) 0.022 -0.261 <br />
ses:cvar(ses, school) -0.258 0.065 0.014<br />
<br />
Standardized Within-Group Residuals:<br />
Min Q1 Med Q3 Max <br />
-3.2291287 -0.7433282 0.0306118 0.7770370 2.6906899 <br />
<br />
Number of Observations: 1977<br />
Number of Groups: 40 <br />
<br />
</pre><br />
<br />
:* Write out the model as a mathematical formula, together with the usual model assumptions.<br />
:* Find the variances of the group effects, the residuals and the response.<br />
:* Sketch the estimated response function for a school with mean ses of 0.<br />
:* For what value of ses is the variance of mathach estimated to be minimized?<br />
<br />
'''Question 2:'''<br />
<br />
Longitudinal data analysis with mixed models: Consider a mixed model with random intercept and slope with respect to time, T. Suppose that the G matrix is <br />
<br />
::<math><br />
\begin{bmatrix}<br />
\tau_{00} & \tau_{01} \\<br />
\tau_{10} & \tau_{11} <br />
\end{bmatrix}<br />
</math><br />
<br />
:* Find the value of T for which the variance of Y is minimized and the minimum variance. <br />
:* Show that recentering T on this value (if known) turns the G matrix into one with only two free parameters.<br />
:* Sketch a data plot to show the location and value of the minimum standard deviation of lines.<br />
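:A hedged sketch for the first bullet (my own derivation, assuming the usual random intercept-and-slope setup with <math>\tau_{01}=\tau_{10}</math>): with random-effects design row <math>(1, T)</math>, the variance of Y is<br />
::<math>\operatorname{Var}(Y) = \tau_{00} + 2\tau_{01}T + \tau_{11}T^{2} + \sigma^{2},</math><br />
:a quadratic in T minimized at <math>T^{*} = -\tau_{01}/\tau_{11}</math>, with minimum variance <math>\tau_{00} - \tau_{01}^{2}/\tau_{11} + \sigma^{2}</math>.<br />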
<br />
'''Question 3:'''<br />
<br />
Explain why one would want to add <math>\bar{X}_j</math> or <math>SD_j</math> to a multilevel model.<br />
<br />
'''Question 4:'''<br />
<br />
Discuss the interpretation/ramifications of extreme ICC values in the design of a multilevel study.<br />
<br />
'''Question 5:'''<br />
<br />
Explain the similarities and differences between within effects, between effects, contextual effects, compositional effects; and clusters.<br />
<br />
'''Question 6:'''<br />
<br />
Discuss/explain the relationship between longitudinal data and hierarchical data with appropriate examples and theory.<br />
<br />
'''Question 7:'''<br />
<br />
Let Σ be symmetric. Show that Σ is positive-definite if and only if there exists a non-singular matrix A such that Σ = AA'.<br />
<br />
'''Question 8:'''<br />
<br />
Why do we study longitudinal and mixed models?<br />
<br />
=== Ryan's Questions ===<br />
<br />
'''Early Chapters/Course Content:'''<br />
<br />
What is Simpson's Paradox and how is it related to HLM? - Please describe the necessary relationships and sketch them accordingly.<br />
<br />
Under what two conditions might someone be completely unconcerned with the possible problems associated with this paradox?<br />
<br />
'''Later Chapters/Course Content:'''<br />
<br />
In Chapter 11 Snijders & Bosker discuss the problem of optimal sample size in order to obtain accurate estimates of the ICC. Explain the reasoning behind the process of optimization for the hierarchical sample (assuming equal cluster sizes with a sample of size ''M'' with ''N'' clusters of size ''n'').<br />
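One standard identity likely relevant here (stated as a reminder, not as the book's own derivation): for clusters of size ''n'' with intraclass correlation <math>\rho</math>, the design effect for a mean is<br />
:<math>\text{deff} = 1 + (n - 1)\rho ,</math><br />
so the effective sample size is roughly <math>M/\text{deff}</math>, and the optimization trades off cluster size ''n'' against the number of clusters ''N'' for a fixed total cost.<br />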
<br />
=== Carrie's Questions ===<br />
'''Early Chapters/Course Content:'''<br />
<br />
We learned about a simulation conducted in which the estimate of the slope from a mixed model (without contextual effect) was obtained varying the within-cluster variance (see figure below). What are the implications of this simulation study?<br />
<br />
[[File:MixedModelSimResult.png|350px]]<br />
<br />
'''Later Chapters/Course Content:'''<br />
<br />
Data were obtained on the effects of two treatments (A and B). Symptoms were recorded weekly throughout 10 weeks of active treatment and then for 8 more weeks following termination of the treatment. A linear spline model was fit to the data and the following results were obtained:<br />
::<br />
<pre><br />
<br />
sp <- function( x ) {<br />
gsp( x,<br />
knots = c( 10 ), # 1 knot => 2 intervals<br />
degree = c( 1 , 1 ) , # linear in each interval<br />
smooth = c( 0 ) # continuous at the knot<br />
)<br />
}<br />
<br />
> fit <- lme( Symptom ~ sp(Weeks)*tx, random=~1+Weeks|id, data = d)<br />
> summary(fit)<br />
Linear mixed-effects model fit by REML<br />
Data: d <br />
AIC BIC logLik<br />
15099.44 15153.18 -7539.719<br />
<br />
Random effects:<br />
Formula: ~1 + Weeks | id<br />
Structure: General positive-definite, Log-Cholesky parametrization<br />
StdDev Corr <br />
(Intercept) 37.921194 (Intr)<br />
Weeks 2.395329 -0.218<br />
Residual 23.285420 <br />
<br />
Fixed effects: Symptom ~ sp(Weeks) * tx <br />
Value Std.Error DF t-value p-value<br />
(Intercept) 131.84484 6.450470 1516 20.43957 0.0000<br />
sp(Weeks)D1(0) -11.21889 0.508022 1516 -22.08346 0.0000<br />
sp(Weeks)C(10).1 18.24584 0.571104 1516 31.94835 0.0000<br />
txB -2.61621 9.122343 78 -0.28679 0.7750<br />
sp(Weeks)D1(0):txB 1.55017 0.718452 1516 2.15765 0.0311<br />
sp(Weeks)C(10).1:txB -4.83791 0.807664 1516 -5.99001 0.0000<br />
Correlation: <br />
(Intr) sp(W)D1(0) sp(W)C(10).1 txB s(W)D1(0):<br />
sp(Weeks)D1(0) -0.371 <br />
sp(Weeks)C(10).1 0.256 -0.604 <br />
txB -0.707 0.262 -0.181 <br />
sp(Weeks)D1(0):txB 0.262 -0.707 0.427 -0.371 <br />
sp(Weeks)C(10).1:txB -0.181 0.427 -0.707 0.256 -0.604 <br />
<br />
Standardized Within-Group Residuals:<br />
Min Q1 Med Q3 Max <br />
-4.109811205 -0.593593851 -0.002546668 0.654503092 3.122385555 <br />
<br />
Number of Observations: 1600<br />
Number of Groups: 80 <br />
</pre><br />
<br />
:* Interpret the coefficients in the model.<br />
:* Sketch the predicted trajectories.<br />
:* Bonus: How would you test whether there is a significant difference in the predicted symptom score at week 10 or week 18?<br />
<br />
== Georges' Questions ==<br />
<br />
=== Interpreting p-values ===<br />
Consider a regression of a continuous variable Y on a continuous variable X and a dichotomous factor coded with an indicator variable G:<br />
<br />
<math> \hat{Y} = \hat{\beta}_0 +\hat{\beta}_X X +\hat{\beta}_G G </math><br />
<br />
In the multiple regression of Y on X and G, neither <math>\hat{\beta}_X</math> nor <math>\hat{\beta}_G</math> is significant. However, the F-test of the hypothesis that both parameters are 0 yields a p-value of 0.002. <br />
<br />
Sketch a plausible data set that could exhibit this phenomenon in data space with axes for Y and X and group membership indicated by different characters. Also sketch relevant confidence ellipses in <math>\beta_X , \beta_G</math> space.<br />
<br />
=== Multilevel models ===<br />
Consider a multilevel random slopes model of the form:<br />
:<math>Y_i = X_i\gamma +Z_i u_i+ \epsilon_i\!</math><br />
with <math>u_i \sim N(0,G)\!</math>, <math>\epsilon_i \sim N(0,\sigma^2 I)\!</math> independent of <math>{{u}_{i}}\!</math>, <math>{{Z}_{i}}={{X}_{i}}\!</math>; independent for <math>i=1,\ldots ,M\!</math>.<br />
<br />
Let <br />
<math>{{\beta }_{i}}=\gamma +{{u}_{i}}\!</math>. <br />
<br />
In the following questions, treat the macro-level parameters <br />
<math>\gamma ,\ {{\sigma }^{2}}\!</math> and <math>G\!</math> as if they were fixed and known.<br />
# Give an expression for the BLUE of <math>\beta_i\!</math> .<br />
# Give an expression for the BLUP of <math>\beta_i\!</math> as a function of the BLUE and other non-random quantities.<br />
# Describe how the BLUP is a shrinkage estimator based on the BLUE.<br />
# For large <math>n_i\!</math> and relatively fixed variances for the values of X within each cluster, how will the BLUP behave relative to the BLUE?<br />
# What are the implications for the BLUP if <math>G\!</math> is highly concentrated?<br />
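A hedged sketch relevant to items 1-3 (the standard shrinkage form; notation mine): the BLUE of <math>\beta_i\!</math> is the within-cluster OLS estimate, and the BLUP is a matrix-weighted average that shrinks the BLUE toward <math>\gamma\!</math>:<br />
:<math>\hat{\beta}_{i}^{BLUE} = (X_i'X_i)^{-1}X_i'Y_i , \qquad \hat{\beta}_{i}^{BLUP} = \Lambda_i \hat{\beta}_{i}^{BLUE} + (I-\Lambda_i)\gamma , \qquad \Lambda_i = G\left(G + \sigma^{2}(X_i'X_i)^{-1}\right)^{-1}</math><br />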
=== Interpreting a factorization of G ===<br />
Consider a random coefficient model with a random intercept and random slope. Let the G matrix have the decomposition <math>G = AA'\!</math> where <br />
: <math> A = <br />
\left[ \begin{matrix}<br />
{{a}_{0}} & {{a}_{01}} \\<br />
0 & {{a}_{1}} \\<br />
\end{matrix} \right]</math><br />
<br />
Show that <math> a_0 </math> is the minimal standard deviation of random regression lines above and below the population regression line.<br />
=== Interpreting output ===<br />
The following questions refer to the output below for a mixed model for the full high school math achievement data set. The model uses SES; SES.School, which is the mean SES in the sample in each school; ‘female’, which is an individual-level indicator variable; ‘Type’, which is a three-level factor with levels “Coed”, “Girl” and “Boy” with the obvious definitions; and Sector, which is a two-level factor with levels “Catholic” and “Public”.<br />
# Consider two Catholic ‘girl’ schools, one with mean SES = 0 and the other with mean SES = 1. Suppose the values of SES in the former school range from -1 to 1 and in the latter school from 0 to 2. Draw a graph showing the predicted MathAch in these two schools over the range of values of SES in each school. On your graph identify the value and location of the contextual effect of SES, the within school effect of SES and the compositional effect of SES.<br />
# Suppose you were to refit the model without SES.School. What would you expect to happen to the coefficient for SES? Would it stay roughly the same, get bigger or get smaller, or is the change unpredictable? Explain.<br />
# Suppose you want to perform an overall test of the importance of gender in the model, either within schools or between schools. Specify a hypothesis matrix that would perform this test.<br />
# Estimate the difference between the predicted math achievement of a boy in a boys’ school versus a girl in a girls’ school. If you suspected that this is affected by the SES of the school, how would you modify the model to test this hypothesis? <br />
<br />
[[Media:MATH_6643_Sample_exam_output_2.pdf]]<br />
<br />
=== Testing identifiability of the G and R models ===<br />
Consider a mixed model for data with a response variable Y and a single predictor X. Suppose that each cluster has size 2 and X is measured at the values 0 and 1. Is it possible to identify the parameters of a random slope model? What if each cluster had size 3 and X were observed at levels -1, 0 and +1?</div>Smithcehttp://scs.math.yorku.ca/index.php/MATH_6643_Summer_2012_Applications_of_Mixed_Models/Submitted_sample_exam_questionsMATH 6643 Summer 2012 Applications of Mixed Models/Submitted sample exam questions2012-07-24T14:44:27Z<p>Smithce: </p>
<hr />
<div>'''Question 1:'''<br />
<br />
Consider the following output:<br />
::<br />
<pre><br />
> head(hs)<br />
school mathach ses Sex Minority Size Sector PRACAD DISCLIM<br />
1 1317 12.862 0.882 Female No 455 Catholic 0.95 -1.694<br />
2 1317 8.961 0.932 Female Yes 455 Catholic 0.95 -1.694<br />
3 1317 4.756 -0.158 Female Yes 455 Catholic 0.95 -1.694<br />
4 1317 21.405 0.362 Female Yes 455 Catholic 0.95 -1.694<br />
5 1317 20.748 1.372 Female No 455 Catholic 0.95 -1.694<br />
6 1317 18.362 0.132 Female Yes 455 Catholic 0.95 -1.694<br />
> fit <- lme( mathach ~ ses * cvar(ses,school), hs, <br />
+ random = ~ 1 + ses|school)<br />
> summary(fit)<br />
Linear mixed-effects model fit by REML<br />
Data: hs <br />
AIC BIC logLik<br />
12846.85 12891.54 -6415.423<br />
<br />
Random effects:<br />
Formula: ~1 + ses | school<br />
Structure: General positive-definite, Log-Cholesky parametrization<br />
StdDev Corr <br />
(Intercept) 1.6293867 (Intr)<br />
ses 0.6614903 -0.469<br />
Residual 6.1109156 <br />
<br />
Fixed effects: mathach ~ ses * cvar(ses, school) <br />
Value Std.Error DF t-value p-value<br />
(Intercept) 12.681917 0.3054760 1935 41.51526 0.0000<br />
ses 2.243374 0.2416545 1935 9.28339 0.0000<br />
cvar(ses, school) 3.687892 0.7699000 38 4.79009 0.0000<br />
ses:cvar(ses, school) 0.873953 0.5771829 1935 1.51417 0.1301<br />
Correlation: <br />
(Intr) ses cv(,s)<br />
ses -0.188 <br />
cvar(ses, school) 0.022 -0.261 <br />
ses:cvar(ses, school) -0.258 0.065 0.014<br />
<br />
Standardized Within-Group Residuals:<br />
Min Q1 Med Q3 Max <br />
-3.2291287 -0.7433282 0.0306118 0.7770370 2.6906899 <br />
<br />
Number of Observations: 1977<br />
Number of Groups: 40 <br />
<br />
</pre><br />
<br />
:* Identify the expression of the model in mathematical formula and the usual model assumption.<br />
:* Find out the variances of the group effects, residuals and the response.<br />
:* Sketch the estimated response function for a school with mean ses of 0.<br />
:* For what value of ses is the variance of mathach estimated to be minimized.<br />
<br />
'''Question 2:'''<br />
<br />
Longitudinal data analysis with mixed models: Consider a mixed model with random intercept and slope with respect to time, T. Suppose that the G matrix is <br />
<br />
::<math><br />
\begin{bmatrix}<br />
\tau_{00} & \tau_{01} \\<br />
\tau_{10} & \tau_{11} <br />
\end{bmatrix}<br />
</math><br />
<br />
:* Find the value of T for which the variance of Y is minimized and the minimum variance. <br />
:* Show that recentering T on this value (if known) turns the G matrix into one with only two free parameters.<br />
:* Sketch a data plot to show the location and value of the minimum standard deviation of lines.<br />
<br />
'''Question 3:'''<br />
<br />
Explain why would would want to add <math>\bar{X}_j</math> or <math>SD_j</math> to a multilevel model.<br />
<br />
'''Question 4:'''<br />
<br />
Discuss the interpretation/ramifications of extreme ICC values in the design of a multilevel study.<br />
<br />
'''Question 5:'''<br />
<br />
Explain the similarities and differences between within effects, between effects, contextual effects, compositional effects; and clusters.<br />
<br />
'''Question 5:'''<br />
<br />
Discuss/explain the relationship between longitudinal data and hierarchical data with appropriate examples and theory.<br />
<br />
'''Question 6:'''<br />
<br />
Let Σ be symmetric. Show that Σ is positive-definite if and only there exists a non-singular matrix A such that Σ = AA'<br />
<br />
'''Question 7:'''<br />
<br />
Why do we study longitudinal and mixed models?<br />
<br />
=== Ryan's Questions ===<br />
<br />
'''Early Chapters/Course Content:'''<br />
<br />
What is Simpson's Paradox and how is it related to HLM? - Please describe the necessary relationships and sketch them accordingly.<br />
<br />
Under what two conditions might someone be completely unconcerned with the possible problems associated with this paradox?<br />
<br />
'''Later Chapters/Course Content:'''<br />
<br />
In Chapter 11 Snijders & Bosker discuss the problem of optimal sample size in order to obtain accurate estimates of the ICC. Explain the reasoning behind the process of optimization for the hierarchical sample (assuming equal cluster sizes with a sample of size ''M'' with ''N'' clusters of size ''n'').<br />
<br />
=== Carrie's Questions ===<br />
'''Early Chapters/Course Content:'''<br />
<br />
We learned about a simulation conducted in which the estimate of the slope from a mixed model (without contextual effect) was obtained varying the within-cluster variance (see figure below). What are the implications of this simulation study?<br />
<br />
[[File:MixedModelSimResult.png|350px]]<br />
<br />
'''Later Chapters/Course Content:'''<br />
<br />
Data is obtained from on the effects of two treatments (A and B). Data was recorded on symptoms weekly throughout 10 weeks of active treatment and then for 8 more weeks following termination of the treatment. A linear spline model is fit to the data and the following results were obtained:<br />
::<br />
<pre><br />
> fit <- lme( Symptom ~ sp(Weeks)*tx, random=~1+Weeks|id, data = d)<br />
> summary(fit)<br />
Linear mixed-effects model fit by REML<br />
Data: d <br />
AIC BIC logLik<br />
15099.44 15153.18 -7539.719<br />
<br />
Random effects:<br />
Formula: ~1 + Weeks | id<br />
Structure: General positive-definite, Log-Cholesky parametrization<br />
StdDev Corr <br />
(Intercept) 37.921194 (Intr)<br />
Weeks 2.395329 -0.218<br />
Residual 23.285420 <br />
<br />
Fixed effects: Symptom ~ sp(Weeks) * tx <br />
Value Std.Error DF t-value p-value<br />
(Intercept) 131.84484 6.450470 1516 20.43957 0.0000<br />
sp(Weeks)D1(0) -11.21889 0.508022 1516 -22.08346 0.0000<br />
sp(Weeks)C(10).1 18.24584 0.571104 1516 31.94835 0.0000<br />
txB -2.61621 9.122343 78 -0.28679 0.7750<br />
sp(Weeks)D1(0):txB 1.55017 0.718452 1516 2.15765 0.0311<br />
sp(Weeks)C(10).1:txB -4.83791 0.807664 1516 -5.99001 0.0000<br />
Correlation: <br />
(Intr) sp(W)D1(0) sp(W)C(10).1 txB s(W)D1(0):<br />
sp(Weeks)D1(0) -0.371 <br />
sp(Weeks)C(10).1 0.256 -0.604 <br />
txB -0.707 0.262 -0.181 <br />
sp(Weeks)D1(0):txB 0.262 -0.707 0.427 -0.371 <br />
sp(Weeks)C(10).1:txB -0.181 0.427 -0.707 0.256 -0.604 <br />
<br />
Standardized Within-Group Residuals:<br />
Min Q1 Med Q3 Max <br />
-4.109811205 -0.593593851 -0.002546668 0.654503092 3.122385555 <br />
<br />
Number of Observations: 1600<br />
Number of Groups: 80 <br />
</pre><br />
<br />
:* Interpret the coefficients in the model.<br />
:* Sketch the predicted trajectories.<br />
:* Bonus: How would you test whether there is a significant difference in the predicted symptom score at week 10 or week 18?<br />
<br />
== Georges' Questions ==<br />
<br />
=== Interpreting p-values ===<br />
Consider a regression of a continuous variable Y on a continuous variable X and a dichotomous factor coded with an indicator variable G. Consider a regression of Y on X and G with:<br />
<br />
<math> \hat{Y} = \hat{\beta}_0 +\hat{\beta}_X X +\hat{\beta}_G G </math><br />
<br />
In the multiple regression of Y on X and G neither <math>\hat{\beta}_X</math> nor <math>\hat{\beta}_G</math> are significant. However the F-test for the hypothesis that both parameters are 0 yields a p-value of 0.002. <br />
<br />
Sketch a plausible data set that could exhibit this phenomenon in data space with axes for Y and X and group membership indicated by different characters. Also sketch relevant confidence ellipses in <math>\beta_X , \beta_G</math> space.<br />
<br />
=== Multilevel models ===<br />
Consider a multilevel random slopes model of the form:<br />
:<math>Y_i = X_i\gamma +Z_i u_i+ \epsilon_i\!</math><br />
with <math>u_i \sim N(0,G)\!</math>, <math>\epsilon_i \sim N(0,\sigma^2 I)\!</math> independent of <math>{{u}_{i}}\!</math>, <math>{{Z}_{i}}={{X}_{i}}\!</math>; independent for <math>i=1,\ldots ,M\!</math>.<br />
<br />
Let <br />
<math>{{\beta }_{i}}=\gamma +{{u}_{i}}\!</math>. <br />
<br />
In the following questions, treat the macro level parameters <br />
<math>\gamma ,\ {{\sigma }^{2}}\!</math> and <math>G\!</math> as they were fixed and known.<br />
# Give an expression for the BLUE of <math>\beta_i\!</math> .<br />
# Give an expression for the BLUP of <math>\beta_i\!</math> as a function of the BLUE and other non-random quantities.<br />
# Describe how the BLUP is a shrinkage estimator based on the BLUE.<br />
# For large <math>n_i\!</math> and relatively fixed variances for the values of X within each cluster, how will the BLUP behave relative to the BLUE?<br />
# What are the implications for the BLUP if <math>G\!</math> is highly concentrated?<br />
=== Interpreting a factorization of G ===<br />
Consider a random coefficient model with a random intercept and random slope. Let the G matrix have the decomposition <math>G = AA'\!</math> where <br />
: <math> A = <br />
\left[ \begin{matrix}<br />
{{a}_{0}} & {{a}_{01}} \\<br />
0 & {{a}_{1}} \\<br />
\end{matrix} \right]</math><br />
<br />
Show that <math> a_0 </math> is the minimal standard deviation of random regression lines above and below the population regression line.<br />
=== Intrepreting output ===<br />
The following questions refer to the output below for a mixed model for the full high school math achievement data set. The model uses SES, SES.School which is the mean SES in the sample in each school, ‘female’ which is an individual level indicator variable and ‘Type’ which is three-level factor with levels “Coed”, “Girl” and “Boy” with the obvious definition and Sector which is a 2-level factor with levels “Catholic” and “Public”.<br />
# Consider two Catholic ‘girl’ schools, one with mean SES = 0 and the other with mean SES = 1. Suppose the values of SES in the former school range from -1 to 1 and in the latter school from 0 to 2. Draw a graph showing the predicted MathAch in these two schools over the range of values of SES in each school. On your graph identify the value and location of the contextual effect of SES, the within school effect of SES and the compositional effect of SES.<br />
# Suppose you were to refit the model without SES.School. What would you expect to happen to the coefficient for SES? Would it stay roughly the same, get bigger or get smaller, or is the change unpredictable? Explain.<br />
# Suppose you want to perform an overall test of the importance of gender in the model, either within schools or between schools. Specify a hypothesis matrix that would perform this test.<br />
# Estimate the difference between the predicted math achievement of a boy in a boys’ school versus a girl in a girls’ school. If you suspected that this is affected by the SES of the school, how would you modify the model to test this hypothesis? <br />
<br />
[[Media:MATH_6643_Sample_exam_output_2.pdf]]<br />
<br />
=== Testing identifiability of the G and R models ===<br />
Consider a mixed model for data with a response variable Y and a single predictor X. Suppose that each cluster has size 2 and X is measured at the values 0 and 1. Is it possible to identify the parameters of a random slope model? What if each cluster had size 3 and X were observed at levels -1, 0 and +1?</div>Smithcehttp://scs.math.yorku.ca/index.php/MATH_6643_Summer_2012_Applications_of_Mixed_Models/Submitted_sample_exam_questionsMATH 6643 Summer 2012 Applications of Mixed Models/Submitted sample exam questions2012-07-23T17:15:35Z<p>Smithce: </p>
<hr />
<div>'''Question 1:'''<br />
<br />
Consider the following output:<br />
::<br />
<pre><br />
> head(hs)<br />
school mathach ses Sex Minority Size Sector PRACAD DISCLIM<br />
1 1317 12.862 0.882 Female No 455 Catholic 0.95 -1.694<br />
2 1317 8.961 0.932 Female Yes 455 Catholic 0.95 -1.694<br />
3 1317 4.756 -0.158 Female Yes 455 Catholic 0.95 -1.694<br />
4 1317 21.405 0.362 Female Yes 455 Catholic 0.95 -1.694<br />
5 1317 20.748 1.372 Female No 455 Catholic 0.95 -1.694<br />
6 1317 18.362 0.132 Female Yes 455 Catholic 0.95 -1.694<br />
> fit <- lme( mathach ~ ses * cvar(ses,school), hs, <br />
+ random = ~ 1 + ses|school)<br />
> summary(fit)<br />
Linear mixed-effects model fit by REML<br />
Data: hs <br />
AIC BIC logLik<br />
12846.85 12891.54 -6415.423<br />
<br />
Random effects:<br />
Formula: ~1 + ses | school<br />
Structure: General positive-definite, Log-Cholesky parametrization<br />
StdDev Corr <br />
(Intercept) 1.6293867 (Intr)<br />
ses 0.6614903 -0.469<br />
Residual 6.1109156 <br />
<br />
Fixed effects: mathach ~ ses * cvar(ses, school) <br />
Value Std.Error DF t-value p-value<br />
(Intercept) 12.681917 0.3054760 1935 41.51526 0.0000<br />
ses 2.243374 0.2416545 1935 9.28339 0.0000<br />
cvar(ses, school) 3.687892 0.7699000 38 4.79009 0.0000<br />
ses:cvar(ses, school) 0.873953 0.5771829 1935 1.51417 0.1301<br />
Correlation: <br />
(Intr) ses cv(,s)<br />
ses -0.188 <br />
cvar(ses, school) 0.022 -0.261 <br />
ses:cvar(ses, school) -0.258 0.065 0.014<br />
<br />
Standardized Within-Group Residuals:<br />
Min Q1 Med Q3 Max <br />
-3.2291287 -0.7433282 0.0306118 0.7770370 2.6906899 <br />
<br />
Number of Observations: 1977<br />
Number of Groups: 40 <br />
<br />
</pre><br />
<br />
:* Write out the model in mathematical notation, together with the usual model assumptions.<br />
:* Find the variances of the group effects, the residuals, and the response.<br />
:* Sketch the estimated response function for a school with mean ses of 0.<br />
:* For what value of ses is the variance of mathach estimated to be minimized?<br />
<br />
'''Question 2:'''<br />
<br />
Longitudinal data analysis with mixed models: Consider a mixed model with random intercept and slope with respect to time, T. Suppose that the G matrix is <br />
<br />
::<math><br />
\begin{bmatrix}<br />
\tau_{00} & \tau_{01} \\<br />
\tau_{10} & \tau_{11} <br />
\end{bmatrix}<br />
</math><br />
<br />
:* Find the value of T for which the variance of Y is minimized and the minimum variance. <br />
:* Show that recentering T on this value (if known) turns the G matrix into one with only two free parameters.<br />
:* Sketch a data plot to show the location and value of the minimum standard deviation of lines.<br />
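: For reference, a sketch of the variance calculation behind the first part (writing <math>\sigma^2</math> for the residual variance):<br />
:: <math>\operatorname{Var}(Y \mid T)=\tau_{00}+2\tau_{01}T+\tau_{11}T^{2}+\sigma^{2}</math><br />
: which is minimized at <math>T^{*}=-\tau_{01}/\tau_{11}</math>, with minimum value <math>\tau_{00}-\tau_{01}^{2}/\tau_{11}+\sigma^{2}</math>.<br />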
<br />
'''Question 3:'''<br />
<br />
Explain why one would want to add <math>\bar{X}_j</math> or <math>SD_j</math> to a multilevel model.<br />
<br />
'''Question 4:'''<br />
<br />
Discuss the interpretation/ramifications of extreme ICC values in the design of a multilevel study.<br />
<br />
'''Question 5:'''<br />
<br />
Explain the similarities and differences between within effects, between effects, contextual effects, compositional effects; and clusters.<br />
<br />
'''Question 5:'''<br />
<br />
Discuss/explain the relationship between longitudinal data and hierarchical data with appropriate examples and theory.<br />
<br />
'''Question 6:'''<br />
<br />
Let Σ be symmetric. Show that Σ is positive-definite if and only if there exists a non-singular matrix A such that Σ = AA'.<br />
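: Not a proof, but a quick numerical illustration of both directions (a numpy sketch; the Cholesky factor supplies a triangular, non-singular A for the "only if" direction):<br />

```python
import numpy as np

rng = np.random.default_rng(0)

# "if" direction: Sigma = AA' with A non-singular is positive-definite.
A = rng.normal(size=(3, 3))            # a random matrix is non-singular a.s.
Sigma = A @ A.T
eigenvalues = np.linalg.eigvalsh(Sigma)

# "only if" direction: a positive-definite Sigma has a non-singular
# lower-triangular Cholesky factor L with Sigma = LL'.
L = np.linalg.cholesky(Sigma)
reconstruction_ok = np.allclose(L @ L.T, Sigma)
```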
<br />
'''Question 7:'''<br />
<br />
Why do we study longitudinal and mixed models?<br />
<br />
=== Ryan's Questions ===<br />
<br />
'''Early Chapters/Course Content:'''<br />
<br />
What is Simpson's Paradox and how is it related to HLM? - Please describe the necessary relationships and sketch them accordingly.<br />
<br />
Under what two conditions might someone be completely unconcerned with the possible problems associated with this paradox?<br />
<br />
'''Later Chapters/Course Content:'''<br />
<br />
In Chapter 11 Snijders & Bosker discuss the problem of optimal sample size in order to obtain accurate estimates of the ICC. Explain the reasoning behind the process of optimization for the hierarchical sample (assuming equal cluster sizes with a sample of size ''M'' with ''N'' clusters of size ''n'').<br />
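: For reference, a standard ingredient in this reasoning is the design effect of a two-stage sample: relative to a simple random sample of the same total size, the variance of an estimated mean from <math>N</math> clusters of size <math>n</math> is inflated by<br />
:: <math>\text{design effect} = 1+(n-1)\rho</math><br />
: where <math>\rho</math> is the intraclass correlation, so for a fixed total sample size one trades off <math>N</math> against <math>n</math>.<br />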
<br />
=== Carrie's Questions ===<br />
'''Early Chapters/Course Content:'''<br />
<br />
We learned about a simulation study in which the estimate of the slope from a mixed model (without a contextual effect) was obtained while varying the within-cluster variance (see figure below). What are the implications of this simulation study?<br />
<br />
[[File:MixedModelSimResult.png|350px]]<br />
<br />
'''Later Chapters/Course Content:'''<br />
<br />
Data were obtained on the effects of two treatments (A and B). Symptoms were recorded weekly throughout 12 weeks of active treatment and then for 8 more weeks following termination of the treatment. A linear spline model was fit to the data and the following results were obtained:<br />
::<br />
<pre><br />
> fit <- lme( Symptom ~ sp(Weeks)*tx, random=~1+Weeks|id, data = d)<br />
> summary(fit)<br />
Linear mixed-effects model fit by REML<br />
Data: d <br />
AIC BIC logLik<br />
15099.44 15153.18 -7539.719<br />
<br />
Random effects:<br />
Formula: ~1 + Weeks | id<br />
Structure: General positive-definite, Log-Cholesky parametrization<br />
StdDev Corr <br />
(Intercept) 37.921194 (Intr)<br />
Weeks 2.395329 -0.218<br />
Residual 23.285420 <br />
<br />
Fixed effects: Symptom ~ sp(Weeks) * tx <br />
Value Std.Error DF t-value p-value<br />
(Intercept) 131.84484 6.450470 1516 20.43957 0.0000<br />
sp(Weeks)D1(0) -11.21889 0.508022 1516 -22.08346 0.0000<br />
sp(Weeks)C(10).1 18.24584 0.571104 1516 31.94835 0.0000<br />
txB -2.61621 9.122343 78 -0.28679 0.7750<br />
sp(Weeks)D1(0):txB 1.55017 0.718452 1516 2.15765 0.0311<br />
sp(Weeks)C(10).1:txB -4.83791 0.807664 1516 -5.99001 0.0000<br />
Correlation: <br />
(Intr) sp(W)D1(0) sp(W)C(10).1 txB s(W)D1(0):<br />
sp(Weeks)D1(0) -0.371 <br />
sp(Weeks)C(10).1 0.256 -0.604 <br />
txB -0.707 0.262 -0.181 <br />
sp(Weeks)D1(0):txB 0.262 -0.707 0.427 -0.371 <br />
sp(Weeks)C(10).1:txB -0.181 0.427 -0.707 0.256 -0.604 <br />
<br />
Standardized Within-Group Residuals:<br />
Min Q1 Med Q3 Max <br />
-4.109811205 -0.593593851 -0.002546668 0.654503092 3.122385555 <br />
<br />
Number of Observations: 1600<br />
Number of Groups: 80 <br />
</pre><br />
<br />
:* Interpret the coefficients in the model.<br />
:* Sketch the predicted trajectories.<br />
:* Bonus: How would you test whether there is a significant difference in the predicted symptom score at week 10 or week 18?<br />
<br />
== Georges' Questions ==<br />
<br />
=== Interpreting p-values ===<br />
Consider a regression of a continuous variable Y on a continuous variable X and a dichotomous factor coded with an indicator variable G:<br />
<br />
<math> \hat{Y} = \hat{\beta}_0 +\hat{\beta}_X X +\hat{\beta}_G G </math><br />
<br />
In the multiple regression of Y on X and G, neither <math>\hat{\beta}_X</math> nor <math>\hat{\beta}_G</math> is significant. However, the F-test of the hypothesis that both parameters are 0 yields a p-value of 0.002. <br />
<br />
Sketch a plausible data set that could exhibit this phenomenon in data space with axes for Y and X and group membership indicated by different characters. Also sketch relevant confidence ellipses in <math>\beta_X , \beta_G</math> space.<br />
<br />
=== Multilevel models ===<br />
Consider a multilevel random slopes model of the form:<br />
:<math>Y_i = X_i\gamma +Z_i u_i+ \epsilon_i\!</math><br />
with <math>u_i \sim N(0,G)\!</math>, <math>\epsilon_i \sim N(0,\sigma^2 I)\!</math> independent of <math>{{u}_{i}}\!</math>, <math>{{Z}_{i}}={{X}_{i}}\!</math>; independent for <math>i=1,\ldots ,M\!</math>.<br />
<br />
Let <br />
<math>{{\beta }_{i}}=\gamma +{{u}_{i}}\!</math>. <br />
<br />
In the following questions, treat the macro level parameters <br />
<math>\gamma ,\ {{\sigma }^{2}}\!</math> and <math>G\!</math> as if they were fixed and known.<br />
# Give an expression for the BLUE of <math>\beta_i\!</math> .<br />
# Give an expression for the BLUP of <math>\beta_i\!</math> as a function of the BLUE and other non-random quantities.<br />
# Describe how the BLUP is a shrinkage estimator based on the BLUE.<br />
# For large <math>n_i\!</math> and relatively fixed variances for the values of X within each cluster, how will the BLUP behave relative to the BLUE?<br />
# What are the implications for the BLUP if <math>G\!</math> is highly concentrated?<br />
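: A minimal numerical sketch of parts 2&ndash;4 (a numpy sketch with made-up macro parameters), using the standard shrinkage form of the BLUP, <math>\hat{\beta}_i^{BLUP}=\gamma+G(G+V_i)^{-1}(\hat{\beta}_i^{BLUE}-\gamma)</math> with <math>V_i=\sigma^2(X_i'X_i)^{-1}</math>:<br />

```python
import numpy as np

rng = np.random.default_rng(1)

# Macro parameters treated as fixed and known; these values are made up.
gamma = np.array([2.0, 1.0])              # population coefficients
G = np.array([[1.0, 0.3], [0.3, 0.5]])    # Var(u_i)
sigma2 = 4.0                              # residual variance

def blue_blup(X, y):
    """Within-cluster OLS (the BLUE of beta_i) and its BLUP."""
    XtX = X.T @ X
    beta_blue = np.linalg.solve(XtX, X.T @ y)
    V = sigma2 * np.linalg.inv(XtX)       # Var(BLUE | beta_i)
    S = G @ np.linalg.inv(G + V)          # shrinkage matrix
    beta_blup = gamma + S @ (beta_blue - gamma)
    return beta_blue, beta_blup

def simulate_cluster(n):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    beta_i = gamma + np.linalg.cholesky(G) @ rng.normal(size=2)
    y = X @ beta_i + np.sqrt(sigma2) * rng.normal(size=n)
    return blue_blup(X, y)

# Small cluster: noticeable shrinkage; large cluster: BLUP ~ BLUE.
blue_small, blup_small = simulate_cluster(5)
blue_large, blup_large = simulate_cluster(2000)
gap_small = float(np.linalg.norm(blup_small - blue_small))
gap_large = float(np.linalg.norm(blup_large - blue_large))
```

: With the large cluster the shrinkage is negligible (part 4); with the small cluster the BLUP is pulled appreciably toward <math>\gamma</math>.<br />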
=== Interpreting a factorization of G ===<br />
Consider a random coefficient model with a random intercept and random slope. Let the G matrix have the decomposition <math>G = AA'\!</math> where <br />
: <math> A = <br />
\left[ \begin{matrix}<br />
{{a}_{0}} & {{a}_{01}} \\<br />
0 & {{a}_{1}} \\<br />
\end{matrix} \right]</math><br />
<br />
Show that <math> a_0 </math> is the minimal standard deviation of random regression lines above and below the population regression line.<br />
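: A numerical check of this claim (a numpy sketch with arbitrary made-up values for A): the variance of a random line's height at <math>X=t</math> is <math>(1,t)G(1,t)'</math>, and its minimum over <math>t</math> should equal <math>a_0^2</math>.<br />

```python
import numpy as np

# Arbitrary illustrative values for A (any a0 > 0, a1 != 0 will do).
a0, a01, a1 = 2.0, 1.5, 0.5
A = np.array([[a0, a01], [0.0, a1]])
G = A @ A.T

# The height of a random line above/below the population line at X = t
# is u0 + u1*t, with variance [1, t] G [1, t]'.
t = np.linspace(-50.0, 50.0, 200001)
var_line = G[0, 0] + 2.0 * G[0, 1] * t + G[1, 1] * t ** 2
min_sd = float(np.sqrt(var_line.min()))      # should equal a0
t_star = float(t[np.argmin(var_line)])       # location of the minimum
```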
=== Interpreting output ===<br />
The following questions refer to the output below for a mixed model for the full high school math achievement data set. The model uses SES; SES.School, which is the mean SES of the sample in each school; ‘female’, which is an individual-level indicator variable; ‘Type’, which is a three-level factor with levels “Coed”, “Girl” and “Boy” with the obvious definitions; and Sector, which is a 2-level factor with levels “Catholic” and “Public”.<br />
# Consider two Catholic ‘girl’ schools, one with mean SES = 0 and the other with mean SES = 1. Suppose the values of SES in the former school range from -1 to 1 and in the latter school from 0 to 2. Draw a graph showing the predicted MathAch in these two schools over the range of values of SES in each school. On your graph identify the value and location of the contextual effect of SES, the within school effect of SES and the compositional effect of SES.<br />
# Suppose you were to refit the model without SES.School. What would you expect to happen to the coefficient for SES? Would it stay roughly the same, get bigger or get smaller, or is the change unpredictable? Explain.<br />
# Suppose you want to perform an overall test of the importance of gender in the model, either within schools or between schools. Specify a hypothesis matrix that would perform this test.<br />
# Estimate the difference between the predicted math achievement of a boy in a boys’ school versus a girl in a girls’ school. If you suspected that this is affected by the SES of the school, how would you modify the model to test this hypothesis? <br />
<br />
[[Media:MATH_6643_Sample_exam_output_2.pdf]]<br />
<br />
=== Testing identifiability of the G and R models ===<br />
Consider a mixed model for data with a response variable Y and a single predictor X. Suppose that each cluster has size 2 and X is measured at the values 0 and 1. Is it possible to identify the parameters of a random slope model? What if each cluster had size 3 and X were observed at levels -1, 0 and +1?</div>Smithcehttp://scs.math.yorku.ca/index.php/MATH_6643_Summer_2012_Applications_of_Mixed_Models/Snijders_and_Bosker:_Discussion_QuestionsMATH 6643 Summer 2012 Applications of Mixed Models/Snijders and Bosker: Discussion Questions2012-07-19T01:23:34Z<p>Smithce: </p>
<hr />
<div>== Chapter 5: The Hierarchical Linear Model ==<br />
=== Matthew ===<br />
At the bottom of page 83, Snijders and Bosker outline the process for probing interactions between two level one variables, and how there can be four possibilities for how to model it. If a researcher was to include all four, discuss how each would be interpreted. What might a good selection strategy be if our model had substantially more than two variables?<br />
<br />
=== Qiong === <br />
If we do not have any information about the data set, how do we choose a level-two variable to predict the group-dependent regression coefficients? And after we choose the level-two variable z, how do we interpret the cross-level interaction term? <br />
<br />
=== Carrie === <br />
A client arrives with a random slope and intercept model using IQ as a predictor. IQ was measured on the traditional scale with a mean of 100 and standard deviation of 15. What should the client keep in mind about the interpretation of the variance of the intercept and covariance of the slope-intercept?<br />
: This raises the interesting question of how the variance of the random intercept and the covariance of the random intercept with the random slope are changed under a recentering of IQ. Let <br />
:: <math>Var\left[ \begin{matrix}<br />
{{u}_{0j}} \\<br />
{{u}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2} & {{\tau }_{01}} \\<br />
{{\tau }_{10}} & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math> for raw IQ. <br />
: If we recenter IQ with: <math>\tilde{\text{IQ}}=\text{IQ}-c</math> then:<br />
:: <math>\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
1 & c \\<br />
0 & 1 \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
{{u}_{0j}} \\<br />
{{u}_{1j}} \\<br />
\end{matrix} \right]</math><br />
: and<br />
:: <math>Var\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
1 & c \\<br />
0 & 1 \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
\tau _{0}^{2} & {{\tau }_{01}} \\<br />
{{\tau }_{10}} & \tau _{1}^{2} \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
1 & 0 \\<br />
c & 1 \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}+2c{{\tau }_{01}}+{{c}^{2}}\tau _{1}^{2} & {{\tau }_{01}}+c\tau _{1}^{2} \\<br />
{{\tau }_{10}}+c\tau _{1}^{2} & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math><br />
<br />
: If <math>c=-{{\tau }_{01}}/\tau _{1}^{2}</math>, then the variance of the intercept is minimized:<br />
::<math>Var\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}-\tau _{01}^{2}/\tau _{1}^{2} & 0 \\<br />
0 & \tau _{1}^{2} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}\left( 1-\rho _{01}^{2} \right) & 0 \\<br />
0 & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math><br />
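: The algebra above is easy to check numerically; a numpy sketch with arbitrary illustrative values for the <math>\tau</math> parameters:<br />

```python
import numpy as np

# Arbitrary illustrative values for the raw-IQ covariance matrix.
tau00, tau01, tau11 = 9.0, -3.0, 2.0
T_raw = np.array([[tau00, tau01], [tau01, tau11]])

c = -tau01 / tau11                      # the recentering constant above
M = np.array([[1.0, c], [0.0, 1.0]])
T_new = M @ T_raw @ M.T                 # covariance after recentering

# Expected: diagonal, with the intercept variance reduced to
# tau00 - tau01^2 / tau11 = tau00 * (1 - rho01^2).
expected = np.array([[tau00 - tau01**2 / tau11, 0.0],
                     [0.0, tau11]])
```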
<br />
=== Gilbert === <br />
In chapter 5, they talk about hierarchical linear model where fixed effects and random effects are taken into consideration. Discuss a clear simple example in class which shows both effects and give interpretations of each of the coefficients and their use in real life.<br />
=== Daniela === <br />
In chapter 5, they talk mostly about the two-level nesting structure. Can we have a bigger example with at least 4 levels that includes a random intercept and slope, and see how to code this in R?<br />
: In 'lme', multilevel nesting is handled with nested grouping factors, e.g.<br />
fit <- lme( Y ~ X * W, dd, random = ~ 1 | idtop/idmiddle/idsmall)<br />
: Contextual variables present an ambiguity. Assuming that the id variables 'idsmall' and 'idmiddle' are coded uniquely overall, then the higher-level, say 'idmiddle', contextual variables could be coded as either:<br />
cvar(X,idmiddle)<br />
: or<br />
capply( dd , ~ id, with, mean( c(tapply( X, idsmall, mean))))<br />
:Here is a table prepared for SPIDA showing how to handle multilevel nesting and crossed structures in a selection of R functions:<br />
<blockquote><br />
<br />
::{| border="1" cellpadding="4"<br />
|-<br />
! Function !! Notes <br />
|-<br />
| lme<br><br />
in package nlme<br />
| Linear mixed effects: normal response<br><br />
G side and R side modelling<br><br />
Model syntax:<br><br />
Y ~ X * W, random = ~ 1 + X | id<br><br />
For nested effect:<br><br />
Y ~ X * W, random = ~ 1 + X | higher/lower<br><br />
or, to have different models at different levels:<br><br />
Y ~ X * W, random = list(higher = ~ 1, lower = ~ 1 + X )<br><br />
<br />
|-<br />
| lmer <br><br />
in package lme4<br />
| Linear mixed models for gaussian response with Laplace approximation <br><br />
G side modeling only, R = <math>\sigma^2 I</math><br><br />
Model syntax: <br><br />
Y ~ X * W +(1+X|id)<br><br />
For nested effect:<br><br />
Y ~ X * W +(1+X|higher) + (1+X|higher:lower)<br> <br />
For crossed effect:<br><br />
Y ~ X * W +(1+X|id1) + (1+X|id2)<br> <br />
|- <br />
| glmer <br><br />
in package lme4<br />
| Generalized linear mixed models with adaptive Gaussian quadrature <br><br />
* family: binomial, Gamma, inverse.gaussian, poisson, gaussian<br><br />
G side only, no R side<br><br />
Model syntax: <br><br />
Y ~ X * W +(1+X|id)<br><br />
For nested effect:<br><br />
Y ~ X * W +(1+X|higher) + (1+X|higher:lower)<br> <br />
For crossed effect:<br><br />
Y ~ X * W +(1+X|id1) + (1+X|id2)<br> <br />
<br />
|- <br />
| glmmPQL<br><br />
in packages MASS/nlme<br />
| Generalized linear mixed models with Penalized Quasi Likelihood <br><br />
* family: binomial, Gamma, inverse.gaussian, poisson, gaussian<br><br />
G side and R side as in lme<br><br />
Model syntax: <br><br />
Y ~ X * W, random = ~ 1 + X | id<br><br />
For nested effect:<br><br />
Y ~ X * W, random = ~ 1 + X | higher/lower<br><br />
|-<br />
| MCMCglmm<br><br />
in package MCMCglmm<br />
| Generalized linear mixed models with MCMC <br><br />
* family: poisson, categorical, multinomial, ordinal, exponential, geometric, cengaussian, cenpoisson,<br />
cenexponential, zipoisson, zapoisson, ztpoisson, hupoisson, zibinomial (cen=censored, zi=zero-inflated, za=zero-altered, hu=hurdle)<br><br />
G side and R side, R side different from 'lme': no autocorrelation but can be used for multivariate response <br><br />
Note: 'poisson' potentially overdispersed by default (good), 'binomial' variance for binary variables is unidentified.<br> <br />
Model syntax: <br><br />
Y ~ X * W, random = ~ us(1 + X):id [Note: id should be a factor, us=unstructured]<br />
For nested effect:<br><br />
Y ~ X * W, random = ~us(1 + X):higher + us(1 + X):higher:lower<br> <br />
For crossed effect:<br><br />
Y ~ X * W, random = ~us(1 + X):id1+ us(1 + X):id2<br> <br />
|-<br />
|}<br />
<br />
</blockquote><br />
<br />
=== Phil ===<br />
<br />
Excluding fixed effects that are non-significant is common practice in regression analyses, and Snijders and Bosker follow this practice when simplifying the model in Table 5.3 to the model found in Table 5.4. While this practice is used to help make the model more parsimonious, it can ignore the joint effect that these variables have on the model as a whole. Discuss alternative criteria that one should explore when determining whether a predictor should be excluded from the model.<br />
<br />
=== Ryan ===<br />
When using random slopes it is generally the case that the level 1 model contains a fixed effect for what will also be the level 2 random effect. The random effect is then an estimate of the group/cluster/individual departure from the fixed effect. However, non-significant level 1 variables are not determinable as different from zero. Are there cases where a non-significant fixed effect can be excluded from the model while retaining the random effect at level 2? What would be the consequence of this and what might it reveal about the level 1 variable? Would this help control for the '''error of excluding a non-significant but confounding variable'''?<br />
:: This is a very interesting question. It would be interesting to create a simulated data set illustrating the issue so we could consider the consequences of having random effects for a confounding factor whose within-cluster effect changes sign from cluster to cluster. Can we think of a confounding factor that would do that?<br />
<br />
=== and others ===<br />
<br />
== Chapter 6: Testing and Model Specification ==<br />
=== What is random? ===<br />
At the beginning of the chapter, S&B present two models (Table 6.1). They note that "The variable with the random slope is in both models the grand-mean centered variable IQ". In R: would the random side look like: random = ~ (IQ - IQbar)|school, even though (IQ - IQbar) isn't in the fixed part of the second model? How is this different in interpretation from: random = ~ (group centered IQ - IQbar)|school?<br />
<br />
* NOTE: This question is not necessarily about how to specify a model using nlme, but rather about the terms included in the random part of the model. As a test, I ran two models:<br />
<br />
IQ_dev <- mlbook_red$IQ_verb - mlbook_red$sch_iqv<br />
<br />
mlb612a <- nlme:::lme(langPOST ~ IQ_dev + sch_iqv + ses, random =~ IQ_verb|schoolnr, data = mlbook_red, method="ML")<br />
summary(mlb612a)<br />
<br />
mlb612b <- nlme:::lme(langPOST ~ IQ_dev + sch_iqv + ses, random =~ IQ_dev|schoolnr, data = mlbook_red, method="ML")<br />
summary(mlb612b)<br />
<br />
Note that the basic IQ_verb variable has been grand mean centered. According to the chapter, it sounds like using IQ_verb in the random portion is OK since it is a linear combination of IQ_dev and sch_iqv (the school means of IQ_verb). However, if I compare it against a second model using IQ_dev in the random part, pretty much all of the coefficients change. Is this expected?<br />
<br />
--[[User:Msigal|Msigal]] 12:07, 13 June 2012 (EDT)<br />
<br />
=== Qiong === <br />
The authors introduce the t-test for testing fixed parameters. We can use summary(model) to get the p-value directly in R. In practice, we use the Wald test to test fixed parameters. On page 95, they mention that "for 30 or more groups, the Wald test of fixed effects using REML standard errors have reliable type I error rates." Is this the only reason we use the Wald test in practice?<br />
=== Carrie === <br />
On page 105, in their discussion of modeling within-group variability, the authors warn to "keep in mind that including a random slope normally implies inclusion of the fixed effect!". What is an example of a model where one might include a random slope without a fixed effect?<br />
=== Gilbert === <br />
On page 104, the authors discuss different approaches to model selection, including working upward from level one and joint consideration of both levels one and two. Are there other methods to use if neither approach provides a satisfactory model? <br />
=== Daniela === <br />
On pages 97 and 99, the authors showed us how to test for a random intercept and a random slope independently. What test would you use if you wanted to test for both a random slope and a random intercept in the same model? And what would you test it against: the linear model, or a model with just a random slope or a random intercept?<br />
=== Phil ===<br />
<br />
Multi-parameter tests are possible for fixed effects, but can they also be applied to predicted random effects? If so what would be the analog to <math>\hat{\gamma}' \hat{\Sigma}^{-1}_\gamma \hat{\gamma}</math>, used to find the Wald statistic, and how do we find an appropriate df denominator term for an F test?<br />
<br />
: Question: Just to be sure, I presume you mean a BLUP? Answer: Yes I do :)<br />
<br />
=== Ryan ===<br />
In Chapter 6 Snijders and Bosker suggest that using deviance tests to evaluate fixed effects in a multilevel model is inappropriate if the estimation method is REML. What is the characteristic of REML vs ML that makes this type of model evaluation incorrect?<br />
:Comment: Likelihood has the form L(data|theta) and is used for inference on theta, for example, by comparing the altitude of the log-likelihood at theta.hat (the summit) compared with the altitude at a null hypothetical value theta.0. This is the basis of deviance tests: Deviance = 2*(logLik(data|theta.hat) - logLik(data|theta.0)).<br />
<br />
: With ML the log-likelihood is logLik( y | beta, G, R) and we can use the likelihood for inference on all parameters. With REML, the data is not the original y but the residual of y on X, say e, and the likelihood is a function of G and R only: logLik( e | G, R). Since 'beta' does not appear in this likelihood, the likelihood cannot be used to answer any questions about 'beta'.<br />
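: The simplest instance of the distinction is ordinary regression with unknown <math>\sigma^2</math>: ML maximizes the likelihood over <math>\beta</math> and <math>\sigma^2</math> jointly and divides the residual sum of squares by <math>n</math>, while REML works with the likelihood of the residuals, in which <math>\beta</math> has been eliminated, and divides by <math>n-p</math>. A sketch in numpy:<br />

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 2.0, -1.0])
y = X @ beta + rng.normal(size=n)

# ML maximizes logLik(y | beta, sigma2) jointly -> divide RSS by n.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = float(np.sum((y - X @ beta_hat) ** 2))
sigma2_ml = rss / n

# REML maximizes the likelihood of the residuals e = y - X beta_hat,
# in which beta no longer appears -> divide RSS by n - p (unbiased).
sigma2_reml = rss / (n - p)
```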
<br />
=== and others ===<br />
<br />
== Chapter 7: How Much Does the Model Explain? ==<br />
=== Explained Variance in Random Slope Models ===<br />
Looking at the proportion of variance explained by a model in a traditional ANOVA/multiple regression framework is something clients are often extremely interested in. In Chapter 7, Snijders and Bosker discuss how we might approach the issue in MLM. Near the end of the chapter, the authors insinuate that getting an estimate of the amount of explained variance in a random effects (intercept and slope) model is a somewhat tedious endeavour.<br />
<br />
The claim is that random slopes don't change prediction very much, so if we re-estimate the model using only random intercepts (no random slopes), this will "normally yield [predicted] values that are very close to values for the random slope models" (p. 114). This statement doesn't quite ring true for me, as in our examples the differences in slope between schools have been fairly striking/substantial. <br />
<br />
Is the authors' statement justifiable? Is obtaining an <math>R^2</math> as important/interesting in MLM as it is in other models?<br />
<br />
=== Explained variance in three-level models === <br />
In example 7.1, we saw how to calculate the explained variance of a level-one variable when this variable has a fixed effect only. How do we calculate the explained variance of a level-one variable when this variable has both a fixed and a random effect in the model? <br />
<br />
=== Interpreting <math>R^2</math> as an Effect Size ===<br />
A client fits a multilevel model and comes up with several significant predictors. The client is pleased with themselves, but remembers learning that significance alone isn't good enough these days, and needs help producing a measure of effect size. You compute the Level 1 R^2 and come up with a very small value, say 0.01. Is the model then worthless, even if the magnitude of the predicted change in the outcome is substantively meaningful?<br />
<br />
=== Explained variance === <br />
In the example provided on page 110, it is shown that the residual variance at level two increases when the within-group deviation is added as an explanatory variable to the model, in the balanced as well as the unbalanced case. Is this always the case, or is it only for this particular example?<br />
<br />
=== Estimates of <math>R^2</math>=== <br />
On page 113, "it is observed that an estimated value for <math>R^2</math> becomes smaller by the addition of a predictor variable, or larger by the deletion of a predictor variable, there are two possibilities: either this is a chance fluctuation, or the larger model is misspecified." The authors then say that whether the first or second possibility is more likely depends on the size of the change in <math>R^2</math>. Can you give an example of when this occurs based on the size of change in <math>R^2</math>?<br />
<br />
=== Predicted <math>R^2</math> ===<br />
After predicting values for random intercepts and slopes using Bayesian methods it is possible to form composite values, <math>\hat{Y}_{ij}</math>, to predict the observed dependent values, <math>Y_{ij}</math>. Obviously these estimates will be sub-optimal as they will suffer from 'shrinkage' effects, but they may be useful for computing a '<math>R^2_{Predicted}</math>'. Discuss situations where knowledge of the predicted slopes and intercepts could be important, and whether an <math>R^2_{Predicted}</math> could be a useful description.<br />
<br />
=== The Size and Direction of <math>R^2</math> Change As a Diagnostic Criterion ===<br />
The suggestion has been made that changes in <math>R^2</math>, where the addition or deletion of a variable creates an unexpected and opposing directional change, can serve as a diagnostic for determining where the flaw in the model resides. However, the authors do not actually indicate in which scenarios the size and direction of the change in the <math>R^2</math> estimate determine the source of the flaw. 'Wrong' directions provide evidence of model misspecification, but what then of the magnitude component mentioned just prior? (p. 113)<br />
<br />
== Chapter 8: Heteroscedasticity ==<br />
You can sign your contributions with <nowiki>--~~~~</nowiki> --[[User:Georges|Georges]] 07:50, 14 June 2012 (EDT)<br />
=== "Correlates of diversity" ===<br />
<br />
Provide an example illustrating how level-two variables can be associated with level-one heteroscedasticity.<br />
--[[User:Gilbert8|Gilbert8]] 11:26, 16 June 2012 (EDT)<br />
<br />
=== "Modeling Heteroscedasticity" ===<br />
<br />
When Snijders and Bosker say they are "modeling heteroscedasticity", is this simply incorporating more random slopes into the model? For instance, on page 127, they added a fixed effect for SA-SES (the school average of SES) and a random slope for it. What kind of plots would let us see if these inclusions are necessary? --[[User:Msigal|Msigal]] 11:26, 18 June 2012 (EDT)<br />
<br />
=== Linear or quadratic variance functions ===<br />
<br />
The level-one residual variance can be expressed as a linear or quadratic function of some variables. How do we decide on the functional form?<br />
Can we say that if the variables have a random effect, then we use a quadratic form, and otherwise a linear form? Is it the same for the intercept residual variance?<br />
<nowiki>--~~~~</nowiki> --[[User:Qiong Li|Qiong Li]] 12:25, 18 June 2012 (EDT)<br />
<br />
=== On a practical note - How is this done?? ===<br />
It appears that in order to fit a model with a linear/quadratic function for the variance the authors had to use MLwiN. Are there other ways to accomplish this? Could we talk a little about what their demo code is accomplishing? [http://www.stats.ox.ac.uk/~snijders/ch8.r | S&B ExCode] --[[User:Smithce|Smithce]] 14:15, 19 June 2012 (EDT)<br />
<br />
=== Variable centering ===<br />
<br />
Since fixed effects variables can be included in the R matrix to model systematic heteroscedasticity discuss the effects of centering variables in this context. Does centering affect the estimation results or numerical stability? --[[User:Rphilip2004|Rphilip2004]] 13:42, 19 June 2012 (EDT)<br />
<br />
What happens to the variance function when there are more than two levels? Do we still only have to choose from linear and quadratic forms or does it become more complicated? --[[User:Dusvat|Dusvat]] 18:43, 18 June 2012 (EDT)<br />
<br />
=== Generic regression question: Treating a factor as continuous for the interaction ===<br />
On page 126, in Model 3 (described on page 124), the authors treat SES as a factor for the main effects, but then, to keep the number of interaction terms down, they treat it as numeric in the interaction with other variables. This seems like it could come in useful; are there any caveats we should be aware of in using this technique? --[[User:Smithce|Smithce]] 14:15, 19 June 2012 (EDT)<br />
<br />
=== Generalized Linear Models and Heteroscedasticity ===<br />
<br />
Given a linear model that has level-1 heteroscedasticity related to multiple level-1 predictors, does this not mean that the heteroscedasticity can be thought of as related to the overall mean response? The residuals of a heteroscedastic model would then become functions of the mean response. Generalized linear models often model the variance as a function of the mean response (e.g., Poisson, Gamma, Negative Binomial, etc.). When might it be appropriate to abandon a direct linear relationship in favour of a generalized linear model (which at times retains the desired additive linear properties in the linear predictor) to deal with heteroscedastic issues? Is this even possible? --[[User:Rbarnhar|Rbarnhar]] 14:24, 19 June 2012 (EDT)<br />
<br />
== Chapter 9: Missing Data ==<br />
<br />
=== "Imputation" ===<br />
<br />
The chapter discusses imputation as a way of filling in missing data to form a complete data set. Are there any other methods which can be used to achieve the same goal? Provide a few examples.<br />
<br />
--[[User:Gilbert8|Gilbert8]] 11:38, 18 June 2012 (EDT)<br />
<br />
=== Patterns of Missingness ===<br />
Snijders and Bosker go to some lengths to explain the difference between MCAR, MAR, and MNAR. However, I felt they somewhat glossed over a definition of monotone missingness. What is monotone missingness? How would one check for it, especially in terms of a multilevel model? --[[User:Msigal|Msigal]] 08:47, 20 June 2012 (EDT)<br />
<br />
If we use a missingness indicator and predict this using a logistic regression model, does this mean that significant predictors should be kept in the imputation model and non-significant predictors can be omitted from the imputation model? --[[User:Smithce|Smithce]] 09:58, 21 June 2012 (EDT)<br />
<br />
=== Missingness Assumption ===<br />
<br />
Rubin defined three types of missingness in 1976. When we use methods for handling incomplete data, what information can help us make a reasonable assumption? What is the key to deciding whether missingness is MCAR or MAR? --[[User:Qiong Li|Qiong Li]] 12:31, 20 June 2012 (EDT) <br />
<br />
=== Full Maximum Likelihood vs. Imputation ===<br />
Could you give us an example using both maximum likelihood and imputation methods in R and then compare them? How are the methods similar or different? Is one method computationally better than the other?--[[User:Dusvat|Dusvat]] 17:27, 20 June 2012 (EDT)<br />
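Here is a base-R sketch of the comparison, heavily hedged: the "maximum likelihood" arm is simply a likelihood-based fit to all available rows (valid under MAR when only the outcome is missing), and the "imputation" arm is a deliberately crude hand-rolled multiple imputation pooled with Rubin's rules. All data and names are simulated/hypothetical; a real analysis would use a package such as mice, whose proper imputations also draw the imputation-model parameters.<br />

```r
library(nlme)
set.seed(4)
## toy longitudinal data: 100 subjects x 4 occasions, random-intercept model
d <- data.frame(id = rep(1:100, each = 4), t = rep(0:3, 100))
d$y <- 2 + 0.5 * d$t + rep(rnorm(100), each = 4) + rnorm(400, sd = 0.7)
## delete later occasions at random, depending on the observed baseline (MAR)
d$base <- rep(d$y[d$t == 0], each = 4)
d$y[d$t > 0 & runif(400) < plogis(-2 + 0.5 * d$base)] <- NA

## (a) likelihood-based analysis: ML fit to all available rows
ml.fit <- lme(y ~ t, random = ~ 1 | id, data = d,
              na.action = na.omit, method = "ML")

## (b) crude multiple imputation by hand, pooled with Rubin's rules
M <- 10
imps <- replicate(M, {
  obs <- !is.na(d$y)
  im  <- lm(y ~ t + base, data = d[obs, ])   # simplistic imputation model
  di  <- d
  mu  <- predict(im, newdata = d[!obs, ])
  di$y[!obs] <- rnorm(sum(!obs), mu, summary(im)$sigma)  # add noise back
  f <- lme(y ~ t, random = ~ 1 | id, data = di, method = "ML")
  c(est = fixef(f)[["t"]], var = vcov(f)["t", "t"])
})
qbar <- mean(imps["est", ])                      # pooled slope estimate
pooled.se <- sqrt(mean(imps["var", ]) +          # within-imputation variance
                  (1 + 1/M) * var(imps["est", ]))  # plus between-imputation
c(ml = fixef(ml.fit)[["t"]], mi = qbar)
```

Both estimates should land near the true slope of 0.5 here; the point of running both in practice is to see whether they disagree, which would suggest a problem with the MAR assumption or with the imputation model.<br />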
<br />
=== Dancing on the Bay(esian) ===<br />
<br />
Imputation seems to be a very Bayesian practice, and the authors mention the intimate connection to Gibbs sampling when imputing data in the univariate case. I wonder, however: if we are willing to impute data in this Bayesian manner, why don't we just jump ship and move to a fully Bayesian methodology? What are the benefits/downsides of initially dancing with the idea of being Bayesian to get our complete data, then being frequentists to fit our models? --[[User:Rphilip2004|Rphilip2004]] 09:18, 21 June 2012 (EDT)<br />
<br />
=== Don't Throw the Kitchen Sink - ICE ===<br />
<br />
While Snijders and Bosker promote the idea of a complex model for imputation, using as many feasible variables as possible, is this always advisable? Should we not consider parsimony as we would in any other form of regression? Should the imputation model also reflect the amount of missing data we are trying to impute?<br />
--[[User:Rbarnhar|Rbarnhar]] 09:59, 21 June 2012 (EDT)<br />
<br />
== Chapter 10: Assumptions of the Hierarchical Linear Model ==<br />
<br />
=== Influence of level-two units ===<br />
Provide a detailed explanation of how '''deletion diagnostics''' are performed and give a practical example to illustrate them.--[[User:Gilbert8|Gilbert8]] 18:22, 24 June 2012 (EDT)<br />
<br />
=== Incorporating Descriptive Statistics ===<br />
One aside that Snijders and Bosker make in this chapter is about the inclusion of the standard deviation for each group of a relevant level one variable as a fixed effect in the model. This was mentioned within the section on adding contextual variables (p. 155). This strikes me as an interesting prospect. Does modeling the standard deviation have any interpretative benefits over simply using group size? Are there other descriptive statistics pertaining to groups that would be meaningful to add to a model? What would their interpretation be? --[[User:Msigal|Msigal]] 10:12, 25 June 2012 (EDT)<br />
<br />
=== Orders of the model checks ===<br />
<br />
In Chapter 10, the authors introduce a number of things we need to do when we build a mixed model, such as "include contextual effects", "check random effects", "specification of the fixed part", "specification of the random part" and "check the distributional assumptions". When I deal with real data, I am always confused about which of these to do first and which next. Are there no rules, or is there a better order in which to do these things?<br />
--[[User:Qiong Li|Qiong Li]] 2:33, 25 June 2012 (EDT)<br />
<br />
=== Assumption violations ===<br />
In Ch. 10, the authors talk about having the random slope and random intercept uncorrelated with all explanatory variables. If this assumption is incorrect, you can just add relevant explanatory variables to the model. What happens in a real-world example where there are no quantifiable relevant explanatory variables to add to the model? How would you go about fixing the incorrect assumption? And what happens if more than one assumption is violated and you cannot simply include other 'descriptive' variables in the model?--[[User:Dusvat|Dusvat]] 17:45, 25 June 2012 (EDT)<br />
<br />
=== Checking for Random Slopes ===<br />
In this chapter (pages 155-156) S&B advocate checking for a random slope for each level-one variable in the fixed part. Because the process can be time-consuming, they further suggest using one-step methods to obtain provisional estimates and then checking the t-ratio. We have learned that t-tests of this kind are problematic for variance parameters and not to be trusted. Should we then use LRTs to test all possible random slopes, possibly with simulation?<br />
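A simulation-based LRT along these lines can be sketched with nlme (which ships with R). This is an illustrative parametric bootstrap on hypothetical simulated data, not S&B's own procedure; packages such as RLRsim automate the same idea.<br />

```r
library(nlme)
set.seed(2)
## toy data: 30 groups of 10 with a modest true random slope
d <- data.frame(g = rep(1:30, each = 10), x = rnorm(300))
d$y <- 1 + (0.5 + rep(rnorm(30, sd = 0.3), each = 10)) * d$x +
       rep(rnorm(30), each = 10) + rnorm(300)

ctl <- lmeControl(opt = "optim")
m0 <- lme(y ~ x, random = ~ 1 | g, data = d, method = "ML", control = ctl)
m1 <- lme(y ~ x, random = ~ x | g, data = d, method = "ML", control = ctl)
obs <- as.numeric(2 * (logLik(m1) - logLik(m0)))  # observed LRT statistic

## parametric bootstrap of the null (no random slope); small B for speed
B <- 30
tau0 <- as.numeric(VarCorr(m0)["(Intercept)", "StdDev"])
null.stats <- replicate(B, {
  db <- d
  db$y <- fitted(m0, level = 0) +            # fixed part under the null
    rep(rnorm(30, sd = tau0), each = 10) +   # random intercepts
    rnorm(300, sd = m0$sigma)                # residual error
  b0 <- lme(y ~ x, random = ~ 1 | g, data = db, method = "ML", control = ctl)
  b1 <- try(lme(y ~ x, random = ~ x | g, data = db, method = "ML",
                control = ctl), silent = TRUE)
  if (inherits(b1, "try-error")) NA
  else as.numeric(2 * (logLik(b1) - logLik(b0)))
})
p.boot <- mean(null.stats >= obs, na.rm = TRUE)  # bootstrap p-value
```

This sidesteps the boundary problem that makes the naive chi-squared (or t-ratio) reference distribution unreliable for variance parameters.<br />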
<br />
=== Cluster Size and Model Assumptions ===<br />
<br />
In the practical nuts and bolts of application, one at times encounters situations where the number of clusters in a requested multilevel model does not allow one to readily examine the assumption that the random intercepts are normally distributed. How important is it that this normality assumption holds?--[[User:Rbarnhar|Rbarnhar]] 01:22, 26 June 2012 (EDT)<br />
<br />
=== Model Residuals ===<br />
<br />
The authors emphasize the importance of using OLS estimation for determining unbiased diagnostics. However, it may be useful to use model-implied residuals such as <math>r = y - X \hat{\beta}</math> and <math>r_c = y - X \hat{\beta} - Z \hat{\gamma}</math>. Describe how these model-implied residuals can be used to evaluate influential observations at different design levels. --[[User:Rphilip2004|Rphilip2004]] 08:33, 26 June 2012 (EDT)<br />
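A minimal nlme sketch (simulated data, not the authors' example) of extracting the two kinds of residuals and putting them to work at level two:<br />

```r
library(nlme)
set.seed(3)
d <- data.frame(g = rep(1:20, each = 8), x = rnorm(160))
d$y <- 1 + 0.4 * d$x + rep(rnorm(20, sd = 0.8), each = 8) + rnorm(160, sd = 0.5)
fit <- lme(y ~ x, random = ~ 1 | g, data = d)

r.marg <- resid(fit, level = 0)  # marginal:    y - X beta-hat
r.cond <- resid(fit, level = 1)  # conditional: y - X beta-hat - Z u-hat
## level-2 screening: the mean marginal residual per group tracks that group's
## random effect, so unusually large values flag influential clusters; large
## conditional residuals flag influential level-1 observations
grp.means <- sort(tapply(r.marg, d$g, mean))
range(grp.means)
```

The design choice here is that marginal residuals answer "is this cluster unusual?", while conditional residuals answer "is this observation unusual given its cluster?".<br />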
<br />
== Chapter 11: Designing Multilevel Studies ==<br />
<br />
=== Unequal cluster sample sizes ===<br />
<br />
Usually we choose the same number n as the sample size of the micro-units and the same number N as the sample size of the macro-units. I want to know whether we can improve the power of tests, or obtain smaller standard errors, by choosing different sample sizes in different groups? --[[User:Qiong Li|Qiong Li]] 12:11, 27 June 2012 (EDT)<br />
: Comment: The main situation where it seems obvious to me that one would consider unequal cluster sizes by design would be to estimate a correlation parameter in the R matrix. --[[User:Georges|Georges]] 18:31, 28 June 2012 (EDT)<br />
<br />
=== Allocating treatment to groups or individuals ===<br />
<br />
The authors mention that multisite trials are difficult to implement, as the researcher has to make sure that there will be no contamination between the two treatment conditions within the same site, while cluster randomized trials may lead to selection bias. The authors propose the '''pseudo-cluster randomized trial'''. Explain how this method is performed. Is it used often in practice? If this technique fails, are there other methods to resolve those situations? --[[User:Gilbert8|Gilbert8]] 15:04, 27 June 2012 (EDT)<br />
<br />
=== Treatment Allocation, Continued ===<br />
<br />
Building upon Gilbert's question, discuss the differences between cluster randomized trials, multisite trials, and pseudo-cluster randomized trials. Do any of these strategies match the Schizophrenia dataset that we have been working with during Lab 2? --[[User:Msigal|Msigal]] 18:51, 27 June 2012 (EDT)<br />
<br />
Continuing Matt's question about the differences between cluster randomized trials, multisite trials, and pseudo-cluster randomized trials: can you give us an example of when to pick one over the other, with the cost of each trial broken down by the cost function? Which trial is more costly? --[[User:Dusvat|Dusvat]] 23:17, 27 June 2012 (EDT)<br />
<br />
=== Power Expecting Missingness/Drop Out ===<br />
<br />
I think an interesting addition to the power analyses presented would be if we could work in estimates of drop-out/non-response within cluster. For example, if I estimated that there was only an 80% chance that I would actually get useful/complete data from each level-1 unit (lower if using the URPP ;). How might this be built into these analyses? Are there authors who have done work on this with mixed models? Is it really worth bothering with, given that power analysis is hard enough (e.g. coming up with 'guesses' for estimates) as it is? --[[User:Smithce|Smithce]] 09:40, 28 June 2012 (EDT)<br />
...now that I think about it, if you approximate using constant loss across all clusters, rather than trying to fool around with imbalance, this is pretty easy. So, never mind.<br />
:Comment: Perhaps an easy answer but nevertheless a very good point.<br />
<br />
=== An Unknown Value of the Intraclass Correlation Coefficient ===<br />
<br />
The authors acknowledge that the ICC is an unknown quantity, but suggest that for the social sciences the value tends to lie between 0.0 and 0.4. These two values have very different properties, and this is made clear in the plot on the following page (p. 189). The question is not as easily answered as simply plotting them all: as we can see from the graph, the values follow different patterns of divergence. An assumed value can lead to very different optimal estimates, especially if one is wrong at the extremes. Are there any better ways to estimate the ICC a priori in order to avoid issues when optimizing the sample size? --[[User:Rbarnhar|Rbarnhar]] 09:55, 28 June 2012 (EDT)<br />
<br />
=== Normal Distribution ===<br />
<br />
Relating to a similar question in the past, how important is it that the various levels are normally distributed when computing power estimates? --[[User:Rphilip2004|Rphilip2004]] 10:08, 28 June 2012 (EDT)<br />
:Comment: Have a look at [http://scs.math.yorku.ca/index.php?title=MATH_6643_Summer_2012_Applications_of_Mixed_Models/Power_by_simulation:_Normal_versus_t_with_5_dfs this attempt to simulate normal errors and errors with a t distribution with 5 degrees of freedom]. One conceptual problem is the concept of effect size. The t distribution with 5 dfs has a standard deviation of about 1.29 (<math>\sqrt{5/3}</math>). The problem is that, with its high kurtosis, your estimate of the standard deviation will tend to be lower, not in the sense of 'expectation' but in the sense of the 'typical' standard deviation. The question, then, is whether to define 'effect size' in terms of the standard deviation for the t or in the original metric. This script uses the standard deviation of the t. A quick look suggests that there isn't much change, just a slight drop in power at higher effect sizes. --[[User:Georges|Georges]] 18:24, 28 June 2012 (EDT)<br />
<br />
:Reply: Thanks! Also here is the same code utilizing multiple cores, [http://scs.math.yorku.ca/index.php/MATH_6643_Summer_2012_Applications_of_Mixed_Models/Power_by_simulation:_Normal_versus_t_with_5_dfs-parellel in this case 4].<br />
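For readers without access to the linked scripts, the idea can be reproduced in a self-contained, stripped-down form (a plain two-group comparison rather than a mixed model; all numbers illustrative): power by simulation with normal errors versus t errors on 5 df rescaled to unit standard deviation.<br />

```r
## Power by simulation: normal errors vs t(5) errors rescaled so sd = 1
## (the sd of a t with 5 df is sqrt(5/3)); effect size 0.5 sd, n = 30/group.
set.seed(3)
pow <- function(rerr, nsim = 2000, n = 30, delta = 0.5) {
  mean(replicate(nsim, {
    y1 <- rerr(n)
    y2 <- delta + rerr(n)
    t.test(y1, y2)$p.value < 0.05
  }))
}
p.norm <- pow(function(n) rnorm(n))
p.t5   <- pow(function(n) rt(n, df = 5) / sqrt(5/3))
c(normal = p.norm, t5 = p.t5)
```

Consistent with the comment above, the two power estimates come out close once the t errors are put on the same standard-deviation scale.<br />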
<br />
== Chapter 12: Other Methods and Models ==<br />
<br />
=== An Application of the BIC === <br />
As mentioned in this chapter, the BIC is a good indicator of model fit, which sets a penalty based upon the number of parameters in the model. It also seems easy to calculate based upon the typical summary objects from nlme and lme4. We learned in the last chapter about dealing with influential observations. <br />
<br />
What I would like to know is: might it be appropriate to compare BICs from two models where the only difference between them is the removal of a set of observations deemed influential or problematic? Since there is no difference in the number of parameters, other measures of model fit seem inappropriate. In a practical application (working with a client), what would be the best way of approaching this situation? --[[User:Msigal|Msigal]] 14:59, 30 June 2012 (EDT)<br />
<br />
* see [http://scs.math.yorku.ca/index.php/Statistics/AIC,_BIC_and_Likelihood_Ratio_Tests some inchoate brilliant ideas]<br />
* nice to read: http://faculty.gsm.ucdavis.edu/~prasad/Abstracts/MRC_JASA.pdf <br />
=== Mixtures to Normal ===<br />
Latent class mixture models are a non-parametric way to avoid, or lessen, the assumption of normality for the random coefficients, and can approximate any distribution as the number of classes is increased. How effectively can arbitrary distributions be modeled, and should this modeling technique be used to verify that the normality assumption for the random coefficients holds? --[[User:Rphilip2004|Rphilip2004]] 19:14, 1 July 2012 (EDT)<br />
<br />
=== Sandwich estimators for standard errors ===<br />
It is mentioned that researchers work with misspecified models, and a few reasons are given for why they do so. The sandwich method is used to estimate standard errors. Is the method always applicable for misspecified models? If the model is not misspecified, is the sandwich estimator still used?--[[User:Gilbert8|Gilbert8]] 14:02, 2 July 2012 (EDT)<br />
<br />
In the sense that 'all models are wrong, but some are useful', are we not always using 'misspecified models'? When S&B talk about using a misspecified model intentionally, are they referring to cases in which either through statistical tests or diagnostics, we have evidence that the model fails in some regard? --[[User:Smithce|Smithce]] 22:15, 2 July 2012 (EDT)<br />
<br />
* To read: http://hubbard.berkeley.edu/cdcmultilevelcomplexdata/Gardiner2009.pdf<br />
=== BIC vs. AIC ===<br />
The authors talk about how BIC is a good indicator of model fit. However, we have seen in the R code that the AIC is also very close to the BIC and similar in magnitude. Though the authors don't talk about the AIC, I would like to know if there is a difference when comparing models with random parts, as opposed to simpler models. Which criterion is better, AIC or BIC? Why?--[[User:Dusvat|Dusvat]] 21:37, 2 July 2012 (EDT)<br />
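The mechanical difference between the two criteria is easy to see in base R: they share the same log-likelihood and differ only in the per-parameter penalty (2 for AIC versus log n for BIC), so BIC punishes extra parameters harder whenever n > e^2 ≈ 7.4. A toy lm illustration on hypothetical data:<br />

```r
## AIC = -2 logLik + 2k;  BIC = -2 logLik + k log(n)
set.seed(4)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 0.3 * d$x1 + rnorm(n)   # x2 is pure noise
m1 <- lm(y ~ x1, data = d)
m2 <- lm(y ~ x1 + x2, data = d)    # one extra (useless) parameter
AIC(m1, m2)
BIC(m1, m2)
## the BIC gap exceeds the AIC gap by exactly log(n) - 2 per extra parameter
(BIC(m2) - BIC(m1)) - (AIC(m2) - AIC(m1))
```

For models with random parts the same formulas apply, but the fits being compared should use ML rather than REML, and what counts as a "parameter" for a variance component is itself a debated question.<br />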
<br />
===Alternative method GEE ===<br />
<br />
The authors mention that GEE is an alternative method for handling multilevel data. Some people prefer GEEs because they like having a procedure that estimates parameters in the absence of assumptions about how the coefficients vary. However, from the GLM course, I know that GEEs can be used to estimate the parameters of a generalized linear model with a possibly unknown correlation between outcomes. So can we use GEEs to handle a wide class of datasets, including multilevel data? --[[User:Qiong Li|Qiong Li]] 21:52, 2 July 2012 (EDT)<br />
<br />
=== MCMC Convergence ===<br />
<br />
The authors suggest that MCMC converges without the use of convergence diagnostics; the only related consideration is time, given possible dependence in the data. This would suggest that, given enough time, all MCMC models will converge. Is this true, and if so, does it in any way indicate something arbitrary/trivial, or critically dependent upon prior (no pun intended) assumptions? --[[User:Rbarnhar|Rbarnhar]] 9:55, 3 July 2012 (EDT)<br />
<br />
== Chapter 13: Imperfect Hierarchies ==<br />
<br />
=== The ICC in Multiple Membership Models ===<br />
In the section on a two-level model with a crossed random factor, Snijders and Bosker discuss the various formulae for the ICC due to level (p. 209). However, in the section that builds upon this framework, where we have a multiple membership multiple classification model, there is no remark about the ICC at all. Does the calculation of the ICC change when we incorporate a multiple membership aspect to the hierarchy? --[[User:Msigal|Msigal]] 18:24, 3 July 2012 (EDT)<br />
<br />
=== Multiple membership multiple classification models ===<br />
<br />
In Section 13.3, on multiple membership, the authors discuss students who attend multiple schools and also live in different neighborhoods. A student is assigned a membership weight for each school attended. In Section 13.4, the authors consider the case where not only school membership but also neighborhood membership is taken into account. How will the weights be distributed in this case? Are there separate weights for school and neighborhood membership respectively, or a combined weight for both, where applicable? --[[User:Gilbert8|Gilbert8]] 20:39, 3 July 2012 (EDT)<br />
<br />
What happens if a student is misclassified in a multiple membership multiple classification model? For example, suppose the student moves to a new neighborhood and a new school, but the new neighborhood and/or school is reported incorrectly. What are the implications for the model? What happens to the weights?--[[User:Dusvat|Dusvat]] 20:35, 4 July 2012 (EDT)<br />
<br />
=== Example 13.1 Sustained primary school effects ===<br />
<br />
In Example 13.1 on page 207, we compare the results of a model without (Model 1) and a model with (Model 2) the inclusion of primary school effects. We find that the average examination grade remains the same, but there are some changes in the variance components. The primary school effect has a variance of 0.006, a very small value; however, the variance is significantly different from 0. In this situation, can we say Model 2 is a better model than Model 1? --[[User:Qiong Li|Qiong Li]] 11:29, 4 July 2012 (EDT)<br />
<br />
=== Crossed Random Effects and the Problem of Snowball Sampling ===<br />
<br />
Snowball Sampling was mentioned last class and I suggested that this could be modeled as a crossed random effects model. Here now is my explanation and some added material to go with the chapter.<br />
<br />
The example given was for AIDS patients. Let us retain this idea, but expand certain assumptions and hypotheticals to make the example understandable in our context.<br />
<br />
Let us assume we are interested in following the health of certain AIDS sufferers by tracking their CD4 cell counts. We are in need of a sample, but are not easily capable of obtaining one.<br />
<br />
As a group of 3 medical researchers, we have access to a subset of existing AIDS patients. For ease, let us say we have three patients in good, fair, and poor health, respectively (the assumption here is that people of similar health status will likely have more connections with one another than with those at other stages of the disease). We provide these patients with a set of tickets that are traceable back to the original owner and inform our three patients that they are to go out and recruit others for the study from wherever they can.<br />
<br />
In the meantime, we (the 3 doctors) return to our 3 treatment sites and prepare to undertake our study.<br />
<br />
Once our recruiters have returned with enough patients, we randomly assign each participant to a treatment site. That means we have 3 samples that are directly nested within treatment site, but within each sample we also have people who are nested within their respective snowball. However, the distribution of snowball members is not one-to-one within the treatment sites. Given the randomization, the snowball should be unrelated to the treatment site, which means we have one direct, simple nesting structure and a second nesting structure that is randomly distributed across the first. This process of randomization is critical: if the randomization is unrelated to the snowball, then no direct connection between the two level-2 structures needs to be accounted for.<br />
<br />
Snijders and Bosker have an image of exactly what I have just described on page 206. The only changes here are to substitute treatment center for school and snowball for neighbourhood.<br />
<br />
There is a clever way to deal with this problem effectively, a trick proposed by Goldstein (1987):<br />
<br />
:1. Consider the entire dataset as a pseudo level 3 unit where both the snowball and the treatment centers are nested. We will need a level 3 identifier for this purpose.<br />
<br />
:2. Treat either the treatment center or snowballs as the level 2 units and specify a random intercept. <br />
<br />
:3. For the factor not chosen at step 2, specify a level 3 random intercept for each level of the factor. This requires estimating a random coefficient for a dummy coded variable representing each level.<br />
<br />
The result is that we wind up with two residual intraclass correlation coefficients.<br />
<br />
::For the Treatment Center We have:<br />
<br />
<br />
:::<math>\rho _{treatment} = \tau _{t}^{2} / (\tau _{t}^2 + \tau _{s}^2 + \sigma ^2) </math><br />
<br />
<br />
::For the Snowball We have:<br />
<br />
<br />
:::<math>\rho _{snowball} = \tau _{s}^{2} / (\tau _{s}^2 + \tau _{t}^2 + \sigma ^2) </math><br />
<br />
<br />
<br />
The solution to evaluating the similarity of the Snowballs is then to look at the ICC for Snowballs and plot and evaluate the random effects to observe whether any are unusually large. <br />
<br />
:An example of how this is declared in STATA using the xtmixed procedure is the following.<br />
<br />
::xtmixed CD4 (predictors/covariates) || _all: R.Snowball || TreatmentSite: , (options) <br />
<br />
::The _all keyword declares the full dataset as the level-3 identifier; R.Snowball automatically creates the dummy codes for Snowball.<br />
<br />
:An example of how this is declared in R using the lmer procedure is the following:<br />
<br />
::lmer(CD4 ~ (predictors/covariates) + (1 | TreatmentSite) + (1 | Snowball))<br />
<br />
::Note that lmer handles crossed random effects directly, so no explicit level-3 identifier is needed here; Goldstein's pseudo-level trick is only required in software that assumes strictly nested grouping structures.<br />
<br />
'''Goldstein, H. (1987) Multilevel Covariance Components Models. Biometrika 74:430-431.'''<br />
<br />
<br />
--[[User:Rbarnhar|Rbarnhar]] 9:53, 4 July 2012 (EDT)<br />
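For software that does assume strict nesting, Goldstein's trick can be written out in nlme, which ships with R. The following is a sketch on simulated data (all names hypothetical): the constant pseudo-level-3 factor plays the role of step 1, and pdIdent supplies the common-variance snowball dummies of step 3.<br />

```r
library(nlme)
set.seed(5)
## hypothetical crossed layout: 3 treatment sites x 6 snowballs, 10 per cell
d <- expand.grid(patient = 1:10, snowball = factor(1:6), site = factor(1:3))
d$cd4 <- 500 + c(-30, 0, 30)[d$site] +
         rnorm(6, sd = 20)[d$snowball] + rnorm(nrow(d), sd = 50)

d$all <- factor(1)  # step 1: one pseudo level-3 unit spanning the data set
fit <- lme(cd4 ~ 1,
           ## step 2: random intercept for site; step 3: a common-variance
           ## random coefficient for each snowball dummy at the pseudo level
           random = list(all = pdIdent(~ snowball - 1), site = ~ 1),
           data = d)
VarCorr(fit)  # tau_s (snowball), tau_t (site), and sigma for the ICC formulas
```

In lme4 the same model is simply lmer(cd4 ~ 1 + (1 | site) + (1 | snowball), data = d), since lmer accommodates crossed random effects without the pseudo-level device.<br />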
<br />
=== MCMC estimation ===<br />
<br />
As we move through the book and discover new topics, the authors constantly note that advanced models can be 'easily' estimated by MCMC methods, yet they provide no examples of this type of programming --- which I immediately interpret as being 'not that easy at all' (they provide code options for R, Stata, HLM, MLwiN, Mplus, and SAS, but nothing for MCMC estimation). How easy would it be to generate this type of code, and could we construct a moderately simple example that could be run in BUGS or WinBUGS but not in lme or lmer, to serve as an introduction to Bayesian estimation? --[[User:Rphilip2004|Rphilip2004]] 22:45, 4 July 2012 (EDT)<br />
<br />
== Chapter 14: Survey Weights ==<br />
<br />
=== Model Accuracy and Design Weights ===<br />
I found the idea presented on pages 226-227 interesting. It is recommended that checks be made against the design weights to see if the model differs based upon them. Snijders and Bosker recommend divvying up the level-2 units into two or three groups, and then splitting the level-1 units into two or three groups. The combination of the two divisions splits the data set into four to nine parts, each of which can be analyzed with the hierarchical linear model.<br />
<br />
I have two concerns about this procedure: 1) How large can the discrepancies between these parts be before we start becoming worried? and 2) At which point in the analysis would you want to do this? Does this procedure assume that we have already selected the "correct" or "true" model? --[[User:Msigal|Msigal]] 08:40, 9 July 2012 (EDT)<br />
<br />
=== Exploring the informativeness of the sampling design ===<br />
<br />
If the residuals are correlated with the design variables, then the sampling design is informative. Under informative sampling, the estimates of the parameters may be biased and inconsistent. The authors mention (page 222) that if we can be confident of working with a well-specified hierarchical linear model, and the sample design is unrelated to the residuals, we can proceed as usual with estimating the hierarchical linear model. How do we know that a hierarchical linear model is well specified? <br />
--[[User:Qiong Li|Qiong Li]] 12:20, 9 July 2012 (EDT)<br />
<br />
On page 226, the authors mention that adding design variables to the model of interest allows a much clearer and more satisfactory analysis than knowing only the weights, when the design variables are available to the researcher. Does this mean that the analysis will be wrong if they are not available? --[[User:Gilbert8|Gilbert8]] 18:54, 9 July 2012 (EDT)<br />
<br />
=== Longitudinal data analysis ===<br />
<br />
BLUE is best for resampling from the same school over and over again. The BLUP is best on average for resampling from the population of schools and students. How about the EBLUP? How does the researcher choose among these? --[[User:Gilbert8|Gilbert8]] 14:36, 9 July 2012 (EDT)<br />
<br />
What is the difference between EBLUP and BLUP? Are they graphically the same? In the graphs on the slides, I didn't find any difference between BLUP and EBLUP. When do you choose one over the other? --[[User:Dusvat|Dusvat]] 22:15, 9 July 2012 (EDT)<br />
<br />
=== Weights, weights, weights .... What ones to use? .... What are the effects of my choice? ===<br />
<br />
The use of survey weights for complex data has become entrenched in many social science branches. Much like the discussion provided by Snijders and Bosker, there remains a belief that there are really two types of weights: ''sampling'' (or ''survey'') weights and ''precision'' weights. There are many other weights that one should consider with HLM, and in some cases combinations of them may be more revealing. Their use, however, becomes extremely complicated and numerically intensive, and with no real ability to know whether they are being applied correctly, the issue becomes murky. The problem at the heart of the issue is that THERE IS NO WAY OF KNOWING WHICH MODEL IS MORE CORRECT - WEIGHTED OR UNWEIGHTED.<br />
<br />
The choice of weights plays an extremely important role in multilevel models. In particular, the use of weights directly impacts the estimating equations, depending upon the type of likelihood inference being used. Two approaches in particular are of interest, as they dominate most software: Multilevel Pseudo Maximum Likelihood (MPML) and Probability-Weighted Iterative Generalized Least Squares (PWIGLS). <br />
<br />
For my simple post here I am just highlighting a problem and not fully unpacking it.<br />
<br />
'''For MPML we have the following:'''<br />
<br />
::<math> \log L(y) = \sum_{i=1}^{n^{(L)}} \omega_{i}^{(L)} \log L_{i}(y_{i}) </math> - here the superscript L refers to the specific level of the model<br />
<br />
<br />
where the log-likelihood contribution of each unit is multiplied by its scaled weight, entering like a frequency weight that replicates the unit to its new value. Given this, the use of weights requires correctly scaled weights at each level of representation. In longitudinal modelling, this means the inverse probability weights should not be applied at anything other than the person-specific level or cluster. This too is a problem, though, since the supplied weight is generally not the correctly scaled one: the weight is usually provided, or deduced from a series of calculations, each assuming a specific relationship. That means most supplied weights and design schemes give you inappropriate weights for analysis. The effect of an incorrect weight is to create undesired bias in an undetectable and random direction.<br />
<br />
<br />
'''For PWIGLS we have the following method:''' (Sorry but it is a bit too complicated for me to put the equation in here)<br />
<br />
::After taking the partial derivatives, the population quantities in the Fisher score function are replaced by the weighted sample statistics.<br />
<br />
<br />
This procedure has been shown to have fairly good properties for estimating the fixed-effects parameters, but it often fails to estimate the variance components effectively, especially where the weights are informative. The flaw, then, is that if the weights are informative we would want to use them, but we would not get good estimates of our hierarchical model; if the weights were not informative, then we would not want to bother with them anyway and would just stick to regular maximum likelihood estimation.<br />
<br />
The use of weights is terribly complex and far from being resolved any time soon. For the time being, I suggest that the use of weights is not the most effective means of modelling in the presence of complex sampling. The best use for weights is in a diagnostic role. <br />
<br />
Consider this one point: if the weights are uninformative and the model is correctly, or even close to correctly, specified, the use of weights should change nothing except the likelihood. Weights can help us better understand how well specified our model of interest is. Introducing weights can open insight into errors. However, one must be very careful not to let the weights themselves be the error. Know your weights and use them with caution.<br />
<br />
One last thing about weights in the longitudinal context. The level-1, level-2, and higher-level weights likely need to be rescaled at each and every time point, especially when missing data are present. Attrition can in part be controlled for with weights or model parameters using a ''propensity for dropout'' weight or score, which helps model the changing inclusion probabilities. Current practice suggests that weights are generally scaled for the sampling method and are not modified across time. This raises many questions regarding the results of weighted longitudinal multilevel data analyses, where inverse probability weights are the norm in the social sciences.<br />
<br />
<br />
--[[User:Rbarnhar|Rbarnhar]] 20:11, 9 July 2012 (EDT)<br />
<br />
=== Example 14.1 ===<br />
In the example they say that "the observational and cross-sectional nature of the data precludes causal interpretations of the conclusions". Does this mean that the main interpretations are incorrect? If so, which ones, and how exactly are they affected by the cross-sectional nature of the data?--[[User:Dusvat|Dusvat]] 22:40, 9 July 2012 (EDT)<br />
<br />
<br />
<br />
== Chapter 15: Longitudinal Data ==<br />
<br />
=== Variable occasion designs ===<br />
It makes sense to consider models with a function of time t when we analyze a longitudinal dataset. If the response is continuous, we can get some information about the function from a scatterplot of time t against the response. However, if the response is a dummy variable, how can we get an idea of the appropriate function of time t for the model? --[[User:Qiong Li|Qiong Li]] 16:13, 11 July 2012 (EDT)<br />
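One common device (sketched here in base R on simulated data; this is not from the book) is to plot empirical logits of the per-occasion proportions against time: a roughly straight plot supports a linear time term on the logit scale, curvature suggests a polynomial or spline in t.<br />

```r
## Empirical logits of a binary response across occasions
set.seed(6)
tt <- rep(0:5, each = 60)                           # six occasions, 60 obs each
y  <- rbinom(length(tt), 1, plogis(-1 + 0.4 * tt))  # truth: linear in the logit
events <- tapply(y, tt, sum)
totals <- tapply(y, tt, length)
emp.logit <- qlogis((events + 0.5) / (totals + 1))  # +0.5/+1 avoids 0 and 1
plot(sort(unique(tt)), emp.logit, type = "b",
     xlab = "time", ylab = "empirical logit")
```

The +0.5/+1 adjustment is the usual guard against occasions where all responses are 0 or all are 1, for which the raw logit is undefined.<br />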
<br />
To obtain a good fit, you can fit more complicated random parts. One can use polynomials, splines, or other functions where the covariates are fixed (constant over time). If the covariates are not fixed (changing covariates), what will be the implications for our original model? Can we still use those functions to fit the random part in this case?--[[User:Gilbert8|Gilbert8]] 18:31, 11 July 2012 (EDT)<br />
<br />
=== Contextual Variables in Longitudinal Designs ===<br />
On the bottom of page 258, there is a note about adding a level 2 contextual variable to a longitudinal design. Up until now, this has meant taking the mean of the level 1 observations for a particular cluster (e.g. mean SES for the different schools). However, "in the longitudinal case, ... including this in the model would, ..., imply an effect from events that occur in the future because, at each moment except the last, the person mean of a changing explanatory variable depends on measurements to be made at a future point in time".<br />
<br />
I have two questions about this. First, can we rephrase this to make it somewhat more meaningful? I'm not entirely clear on what it means as it is presently stated. Second, their answer to this is to include "not... the person mean but, rather, the employment situation at the start of the study period". Can we talk about what the data actually looks like for this design and how it could be modeled in R? Also, how would the model change if the covariate of job status had been continuous instead of categorical?<br />
<br />
[For reference, this model has fixed effects for: age (55 through 60), birth year, birth year x age, job status at 55 (three levels, categorical), current job status (three levels, categorical).]<br />
--[[User:Msigal|Msigal]] 17:51, 11 July 2012 (EDT)<br />
<br />
=== More on Contextual Variables ===<br />
<br />
Polynomial or other non-linear terms like <math> time ^2 </math> are often used to create or estimate curvature in linear models. When a longitudinal analysis has both time and other monotone increasing functions, their interactions, if theoretically interpretable and meaningful, can substitute for these non-linear terms. <br />
<br />
An example would be an age-by-time interaction. When it is used together with a contextual variable like mean age and the raw variable age, different aspects of the aging trajectory can become the dominant component at various times/ages. <br />
<br />
So, referring to Matt's point above from S&B about the mean of a variable across time depending on events in the future, I suggest that on this point S&B are not entirely accurate, unless one is moving beyond the scope of the already observed data, and even then they are only possibly accurate. <br />
<br />
I argue there are times when S&B are not correct. If we are interested in the relations between numbers changing with meaning, then we need to understand their meaning, but also how numbers change without meaning. We need to consider a variety of interpretations and interrogate them with extreme prejudice.<br />
<br />
In the example I supplied, the mean age can be thought of as the hinge point around which the linear and non-linear components move together. This hinge then operates as an indicator of where in the lifecourse one is, rather than simply as one's mean age. The other two variables are contextualized, and the mean age is not necessarily an element of the future but of one's relative point in the lifecourse, not an as-yet-unobserved variable at the start of the study.<br />
<br />
--[[User:Rbarnhar|Rbarnhar]] 22:22, 11 July 2012 (EDT)<br />
<br />
=== Autocorrelated Residuals ===<br />
At the end of the chapter they talk about the fixed occasion design and how the assumption that the level-one residuals are independent can be relaxed and replaced by the assumption of first-order autocorrelation. Can you use this in variable occasion designs? How does this affect the model?<br />
They also say that other covariance and correlation patterns are possible. Can you give other examples?--[[User:Dusvat|Dusvat]] 22:00, 11 July 2012 (EDT)<br />
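To make the first-order autocorrelation assumption concrete: for equally spaced occasions it imposes a level-one correlation matrix with entries <math>\rho^{|s-t|}</math>, so correlation decays geometrically with the time lag (a compound-symmetry pattern, with all off-diagonal correlations equal, is one example of the "other patterns" the authors allude to). A quick sketch in Python:<br />
```python
def ar1_corr(n, rho):
    """Correlation matrix implied by a first-order autoregressive
    (AR(1)) residual process at n equally spaced occasions."""
    return [[rho ** abs(s - t) for t in range(n)] for s in range(n)]

for row in ar1_corr(4, 0.5):
    print(row)   # correlations shrink by a factor rho per extra lag
```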
<br />
=== That Darned Random Part ===<br />
In Example 15.5 S&B demonstrate that the fully multivariate model has a significant deviance difference over the random slope model, but since the covariance matrix is visually similar they put it down to sample size. They prefer the random-slope model due to clearer interpretation of the random part of the model. Truth is most of the time I fit the random part to the best of my ability then otherwise ignore it! Though not particular to this chapter, if we have opportunity to discuss interpretation of the random part of HLM models a little more I'd appreciate it! --[[User:Smithce|Smithce]] 09:29, 12 July 2012 (EDT)<br />
<br />
== Chapter 16: Multivariate Multilevel Models ==<br />
<br />
=== Multivariate Random Slope ===<br />
Is there a multivariate model that has a random part at level one, or is it always the case that there is no random part at level one, regardless of how many levels the model has?<br />
Also, can you show us how to model the random intercepts and slopes by group-dependent variables, since the authors mentioned that that would be the next step in the analysis? --[[User:Dusvat|Dusvat]] 18:19, 13 July 2012 (EDT)<br />
<br />
=== Again, with Model Building ===<br />
I noticed when S&B demonstrate the multivariate model that they immediately adopt a model with a level 1 interaction (IQ x SES) and a level 2 interaction (IQbar x SESbar), but no cross-level interactions. Is this just for the sake of simplicity? --[[User:Msigal|Msigal]] 20:57, 15 July 2012 (EDT)<br />
<br />
=== Multivariate empty model ===<br />
<br />
A multivariate empty model is a multivariate model without explanatory variables. The authors show, for <math>m = 2</math> dependent variables, the within-group covariance matrix <math>\Sigma = \mathrm{cov}(R_{ij})</math> and the between-group covariance matrix <math>T = \mathrm{cov}(U_j)</math> based on model (16.5). Will these matrices keep the same form when <math>m > 2</math>? What will the form of the model be in that case?<br />
<br />
=== ML or REML ===<br />
<br />
When one wishes to analyze more than one dependent variable, it is possible to analyze them simultaneously as a multivariate dependent variable. We can use either the ML or the REML method to fit the multivariate model. We know that the two methods differ little with respect to estimating the regression coefficients, but they do differ with respect to estimating the variance components. Do the same properties hold for the multivariate model? --[[User:Qiong Li|Qiong Li]] 12:05, 16 July 2012 (EDT)<br />
<br />
=== Different DV Metrics ===<br />
<br />
Not all DVs are continuous, and we may be interested in predicting continuous as well as, say, dichotomous variables. How can we modify the approach outlined in this chapter to accommodate different classes of DVs while retaining the properties of a multivariate analysis? --[[User:Rphilip2004|Rphilip2004]] 17:49, 16 July 2012 (EDT)<br />
<br />
=== Multivariate Random Coefficients ===<br />
<br />
When applying the random intercept and slope model to multivariate data, are we not assuming that the effect is expected to be equivalent with respect to two possibly very different dependent measures? How safe is this assumption, if it is in fact being made?<br />
--[[User:Rbarnhar|Rbarnhar]] 09:55, 17 July 2012 (EDT)<br />
<br />
== Chapter 17: Discrete Dependent Variables ==<br />
<br />
=== Estimation of between- and within-group variance ===<br />
<br />
For a continuous response, we can use the Satterthwaite algorithm to get approximate degrees of freedom for testing the within-group variance. The reason is that the within-group variance can be written as a linear function of mean squares. Can the within-group variance for dichotomous outcome variables also be written as a linear function of mean squares? If the answer is yes, can we use the Satterthwaite algorithm to get the approximate degrees of freedom? --[[User:Qiong Li|Qiong Li]] 16:30, 18 July 2012 (EDT)<br />
<br />
=== Residual intraclass correlation coefficient ===<br />
<br />
The intraclass correlation coefficient for the multilevel logistic model can be defined by applying the definition in Section 3.3 straightforwardly to the binary outcome <math>Y_{ij}</math>, or by applying the same definition to the unobserved underlying variable <math>\hat{Y}_{ij}</math>. Do these two definitions have the same meaning when it comes to their statistical interpretation? The book mentions that they lead to different outcomes. How will the researcher know which one is better? --[[User:Gilbert8|Gilbert8]] 18:03, 18 July 2012 (EDT)<br />
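For the second definition, a standard result used in this setting is that the level-one residual of the underlying logistic variable has variance <math>\pi^2/3 \approx 3.29</math>, so the residual intraclass correlation on the latent scale is <math>\tau_0^2/(\tau_0^2 + \pi^2/3)</math>. A quick numeric check in Python (the <math>\tau_0^2</math> value below is made up for illustration):<br />
```python
import math

def icc_latent_logistic(tau0_sq):
    """Residual intraclass correlation on the latent scale for a
    multilevel logistic model: the underlying logistic residual has
    variance pi^2/3, which plays the role of sigma^2."""
    return tau0_sq / (tau0_sq + math.pi ** 2 / 3)

print(round(icc_latent_logistic(1.0), 3))   # about 0.233
```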
<br />
=== Multilevel Event History Analysis ===<br />
In multilevel event history analysis, what are the best models to choose from? When is one better than another, and why? Can you give examples of this type of analysis?--[[User:Dusvat|Dusvat]] 19:32, 18 July 2012 (EDT)<br />
<br />
=== Dichotomization of Ordered Categories ===<br />
Page 312 reads: "...the analyses of the dichotomized outcomes also provide insights into the fit of the model for c categories. If the estimated parameters ... depend strongly on the dichotomization point, then it is likely that the multilevel multicategory logistic or probit model does not fit well".<br />
Is this to say we should use different dichotomizing schemes as a 'check' for robustness of the model? If the 'thresholds' are consistent when the ordered categories are dichotomized in different sub-sets then we can rejoice, but if the thresholds change wildly then we should be suspicious of the multicategory model?</div>Smithcehttp://scs.math.yorku.ca/index.php/MATH_6643_Summer_2012_Applications_of_Mixed_Models/Submitted_sample_exam_questionsMATH 6643 Summer 2012 Applications of Mixed Models/Submitted sample exam questions2012-07-17T04:14:51Z<p>Smithce: </p>
<hr />
<div>'''Question 1:'''<br />
<br />
Consider the following output:<br />
::<br />
<pre><br />
> head(hs)<br />
school mathach ses Sex Minority Size Sector PRACAD DISCLIM<br />
1 1317 12.862 0.882 Female No 455 Catholic 0.95 -1.694<br />
2 1317 8.961 0.932 Female Yes 455 Catholic 0.95 -1.694<br />
3 1317 4.756 -0.158 Female Yes 455 Catholic 0.95 -1.694<br />
4 1317 21.405 0.362 Female Yes 455 Catholic 0.95 -1.694<br />
5 1317 20.748 1.372 Female No 455 Catholic 0.95 -1.694<br />
6 1317 18.362 0.132 Female Yes 455 Catholic 0.95 -1.694<br />
> fit <- lme( mathach ~ ses * cvar(ses,school), hs, <br />
+ random = ~ 1 + ses|school)<br />
> summary(fit)<br />
Linear mixed-effects model fit by REML<br />
Data: hs <br />
AIC BIC logLik<br />
12846.85 12891.54 -6415.423<br />
<br />
Random effects:<br />
Formula: ~1 + ses | school<br />
Structure: General positive-definite, Log-Cholesky parametrization<br />
StdDev Corr <br />
(Intercept) 1.6293867 (Intr)<br />
ses 0.6614903 -0.469<br />
Residual 6.1109156 <br />
<br />
Fixed effects: mathach ~ ses * cvar(ses, school) <br />
Value Std.Error DF t-value p-value<br />
(Intercept) 12.681917 0.3054760 1935 41.51526 0.0000<br />
ses 2.243374 0.2416545 1935 9.28339 0.0000<br />
cvar(ses, school) 3.687892 0.7699000 38 4.79009 0.0000<br />
ses:cvar(ses, school) 0.873953 0.5771829 1935 1.51417 0.1301<br />
Correlation: <br />
(Intr) ses cv(,s)<br />
ses -0.188 <br />
cvar(ses, school) 0.022 -0.261 <br />
ses:cvar(ses, school) -0.258 0.065 0.014<br />
<br />
Standardized Within-Group Residuals:<br />
Min Q1 Med Q3 Max <br />
-3.2291287 -0.7433282 0.0306118 0.7770370 2.6906899 <br />
<br />
Number of Observations: 1977<br />
Number of Groups: 40 <br />
<br />
</pre><br />
<br />
:* Write the model as a mathematical formula and state the usual model assumptions.<br />
:* Find the variances of the group effects, the residuals, and the response.<br />
:* Sketch the estimated response function for a school with mean ses of 0.<br />
:* For what value of ses is the variance of mathach estimated to be minimized?<br />
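For the last bullet, the arithmetic can be sketched as follows (standard random-slope algebra, not an official solution): the model-implied variance is <math>\mathrm{Var}(Y \mid ses) = \tau_{00} + 2\,ses\,\tau_{01} + ses^2\tau_{11} + \sigma^2</math>, minimized at <math>ses = -\tau_{01}/\tau_{11}</math>. Plugging in the standard deviations and correlation printed in the random-effects block (done in Python here purely for the arithmetic):<br />
```python
# Values read off the Random effects block of the summary above
sd_int, sd_ses, corr, sd_resid = 1.6293867, 0.6614903, -0.469, 6.1109156

tau00 = sd_int ** 2
tau11 = sd_ses ** 2
tau01 = corr * sd_int * sd_ses          # intercept-slope covariance

def var_mathach(ses):
    """Model-implied variance of the response at a given ses value."""
    return tau00 + 2 * ses * tau01 + ses ** 2 * tau11 + sd_resid ** 2

ses_min = -tau01 / tau11                # minimizing value of ses
print(round(ses_min, 3))
print(round(var_mathach(ses_min), 3))
```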
<br />
'''Question 2:'''<br />
<br />
Longitudinal data analysis with mixed models: Consider a mixed model with random intercept and slope with respect to time, T. Suppose that the G matrix is <br />
<br />
::<math><br />
\begin{bmatrix}<br />
\tau_{00} & \tau_{01} \\<br />
\tau_{10} & \tau_{11} <br />
\end{bmatrix}<br />
</math><br />
<br />
:* Find the value of T for which the variance of Y is minimized and the minimum variance. <br />
:* Show that recentering T on this value (if known) turns the G matrix into one with only two free parameters.<br />
:* Sketch a data plot to show the location and value of the minimum standard deviation of lines.<br />
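A sketch of the algebra behind the first two bullets (standard results for the random intercept-and-slope model, offered as a hint rather than a worked solution):<br />
<br />
::<math><br />
\mathrm{Var}(Y \mid T) = \tau_{00} + 2\tau_{01}T + \tau_{11}T^2 + \sigma^2 ,<br />
\qquad<br />
T^{*} = -\frac{\tau_{01}}{\tau_{11}} ,<br />
\qquad<br />
\mathrm{Var}_{\min} = \tau_{00} - \frac{\tau_{01}^2}{\tau_{11}} + \sigma^2 .<br />
</math><br />
Recentering time as <math>T - T^{*}</math> makes the new intercept-slope covariance <math>\tau_{01} + T^{*}\tau_{11} = 0</math>, so the G matrix becomes diagonal with the two free parameters <math>\tau_{00} - \tau_{01}^2/\tau_{11}</math> and <math>\tau_{11}</math>.<br />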
<br />
'''Question 3:'''<br />
<br />
Explain why one would want to add <math>\bar{X}_j</math> or <math>SD_j</math> to a multilevel model.<br />
<br />
'''Question 4:'''<br />
<br />
Discuss the interpretation/ramifications of extreme ICC values in the design of a multilevel study.<br />
<br />
'''Question 5:'''<br />
<br />
Explain the similarities and differences between within effects, between effects and clusters.<br />
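One concrete way to see the within/between distinction is to decompose a level-1 variable into its cluster mean (the between part; the contextual term <math>\bar{X}_j</math> of Question 3 plays this role) and the deviation from that mean (the within part). A minimal sketch in Python with toy numbers (not course data):<br />
```python
def within_between(x, cluster):
    """Split x into cluster means (between component) and
    deviations from the cluster mean (within component)."""
    means = {}
    for xi, c in zip(x, cluster):
        means.setdefault(c, []).append(xi)
    means = {c: sum(v) / len(v) for c, v in means.items()}
    between = [means[c] for c in cluster]
    within = [xi - b for xi, b in zip(x, between)]
    return between, within

x       = [1.0, 3.0, 2.0, 6.0]
cluster = ["a", "a", "b", "b"]
between, within = within_between(x, cluster)
print(between)   # cluster means repeated over members
print(within)    # deviations, summing to zero within each cluster
```
Regressing the response on both components separately is what lets the between-cluster and within-cluster effects differ, which is exactly the situation the contextual-variable models in this course address.<br />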
<br />
'''Question 6:'''<br />
<br />
Discuss/explain the relationship between longitudinal data and hierarchical data with appropriate examples and theory.<br />
<br />
'''Question 7:'''<br />
<br />
Let Σ be symmetric. Show that Σ is positive-definite if and only if there exists a non-singular matrix A such that Σ = AA'.<br />
<br />
'''Question 8:'''<br />
<br />
Why do we study longitudinal and mixed models?<br />
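For the positive-definiteness question above (Σ = AA'), the constructive direction is exactly what a Cholesky factorization delivers; the lme output above even mentions a Log-Cholesky parametrization. A numeric illustration in Python for the 2×2 case (toy matrix, not from the course):<br />
```python
import math

def cholesky2(s):
    """Lower-triangular Cholesky factor A of a symmetric
    positive-definite 2x2 matrix s, so that s = A A'."""
    a11 = math.sqrt(s[0][0])
    a21 = s[1][0] / a11
    a22 = math.sqrt(s[1][1] - a21 ** 2)
    return [[a11, 0.0], [a21, a22]]

sigma = [[4.0, 1.0], [1.0, 3.0]]          # symmetric, positive-definite
A = cholesky2(sigma)
AAt = [[sum(A[i][k] * A[j][k] for k in range(2)) for j in range(2)]
       for i in range(2)]                  # reconstruct A A'
det_A = A[0][0] * A[1][1]                  # triangular: det = product of diagonal
print(AAt, det_A)
```
The converse direction is the usual quadratic-form argument: for non-singular A and x ≠ 0, x'AA'x = ||A'x||² > 0.<br />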
<br />
=== Ryan's Questions ===<br />
<br />
'''Early Chapters/Course Content:'''<br />
<br />
What is Simpson's Paradox and how is it related to HLM? - Please describe the necessary relationships and sketch them accordingly.<br />
<br />
Under what two conditions might someone be completely unconcerned with the possible problems associated with this paradox?<br />
<br />
'''Later Chapters/Course Content:'''<br />
<br />
In Chapter 11 Snijders & Bosker discuss the problem of optimal sample size in order to obtain accurate estimates of the ICC. Explain the reasoning behind the process of optimization for the hierarchical sample (assuming equal cluster sizes with a sample of size ''M'' with ''N'' clusters of size ''n'').<br />
<br />
=== Carrie's Questions ===<br />
'''Early Chapters/Course Content:'''<br />
<br />
We learned about a simulation in which the estimate of the slope from a mixed model (without a contextual effect) was obtained while varying the within-cluster variance (see figure below). What are the implications of this simulation study?<br />
<br />
[[File:MixedModelSimResult.png|350px]]<br />
<br />
'''Later Chapters/Course Content:'''<br />
<br />
Data were obtained from a study of the effects of two treatments (A and B). Symptoms were recorded weekly throughout 12 weeks of active treatment and then for 8 more weeks following termination of treatment. A linear spline model was fit to the data and the following results were obtained:<br />
::<br />
<pre><br />
> fit <- lme( Symptom ~ sp(Weeks)*tx, random=~1+Weeks|id, data = d)<br />
> summary(fit)<br />
Linear mixed-effects model fit by REML<br />
Data: d <br />
AIC BIC logLik<br />
15099.44 15153.18 -7539.719<br />
<br />
Random effects:<br />
Formula: ~1 + Weeks | id<br />
Structure: General positive-definite, Log-Cholesky parametrization<br />
StdDev Corr <br />
(Intercept) 37.921194 (Intr)<br />
Weeks 2.395329 -0.218<br />
Residual 23.285420 <br />
<br />
Fixed effects: Symptom ~ sp(Weeks) * tx <br />
Value Std.Error DF t-value p-value<br />
(Intercept) 131.84484 6.450470 1516 20.43957 0.0000<br />
sp(Weeks)D1(0) -11.21889 0.508022 1516 -22.08346 0.0000<br />
sp(Weeks)C(10).1 18.24584 0.571104 1516 31.94835 0.0000<br />
txB -2.61621 9.122343 78 -0.28679 0.7750<br />
sp(Weeks)D1(0):txB 1.55017 0.718452 1516 2.15765 0.0311<br />
sp(Weeks)C(10).1:txB -4.83791 0.807664 1516 -5.99001 0.0000<br />
Correlation: <br />
(Intr) sp(W)D1(0) sp(W)C(10).1 txB s(W)D1(0):<br />
sp(Weeks)D1(0) -0.371 <br />
sp(Weeks)C(10).1 0.256 -0.604 <br />
txB -0.707 0.262 -0.181 <br />
sp(Weeks)D1(0):txB 0.262 -0.707 0.427 -0.371 <br />
sp(Weeks)C(10).1:txB -0.181 0.427 -0.707 0.256 -0.604 <br />
<br />
Standardized Within-Group Residuals:<br />
Min Q1 Med Q3 Max <br />
-4.109811205 -0.593593851 -0.002546668 0.654503092 3.122385555 <br />
<br />
Number of Observations: 1600<br />
Number of Groups: 80 <br />
</pre><br />
<br />
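Before attempting the questions, it may help to tabulate the segment slopes implied by the fixed effects. This assumes `sp(Weeks)` is a linear spline with a knot at week 10, with `D1(0)` the slope at week 0 and `C(10).1` the change in slope at the knot; check that reading against the course notes. The arithmetic, in Python:<br />
```python
# Fixed effects read off the summary above (treatment A is the reference)
slope_pre        = -11.21889   # sp(Weeks)D1(0): slope before the knot
slope_change     = 18.24584    # sp(Weeks)C(10).1: change in slope at the knot
slope_pre_txB    = 1.55017     # sp(Weeks)D1(0):txB: shift in pre-knot slope for B
slope_change_txB = -4.83791    # sp(Weeks)C(10).1:txB: shift in slope change for B

a_pre  = slope_pre
a_post = slope_pre + slope_change
b_pre  = slope_pre + slope_pre_txB
b_post = b_pre + slope_change + slope_change_txB

print("A:", round(a_pre, 2), "then", round(a_post, 2))
print("B:", round(b_pre, 2), "then", round(b_post, 2))
```
Under this reading, symptoms in both arms fall before week 10 and rise afterward, with arm B rising more slowly after the knot.<br />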
:* Interpret the coefficients in the model.<br />
:* Sketch the predicted trajectories.<br />
:* Bonus: How would you test whether there is a significant difference in the predicted symptom score at week 10 or week 18?</div>Smithce
http://scs.math.yorku.ca/index.php/File:MixedModelSimResult.pngFile:MixedModelSimResult.png2012-07-17T04:07:49Z<p>Smithce: MATH 6643 Summer 2012
Image taken from slides Mixed Models with R: Longitudinal Data Analysis with Mixed Models
describing result of simulation study for mixed model slope coefficient (without contextual variable)</p>
<hr />
<div>MATH 6643 Summer 2012<br />
Image taken from slides Mixed Models with R: Longitudinal Data Analysis with Mixed Models<br />
describing result of simulation study for mixed model slope coefficient (without contextual variable)</div>Smithcehttp://scs.math.yorku.ca/index.php/MATH_6643_Summer_2012_Applications_of_Mixed_Models/Submitted_sample_exam_questionsMATH 6643 Summer 2012 Applications of Mixed Models/Submitted sample exam questions2012-07-17T03:38:23Z<p>Smithce: </p>
<hr />
<div>'''Question 1:'''<br />
<br />
Consider the following output:<br />
::<br />
<pre><br />
> head(hs)<br />
school mathach ses Sex Minority Size Sector PRACAD DISCLIM<br />
1 1317 12.862 0.882 Female No 455 Catholic 0.95 -1.694<br />
2 1317 8.961 0.932 Female Yes 455 Catholic 0.95 -1.694<br />
3 1317 4.756 -0.158 Female Yes 455 Catholic 0.95 -1.694<br />
4 1317 21.405 0.362 Female Yes 455 Catholic 0.95 -1.694<br />
5 1317 20.748 1.372 Female No 455 Catholic 0.95 -1.694<br />
6 1317 18.362 0.132 Female Yes 455 Catholic 0.95 -1.694<br />
> fit <- lme( mathach ~ ses * cvar(ses,school), hs, <br />
+ random = ~ 1 + ses|school)<br />
> summary(fit)<br />
Linear mixed-effects model fit by REML<br />
Data: hs <br />
AIC BIC logLik<br />
12846.85 12891.54 -6415.423<br />
<br />
Random effects:<br />
Formula: ~1 + ses | school<br />
Structure: General positive-definite, Log-Cholesky parametrization<br />
StdDev Corr <br />
(Intercept) 1.6293867 (Intr)<br />
ses 0.6614903 -0.469<br />
Residual 6.1109156 <br />
<br />
Fixed effects: mathach ~ ses * cvar(ses, school) <br />
Value Std.Error DF t-value p-value<br />
(Intercept) 12.681917 0.3054760 1935 41.51526 0.0000<br />
ses 2.243374 0.2416545 1935 9.28339 0.0000<br />
cvar(ses, school) 3.687892 0.7699000 38 4.79009 0.0000<br />
ses:cvar(ses, school) 0.873953 0.5771829 1935 1.51417 0.1301<br />
Correlation: <br />
(Intr) ses cv(,s)<br />
ses -0.188 <br />
cvar(ses, school) 0.022 -0.261 <br />
ses:cvar(ses, school) -0.258 0.065 0.014<br />
<br />
Standardized Within-Group Residuals:<br />
Min Q1 Med Q3 Max <br />
-3.2291287 -0.7433282 0.0306118 0.7770370 2.6906899 <br />
<br />
Number of Observations: 1977<br />
Number of Groups: 40 <br />
<br />
</pre><br />
<br />
:* Identify the expression of the model in mathematical formula and the usual model assumption.<br />
:* Find out the variances of the group effects, residuals and the response.<br />
:* Sketch the estimated response function for a school with mean ses of 0.<br />
:* For what value of ses is the variance of mathach estimated to be minimized.<br />
<br />
'''Question 2:'''<br />
<br />
Longitudinal data analysis with mixed models: Consider a mixed model with random intercept and slope with respect to time, T. Suppose that the G matrix is <br />
<br />
::<math><br />
\begin{bmatrix}<br />
\tau_{00} & \tau_{01} \\<br />
\tau_{10} & \tau_{11} <br />
\end{bmatrix}<br />
</math><br />
<br />
:* Find the value of T for which the variance of Y is minimized, and the corresponding minimum variance.<br />
:* Show that recentering T on this value (if known) turns the G matrix into one with only two free parameters.<br />
:* Sketch a data plot to show the location and value of the minimum standard deviation of lines.<br />
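: A sketch of the first two parts, using the G matrix above (with <math>\sigma^2</math> the residual variance and <math>\tau_{01}=\tau_{10}</math>):<br />
:: <math>\operatorname{Var}(Y \mid T)=\tau_{00}+2T\tau_{01}+T^{2}\tau_{11}+\sigma^{2},</math><br />
: which is minimized at <math>T^{*}=-\tau_{01}/\tau_{11}</math>, with minimum value <math>\tau_{00}-\tau_{01}^{2}/\tau_{11}+\sigma^{2}</math>. Recentering with <math>\tilde{T}=T-T^{*}</math> makes the new intercept–slope covariance zero, so the G matrix becomes diagonal with only the two free parameters <math>\tau_{00}-\tau_{01}^{2}/\tau_{11}</math> and <math>\tau_{11}</math>.<br />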
<br />
'''Question 3:'''<br />
<br />
Explain why one would want to add <math>\bar{X}_j</math> or <math>SD_j</math> to a multilevel model.<br />
<br />
'''Question 4:'''<br />
<br />
Discuss the interpretation/ramifications of extreme ICC values in the design of a multilevel study.<br />
<br />
'''Question 5:'''<br />
<br />
Explain the similarities and differences between within effects, between effects and clusters.<br />
<br />
'''Question 6:'''<br />
<br />
Discuss/explain the relationship between longitudinal data and hierarchical data, with appropriate examples and theory.<br />
<br />
'''Question 7:'''<br />
<br />
Let Σ be symmetric. Show that Σ is positive-definite if and only if there exists a non-singular matrix A such that Σ = AA'.<br />
<br />
'''Question 8:'''<br />
<br />
Why do we study longitudinal and mixed models?<br />
<br />
=== Ryan's Questions ===<br />
<br />
'''Early Chapters/Course Content:'''<br />
<br />
What is Simpson's Paradox and how is it related to HLM? - Please describe the necessary relationships and sketch them accordingly.<br />
<br />
Under what two conditions might someone be completely unconcerned with the possible problems associated with this paradox?<br />
<br />
'''Later Chapters/Course Content:'''<br />
<br />
In Chapter 11 Snijders & Bosker discuss the problem of optimal sample size in order to obtain accurate estimates of the ICC. Explain the reasoning behind the process of optimization for the hierarchical sample (assuming equal cluster sizes with a sample of size ''M'' with ''N'' clusters of size ''n'').<br />
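For comparison, the analogous closed-form result for estimating a population mean under a budget constraint is easy to compute (a sketch; c1 and c2 are assumed per-level-1-unit and per-cluster costs, rho the intraclass correlation):

```r
# Optimal cluster size n* = sqrt((c2/c1) * (1 - rho)/rho); the number of
# clusters N then follows from the budget: B = N * (c2 + c1 * n).
opt_n <- function(c1, c2, rho) sqrt((c2 / c1) * (1 - rho) / rho)
n_star <- opt_n(c1 = 1, c2 = 10, rho = 0.1)
round(n_star, 1)   # sqrt(90), about 9.5 units per cluster
```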
<br />
=== Carrie's Questions ===<br />
'''Early Chapters/Course Content:'''<br />
<br />
<br />
'''Later Chapters/Course Content:'''<br />
Data were obtained on the effects of two treatments (A and B). Symptoms were recorded weekly throughout 12 weeks of active treatment and then for 8 more weeks following termination of treatment. A linear spline model was fit to the data and the following results were obtained:<br />
::<br />
<pre><br />
> fit <- lme( Symptom ~ sp(Weeks)*tx, random=~1+Weeks|id, data = d)<br />
> summary(fit)<br />
Linear mixed-effects model fit by REML<br />
Data: d <br />
AIC BIC logLik<br />
15099.44 15153.18 -7539.719<br />
<br />
Random effects:<br />
Formula: ~1 + Weeks | id<br />
Structure: General positive-definite, Log-Cholesky parametrization<br />
StdDev Corr <br />
(Intercept) 37.921194 (Intr)<br />
Weeks 2.395329 -0.218<br />
Residual 23.285420 <br />
<br />
Fixed effects: Symptom ~ sp(Weeks) * tx <br />
Value Std.Error DF t-value p-value<br />
(Intercept) 131.84484 6.450470 1516 20.43957 0.0000<br />
sp(Weeks)D1(0) -11.21889 0.508022 1516 -22.08346 0.0000<br />
sp(Weeks)C(10).1 18.24584 0.571104 1516 31.94835 0.0000<br />
txB -2.61621 9.122343 78 -0.28679 0.7750<br />
sp(Weeks)D1(0):txB 1.55017 0.718452 1516 2.15765 0.0311<br />
sp(Weeks)C(10).1:txB -4.83791 0.807664 1516 -5.99001 0.0000<br />
Correlation: <br />
(Intr) sp(W)D1(0) sp(W)C(10).1 txB s(W)D1(0):<br />
sp(Weeks)D1(0) -0.371 <br />
sp(Weeks)C(10).1 0.256 -0.604 <br />
txB -0.707 0.262 -0.181 <br />
sp(Weeks)D1(0):txB 0.262 -0.707 0.427 -0.371 <br />
sp(Weeks)C(10).1:txB -0.181 0.427 -0.707 0.256 -0.604 <br />
<br />
Standardized Within-Group Residuals:<br />
Min Q1 Med Q3 Max <br />
-4.109811205 -0.593593851 -0.002546668 0.654503092 3.122385555 <br />
<br />
Number of Observations: 1600<br />
Number of Groups: 80 <br />
</pre><br />
<br />
:* Interpret the coefficients in the model.<br />
:* Sketch the predicted trajectories.<br />
:* Bonus: How would you test whether there is a significant difference in the predicted symptom score at week 10 or week 18?</div>
MATH 6643 Summer 2012 Applications of Mixed Models/Snijders and Bosker: Discussion Questions (Smithce, 2012-07-12)
<hr />
<div>== Chapter 5: The Hierarchical Linear Model ==<br />
=== Matthew ===<br />
At the bottom of page 83, Snijders and Bosker outline the process for probing interactions between two level-one variables, and how there can be four possibilities for modeling them. If a researcher were to include all four, discuss how each would be interpreted. What might a good selection strategy be if our model had substantially more than two variables?<br />
<br />
=== Qiong === <br />
If we do not have any information about the data set, how do we choose a level-two variable to predict the group-dependent regression coefficients? After we choose the level-two variable z, how do we interpret the cross-level interaction term?<br />
<br />
=== Carrie === <br />
A client arrives with a random slope and intercept model using IQ as a predictor. IQ was measured on the traditional scale with a mean of 100 and standard deviation of 15. What should the client keep in mind about the interpretation of the variance of the intercept and covariance of the slope-intercept?<br />
: This raises the interesting question of how the variance of the random intercept and the covariance of the random intercept with the random slope are changed under a recentering of IQ. Let <br />
:: <math>Var\left[ \begin{matrix}<br />
{{u}_{0j}} \\<br />
{{u}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2} & {{\tau }_{01}} \\<br />
{{\tau }_{10}} & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math> for raw IQ. <br />
: If we recenter IQ with: <math>\tilde{\text{IQ}}=\text{IQ}-c</math> then:<br />
:: <math>\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
1 & c \\<br />
0 & 1 \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
{{u}_{0j}} \\<br />
{{u}_{1j}} \\<br />
\end{matrix} \right]</math><br />
: and<br />
:: <math>Var\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
1 & c \\<br />
0 & 1 \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
\tau _{0}^{2} & {{\tau }_{01}} \\<br />
{{\tau }_{10}} & \tau _{1}^{2} \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
1 & 0 \\<br />
c & 1 \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}+2c{{\tau }_{01}}+{{c}^{2}}\tau _{1}^{2} & {{\tau }_{01}}+c\tau _{1}^{2} \\<br />
{{\tau }_{10}}+c\tau _{1}^{2} & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math><br />
<br />
: If <math>c=-{{\tau }_{01}}/\tau _{1}^{2}</math>, then the variance of the intercept is minimized:<br />
::<math>Var\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}-\tau _{01}^{2}/\tau _{1}^{2} & 0 \\<br />
0 & \tau _{1}^{2} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}\left( 1-\rho _{01}^{2} \right) & 0 \\<br />
0 & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math><br />
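: A quick numerical check of this algebra, with made-up values for the τ parameters:<br />

```r
# Verify that recentering at c = -tau01/tau11 zeroes the covariance and
# leaves intercept variance tau00 - tau01^2/tau11 (illustrative numbers).
tau  <- matrix(c(4.0, -1.2,
                -1.2,  0.9), nrow = 2)   # G matrix for raw IQ
c0   <- -tau[1, 2] / tau[2, 2]           # recentering constant
A    <- matrix(c(1, 0, c0, 1), nrow = 2) # the [1 c; 0 1] transformation
tauc <- A %*% tau %*% t(A)               # G matrix after recentering
tauc                                     # off-diagonal is 0
```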
<br />
=== Gilbert === <br />
In chapter 5, they talk about hierarchical linear model where fixed effects and random effects are taken into consideration. Discuss a clear simple example in class which shows both effects and give interpretations of each of the coefficients and their use in real life.<br />
=== Daniela === <br />
In chapter 5, they talked about mostly about the two-level nesting structure. Can we have a bigger example with at least 4 levels that includes the random intercept and slope and how to apply this into R coding?<br />
: In 'lme', multilevel nesting is handled with nested grouping factors in the 'random' formula, e.g.<br />
fit <- lme( Y ~ X * W, dd, random = ~ 1 | idtop/idmiddle/idsmall)<br />
: Contextual variables present an ambiguity. Assuming that the id variables 'idsmall' and 'idmiddle' are coded uniquely overall, then the higher-level (say 'idmiddle') contextual variables could be coded as either:<br />
cvar(X,idmiddle)<br />
: or<br />
capply( dd , ~ id, with, mean( c(tapply( X, idsmall, mean))))<br />
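: The distinction can be sketched in base R (cvar and capply are spida functions; I assume here that cvar(X, idmiddle) is the observation-weighted mean of X within idmiddle, while the capply version is the unweighted mean of the idsmall means):<br />

```r
# Toy data: two middle-level units containing idsmall clusters of unequal size.
dd <- data.frame(idmiddle = c("m1", "m1", "m1", "m2", "m2"),
                 idsmall  = c("s1", "s1", "s2", "s3", "s4"),
                 X        = c(1, 3, 5, 2, 6))
# (a) observation-weighted: mean of X over all rows in the idmiddle unit
dd$ctx_obs <- ave(dd$X, dd$idmiddle)
# (b) cluster-weighted: mean of the idsmall means within each idmiddle unit
sm  <- tapply(dd$X, dd$idsmall, mean)                    # per-idsmall means
map <- tapply(dd$idmiddle, dd$idsmall, function(x) x[1]) # idsmall -> idmiddle
dd$ctx_cl <- as.vector(tapply(sm, map, mean)[dd$idmiddle])
dd   # ctx_obs: 3 3 3 4 4;  ctx_cl: 3.5 3.5 3.5 4 4
```

: The two definitions differ whenever the idsmall clusters are of unequal size.<br />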
:Here is a table prepared for SPIDA showing how to handle multilevel nesting and crossed structures in a selection of R functions:<br />
<blockquote><br />
<br />
::{| border="1" cellpadding="4"<br />
|-<br />
! Function !! Notes <br />
|-<br />
| lme<br><br />
in package nlme<br />
| Linear mixed effects: normal response<br><br />
G side and R side modelling<br><br />
Model syntax:<br><br />
Y ~ X * W, random = ~ 1 + X | id<br><br />
For nested effect:<br><br />
Y ~ X * W, random = ~ 1 + X | higher/lower<br><br />
or, to have different models at different levels:<br><br />
Y ~ X * W, random = list(higher = ~ 1, lower = ~ 1 + X )<br><br />
<br />
|-<br />
| lmer <br><br />
in package lme4<br />
| Linear mixed models for gaussian response with Laplace approximation <br><br />
G side modeling only, R = <math>\sigma^2 I</math><br><br />
Model syntax: <br><br />
Y ~ X * W +(1+X|id)<br><br />
For nested effect:<br><br />
Y ~ X * W +(1+X|higher) + (1+X|higher:lower)<br> <br />
For crossed effect:<br><br />
Y ~ X * W +(1+X|id1) + (1+X|id2)<br> <br />
|- <br />
| glmer <br><br />
in package lme4<br />
| Generalized linear mixed models with adaptive Gaussian quadrature <br><br />
* family: binomial, Gamma, inverse.gaussian, poisson, gaussian<br><br />
G side only, no R side<br><br />
Model syntax: <br><br />
Y ~ X * W +(1+X|id)<br><br />
For nested effect:<br><br />
Y ~ X * W +(1+X|higher) + (1+X|higher:lower)<br> <br />
For crossed effect:<br><br />
Y ~ X * W +(1+X|id1) + (1+X|id2)<br> <br />
<br />
|- <br />
| glmmPQL<br><br />
in packages MASS/nlme<br />
| Generalized linear mixed models with Penalized Quasi Likelihood <br><br />
* family: binomial, Gamma, inverse.gaussian, poisson, gaussian<br><br />
G side and R side as in lme<br><br />
Model syntax: <br><br />
Y ~ X * W, random = ~ 1 + X | id<br><br />
For nested effect:<br><br />
Y ~ X * W, random = ~ 1 + X | higher/lower<br><br />
|-<br />
| MCMCglmm<br><br />
in package MCMCglmm<br />
| Generalized linear mixed models with MCMC <br><br />
* family: poisson, categorical, multinomial, ordinal, exponential, geometric, cengaussian, cenpoisson,<br />
cenexponential, zipoisson, zapoisson, ztpoisson, hupoisson, zibinomial (cen=censored, zi=zero-inflated, za=zero-altered, hu=hurdle)<br><br />
G side and R side, R side different from 'lme': no autocorrelation but can be used for multivariate response <br><br />
Note: 'poisson' potentially overdispersed by default (good), 'binomial' variance for binary variables is unidentified.<br> <br />
Model syntax: <br><br />
Y ~ X * W, random = ~ us(1 + X):id [Note: id should be a factor, us=unstructured]<br />
For nested effect:<br><br />
Y ~ X * W, random = ~us(1 + X):higher + us(1 + X):higher:lower<br> <br />
For crossed effect:<br><br />
Y ~ X * W, random = ~us(1 + X):id1+ us(1 + X):id2<br> <br />
|-<br />
|}<br />
<br />
</blockquote><br />
<br />
=== Phil ===<br />
<br />
Excluding fixed effects that are non-significant is common practice in regression analyses, and Snijders and Bosker follow this practice when simplifying the model in Table 5.3 to the model found in Table 5.4. While this practice is used to help make the model more parsimonious, it can ignore the joint effect that these variables have on the model as a whole. Discuss alternative criteria that one should explore when determining whether a predictor should be excluded from the model.<br />
<br />
=== Ryan ===<br />
When using random slopes it is generally the case that the level 1 model contains a fixed effect for what will also be the level 2 random effect. The random effect is then an estimate of the group/cluster/individual departure from the fixed effect. However, non-significant level 1 variables are not determinable as different from zero. Are there cases where a non-significant fixed effect can be excluded from the model while retaining the random effect at level 2? What would be the consequence of this, and what might it reveal about the level 1 variable? Would this help control for the '''error of excluding a non-significant but confounding variable'''?<br />
:: This is a very interesting question. It would be interesting to create a simulated data set illustrating the issue so we could consider the consequences of having random effects for a confounding factor whose within-cluster effect changes sign from cluster to cluster. Can we think of a confounding factor that would do that?<br />
<br />
=== and others ===<br />
<br />
== Chapter 6: Testing and Model Specification ==<br />
=== What is random? ===<br />
At the beginning of the chapter, S&B present two models (Table 6.1). They note that "The variable with the random slope is in both models the grand-mean centered variable IQ". In R, would the random side look like random = ~ (IQ - IQbar)|school, even though (IQ - IQbar) isn't in the fixed part of the second model? How is this different in interpretation from random = ~ (group-centered IQ - IQbar)|school?<br />
<br />
* NOTE: This question is not necessarily about how to specify a model using nlme, but rather about the terms included in the random part of the model. As a test, I ran two models:<br />
<br />
IQ_dev <- mlbook_red$IQ_verb - mlbook_red$sch_iqv<br />
<br />
mlb612a <- nlme:::lme(langPOST ~ IQ_dev + sch_iqv + ses, random =~ IQ_verb|schoolnr, data = mlbook_red, method="ML")<br />
summary(mlb612a)<br />
<br />
mlb612b <- nlme:::lme(langPOST ~ IQ_dev + sch_iqv + ses, random =~ IQ_dev|schoolnr, data = mlbook_red, method="ML")<br />
summary(mlb612b)<br />
<br />
Note that the basic IQ_verb variable has been grand mean centered. According to the chapter, it sounds like using IQ_verb in the random portion is OK since it is a linear combination of IQ_dev and sch_iqv (the school means of IQ_verb). However, if I compare it against a second model using IQ_dev in the random part, pretty much all of the coefficients change. Is this expected?<br />
<br />
--[[User:Msigal|Msigal]] 12:07, 13 June 2012 (EDT)<br />
<br />
=== Qiong === <br />
The authors introduce the t-test for testing fixed parameters. We can use summary(model) to get the p-value directly in R. In practice, we use the Wald test to test fixed parameters. On page 95, they mention that "for 30 or more groups, the Wald tests of fixed effects using REML standard errors have reliable type I error rates." Is this the only reason we use the Wald test in practice?<br />
=== Carrie === <br />
On page 105, in their discussion of modeling within-group variability, the authors warn to "keep in mind that including a random slope normally implies inclusion of the fixed effect!". What is an example of a model where one might include a random slope without a fixed effect?<br />
=== Gilbert === <br />
On page 104, the authors discuss different approaches to model selection, including working upward from level one and joint consideration of both levels one and two. Are there other methods to use if neither approach provides a satisfactory model?<br />
=== Daniela === <br />
On pages 97 and 99, the authors show how to test for a random intercept and for a random slope independently. What test would you use if you wanted to test for both a random slope and a random intercept in the same model? And what would you test it against: the linear model, or a model with just a random slope or a random intercept?<br />
=== Phil ===<br />
<br />
Multi-parameter tests are possible for fixed effects, but can they also be applied to predicted random effects? If so, what would be the analog to <math>\hat{\gamma}' \hat{\Sigma}_{\gamma}^{-1} \hat{\gamma}</math>, used to find the Wald statistic, and how do we find an appropriate denominator df for an F test?<br />
<br />
: Question: Just to be sure, I presume you mean a BLUP? Answer: Yes I do :)<br />
<br />
=== Ryan ===<br />
In Chapter 6 Snijders and Bosker suggest that using deviance tests to evaluate fixed effects in a multilevel model is inappropriate if the estimation method is REML. What is the characteristic of REML vs ML that makes this type of model evaluation incorrect?<br />
:Comment: Likelihood has the form L(data|theta) and is used for inference on theta, for example, by comparing the altitude of the log-likelihood at theta.hat (the summit) compared with the altitude at a null hypothetical value theta.0. This is the basis of deviance tests: Deviance = 2*(logLik(data|theta.hat) - logLik(data|theta.0)).<br />
<br />
: With ML the log-likelihood is logLik( y | beta, G, R) and we can use the likelihood for inference on all of the parameters. With REML, the data are not the original y but the residuals of y on X, say e, and the likelihood is a function of G and R only: logLik( e | G, R). Since 'beta' does not appear in this likelihood, the likelihood cannot be used to answer any questions about 'beta'.<br />
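: A small illustration with nlme's built-in Orthodont data (a sketch; any nested pair of fixed-effects models would do):<br />

```r
library(nlme)
# Likelihood-ratio (deviance) tests of FIXED effects require ML fits:
f1 <- lme(distance ~ age + Sex, data = Orthodont,
          random = ~ 1 | Subject, method = "ML")
f0 <- lme(distance ~ age, data = Orthodont,
          random = ~ 1 | Subject, method = "ML")
anova(f0, f1)   # valid comparison of the nested fixed parts
# Under REML the "data" are residuals from different X matrices, so the two
# log-likelihoods are not comparable; anova() on such fits is misleading
# (nlme warns when the fixed effects differ between REML fits).
```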
<br />
=== and others ===<br />
<br />
== Chapter 7: How Much Does the Model Explain? ==<br />
=== Explained Variance in Random Slope Models ===<br />
Looking at the proportion of variance explained by a model in a traditional ANOVA/multiple regression framework is something clients are often extremely interested in. In Chapter 7, Snijders and Bosker discuss how we might approach the issue in MLM. Near the end of the chapter, the authors insinuate that getting an estimate of the amount of explained variance in a random effects (intercept and slope) model is a somewhat tedious endeavour.<br />
<br />
The claim is that random slopes don't change prediction very much, so if we re-estimate the model using only random intercepts (no random slopes), this will "normally yield [predicted] values that are very close to values for the random slope models" (p. 114). This statement doesn't quite ring true for me, as in our examples the differences in slope between schools have been fairly striking/substantial.<br />
<br />
Is the authors' statement justifiable? Is obtaining an <math>R^2</math> as important/interesting in MLM as it is in other models?<br />
<br />
=== Explained variance in three-level models === <br />
In example 7.1, we see how to calculate the explained variance of a level-one variable when this variable has a fixed effect only. How do we calculate the explained variance of a level-one variable when it has both a fixed and a random effect in the model?<br />
<br />
=== Interpreting <math>R^2</math> as an Effect Size ===<br />
A client fits a multilevel model and comes up with several significant predictors. The client is pleased with themselves, but remembers learning that significance alone isn't good enough these days, and needs help producing a measure of effect size. You compute the level-one <math>R^2</math> and come up with a very small value, say 0.01. Is the model then worthless, even if the magnitude of the predicted change in the outcome is substantively meaningful?<br />
<br />
=== Explained variance === <br />
In the example provided on page 110, it show that the residual variance at level two increases as within-group deviation is added as an explanatory variable to the model in balanced as well as in the unbalanced case. Is this always the case or it is only for this particular example?<br />
<br />
=== Estimates of <math>R^2</math>=== <br />
On page 113: if "it is observed that an estimated value for <math>R^2</math> becomes smaller by the addition of a predictor variable, or larger by the deletion of a predictor variable, there are two possibilities: either this is a chance fluctuation, or the larger model is misspecified." The authors then say that whether the first or second possibility is more likely depends on the size of the change in <math>R^2</math>. Can you give an example of when each occurs, based on the size of the change in <math>R^2</math>?<br />
<br />
=== Predicted <math>R^2</math> ===<br />
After predicting values for random intercepts and slopes using Bayesian methods it is possible to form composite values, <math>\hat{Y}_{ij}</math>, to predict the observed dependent values, <math>Y_{ij}</math>. Obviously these estimates will be sub-optimal as they will suffer from 'shrinkage' effects, but they may be useful for computing a '<math>R^2_{Predicted}</math>'. Discuss situations where knowledge of the predicted slopes and intercepts could be important, and whether an <math>R^2_{Predicted}</math> could be a useful description.<br />
<br />
=== The Size and Direction of <math>R^2</math> Change as a Diagnostic Criterion ===<br />
The suggestion has been made that changes in <math>R^2</math> in which the addition or deletion of a variable creates an unexpected, opposing directional change can serve as a diagnostic for determining where the flaw in the model resides. However, the authors do not actually indicate in which scenarios the size and direction information obtained from the <math>R^2</math> estimate determines the source of the flaw. 'Wrong' directions provide evidence of model misspecification, but what then of the magnitude component mentioned just prior? (p. 113)<br />
<br />
== Chapter 8: Heteroscedasticity ==<br />
You can sign your contributions with <nowiki>--~~~~</nowiki> --[[User:Georges|Georges]] 07:50, 14 June 2012 (EDT)<br />
=== "Correlates of diversity" ===<br />
<br />
Provide an example illustrating how level-two variables can be associated with level-one heteroscedasticity.<br />
--[[User:Gilbert8|Gilbert8]] 11:26, 16 June 2012 (EDT)<br />
<br />
=== "Modeling Heteroscedasticity" ===<br />
<br />
When Snijders and Bosker say they are "modeling heteroscedasticity", is this simply incorporating more random slopes into the model? For instance, on page 127, they added a fixed effect for SA-SES (the school average of SES) and a random slope for it. What kind of plots would let us see if these inclusions are necessary? --[[User:Msigal|Msigal]] 11:26, 18 June 2012 (EDT)<br />
<br />
=== Linear or quadratic variance functions ===<br />
<br />
The level-one residual variance can be expressed as a linear or quadratic function of some variables. How do we decide on the functional form?<br />
Can we say that if the variables have a random effect we use a quadratic form, and otherwise a linear form? Is it the same for the intercept residual variance?<br />
<nowiki>--~~~~</nowiki> --[[User:Qiong Li|Qiong Li]] 12:25, 18 June 2012 (EDT)<br />
<br />
=== On a practical note - How is this done?? ===<br />
It appears that in order to fit a model with a linear/quadratic function for the variance the authors had to use MLwiN. Are there other ways to accomplish this? Could we talk a little about what their demo code is accomplishing? [http://www.stats.ox.ac.uk/~snijders/ch8.r | S&B ExCode] --[[User:Smithce|Smithce]] 14:15, 19 June 2012 (EDT)<br />
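One route in R (a sketch, not necessarily what S&B's MLwiN code does) is nlme's 'weights' argument, which models the level-one variance as a function of a covariate via variance-function classes such as varFixed, varPower, and varIdent:

```r
library(nlme)
# Let the level-one SD vary as a power of age: sd(e_ij) = sigma * |age|^delta,
# illustrated on nlme's built-in Orthodont data.
fit.het <- lme(distance ~ age + Sex, data = Orthodont,
               random = ~ 1 | Subject,
               weights = varPower(form = ~ age))
summary(fit.het)$modelStruct$varStruct   # estimated power delta
```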
<br />
=== Variable centering ===<br />
<br />
Since fixed effects variables can be included in the R matrix to model systematic heteroscedasticity discuss the effects of centering variables in this context. Does centering affect the estimation results or numerical stability? --[[User:Rphilip2004|Rphilip2004]] 13:42, 19 June 2012 (EDT)<br />
<br />
What happens to the variance function when there are more than two levels? Do we still only have to choose from linear and quadratic forms, or does it become more complicated? --[[User:Dusvat|Dusvat]] 18:43, 18 June 2012 (EDT)<br />
<br />
=== Generic regression question: Treating a factor as continuous for the interaction ===<br />
On page 126, in Model 3 (described on page 124), the authors treat SES as a factor for the main effects but then, to keep the number of interaction terms down, treat it as numeric in the interactions with other variables. This seems like it could come in useful; are there any caveats we should be aware of in using this technique? --[[User:Smithce|Smithce]] 14:15, 19 June 2012 (EDT)<br />
<br />
=== Generalized Linear Models and Heteroscedasticity ===<br />
<br />
Given a linear model that has level-one heteroscedasticity related to multiple level-one predictors, does this not mean that the heteroscedasticity can be thought of as related to the overall mean response? The residuals of a heteroscedastic model would then become functions of the mean response. Generalized linear models often model the variance as a function of the mean response (e.g., Poisson, Gamma, Negative Binomial). When might it be appropriate to abandon a direct linear relationship in favour of a generalized linear model (which at times retains the desired additive linear properties in the linear predictor) to deal with heteroscedastic issues? Is this even possible? --[[User:Rbarnhar|Rbarnhar]] 14:24, 19 June 2012 (EDT)<br />
<br />
== Chapter 9: Missing Data ==<br />
<br />
=== "Imputation" ===<br />
<br />
1. The chapter discusses imputation as a way of filling in missing data to form a complete data set. Are there any other methods that can be used to achieve the same goal? Provide a few examples.<br />
<br />
--[[User:Gilbert8|Gilbert8]] 11:38, 18 June 2012 (EDT)<br />
<br />
=== Patterns of Missingness ===<br />
Snijders and Bosker go to some lengths to explain the difference between MCAR, MAR, and MNAR. However, I felt they somewhat glossed over a definition of monotone missingness. What is monotone missingness? How would one check for it, especially in terms of a multilevel model? --[[User:Msigal|Msigal]] 08:47, 20 June 2012 (EDT)<br />
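One way to make the definition concrete: missingness is monotone if the variables can be ordered so that whenever one is missing, all later ones are missing too (the classic dropout pattern in longitudinal data). A base-R sketch of a check (the helper name is made up; with tied missing counts this single ordering is only a heuristic):

```r
is_monotone <- function(d) {
  R <- !is.na(d)                               # TRUE = observed
  R <- R[, order(colSums(!R)), drop = FALSE]   # fewest-missing columns first
  # Within each row, once a value is missing all later ones must be too,
  # i.e. the observed-pattern (1/0) is non-increasing left to right.
  all(apply(R, 1, function(r) all(diff(as.integer(r)) <= 0)))
}
d1 <- data.frame(x = 1:4, y = c(1, 2, 3, NA), z = c(1, 2, NA, NA))
d2 <- data.frame(x = c(1, NA, 3), y = c(NA, 2, 3))
is_monotone(d1)   # TRUE  (dropout-style pattern)
is_monotone(d2)   # FALSE
```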
<br />
If we use a missingness indicator and predict this using a logistic regression model, does this mean that significant predictors should be kept in the imputation model and non-significant predictors can be omitted from the imputation model? --[[User:Smithce|Smithce]] 09:58, 21 June 2012 (EDT)<br />
<br />
=== Missingness Assumption ===<br />
<br />
Rubin defined three types of missingness in 1976. When we use methods for handling incomplete data, what information can help us make a reasonable assumption? What is the key point in deciding whether missingness is MCAR or MAR? <nowiki>--~~~~</nowiki> --[[User:Qiong Li|Qiong Li]] 12:31, 20 June 2012 (EDT) <br />
<br />
=== Full Maximum Likelihood vs. Imputation ===<br />
Could you give us an example using both maximum likelihood and imputation methods in R and then compare them? How are the methods similar/different? Is one method computationally better than the other? --[[User:Dusvat|Dusvat]] 17:27, 20 June 2012 (EDT)<br />
<br />
=== Dancing on the Bay(esian) ===<br />
<br />
Imputation seems to be a very Bayesian practice, and the authors mention the intimate connection to Gibbs sampling when imputing data in the univariate case. I wonder, however: if we are willing to impute data in this Bayesian manner, why don't we just jump ship and move to a fully Bayesian methodology? What are the benefits/downsides of initially dancing with the idea of being Bayesian to get our complete data, then being frequentists to fit our models? --[[User:Rphilip2004|Rphilip2004]] 09:18, 21 June 2012 (EDT)<br />
<br />
=== Don't Throw the Kitchen Sink - ICE ===<br />
<br />
While Snijders and Bosker promote the idea of a complex model for imputation, using as many feasible variables as possible, is this always good advice? Should we not consider parsimony as we would in any other form of regression? Should the imputation model also reflect the amount of missing data we are trying to impute?<br />
--[[User:Rbarnhar|Rbarnhar]] 09:59, 21 June 2012 (EDT)<br />
<br />
== Chapter 10: Assumptions of the Hierarchical Linear Model ==<br />
<br />
=== Influence of level-two units ===<br />
Provide a detailed explanation of how '''deletion diagnostics''' are performed and provide a practical example to illustrate them. --[[User:Gilbert8|Gilbert8]] 18:22, 24 June 2012 (EDT)<br />
<br />
=== Incorporating Descriptive Statistics ===<br />
One aside that Snijders and Bosker make in this chapter is about the inclusion of the standard deviation for each group of a relevant level one variable as a fixed effect in the model. This was mentioned within the section on adding contextual variables (p. 155). This strikes me as an interesting prospect. Does modeling the standard deviation have any interpretative benefits over simply using group size? Are there other descriptive statistics pertaining to groups that would be meaningful to add to a model? What would their interpretation be? --[[User:Msigal|Msigal]] 10:12, 25 June 2012 (EDT)<br />
<br />
=== Orders of the model checks ===<br />
<br />
In Chapter 10, the authors introduce a number of things we need to do when we build a mixed model, such as "include contextual effects", "check random effects", "specification of the fixed part", "specification of the random part" and "check the distributional assumptions". When I deal with real data, I am always confused about which of these I should do first and which next. Are there no rules, or is there a better order in which to do these things?<br />
<nowiki>--~~~~</nowiki> --[[User:Qiong Li|Qiong Li]] 2:33, 25 June 2012 (EDT)<br />
<br />
=== Assumption violations ===<br />
In Ch. 10, the authors talk about having the random slope and random intercept uncorrelated with all explanatory variables. If this assumption is incorrect, you can just add relevant explanatory variables to the model. What happens in a real-world example where there are no quantifiable relevant explanatory variables to be added to the model? How would you go about fixing the incorrect assumption? What happens if more than one assumption is violated and you cannot just add other 'descriptive' variables to the model? --[[User:Dusvat|Dusvat]] 17:45, 25 June 2012 (EDT)<br />
<br />
=== Checking for Random Slopes ===<br />
In this chapter (page 155-156) S&B advocate checking for a random slope for each level-one variable in the fixed part. Because the process can be time consuming, they further suggest using one-step methods to obtain provisional estimates and then checking the t-ratio. We have learned that t-tests of this kind are problematic for variance parameters and not to be trusted. Should we then use LRTs to test all possible random slopes, possibly with simulation?<br />
<br />
=== Cluster Size and Model Assumptions ===<br />
<br />
In the practical nuts and bolts of application, one at times encounters situations where the number of clusters in a requested multilevel model is too small to allow the assumption that the random intercepts are normally distributed to be tested or readily examined. How important is it that the assumption of normally distributed intercepts holds? --[[User:Rbarnhar|Rbarnhar]] 01:22, 26 June 2012 (EDT)<br />
<br />
=== Model Residuals ===<br />
<br />
The authors emphasize the importance of using OLS estimation for determining unbiased diagnostics. However, it may be useful to use model implied residuals such as <math>r = y - X \hat{\beta}</math> and <math>r.c = y - X \hat{\beta} - Z \hat{\gamma}</math>. Describe how these model implied residuals can be used to evaluate influential observations at different design levels. --[[User:Rphilip2004|Rphilip2004]] 08:33, 26 June 2012 (EDT)<br />
<br />
== Chapter 11: Designing Multilevel Studies ==<br />
<br />
=== Unequal cluster sample sizes ===<br />
<br />
Usually, we choose the same number n as the sample size of the micro-units and the same number N as the sample size of the macro-units. Can we improve the power of tests, or obtain smaller standard errors, by choosing different sample sizes in different groups? <nowiki>--~~~~</nowiki> --[[User:Qiong Li|Qiong Li]] 12:11, 27 June 2012 (EDT)<br />
: Comment: The main situation where it seems obvious to me that one would consider unequal cluster sizes by design would be to estimate a correlation parameter in the R matrix. --[[User:Georges|Georges]] 18:31, 28 June 2012 (EDT)<br />
<br />
=== Allocating treatment to groups or individuals ===<br />
<br />
The authors mention that multisite trials are difficult to implement, as the researcher has to ensure there will be no contamination between the two treatment conditions within the same site, while cluster randomized trials may lead to selection bias. The authors propose the '''pseudo-cluster randomized trial'''. Explain how this method is performed. Is it used often in practice? If this technique fails, are there other methods to resolve those situations? --[[User:Gilbert8|Gilbert8]] 15:04, 27 June 2012 (EDT)<br />
<br />
=== Treatment Allocation, Continued ===<br />
<br />
Building upon Gilbert's question, discuss the differences between cluster randomized trials, multisite trials, and pseudo-cluster randomized trials. Do any of these strategies match the Schizophrenia dataset that we have been working with during Lab 2? --[[User:Msigal|Msigal]] 18:51, 27 June 2012 (EDT)<br />
<br />
Continuing Matt's question about the differences between cluster randomized trials, multisite trials, and pseudo-cluster randomized trials: can you give us an example of when to pick one over the other, along with how the cost of each trial breaks down under the cost function? Which trial is more costly? --[[User:Dusvat|Dusvat]] 23:17, 27 June 2012 (EDT)<br />
<br />
=== Power Expecting Missingness/Drop Out ===<br />
<br />
I think an interesting addition to the power analyses presented would be to work in estimates of drop-out/non-response within cluster. For example, suppose I estimated that there was only an 80% chance that I would actually get useful/complete data from each level-1 unit (lower if using the URPP ;) How might this be built into these analyses? Are there authors who have done work on this with mixed models? Is it really worth bothering with, given that power analysis is hard enough (e.g. coming up with 'guesses' for estimates) as it is? --[[User:Smithce|Smithce]] 09:40, 28 June 2012 (EDT)<br />
...now that I think about it, if you approximate using constant loss across all clusters rather than trying to model the resulting imbalance, this is pretty easy. So, never mind.<br />
:Comment: Perhaps an easy answer but nevertheless a very good point.<br />
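Under the constant-loss approximation above, the adjustment is a couple of lines of arithmetic. A minimal sketch (in Python rather than R, with a made-up design of 50 clusters of 20 and an assumed ICC of 0.05) that deflates the per-cluster size by the retention rate and recomputes the effective sample size via the usual design effect:<br />

```python
def design_effect(n_per_cluster, icc):
    """Design effect for cluster sampling: DEFF = 1 + (n - 1) * ICC."""
    return 1 + (n_per_cluster - 1) * icc

def effective_total_n(n_clusters, n_per_cluster, icc, retention=1.0):
    """Effective sample size after applying a constant within-cluster retention rate."""
    n_kept = n_per_cluster * retention          # constant loss across all clusters
    deff = design_effect(n_kept, icc)
    return n_clusters * n_kept / deff

# Planned: 50 clusters of 20, ICC = 0.05, but only 80% complete data expected
full = effective_total_n(50, 20, 0.05, retention=1.0)
kept = effective_total_n(50, 20, 0.05, retention=0.8)
print(round(full, 1), round(kept, 1))
```

Note that losing 20% of level-1 units costs less than 20% of the effective sample size, because smaller clusters also have a smaller design effect.<br />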
<br />
=== An Unknown Value of the Intraclass Correlation Coefficient ===<br />
<br />
The authors acknowledge that the ICC is an unknown quantity, but suggest that for the social sciences the value tends to lie between 0.0 and 0.4. These two extremes have very different properties, as the plot on the following page (p. 189) makes clear. The question is not as easily answered as plotting them all, since the graph shows that the values follow different patterns of divergence. An assumed value can lead to very different optimal estimates, especially if one is wrong at the extremes. Are there any better ways to estimate the ICC a priori, in order to avoid issues when optimizing the sample size? --[[User:Rbarnhar|Rbarnhar]] 09:55, 28 June 2012 (EDT)<br />
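One way to see how sensitive the design is to the assumed ICC is through the standard cost-based optimum for the number of units per cluster, n_opt = sqrt((c2/c1)(1 - rho)/rho), where c2 and c1 are the costs per cluster and per unit. A quick sketch (Python, with invented costs) showing how the optimum moves across the 0.0-0.4 range:<br />

```python
import math

def optimal_cluster_size(cost_cluster, cost_unit, icc):
    """Cost-optimal units per cluster: n_opt = sqrt((c2/c1) * (1 - icc) / icc)."""
    return math.sqrt((cost_cluster / cost_unit) * (1 - icc) / icc)

# Sensitivity of the optimum to the assumed ICC, at a 100:1 cost ratio
for icc in (0.05, 0.10, 0.20, 0.40):
    print(icc, round(optimal_cluster_size(100, 1, icc), 1))
```

Within the plausible 0.05-0.40 range, the optimal cluster size roughly quarters, which illustrates why an a priori guess at the ICC matters so much.<br />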
<br />
=== Normal Distribution ===<br />
<br />
Relating to a similar question in the past, how important is it that the various levels are normally distributed when computing power estimates? --[[User:Rphilip2004|Rphilip2004]] 10:08, 28 June 2012 (EDT)<br />
:Comment: Have a look at [http://scs.math.yorku.ca/index.php?title=MATH_6643_Summer_2012_Applications_of_Mixed_Models/Power_by_simulation:_Normal_versus_t_with_5_dfs this attempt to simulate normal errors and errors with a t distribution with 5 degrees of freedom]. One conceptual problem is the concept of effect size. The t distribution with 5 dfs has a standard deviation of about 1.22. The problem is that, with its high kurtosis, your estimate of the standard deviation will tend to be lower, not in the sense of 'expectation' but in the sense of the 'typical' standard deviation. The question, then, is whether to define 'effect size' in terms of the standard deviation for the t or in the original metric. This script uses the standard deviation of the t. A quick look suggests that there isn't much change, just a slight drop in power at higher effect sizes. --[[User:Georges|Georges]] 18:24, 28 June 2012 (EDT)<br />
<br />
:Reply: Thanks! Also here is the same code utilizing multiple cores, [http://scs.math.yorku.ca/index.php/MATH_6643_Summer_2012_Applications_of_Mixed_Models/Power_by_simulation:_Normal_versus_t_with_5_dfs-parellel in this case 4].<br />
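For reference, the core of such a simulation is small enough to sketch inline. This is a minimal pure-Python version of the normal-versus-t(5) comparison for a simple one-sample t-test (not the mixed-model setting of the linked script); following the effect-size convention discussed above, the t(5) draws are rescaled to unit standard deviation. The sample size, effect size, and replication count are arbitrary choices:<br />

```python
import math, random

random.seed(1)

def draw_t5():
    """Student-t with 5 df: N(0,1) / sqrt(chi^2_5 / 5), rescaled to unit SD."""
    z = random.gauss(0, 1)
    chi2 = sum(random.gauss(0, 1) ** 2 for _ in range(5))
    return (z / math.sqrt(chi2 / 5)) / math.sqrt(5 / 3)   # SD of t(5) is sqrt(5/3)

def power(draw, effect, n=50, reps=2000, crit=2.0096):    # crit ~ t_{.975, df=49}
    """Simulated power of a two-sided one-sample t-test."""
    hits = 0
    for _ in range(reps):
        xs = [effect + draw() for _ in range(n)]
        m = sum(xs) / n
        sd = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
        if abs(m / (sd / math.sqrt(n))) > crit:
            hits += 1
    return hits / reps

p_norm = power(lambda: random.gauss(0, 1), effect=0.4)
p_t5   = power(draw_t5, effect=0.4)
print(p_norm, p_t5)
```

Both powers come out near 0.8 at this effect size, consistent with the comment above that the heavier tails change little once the effect size is defined on the t's own standard deviation.<br />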
<br />
== Chapter 12: Other Methods and Models ==<br />
<br />
=== An Application of the BIC === <br />
As mentioned in this chapter, the BIC is a good indicator of model fit, which sets a penalty based upon the number of parameters in the model. It also seems easy to calculate based upon the typical summary objects from nlme and lme4. We learned in the last chapter about dealing with influential observations. <br />
<br />
What I would like to know is: might it be appropriate to compare BICs from two models where the only difference between them is the removal of a set of observations deemed influential or problematic? Since there is no difference in the number of parameters, other measures of model fit seem inappropriate. In a practical application (working with a client), what would be the best way of approaching this situation? --[[User:Msigal|Msigal]] 14:59, 30 June 2012 (EDT)<br />
<br />
* see [http://scs.math.yorku.ca/index.php/Statistics/AIC,_BIC_and_Likelihood_Ratio_Tests some inchoate brilliant ideas]<br />
* nice to read: http://faculty.gsm.ucdavis.edu/~prasad/Abstracts/MRC_JASA.pdf <br />
=== Mixtures to Normal ===<br />
Latent class mixture models are a non-parametric way to avoid, or lessen, the assumption of normality for the random coefficients, and can approximate any distribution as the number of classes is increased. How effectively can arbitrary distributions be modeled, and should this technique be used to verify that the normality assumption for the random coefficients holds? --[[User:Rphilip2004|Rphilip2004]] 19:14, 1 July 2012 (EDT)<br />
<br />
=== Sandwich estimators for standard errors ===<br />
It is mentioned that the researcher works with a misspecified model, and a few reasons are given for why they do so. The sandwich method is used to estimate standard errors. Is the method always applicable for misspecified models? If the model is not misspecified, is the sandwich estimator still used?--[[User:Gilbert8|Gilbert8]] 14:02, 2 July 2012 (EDT)<br />
<br />
In the sense that 'all models are wrong, but some are useful', are we not always using 'misspecified models'? When S&B talk about using a misspecified model intentionally, are they referring to cases in which either through statistical tests or diagnostics, we have evidence that the model fails in some regard? --[[User:Smithce|Smithce]] 22:15, 2 July 2012 (EDT)<br />
<br />
* To read: http://hubbard.berkeley.edu/cdcmultilevelcomplexdata/Gardiner2009.pdf<br />
=== BIC vs. AIC ===<br />
The authors talk about how BIC is a good indicator of model fit. However, we have seen in the R code that the AIC is often close to the BIC in value. Though the authors don't talk about AIC, I would like to know if there is a difference when comparing models with random parts, as opposed to simple models. Which criterion is better, AIC or BIC? Why?--[[User:Dusvat|Dusvat]] 21:37, 2 July 2012 (EDT)<br />
<br />
===Alternative method GEE ===<br />
<br />
The authors mention that GEE is an alternative method for handling multilevel data. Some people prefer GEEs because they like to have a procedure that estimates parameters in the absence of assumptions about how the coefficients vary. However, from the GLM course, I know that GEEs can be used to estimate the parameters of a generalized linear model with a possibly unknown correlation between outcomes. Thus, can we use GEEs to handle a wide class of datasets, including multilevel data? <nowiki>--~~~~</nowiki> --[[User:Qiong Li|Qiong Li]] 21:52, 2 July 2012 (EDT)<br />
<br />
=== MCMC Convergence ===<br />
<br />
The authors suggest that MCMC converges without the use of convergence diagnostics. The only related issue for consideration is time, given possible dependence in the data. This would suggest that, given enough time, all MCMC models will converge. Is this true, and if so, does it in any way indicate anything arbitrary/trivial, or critically dependent upon prior (no pun intended) assumptions? --[[User:Rbarnhar|Rbarnhar]] 9:55, 3 July 2012 (EDT)<br />
<br />
== Chapter 13: Imperfect Hierarchies ==<br />
<br />
=== The ICC in Multiple Membership Models ===<br />
In the section on a two-level model with a crossed random factor, Snijders and Bosker discuss the various formulae for the ICC due to level (p. 209). However, in the section that builds upon this framework, where we have a multiple membership multiple classification model, there is no remark about the ICC at all. Does the calculation of the ICC change when we incorporate a multiple membership aspect to the hierarchy? --[[User:Msigal|Msigal]] 18:24, 3 July 2012 (EDT)<br />
<br />
=== Multiple membership multiple classification models ===<br />
<br />
In section 13.3, on multiple membership, the authors discuss students who attend multiple schools and also live in different neighborhoods. A student is assigned a membership weight for each school attended. In section 13.4, the authors consider the case where not only school membership but also neighborhood membership is taken into account. How will the weights be distributed in this case? Will there be separate weights for school and neighborhood membership, or a combined weight for both where applicable? --[[User:Gilbert8|Gilbert8]] 20:39, 3 July 2012 (EDT)<br />
<br />
What happens if a student is misclassified in a multiple membership multiple classification model? For example, suppose the student moves to a new neighborhood and a new school, but the new neighborhood and/or school is reported incorrectly. What are the implications for the model? What happens to the weights?--[[User:Dusvat|Dusvat]] 20:35, 4 July 2012 (EDT)<br />
<br />
=== Example 13.1 Sustained primary school effects ===<br />
<br />
In example 13.1 on page 207, we compare the results of a model without (model 1) and with (model 2) the inclusion of primary school effects. We find that the average examination grade remains the same, but there are some changes in the variance components. The primary school has a variance of 0.006, a very small value; however, the variance is significantly different from 0. In this situation, can we say model 2 is a better model than model 1? <nowiki>--~~~~</nowiki> --[[User:Qiong Li|Qiong Li]] 11:29, 4 July 2012 (EDT)<br />
<br />
=== Crossed Random Effects and the Problem of Snowball Sampling ===<br />
<br />
Snowball Sampling was mentioned last class and I suggested that this could be modeled as a crossed random effects model. Here now is my explanation and some added material to go with the chapter.<br />
<br />
The example given was for AIDS patients. Let us retain this idea, but expand certain assumptions and hypotheticals to make the example understandable in our context.<br />
<br />
Let us assume we are interested in following the health of certain AIDS patients by tracking their CD4 cell counts. We need a sample, but are not easily able to obtain one.<br />
<br />
As a group of 3 medical researchers, we have access to a subset of existing AIDS patients. For ease, let us say we have three patients in ascending health difficulty: good, fair, and poor health (the assumption here is that people of similar health status will likely have more connections with one another than with those at other stages of the disease). We provide these patients with a set of tickets that are traceable back to the original owner and inform our three patients that they are to go out and recruit others for the study from wherever they can.<br />
<br />
In the meantime, we (the 3 doctors) return to our 3 treatment sites and prepare to undertake our study.<br />
<br />
Once our recruiters have returned with enough patients, we randomly assign each participant to a treatment site. That means we have 3 samples that are directly nested within treatment site, but within the sample we also have people who are nested within each respective snowball. However, the distribution of snowball members is not one-to-one within the treatment sites. Given the randomization, the snowball should be unrelated to the treatment site, which means we have one direct simple nesting structure and a second nesting structure that is randomly distributed along with the first. This process of randomization is critical: if the randomization is unrelated to the snowball, then no direct connection between the two level-2 structures needs to be accounted for.<br />
<br />
Snijders and Bosker have an image of exactly what I have just described on page 206. The only change here is to substitute treatment center for school and snowball for neighbourhood.<br />
<br />
There is a clever way to deal with this problem effectively. A trick proposed by Goldstein (1987) is the following:<br />
<br />
:1. Consider the entire dataset as a pseudo level 3 unit where both the snowball and the treatment centers are nested. We will need a level 3 identifier for this purpose.<br />
<br />
:2. Treat either the treatment center or snowballs as the level 2 units and specify a random intercept. <br />
<br />
:3. For the factor not chosen at step 2, specify a level 3 random intercept for each level of the factor. This requires estimating a random coefficient for a dummy coded variable representing each level.<br />
<br />
The result is that we wind up with two residual intraclass correlation coefficients.<br />
<br />
::For the Treatment Center We have:<br />
<br />
<br />
:::<math>\rho _{treatment} = \tau _{t}^{2} / (\tau _{t}^2 + \tau _{s}^2 + \sigma ^2) </math><br />
<br />
<br />
::For the Snowball We have:<br />
<br />
<br />
:::<math>\rho _{snowball} = \tau _{s}^{2} / (\tau _{s}^2 + \tau _{t}^2 + \sigma ^2) </math><br />
<br />
<br />
<br />
The solution to evaluating the similarity of the Snowballs is then to look at the ICC for Snowballs and plot and evaluate the random effects to observe whether any are unusually large. <br />
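As a quick numeric illustration of the two formulas above (with invented variance components), note that the two ICCs share the same denominator and differ only in the numerator:<br />

```python
def crossed_iccs(tau_t2, tau_s2, sigma2):
    """Residual ICCs for a crossed (treatment-center x snowball) random-effects model."""
    total = tau_t2 + tau_s2 + sigma2
    return tau_t2 / total, tau_s2 / total

# Illustrative variance components, invented for the example
rho_treat, rho_snow = crossed_iccs(tau_t2=0.30, tau_s2=0.10, sigma2=0.60)
print(rho_treat, rho_snow)
```

With these made-up components, 30% of the residual variance is shared within treatment centers and 10% within snowballs.<br />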
<br />
:An example of how this is declared in STATA using the xtmixed procedure is the following.<br />
<br />
::xtmixed CD4 (predictors/covariates) || _all: R.Snowball || TreatmentSite: , (options) <br />
<br />
::The _all option declares the full dataset as the level-3 identifier; R.Snowball automatically creates the dummy codes for Snowball<br />
<br />
:An example of how this is declared in R using the lmer procedure is the following <br />
:(Do not quote me on this though as I am still shaky with R)<br />
<br />
::lmer(CD4~(predictors/covariates)+(1|TreatmentSite)+(1|Snowball)) <br />
<br />
::No level-3 identifier is declared here because lmer treats random factors that are not nested in one another as crossed automatically, so the _all trick is not needed.<br />
<br />
'''Goldstein, H. (1987) Multilevel Covariance Components Models. Biometrika 74:430-431.'''<br />
<br />
<br />
--[[User:Rbarnhar|Rbarnhar]] 9:53, 4 July 2012 (EDT)<br />
<br />
=== MCMC estimation ===<br />
<br />
As we move through the book and discover new topics, the authors constantly note that advanced models can be 'easily' estimated by MCMC methods, yet they provide no examples of this type of programming --- which I immediately interpret as being 'not that easy at all' (they provide code options for R, Stata, HLM, MLwiN, Mplus, and SAS, but nothing for MCMC estimation). How easy would it be to generate this type of code, and could we construct a moderately simple example, runnable in BUGS or WinBUGS but not in lme or lmer, that could serve as an introduction to Bayesian estimation? --[[User:Rphilip2004|Rphilip2004]] 22:45, 4 July 2012 (EDT)<br />
<br />
== Chapter 14: Survey Weights ==<br />
<br />
=== Model Accuracy and Design Weights ===<br />
I found the idea presented on pages 226-227 interesting. It is recommended that checks be made against the design weights to see if the model differs based upon them. Snijders and Bosker recommend divvying up the level-2 units into two or three groups and then splitting the level-1 units into two or three groups. The combination of the two divisions splits the data set into four to nine parts, each of which can be analyzed with the hierarchical linear model.<br />
<br />
I have two concerns about this procedure: 1) How large can the discrepancies between these parts be before we start becoming worried? and 2) At which point in the analysis would you want to do this? Does this procedure assume that we have already selected the "correct" or "true" model? --[[User:Msigal|Msigal]] 08:40, 9 July 2012 (EDT)<br />
<br />
=== Exploring the informativeness of the sampling design ===<br />
<br />
If the residuals are correlated with the design variables, then the sampling design is informative. Under an informative sampling design, the parameter estimates may be biased and inconsistent. The authors mention (page 222) that if we can be confident of working with a well-specified hierarchical linear model and the sample design is unrelated to the residuals, we can proceed as usual with estimating the hierarchical linear model. How do we know whether a hierarchical linear model is well specified? <br />
<nowiki>--~~~~</nowiki> --[[User:Qiong Li|Qiong Li]] 12:20, 9 July 2012 (EDT)<br />
<br />
On page 226, the authors mention that, when the design variables are available to the researcher, adding them to the model of interest allows a much clearer and more satisfactory analysis than only knowing the weights. Does this mean that the analysis will be wrong if they are not available? --[[User:Gilbert8|Gilbert8]] 18:54, 9 July 2012 (EDT)<br />
<br />
=== Longitudinal data analysis ===<br />
<br />
BLUE is best for resampling from the same school over and over again. The BLUP is best on average for resampling from the population of schools and students. How about the EBLUP? How does the researcher choose among these? --[[User:Gilbert8|Gilbert8]] 14:36, 9 July 2012 (EDT)<br />
<br />
What is the difference between EBLUP and BLUP? Are they graphically the same? In the graphs on the slides, I didn't find any difference between BLUP and EBLUP. When do you choose one over the other? --[[User:Dusvat|Dusvat]] 22:15, 9 July 2012 (EDT)<br />
<br />
=== Weights, weights, weights .... What ones to use? .... What are the effects of my choice? ===<br />
<br />
The use of survey weights for complex data has become entrenched in many social science branches. Much like the discussion provided by Snijders and Bosker, there remains a belief that there are really two types of weights: ''sampling'' (or ''survey'') weights and ''precision'' weights. There are many other weights one should consider with HLM, and in some cases combinations of them may be more revealing. Their use, however, becomes extremely complicated and numerically intensive, and with no real ability to know whether they are being applied correctly, the issue becomes murky. The problem at the heart of the issue is that THERE IS NO WAY OF KNOWING WHICH MODEL IS MORE CORRECT - WEIGHTED OR UNWEIGHTED.<br />
<br />
The choice of weights plays an extremely important role in multilevel models. In particular, the use of weights directly impacts the estimating equations, depending upon the type of likelihood inference being used. Two methods in particular are of interest, as they dominate most software: Multilevel Pseudo Maximum Likelihood (MPML) and Probability Weighted Iterative Generalized Least Squares (PWIGLS). <br />
<br />
For my simple post here I am just highlighting a problem and not fully unpacking it.<br />
<br />
'''For MPML we have the following:'''<br />
<br />
::<math> L(y) = \sum_{i=1}^{n^{(L)}} \omega_{i}^{(L)} L_{i}^{(L)}(y_{i}) </math>, where ''L'' refers to the specific level of the model<br />
<br />
<br />
where the likelihood of the observation is proportional to the weighted likelihood by the scaled weight. This enters like a frequency weight, enumerating/replicating the unit to its new value. Given this, we should be aware that the use of weights requires correctly scaled weights for each level of representation. That means that in longitudinal modelling the inverse probability weights should not be applied at anything other than the person-specific level or cluster. This too is a problem, though, as it is not the correctly scaled weight. The weight is generally provided or deduced from a series of calculations, each one assuming a specific relationship. That means most supplied weights and design schemes provide you with inappropriate weights for analysis. The effect of an incorrect weight is to create undesired bias in an undetectable and random direction.<br />
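To make the scaling issue concrete, here is a small sketch of one common convention: rescaling the level-1 weights within a cluster so that they sum to the cluster sample size. The raw weights are invented, and other scaling conventions exist that yield different likelihood contributions:<br />

```python
def scale_weights(weights):
    """'Cluster-size' scaling of level-1 weights: rescale so the weights within a
    cluster sum to the cluster sample size (one common MPML convention)."""
    n = len(weights)
    total = sum(weights)
    return [w * n / total for w in weights]

raw = [2.0, 2.0, 4.0, 8.0]      # invented inverse-probability weights in one cluster
scaled = scale_weights(raw)
print(scaled, sum(scaled))       # the scaled weights sum to the cluster size, 4
```

The relative weighting within the cluster is preserved, but the effective 'frequency' contributed to the likelihood changes, which is exactly why the choice of scaling can shift the variance-component estimates.<br />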
<br />
<br />
'''For PWIGLS we have the following method:''' (Sorry but it is a bit too complicated for me to put the equation in here)<br />
<br />
::After taking the partial derivatives, the population quantities in the Fisher score function are replaced by the weighted sample statistics.<br />
<br />
<br />
This procedure has been shown to have fairly good properties for estimating the fixed-effects parameters, but it often fails to estimate the variance components effectively, especially where the weights are informative. The flaw, then, is that if the weights are informative we would want to use them, but we would not get good estimates of our hierarchical model; if the weights are not informative, then we would not want to bother with them anyway and would just stick to regular maximum likelihood estimation.<br />
<br />
The use of weights is terribly complex and far from being resolved any time soon. For the time being, I suggest that the use of weights is not the most effective means of modelling in the presence of complex sampling. The best use for weights is in a diagnostic role. <br />
<br />
Consider this one point: if the weights are uninformative and the model is correctly, or even close to correctly, specified, the use of weights should change nothing except the likelihood. Weights can help us better understand how well specified our model of interest is. Introducing weights can offer insight into errors. However, one must be very careful not to let the weights themselves be the error. Know your weights and use them with caution.<br />
<br />
One last thing about weights in the longitudinal context. The weights given to level-1 observations, level-2 observations, and ascending levels likely need to be rescaled at each and every time point, especially when missing data are present. Attrition can be partly controlled for with weights or model parameters using a ''propensity for dropout'' weight or score, which helps model the changing inclusion probabilities. Current practice suggests that weights are generally scaled for the sampling method and are not modified across time. This raises many questions regarding the results of weighted longitudinal multilevel data analysis, where inverse probability weights are the norm in the social sciences.<br />
<br />
<br />
--[[User:Rbarnhar|Rbarnhar]] 20:11, 9 July 2012 (EDT)<br />
<br />
=== Example 14.1 ===<br />
In the example they say that "the observational and cross-sectional nature of the data precludes causal interpretations of the conclusions". Does this mean that the main interpretations are incorrect? If so, which ones, and how exactly are they affected by the cross-sectional nature of the data?--[[User:Dusvat|Dusvat]] 22:40, 9 July 2012 (EDT)<br />
<br />
<br />
<br />
== Chapter 15: Longitudinal Data ==<br />
<br />
=== Variable occasion designs ===<br />
It makes sense to consider models with a function of time t when we analyze a longitudinal dataset. If the response is continuous, we can get some information about the function from a scatterplot of time t against the response. However, if the response is a dummy variable, how can we get a sense of the appropriate function of time t in the model? <nowiki>--~~~~</nowiki> --[[User:Qiong Li|Qiong Li]] 16:13, 11 July 2012 (EDT)<br />
<br />
To fit the random part of the model, one can use polynomials, splines, or other functions when the covariates are fixed. If the covariates are not fixed (changing covariates), what are the implications for our original model? Can we still use those functions to fit the random part in this case?--[[User:Gilbert8|Gilbert8]] 18:31, 11 July 2012 (EDT)<br />
<br />
=== Contextual Variables in Longitudinal Designs ===<br />
On the bottom of page 258, there is a note about adding a level 2 contextual variable to a longitudinal design. Up until now, this has meant taking the mean of the level 1 observations for a particular cluster (e.g. mean SES for the different schools). However, "in the longitudinal case, ... including this in the model would, ..., imply an effect from events that occur in the future because, at each moment except the last, the person mean of a changing explanatory variable depends on measurements to be made at a future point in time".<br />
<br />
I have two questions about this. First, can we rephrase this to make it somewhat more meaningful? I'm not entirely clear on what it means as it is presently stated. Second, their answer to this is to include "not... the person mean but, rather, the employment situation at the start of the study period". Can we talk about what the data actually looks like for this design and how it could be modeled in R? Also, how would the model change if the covariate of job status had been continuous instead of categorical?<br />
<br />
[For reference, this model has fixed effects for: age (55 through 60), birth year, birth year x age, job status at 55 (three levels, categorical), current job status (three levels, categorical).]<br />
--[[User:Msigal|Msigal]] 17:51, 11 July 2012 (EDT)<br />
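The future-dependence S&B describe can be seen directly from the data. A small sketch with an invented employment history: the person mean uses every occasion, including future ones, whereas the baseline value (their proposed fix) and a running 'mean so far' do not:<br />

```python
# One person's employment status over 6 occasions (1 = employed); invented data
job = [1, 1, 0, 0, 1, 1]

person_mean = sum(job) / len(job)   # depends on ALL occasions, incl. future ones
baseline = job[0]                   # known at the start of the study period

# A 'mean so far' version uses only past and present values at each occasion t
mean_so_far = [sum(job[: t + 1]) / (t + 1) for t in range(len(job))]
print(person_mean, baseline, mean_so_far)
```

At occasion 3, say, the person mean (0.67) already 'knows' that this person will be re-employed at occasions 5 and 6, while the running mean at that point (0.5) does not; this is the sense in which the person mean implies an effect from future events.<br />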
<br />
=== More on Contextual Variables ===<br />
<br />
Polynomial or non-linear terms like <math> time ^2 </math> are often used to create or estimate curvature in linear models. When a longitudinal analysis has both time and other monotone increasing functions, their interactions, if theoretically interpretable and meaningful, can substitute for these non-linear terms. <br />
<br />
An example would be an age by time interaction. When used with a contextual variable like mean age and a raw variable age, taken together, different aspects of the aging trajectory can be the dominant component at various times/ages. <br />
<br />
So, referring to Matt's point above from S & B about the mean of a variable across time being an event in the future, I suggest that on this point S & B are not entirely accurate, unless one is moving beyond the scope of the already observed data, and even then only possibly accurate. <br />
<br />
I argue there are times when S & B are not correct. If we are interested in the relations between numbers changing with meaning, then we need to understand their meaning, but also how numbers change without meaning. We need to consider a variety of interpretations and interrogate them with all extreme prejudice.<br />
<br />
In the example I supplied, the mean age can be thought of as the hinge point around which the linear and non-linear components move together. This hinge then operates as an indicator of where in the lifecourse one is, rather than simply one's mean age. The other two variables are contextualized, and the mean age is not necessarily an element of the future but of one's relative point in the lifecourse, not an as-yet-unobserved variable at the start of the study.<br />
<br />
--[[User:Rbarnhar|Rbarnhar]] 22:22, 11 July 2012 (EDT)<br />
<br />
=== Autocorrelated Residuals ===<br />
At the end of the chapter they talk about the fixed occasion design and how the assumption that the level-one residuals are independent can be relaxed and replaced by the assumption of first-order autocorrelation. Can you use this in variable occasion designs? How does this affect the model?<br />
They also say that other covariance and correlation patterns are possible. Can you give other examples?--[[User:Dusvat|Dusvat]] 22:00, 11 July 2012 (EDT)<br />
<br />
=== That Darned Random Part ===<br />
In Example 15.5, S&B demonstrate that the fully multivariate model has a significant deviance difference over the random-slope model, but since the covariance matrices are visually similar they put it down to sample size. They prefer the random-slope model due to the clearer interpretation of the random part of the model. Truth is, most of the time I fit the random part to the best of my ability and then otherwise ignore it! Though not particular to this chapter, if we have the opportunity to discuss interpretation of the random part of HLM models a little more, I'd appreciate it! --[[User:Smithce|Smithce]] 09:29, 12 July 2012 (EDT)<br />
<br />
== Chapter 16: Multivariate Multilevel Models ==<br />
== Chapter 17: Discrete Dependent Variables ==<br />
== Chapter 18: Software ==</div>
<hr />
<div>== Chapter 5 ==<br />
=== Matthew ===<br />
At the bottom of page 83, Snijders and Bosker outline the process for probing interactions between two level-one variables, and how there can be four possibilities for modeling it. If a researcher were to include all four, discuss how each would be interpreted. What might a good selection strategy be if our model had substantially more than two variables?<br />
<br />
=== Qiong === <br />
If we do not have any information about the data set, how do we choose a level-two variable to predict the group-dependent regression coefficients? After we choose the level-two variable z, how do we interpret the cross-level interaction term? <br />
<br />
=== Carrie === <br />
A client arrives with a random slope and intercept model using IQ as a predictor. IQ was measured on the traditional scale with a mean of 100 and standard deviation of 15. What should the client keep in mind about the interpretation of the variance of the intercept and covariance of the slope-intercept?<br />
: This raises the interesting question of how the variance of the random intercept and the covariance of the random intercept with the random slope are changed under a recentering of IQ. Let <br />
:: <math>Var\left[ \begin{matrix}<br />
{{u}_{0j}} \\<br />
{{u}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2} & {{\tau }_{01}} \\<br />
{{\tau }_{10}} & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math> for raw IQ. <br />
: If we recenter IQ with: <math>\tilde{\text{IQ}}=\text{IQ}-c</math> then:<br />
:: <math>\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
1 & c \\<br />
0 & 1 \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
{{u}_{0j}} \\<br />
{{u}_{1j}} \\<br />
\end{matrix} \right]</math><br />
: and<br />
:: <math>Var\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
1 & c \\<br />
0 & 1 \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
\tau _{0}^{2} & {{\tau }_{01}} \\<br />
{{\tau }_{10}} & \tau _{1}^{2} \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
1 & 0 \\<br />
c & 1 \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}+2c{{\tau }_{01}}+{{c}^{2}}\tau _{1}^{2} & {{\tau }_{01}}+c\tau _{1}^{2} \\<br />
{{\tau }_{10}}+c\tau _{1}^{2} & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math><br />
<br />
: If <math>c=-{{\tau }_{01}}/\tau _{1}^{2}</math>, then the variance of the intercept is minimized:<br />
::<math>Var\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}-\tau _{01}^{2}/\tau _{1}^{2} & 0 \\<br />
0 & \tau _{1}^{2} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}\left( 1-\rho _{01}^{2} \right) & 0 \\<br />
0 & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math><br />
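: The algebra above is easy to check numerically in R. A small sketch (the <math>\tau</math> values below are made up for illustration, not estimates from any real fit):<br />

```r
## Check the recentring formula Var(A u) = A Tau A' numerically.
## Tau values are illustrative only (chosen to be positive definite).
Tau <- matrix(c(4.0, -1.0,
               -1.0,  0.5), 2, 2)    # Var of (u0, u1) for raw IQ
c0  <- -Tau[1, 2] / Tau[2, 2]        # c = -tau01 / tau1^2 (here 2)
A   <- matrix(c(1, 0, c0, 1), 2, 2)  # the transformation [[1, c], [0, 1]]
A %*% Tau %*% t(A)                   # -> diag(2, 0.5): off-diagonals vanish
```

: The [1,1] entry is <math>\tau_0^2 - \tau_{01}^2/\tau_1^2 = 4 - 1/0.5 = 2</math>, agreeing with the closed form above.<br />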
<br />
=== Gilbert === <br />
In Chapter 5, the authors discuss the hierarchical linear model, in which both fixed effects and random effects are taken into consideration. Discuss a clear, simple example in class that shows both kinds of effects, and give interpretations of each of the coefficients and their use in real life.<br />
=== Daniela === <br />
In Chapter 5, the authors talk mostly about the two-level nesting structure. Can we have a bigger example with at least four levels that includes a random intercept and slope, and see how to fit it in R?<br />
: In 'lme', multilevel nesting is handled with the '/' operator in the grouping formula, e.g.<br />
fit <- lme( Y ~ X * W, dd, random = ~ 1 | idtop/idmiddle/idsmall)<br />
: Contextual variables present an ambiguity. Assuming that the id variables 'idsmall' and 'idmiddle' are coded uniquely overall, then contextual variables at a higher level, say 'idmiddle', could be coded as either:<br />
cvar(X,idmiddle)<br />
: or<br />
capply( dd, ~ idmiddle, with, mean( c(tapply( X, idsmall, mean))))<br />
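: The distinction between the two codings can be made concrete in base R (a sketch; 'cvar' and 'capply' are from the spida package, and the little data frame below is made up):<br />

```r
## Base-R sketch of the two contextual codings above.
dd <- data.frame(idmiddle = rep(1:2, each = 4),
                 idsmall  = c(1, 1, 1, 2, 3, 3, 4, 4),
                 X        = c(1, 2, 3, 10, 4, 6, 8, 10))
## (a) mean of X over all observations in each idmiddle unit
dd$ctx.obs <- ave(dd$X, dd$idmiddle)
## (b) mean over the idsmall cluster means within each idmiddle unit
##     (weights clusters equally rather than by cluster size)
cl.mean  <- tapply(dd$X, dd$idsmall, mean)          # per-cluster means
cl.mid   <- tapply(dd$idmiddle, dd$idsmall, unique) # idmiddle of each cluster
mid.mean <- tapply(cl.mean, cl.mid, mean)
dd$ctx.cl <- mid.mean[as.character(dd$idmiddle)]
dd   # in idmiddle 1 (unbalanced clusters) the two codings differ: 4 vs 6
```

: The two versions agree only when clusters are balanced, which is why the choice matters.<br />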
:Here is a table prepared for SPIDA showing how to handle multilevel nesting and crossed structures in a selection of R functions:<br />
<blockquote><br />
<br />
::{| border="1" cellpadding="4"<br />
|-<br />
! Function !! Notes <br />
|-<br />
| lme<br><br />
in package nlme<br />
| Linear mixed effects: normal response<br><br />
G side and R side modelling<br><br />
Model syntax:<br><br />
Y ~ X * W, random = ~ 1 + X | id<br><br />
For nested effect:<br><br />
Y ~ X * W, random = ~ 1 + X | higher/lower<br><br />
or, to have different models at different levels:<br><br />
Y ~ X * W, random = list(higher = ~ 1, lower = ~ 1 + X )<br><br />
<br />
|-<br />
| lmer <br><br />
in package lme4<br />
| Linear mixed models for gaussian response with Laplace approximation <br><br />
G side modeling only, R = <math>\sigma^2 I</math><br><br />
Model syntax: <br><br />
Y ~ X * W +(1+X|id)<br><br />
For nested effect:<br><br />
Y ~ X * W +(1+X|higher) + (1+X|higher:lower)<br> <br />
For crossed effect:<br><br />
Y ~ X * W +(1+X|id1) + (1+X|id2)<br> <br />
|- <br />
| glmer <br><br />
in package lme4<br />
| Generalized linear mixed models with adaptive Gaussian quadrature <br><br />
* family: binomial, Gamma, inverse.gaussian, poisson, gaussian<br><br />
G side only, no R side<br><br />
Model syntax: <br><br />
Y ~ X * W +(1+X|id)<br><br />
For nested effect:<br><br />
Y ~ X * W +(1+X|higher) + (1+X|higher:lower)<br> <br />
For crossed effect:<br><br />
Y ~ X * W +(1+X|id1) + (1+X|id2)<br> <br />
<br />
|- <br />
| glmmPQL<br><br />
in packages MASS/nlme<br />
| Generalized linear mixed models with Penalized Quasi Likelihood <br><br />
* family: binomial, Gamma, inverse.gaussian, poisson, gaussian<br><br />
G side and R side as in lme<br><br />
Model syntax: <br><br />
Y ~ X * W, random = ~ 1 + X | id<br><br />
For nested effect:<br><br />
Y ~ X * W, random = ~ 1 + X | higher/lower<br><br />
|-<br />
| MCMCglmm<br><br />
in package MCMCglmm<br />
| Generalized linear mixed models with MCMC <br><br />
* family: poisson, categorical, multinomial, ordinal, exponential, geometric, cengaussian, cenpoisson,<br />
cenexponential, zipoisson, zapoisson, ztpoisson, hupoisson, zibinomial (cen=censored, zi=zero-inflated, za=zero-altered, hu=hurdle)<br><br />
G side and R side, R side different from 'lme': no autocorrelation but can be used for multivariate response <br><br />
Note: 'poisson' potentially overdispersed by default (good), 'binomial' variance for binary variables is unidentified.<br> <br />
Model syntax: <br><br />
Y ~ X * W, random = ~ us(1 + X):id [Note: id should be a factor, us=unstructured]<br />
For nested effect:<br><br />
Y ~ X * W, random = ~us(1 + X):higher + us(1 + X):higher:lower<br> <br />
For crossed effect:<br><br />
Y ~ X * W, random = ~us(1 + X):id1+ us(1 + X):id2<br> <br />
|-<br />
|}<br />
<br />
</blockquote><br />
<br />
=== Phil ===<br />
<br />
Excluding fixed effects that are non-significant is common practice in regression analyses, and Snijders and Bosker follow this practice when simplifying the model in Table 5.3 to the model found in Table 5.4. While this practice is used to help make the model more parsimonious, it can ignore the joint effect that these variables have on the model as a whole. Discuss alternative criteria that one should explore when determining whether a predictor should be excluded from the model.<br />
<br />
=== Ryan ===<br />
When using random slopes it is generally the case that the level-1 model contains a fixed effect for what will also be the level-2 random effect. The random effect is then an estimate of the group/cluster/individual departure from the fixed effect. However, a non-significant level-1 fixed effect cannot be distinguished from zero. Are there cases where a non-significant fixed effect can be excluded from the model while the random effect at level 2 is retained? What would be the consequence of this, and what might it reveal about the level-1 variable? Would this help control for the '''error of excluding a non-significant but confounding variable'''?<br />
:: This is a very interesting question. It would be interesting to create a simulated data set illustrating the issue so we could consider the consequences of having random effects for a confounding factor whose within-cluster effect changes sign from cluster to cluster. Can we think of a confounding factor that would do that?<br />
<br />
=== and others === <br />
== Chapter 6 ==<br />
=== What is random? ===<br />
At the beginning of the chapter, S&B present two models (Table 6.1). They note that "The variable with the random slope is in both models the grand-mean centered variable IQ". In R: would the random side look like random = ~ (IQ - IQbar)|school, even though (IQ - IQbar) isn't in the fixed part of the second model? How is this different in interpretation from random = ~ (group-centered IQ - IQbar)|school?<br />
<br />
* NOTE: This question is not necessarily about how to specify a model using nlme, but rather about the terms included in the random part of the model. As a test, I ran two models:<br />
<br />
IQ_dev <- mlbook_red$IQ_verb - mlbook_red$sch_iqv<br />
<br />
mlb612a <- nlme::lme(langPOST ~ IQ_dev + sch_iqv + ses, random =~ IQ_verb|schoolnr, data = mlbook_red, method="ML")<br />
summary(mlb612a)<br />
<br />
mlb612b <- nlme::lme(langPOST ~ IQ_dev + sch_iqv + ses, random =~ IQ_dev|schoolnr, data = mlbook_red, method="ML")<br />
summary(mlb612b)<br />
<br />
Note that the basic IQ_verb variable has been grand mean centered. According to the chapter, it sounds like using IQ_verb in the random portion is OK since it is a linear combination of IQ_dev and sch_iqv (the school means of IQ_verb). However, if I compare it against a second model using IQ_dev in the random part, pretty much all of the coefficients change. Is this expected?<br />
<br />
--[[User:Msigal|Msigal]] 12:07, 13 June 2012 (EDT)<br />
<br />
=== Qiong === <br />
The authors introduce the t-test for testing fixed parameters. We can use summary(model) to get the p-value directly in R. In practice, we use the Wald test to test fixed parameters. On page 95, they mention that "for 30 or more groups, the Wald test of fixed effects using REML standard errors have reliable type I error rates." Is that the only reason we use the Wald test in practice?<br />
=== Carrie === <br />
On page 105, in their discussion of modeling within-group variability, the authors warn to "keep in mind that including a random slope normally implies inclusion of the fixed effect!". What is an example of a model where one might include a random slope without a fixed effect?<br />
=== Gilbert === <br />
On page 104, the authors discuss different approaches to model selection, including working upward from level one and joint consideration of levels one and two. Are there other methods to use if both of these fail to provide a satisfactory model? <br />
=== Daniela === <br />
On pages 97 and 99, the authors show how to test for a random intercept and for a random slope independently. What test would you use if you wanted to test for both a random slope and a random intercept in the same model, and what would you test it against: the linear model, or a model with just a random slope or just a random intercept?<br />
=== Phil ===<br />
<br />
Multi-parameter tests are possible for fixed effects, but can they also be applied to predicted random effects? If so, what would be the analog to <math>\hat{\gamma}' \hat{\Sigma}_{\gamma}^{-1} \hat{\gamma}</math>, used to find the Wald statistic, and how do we find an appropriate denominator df for an F test?<br />
<br />
: Question: Just to be sure, I presume you mean a BLUP? Answer: Yes I do :)<br />
<br />
=== Ryan ===<br />
In Chapter 6 Snijders and Bosker suggest that using deviance tests to evaluate fixed effects in a multilevel model is inappropriate if the estimation method is REML. What is the characteristic of REML vs ML that makes this type of model evaluation incorrect?<br />
:Comment: Likelihood has the form L(data|theta) and is used for inference on theta, for example by comparing the altitude of the log-likelihood at theta.hat (the summit) with the altitude at a null hypothetical value theta.0. This is the basis of deviance tests: Deviance = 2*(logLik(data|theta.hat) - logLik(data|theta.0)).<br />
<br />
: With ML the log-likelihood is logLik( y | beta, G, R) and we can use the likelihood for inference on all parameters. With REML, the data is not the original y but the residual of y on X, say e, and the likelihood is only a function of G and R: logLik( e | G, R). Since 'beta' does not appear in this likelihood, the likelihood cannot be used to answer any questions about 'beta'.<br />
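: The practical upshot can be sketched in R (a made-up two-level data set; the point is only that fixed-effect deviance tests should compare ML fits):<br />

```r
## Sketch: deviance (LR) tests of FIXED effects need method = "ML", not "REML".
library(nlme)
set.seed(1)
dd <- data.frame(id = rep(1:30, each = 5),
                 x  = rnorm(150), w = rnorm(150))
dd$y <- 1 + 0.5 * dd$x + rnorm(30, sd = 0.7)[dd$id] + rnorm(150)
m0 <- lme(y ~ x,     data = dd, random = ~ 1 | id, method = "ML")
m1 <- lme(y ~ x + w, data = dd, random = ~ 1 | id, method = "ML")
anova(m0, m1)   # valid deviance test for the fixed effect of w
## With method = "REML" the two fits are likelihoods of different residual
## vectors (different X matrices), so their logLik values are not comparable.
```

: REML deviance comparisons remain valid for models that differ only in the random part, since X, and hence the residual vector e, is the same for both.<br />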
<br />
=== and others === <br />
== Chapter 7 ==<br />
=== Explained Variance in Random Slope Models ===<br />
Looking at the proportion of variance explained by a model in a traditional ANOVA/multiple regression framework is something clients are often extremely interested in. In Chapter 7, Snijders and Bosker discuss how we might approach the issue in MLM. Near the end of the chapter, the authors insinuate that getting an estimate of the amount of explained variance in a random effects (intercept and slope) model is a somewhat tedious endeavour.<br />
<br />
The claim is that random slopes don't change prediction very much, so if we re-estimate the model using only random intercepts (no random slopes), this will "normally yield [predicted] values that are very close to values for the random slope models" (p. 114). This statement doesn't quite ring true for me, as in our examples the differences in slope between schools have been fairly striking/substantial. <br />
<br />
Is the authors' statement justifiable? Is obtaining an <math>R^2</math> as important/interesting in MLM as it is in other models?<br />
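: For concreteness, S&B's level-one explained variance is the proportional reduction in <math>\sigma^2 + \tau_0^2</math> between the empty model and the model of interest, both fitted with random intercepts only. A sketch, assuming mlbook_red is loaded as in the Chapter 6 code above:<br />

```r
## Sketch of S&B's level-1 R^2: proportional reduction in sigma^2 + tau0^2.
## Assumes mlbook_red (with langPOST, IQ_verb, ses, schoolnr) as used above.
library(nlme)
m.empty <- lme(langPOST ~ 1, random = ~ 1 | schoolnr,
               data = mlbook_red, method = "ML")
m.full  <- lme(langPOST ~ IQ_verb + ses, random = ~ 1 | schoolnr,
               data = mlbook_red, method = "ML")
tot.var <- function(m) sum(as.numeric(VarCorr(m)[, "Variance"]))  # tau0^2 + sigma^2
1 - tot.var(m.full) / tot.var(m.empty)   # level-1 R^2
```

: This is also why the authors suggest re-fitting with random intercepts only: with random slopes in the model, <math>\tau_0^2</math> depends on where the slope variable is centered, as the Chapter 5 derivation above shows.<br />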
<br />
=== Explained variance in three-level models === <br />
In Example 7.1 we see how to calculate the explained variance of a level-one variable when this variable has a fixed effect only. I want to know how to calculate the explained variance of a level-one variable when it has both a fixed and a random effect in the model. <br />
<br />
=== Interpreting <math>R^2</math> as an Effect Size ===<br />
A client fits a multilevel model and comes up with several significant predictors. The client is pleased with themselves, but remembers learning that significance alone isn't good enough these days, and needs help producing a measure of effect size. You compute the Level 1 R^2 and come up with a very small value, say 0.01. Is the model then worthless, even if the magnitude of the predicted change in the outcome is substantively meaningful?<br />
<br />
=== Explained variance === <br />
The example provided on page 110 shows that the residual variance at level two increases when the within-group deviation is added as an explanatory variable to the model, in the balanced as well as the unbalanced case. Is this always the case, or is it only for this particular example?<br />
<br />
=== Estimates of <math>R^2</math>=== <br />
On page 113, "it is observed that an estimated value for <math>R^2</math> becomes smaller by the addition of a predictor variable, or larger by the deletion of a predictor variable, there are two possibilities: either this is a chance fluctuation, or the larger model is misspecified." The authors then say that whether the first or second possibility is more likely depends on the size of the change in <math>R^2</math>. Can you give an example of when this occurs based on the size of change in <math>R^2</math>?<br />
<br />
=== Predicted <math>R^2</math> ===<br />
After predicting values for random intercepts and slopes using Bayesian methods it is possible to form composite values, <math>\hat{Y}_{ij}</math>, to predict the observed dependent values, <math>Y_{ij}</math>. Obviously these estimates will be sub-optimal as they will suffer from 'shrinkage' effects, but they may be useful for computing a '<math>R^2_{Predicted}</math>'. Discuss situations where knowledge of the predicted slopes and intercepts could be important, and whether an <math>R^2_{Predicted}</math> could be a useful description.<br />
<br />
=== The Size and Direction of <math>R^2</math> Change as a Diagnostic Criterion ===<br />
The suggestion has been made that changes in <math>R^2</math> where the addition or deletion of a variable creates an unexpected and opposing directional change, can serve as a diagnostic toward determining where the flaw in the model resides. However, the authors do not actually indicate which scenarios the size and increase/decrease information obtained from the <math>R^2</math> estimate determines the source of the flaw. 'Wrong' directions provide evidence of model misspecification, but what then of the magnitude component mentioned just prior? (p. 113)<br />
<br />
== Chapter 8 ==<br />
You can sign your contributions with <nowiki>--~~~~</nowiki> --[[User:Georges|Georges]] 07:50, 14 June 2012 (EDT)<br />
=== "Correlates of diversity" ===<br />
<br />
Provide an example illustrating how level-two variables are considered being associated with level-one heteroscedasticity.<br />
--[[User:Gilbert8|Gilbert8]] 11:26, 16 June 2012 (EDT)<br />
<br />
=== "Modeling Heteroscedasticity" ===<br />
<br />
When Snijders and Bosker say they are "modeling heteroscedasticity", is this simply incorporating more random slopes into the model? For instance, on page 127, they added a fixed effect for SA-SES (the school average of SES) and a random slope for it. What kind of plots would let us see if these inclusions are necessary? --[[User:Msigal|Msigal]] 11:26, 18 June 2012 (EDT)<br />
<br />
=== Linear or quadratic variance functions ===<br />
<br />
The level-one residual variance can be expressed as a linear or quadratic function of some variables. How do we decide on the functional form?<br />
Can we say that if a variable has a random effect then we use a quadratic form, and otherwise a linear form? Is it the same for the intercept residual variance?<br />
<nowiki>--~~~~</nowiki> --[[User:Qiong Li|Qiong Li]] 12:25, 18 June 2012 (EDT)<br />
<br />
=== On a practical note - How is this done?? ===<br />
It appears that in order to fit a model with a linear/quadratic function for the variance, the authors had to use MLwiN. Are there other ways to accomplish this? Could we talk a little about what their demo code is accomplishing? [http://www.stats.ox.ac.uk/~snijders/ch8.r S&B example code] --[[User:Smithce|Smithce]] 14:15, 19 June 2012 (EDT)<br />
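: One possibility within R (a sketch on simulated data, not a reproduction of the book's MLwiN analysis): 'lme' accepts a 'weights' argument with variance functions such as varFixed, varIdent and varPower, which let the level-one variance depend on a covariate.<br />

```r
## Sketch: level-1 heteroscedasticity in nlme via variance functions.
## Simulated data; in S&B's example the variance depends on SES.
library(nlme)
set.seed(2)
dd <- data.frame(id = rep(1:40, each = 6), ses = runif(240, -2, 2))
dd$ses2 <- dd$ses + 2                     # shifted so the covariate is positive
dd$y <- 2 + 0.5 * dd$ses + rnorm(40)[dd$id] +
        rnorm(240, sd = sqrt(1 + 0.4 * dd$ses2))
## residual variance proportional to a known covariate ('linear' in ses2):
m.lin <- lme(y ~ ses, random = ~ 1 | id, data = dd,
             weights = varFixed(~ ses2))
## residual variance as a power of ses2, exponent estimated from the data:
m.pow <- lme(y ~ ses, random = ~ 1 | id, data = dd,
             weights = varPower(form = ~ ses2))
anova(m.lin, m.pow)  # same fixed part, so the default REML fits are comparable
```

: This does not reproduce S&B's linear-variance parameterization exactly, but it covers much of the same ground without leaving R.<br />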
<br />
=== Variable centering ===<br />
<br />
Since fixed-effects variables can be included in the R matrix to model systematic heteroscedasticity, discuss the effects of centering variables in this context. Does centering affect the estimation results or numerical stability? --[[User:Rphilip2004|Rphilip2004]] 13:42, 19 June 2012 (EDT)<br />
<br />
What happens to the variance function when there are more than two levels? Do we still only have to choose from linear and quadratic forms, or does it become more complicated? --[[User:Dusvat|Dusvat]] 18:43, 18 June 2012 (EDT)<br />
<br />
=== Generic regression question: Treating a factor as continuous for the interaction ===<br />
In Model 3 on page 126 (described on page 124), the authors treat SES as a factor for the main effects, but then, to keep the number of interaction terms manageable, they treat it as numeric in the interactions with other variables. This seems like it could come in useful; are there any caveats we should be aware of in using this technique? --[[User:Smithce|Smithce]] 14:15, 19 June 2012 (EDT)<br />
<br />
=== Generalized Linear Models and Heteroscedasticity ===<br />
<br />
Given a linear model that has level-1 heteroscedasticity related to multiple level-1 predictors, does this not mean that the heteroscedasticity can be thought of as related to the overall mean response? The residuals of a heteroscedastic model would then become functions of the mean response. Generalized linear models often model the variance as a function of the mean response (e.g., Poisson, Gamma, Negative Binomial, etc.). When might it be appropriate to abandon a direct linear relationship in favour of a generalized linear model (which at times retains the desired additive linear properties in the linear predictor) to deal with heteroscedastic issues? Is this even possible? --[[User:Rbarnhar|Rbarnhar]] 14:24, 19 June 2012 (EDT)<br />
<br />
== Chapter 9 ==<br />
<br />
=== "Imputation" ===<br />
<br />
The chapter discusses imputation as a way of filling in missing data to form a complete data set. Are there any other methods which can be used to achieve the same goal? Provide a few examples.<br />
<br />
--[[User:Gilbert8|Gilbert8]] 11:38, 18 June 2012 (EDT)<br />
<br />
=== Patterns of Missingness ===<br />
Snijders and Bosker go to some lengths to explain the difference between MCAR, MAR, and MNAR. However, I felt they somewhat glossed over a definition of monotone missingness. What is monotone missingness? How would one check for it, especially in terms of a multilevel model? --[[User:Msigal|Msigal]] 08:47, 20 June 2012 (EDT)<br />
<br />
If we use a missingness indicator and predict this using a logistic regression model, does this mean that significant predictors should be kept in the imputation model and non-significant predictors can be omitted from the imputation model? --[[User:Smithce|Smithce]] 09:58, 21 June 2012 (EDT)<br />
<br />
=== Missingness Assumption ===<br />
<br />
Rubin defined three types of missingness in 1976. When we use methods for handling incomplete data, what information can help us make a reasonable assumption? What is the key point in deciding whether missingness is MCAR or MAR? <nowiki>--~~~~</nowiki> --[[User:Qiong Li|Qiong Li]] 12:31, 20 June 2012 (EDT) <br />
<br />
=== Full Maximum Likelihood vs. Imputation ===<br />
Could you give us an example using both maximum likelihood and imputation methods in R and then compare them? How are the methods similar/different? Is one method computationally better than the other?--[[User:Dusvat|Dusvat]] 17:27, 20 June 2012 (EDT)<br />
<br />
=== Dancing on the Bay(esian) ===<br />
<br />
Imputation seems to be a very Bayesian practice, and the authors mention the intimate connection to Gibbs sampling when imputing data in the univariate case. I wonder, however, that if we are willing to impute data in this Bayesian manner why we don't just jump ship and move to a more complete Bayesian methodology? What are the benefits/downsides of initially dancing with the idea of being Bayesian to get our complete data, then being frequentists to fit our models? --[[User:Rphilip2004|Rphilip2004]] 09:18, 21 June 2012 (EDT)<br />
<br />
=== Don't Throw the Kitchen Sink - ICE ===<br />
<br />
While Snijders and Bosker promote the idea of a complex model for imputation, using as many feasible variables as possible, is this always true? Should we not consider parsimony as we would in any other form of regression? Should the imputation model also possibly reflect the amount of missing data we are trying to impute?<br />
--[[User:Rbarnhar|Rbarnhar]] 09:59, 21 June 2012 (EDT)<br />
<br />
== Chapter 10 ==<br />
<br />
=== Influence of level-two units ===<br />
Provide a detailed explanation of how '''deletion diagnostics''' are performed and give a practical example to illustrate them.--[[User:Gilbert8|Gilbert8]] 18:22, 24 June 2012 (EDT)<br />
<br />
=== Incorporating Descriptive Statistics ===<br />
One aside that Snijders and Bosker make in this chapter is about the inclusion of the standard deviation for each group of a relevant level one variable as a fixed effect in the model. This was mentioned within the section on adding contextual variables (p. 155). This strikes me as an interesting prospect. Does modeling the standard deviation have any interpretative benefits over simply using group size? Are there other descriptive statistics pertaining to groups that would be meaningful to add to a model? What would their interpretation be? --[[User:Msigal|Msigal]] 10:12, 25 June 2012 (EDT)<br />
<br />
=== Orders of the model checks ===<br />
<br />
In Chapter 10, the authors introduce a number of things we need to do when we build a mixed model, such as "include contextual effects", "check random effects", "specification of the fixed part", "specification of the random part" and "check the distributional assumptions". When I deal with real data, I am always confused about which of these I should do first and which next. Are there no rules, or is there a better order in which to do these things?<br />
<nowiki>--~~~~</nowiki> --[[User:Qiong Li|Qiong Li]] 2:33, 25 June 2012 (EDT)<br />
<br />
=== Assumption violations ===<br />
In Ch. 10, the authors talk about having the random slope and random intercept uncorrelated with all explanatory variables. If this assumption is incorrect, you can just add relevant explanatory variables to the model. What happens in a real-world example where there are no quantifiable relevant explanatory variables to be added to the model? How would you go about fixing the incorrect assumption? And what happens if more than one assumption is violated and you cannot just include other 'descriptive' variables in the model?--[[User:Dusvat|Dusvat]] 17:45, 25 June 2012 (EDT)<br />
<br />
=== Checking for Random Slopes ===<br />
In this chapter (pages 155-156) S&B advocate checking for a random slope for each level-one variable in the fixed part. Because the process can be time-consuming, they further suggest using one-step methods to obtain provisional estimates and then checking the t-ratio. We have learned that t-tests of this kind are problematic for variance parameters and not to be trusted. Should we then use LRTs to test all possible random slopes, possibly with simulation?<br />
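: The simulation approach can be sketched as a parametric bootstrap of the LRT (here with lme4 on made-up data; the naive chi-square reference is conservative for variance parameters, which is why the simulated null, or the 50:50 chi-square mixture, is preferred):<br />

```r
## Sketch: parametric-bootstrap null distribution for the LRT of a random slope.
library(lme4)
set.seed(3)
dd <- data.frame(id = rep(1:30, each = 10), x = rnorm(300))
dd$y <- 1 + 0.3 * dd$x + rnorm(30, sd = 0.8)[dd$id] + rnorm(300)
m0 <- lmer(y ~ x + (1 | id),     data = dd, REML = FALSE)  # no random slope
m1 <- lmer(y ~ x + (1 + x | id), data = dd, REML = FALSE)  # random slope
obs <- as.numeric(2 * (logLik(m1) - logLik(m0)))
sim <- replicate(200, {
  ystar <- unlist(simulate(m0))          # data generated under H0
  as.numeric(2 * (logLik(refit(m1, ystar)) - logLik(refit(m0, ystar))))
})                                        # expect occasional convergence warnings
mean(sim >= obs)                          # simulated p-value
```

: With many candidate slopes this gets expensive, which is presumably why S&B suggest the cheap one-step screen first.<br />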
<br />
=== Cluster Size and Model Assumptions ===<br />
<br />
In the practical nuts and bolts of application, one at times encounters situations where the number of clusters in a requested multilevel model does not support testing, or readily examining, the assumption that the random intercepts are normally distributed. How important is it that the assumption of normally distributed intercepts holds?--[[User:Rbarnhar|Rbarnhar]] 01:22, 26 June 2012 (EDT)<br />
<br />
=== Model Residuals ===<br />
<br />
The authors emphasize the importance of using OLS estimation for obtaining unbiased diagnostics. However, it may be useful to use model-implied residuals such as <math>r = y - X \hat{\beta}</math> and <math>r.c = y - X \hat{\beta} - Z \hat{\gamma}</math>. Describe how these model-implied residuals can be used to evaluate influential observations at different design levels. --[[User:Rphilip2004|Rphilip2004]] 08:33, 26 June 2012 (EDT)<br />
<br />
== Chapter 11 ==<br />
<br />
=== Unequal cluster sample sizes ===<br />
<br />
Usually we choose the same sample size n for the micro-units in each group and a sample size N for the macro-units. I want to know whether we can improve the power of tests, or obtain smaller standard errors, by choosing different sample sizes in different groups. <nowiki>--~~~~</nowiki> --[[User:Qiong Li|Qiong Li]] 12:11, 27 June 2012 (EDT)<br />
: Comment: The main situation where it seems obvious to me that one would consider unequal cluster sizes by design would be to estimate a correlation parameter in the R matrix. --[[User:Georges|Georges]] 18:31, 28 June 2012 (EDT)<br />
<br />
=== Allocating treatment to groups or individuals ===<br />
<br />
The authors mention that multisite trials are difficult to implement, as the researcher has to make sure there will be no contamination between the two treatment conditions within the same site, while cluster randomized trials may lead to selection bias. The authors propose the '''pseudo-cluster randomized trial'''. Explain how this method is performed. Is it used often in practice? If this technique fails, are there other methods to resolve these situations? --[[User:Gilbert8|Gilbert8]] 15:04, 27 June 2012 (EDT)<br />
<br />
=== Treatment Allocation, Continued ===<br />
<br />
Building upon Gilbert's question, discuss the differences between cluster randomized trials, multisite trials, and pseudo-cluster randomized trials. Do any of these strategies match the Schizophrenia dataset that we have been working with during Lab 2? --[[User:Msigal|Msigal]] 18:51, 27 June 2012 (EDT)<br />
<br />
Continuing Matt's question about the differences between cluster randomized trials, multisite trials, and pseudo-cluster randomized trials: can you give us an example of when to pick one over the others, along with how the cost of each trial breaks down under the cost function? Which design is most costly? --[[User:Dusvat|Dusvat]] 23:17, 27 June 2012 (EDT)<br />
<br />
=== Power Expecting Missingness/Drop Out ===<br />
<br />
I think an interesting addition to the power analyses presented would be to work in estimates of drop-out/non-response within cluster. For example, suppose I estimated that there was only an 80% chance that I would actually get useful/complete data from each level-1 unit (lower if using the URPP ;) How might this be built into these analyses? Are there authors who have done work on this with mixed models? Is it really worth bothering with, given that power analysis is hard enough (e.g. coming up with 'guesses' for estimates) as it is? --[[User:Smithce|Smithce]] 09:40, 28 June 2012 (EDT)<br />
...now that I think about it, if you approximate using constant loss across all clusters, rather than trying to fool around with unbalance, this is pretty easy. So, never mind.<br />
:Comment: Perhaps an easy answer but nevertheless a very good point.<br />
<br />
=== An Unknown Value of the Intraclass Correlation Coefficient ===<br />
<br />
The authors acknowledge that the ICC is an unknown quantity, but suggest that in the social sciences its value tends to lie between 0.0 and 0.4. These two extremes have very different properties, as is made clear in the plot on the following page (p. 189): the curves follow different patterns of divergence, so the question is not as easily answered as simply plotting them all. An assumed value can lead to very different optimal estimates, especially if one is wrong at the extremes. Are there any better ways to estimate the ICC a priori in order to avoid issues when optimizing the sample size? --[[User:Rbarnhar|Rbarnhar]] 09:55, 28 June 2012 (EDT)<br />
<br />
=== Normal Distribution ===<br />
<br />
Relating to a similar question in the past, how important is it that the various levels are normally distributed when computing power estimates? --[[User:Rphilip2004|Rphilip2004]] 10:08, 28 June 2012 (EDT)<br />
:Comment: Have a look at [http://scs.math.yorku.ca/index.php?title=MATH_6643_Summer_2012_Applications_of_Mixed_Models/Power_by_simulation:_Normal_versus_t_with_5_dfs this attempt to simulate normal errors and errors with a t distribution with 5 degrees of freedom]. One conceptual problem is the concept of effect size. The t distribution with 5 dfs has a standard deviation of about 1.29. The problem is that, with its high kurtosis, your estimate of the standard deviation will tend to be lower, not in the sense of 'expectation' but in the sense of the 'typical' standard deviation. The question, then, is whether to define 'effect size' in terms of the standard deviation for the t or in the original metric. This script uses the standard deviation of the t. A quick look suggests that there isn't much change, just a slight drop in power at higher effect sizes. --[[User:Georges|Georges]] 18:24, 28 June 2012 (EDT)<br />
<br />
:Reply: Thanks! Also here is the same code utilizing multiple cores, [http://scs.math.yorku.ca/index.php/MATH_6643_Summer_2012_Applications_of_Mixed_Models/Power_by_simulation:_Normal_versus_t_with_5_dfs-parellel in this case 4].<br />
<br />
== Chapter 12 ==<br />
<br />
=== An Application of the BIC === <br />
As mentioned in this chapter, the BIC is a good indicator of model fit, which sets a penalty based upon the number of parameters in the model. It also seems easy to calculate based upon the typical summary objects from nlme and lme4. We learned in the last chapter about dealing with influential observations. <br />
<br />
What I would like to know is: might it be appropriate to compare BICs from two models where the only difference between them is the removal of a set of observations deemed influential or problematic? Since there is no difference in the number of parameters, other measures of model fit seem inappropriate. In a practical application (working with a client), what would be the best way of approaching this situation? --[[User:Msigal|Msigal]] 14:59, 30 June 2012 (EDT)<br />
<br />
=== Mixtures to Normal ===<br />
Latent class mixture models are a non-parametric way to avoid, or lessen, the assumption of normality for the random coefficients, and they can approximate any distribution as the number of classes is increased. How effectively can arbitrary distributions be modeled, and should this modeling technique be used to verify that the normality assumption for the random coefficients holds? --[[User:Rphilip2004|Rphilip2004]] 19:14, 1 July 2012 (EDT)<br />
<br />
<br />
=== Sandwich estimators for standard errors ===<br />
It is mentioned that researchers sometimes work with a deliberately misspecified model, and a few reasons are given for why they do so. The sandwich method is used to estimate standard errors. Is the method always applicable for a misspecified model? And if the model is not misspecified, can the sandwich estimator still be used?--[[User:Gilbert8|Gilbert8]] 14:02, 2 July 2012 (EDT)<br />
<br />
In the sense that 'all models are wrong, but some are useful', are we not always using 'misspecified models'? When S&B talk about using a misspecified model intentionally, are they referring to cases in which either through statistical tests or diagnostics, we have evidence that the model fails in some regard? --[[User:Smithce|Smithce]] 22:15, 2 July 2012 (EDT)<br />
<br />
=== BIC vs. AIC ===<br />
The authors talk about how BIC is a good indicator of model fit. However, we have seen in the R output that the AIC is often close to the BIC in value. Though the authors don't talk about the AIC, I would like to know if there is a difference when comparing models with random parts, as opposed to simpler models. Which criterion is better, AIC or BIC? Why?--[[User:Dusvat|Dusvat]] 21:37, 2 July 2012 (EDT)<br />
<br />
===Alternative method GEE ===<br />
<br />
The authors mention that GEE is an alternative method for handling multilevel data. Some people prefer GEEs because they provide a procedure that estimates parameters in the absence of assumptions about how the coefficients vary. However, from the GLM course, I know that GEEs can be used to estimate the parameters of a generalized linear model with a possibly unknown correlation between outcomes. Thus, can we use GEEs to handle a wide class of data sets, including multilevel data? --[[User:Qiong Li|Qiong Li]] 9:52, 2 July 2012 (EDT)<br />
<br />
<br />
== Chapter 13 ==<br />
== Chapter 14 ==<br />
== Chapter 15 ==<br />
== Chapter 16 ==<br />
== Chapter 17 ==<br />
== Chapter 18 ==</div>Smithcehttp://scs.math.yorku.ca/index.php/MATH_6643_Summer_2012_Applications_of_Mixed_Models/Snijders_and_Bosker:_Discussion_QuestionsMATH 6643 Summer 2012 Applications of Mixed Models/Snijders and Bosker: Discussion Questions2012-06-28T13:40:52Z<p>Smithce: </p>
<hr />
<div>== Chapter 5 ==<br />
=== Matthew ===<br />
At the bottom of page 83, Snijders and Bosker outline the process for probing interactions between two level one variables, and how there can be four possibilities for how to model it. If a researcher was to include all four, discuss how each would be interpreted. What might a good selection strategy be if our model had substantially more than two variables?<br />
<br />
=== Qiong === <br />
If we do not have any information about the data set, how do we choose a level-two variable to predict the group-dependent regression coefficients? After we choose the level-two variable z, how do we interpret the cross-level interaction term? <br />
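A minimal sketch of what such a cross-level interaction looks like in R, using nlme (which ships with R) on simulated data; all variable names (X, Z, id) are illustrative, not from the book's examples:<br />

```r
# Simulated two-level data with a cross-level interaction (illustrative only)
library(nlme)
set.seed(1)
J <- 30; n <- 10
id <- factor(rep(1:J, each = n))
Z  <- rep(rnorm(J), each = n)            # level-two predictor (constant within group)
X  <- rnorm(J * n)                       # level-one predictor
u0 <- rep(rnorm(J, sd = 1.0), each = n)  # random intercepts
u1 <- rep(rnorm(J, sd = 0.5), each = n)  # random slopes
Y  <- 1 + 2 * X + 1.5 * Z + 0.8 * X * Z + u0 + u1 * X + rnorm(J * n)
dd <- data.frame(Y, X, Z, id)
# The fixed part contains X, Z and X:Z; X:Z is the cross-level interaction
fit <- lme(Y ~ X * Z, data = dd, random = ~ 1 + X | id)
fixef(fit)["X:Z"]
```

The X:Z coefficient is then interpreted as the change in the within-group slope of X per unit increase in the level-two variable Z.<br />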
<br />
=== Carrie === <br />
A client arrives with a random slope and intercept model using IQ as a predictor. IQ was measured on the traditional scale with a mean of 100 and standard deviation of 15. What should the client keep in mind about the interpretation of the variance of the intercept and covariance of the slope-intercept?<br />
: This raises the interesting question of how the variance of the random intercept and the covariance of the random intercept with the random slope are changed under a recentering of IQ. Let <br />
:: <math>Var\left[ \begin{matrix}<br />
{{u}_{0j}} \\<br />
{{u}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2} & {{\tau }_{01}} \\<br />
{{\tau }_{10}} & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math> for raw IQ. <br />
: If we recenter IQ with: <math>\tilde{\text{IQ}}=\text{IQ}-c</math> then:<br />
:: <math>\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
1 & c \\<br />
0 & 1 \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
{{u}_{0j}} \\<br />
{{u}_{1j}} \\<br />
\end{matrix} \right]</math><br />
: and<br />
:: <math>Var\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
1 & c \\<br />
0 & 1 \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
\tau _{0}^{2} & {{\tau }_{01}} \\<br />
{{\tau }_{10}} & \tau _{1}^{2} \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
1 & 0 \\<br />
c & 1 \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}+2c{{\tau }_{01}}+{{c}^{2}}\tau _{1}^{2} & {{\tau }_{01}}+c\tau _{1}^{2} \\<br />
{{\tau }_{10}}+c\tau _{1}^{2} & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math><br />
<br />
: If <math>c=-{{\tau }_{01}}/\tau _{1}^{2}</math>, then the variance of the intercept is minimized:<br />
::<math>Var\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}-\tau _{01}^{2}/\tau _{1}^{2} & 0 \\<br />
0 & \tau _{1}^{2} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}\left( 1-\rho _{01}^{2} \right) & 0 \\<br />
0 & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math><br />
<br />
=== Gilbert === <br />
In chapter 5, they talk about hierarchical linear model where fixed effects and random effects are taken into consideration. Discuss a clear simple example in class which shows both effects and give interpretations of each of the coefficients and their use in real life.<br />
=== Daniela === <br />
In chapter 5, they talk mostly about the two-level nesting structure. Can we have a bigger example with at least 4 levels that includes the random intercept and slope, and see how to code it in R?<br />
: In 'lme', multilevel nesting is handled with nested grouping factors, e.g.<br />
fit <- lme( Y ~ X * W, dd, random = ~ 1 | idtop/idmiddle/idsmall)<br />
: Contextual variables present an ambiguity. Assuming that the id variables 'idsmall' and 'idmiddle' are coded uniquely overall, then the higher-level, say 'idmiddle', contextual variables could be coded as either:<br />
cvar(X,idmiddle)<br />
: or<br />
capply( dd , ~ id, with, mean( c(tapply( X, idsmall, mean))))<br />
:Here is a table prepared for SPIDA showing how to handle multilevel nesting and crossed structures in a selection of R functions:<br />
<blockquote><br />
<br />
::{| border="1" cellpadding="4"<br />
|-<br />
! Function !! Notes <br />
|-<br />
| lme<br><br />
in package nlme<br />
| Linear mixed effects: normal response<br><br />
G side and R side modelling<br><br />
Model syntax:<br><br />
Y ~ X * W, random = ~ 1 + X | id<br><br />
For nested effect:<br><br />
Y ~ X * W, random = ~ 1 + X | higher/lower<br><br />
or, to have different models at different levels:<br><br />
Y ~ X * W, random = list(higher = ~ 1, lower = ~ 1 + X )<br><br />
<br />
|-<br />
| lmer <br><br />
in package lme4<br />
| Linear mixed models for gaussian response with Laplace approximation <br><br />
G side modeling only, R = <math>\sigma^2 I</math><br><br />
Model syntax: <br><br />
Y ~ X * W +(1+X|id)<br><br />
For nested effect:<br><br />
Y ~ X * W +(1+X|higher) + (1+X|higher:lower)<br> <br />
For crossed effect:<br><br />
Y ~ X * W +(1+X|id1) + (1+X|id2)<br> <br />
|- <br />
| glmer <br><br />
in package lme4<br />
| Generalized linear mixed models with adaptive Gaussian quadrature <br><br />
* family: binomial, Gamma, inverse.gaussian, poisson, gaussian<br><br />
G side only, no R side<br><br />
Model syntax: <br><br />
Y ~ X * W +(1+X|id)<br><br />
For nested effect:<br><br />
Y ~ X * W +(1+X|higher) + (1+X|higher:lower)<br> <br />
For crossed effect:<br><br />
Y ~ X * W +(1+X|id1) + (1+X|id2)<br> <br />
<br />
|- <br />
| glmmPQL<br><br />
in packages MASS/nlme<br />
| Generalized linear mixed models with Penalized Quasi Likelihood <br><br />
* family: binomial, Gamma, inverse.gaussian, poisson, gaussian<br><br />
G side and R side as in lme<br><br />
Model syntax: <br><br />
Y ~ X * W, random = ~ 1 + X | id<br><br />
For nested effect:<br><br />
Y ~ X * W, random = ~ 1 + X | higher/lower<br><br />
|-<br />
| MCMCglmm<br><br />
in package MCMCglmm<br />
| Generalized linear mixed models with MCMC <br><br />
* family: poisson, categorical, multinomial, ordinal, exponential, geometric, cengaussian, cenpoisson,<br />
cenexponential, zipoisson, zapoisson, ztpoisson, hupoisson, zibinomial (cen=censored, zi=zero-inflated, za=zero-altered, hu=hurdle<br><br />
G side and R side, R side different from 'lme': no autocorrelation but can be used for multivariate response <br><br />
Note: 'poisson' potentially overdispersed by default (good), 'binomial' variance for binary variables is unidentified.<br> <br />
Model syntax: <br><br />
Y ~ X * W, random = ~ us(1 + X):id [Note: id should be a factor, us=unstructured]<br />
For nested effect:<br><br />
Y ~ X * W, random = ~us(1 + X):higher + us(1 + X):higher:lower<br> <br />
For crossed effect:<br><br />
Y ~ X * W, random = ~us(1 + X):id1+ us(1 + X):id2<br> <br />
|-<br />
|}<br />
<br />
</blockquote><br />
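: To make the table concrete, here is a small simulated three-level fit with 'lme'; the data and variance components are invented purely for illustration:<br />

```r
# Three nested levels: observations within 'mid' units within 'top' units
library(nlme)
set.seed(42)
top <- factor(rep(1:5, each = 40))       # 5 top-level units
mid <- factor(rep(1:25, each = 8))       # 5 middle units nested in each top unit
X   <- rnorm(200)
Y   <- 2 + X +
       rep(rnorm(5,  sd = 1.0), each = 40) +   # top-level random intercepts
       rep(rnorm(25, sd = 0.7), each = 8) +    # middle-level random intercepts
       rnorm(200)
dd  <- data.frame(Y, X, top, mid)
fit <- lme(Y ~ X, data = dd, random = ~ 1 | top/mid)
VarCorr(fit)   # variance components at each level of nesting
```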
<br />
=== Phil ===<br />
<br />
Excluding fixed effects that are non-significant is common practice in regression analyses, and Snijders and Bosker follow this practice when simplifying the model in Table 5.3 to the model found in Table 5.4. While this practice helps make the model more parsimonious, it can ignore the joint effect that these variables have on the model as a whole. Discuss alternative criteria that one should explore when determining whether a predictor should be excluded from the model.<br />
<br />
=== Ryan ===<br />
When using random slopes it is generally the case that the level 1 model contains a fixed effect for what will also be the level 2 random effect. The random effect is then an estimate of the group/cluster/individual departure from the fixed effect. However, non-significant level 1 variables are not determinable as different from zero. Are there cases where a non-significant fixed effect can be excluded from the model while retaining the random effect at level 2? What would be the consequence of this, and what might it reveal about the level 1 variable? Would this help control for the '''error of excluding a non-significant but confounding variable'''?<br />
:: This is a very interesting question. It would be interesting to create a simulated data set illustrating the issue so we could consider the consequences of having random effects for a confounding factor whose within-cluster effect changes sign from cluster to cluster. Can we think of a confounding factor that would do that?<br />
<br />
=== and others === <br />
== Chapter 6 ==<br />
=== What is random? ===<br />
At the beginning of the chapter, S&B present two models (Table 6.1). They note that "The variable with the random slope is in both models the grand-mean centered variable IQ". In R: would the random side look like: random = ~ (IQ - IQbar)|school, even though (IQ - IQbar) isn't in the fixed part of the second model? How is this different in interpretation from: random = ~ (group-centered IQ - IQbar)|school?<br />
<br />
* NOTE: This question is not necessarily about how to specify a model using nlme, but rather about the terms included in the random part of the model. As a test, I ran two models:<br />
<br />
IQ_dev <- mlbook_red$IQ_verb - mlbook_red$sch_iqv<br />
<br />
mlb612a <- nlme::lme(langPOST ~ IQ_dev + sch_iqv + ses, random =~ IQ_verb|schoolnr, data = mlbook_red, method="ML")<br />
summary(mlb612a)<br />
<br />
mlb612b <- nlme::lme(langPOST ~ IQ_dev + sch_iqv + ses, random =~ IQ_dev|schoolnr, data = mlbook_red, method="ML")<br />
summary(mlb612b)<br />
<br />
Note that the basic IQ_verb variable has been grand mean centered. According to the chapter, it sounds like using IQ_verb in the random portion is OK since it is a linear combination of IQ_dev and sch_iqv (the school means of IQ_verb). However, if I compare it against a second model using IQ_dev in the random part, pretty much all of the coefficients change. Is this expected?<br />
<br />
--[[User:Msigal|Msigal]] 12:07, 13 June 2012 (EDT)<br />
<br />
=== Qiong === <br />
The authors introduce the t-test to test fixed parameters. We can use summary(model) to get the p-value directly in R. In practice, we use the Wald test to test fixed parameters. On page 95, they mention that "for 30 or more groups, the Wald tests of fixed effects using REML standard errors have reliable type I error rates." Is that the only reason we use the Wald test in practice?<br />
=== Carrie === <br />
On page 105, in their discussion of modeling within-group variability, the authors warn to "keep in mind that including a random slope normally implies inclusion of the fixed effect!". What is an example of a model where one might include a random slope without a fixed effect?<br />
=== Gilbert === <br />
On page 104, the authors discuss different approaches to model selection, including working upward from level one and joint consideration of both levels one and two. Are there other methods to be used if neither approach provides a satisfactory model? <br />
=== Daniela === <br />
On pages 97 and 99, the authors showed how to test for a random intercept and for a random slope independently. What test would you use if you wanted to test for both a random slope and a random intercept in the same model? And what would you test it against: the linear model, or a model with just a random slope or a random intercept?<br />
=== Phil ===<br />
<br />
Multi-parameter tests are possible for fixed effects, but can they also be applied to predicted random effects? If so what would be the analog to <math>\hat{\gamma}' \hat{\Sigma}^{-1}_{\gamma} \hat{\gamma}</math>, used to find the Wald statistic, and how do we find an appropriate df denominator term for an F test?<br />
<br />
: Question: Just to be sure, I presume you mean a BLUP? Answer: Yes I do :)<br />
<br />
=== Ryan ===<br />
In Chapter 6 Snijders and Bosker suggest that using deviance tests to evaluate fixed effects in a multilevel model is inappropriate if the estimation method is REML. What is the characteristic of REML vs ML that makes this type of model evaluation incorrect?<br />
:Comment: Likelihood has the form L(data|theta) and is used for inference on theta, for example, by comparing the altitude of the log-likelihood at theta.hat (the summit) with the altitude at a null hypothetical value theta.0. This is the basis of deviance tests: Deviance = 2*(logLik(data|theta.hat) - logLik(data|theta.0)).<br />
<br />
: With ML the log-likelihood is logLik( y | beta, G, R) and we can use the likelihood for inference on all parameters. With REML, the data are not the original y, but the residuals of y on X, say, e. And the likelihood is only a function of G and R: logLik( e | G, R). 'beta' does not appear in the likelihood, thus the likelihood cannot be used to answer any questions about 'beta'.<br />
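: A short simulated illustration of the consequence: a deviance test for a fixed effect is carried out after refitting with ML. (The data and variable names below are invented for the sketch.)<br />

```r
# A deviance test for a fixed effect requires ML fits, not REML fits
library(nlme)
set.seed(7)
id <- factor(rep(1:20, each = 10))
X  <- rnorm(200)
W  <- rep(rnorm(20), each = 10)          # level-two predictor
Y  <- 1 + X + 0.5 * W + rep(rnorm(20), each = 10) + rnorm(200)
dd <- data.frame(Y, X, W, id)
m0 <- lme(Y ~ X,     data = dd, random = ~ 1 | id, method = "ML")
m1 <- lme(Y ~ X + W, data = dd, random = ~ 1 | id, method = "ML")
anova(m0, m1)   # valid: both log-likelihoods are functions of the same data y
# Under REML the 'data' are residuals from two different X matrices, so the
# analogous REML logLik difference is not a deviance and should not be used.
```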
<br />
=== and others === <br />
== Chapter 7 ==<br />
=== Explained Variance in Random Slope Models ===<br />
Looking at the proportion of variance explained by a model in a traditional ANOVA/multiple regression framework is something clients are often extremely interested in. In Chapter 7, Snijders and Bosker discuss how we might approach the issue in MLM. Near the end of the chapter, the authors insinuate that getting an estimate of the amount of explained variance in a random effects (intercept and slope) model is a somewhat tedious endeavour.<br />
<br />
The claim is that random slopes don't change prediction very much, so if we re-estimate the model using only random intercepts (no random slopes), this will "normally yield [predicted] values that are very close to values for the random slope models" (p. 114). This statement doesn't quite ring true for me, as in our examples the differences in slope between schools have been fairly striking/substantial. <br />
<br />
Is the authors' statement justifiable? Is obtaining an <math>R^2</math> as important/interesting in MLM as it is in other models?<br />
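One way to make the discussion concrete is a sketch of the level-one explained variance in the style of S&B's <math>R_1^2</math>: the proportional reduction in <math>\sigma^2 + \tau_0^2</math> relative to the empty model, computed from random-intercept refits. The data below are simulated and purely illustrative:<br />

```r
# Level-one explained variance as proportional reduction in sigma^2 + tau0^2
library(nlme)
set.seed(3)
id <- factor(rep(1:25, each = 8))
X  <- rnorm(200)
Y  <- 1 + X + rep(rnorm(25), each = 8) + rnorm(200)
dd <- data.frame(Y, X, id)
m0 <- lme(Y ~ 1, data = dd, random = ~ 1 | id, method = "ML")  # empty model
m1 <- lme(Y ~ X, data = dd, random = ~ 1 | id, method = "ML")
totvar <- function(m) sum(as.numeric(VarCorr(m)[, "Variance"]))
R1sq <- 1 - totvar(m1) / totvar(m0)
R1sq   # proportion of total (within + between) variance explained by X
```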
<br />
=== Explained variance in three-level models === <br />
In Example 7.1, we see how to calculate the explained variance of a level-one variable when this variable has a fixed effect only. I want to know how to calculate the explained variance of a level-one variable when it has both a fixed and a random effect in the model. <br />
<br />
=== Interpreting <math>R^2</math> as an Effect Size ===<br />
A client fits a multilevel model and comes up with several significant predictors. The client is pleased with themselves, but remembers learning that significance alone isn't good enough these days, and needs help producing a measure of effect size. You compute the Level 1 R^2 and come up with a very small value, say 0.01. Is the model then worthless, even if the magnitude of the predicted change in the outcome is substantively meaningful?<br />
<br />
=== Explained variance === <br />
In the example provided on page 110, it is shown that the residual variance at level two increases when the within-group deviation is added as an explanatory variable to the model, in the balanced as well as the unbalanced case. Is this always the case, or only for this particular example?<br />
<br />
=== Estimates of <math>R^2</math>=== <br />
On page 113, "it is observed that an estimated value for <math>R^2</math> becomes smaller by the addition of a predictor variable, or larger by the deletion of a predictor variable, there are two possibilities: either this is a chance fluctuation, or the larger model is misspecified." The authors then say that whether the first or second possibility is more likely depends on the size of the change in <math>R^2</math>. Can you give an example of when this occurs based on the size of change in <math>R^2</math>?<br />
<br />
=== Predicted <math>R^2</math> ===<br />
After predicting values for random intercepts and slopes using Bayesian methods it is possible to form composite values, <math>\hat{Y}_{ij}</math>, to predict the observed dependent values, <math>Y_{ij}</math>. Obviously these estimates will be sub-optimal as they will suffer from 'shrinkage' effects, but they may be useful for computing a '<math>R^2_{Predicted}</math>'. Discuss situations where knowledge of the predicted slopes and intercepts could be important, and whether an <math>R^2_{Predicted}</math> could be a useful description.<br />
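One hedged way to operationalize such an '<math>R^2_{Predicted}</math>' (a sketch, not from the book): form the composite predictions, fixed part plus shrunken BLUPs, via 'fitted' and square their correlation with the observed response. Simulated data, illustrative names:<br />

```r
# An 'R^2 predicted' from composite (shrunken) predictions
library(nlme)
set.seed(11)
id <- factor(rep(1:20, each = 10))
X  <- rnorm(200)
Y  <- 1 + X + rep(rnorm(20), each = 10) + rnorm(200)
dd <- data.frame(Y, X, id)
fit  <- lme(Y ~ X, data = dd, random = ~ 1 | id)
yhat <- fitted(fit, level = 1)   # fixed part plus predicted (shrunken) intercepts
R2_pred <- cor(dd$Y, yhat)^2
R2_pred
```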
<br />
=== The Size and Direction of <math>R^2</math> Change as a Diagnostic Criterion ===<br />
The suggestion has been made that changes in <math>R^2</math>, where the addition or deletion of a variable creates an unexpected and opposing directional change, can serve as a diagnostic for determining where the flaw in the model resides. However, the authors do not actually indicate in which scenarios the size and direction of the change in the <math>R^2</math> estimate determine the source of the flaw. 'Wrong' directions provide evidence of model misspecification, but what then of the magnitude component mentioned just prior? (p. 113)<br />
<br />
== Chapter 8 ==<br />
You can sign your contributions with <nowiki>--~~~~</nowiki> --[[User:Georges|Georges]] 07:50, 14 June 2012 (EDT)<br />
=== "Correlates of diversity" ===<br />
<br />
Provide an example illustrating how level-two variables can be associated with level-one heteroscedasticity.<br />
--[[User:Gilbert8|Gilbert8]] 11:26, 16 June 2012 (EDT)<br />
<br />
=== "Modeling Heteroscedasticity" ===<br />
<br />
When Snijders and Bosker say they are "modeling heteroscedasticity", is this simply incorporating more random slopes into the model? For instance, on page 127, they added a fixed effect for SA-SES (the school average of SES) and a random slope for it. What kind of plots would let us see if these inclusions are necessary? --[[User:Msigal|Msigal]] 11:26, 18 June 2012 (EDT)<br />
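In nlme, at least, modeling heteroscedasticity is distinct from adding random slopes: the level-one (R-side) variance is specified through the 'weights' argument. A sketch with an invented binary grouping variable (all names illustrative, not the book's data):<br />

```r
# Modeling level-one heteroscedasticity with varIdent in nlme
library(nlme)
set.seed(5)
id  <- factor(rep(1:20, each = 10))
sex <- factor(rep(c("F", "M"), 100))
X   <- rnorm(200)
sd1 <- ifelse(sex == "M", 2, 1)          # residual sd differs by group
Y   <- 1 + X + rep(rnorm(20), each = 10) + rnorm(200, sd = sd1)
dd  <- data.frame(Y, X, sex, id)
m_hom <- lme(Y ~ X, data = dd, random = ~ 1 | id)
m_het <- lme(Y ~ X, data = dd, random = ~ 1 | id,
             weights = varIdent(form = ~ 1 | sex))   # separate sd per sex
anova(m_hom, m_het)   # same fixed part, both REML, so this comparison is valid
```

A plot of residuals against the candidate variable (or group boxplots of residuals) is one way to see whether such a structure is needed.<br />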
<br />
=== Linear or quadratic variance functions ===<br />
<br />
The level-one residual variance can be expressed as a linear or quadratic function of some variables. How do we decide the functional form?<br />
Can we say that if a variable has a random effect, then we use a quadratic form, and otherwise a linear form? Is it the same for the intercept residual variance?<br />
--[[User:Qiong Li|Qiong Li]] 12:25, 18 June 2012 (EDT)<br />
<br />
=== On a practical note - How is this done?? ===<br />
It appears that in order to fit a model with a linear/quadratic function for the variance, the authors had to use MLwiN. Are there other ways to accomplish this? Could we talk a little about what their demo code is accomplishing? [http://www.stats.ox.ac.uk/~snijders/ch8.r S&B example code] --[[User:Smithce|Smithce]] 14:15, 19 June 2012 (EDT)<br />
<br />
=== Variable centering ===<br />
<br />
Since fixed effects variables can be included in the R matrix to model systematic heteroscedasticity discuss the effects of centering variables in this context. Does centering affect the estimation results or numerical stability? --[[User:Rphilip2004|Rphilip2004]] 13:42, 19 June 2012 (EDT)<br />
<br />
What happens to the variance function when there are more than two levels? Do we still only have to choose from linear and quadratic forms, or does it become more complicated? --[[User:Dusvat|Dusvat]] 18:43, 18 June 2012 (EDT)<br />
<br />
=== Generic regression question: Treating a factor as continuous for the interaction ===<br />
On page 126, Model 3 (described on page 124) treats SES as a factor for the main effects, but then, to keep the number of interaction terms manageable, treats it as numeric in the interactions with other variables. This seems like it could come in useful; are there any caveats we should be aware of in using this technique? --[[User:Smithce|Smithce]] 14:15, 19 June 2012 (EDT)<br />
<br />
=== Generalized Linear Models and Heteroscedasticity ===<br />
<br />
Given a linear model that has level-1 heteroscedasticity related to multiple level-1 predictors, does this not mean that the heteroscedasticity can be thought of as related to the overall mean response? The residuals of a heteroscedastic model would then become functions of the mean response. Generalized linear models often model the variance as a function of the mean response (e.g., Poisson, Gamma, negative binomial). When might it be appropriate to abandon a direct linear relationship in favour of a generalized linear model (which at times retains the desired additive linear properties in the linear predictor) to deal with heteroscedastic issues? Is this even possible? --[[User:Rbarnhar|Rbarnhar]] 14:24, 19 June 2012 (EDT)<br />
<br />
== Chapter 9 ==<br />
<br />
=== "Imputation" ===<br />
<br />
1. The chapter discusses imputation as a way of filling in missing data to form a complete data set. Are there any other methods which can be used to achieve the same goal? Provide a few examples.<br />
<br />
2. Discuss an example with a data set containing missing values. Provide R code using multiple imputation to complete the data set.<br />
--[[User:Gilbert8|Gilbert8]] 11:38, 18 June 2012 (EDT)<br />
<br />
=== Patterns of Missingness ===<br />
Snijders and Bosker go to some lengths to explain the difference between MCAR, MAR, and MNAR. However, I felt they somewhat glossed over a definition of monotone missingness. What is monotone missingness? How would one check for it, especially in terms of a multilevel model? --[[User:Msigal|Msigal]] 08:47, 20 June 2012 (EDT)<br />
<br />
If we use a missingness indicator and predict this using a logistic regression model, does this mean that significant predictors should be kept in the imputation model and non-significant predictors can be omitted from the imputation model? --[[User:Smithce|Smithce]] 09:58, 21 June 2012 (EDT)<br />
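A quick sketch of the indicator approach (simulated data, base R only). Note that the usual advice is to keep imputation models inclusive rather than pruning them on significance alone, so the logistic fit below is diagnostic, not a variable-selection rule:<br />

```r
# Logistic regression on a missingness indicator (MAR mechanism built in)
set.seed(9)
n  <- 500
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + x1 + x2 + rnorm(n)
miss <- rbinom(n, 1, plogis(-1 + x1))   # missingness in y depends on x1 only
y[miss == 1] <- NA
ind <- as.integer(is.na(y))             # 1 = missing, 0 = observed
fit <- glm(ind ~ x1 + x2, family = binomial)
summary(fit)$coefficients               # x1 should show a clear positive effect
```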
<br />
=== Missingness Assumption ===<br />
<br />
Rubin defined three types of missingness in 1976. When we use methods for handling incomplete data, what information can help us make a reasonable assumption? What is the key to deciding whether missingness is MCAR or MAR? --[[User:Qiong Li|Qiong Li]] 12:31, 20 June 2012 (EDT) <br />
<br />
=== Full Maximum Likelihood vs. Imputation ===<br />
Could you give us an example using both maximum likelihood and imputation methods in R, and then compare them? How are the methods similar or different? Is one method computationally better than the other?--[[User:Dusvat|Dusvat]] 17:27, 20 June 2012 (EDT)<br />
<br />
=== Dancing on the Bay(esian) ===<br />
<br />
Imputation seems to be a very Bayesian practice, and the authors mention the intimate connection to Gibbs sampling when imputing data in the univariate case. I wonder, however, that if we are willing to impute data in this Bayesian manner why we don't just jump ship and move to a more complete Bayesian methodology? What are the benefits/downsides of initially dancing with the idea of being Bayesian to get our complete data, then being frequentists to fit our models? --[[User:Rphilip2004|Rphilip2004]] 09:18, 21 June 2012 (EDT)<br />
<br />
=== Don't Throw the Kitchen Sink - ICE ===<br />
<br />
While Snijders and Bosker promote the idea of a complex model for imputation, using as many feasible variables as possible, is this always true? Should we not consider parsimony as we would in any other form of regression? Should the imputation model also reflect the amount of missing data we are trying to impute?<br />
--[[User:Rbarnhar|Rbarnhar]] 09:59, 21 June 2012 (EDT)<br />
<br />
== Chapter 10 ==<br />
<br />
=== Influence of level-two units ===<br />
Provide a detailed explanation of how '''deletion diagnostics''' are performed and a practical example to illustrate them.--[[User:Gilbert8|Gilbert8]] 18:22, 24 June 2012 (EDT)<br />
<br />
=== Incorporating Descriptive Statistics ===<br />
One aside that Snijders and Bosker make in this chapter is about the inclusion of the standard deviation for each group of a relevant level one variable as a fixed effect in the model. This was mentioned within the section on adding contextual variables (p. 155). This strikes me as an interesting prospect. Does modeling the standard deviation have any interpretative benefits over simply using group size? Are there other descriptive statistics pertaining to groups that would be meaningful to add to a model? What would their interpretation be? --[[User:Msigal|Msigal]] 10:12, 25 June 2012 (EDT)<br />
<br />
=== Orders of the model checks ===<br />
<br />
In Chapter 10, the authors introduce a number of things we need to do when building a mixed model, such as "include contextual effects", "check random effects", "specification of the fixed part", "specification of the random part" and "check the distributional assumptions". When I deal with real data, I am often confused about which of these to do first and which to do next. Are there no rules, or is there a better order for doing these things?<br />
--[[User:Qiong Li|Qiong Li]] 2:33, 25 June 2012 (EDT)<br />
<br />
=== Assumption violations ===<br />
In Ch. 10, the authors talk about having the random slope and random intercept uncorrelated with all explanatory variables. If this assumption is incorrect, you can just add relevant explanatory variables to the model. What happens in a real-world example where there are no quantifiable relevant explanatory variables to add to the model? How would you go about fixing the incorrect assumption? And what happens if more than one assumption is violated and you cannot just add other 'descriptive' variables to the model?--[[User:Dusvat|Dusvat]] 17:45, 25 June 2012 (EDT)<br />
<br />
=== Checking for Random Slopes ===<br />
In this chapter (pages 155-156) S&B advocate checking for a random slope for each level-one variable in the fixed part. Because the process can be time-consuming, they further suggest using one-step methods to obtain provisional estimates and then checking the t-ratio. We have learned that t-tests of this kind are problematic for variance parameters and not to be trusted. Should we then use LRTs to test all possible random slopes, possibly with simulation?<br />
<br />
=== Cluster Size and Model Assumptions ===<br />
<br />
In the practical nuts and bolts of application, one at times encounters situations where the number of clusters in a requested multilevel model does not support the testable or readily examinable assumption that the random intercepts are normally distributed. How important is it that the assumption of normally distributed intercepts holds?--[[User:Rbarnhar|Rbarnhar]] 01:22, 26 June 2012 (EDT)<br />
<br />
=== Model Residuals ===<br />
<br />
The authors emphasize the importance of using OLS estimation for determining unbiased diagnostics. However, it may be useful to use model implied residuals such as <math>r = y - X \hat{\beta}</math> and <math>r.c = y - X \hat{\beta} - Z \hat{\gamma}</math>. Describe how these model implied residuals can be used to evaluate influential observations at different design levels. --[[User:Rphilip2004|Rphilip2004]] 08:33, 26 June 2012 (EDT)<br />
<br />
== Chapter 11 ==<br />
<br />
=== Sample sizes at the two levels ===<br />
<br />
Usually we choose a common sample size n for the micro-units within each group and a sample size N for the macro-units. I want to know whether we can improve the power of tests, or obtain smaller standard errors, by choosing different sample sizes in different groups. --[[User:Qiong Li|Qiong Li]] 12:11, 27 June 2012 (EDT)<br />
<br />
=== Allocating treatment to groups or individuals ===<br />
<br />
The authors mention that multisite trials are difficult to implement, as the researcher has to make sure that there will be no contamination between the two treatment conditions within the same site, while cluster randomized trials may lead to selection bias. The authors propose the '''pseudo-cluster randomized trial'''. Explain how this method is performed. Is it used often in practice? If this technique fails, are there other methods to resolve these situations? --[[User:Gilbert8|Gilbert8]] 15:04, 27 June 2012 (EDT)<br />
<br />
=== Treatment Allocation, Continued ===<br />
<br />
Building upon Gilbert's question, discuss the differences between cluster randomized trials, multisite trials, and pseudo-cluster randomized trials. Do any of these strategies match the Schizophrenia dataset that we have been working with during Lab 2? --[[User:Msigal|Msigal]] 18:51, 27 June 2012 (EDT)<br />
<br />
Continuing Matt's question about the differences between cluster randomized trials, multisite trials, and pseudo-cluster randomized trials: can you give us an example of when to pick one over the other, along with how the cost of each trial breaks down under the cost function? Which trial is most costly? --[[User:Dusvat|Dusvat]] 23:17, 27 June 2012 (EDT)<br />
<br />
=== Power Expecting Missingness/Drop Out ===<br />
<br />
I think an interesting addition to the power analyses presented would be if we could work in estimates of drop-out/non-response within cluster. For example, suppose I estimated that there was only an 80% chance that I would actually get useful/complete data from each level-1 unit (lower if using the URPP ;)). How might this be built into these analyses? Are there authors who have done work on this with mixed models? Is it really worth bothering with, given that power analysis is hard enough (e.g. coming up with 'guesses' for estimates) as it is? --[[User:Smithce|Smithce]] 09:40, 28 June 2012 (EDT)<br />
...now that I think about it, if you approximate using constant loss across all clusters rather than trying to fool around with imbalance, this is pretty easy. So, never mind.<br />
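The constant-loss shortcut alluded to above amounts to one line of arithmetic (the numbers here are illustrative):<br />

```r
# Inflate the planned within-cluster sample size for expected dropout
retention <- 0.8                          # chance a level-1 unit yields complete data
n_needed  <- 10                           # complete cases required per cluster
n_recruit <- ceiling(n_needed / retention)
n_recruit                                 # 13 recruits per cluster
```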
<br />
== Chapter 12 ==<br />
== Chapter 13 ==<br />
== Chapter 14 ==<br />
== Chapter 15 ==<br />
== Chapter 16 ==<br />
== Chapter 17 ==<br />
== Chapter 18 ==</div>Smithcehttp://scs.math.yorku.ca/index.php/MATH_6643_Summer_2012_Applications_of_Mixed_Models/Snijders_and_Bosker:_Discussion_QuestionsMATH 6643 Summer 2012 Applications of Mixed Models/Snijders and Bosker: Discussion Questions2012-06-28T13:40:15Z<p>Smithce: </p>
<hr />
<div>== Chapter 5 ==<br />
=== Matthew ===<br />
At the bottom of page 83, Snijders and Bosker outline the process for probing interactions between two level one variables, and how there can be four possibilities for how to model it. If a researcher was to include all four, discuss how each would be interpreted. What might a good selection strategy be if our model had substantially more than two variables?<br />
<br />
=== Qiong === <br />
If we do not have any information about the data set, how to choose a level - two variable to predict the group dependent regression coefficients? After we choose the level - two variable z, how to explain the cross - level interaction term. <br />
<br />
=== Carrie === <br />
A client arrives with a random slope and intercept model using IQ as a predictor. IQ was measured on the traditional scale with a mean of 100 and standard deviation of 15. What should the client keep in mind about the interpretation of the variance of the intercept and covariance of the slope-intercept?<br />
: This raises the interesting question of how the variance of the random intercept and the covariance of the random intercept with the random slope are changed under a recentering of IQ. Let <br />
:: <math>Var\left[ \begin{matrix}<br />
{{u}_{0j}} \\<br />
{{u}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2} & {{\tau }_{01}} \\<br />
{{\tau }_{10}} & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math> for raw IQ. <br />
: If we recenter IQ with: <math>\tilde{\text{IQ}}=\text{IQ}-c</math> then:<br />
:: <math>\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
1 & c \\<br />
0 & 1 \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
{{u}_{0j}} \\<br />
{{u}_{1j}} \\<br />
\end{matrix} \right]</math><br />
: and<br />
:: <math>Var\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
1 & c \\<br />
0 & 1 \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
\tau _{0}^{2} & {{\tau }_{01}} \\<br />
{{\tau }_{10}} & \tau _{1}^{2} \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
1 & 0 \\<br />
c & 1 \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}+2c{{\tau }_{01}}+{{c}^{2}}\tau _{1}^{2} & {{\tau }_{01}}+c\tau _{1}^{2} \\<br />
{{\tau }_{10}}+c\tau _{1}^{2} & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math><br />
<br />
: If <math>c=-{{\tau }_{01}}/\tau _{1}^{2}</math>, then the variance of the intercept is minimized:<br />
::<math>Var\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}-\tau _{01}^{2}/\tau _{1}^{2} & 0 \\<br />
0 & \tau _{1}^{2} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}\left( 1-\rho _{01}^{2} \right) & 0 \\<br />
0 & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math><br />
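: The algebra above is easy to check numerically in base R. This is a sketch with illustrative variance values (not values from the book):<br />

```r
# Numeric check of the recentering algebra (base R; illustrative values)
tau0sq <- 9; tau1sq <- 0.25; tau01 <- -1.2
G <- matrix(c(tau0sq, tau01, tau01, tau1sq), 2, 2)   # Var of (u0j, u1j)

c0 <- -tau01 / tau1sq                  # variance-minimizing centering constant
A  <- matrix(c(1, 0, c0, 1), 2, 2)     # column-wise fill: [[1, c0], [0, 1]]
G_tilde <- A %*% G %*% t(A)            # Var of the recentered effects

G_tilde[1, 2]                                  # 0: covariance vanishes at c0
G_tilde[1, 1] - (tau0sq - tau01^2 / tau1sq)    # 0: matches the closed form
```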
<br />
=== Gilbert === <br />
In Chapter 5, they talk about the hierarchical linear model, where both fixed effects and random effects are taken into consideration. Discuss a clear, simple example in class which shows both kinds of effects, and give interpretations of each of the coefficients and their use in real life.<br />
=== Daniela === <br />
In Chapter 5, they talk mostly about the two-level nesting structure. Can we have a bigger example, with at least 4 levels, that includes a random intercept and slope, and see how to implement it in R?<br />
: In 'lme', multilevel nesting is handled by nesting in the 'random' formula, e.g.<br />
fit <- lme( Y ~ X * W, dd, random = ~ 1 | idtop/idmiddle/idsmall)<br />
: Contextual variables present an ambiguity. Assuming that the id variables 'idsmall' and 'idmiddle' are coded uniquely overall, then the higher-level, say 'idmiddle', contextual variables could be coded as either:<br />
cvar(X,idmiddle)<br />
: or<br />
capply( dd , ~ idmiddle, with, mean( c(tapply( X, idsmall, mean))))<br />
:Here is a table prepared for SPIDA showing how to handle multilevel nesting and crossed structures in a selection of R functions:<br />
<blockquote><br />
<br />
::{| border="1" cellpadding="4"<br />
|-<br />
! Function !! Notes <br />
|-<br />
| lme<br><br />
in package nlme<br />
| Linear mixed effects: normal response<br><br />
G side and R side modelling<br><br />
Model syntax:<br><br />
Y ~ X * W, random = ~ 1 + X | id<br><br />
For nested effect:<br><br />
Y ~ X * W, random = ~ 1 + X | higher/lower<br><br />
or, to have different models at different levels:<br><br />
Y ~ X * W, random = list(higher = ~ 1, lower = ~ 1 + X )<br><br />
<br />
|-<br />
| lmer <br><br />
in package lme4<br />
| Linear mixed models for gaussian response with Laplace approximation <br><br />
G side modeling only, R = <math>\sigma^2 I</math><br><br />
Model syntax: <br><br />
Y ~ X * W +(1+X|id)<br><br />
For nested effect:<br><br />
Y ~ X * W +(1+X|higher) + (1+X|higher:lower)<br> <br />
For crossed effect:<br><br />
Y ~ X * W +(1+X|id1) + (1+X|id2)<br> <br />
|- <br />
| glmer <br><br />
in package lme4<br />
| Generalized linear mixed models with adaptive Gaussian quadrature <br><br />
* family: binomial, Gamma, inverse.gaussian, poisson, gaussian<br><br />
G side only, no R side<br><br />
Model syntax: <br><br />
Y ~ X * W +(1+X|id)<br><br />
For nested effect:<br><br />
Y ~ X * W +(1+X|higher) + (1+X|higher:lower)<br> <br />
For crossed effect:<br><br />
Y ~ X * W +(1+X|id1) + (1+X|id2)<br> <br />
<br />
|- <br />
| glmmPQL<br><br />
in packages MASS/nlme<br />
| Generalized linear mixed models with Penalized Quasi Likelihood <br><br />
* family: binomial, Gamma, inverse.gaussian, poisson, gaussian<br><br />
G side and R side as in lme<br><br />
Model syntax: <br><br />
Y ~ X * W, random = ~ 1 + X | id<br><br />
For nested effect:<br><br />
Y ~ X * W, random = ~ 1 + X | higher/lower<br><br />
|-<br />
| MCMCglmm<br><br />
in package MCMCglmm<br />
| Generalized linear mixed models with MCMC <br><br />
* family: poisson, categorical, multinomial, ordinal, exponential, geometric, cengaussian, cenpoisson,<br />
cenexponential, zipoisson, zapoisson, ztpoisson, hupoisson, zibinomial (cen=censored, zi=zero-inflated, za=zero-altered, hu=hurdle)<br><br />
G side and R side, R side different from 'lme': no autocorrelation but can be used for multivariate response <br><br />
Note: 'poisson' potentially overdispersed by default (good), 'binomial' variance for binary variables is unidentified.<br> <br />
Model syntax: <br><br />
Y ~ X * W, random = ~ us(1 + X):id [Note: id should be a factor, us=unstructured]<br><br />
For nested effect:<br><br />
Y ~ X * W, random = ~us(1 + X):higher + us(1 + X):higher:lower<br> <br />
For crossed effect:<br><br />
Y ~ X * W, random = ~us(1 + X):id1+ us(1 + X):id2<br> <br />
|-<br />
|}<br />
<br />
</blockquote><br />
<br />
=== Phil ===<br />
<br />
Excluding fixed effects that are non-significant is common practice in regression analyses, and Snijders and Bosker follow this practice when simplifying the model in Table 5.3 to the model found in Table 5.4. While this practice helps make the model more parsimonious, it can ignore the joint effect that these variables have on the model as a whole. Discuss alternative criteria that one should explore when determining whether a predictor should be excluded from the model.<br />
<br />
=== Ryan ===<br />
When using random slopes it is generally the case that the level-1 model contains a fixed effect for what will also be the level-2 random effect. The random effect is then an estimate of the group/cluster/individual departure from the fixed effect. However, a non-significant level-1 fixed effect cannot be distinguished from zero. Are there cases where a non-significant fixed effect can be excluded from the model while retaining the random effect at level 2? What would be the consequence of this, and what might it reveal about the level-1 variable? Would this help control for the '''error of excluding a non-significant but confounding variable'''?<br />
:: This is a very interesting question. It would be interesting to create a simulated data set illustrating the issue, so we could consider the consequences of having random effects for a confounding factor whose within-cluster effect changes sign from cluster to cluster. Can we think of a confounding factor that would do that?<br />
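:: Here is a minimal base-R simulation sketch of the situation described above: a level-1 variable whose average (fixed) slope is near zero while its within-cluster slope changes sign from cluster to cluster. All names and values are illustrative.<br />

```r
# Sign-changing within-cluster slopes with a near-zero fixed effect (base R)
set.seed(1)
J <- 50; n <- 30                 # clusters and units per cluster
beta1 <- 0                       # average (fixed) slope: looks ignorable
u1 <- rnorm(J, 0, 0.8)           # random slopes with a large spread
slopes <- numeric(J)
for (j in 1:J) {
  x <- rnorm(n)                                  # level-1 covariate
  y <- 2 + (beta1 + u1[j]) * x + rnorm(n)        # cluster j's data
  slopes[j] <- coef(lm(y ~ x))[2]                # per-cluster OLS slope
}
mean(slopes)        # near zero
mean(slopes < 0)    # yet a sizable share of clusters slope downward
```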
<br />
=== and others === <br />
== Chapter 6 ==<br />
=== What is random? ===<br />
At the beginning of the chapter, S&B present two models (Table 6.1). They note that "The variable with the random slope is in both models the grand-mean centered variable IQ". In R: would the random side look like: random = ~ (IQ - IQbar)|school, even though (IQ - IQbar) isn't in the fixed part of the second model? How is this different in interpretation from: random = ~ (group centered IQ - IQbar)|school?<br />
<br />
* NOTE: This question is not necessarily about how to specify a model using nlme, but rather about the terms included in the random part of the model. As a test, I ran two models:<br />
<br />
IQ_dev <- mlbook_red$IQ_verb - mlbook_red$sch_iqv<br />
<br />
mlb612a <- nlme::lme(langPOST ~ IQ_dev + sch_iqv + ses, random =~ IQ_verb|schoolnr, data = mlbook_red, method="ML")<br />
summary(mlb612a)<br />
<br />
mlb612b <- nlme::lme(langPOST ~ IQ_dev + sch_iqv + ses, random =~ IQ_dev|schoolnr, data = mlbook_red, method="ML")<br />
summary(mlb612b)<br />
<br />
Note that the basic IQ_verb variable has been grand mean centered. According to the chapter, it sounds like using IQ_verb in the random portion is OK since it is a linear combination of IQ_dev and sch_iqv (the school means of IQ_verb). However, if I compare it against a second model using IQ_dev in the random part, pretty much all of the coefficients change. Is this expected?<br />
<br />
--[[User:Msigal|Msigal]] 12:07, 13 June 2012 (EDT)<br />
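: The comparison above rests on a decomposition that is easy to verify in base R: within a school the school mean is constant, so raw IQ is exactly the within-school deviation plus the school mean, and a random slope on the raw variable amounts to a school-specific shift of the random intercept relative to a random slope on the deviation, i.e. a different parameterization of G. A toy sketch (illustrative names, not the mlbook_red variables):<br />

```r
# Decomposing raw IQ into school mean + within-school deviation (base R)
set.seed(2)
dd <- data.frame(school = rep(1:5, each = 4), IQ = rnorm(20, 100, 15))
dd$sch_mean <- ave(dd$IQ, dd$school)    # school means, repeated within school
dd$IQ_dev   <- dd$IQ - dd$sch_mean      # within-school deviation
all.equal(dd$IQ, dd$IQ_dev + dd$sch_mean)   # TRUE: raw = deviation + mean
```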
<br />
=== Qiong === <br />
The authors introduce the t-test to test fixed parameters. We can use summary(model) to get the p-value directly in R. In practice, we use the Wald test to test fixed parameters. On page 95, they mention that "for 30 or more groups, the Wald test of fixed effects using REML standard errors have reliable type I error rates." Is this the only reason we use the Wald test in practice?<br />
=== Carrie === <br />
On page 105, in their discussion of modeling within-group variability, the authors warn to "keep in mind that including a random slope normally implies inclusion of the fixed effect!". What is an example of a model where one might include a random slope without a fixed effect?<br />
=== Gilbert === <br />
On page 104, the authors discuss different approaches to model selection, including working upward from level one and joint consideration of both levels one and two. Are there other methods to be used if both of these fail to provide a satisfactory model? <br />
=== Daniela === <br />
On pages 97 and 99, the authors show how to test for a random intercept and a random slope independently. What test would you use if you wanted to test for both a random slope and a random intercept in the same model, and what would you test it against: the linear model, or a model with just a random slope or a random intercept?<br />
=== Phil ===<br />
<br />
Multi-parameter tests are possible for fixed effects, but can they also be applied to predicted random effects? If so, what would be the analog to <math>\hat{\gamma}' \hat{\Sigma}_{\gamma}^{-1} \hat{\gamma}</math>, used to find the Wald statistic, and how do we find an appropriate df denominator term for an F test?<br />
<br />
: Question: Just to be sure, I presume you mean a BLUP? Answer: Yes I do :)<br />
<br />
=== Ryan ===<br />
In Chapter 6 Snijders and Bosker suggest that using deviance tests to evaluate fixed effects in a multilevel model is inappropriate if the estimation method is REML. What is the characteristic of REML vs ML that makes this type of model evaluation incorrect?<br />
:Comment: Likelihood has the form L(data|theta) and is used for inference on theta, for example, by comparing the altitude of the log-likelihood at theta.hat (the summit) with the altitude at a null hypothetical value theta.0. This is the basis of deviance tests: Deviance = 2*(logLik(data|theta.hat) - logLik(data|theta.0)).<br />
<br />
: With ML the log-likelihood is logLik( y | beta, G, R) and we can use the likelihood for inference on all parameters. With REML, the data is not the original y, but the residual of y on X, say e. And the likelihood is only a function of G and R: logLik( e | G, R). Since 'beta' does not appear in this likelihood, the likelihood cannot be used to answer any questions about 'beta'.<br />
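: The same ML-versus-REML contrast can be seen in its simplest form in ordinary regression, where ML divides the residual sum of squares by n and REML by n - p (the fixed effects having been projected out). A base-R sketch:<br />

```r
# ML vs REML variance estimates in ordinary regression (base R)
set.seed(3)
n <- 40
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, 0, 2)
fit <- lm(y ~ x)                  # p = 2 fixed-effect parameters
rss <- sum(resid(fit)^2)
sigma2_ml   <- rss / n            # ML: divides by n, biased downward
sigma2_reml <- rss / (n - 2)      # REML: divides by n - p, unbiased
all.equal(sigma2_reml, summary(fit)$sigma^2)  # TRUE: lm reports the REML-style one
```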
<br />
=== and others === <br />
== Chapter 7 ==<br />
=== Explained Variance in Random Slope Models ===<br />
Looking at the proportion of variance explained by a model in a traditional ANOVA/multiple regression framework is something clients are often extremely interested in. In Chapter 7, Snijders and Bosker discuss how we might approach the issue in MLM. Near the end of the chapter, the authors insinuate that getting an estimate of the amount of explained variance in a random effects (intercept and slope) model is a somewhat tedious endeavour.<br />
<br />
The claim is that random slopes don't change prediction very much, so if we re-estimate the model using only random intercepts (no random slopes), this will "normally yield [predicted] values that are very close to values for the random slope models" (p. 114). This statement doesn't quite ring true for me, as in our examples the differences in slope between schools have been fairly striking/substantial. <br />
<br />
Is the authors' statement justifiable? Is obtaining an <math>R^2</math> as important/interesting in MLM as it is in other models?<br />
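: For concreteness, the level-1 explained variance S&B propose is a proportional reduction in total variance, computed from the empty and fitted models' variance components. A sketch with made-up components (not the book's example):<br />

```r
# S&B-style level-1 explained variance from variance components
# (illustrative values, not from the book)
sigma2_null <- 40; tau2_null <- 20    # empty (null) model components
sigma2_mod  <- 30; tau2_mod  <- 12    # model with level-1 predictors
R2_level1 <- 1 - (sigma2_mod + tau2_mod) / (sigma2_null + tau2_null)
R2_level1    # 0.3
```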
<br />
=== Explained variance in three-level models === <br />
In Example 7.1, we see how to calculate the explained variance of a level-one variable when this variable has a fixed effect only. How do we calculate the explained variance of a level-one variable when this variable has both a fixed and a random effect in the model? <br />
<br />
=== Interpreting <math>R^2</math> as an Effect Size ===<br />
A client fits a multilevel model and comes up with several significant predictors. The client is pleased with themselves, but remembers learning that significance alone isn't good enough these days, and needs help producing a measure of effect size. You compute the Level 1 R^2 and come up with a very small value, say 0.01. Is the model then worthless, even if the magnitude of the predicted change in the outcome is substantively meaningful?<br />
<br />
=== Explained variance === <br />
In the example provided on page 110, it is shown that the residual variance at level two increases as the within-group deviation is added as an explanatory variable to the model, in the balanced as well as the unbalanced case. Is this always the case, or is it only for this particular example?<br />
<br />
=== Estimates of <math>R^2</math>=== <br />
On page 113, the authors note that if "an estimated value for <math>R^2</math> becomes smaller by the addition of a predictor variable, or larger by the deletion of a predictor variable, there are two possibilities: either this is a chance fluctuation, or the larger model is misspecified." They then say that whether the first or second possibility is more likely depends on the size of the change in <math>R^2</math>. Can you give an example of when each occurs, based on the size of the change in <math>R^2</math>?<br />
<br />
=== Predicted <math>R^2</math> ===<br />
After predicting values for random intercepts and slopes using Bayesian methods it is possible to form composite values, <math>\hat{Y}_{ij}</math>, to predict the observed dependent values, <math>Y_{ij}</math>. Obviously these estimates will be sub-optimal as they will suffer from 'shrinkage' effects, but they may be useful for computing a '<math>R^2_{Predicted}</math>'. Discuss situations where knowledge of the predicted slopes and intercepts could be important, and whether an <math>R^2_{Predicted}</math> could be a useful description.<br />
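: One concrete candidate for an <math>R^2_{Predicted}</math> is the squared correlation between the observed response and the composite predictions (fixed part plus BLUPs), which 'fitted(fit, level = 1)' returns for an 'nlme::lme' fit. A sketch on simulated data (all names and values illustrative):<br />

```r
# R^2_Predicted from composite (BLUP-based) predictions, using nlme
library(nlme)   # nlme ships with R
set.seed(6)
J <- 25; n <- 8
dd <- data.frame(id = factor(rep(1:J, each = n)), x = rnorm(J * n))
dd$y <- 1 + 0.5 * dd$x + rep(rnorm(J), each = n) + rnorm(J * n)
fit <- lme(y ~ x, data = dd, random = ~ 1 | id)
R2_pred <- cor(dd$y, fitted(fit, level = 1))^2   # level 1: fixed part + BLUPs
R2_pred
```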
<br />
=== The Size and Direction of <math>R^2</math> Change As a Diagnostic Criterion ===<br />
The suggestion has been made that changes in <math>R^2</math>, where the addition or deletion of a variable creates an unexpected and opposing directional change, can serve as a diagnostic for determining where the flaw in the model resides. However, the authors do not actually indicate in which scenarios the size and increase/decrease information obtained from the <math>R^2</math> estimate determines the source of the flaw. 'Wrong' directions provide evidence of model misspecification, but what then of the magnitude component mentioned just prior? (p. 113)<br />
<br />
== Chapter 8 ==<br />
You can sign your contributions with <nowiki>--~~~~</nowiki> --[[User:Georges|Georges]] 07:50, 14 June 2012 (EDT)<br />
=== "Correlates of diversity" ===<br />
<br />
Provide an example illustrating how level-two variables can be associated with level-one heteroscedasticity.<br />
--[[User:Gilbert8|Gilbert8]] 11:26, 16 June 2012 (EDT)<br />
<br />
=== "Modeling Heteroscedasticity" ===<br />
<br />
When Snijders and Bosker say they are "modeling heteroscedasticity", is this simply incorporating more random slopes into the model? For instance, on page 127, they added a fixed effect for SA-SES (the school average of SES) and a random slope for it. What kind of plots would let us see if these inclusions are necessary? --[[User:Msigal|Msigal]] 11:26, 18 June 2012 (EDT)<br />
<br />
=== Linear or quadratic variance functions ===<br />
<br />
The level-one residual variance can be expressed as a linear or quadratic function of some variables. How do we decide on the functional form?<br />
Can we say that if a variable has a random effect then we use a quadratic form, and otherwise a linear form? Is it the same for the intercept residual variance?<br />
<nowiki>--~~~~</nowiki> --[[User:Qiong Li|Qiong Li]] 12:25, 18 June 2012 (EDT)<br />
<br />
=== On a practical note - How is this done?? ===<br />
It appears that in order to fit a model with a linear/quadratic function for the variance the authors had to use MLwiN. Are there other ways to accomplish this? Could we talk a little about what their demo code is accomplishing? [http://www.stats.ox.ac.uk/~snijders/ch8.r S&B ExCode] --[[User:Smithce|Smithce]] 14:15, 19 June 2012 (EDT)<br />
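: One R route is the variance-function classes in 'nlme' (which ships with R): 'varIdent' gives a separate residual SD per level of a factor, and 'varPower'/'varConstPower' let the residual SD depend on a covariate. A sketch with simulated data, not the book's MLwiN example (all names and values illustrative):<br />

```r
# Heteroscedastic level-1 variance via nlme's variance functions
library(nlme)   # ships with R
set.seed(4)
J <- 30; n <- 10
dd <- data.frame(sch = factor(rep(1:J, each = n)),
                 sex = factor(rbinom(J * n, 1, 0.5)))
u <- rnorm(J)                                    # school random intercepts
dd$y <- 1 + 0.4 * (dd$sex == "1") + u[as.integer(dd$sch)] +
  rnorm(J * n, 0, ifelse(dd$sex == "1", 2, 1))   # level-1 SD differs by sex
fit <- lme(y ~ sex, data = dd, random = ~ 1 | sch,
           weights = varIdent(form = ~ 1 | sex)) # one residual SD per sex
fit$modelStruct$varStruct                        # estimated SD ratio (~2 or ~1/2)
```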
<br />
=== Variable centering ===<br />
<br />
Since fixed effects variables can be included in the R matrix to model systematic heteroscedasticity discuss the effects of centering variables in this context. Does centering affect the estimation results or numerical stability? --[[User:Rphilip2004|Rphilip2004]] 13:42, 19 June 2012 (EDT)<br />
<br />
What happens to the variance function when there are more than two levels? Do we still only have to choose between linear and quadratic forms or does it become more complicated? --[[User:Dusvat|Dusvat]] 18:43, 18 June 2012 (EDT)<br />
<br />
=== Generic regression question: Treating a factor as continuous for the interaction ===<br />
On page 126, Model 3 (described on page 124), the authors treat SES as a factor for the main effects, but then, to keep the number of interaction terms down, they treat it as numeric in the interaction with other variables. This seems like it could come in useful; are there any caveats we should be aware of in using this technique? --[[User:Smithce|Smithce]] 14:15, 19 June 2012 (EDT)<br />
<br />
=== Generalized Linear Models and Heteroscedasticity ===<br />
<br />
Given a linear model that has level-1 heteroscedasticity related to multiple level-1 predictors, does this not mean that the heteroscedasticity can be thought of as related to the overall mean response? The residuals of a heteroscedastic model would then become functions of the mean response. Generalized linear models often model the variance as a function of the mean response (e.g., Poisson, Gamma, Negative Binomial, etc.). When might it be appropriate to abandon a direct linear relationship in favour of a generalized linear model (which at times retains the desired additive linear properties in the linear predictor) to deal with heteroscedastic issues? Is this even possible? --[[User:Rbarnhar|Rbarnhar]] 14:24, 19 June 2012 (EDT)<br />
<br />
== Chapter 9 ==<br />
<br />
=== "Imputation" ===<br />
<br />
1. The chapter discusses imputation as a way of filling in missing data to form a complete data set. Are there any other methods which can be used to achieve the same goal? Provide a few examples.<br />
<br />
2. Discuss an example with a data set containing missing values. Provide R code using multiple imputation to complete the data set.<br />
--[[User:Gilbert8|Gilbert8]] 11:38, 18 June 2012 (EDT)<br />
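: As a starting point, here is a minimal base-R sketch of a single stochastic-regression imputation draw; multiple imputation repeats this M times (also redrawing the imputation-model parameters) and pools the results, which packages such as 'mice' automate. All names and values are illustrative.<br />

```r
# One stochastic-regression imputation draw (base R sketch)
set.seed(5)
n <- 100
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)
y[sample(n, 20)] <- NA                # make 20 responses missing (MCAR here)
obs <- !is.na(y)
fit <- lm(y ~ x, subset = obs)        # imputation model from observed cases
pred <- predict(fit, newdata = data.frame(x = x[!obs]))
y_imp <- y
y_imp[!obs] <- pred + rnorm(sum(!obs), 0, summary(fit)$sigma)  # add draw noise
# Proper MI would also redraw the coefficients from their posterior and
# repeat the whole step M times, pooling estimates across imputations.
mean(y_imp)                           # completed-data mean
```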
<br />
=== Patterns of Missingness ===<br />
Snijders and Bosker go to some lengths to explain the difference between MCAR, MAR, and MNAR. However, I felt they somewhat glossed over a definition of monotone missingness. What is monotone missingness? How would one check for it, especially in terms of a multilevel model? --[[User:Msigal|Msigal]] 08:47, 20 June 2012 (EDT)<br />
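: A rough base-R check for a monotone pattern: order the variables from least to most missing; the pattern is monotone if, within every row, a missing value is never followed by an observed one (dropout-style missingness). This simple version is a heuristic; ties in the missing counts may require trying more than one ordering.<br />

```r
# Base-R heuristic check for a monotone missingness pattern
is_monotone <- function(d) {
  m <- is.na(d[, order(colSums(is.na(d)))])     # columns: least to most missing
  all(apply(m, 1, function(r) all(r == cummax(r))))  # no observed after missing
}
# Dropout-style (monotone): once missing, stays missing
d1 <- data.frame(t1 = c(1, 2, 3), t2 = c(1, NA, 3), t3 = c(NA, NA, 3))
# Non-monotone: t3 observed in a row where t2 is missing
d2 <- data.frame(t1 = c(1, 2, 3), t2 = c(1, NA, 3), t3 = c(1, 2, NA))
c(is_monotone(d1), is_monotone(d2))   # TRUE FALSE
```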
<br />
If we use a missingness indicator and predict this using a logistic regression model, does this mean that significant predictors should be kept in the imputation model and non-significant predictors can be omitted from the imputation model? --[[User:Smithce|Smithce]] 09:58, 21 June 2012 (EDT)<br />
<br />
=== Missingness Assumption ===<br />
<br />
Rubin defined three types of missingness in 1976. When we use methods for handling incomplete data, what information can help us make a reasonable assumption? What is the key to deciding whether missingness is MCAR or MAR? <nowiki>--~~~~</nowiki> --[[User:Qiong Li|Qiong Li]] 12:31, 20 June 2012 (EDT) <br />
<br />
=== Full Maximum Likelihood vs. Imputation ===<br />
Could you give us an example using both maximum likelihood and imputation methods in R and then compare? How are the methods similar/different? Is one method computationally better than the other?--[[User:Dusvat|Dusvat]] 17:27, 20 June 2012 (EDT)<br />
<br />
=== Dancing on the Bay(esian) ===<br />
<br />
Imputation seems to be a very Bayesian practice, and the authors mention the intimate connection to Gibbs sampling when imputing data in the univariate case. I wonder, however: if we are willing to impute data in this Bayesian manner, why don't we just jump ship and move to a more complete Bayesian methodology? What are the benefits/downsides of initially dancing with the idea of being Bayesian to get our complete data, then being frequentists to fit our models? --[[User:Rphilip2004|Rphilip2004]] 09:18, 21 June 2012 (EDT)<br />
<br />
=== Don't Throw the Kitchen Sink - ICE ===<br />
<br />
While Snijders and Bosker promote the idea of a complex model for imputation, using as many feasible variables as possible, is this always advisable? Should we not consider parsimony as we would in any other form of regression? Should the imputation model also reflect the amount of missing data we are trying to impute?<br />
--[[User:Rbarnhar|Rbarnhar]] 09:59, 21 June 2012 (EDT)<br />
<br />
== Chapter 10 ==<br />
<br />
=== Influence of level-two units ===<br />
Provide a detailed explanation of how '''deletion diagnostics''' are performed and provide a practical example to illustrate them.--[[User:Gilbert8|Gilbert8]] 18:22, 24 June 2012 (EDT)<br />
<br />
=== Incorporating Descriptive Statistics ===<br />
One aside that Snijders and Bosker make in this chapter is about the inclusion of the standard deviation for each group of a relevant level one variable as a fixed effect in the model. This was mentioned within the section on adding contextual variables (p. 155). This strikes me as an interesting prospect. Does modeling the standard deviation have any interpretative benefits over simply using group size? Are there other descriptive statistics pertaining to groups that would be meaningful to add to a model? What would their interpretation be? --[[User:Msigal|Msigal]] 10:12, 25 June 2012 (EDT)<br />
<br />
=== Orders of the model checks ===<br />
<br />
In Chapter 10, the authors introduce a number of things we need to do when we build a mixed model, such as "include contextual effects", "check random effects", "specification of the fixed part", "specification of the random part" and "check the distributional assumption". When I deal with real data, I am always confused about which things I should do first and which next. Are there no rules, or is there a better order in which to do these things?<br />
<nowiki>--~~~~</nowiki> --[[User:Qiong Li|Qiong Li]] 2:33, 25 June 2012 (EDT)<br />
<br />
=== Assumption violations ===<br />
In Ch. 10, the authors talk about having the random slope and random intercept uncorrelated with all explanatory variables. If this assumption is incorrect, you can just add relevant explanatory variables to the model. What happens in a real-world example where there are no quantifiable relevant explanatory variables to add to the model? How would you go about fixing the incorrect assumption? What happens if more than one assumption is violated and you cannot just include other 'descriptive' variables in the model?--[[User:Dusvat|Dusvat]] 17:45, 25 June 2012 (EDT)<br />
<br />
=== Checking for Random Slopes ===<br />
In this chapter (pages 155-156) S&B advocate checking for a random slope for each level-one variable in the fixed part. Because the process can be time consuming, they further suggest using one-step methods to obtain provisional estimates and then checking the t-ratio. We have learned that t-tests of this kind are problematic for variance parameters and not to be trusted. Should we then use LRTs to test all possible random slopes, possibly with simulation?<br />
<br />
=== Cluster Size and Model Assumptions ===<br />
<br />
In the practical nuts and bolts of application, one at times encounters situations where the number of clusters in a requested multilevel model is too small to support checking the assumption that the random intercepts are normally distributed. How important is it that this normality assumption holds?--[[User:Rbarnhar|Rbarnhar]] 01:22, 26 June 2012 (EDT)<br />
<br />
=== Model Residuals ===<br />
<br />
The authors emphasize the importance of using OLS estimation for determining unbiased diagnostics. However, it may be useful to use model implied residuals such as <math>r = y - X \hat{\beta}</math> and <math>r.c = y - X \hat{\beta} - Z \hat{\gamma}</math>. Describe how these model implied residuals can be used to evaluate influential observations at different design levels. --[[User:Rphilip2004|Rphilip2004]] 08:33, 26 June 2012 (EDT)<br />
<br />
== Chapter 11 ==<br />
<br />
=== Choosing sample sizes for micro- and macro-units ===<br />
<br />
Usually, we choose the same sample size n of micro-units in every group, with some number N of macro-units. Can we improve the power of tests, or obtain smaller standard errors, by choosing different sample sizes in different groups? <nowiki>--~~~~</nowiki> --[[User:Qiong Li|Qiong Li]] 12:11, 27 June 2012 (EDT)<br />
<br />
=== Allocating treatment to groups or individuals ===<br />
<br />
The authors mention that multisite trials are difficult to implement, as the researcher has to make sure that there will be no contamination between the two treatment conditions within the same site, and that cluster randomization may lead to selection bias. The authors propose the '''pseudo-cluster randomized trial'''. Explain how this method is performed. Is it used often in practice? If this technique fails, are there other methods to resolve those situations? --[[User:Gilbert8|Gilbert8]] 15:04, 27 June 2012 (EDT)<br />
<br />
=== Treatment Allocation, Continued ===<br />
<br />
Building upon Gilbert's question, discuss the differences between cluster randomized trials, multisite trials, and pseudo-cluster randomized trials. Do any of these strategies match the Schizophrenia dataset that we have been working with during Lab 2? --[[User:Msigal|Msigal]] 18:51, 27 June 2012 (EDT)<br />
<br />
Continuing Matt's question about the differences between cluster randomized trials, multisite trials, and pseudo-cluster randomized trials: can you give us an example of when to pick one over the other, along with how the cost of each trial breaks down under the cost function? Which trial is more costly? --[[User:Dusvat|Dusvat]] 23:17, 27 June 2012 (EDT)<br />
<br />
=== Power Expecting Missingness/Drop Out ===<br />
<br />
I think an interesting addition to the power analyses presented would be to work in estimates of drop-out/non-response within cluster. For example, suppose I estimated that there was only an 80% chance that I would actually get useful/complete data from each level-1 unit (lower if using the URPP ;). How might this be built into these analyses? Are there authors who have done work on this with mixed models? Is it really worth bothering with, given that power analysis is hard enough (e.g. coming up with 'guesses' for estimates) as it is? --[[User:Smithce|Smithce]] 09:40, 28 June 2012 (EDT)<br />
...now that I think about it, if you assume constant loss rather than trying to fool around with unbalance, this is pretty easy. So, never mind.<br />
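: Concretely, with a constant completion rate p the adjustment is just to replace the planned cluster size n by n*p in the usual variance formula for a cluster-randomized treatment effect (a standard approximation for a balanced two-arm design; all values illustrative):<br />

```r
# Dropout-adjusted variance of a cluster-randomized treatment effect
# (balanced two-arm approximation; all values illustrative)
tau2 <- 0.1; sigma2 <- 0.9     # level-2 and level-1 variance components
J <- 40; n <- 20; p <- 0.8     # clusters, planned cluster size, completion rate
v_planned <- 4 * (tau2 + sigma2 / n) / J        # var(effect) with full data
v_dropout <- 4 * (tau2 + sigma2 / (n * p)) / J  # replace n by expected n * p
sqrt(v_dropout / v_planned)    # SE inflation factor from the dropout
```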
<br />
== Chapter 12 ==<br />
== Chapter 13 ==<br />
== Chapter 14 ==<br />
== Chapter 15 ==<br />
== Chapter 16 ==<br />
== Chapter 17 ==<br />
== Chapter 18 ==</div>Smithcehttp://scs.math.yorku.ca/index.php/MATH_6643_Summer_2012_Applications_of_Mixed_Models/Snijders_and_Bosker:_Discussion_QuestionsMATH 6643 Summer 2012 Applications of Mixed Models/Snijders and Bosker: Discussion Questions2012-06-28T13:14:07Z<p>Smithce: </p>
<hr />
<div>== Chapter 5 ==<br />
=== Matthew ===<br />
At the bottom of page 83, Snijders and Bosker outline the process for probing interactions between two level-one variables, and note that there are four possibilities for how to model them. If a researcher were to include all four, discuss how each would be interpreted. What might a good selection strategy be if our model had substantially more than two variables?<br />
<br />
=== Qiong === <br />
If we do not have any information about the data set, how should we choose a level-two variable to predict the group-dependent regression coefficients? After we choose the level-two variable z, how should we interpret the cross-level interaction term? <br />
<br />
=== Carrie === <br />
A client arrives with a random slope and intercept model using IQ as a predictor. IQ was measured on the traditional scale with a mean of 100 and standard deviation of 15. What should the client keep in mind about the interpretation of the variance of the intercept and covariance of the slope-intercept?<br />
: This raises the interesting question of how the variance of the random intercept and the covariance of the random intercept with the random slope are changed under a recentering of IQ. Let <br />
:: <math>Var\left[ \begin{matrix}<br />
{{u}_{0j}} \\<br />
{{u}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2} & {{\tau }_{01}} \\<br />
{{\tau }_{10}} & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math> for raw IQ. <br />
: If we recenter IQ with: <math>\tilde{\text{IQ}}=\text{IQ}-c</math> then:<br />
:: <math>\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
1 & c \\<br />
0 & 1 \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
{{u}_{0j}} \\<br />
{{u}_{1j}} \\<br />
\end{matrix} \right]</math><br />
: and<br />
:: <math>Var\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
1 & c \\<br />
0 & 1 \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
\tau _{0}^{2} & {{\tau }_{01}} \\<br />
{{\tau }_{10}} & \tau _{1}^{2} \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
1 & 0 \\<br />
c & 1 \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}+2c{{\tau }_{01}}+{{c}^{2}}\tau _{1}^{2} & {{\tau }_{01}}+c\tau _{1}^{2} \\<br />
{{\tau }_{10}}+c\tau _{1}^{2} & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math><br />
<br />
: If <math>c=-{{\tau }_{01}}/\tau _{1}^{2}</math>, then the variance of the intercept is minimized:<br />
::<math>Var\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}-\tau _{01}^{2}/\tau _{1}^{2} & 0 \\<br />
0 & \tau _{1}^{2} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}\left( 1-\rho _{01}^{2} \right) & 0 \\<br />
0 & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math><br />
<br />
=== Gilbert === <br />
In Chapter 5, the authors discuss the hierarchical linear model, in which both fixed effects and random effects are taken into consideration. Discuss a clear, simple example in class that shows both kinds of effects, and give interpretations of each of the coefficients and their use in real life.<br />
=== Daniela === <br />
In Chapter 5, they talk mostly about the two-level nesting structure. Can we have a bigger example, with at least four levels, that includes a random intercept and slope, and see how to implement it in R?<br />
: In 'lme', multilevel nesting is handled with, e.g.,<br />
fit <- lme( Y ~ X * W, dd, random = ~ 1 | idtop/idmiddle/idsmall)<br />
: Contextual variables present an ambiguity. Assuming that the id variables 'idsmall' and 'idmiddle' are coded uniquely overall, the higher-level, say 'idmiddle', contextual variables could be coded as either:<br />
cvar(X,idmiddle)<br />
: or<br />
capply( dd, ~ idmiddle, with, mean( c(tapply( X, idsmall, mean))))<br />
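: The difference between the two codings only matters with unbalanced subgroups. Here is a small pure-Python sketch (hypothetical numbers) of the two means, mirroring cvar(X, idmiddle) versus the mean of the 'idsmall' means:<br />

```python
# One hypothetical 'idmiddle' unit containing two 'idsmall'
# subgroups of unequal size
X_by_idsmall = {
    "s1": [1.0, 1.0, 1.0, 1.0],  # 4 observations, mean 1
    "s2": [5.0, 5.0],            # 2 observations, mean 5
}

# Mean over all observations in the 'idmiddle' unit
all_obs = [x for xs in X_by_idsmall.values() for x in xs]
mean_of_obs = sum(all_obs) / len(all_obs)

# Mean of the 'idsmall' subgroup means
small_means = [sum(xs) / len(xs) for xs in X_by_idsmall.values()]
mean_of_means = sum(small_means) / len(small_means)

print(mean_of_obs)    # 2.333... (weights observations equally)
print(mean_of_means)  # 3.0     (weights subgroups equally)
```

: With balanced subgroups the two codings coincide, which is why the ambiguity is easy to overlook.<br />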
:Here is a table prepared for SPIDA showing how to handle multilevel nesting and crossed structures in a selection of R functions:<br />
<blockquote><br />
<br />
::{| border="1" cellpadding="4"<br />
|-<br />
! Function !! Notes <br />
|-<br />
| lme<br><br />
in package nlme<br />
| Linear mixed effects: normal response<br><br />
G side and R side modelling<br><br />
Model syntax:<br><br />
Y ~ X * W, random = ~ 1 + X | id<br><br />
For nested effect:<br><br />
Y ~ X * W, random = ~ 1 + X | higher/lower<br><br />
or, to have different models at different levels:<br><br />
Y ~ X * W, random = list(higher = ~ 1, lower = ~ 1 + X )<br><br />
<br />
|-<br />
| lmer <br><br />
in package lme4<br />
| Linear mixed models for gaussian response with Laplace approximation <br><br />
G side modeling only, R = <math>\sigma^2 I</math><br><br />
Model syntax: <br><br />
Y ~ X * W +(1+X|id)<br><br />
For nested effect:<br><br />
Y ~ X * W +(1+X|higher) + (1+X|higher:lower)<br> <br />
For crossed effect:<br><br />
Y ~ X * W +(1+X|id1) + (1+X|id2)<br> <br />
|- <br />
| glmer <br><br />
in package lme4<br />
| Generalized linear mixed models with adaptive Gaussian quadrature <br><br />
* family: binomial, Gamma, inverse.gaussian, poisson, gaussian<br><br />
G side only, no R side<br><br />
Model syntax: <br><br />
Y ~ X * W +(1+X|id)<br><br />
For nested effect:<br><br />
Y ~ X * W +(1+X|higher) + (1+X|higher:lower)<br> <br />
For crossed effect:<br><br />
Y ~ X * W +(1+X|id1) + (1+X|id2)<br> <br />
<br />
|- <br />
| glmmPQL<br><br />
in packages MASS/nlme<br />
| Generalized linear mixed models with Penalized Quasi Likelihood <br><br />
* family: binomial, Gamma, inverse.gaussian, poisson, gaussian<br><br />
G side and R side as in lme<br><br />
Model syntax: <br><br />
Y ~ X * W, random = ~ 1 + X | id<br><br />
For nested effect:<br><br />
Y ~ X * W, random = ~ 1 + X | higher/lower<br><br />
|-<br />
| MCMCglmm<br><br />
in package MCMCglmm<br />
| Generalized linear mixed models with MCMC <br><br />
* family: poisson, categorical, multinomial, ordinal, exponential, geometric, cengaussian, cenpoisson,<br />
cenexponential, zipoisson, zapoisson, ztpoisson, hupoisson, zibinomial (cen=censored, zi=zero-inflated, za=zero-altered, hu=hurdle)<br><br />
G side and R side, R side different from 'lme': no autocorrelation but can be used for multivariate response <br><br />
Note: 'poisson' potentially overdispersed by default (good), 'binomial' variance for binary variables is unidentified.<br> <br />
Model syntax: <br><br />
Y ~ X * W, random = ~ us(1 + X):id [Note: id should be a factor, us=unstructured]<br />
For nested effect:<br><br />
Y ~ X * W, random = ~us(1 + X):higher + us(1 + X):higher:lower<br> <br />
For crossed effect:<br><br />
Y ~ X * W, random = ~us(1 + X):id1+ us(1 + X):id2<br> <br />
|-<br />
|}<br />
<br />
</blockquote><br />
<br />
=== Phil ===<br />
<br />
Excluding fixed effects that are non-significant is common practice in regression analyses, and Snijders and Bosker follow this practice when simplifying the model in Table 5.3 to the model found in Table 5.4. While this practice helps make the model more parsimonious, it can ignore the joint effect that these variables have on the model as a whole. Discuss alternative criteria that one should explore when determining whether a predictor should be excluded from the model.<br />
<br />
=== Ryan ===<br />
When using random slopes it is generally the case that the level-1 model contains a fixed effect for what will also be the level-2 random effect. The random effect is then an estimate of the group/cluster/individual departure from the fixed effect. However, non-significant level-1 variables are not determinably different from zero. Are there cases where a non-significant fixed effect can be excluded from the model while retaining the random effect at level 2? What would be the consequence of this, and what might it reveal about the level-1 variable? Would this help control for the '''error of excluding a non-significant but confounding variable'''?<br />
:: This is a very interesting question. It would be interesting to create a simulated data set illustrating the issue, so we could consider the consequences of having random effects for a confounding factor whose within-cluster effect changes sign from cluster to cluster. Can we think of a confounding factor that would do that?<br />
<br />
=== and others === <br />
== Chapter 6 ==<br />
=== What is random? ===<br />
At the beginning of the chapter, S&B present two models (Table 6.1). They note that "The variable with the random slope is in both models the grand-mean centered variable IQ". In R, would the random side look like random = ~ (IQ - IQbar)|school, even though (IQ - IQbar) isn't in the fixed part of the second model? How does the interpretation differ from random = ~ (group-centered IQ - IQbar)|school?<br />
<br />
* NOTE: This question is not necessarily about how to specify a model using nlme, but rather about the terms included in the random part of the model. As a test, I ran two models:<br />
<br />
IQ_dev <- mlbook_red$IQ_verb - mlbook_red$sch_iqv<br />
<br />
mlb612a <- nlme::lme(langPOST ~ IQ_dev + sch_iqv + ses, random = ~ IQ_verb|schoolnr, data = mlbook_red, method="ML")<br />
summary(mlb612a)<br />
<br />
mlb612b <- nlme::lme(langPOST ~ IQ_dev + sch_iqv + ses, random = ~ IQ_dev|schoolnr, data = mlbook_red, method="ML")<br />
summary(mlb612b)<br />
<br />
Note that the basic IQ_verb variable has been grand-mean centered. According to the chapter, it sounds like using IQ_verb in the random portion is OK, since it is a linear combination of IQ_dev and sch_iqv (the school means of IQ_verb). However, if I compare it against a second model using IQ_dev in the random part, pretty much all of the coefficients change. Is this expected?<br />
<br />
--[[User:Msigal|Msigal]] 12:07, 13 June 2012 (EDT)<br />
<br />
=== Qiong === <br />
The authors introduce the t-test to test fixed parameters. We can use summary(model) to get the p-value directly in R. In practice, we use the Wald test to test fixed parameters. On page 95, they mention that "for 30 or more groups, the Wald test of fixed effects using REML standard errors has reliable type I error rates." Is this the only reason we use the Wald test in practice?<br />
=== Carrie === <br />
On page 105, in their discussion of modeling within-group variability, the authors warn to "keep in mind that including a random slope normally implies inclusion of the fixed effect!". What is an example of a model where one might include a random slope without a fixed effect?<br />
=== Gilbert === <br />
On page 104, the authors discuss different approaches to model selection, including working upward from level one and joint consideration of levels one and two. Are there other methods to use if neither approach provides a satisfactory model? <br />
=== Daniela === <br />
On pages 97 and 99, the authors show how to test for a random intercept and a random slope independently. What test would you use if you wanted to test for both a random slope and a random intercept in the same model? And what would you test it against: the linear model, or a model with just a random slope or a random intercept?<br />
=== Phil ===<br />
<br />
Multi-parameter tests are possible for fixed effects, but can they also be applied to predicted random effects? If so, what would be the analog to <math>\hat{\gamma}^{\top} \hat{\Sigma}_{\gamma}^{-1} \hat{\gamma}</math>, used to find the Wald statistic, and how do we find an appropriate denominator df for an F test?<br />
<br />
: Question: Just to be sure, I presume you mean a BLUP? Answer: Yes I do :)<br />
<br />
=== Ryan ===<br />
In Chapter 6 Snijders and Bosker suggest that using deviance tests to evaluate fixed effects in a multilevel model is inappropriate if the estimation method is REML. What is the characteristic of REML vs ML that makes this type of model evaluation incorrect?<br />
:Comment: The likelihood has the form L(data|theta) and is used for inference on theta, for example by comparing the altitude of the log-likelihood at theta.hat (the summit) with the altitude at a null hypothetical value theta.0. This is the basis of deviance tests: Deviance = 2*(logLik(data|theta.hat) - logLik(data|theta.0)).<br />
<br />
: With ML the log-likelihood is logLik( y | beta, G, R) and we can use the likelihood for inference on all parameters. With REML, the data is not the original y but the residual of y on X, say e, and the likelihood is only a function of G and R: logLik( e | G, R). Since 'beta' does not appear in this likelihood, the likelihood cannot be used to answer any questions about 'beta'.<br />
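: The point that REML works with residuals rather than y can be illustrated even in ordinary regression, where REML reduces to the familiar unbiased variance estimate RSS/(n - p) while ML gives RSS/n. A minimal Python sketch (simulated data, illustrative only — not mixed-model code):<br />

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma_sq = 20, 4, 4.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 2.0, -1.0, 0.5])

ml_est, reml_est = [], []
for _ in range(2000):
    y = X @ beta + rng.normal(scale=np.sqrt(sigma_sq), size=n)
    bhat = np.linalg.lstsq(X, y, rcond=None)[0]
    rss = np.sum((y - X @ bhat) ** 2)
    ml_est.append(rss / n)          # ML: biased downward by (n - p)/n
    reml_est.append(rss / (n - p))  # REML: unbiased

print(np.mean(ml_est))    # close to sigma_sq * (n - p)/n = 3.2 in expectation
print(np.mean(reml_est))  # close to sigma_sq = 4.0 in expectation
```

: Because the REML criterion is built from the residuals e, comparing REML fits with different fixed parts amounts to comparing likelihoods of different 'data', which is why the deviance test of fixed effects is invalid under REML.<br />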
<br />
=== and others === <br />
== Chapter 7 ==<br />
=== Explained Variance in Random Slope Models ===<br />
Looking at the proportion of variance explained by a model in a traditional ANOVA/multiple regression framework is something clients are often extremely interested in. In Chapter 7, Snijders and Bosker discuss how we might approach the issue in MLM. Near the end of the chapter, the authors insinuate that getting an estimate of the amount of explained variance in a random effects (intercept and slope) model is a somewhat tedious endeavour.<br />
<br />
The claim is that random slopes don't change prediction very much, so if we re-estimate the model using only random intercepts (no random slopes), this will "normally yield [predicted] values that are very close to values for the random slope models" (p. 114). This statement doesn't quite ring true for me, as in our examples the differences in slope between schools have been fairly striking/substantial. <br />
<br />
Is the authors' statement justifiable? Is obtaining an <math>R^2</math> as important/interesting in MLM as it is in other models?<br />
<br />
=== Explained variance in three-level models === <br />
In Example 7.1, we see how to calculate the explained variance of a level-one variable when this variable has a fixed effect only. How do we calculate the explained variance of a level-one variable when this variable has both a fixed and a random effect in the model? <br />
<br />
=== Interpreting <math>R^2</math> as an Effect Size ===<br />
A client fits a multilevel model and comes up with several significant predictors. The client is pleased with themselves, but remembers learning that significance alone isn't good enough these days, and needs help producing a measure of effect size. You compute the level-1 <math>R^2</math> and come up with a very small value, say 0.01. Is the model then worthless, even if the magnitude of the predicted change in the outcome is substantively meaningful?<br />
<br />
=== Explained variance === <br />
In the example provided on page 110, it is shown that the residual variance at level two increases when the within-group deviation is added as an explanatory variable to the model, in the balanced as well as the unbalanced case. Is this always the case, or is it only for this particular example?<br />
<br />
=== Estimates of <math>R^2</math>=== <br />
On page 113: "if it is observed that an estimated value for <math>R^2</math> becomes smaller by the addition of a predictor variable, or larger by the deletion of a predictor variable, there are two possibilities: either this is a chance fluctuation, or the larger model is misspecified." The authors then say that whether the first or second possibility is more likely depends on the size of the change in <math>R^2</math>. Can you give an example of each case, based on the size of the change in <math>R^2</math>?<br />
<br />
=== Predicted <math>R^2</math> ===<br />
After predicting values for random intercepts and slopes using Bayesian methods, it is possible to form composite values, <math>\hat{Y}_{ij}</math>, to predict the observed dependent values, <math>Y_{ij}</math>. Obviously these estimates will be sub-optimal, as they will suffer from 'shrinkage' effects, but they may be useful for computing an '<math>R^2_{Predicted}</math>'. Discuss situations where knowledge of the predicted slopes and intercepts could be important, and whether an <math>R^2_{Predicted}</math> could be a useful description.<br />
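: As a concrete, hypothetical illustration of the composite idea for a random-intercept-only model, here is a Python sketch. The shrinkage factor is the usual empirical-Bayes weight, and the '<math>R^2_{Predicted}</math>' is taken as the squared correlation of observed and predicted values — one possible choice for illustration, not the book's definition:<br />

```python
import numpy as np

rng = np.random.default_rng(1)
J, n = 50, 8                       # made-up numbers of groups and obs per group
tau_sq, sigma_sq, mu = 4.0, 9.0, 10.0
u = rng.normal(scale=np.sqrt(tau_sq), size=J)
y = mu + u[:, None] + rng.normal(scale=np.sqrt(sigma_sq), size=(J, n))

# Composite (shrunken) prediction: group means pulled toward the
# grand mean by lambda = tau^2 / (tau^2 + sigma^2 / n)
lam = tau_sq / (tau_sq + sigma_sq / n)
ybar_j = y.mean(axis=1)
yhat_j = mu + lam * (ybar_j - mu)   # one predicted value per group
yhat = np.repeat(yhat_j, n)         # aligned with y.ravel()

# A 'predicted R^2': squared correlation of observed and predicted values
r2_pred = np.corrcoef(y.ravel(), yhat)[0, 1] ** 2
print(round(r2_pred, 3))
```

: Note this sketch uses the true variance components; in practice they would be estimated, and the random-slope case replaces the scalar shrinkage factor with a shrinkage matrix.<br />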
<br />
=== The Size and Direction of <math>R^2</math> Change as a Diagnostic Criterion ===<br />
The suggestion has been made that a change in <math>R^2</math> in an unexpected, opposing direction upon the addition or deletion of a variable can serve as a diagnostic for determining where the flaw in the model resides. However, the authors do not actually indicate in which scenarios the size and direction information obtained from the <math>R^2</math> estimate determines the source of the flaw. 'Wrong' directions provide evidence of model misspecification, but what then of the magnitude component mentioned just prior? (p. 113)<br />
<br />
== Chapter 8 ==<br />
You can sign your contributions with <nowiki>--~~~~</nowiki> --[[User:Georges|Georges]] 07:50, 14 June 2012 (EDT)<br />
=== "Correlates of diversity" ===<br />
<br />
Provide an example illustrating how level-two variables can be associated with level-one heteroscedasticity.<br />
--[[User:Gilbert8|Gilbert8]] 11:26, 16 June 2012 (EDT)<br />
<br />
=== "Modeling Heteroscedasticity" ===<br />
<br />
When Snijders and Bosker say they are "modeling heteroscedasticity", is this simply incorporating more random slopes into the model? For instance, on page 127, they added a fixed effect for SA-SES (the school average of SES) and a random slope for it. What kind of plots would let us see if these inclusions are necessary? --[[User:Msigal|Msigal]] 11:26, 18 June 2012 (EDT)<br />
<br />
=== Linear or quadratic variance functions ===<br />
<br />
The level-one residual variance can be expressed as a linear or quadratic function of some variables. How do we decide on the functional form?<br />
Can we say that if a variable has a random effect, then we use a quadratic form, and otherwise a linear form? Does the same reasoning apply to the intercept residual variance?<br />
<nowiki>--~~~~</nowiki> --[[User:Qiong Li|Qiong Li]] 12:25, 18 June 2012 (EDT)<br />
<br />
=== On a practical note - How is this done?? ===<br />
It appears that in order to fit a model with a linear/quadratic function for the variance, the authors had to use MLwiN. Are there other ways to accomplish this? Could we talk a little about what their demo code is accomplishing? [http://www.stats.ox.ac.uk/~snijders/ch8.r S&B ExCode] --[[User:Smithce|Smithce]] 14:15, 19 June 2012 (EDT)<br />
<br />
=== Variable centering ===<br />
<br />
Since fixed effects variables can be included in the R matrix to model systematic heteroscedasticity discuss the effects of centering variables in this context. Does centering affect the estimation results or numerical stability? --[[User:Rphilip2004|Rphilip2004]] 13:42, 19 June 2012 (EDT)<br />
<br />
What happens to the variance function when there are more than two levels? Do we still only have to choose between linear and quadratic forms, or does it become more complicated? --[[User:Dusvat|Dusvat]] 18:43, 18 June 2012 (EDT)<br />
<br />
=== Generic regression question: Treating a factor as continuous for the interaction ===<br />
On page 126, Model 3 (described on page 124) treats SES as a factor for the main effects, but then, to keep the number of interaction terms down, treats it as numeric in the interactions with other variables. This seems like it could come in useful; are there any caveats we should be aware of in using this technique? --[[User:Smithce|Smithce]] 14:15, 19 June 2012 (EDT)<br />
<br />
=== Generalized Linear Models and Heteroscedasticity ===<br />
<br />
Given a linear model that has level-1 heteroscedasticity related to multiple level-1 predictors, does this not mean that the heteroscedasticity can be thought of as related to the overall mean response? The residuals of a heteroscedastic model would then become functions of the mean response. Generalized linear models often model the variance as a function of the mean response (e.g., Poisson, Gamma, Negative Binomial). When might it be appropriate to abandon a direct linear relationship in favour of a generalized linear model, which at times retains the desired additive linear properties in the linear predictor, to deal with heteroscedastic issues? Is this even possible? --[[User:Rbarnhar|Rbarnhar]] 14:24, 19 June 2012 (EDT)<br />
<br />
== Chapter 9 ==<br />
<br />
=== "Imputation" ===<br />
<br />
1. The chapter discusses imputation as a way of filling in missing data to form a complete data set. Are there any other methods that can be used to achieve the same goal? Provide a few examples.<br />
<br />
2. Discuss an example with a data set containing missing values. Provide R code using multiple imputation to complete the data set.<br />
--[[User:Gilbert8|Gilbert8]] 11:38, 18 June 2012 (EDT)<br />
<br />
=== Patterns of Missingness ===<br />
Snijders and Bosker go to some lengths to explain the difference between MCAR, MAR, and MNAR. However, I felt they somewhat glossed over a definition of monotone missingness. What is monotone missingness? How would one check for it, especially in terms of a multilevel model? --[[User:Msigal|Msigal]] 08:47, 20 June 2012 (EDT)<br />
<br />
If we use a missingness indicator and predict this using a logistic regression model, does this mean that significant predictors should be kept in the imputation model and non-significant predictors can be omitted from the imputation model? --[[User:Smithce|Smithce]] 09:58, 21 June 2012 (EDT)<br />
<br />
=== Missingness Assumption ===<br />
<br />
Rubin defined three types of missingness in 1976. When we use methods to handle incomplete data, what information can help us make a reasonable assumption? What is the key point in deciding whether missingness is MCAR or MAR? <nowiki>--~~~~</nowiki> --[[User:Qiong Li|Qiong Li]] 12:31, 20 June 2012 (EDT) <br />
<br />
=== Full Maximum Likelihood vs. Imputation ===<br />
Could you give us an example using both maximum likelihood and imputation methods in R and then compare them? How are the methods similar/different? Is one method computationally better than the other?--[[User:Dusvat|Dusvat]] 17:27, 20 June 2012 (EDT)<br />
<br />
=== Dancing on the Bay(esian) ===<br />
<br />
Imputation seems to be a very Bayesian practice, and the authors mention the intimate connection to Gibbs sampling when imputing data in the univariate case. I wonder, however, that if we are willing to impute data in this Bayesian manner why we don't just jump ship and move to a more complete Bayesian methodology? What are the benefits/downsides of initially dancing with the idea of being Bayesian to get our complete data, then being frequentists to fit our models? --[[User:Rphilip2004|Rphilip2004]] 09:18, 21 June 2012 (EDT)<br />
<br />
=== Don't Throw the Kitchen Sink - ICE ===<br />
<br />
While Snijders and Bosker promote the idea of a complex model for imputation, using as many feasible variables as possible, is this always true? Should we not consider parsimony as we would in any other form of regression? Should the imputation model also reflect the amount of missing data we are trying to impute?<br />
--[[User:Rbarnhar|Rbarnhar]] 09:59, 21 June 2012 (EDT)<br />
<br />
== Chapter 10 ==<br />
<br />
=== Influence of level-two units ===<br />
Provide a detailed explanation of how '''deletion diagnostics''' are performed, and provide a practical example to illustrate them.--[[User:Gilbert8|Gilbert8]] 18:22, 24 June 2012 (EDT)<br />
<br />
=== Incorporating Descriptive Statistics ===<br />
One aside that Snijders and Bosker make in this chapter is about the inclusion of the standard deviation for each group of a relevant level one variable as a fixed effect in the model. This was mentioned within the section on adding contextual variables (p. 155). This strikes me as an interesting prospect. Does modeling the standard deviation have any interpretative benefits over simply using group size? Are there other descriptive statistics pertaining to groups that would be meaningful to add to a model? What would their interpretation be? --[[User:Msigal|Msigal]] 10:12, 25 June 2012 (EDT)<br />
<br />
=== Orders of the model checks ===<br />
<br />
In Chapter 10, the authors introduce a number of things we need to do when we build a mixed model, such as "include contextual effects", "check random effects", "specification of the fixed part", "specification of the random part" and "check the distributional assumptions". When I deal with real data, I am always confused about which of these I should do first and which next. Are there no rules, or is there a better order in which to do these things?<br />
<nowiki>--~~~~</nowiki> --[[User:Qiong Li|Qiong Li]] 2:33, 25 June 2012 (EDT)<br />
<br />
=== Assumption violations ===<br />
In Ch. 10, the authors talk about having the random slope and random intercept uncorrelated with all explanatory variables. If this assumption is incorrect, you can just add relevant explanatory variables to the model. What happens in a real-world example where there are no quantifiable relevant explanatory variables to be added to the model? How would you go about fixing the incorrect assumption? What happens if more than one assumption is violated and you cannot just add other 'descriptive' variables to the model?--[[User:Dusvat|Dusvat]] 17:45, 25 June 2012 (EDT)<br />
<br />
=== Checking for Random Slopes ===<br />
In this chapter (page 155-156) S&B advocate checking for a random slope for each level-one variable in the fixed part. Because the process can be time consuming, they further suggest using one-step methods to obtain provisional estimates and then checking the t-ratio. We have learned that t-tests of this kind are problematic for variance parameters and not to be trusted. Should we then use LRTs to test all possible random slopes, possibly with simulation?<br />
<br />
=== Cluster Size and Model Assumptions ===<br />
<br />
In the practical nuts and bolts of application, one at times encounters situations where the number of clusters in a requested multilevel model does not allow one to test or readily examine the assumption that the random intercepts are normally distributed. How important is it that the assumption of normally distributed intercepts holds?--[[User:Rbarnhar|Rbarnhar]] 01:22, 26 June 2012 (EDT)<br />
<br />
=== Model Residuals ===<br />
<br />
The authors emphasize the importance of using OLS estimation for determining unbiased diagnostics. However, it may be useful to use model implied residuals such as <math>r = y - X \hat{\beta}</math> and <math>r.c = y - X \hat{\beta} - Z \hat{\gamma}</math>. Describe how these model implied residuals can be used to evaluate influential observations at different design levels. --[[User:Rphilip2004|Rphilip2004]] 08:33, 26 June 2012 (EDT)<br />
<br />
== Chapter 11 ==<br />
<br />
=== Sample sizes at the two levels ===<br />
<br />
Usually, we choose a common sample size n for the micro-units and a common sample size N for the macro-units. I want to know whether we can improve the power of tests, or obtain smaller standard errors, by choosing different sample sizes in different groups? <nowiki>--~~~~</nowiki> --[[User:Qiong Li|Qiong Li]] 12:11, 27 June 2012 (EDT)<br />
<br />
=== Allocating treatment to groups or individuals ===<br />
<br />
The authors mention that multisite trials are difficult to implement, as the researcher has to make sure that there will be no contamination between the two treatment conditions within the same site, while a cluster randomized trial may lead to selection bias. The authors propose the '''pseudo-cluster randomized trial.''' Explain how this method is performed. Is it used often in practice? If this technique fails, are there other methods to resolve those situations? --[[User:Gilbert8|Gilbert8]] 15:04, 27 June 2012 (EDT)<br />
<br />
=== Treatment Allocation, Continued ===<br />
<br />
Building upon Gilbert's question, discuss the differences between cluster randomized trials, multisite trials, and pseudo-cluster randomized trials. Do any of these strategies match the Schizophrenia dataset that we have been working with during Lab 2? --[[User:Msigal|Msigal]] 18:51, 27 June 2012 (EDT)<br />
<br />
Continuing Matt's question about the differences between cluster randomized trials, multisite trials, and pseudo-cluster randomized trials: can you give us an example of when to pick one over the others, along with how the cost of each trial breaks down under the cost function? Which trial is more costly? --[[User:Dusvat|Dusvat]] 23:17, 27 June 2012 (EDT)<br />
<br />
=== Power Expecting Missingness/Drop Out ===<br />
<br />
I think an interesting addition to the power analyses presented would be if we could work in estimates of drop-out/non-response within cluster. For example, suppose I estimated that there was only an 80% chance that I would actually get useful/complete data from each level-1 unit (lower if using the URPP ;-). How might this be built into these analyses? Are there authors who have done work on this with mixed models? Is it really worth bothering with, given that power analysis is hard enough as it is (e.g. coming up with 'guesses' for estimates)?<br />
<br />
<br />
== Chapter 12 ==<br />
== Chapter 13 ==<br />
== Chapter 14 ==<br />
== Chapter 15 ==<br />
== Chapter 16 ==<br />
== Chapter 17 ==<br />
== Chapter 18 ==</div>
<hr />
<div>== Chapter 5 ==<br />
=== Matthew ===<br />
At the bottom of page 83, Snijders and Bosker outline the process for probing interactions between two level-one variables, and how there can be four possibilities for how to model them. If a researcher were to include all four, discuss how each would be interpreted. What might a good selection strategy be if our model had substantially more than two variables?<br />
<br />
=== Qiong === <br />
If we do not have any information about the data set, how do we choose a level-two variable to predict the group-dependent regression coefficients? After we choose the level-two variable z, how do we interpret the cross-level interaction term? <br />
<br />
=== Carrie === <br />
A client arrives with a random slope and intercept model using IQ as a predictor. IQ was measured on the traditional scale with a mean of 100 and standard deviation of 15. What should the client keep in mind about the interpretation of the variance of the intercept and covariance of the slope-intercept?<br />
: This raises the interesting question of how the variance of the random intercept and the covariance of the random intercept with the random slope are changed under a recentering of IQ. Let <br />
:: <math>Var\left[ \begin{matrix}<br />
{{u}_{0j}} \\<br />
{{u}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2} & {{\tau }_{01}} \\<br />
{{\tau }_{10}} & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math> for raw IQ. <br />
: If we recenter IQ with: <math>\tilde{\text{IQ}}=\text{IQ}-c</math> then:<br />
:: <math>\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
1 & c \\<br />
0 & 1 \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
{{u}_{0j}} \\<br />
{{u}_{1j}} \\<br />
\end{matrix} \right]</math><br />
: and<br />
:: <math>Var\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
1 & c \\<br />
0 & 1 \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
\tau _{0}^{2} & {{\tau }_{01}} \\<br />
{{\tau }_{10}} & \tau _{1}^{2} \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
1 & 0 \\<br />
c & 1 \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}+2c{{\tau }_{01}}+{{c}^{2}}\tau _{1}^{2} & {{\tau }_{01}}+c\tau _{1}^{2} \\<br />
{{\tau }_{10}}+c\tau _{1}^{2} & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math><br />
<br />
: If <math>c=-{{\tau }_{01}}/\tau _{1}^{2}</math>, then the variance of the intercept is minimized:<br />
::<math>Var\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}-\tau _{01}^{2}/\tau _{1}^{2} & 0 \\<br />
0 & \tau _{1}^{2} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}\left( 1-\rho _{01}^{2} \right) & 0 \\<br />
0 & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math><br />
<br />
=== Gilbert === <br />
In chapter 5, they talk about hierarchical linear model where fixed effects and random effects are taken into consideration. Discuss a clear simple example in class which shows both effects and give interpretations of each of the coefficients and their use in real life.<br />
=== Daniela === <br />
In chapter 5, they talked about mostly about the two-level nesting structure. Can we have a bigger example with at least 4 levels that includes the random intercept and slope and how to apply this into R coding?<br />
: In 'lme', multilevel nesting is handled with. e.g.<br />
fit <- lme( Y ~ X * W, dd, random = ~ 1 | idtop/idmiddle/idsmall)<br />
: Contextual variables present an ambiguity. Assuming that the id variables 'idsmall' and 'idmiddle' are coded uniquely overall, then the the higher level, say 'idmiddle', contextual variables could be coded as either:<br />
cvar(X,idmiddle)<br />
: or<br />
capply( dd , ~ id, with, mean( c(tapply( X, idsmall, mean))))<br />
:Here is a table prepared for SPIDA showing how to handle multilevel nesting and crossed structures in a selection of R functions:<br />
<blockquote><br />
<br />
::{| border="1" cellpadding="4"<br />
|-<br />
! Function !! Notes <br />
|-<br />
| lme<br><br />
in package nlme<br />
| Linear mixed effects: normal response<br><br />
G side and R side modelling<br><br />
Model syntax:<br><br />
Y ~ X * W, random = ~ 1 + X | id<br><br />
For nested effect:<br><br />
Y ~ X * W, random = ~ 1 + X | higher/lower<br><br />
or, to have different models at different levels:<br><br />
Y ~ X * W, random = list(higher = ~ 1, lower = ~ 1 + X )<br><br />
<br />
|-<br />
| lmer <br><br />
in package lme4<br />
| Linear mixed models for gaussian response with Laplace approximation <br><br />
G side modeling only, R = <math>\sigma^2 I</math><br><br />
Model syntax: <br><br />
Y ~ X * W +(1+X|id)<br><br />
For nested effect:<br><br />
Y ~ X * W +(1+X|higher) + (1+X|higher:lower)<br> <br />
For crossed effect:<br><br />
Y ~ X * W +(1+X|id1) + (1+X|id2)<br> <br />
|- <br />
| glmer <br><br />
in package lme4<br />
| Generalized linear mixed models with adaptive Gaussian quadrature <br><br />
* family: binomial, Gamma, inverse.gaussian, poisson, gaussian<br><br />
G side only, no R side<br><br />
Model syntax: <br><br />
Y ~ X * W +(1+X|id)<br><br />
For nested effect:<br><br />
Y ~ X * W +(1+X|higher) + (1+X|higher:lower)<br> <br />
For crossed effect:<br><br />
Y ~ X * W +(1+X|id1) + (1+X|id2)<br> <br />
<br />
|- <br />
| glmmPQL<br><br />
in packages MASS/nlme<br />
| Generalized linear mixed models with Penalized Quasi Likelihood <br><br />
* family: binomial, Gamma, inverse.gaussian, poisson, gaussian<br><br />
G side and R side as in lme<br><br />
Model syntax: <br><br />
Y ~ X * W, random = ~ 1 + X | id<br><br />
For nested effect:<br><br />
Y ~ X * W, random = ~ 1 + X | higher/lower<br><br />
|-<br />
| MCMCglmm<br><br />
in package MCMCglmm<br />
| Generalized linear mixed models with MCMC <br><br />
* family: poisson, categorical, multinomial, ordinal, exponential, geometric, cengaussian, cenpoisson,<br />
cenexponential, zipoisson, zapoisson, ztpoisson, hupoisson, zibinomial (cen=censored, zi=zero-inflated, za=zero-altered, hu=hurdle)<br><br />
G side and R side, R side different from 'lme': no autocorrelation but can be used for multivariate response <br><br />
Note: 'poisson' potentially overdispersed by default (good), 'binomial' variance for binary variables is unidentified.<br> <br />
Model syntax: <br><br />
Y ~ X * W, random = ~ us(1 + X):id [Note: id should be a factor, us=unstructured]<br><br />
For nested effect:<br><br />
Y ~ X * W, random = ~us(1 + X):higher + us(1 + X):higher:lower<br> <br />
For crossed effect:<br><br />
Y ~ X * W, random = ~us(1 + X):id1+ us(1 + X):id2<br> <br />
|-<br />
|}<br />
<br />
</blockquote><br />
<br />
=== Phil ===<br />
<br />
Excluding fixed effects that are non-significant is common practice in regression analyses, and Snijders and Bosker follow this practice when simplifying the model in Table 5.3 to the model found in Table 5.4. While this practice helps make the model more parsimonious, it can ignore the joint effect that these variables have on the model as a whole. Discuss alternative criteria that one should explore when determining whether a predictor should be excluded from the model.<br />
<br />
=== Ryan ===<br />
When using random slopes, it is generally the case that the level 1 model contains a fixed effect for what will also be the level 2 random effect. The random effect is then an estimate of the group/cluster/individual departure from the fixed effect. However, non-significant level 1 variables cannot be distinguished from zero. Are there cases where a non-significant fixed effect can be excluded from the model while retaining the random effect at level 2? What would be the consequence of this, and what might it reveal about the level 1 variable? Would this help control for the '''error of excluding a non-significant but confounding variable'''?<br />
:: This is a very interesting question. It would be interesting to create a simulated data set illustrating the issue, so we could consider the consequences of having a random effect for a confounding factor whose within-cluster effect changes sign from cluster to cluster. Can we think of a confounding factor that would do that?<br />
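As a starting point for such a simulation, here is a base-R sketch of a level-1 variable whose within-cluster slope is drawn with mean zero, so its sign flips from cluster to cluster (all names and parameter values are made up for illustration):<br />

```r
set.seed(1)
J <- 20; n <- 30                       # 20 clusters of 30 observations each
id <- rep(1:J, each = n)
x  <- rnorm(J * n)                      # level-1 variable
bj <- rnorm(J)                          # cluster slopes centered at 0, so signs vary
y  <- 2 + bj[id] * x + rnorm(J * n)

# Per-cluster OLS slopes: roughly half positive, half negative
slopes <- sapply(split(data.frame(x, y), id),
                 function(d) coef(lm(y ~ x, data = d))[["x"]])
table(sign(slopes))
```

One could then add a cluster-level treatment correlated with bj to turn x into a genuine confounder and compare fits with and without its random slope.<br />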
<br />
=== and others === <br />
== Chapter 6 ==<br />
=== What is random? ===<br />
At the beginning of the chapter, S&B present two models (Table 6.1). They note that "The variable with the random slope is in both models the grand-mean centered variable IQ". In R: would the random side look like: random = ~ (IQ - IQbar)|school, even though (IQ - IQbar) isn't in the fixed part of the second model? How is this different in interpretation from: random = ~ (group-centered IQ - IQbar)|school?<br />
<br />
* NOTE: This question is not necessarily about how to specify a model using nlme, but rather about the terms included in the random part of the model. As a test, I ran two models:<br />
<br />
IQ_dev <- mlbook_red$IQ_verb - mlbook_red$sch_iqv<br />
<br />
mlb612a <- nlme::lme(langPOST ~ IQ_dev + sch_iqv + ses, random =~ IQ_verb|schoolnr, data = mlbook_red, method="ML")<br />
summary(mlb612a)<br />
<br />
mlb612b <- nlme::lme(langPOST ~ IQ_dev + sch_iqv + ses, random =~ IQ_dev|schoolnr, data = mlbook_red, method="ML")<br />
summary(mlb612b)<br />
<br />
Note that the basic IQ_verb variable has been grand mean centered. According to the chapter, it sounds like using IQ_verb in the random portion is OK since it is a linear combination of IQ_dev and sch_iqv (the school means of IQ_verb). However, if I compare it against a second model using IQ_dev in the random part, pretty much all of the coefficients change. Is this expected?<br />
<br />
--[[User:Msigal|Msigal]] 12:07, 13 June 2012 (EDT)<br />
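One way to see why the estimates change: within any school the two candidate slope variables differ by that school's mean, so switching between them shifts the random intercept by a school-varying multiple of the random slope. A quick base-R check with made-up data (names follow the snippet above):<br />

```r
set.seed(2)
schoolnr <- rep(1:5, each = 4)
IQ_verb  <- rnorm(20, sd = 2)           # stands in for grand-mean-centered IQ
sch_iqv  <- ave(IQ_verb, schoolnr)      # school means
IQ_dev   <- IQ_verb - sch_iqv           # within-school deviations

# Within each school, IQ_verb - IQ_dev is constant (the school mean)...
within.range <- tapply(IQ_verb - IQ_dev, schoolnr, function(z) diff(range(z)))
stopifnot(all(within.range < 1e-12))
# ...but that constant varies across schools
stopifnot(length(unique(round(sch_iqv, 8))) > 1)
```

Because the shift varies by school, the implied intercept variance under one specification is a school-dependent function of the other's variance components: the two random parts are different models, not reparameterizations of each other, so changed coefficients are not surprising.<br />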
<br />
=== Qiong === <br />
The authors introduce the t-test for testing fixed parameters. We can use summary(model) to get the p-value directly in R. In practice, we use the Wald test to test fixed parameters. On page 95, they mention that "for 30 or more groups, the Wald test of fixed effects using REML standard errors have reliable type I error rates." Is this the only reason we use the Wald test in practice?<br />
=== Carrie === <br />
On page 105, in their discussion of modeling within-group variability, the authors warn to "keep in mind that including a random slope normally implies inclusion of the fixed effect!". What is an example of a model where one might include a random slope without a fixed effect?<br />
=== Gilbert === <br />
On page 104, the authors discuss different approaches to model selection, including working upward from level one and joint consideration of levels one and two. Are there other methods to use if neither approach provides a satisfactory model?<br />
=== Daniela === <br />
On pages 97 and 99, the authors showed us how to test for a random intercept and a random slope independently. What test would you use if you wanted to test for both a random slope and a random intercept in the same model? And what would you test it against ... the linear model, or a model with just a random slope or a random intercept?<br />
=== Phil ===<br />
<br />
Multi-parameter tests are possible for fixed effects, but can they also be applied to predicted random effects? If so, what would be the analog to <math>\hat{\gamma}^\top \hat{\Sigma}^{-1}_{\gamma} \hat{\gamma}</math>, used to find the Wald statistic, and how do we find an appropriate denominator df for an F test?<br />
<br />
: Question: Just to be sure, I presume you mean a BLUP? Answer: Yes I do :)<br />
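For fixed effects, at least, the Wald statistic in the question can be computed directly from a fitted model's coefficients and their covariance matrix. A base-R sketch using lm for illustration (made-up data; for mixed models the appropriate denominator df is itself contested, and for BLUPs the question remains open):<br />

```r
set.seed(3)
d <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
d$y <- 1 + 0.5 * d$x1 + rnorm(50)
fit <- lm(y ~ x1 + x2, data = d)

keep <- c("x1", "x2")                      # joint null: both coefficients are 0
g <- coef(fit)[keep]
S <- vcov(fit)[keep, keep]
W <- drop(t(g) %*% solve(S) %*% g)         # gamma' Sigma^{-1} gamma
p.chisq <- pchisq(W, df = length(keep), lower.tail = FALSE)

# F version: divide W by the number of tested parameters and use the
# model's residual df as the denominator df
p.F <- pf(W / length(keep), length(keep), df.residual(fit), lower.tail = FALSE)
```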
<br />
=== Ryan ===<br />
In Chapter 6 Snijders and Bosker suggest that using deviance tests to evaluate fixed effects in a multilevel model is inappropriate if the estimation method is REML. What is the characteristic of REML vs ML that makes this type of model evaluation incorrect?<br />
:Comment: Likelihood has the form L(data|theta) and is used for inference on theta, for example, by comparing the altitude of the log-likelihood at theta.hat (the summit) with the altitude at a null hypothetical value theta.0. This is the basis of deviance tests: Deviance = 2*(logLik(data|theta.hat) - logLik(data|theta.0)).<br />
<br />
: With ML the log-likelihood is logLik( y | beta, G, R) and we can use the likelihood for inference on all parameters. With REML, the data is not the original y but the residual of y on X, say e, and the likelihood is only a function of G and R: logLik( e | G, R). Since 'beta' does not appear in this likelihood, the REML likelihood cannot be used to answer any questions about 'beta'.<br />
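The deviance formula in the comment can be checked with any ML fit. For ordinary lm (where ML and least squares coincide), the statistic for nested models is just twice the log-likelihood difference (made-up data):<br />

```r
set.seed(4)
d <- data.frame(x = rnorm(40))
d$y <- 1 + 0.8 * d$x + rnorm(40)

full <- lm(y ~ x, data = d)
null <- lm(y ~ 1, data = d)

# Deviance = 2*(logLik at the full model's summit - logLik at the null model)
dev <- as.numeric(2 * (logLik(full) - logLik(null)))
p <- pchisq(dev, df = 1, lower.tail = FALSE)
```

With REML-fitted mixed models differing in their fixed parts, this comparison would be invalid, because the two fits are likelihoods of different residual contrasts, not of the same data.<br />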
<br />
=== and others === <br />
== Chapter 7 ==<br />
=== Explained Variance in Random Slope Models ===<br />
Looking at the proportion of variance explained by a model in a traditional ANOVA/multiple regression framework is something clients are often extremely interested in. In Chapter 7, Snijders and Bosker discuss how we might approach the issue in MLM. Near the end of the chapter, the authors insinuate that getting an estimate of the amount of explained variance in a random effects (intercept and slope) model is a somewhat tedious endeavour.<br />
<br />
The claim is that random slopes don't change prediction very much, so if we re-estimate the model using only random intercepts (no random slopes), this will "normally yield [predicted] values that are very close to values for the random slope models" (p. 114). This statement doesn't quite ring true for me, as in our examples the differences in slope between schools have been fairly striking/substantial.<br />
<br />
Is the authors' statement justifiable? Is obtaining an <math>R^2</math> as important/interesting in MLM as it is in other models?<br />
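For reference, Snijders and Bosker's level-one explained variance compares the total residual variance (sigma² + tau0²) between an empty model and the fitted model. A minimal sketch, with hypothetical variance components standing in for real fits:<br />

```r
# R^2_1 = 1 - (sigma2.model + tau0.model) / (sigma2.null + tau0.null)
r2.level1 <- function(sigma2.null, tau0.null, sigma2.model, tau0.model) {
  1 - (sigma2.model + tau0.model) / (sigma2.null + tau0.null)
}

# Hypothetical components from an empty vs. a covariate model
r2.level1(sigma2.null = 40, tau0.null = 20,
          sigma2.model = 30, tau0.model = 10)   # 1 - 40/60 = 1/3
```

The question above is then whether plugging in components from a random-intercept re-fit of a random-slope model gives a trustworthy number.<br />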
<br />
=== Explained variance in three-level models === <br />
In Example 7.1, we see how to calculate the explained variance of a level-one variable when this variable has a fixed effect only. How do we calculate the explained variance of a level-one variable when this variable has both a fixed and a random effect in the model?<br />
<br />
=== Interpreting <math>R^2</math> as an Effect Size ===<br />
A client fits a multilevel model and comes up with several significant predictors. The client is pleased with themselves, but remembers learning that significance alone isn't good enough these days, and needs help producing a measure of effect size. You compute the Level 1 R^2 and come up with a very small value, say 0.01. Is the model then worthless, even if the magnitude of the predicted change in the outcome is substantively meaningful?<br />
<br />
=== Explained variance === <br />
In the example provided on page 110, it is shown that the residual variance at level two increases when the within-group deviation is added as an explanatory variable to the model, in the balanced as well as the unbalanced case. Is this always the case, or is it only for this particular example?<br />
<br />
=== Estimates of <math>R^2</math>=== <br />
On page 113, the authors note that if "an estimated value for <math>R^2</math> becomes smaller by the addition of a predictor variable, or larger by the deletion of a predictor variable, there are two possibilities: either this is a chance fluctuation, or the larger model is misspecified." They then say that whether the first or second possibility is more likely depends on the size of the change in <math>R^2</math>. Can you give an example of when this occurs based on the size of the change in <math>R^2</math>?<br />
<br />
=== Predicted <math>R^2</math> ===<br />
After predicting values for random intercepts and slopes using Bayesian methods it is possible to form composite values, <math>\hat{Y}_{ij}</math>, to predict the observed dependent values, <math>Y_{ij}</math>. Obviously these estimates will be sub-optimal as they will suffer from 'shrinkage' effects, but they may be useful for computing a '<math>R^2_{Predicted}</math>'. Discuss situations where knowledge of the predicted slopes and intercepts could be important, and whether an <math>R^2_{Predicted}</math> could be a useful description.<br />
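An <math>R^2_{Predicted}</math> of this kind is just one minus the scaled squared prediction error between the observed values and the composite predictions. A base-R sketch, assuming you already have composite predictions yhat (here faked as values shrunken halfway toward the mean):<br />

```r
r2.predicted <- function(y, yhat) {
  1 - sum((y - yhat)^2) / sum((y - mean(y))^2)
}

# Toy check: predictions shrunken halfway toward the grand mean
set.seed(5)
y    <- rnorm(100, mean = 5, sd = 2)
yhat <- mean(y) + 0.5 * (y - mean(y))
r2.predicted(y, yhat)                 # 1 - 0.25 = 0.75 exactly, by construction
```

The shrinkage in the toy example mimics the way BLUP-based composites pull predictions toward the fixed part, which is why such an <math>R^2_{Predicted}</math> will understate fit relative to an in-sample OLS <math>R^2</math>.<br />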
<br />
=== The Size and Direction of <math>R^2</math> Change as a Diagnostic Criterion ===<br />
The suggestion has been made that a change in <math>R^2</math> in an unexpected, opposing direction upon the addition or deletion of a variable can serve as a diagnostic for determining where the flaw in the model resides. However, the authors do not actually indicate in which scenarios the size and direction of the change in the <math>R^2</math> estimate determine the source of the flaw. 'Wrong' directions provide evidence of model misspecification, but what then of the magnitude component mentioned just prior? (p. 113)<br />
<br />
== Chapter 8 ==<br />
You can sign your contributions with <nowiki>--~~~~</nowiki> --[[User:Georges|Georges]] 07:50, 14 June 2012 (EDT)<br />
=== "Correlates of diversity" ===<br />
<br />
Provide an example illustrating how level-two variables can be associated with level-one heteroscedasticity.<br />
--[[User:Gilbert8|Gilbert8]] 11:26, 16 June 2012 (EDT)<br />
<br />
=== "Modeling Heteroscedasticity" ===<br />
<br />
When Snijders and Bosker say they are "modeling heteroscedasticity", is this simply incorporating more random slopes into the model? For instance, on page 127, they added a fixed effect for SA-SES (the school average of SES) and a random slope for it. What kind of plots would let us see if these inclusions are necessary? --[[User:Msigal|Msigal]] 11:26, 18 June 2012 (EDT)<br />
<br />
=== Linear or quadratic variance functions ===<br />
<br />
The level-one residual variance can be expressed as a linear or quadratic function of some variables. How do we decide on the functional form?<br />
Can we say that if a variable has a random effect we use a quadratic form, and otherwise a linear form? Is it the same for the intercept residual variance?<br />
<nowiki>--~~~~</nowiki> --[[User:Qiong Li|Qiong Li]] 12:25, 18 June 2012 (EDT)<br />
<br />
=== On a practical note - How is this done?? ===<br />
It appears that in order to fit a model with a linear/quadratic function for the variance, the authors had to use MLwiN. Are there other ways to accomplish this? Could we talk a little about what their demo code is accomplishing? [http://www.stats.ox.ac.uk/~snijders/ch8.r S&B ExCode] --[[User:Smithce|Smithce]] 14:15, 19 June 2012 (EDT)<br />
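One alternative worth discussing: nlme's weights argument accepts variance functions, which lets the level-one standard deviation depend on a covariate without leaving R. A sketch on simulated data (all names and values are made up; varPower, varIdent, and varConstPower offer other functional forms):<br />

```r
library(nlme)   # a recommended package, ships with R

# Simulated two-level data whose level-one sd grows with x
set.seed(6)
id <- rep(1:30, each = 10)
x  <- runif(300)
y  <- 2 + x + rnorm(30)[id] + rnorm(300, sd = exp(0.8 * x))

# Level-one sd modelled as sigma * exp(delta * x)
fit <- lme(y ~ x, random = ~ 1 | id,
           weights = varExp(form = ~ x))
fit$modelStruct$varStruct    # estimated delta (true value here is 0.8)
```

This gives a multiplicative rather than a linear/quadratic variance function, so it does not reproduce the S&B/MLwiN specification exactly, but it covers many of the same practical situations.<br />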
<br />
=== Variable centering ===<br />
<br />
Since fixed-effects variables can be included in the R matrix to model systematic heteroscedasticity, discuss the effects of centering variables in this context. Does centering affect the estimation results or numerical stability? --[[User:Rphilip2004|Rphilip2004]] 13:42, 19 June 2012 (EDT)<br />
<br />
What happens to the variance function when there are more than two levels? Do we still only have to choose between linear and quadratic forms, or does it become more complicated? --[[User:Dusvat|Dusvat]] 18:43, 18 June 2012 (EDT)<br />
<br />
=== Generic regression question: Treating a factor as continuous for the interaction ===<br />
On page 126, in Model 3 (described on page 124), the authors treat SES as a factor for the main effects, but then, to keep the number of interaction terms down, they treat it as numeric in the interaction with other variables. This seems like it could come in useful; are there any caveats we should be aware of in using this technique? --[[User:Smithce|Smithce]] 14:15, 19 June 2012 (EDT)<br />
<br />
=== Generalized Linear Models and Heteroscedasticity ===<br />
<br />
Given a linear model that has level-1 heteroscedasticity related to multiple level-1 predictors, does this not mean that the heteroscedasticity can be thought of as related to the overall mean response? The residuals of a heteroscedastic model would then become functions of the mean response. Generalized linear models often model the variance as a function of the mean response (e.g., Poisson, Gamma, Negative Binomial). When might it be appropriate to abandon a direct linear relationship in favour of a generalized linear model, which at times retains the desired additive linear properties in the linear predictor, to deal with heteroscedastic issues? Is this even possible? --[[User:Rbarnhar|Rbarnhar]] 14:24, 19 June 2012 (EDT)<br />
<br />
== Chapter 9 ==<br />
<br />
=== "Imputation" ===<br />
<br />
1. The chapter discusses imputation as a way of filling in missing data to form a complete data set. Are there any other methods that can be used to achieve the same goal? Provide a few examples.<br />
<br />
2. Discuss an example with a data set containing missing values. Provide R code using multiple imputation to complete the data set.<br />
--[[User:Gilbert8|Gilbert8]] 11:38, 18 June 2012 (EDT)<br />
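As a starting point for question 2, here is a hand-rolled base-R sketch of the mechanics of multiple imputation with Rubin's rules, on simulated data (a simplified, "improper" version: dedicated packages such as mice additionally draw the imputation-model parameters, which matters for valid standard errors):<br />

```r
set.seed(7)
n <- 200
z <- rnorm(n)                          # fully observed covariate
x <- 0.7 * z + rnorm(n)                # will receive missing values
y <- 1 + 0.6 * x + rnorm(n)
x[sample(n, 40)] <- NA                 # 20% of x set missing

m <- 5                                 # number of imputations
miss <- is.na(x)
est <- se2 <- numeric(m)
for (k in 1:m) {
  # Stochastic regression imputation: predict x from y and z, add residual noise
  imp.fit <- lm(x ~ y + z, subset = !miss)
  mu <- predict(imp.fit, newdata = data.frame(y = y[miss], z = z[miss]))
  x.imp <- x
  x.imp[miss] <- mu + rnorm(sum(miss), sd = summary(imp.fit)$sigma)
  # Analysis model on the completed data
  fit <- lm(y ~ x.imp)
  est[k] <- coef(fit)[["x.imp"]]
  se2[k] <- vcov(fit)["x.imp", "x.imp"]
}

# Rubin's rules: pool the m estimates
qbar <- mean(est)                      # pooled estimate
ubar <- mean(se2)                      # within-imputation variance
b    <- var(est)                       # between-imputation variance
se.pooled <- sqrt(ubar + (1 + 1/m) * b)
c(estimate = qbar, se = se.pooled)
```

The pooled slope should land near the true value of 0.6, with the between-imputation variance reflecting the uncertainty added by the missing values.<br />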
<br />
=== Patterns of Missingness ===<br />
Snijders and Bosker go to some lengths to explain the difference between MCAR, MAR, and MNAR. However, I felt they somewhat glossed over a definition of monotone missingness. What is monotone missingness? How would one check for it, especially in terms of a multilevel model? --[[User:Msigal|Msigal]] 08:47, 20 June 2012 (EDT)<br />
<br />
If we use a missingness indicator and predict this using a logistic regression model, does this mean that significant predictors should be kept in the imputation model and non-significant predictors can be omitted from the imputation model? --[[User:Smithce|Smithce]] 09:58, 21 June 2012 (EDT)<br />
<br />
=== Missingness Assumption ===<br />
<br />
Rubin defined three types of missingness in 1976. When we use methods for handling incomplete data, what information can help us make a reasonable assumption? What is the key point in deciding whether missingness is MCAR or MAR? <nowiki>--~~~~</nowiki> --[[User:Qiong Li|Qiong Li]] 12:31, 20 June 2012 (EDT)<br />
<br />
=== Full Maximum Likelihood vs. Imputation ===<br />
Could you give us an example using both maximum likelihood and imputation methods in R, and then compare them? How are the methods similar/different? Is one method computationally better than the other? --[[User:Dusvat|Dusvat]] 17:27, 20 June 2012 (EDT)<br />
<br />
=== Dancing on the Bay(esian) ===<br />
<br />
Imputation seems to be a very Bayesian practice, and the authors mention the intimate connection to Gibbs sampling when imputing data in the univariate case. I wonder, however: if we are willing to impute data in this Bayesian manner, why don't we just jump ship and move to a fully Bayesian methodology? What are the benefits/downsides of initially dancing with the idea of being Bayesian to get our complete data, then being frequentists to fit our models? --[[User:Rphilip2004|Rphilip2004]] 09:18, 21 June 2012 (EDT)<br />
<br />
=== Don't Throw the Kitchen Sink - ICE ===<br />
<br />
While Snijders and Bosker promote the idea of a complex model for imputation, using as many feasible variables as possible, is this always advisable? Should we not consider parsimony as we would in any other form of regression? Should the imputation model also reflect the amount of missing data we are trying to impute?<br />
--[[User:Rbarnhar|Rbarnhar]] 09:59, 21 June 2012 (EDT)<br />
<br />
== Chapter 10 ==<br />
<br />
=== Influence of level-two units ===<br />
Provide a detailed explanation of how '''deletion diagnostics''' are performed, and provide a practical example to illustrate it.--[[User:Gilbert8|Gilbert8]] 18:22, 24 June 2012 (EDT)<br />
<br />
=== Incorporating Descriptive Statistics ===<br />
One aside that Snijders and Bosker make in this chapter is about the inclusion of the standard deviation for each group of a relevant level one variable as a fixed effect in the model. This was mentioned within the section on adding contextual variables (p. 155). This strikes me as an interesting prospect. Does modeling the standard deviation have any interpretative benefits over simply using group size? Are there other descriptive statistics pertaining to groups that would be meaningful to add to a model? What would their interpretation be? --[[User:Msigal|Msigal]] 10:12, 25 June 2012 (EDT)<br />
<br />
=== Orders of the model checks ===<br />
<br />
In Chapter 10, the authors introduce a number of things we need to do when we build a mixed model, such as "include contextual effects", "check random effects", "specification of the fixed part", "specification of the random part" and "check the distributional assumption". When I deal with real data, I am often confused about which of these to do first and which to do next. Are there no rules, or is there a better order in which to do these things?<br />
<nowiki>--~~~~</nowiki> --[[User:Qiong Li|Qiong Li]] 2:33, 25 June 2012 (EDT)<br />
<br />
=== Assumption violations ===<br />
In Ch. 10, the authors talk about having the random slope and random intercept uncorrelated with all explanatory variables. If this assumption is incorrect, you can just add relevant explanatory variables to the model. What happens in a real-world example where there are no quantifiable relevant explanatory variables to add to the model? How would you go about fixing the incorrect assumption? What happens if more than one assumption is violated and you cannot just include other 'descriptive' variables in the model?--[[User:Dusvat|Dusvat]] 17:45, 25 June 2012 (EDT)<br />
<br />
=== Checking for Random Slopes ===<br />
In this chapter (pages 155-156), S&B advocate checking for a random slope for each level-one variable in the fixed part. Because the process can be time-consuming, they further suggest using one-step methods to obtain provisional estimates and then checking the t-ratio. We have learned that t-tests of this kind are problematic for variance parameters and not to be trusted. Should we then use LRTs to test all possible random slopes, possibly with simulation?<br />
<br />
== Chapter 11 ==<br />
== Chapter 12 ==<br />
== Chapter 13 ==<br />
== Chapter 14 ==<br />
== Chapter 15 ==<br />
== Chapter 16 ==<br />
== Chapter 17 ==<br />
== Chapter 18 ==</div>Smithcehttp://scs.math.yorku.ca/index.php/MATH_6643_Summer_2012_Applications_of_Mixed_Models/Snijders_and_Bosker:_Discussion_QuestionsMATH 6643 Summer 2012 Applications of Mixed Models/Snijders and Bosker: Discussion Questions2012-06-21T13:58:35Z<p>Smithce: </p>
<hr />
<div>== Chapter 5 ==<br />
=== Matthew ===<br />
At the bottom of page 83, Snijders and Bosker outline the process for probing interactions between two level-one variables, and how there can be four possibilities for how to model it. If a researcher were to include all four, discuss how each would be interpreted. What might a good selection strategy be if our model had substantially more than two variables?<br />
<br />
=== Qiong === <br />
If we do not have any information about the data set, how do we choose a level-two variable to predict the group-dependent regression coefficients? After we choose the level-two variable z, how do we interpret the cross-level interaction term?<br />
<br />
=== Carrie === <br />
A client arrives with a random slope and intercept model using IQ as a predictor. IQ was measured on the traditional scale with a mean of 100 and standard deviation of 15. What should the client keep in mind about the interpretation of the variance of the intercept and the slope-intercept covariance?<br />
: This raises the interesting question of how the variance of the random intercept and the covariance of the random intercept with the random slope are changed under a recentering of IQ. Let <br />
:: <math>Var\left[ \begin{matrix}<br />
{{u}_{0j}} \\<br />
{{u}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2} & {{\tau }_{01}} \\<br />
{{\tau }_{10}} & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math> for raw IQ. <br />
: If we recenter IQ with: <math>\tilde{\text{IQ}}=\text{IQ}-c</math> then:<br />
:: <math>\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
1 & c \\<br />
0 & 1 \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
{{u}_{0j}} \\<br />
{{u}_{1j}} \\<br />
\end{matrix} \right]</math><br />
: and<br />
:: <math>Var\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
1 & c \\<br />
0 & 1 \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
\tau _{0}^{2} & {{\tau }_{01}} \\<br />
{{\tau }_{10}} & \tau _{1}^{2} \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
1 & 0 \\<br />
c & 1 \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}+2c{{\tau }_{01}}+{{c}^{2}}\tau _{1}^{2} & {{\tau }_{01}}+c\tau _{1}^{2} \\<br />
{{\tau }_{10}}+c\tau _{1}^{2} & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math><br />
<br />
: If <math>c=-{{\tau }_{01}}/\tau _{1}^{2}</math>, then the variance of the intercept is minimized:<br />
::<math>Var\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}-\tau _{01}^{2}/\tau _{1}^{2} & 0 \\<br />
0 & \tau _{1}^{2} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}\left( 1-\rho _{01}^{2} \right) & 0 \\<br />
0 & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math><br />
<br />
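The algebra above can be verified numerically in base R. A sketch with hypothetical G-matrix entries (the values 4, 1, and 1.2 are made up):<br />

```r
tau0.sq <- 4; tau1.sq <- 1; tau01 <- 1.2   # hypothetical variance components
Tmat <- matrix(c(tau0.sq, tau01, tau01, tau1.sq), 2, 2)

# Variance of the recentered random effects: A %*% T %*% t(A),
# with A the transformation matrix [[1, c], [0, 1]] from the derivation
var.after.shift <- function(shift) {
  A <- matrix(c(1, 0, shift, 1), 2, 2)
  A %*% Tmat %*% t(A)
}

c.star <- -tau01 / tau1.sq                 # the minimizing value of c
V <- var.after.shift(c.star)
V[1, 1]                                    # tau0^2 - tau01^2/tau1^2 = 4 - 1.44 = 2.56
V[1, 2]                                    # 0: intercept and slope become uncorrelated
V[1, 1] < var.after.shift(0)[1, 1]         # smaller than the raw-IQ intercept variance
```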
=== Gilbert === <br />
In chapter 5, they talk about hierarchical linear model where fixed effects and random effects are taken into consideration. Discuss a clear simple example in class which shows both effects and give interpretations of each of the coefficients and their use in real life.<br />
=== Daniela === <br />
In chapter 5, they talked about mostly about the two-level nesting structure. Can we have a bigger example with at least 4 levels that includes the random intercept and slope and how to apply this into R coding?<br />
: In 'lme', multilevel nesting is handled with. e.g.<br />
fit <- lme( Y ~ X * W, dd, random = ~ 1 | idtop/idmiddle/idsmall)<br />
: Contextual variables present an ambiguity. Assuming that the id variables 'idsmall' and 'idmiddle' are coded uniquely overall, then the the higher level, say 'idmiddle', contextual variables could be coded as either:<br />
cvar(X,idmiddle)<br />
: or<br />
capply( dd , ~ id, with, mean( c(tapply( X, idsmall, mean))))<br />
:Here is a table prepared for SPIDA showing how to handle multilevel nesting and crossed structures in a selection of R functions:<br />
<blockquote><br />
<br />
::{| border="1" cellpadding="4"<br />
|-<br />
! Function !! Notes <br />
|-<br />
| lme<br><br />
in package nlme<br />
| Linear mixed effects: normal response<br><br />
G side and R side modelling<br><br />
Model syntax:<br><br />
Y ~ X * W, random = ~ 1 + X | id<br><br />
For nested effect:<br><br />
Y ~ X * W, random = ~ 1 + X | higher/lower<br><br />
or, to have different models at different levels:<br><br />
Y ~ X * W, random = list(higher = ~ 1, lower = ~ 1 + X )<br><br />
<br />
|-<br />
| lmer <br><br />
in package lme4<br />
| Linear mixed models for gaussian response with Laplace approximation <br><br />
G side modeling only, R = <math>\sigma^2 I</math><br><br />
Model syntax: <br><br />
Y ~ X * W +(1+X|id)<br><br />
For nested effect:<br><br />
Y ~ X * W +(1+X|higher) + (1+X|higher:lower)<br> <br />
For crossed effect:<br><br />
Y ~ X * W +(1+X|id1) + (1+X|id2)<br> <br />
|- <br />
| glmer <br><br />
in package lme4<br />
| Generalized linear mixed models with adaptive Gaussian quadrature <br><br />
* family: binomial, Gamma, inverse.gaussian, poisson, gaussian<br><br />
G side only, no R side<br><br />
Model syntax: <br><br />
Y ~ X * W +(1+X|id)<br><br />
For nested effect:<br><br />
Y ~ X * W +(1+X|higher) + (1+X|higher:lower)<br> <br />
For crossed effect:<br><br />
Y ~ X * W +(1+X|id1) + (1+X|id2)<br> <br />
<br />
|- <br />
| glmmPQL<br><br />
in packages MASS/nlme<br />
| Generalized linear mixed models with Penalized Quasi Likelihood <br><br />
* family: binomial, Gamma, inverse.gaussian, poisson, gaussian<br><br />
G side and R side as in lme<br><br />
Model syntax: <br><br />
Y ~ X * W, random = ~ 1 + X | id<br><br />
For nested effect:<br><br />
Y ~ X * W, random = ~ 1 + X | higher/lower<br><br />
|-<br />
| MCMCglmm<br><br />
in package MCMCglmm<br />
| Generalized linear mixed models with MCMC <br><br />
* family: poisson, categorical, multinomial, ordinal, exponential, geometric, cengaussian, cenpoisson,<br />
cenexponential, zipoisson, zapoisson, ztpoisson, hupoisson, zibinomial (cen=censored, zi=zero-inflated, za=zero-altered, hu=hurdle<br><br />
G side and R side, R side different from 'lme': no autocorrelation but can be used for multivariate response <br><br />
Note: 'poisson' potentially overdispersed by default (good), 'binomial' variance for binary variables is unidentified.<br> <br />
Model syntax: <br><br />
Y ~ X * W, random = ~ us(1 + X):id [Note: id should be a factor, us=unstructured]<br />
For nested effect:<br><br />
Y ~ X * W, random = ~us(1 + X):higher + us(1 + X):higher:lower<br> <br />
For crossed effect:<br><br />
Y ~ X * W, random = ~us(1 + X):id1+ us(1 + X):id2<br> <br />
|-<br />
|}<br />
<br />
</blockquote><br />
<br />
=== Phil ===<br />
<br />
Exluding fixed effects that are non-significant is common practice in regression analyses, and Snijders and Bosker follow this practice when simplifying the model in Table 5.3 to the model found in Table 5.4. While this practice is used to help make the model more parsimonious it can ignore the joint effect that these variables have on the model as a whole. Discuss alternative criteria that one should explore when determining whether a predictor should be excluded from the model.<br />
<br />
=== Ryan ===<br />
When using random slopes it is generally the case that the level 1 model contains a fixed effect for what will also be the level 2 random effect. The random effect is then an estimate of the group/cluster/individual departure from the fixed effect. However, non-significant level 1 variables are not determinable as different from zero. Are there cases where a non-significant fixed effect can be excluded from the model while retaining the random effect at level 2. What would be the consequence of this and what might it reveal about the level 1 variable? Would this help control for the '''error of excluding a non-significant but confounding variable'''?<br />
:: This is a very interesting question. It would be interesting to create a simulated data set illustrating the issue so we could consider the consequences of having random effects for a confounding factors whose within cluster effect changes sign from cluster to cluster. Can we think of a confounding factor that would do that?<br />
<br />
=== and others === <br />
== Chapter 6 ==<br />
=== What is random? ===<br />
At the beginning of the chapter, S&J present two models (Table 6.1). They note that "The variable with the random slope is in both models the grand-mean centered variable IQ". In R: would the random side look like: random = ~ (IQ - IQbar)|school, even though (IQ - IQbar) isn't in the fixed part of the second model? How is this different in interpretation from: random = ~ (group centered IQ - IQbar)|school)?<br />
<br />
* NOTE: This question is not necessarily about how to specify a model using nlme, but rather about the terms included in the random part of the model. As a test, I ran two models:<br />
<br />
IQ_dev <- mlbook_red$IQ_verb - mlbook_red$sch_iqv<br />
<br />
mlb612a <- nlme:::lme(langPOST ~ IQ_dev + sch_iqv + ses, random =~ IQ_verb|schoolnr, data = mlbook_red, method="ML")<br />
summary(mlb612a)<br />
<br />
mlb612b <- nlme:::lme(langPOST ~ IQ_dev + sch_iqv + ses, random =~ IQ_dev|schoolnr, data = mlbook_red, method="ML")<br />
summary(mlb612b)<br />
<br />
Note that the basic IQ_verb variable has been grand mean centered. According to the chapter, it sounds like using IQ_verb in the random portion is OK since it is a linear combination of IQ_dev and sch_iqv (the school means of IQ_verb). However, if I compare it against a second model using IQ_dev in the random part, pretty much all of the coefficients change. Is this expected?<br />
<br />
--[[User:Msigal|Msigal]] 12:07, 13 June 2012 (EDT)<br />
<br />
=== Qiong === <br />
The author introduce the t - test to test fixed parameters. We can use summary(model) to get the p - value directly in R. In practice, we use wald test to test fixed parameter. On page 95, they mentioned that "for 30 or more groups, the Wald test of fixed effects using REML standard errors have reliable type I error rates." Is it the only reason we use the wald test in practice?<br />
=== Carrie === <br />
On page 105 in their discussion of modeling within-group variability the authors warn to "keep in mind that including a random slope normally implies inclusion of the fixed effect!". What is an example of a model where one might include random slope without a fixed effect??<br />
=== Gilbert === <br />
On page 104, the authors discuss different approaches to model selection, including working upward from level one and joint consideration of levels one and two. Are there other methods to use if neither of these provides a satisfactory model? <br />
=== Daniela === <br />
On pages 97 and 99, the authors show how to test for a random intercept and a random slope independently. What test would you use to test for both a random slope and a random intercept in the same model, and what would you test it against: the linear model, or a model with just a random slope or a random intercept?<br />
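One candidate is a joint deviance test of both variance components at once, comparing the random-slope-and-intercept model against a model with no random effects at all (fit with gls). A sketch on invented data; the naive chi-squared df is 3 (two variances plus a covariance), and because the null values sit on the boundary of the parameter space the true reference distribution is a mixture of chi-squared distributions:<br />

```r
library(nlme)

set.seed(3)
g  <- rep(1:40, each = 8)
x  <- rnorm(320)
u0 <- rep(rnorm(40, sd = 1.0), each = 8)   # random intercepts
u1 <- rep(rnorm(40, sd = 0.5), each = 8)   # random slopes
y  <- 1 + 2 * x + u0 + u1 * x + rnorm(320)
dd <- data.frame(y, x, g)

m_both <- lme(y ~ x, random = ~ x | g, data = dd, method = "REML")
m_none <- gls(y ~ x, data = dd, method = "REML")  # no random effects

## Joint deviance test of random intercept + random slope (+ covariance);
## REML is fine here because the fixed parts are identical:
anova(m_both, m_none)
```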
=== Phil ===<br />
<br />
Multi-parameter tests are possible for fixed effects, but can they also be applied to predicted random effects? If so, what would be the analogue of <math>\hat{\gamma}' \hat{\Sigma}^{-1}_\gamma \hat{\gamma}</math>, used to find the Wald statistic, and how do we find an appropriate denominator df for an F test?<br />
<br />
: Question: Just to be sure, I presume you mean a BLUP? Answer: Yes I do :)<br />
<br />
=== Ryan ===<br />
In Chapter 6 Snijders and Bosker suggest that using deviance tests to evaluate fixed effects in a multilevel model is inappropriate if the estimation method is REML. What is the characteristic of REML vs ML that makes this type of model evaluation incorrect?<br />
:Comment: Likelihood has the form L(data|theta) and is used for inference on theta, for example, by comparing the altitude of the log-likelihood at theta.hat (the summit) with the altitude at a null hypothetical value theta.0. This is the basis of deviance tests: Deviance = 2*(logLik(data|theta.hat) - logLik(data|theta.0)).<br />
<br />
: With ML the log-likelihood is logLik( y | beta, G, R) and we can use the likelihood for inference on all parameters. With REML, the data are not the original y, but the residuals of y on X, say e, and the likelihood is a function of G and R only: logLik( e | G, R). Since 'beta' does not appear in this likelihood, the likelihood cannot be used to answer any questions about 'beta'.<br />
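The point can be made concrete in nlme (invented data): ML fits with nested fixed parts can legitimately be compared by deviance, while the corresponding REML fits are built on different residuals and so their restricted likelihoods are not comparable:<br />

```r
library(nlme)

set.seed(4)
g <- rep(1:30, each = 10)
x <- rnorm(300)
z <- rnorm(300)
y <- 1 + 2 * x + 0.5 * z + rep(rnorm(30), each = 10) + rnorm(300)
dd <- data.frame(y, x, z, g)

## ML: a legitimate deviance test of the fixed effect of z
m_small <- lme(y ~ x,     random = ~ 1 | g, data = dd, method = "ML")
m_big   <- lme(y ~ x + z, random = ~ 1 | g, data = dd, method = "ML")
anova(m_small, m_big)

## REML: the two fits condition on residuals from different X matrices,
## so the difference in restricted log-likelihoods is not a deviance
r_small <- update(m_small, method = "REML")
r_big   <- update(m_big,   method = "REML")
```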
<br />
=== and others === <br />
== Chapter 7 ==<br />
=== Explained Variance in Random Slope Models ===<br />
Looking at the proportion of variance explained by a model in a traditional ANOVA/multiple regression framework is something clients are often extremely interested in. In Chapter 7, Snijders and Bosker discuss how we might approach the issue in MLM. Near the end of the chapter, the authors insinuate that getting an estimate of the amount of explained variance in a random effects (intercept and slope) model is a somewhat tedious endeavour.<br />
<br />
The claim is that random slopes don't change prediction very much, so if we re-estimate the model using only random intercepts (no random slopes), this will "normally yield [predicted] values that are very close to values for the random slope models" (p. 114). This statement doesn't quite ring true for me, as in our examples the differences in slope between schools have been fairly striking/substantial. <br />
<br />
Is the authors' statement justifiable? Is obtaining an <math>R^2</math> as important/interesting in MLM as it is in other models?<br />
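For the level-one <math>R^2</math> specifically, S&B's proportional-reduction-in-variance estimate is easy to compute. A sketch on invented data with a random intercept only (the helper function totvar is made up here):<br />

```r
library(nlme)

## Level-one R^2 as the proportional reduction in sigma^2 + tau_0^2
## relative to the empty (intercept-only) model
set.seed(2)
g <- rep(1:40, each = 10)
x <- rnorm(400)
y <- 2 + 1.5 * x + rep(rnorm(40), each = 10) + rnorm(400)
dd <- data.frame(y, x, g)

m0 <- lme(y ~ 1, random = ~ 1 | g, data = dd, method = "ML")  # empty model
m1 <- lme(y ~ x, random = ~ 1 | g, data = dd, method = "ML")

## Sum the intercept and residual variances from VarCorr()
totvar <- function(m) sum(as.numeric(VarCorr(m)[, "Variance"]))
R2_1 <- 1 - totvar(m1) / totvar(m0)
R2_1
```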
<br />
=== Explained variance in three-level models === <br />
In example 7.1, we see how to calculate the explained variance for a level-one variable when that variable has a fixed effect only. How would we calculate the explained variance for a level-one variable that has both a fixed and a random effect in the model? <br />
<br />
=== Interpreting <math>R^2</math> as an Effect Size ===<br />
A client fits a multilevel model and comes up with several significant predictors. The client is pleased, but remembers learning that significance alone isn't good enough these days, and needs help producing a measure of effect size. You compute the level-one <math>R^2</math> and come up with a very small value, say 0.01. Is the model then worthless, even if the magnitude of the predicted change in the outcome is substantively meaningful?<br />
<br />
=== Explained variance === <br />
In the example provided on page 110, the residual variance at level two increases when the within-group deviation is added as an explanatory variable, in the balanced as well as the unbalanced case. Is this always true, or is it particular to this example?<br />
<br />
=== Estimates of <math>R^2</math>=== <br />
On page 113: "[If] it is observed that an estimated value for <math>R^2</math> becomes smaller by the addition of a predictor variable, or larger by the deletion of a predictor variable, there are two possibilities: either this is a chance fluctuation, or the larger model is misspecified." The authors then say that whether the first or second possibility is more likely depends on the size of the change in <math>R^2</math>. Can you give an example of each case, based on the size of the change in <math>R^2</math>?<br />
<br />
=== Predicted <math>R^2</math> ===<br />
After predicting values for random intercepts and slopes using Bayesian methods it is possible to form composite values, <math>\hat{Y}_{ij}</math>, to predict the observed dependent values, <math>Y_{ij}</math>. Obviously these estimates will be sub-optimal as they will suffer from 'shrinkage' effects, but they may be useful for computing a '<math>R^2_{Predicted}</math>'. Discuss situations where knowledge of the predicted slopes and intercepts could be important, and whether an <math>R^2_{Predicted}</math> could be a useful description.<br />
<br />
=== The Size and Direction of <math>R^2</math> Change as a Diagnostic Criterion ===<br />
The suggestion has been made that a change in <math>R^2</math> in an unexpected, opposing direction upon the addition or deletion of a variable can serve as a diagnostic for locating the flaw in the model. However, the authors do not actually indicate in which scenarios the size and direction information obtained from the <math>R^2</math> estimate determines the source of the flaw. 'Wrong' directions provide evidence of model misspecification, but what then of the magnitude component mentioned just prior? (p. 113)<br />
<br />
== Chapter 8 ==<br />
You can sign your contributions with <nowiki>--~~~~</nowiki> --[[User:Georges|Georges]] 07:50, 14 June 2012 (EDT)<br />
=== "Correlates of diversity" ===<br />
<br />
Provide an example illustrating how level-two variables can be associated with level-one heteroscedasticity.<br />
--[[User:Gilbert8|Gilbert8]] 11:26, 16 June 2012 (EDT)<br />
<br />
=== "Modeling Heteroscedasticity" ===<br />
<br />
When Snijders and Bosker say they are "modeling heteroscedasticity", is this simply incorporating more random slopes into the model? For instance, on page 127, they added a fixed effect for SA-SES (the school average of SES) and a random slope for it. What kind of plots would let us see if these inclusions are necessary? --[[User:Msigal|Msigal]] 11:26, 18 June 2012 (EDT)<br />
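One graphical check uses nlme's plot methods on the fitted model. A sketch on invented data where the level-one spread grows with a covariate; a fan shape in these plots would suggest the variance model is needed:<br />

```r
library(nlme)

set.seed(3)
g <- rep(1:25, each = 12)
x <- runif(300)
y <- 1 + x + rep(rnorm(25), each = 12) + rnorm(300, sd = 0.5 + 1.5 * x)
dd <- data.frame(y, x, g)

m <- lme(y ~ x, random = ~ 1 | g, data = dd)

## Standardized residuals against fitted values ...
plot(m, resid(., type = "p") ~ fitted(.), abline = 0)
## ... and against the suspect covariate
plot(m, resid(., type = "p") ~ x, abline = 0)
```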
<br />
=== Linear or quadratic variance functions ===<br />
<br />
The level-one residual variance can be expressed as a linear or quadratic function of some variables. How do we decide the functional form?<br />
Can we say that if a variable has a random effect we use a quadratic form, and otherwise a linear form? Does the same apply to the intercept residual variance?<br />
<nowiki>--~~~~</nowiki> --[[User:Qiong Li|Qiong Li]] 12:25, 18 June 2012 (EDT)<br />
<br />
=== On a practical note - How is this done? ===<br />
It appears that in order to fit a model with a linear/quadratic function for the variance the authors had to use MLwiN. Are there other ways to accomplish this? Could we talk a little about what their demo code is accomplishing? [http://www.stats.ox.ac.uk/~snijders/ch8.r S&B ExCode] --[[User:Smithce|Smithce]] 14:15, 19 June 2012 (EDT)<br />
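One alternative in R is nlme's varFunc classes, which model the level-one standard deviation as a function of covariates (related to, though not identical in parameterization to, S&B's linear and quadratic variance functions). A sketch on invented data:<br />

```r
library(nlme)

set.seed(6)
g <- rep(1:30, each = 10)
x <- runif(300)
y <- 1 + 2 * x + rep(rnorm(30), each = 10) + rnorm(300, sd = 1 + 2 * x)
dd <- data.frame(y, x, g)

## Homoscedastic fit versus level-one sd modeled as sigma * exp(delta * x):
m_hom <- lme(y ~ x, random = ~ 1 | g, data = dd)
m_het <- update(m_hom, weights = varExp(form = ~ x))

anova(m_hom, m_het)   # nested: delta = 0 recovers homoscedasticity
```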
<br />
=== Variable centering ===<br />
<br />
Since fixed effects variables can be included in the R matrix to model systematic heteroscedasticity discuss the effects of centering variables in this context. Does centering affect the estimation results or numerical stability? --[[User:Rphilip2004|Rphilip2004]] 13:42, 19 June 2012 (EDT)<br />
<br />
What happens to the variance function when there are more than two levels? Do we still only choose between linear and quadratic forms, or does it become more complicated? --[[User:Dusvat|Dusvat]] 18:43, 18 June 2012 (EDT)<br />
<br />
=== Generic regression question: Treating a factor as continuous for the interaction ===<br />
On page 126, in Model 3 (described on page 124), the authors treat SES as a factor for the main effects but, to keep the number of interaction terms down, treat it as numeric in the interactions with other variables. This seems like it could come in useful; are there any caveats we should be aware of in using this technique? --[[User:Smithce|Smithce]] 14:15, 19 June 2012 (EDT)<br />
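A generic base-R sketch of the device (the variables are invented): the factor carries the main effect, while the numeric score enters the interaction, costing one df instead of (levels - 1):<br />

```r
set.seed(4)
n <- 300
ses_num <- sample(1:6, n, replace = TRUE)  # numeric 6-point SES score
ses_fac <- factor(ses_num)                 # the same score as a factor
x <- rnorm(n)
y <- 0.5 * ses_num + x + 0.2 * ses_num * x + rnorm(n)

## Factor main effect (5 df) + a single linear-by-linear interaction (1 df)
m <- lm(y ~ ses_fac + x + x:ses_num)
anova(m)
```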
<br />
=== Generalized Linear Models and Heteroscedasticity ===<br />
<br />
Given a linear model with level-1 heteroscedasticity related to multiple level-1 predictors, does this not mean that the heteroscedasticity can be thought of as related to the overall mean response? The residuals of a heteroscedastic model would then be functions of the mean response. Generalized linear models often model the variance as a function of the mean response (e.g., Poisson, Gamma, Negative Binomial). When might it be appropriate to abandon a direct linear relationship in favour of a generalized linear model (which retains the desired additive properties in the linear predictor) to deal with heteroscedastic issues? Is this even possible? --[[User:Rbarnhar|Rbarnhar]] 14:24, 19 June 2012 (EDT)<br />
<br />
== Chapter 9 ==<br />
<br />
=== "Imputation" ===<br />
<br />
1. The chapter discusses imputation as a way of filling in missing data to form a complete data set. Are there other methods that can achieve the same goal? Provide a few examples.<br />
<br />
2. Discuss an example with a data set containing missing values. Provide R code that uses multiple imputation to complete the data set.<br />
--[[User:Gilbert8|Gilbert8]] 11:38, 18 June 2012 (EDT)<br />
<br />
=== Patterns of Missingness ===<br />
Snijders and Bosker go to some lengths to explain the difference between MCAR, MAR, and MNAR. However, I felt they somewhat glossed over a definition of monotone missingness. What is monotone missingness? How would one check for it, especially in terms of a multilevel model? --[[User:Msigal|Msigal]] 08:47, 20 June 2012 (EDT)<br />
<br />
If we use a missingness indicator and predict it with a logistic regression model, does this mean that significant predictors should be kept in the imputation model and non-significant predictors can be omitted from it? --[[User:Smithce|Smithce]] 09:58, 21 June 2012 (EDT)<br />
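For concreteness, the indicator-plus-logistic-regression probe looks like this (invented data, with missingness in y driven by x only):<br />

```r
set.seed(7)
n <- 500
x <- rnorm(n)
z <- rnorm(n)
y <- 1 + x + rnorm(n)
y[runif(n) < plogis(-1 + 1.5 * x)] <- NA  # P(missing) depends on x (MAR given x)

## Which observed variables predict the missingness indicator?
miss <- is.na(y)
summary(glm(miss ~ x + z, family = binomial))
```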
<br />
=== Missingness Assumption ===<br />
<br />
Rubin defined three types of missingness in 1976. When we use methods for handling incomplete data, what information can help us make a reasonable assumption? What is the key to deciding whether missingness is MCAR or MAR? <nowiki>--~~~~</nowiki> --[[User:Qiong Li|Qiong Li]] 12:31, 20 June 2012 (EDT) <br />
<br />
=== Full Maximum Likelihood vs. Imputation ===<br />
Could you give us an example using both maximum likelihood and imputation methods in R and then compare them? How are the methods similar or different? Is one computationally better than the other?--[[User:Dusvat|Dusvat]] 17:27, 20 June 2012 (EDT)<br />
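Not a full answer, but a sketch of the imputation side with the mice package (assumed installed), contrasted with a complete-case fit; a genuine full-information ML comparison would need software such as lavaan that implements FIML. The data here are invented:<br />

```r
library(mice)

set.seed(8)
n <- 200
x <- rnorm(n)
y <- 1 + x + rnorm(n)
x[runif(n) < 0.3] <- NA          # roughly 30% of x missing (MCAR)
dd <- data.frame(y, x)

imp  <- mice(dd, m = 5, printFlag = FALSE)  # 5 imputed data sets
fits <- with(imp, lm(y ~ x))                # analyse each completed set
summary(pool(fits))                         # combine by Rubin's rules

summary(lm(y ~ x, data = dd))               # complete-case comparison
```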
<br />
=== Dancing on the Bay(esian) ===<br />
<br />
Imputation seems to be a very Bayesian practice, and the authors mention the intimate connection to Gibbs sampling when imputing data in the univariate case. I wonder, however: if we are willing to impute data in this Bayesian manner, why don't we just jump ship and move to a fully Bayesian methodology? What are the benefits/downsides of initially dancing with the idea of being Bayesian to get our complete data, then being frequentist to fit our models? --[[User:Rphilip2004|Rphilip2004]] 09:18, 21 June 2012 (EDT)<br />
<br />
== Chapter 10 ==<br />
== Chapter 11 ==<br />
== Chapter 12 ==<br />
== Chapter 13 ==<br />
== Chapter 14 ==<br />
== Chapter 15 ==<br />
== Chapter 16 ==<br />
== Chapter 17 ==<br />
== Chapter 18 ==</div>Smithcehttp://scs.math.yorku.ca/index.php/MATH_6643_Summer_2012_Applications_of_Mixed_ModelsMATH 6643 Summer 2012 Applications of Mixed Models2012-06-19T20:47:37Z<p>Smithce: </p>
<hr />
<div><big>'''Thought of the Week'''</big><br />
<blockquote><br />
“This is not a scientific survey. It's a random survey.” – Representative Daniel Webster voting in May 2012 for the abolition of the American Community Survey, the U.S. analogue of the Canadian ”long form”.<br />
</blockquote><br />
<big>'''News'''</big><br />
* May 17: We decided that, from now on, instead of preparing an exercise question on the current chapter of Snijders and Bosker before each class, we would each prepare a 'discussion question' to be placed in the same file so it can easily be viewed and discussed. The file is [[/Snijders and Bosker: Discussion Questions|Snijders and Bosker: Discussion Questions]]. <br />
<big>'''Quick Links'''</big><br />
* [[/Snijders and Bosker: Discussion Questions|Snijders and Bosker: Discussion Questions]]<br />
** [http://www.stats.ox.ac.uk/~snijders/mlbook.htm Demo Code] (at the bottom of the page)<br />
* [[/Links|Links for the course (add abundantly)]]<br />
* [[Statistics in the News|Statistics in the News (add to this too!)]]<br />
*[[SPIDA_2012:_Mixed_Models_with_R|SPIDA 2012 home page]] and labs:<br />
** [[SPIDA_2012:_Mixed_Models_with_R/Lab 1|Lab 1 Mixed Models]]<br />
** [[SPIDA_2012:_Mixed_Models_with_R/Lab 2|Lab 2 Longitudinal Models]]<br />
** [[SPIDA_2012:_Mixed_Models_with_R/Lab 3|Lab 3 Miscellaneous topics and intro to GLMMs]]<br />
** [[SPIDA_2012:_Mixed_Models_with_R/Lab 4|Lab 4 Missing Data]]<br />
** [[SPIDA_2012:_Mixed_Models_with_R/Lab 5|Lab 5 Introduction to GLMMs with MCMC]]<br />
<br />
<big>'''Course Schedule'''</big><br />
::{| border="1" cellpadding="4"<br />
|-<br />
! Tuesday !! Thursday <br />
|-<br />
| [[#Class_1:_May_8|'''May 8: Class 1'''<br>]]<br />
| [[#Class_2:_May_10|'''10: Class 2'''<br>]]<br />
|-<br />
| [[#Class_3:_May_15|'''15: Class 3'''<br>]]<br />
| [[#Class_4:_May_17|'''17: Class 4'''<br>]]<br />
|-<br />
| '''22'''<br><br />
No class: [http://www.sph.utoronto.ca/lou/dlsph-sora-taba/w-brd/Programs.asp DLSPH/SORA/TABA workshop]<br />
| '''24'''<br><br />
No class: [http://www.isr.yorku.ca/spida2012/courses.html SPIDA]<br />
|-<br />
| '''29'''<br><br />
No class: [http://www.isr.yorku.ca/spida2012/courses.html SPIDA]<br />
| '''31'''<br><br />
No class: [http://www.isr.yorku.ca/spida2012/courses.html SPIDA]<br />
|-<br />
| '''June 5'''<br><br />
No class: [http://www.ssc.ca/en/meetings/2012 SSC 2012 meetings in Guelph]<br />
| [[#Class_5:_June 7|'''7: Class 5''']]<br><br />
|-<br />
| [[#Class_6:_June 12|'''12: Class 6''']]<br><br />
| [[#Class_7:_June 14|'''14: Class 7''']]<br><br />
|-<br />
| [[#Class_8:_June 19|'''19: Class 8''']]<br><br />
| [[#Class_9:_June 21|'''21: Class 9''']]<br><br />
|-<br />
| [[#Class_10:_June 26|'''26: Class 10''']]<br><br />
| [[#Class_11:_June 28|'''28: Class 11''']]<br><br />
|-<br />
| [[#Class_12:_July 3|'''July 3: Class 12''']]<br><br />
| [[#Class_13:_July 5|'''5: Class 13''']]<br><br />
|-<br />
| [[#Class_14:_July 10|'''10: Class 14''']]<br><br />
| [[#Class_15:_July 12|'''12: Class 15''']]<br><br />
|-<br />
<br />
| [[#Class_16:_July 17|'''17: Class 16''']]<br><br />
| [[#Class_17:_July 19|'''19: Class 17''']]<br><br />
|-<br />
| [[#Class_18:_July 24|'''24: Class 18''']]<br><br />
| [[#Class_19:_July 26|'''26: Final exam''']]<br><br />
|-<br />
|}<br />
<br />
== Things we'll do ==<br />
<br />
# Read [[/Snijders and Bosker: Discussion and Exercises|Snijders and Bosker (2012) ''Multilevel Analysis, 2nd. ed.'', Sage]]<br />
#: This is one of the best books I've read on the concepts and on the 'art' of mixed models. Almost every sentence is rich with meaning. The authors very rarely skip an important point. I would venture to say that it is the best book written yet on applications of mixed models. The first edition in 1999 held the title for a number of years.<br />
#* We will discuss a new chapter at the beginning of each class. Someone will be chosen at random to lead the discussion.<br />
#* Since the book doesn't have exercises, you will prepare [[/Snijders and Bosker: Discussion and Exercises|exercises]] for each chapter. Before the chapter is discussed, you will prepare 'simple, straightforward' exercises. After the chapter is discussed you will prepare deeper more conceptual and challenging exercises. You can use various [[sources of multilevel data]] for the exercises.<br />
#* Conveniently, the book has 18 chapters and we have 18 classes. We will begin with chapter 2 in class 2 and continue. You should prepare a simple exercise and post it on your 'student' page before the class. Be prepared to discuss the chapter in class. Each day someone will be chosen at random to lead off the discussion with a five-minute summary of the key points in the chapter. Post a deeper question on the same chapter before the next class.<br />
# Write a paper. Maybe publish it.<br />
#* This will be done in teams of two or, exceptionally, three. If you wish to be 'randomly' matched to others who wish to be randomly matched let me know. The idea is stolen from [http://www.math.yorku.ca/people/georges/Files/MATH6643/King_2006_Publications_Publications.pdf Gary King (2006)]. Read Gary King's article to find out how to proceed. You need to find an article with data that lends itself to longitudinal or multilevel analysis. You will find many examples of King's students' replication papers, including their data and R code, on the [http://projects.iq.harvard.edu/gov2001/data Harvard Dataverse]. Choose the article before May 22 and send me an email message. During the 'break' created by SPIDA and the SSC meeting, you should work on replicating the original analysis.<br />
# Complete a few assignments early in the course intended to cover the 'math' of multilevel models.<br />
# Write a function or a small package in R that does something useful and that can be added to the 'spidadev' package. Some [[/R projects|ideas]]. Due at class 17.<br />
# Write a final exam.<br />
<br />
==Some Data Sources with Publication Listings==<br />
#The Longitudinal Study of American Youth ([http://lsay.org/]).<br />
#The University of Wisconsin's BADGIR (Better Access to Data for Global Interdisciplinary Research [http://nesstar.ssc.wisc.edu/index.html]; email registration is easy to gain access). If you decide to use the National Survey of Families and Households data, I have a reduced longitudinal dataset already created; just ask me (Ryan) for it, it might be useful.<br />
#The Wisconsin Longitudinal Study (easy email sign up [http://www.ssc.wisc.edu/wlsresearch/]).<br />
<br />
== Grades ==<br />
It would be so much more fun just to learn for its own sake! But...<br />
<br />
* [30%] Snijders and Bosker has 18 chapters. We will discuss and do exercises for 17 of them. That's a total of 17 discussions and 34 exercises. Each discussion and exercise is graded out of 10. I will discard the 11 lowest grades of the 51. That gives you a maximum possible score of 400.<br />
* [30%] Replication, reanalysis and paper.<br />
* [10%] Assignments<br />
* [15%] R function or package.<br />
* [20%] Final exam<br />
<br />
== Who and where I am ==<br />
* Georges Monette, Ph.D., P.Stat. (http://www.ssc.ca/accreditation/index_e.html)<br />
** TEL 5075 until June 30, 2012; N626 Ross afterwards.<br />
**mailto:georges+mixed@yorku.ca (Note adding '+mixed' increases the priority of your message in my mailbox)<br />
**http://www.math.yorku.ca/~georges <br />
**Office hours: After class or by appointment<br />
<br />
== Who you are ==<br />
[[/Students|Your pages]]<br />
<br />
== Things to learn ==<br />
Applying a statistical methodology requires being aware of many things:<br />
* Some basic theory behind the method <br />
* What assumptions the theory implies, how they fail and when they're reasonable although wrong<br />
* How to turn substantive research questions into statistical questions<br />
* How to formulate and explore those questions statistically<br />
* How to avoid all-too-common but execrable errors: e.g. interpreting main effects in the presence of interactions, interpreting lists of p-values from regression output, interpreting multiple p-values for the dummy variables of a factor<br />
* How to program<br />
* How to produce informative and stimulating graphics<br />
* Numerical savvy: avoiding X'X, using svd and qr instead<br />
We will try to weave these themes throughout the course<br />
== Places to look ==<br />
*[[/Links|Links for the course]] (please add abundantly)<br />
** [[/Links#Annotated bibliography|Annotated Bibliography]]<br />
** [[/Links#Quotes|Quotes]]<br />
<br />
== Class 1: May 8 ==<br />
=== Links ===<br />
*Hans Rosling Ted Talks: [http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html] [http://www.ted.com/talks/hans_rosling_reveals_new_insights_on_poverty.html] [http://www.ted.com/talks/hans_rosling_on_global_population_growth.html]<br />
* [http://blackwell.math.yorku.ca/Files/MATH6643/MATH6643_01.html Screen capture of Class 1] (Password protected: User: frisch Password: waugh (this information will disappear soon ... well not really -- this is a wiki))<br />
* [[/R scripts|R scripts]]<br />
* [http://www.math.yorku.ca/people/georges/Files/MATH6643/Class01/Linear%20Algebra-rot.pdf Notes on Linear Algebra]<br />
* [http://www.math.yorku.ca/people/georges/Files/MATH6643/Class01/Statistical_Issues_I.pdf Statistical Issues]<br />
<br />
=== Assignment 1 ===<br />
<br />
Due: May 15 before class<br />
<br />
You must work individually but you can use any written source provided you cite it.<br />
<br />
: 1. [10] Prove the Sherman-Morrison-Woodbury identity (state appropriate assumptions). Note that U and V need not be square matrices.<br />
::<math>(A + UDV)^{-1} = A^{-1} - A^{-1}U(D^{-1} + V A^{-1} U)^{-1} VA^{-1}</math><br />
::: Hint: It might be easier to first prove a special case for <math>(I + UV)^{-1}</math> and then use basic facts about inverses and products, e.g. <math>(AB)^{-1} = B^{-1}A^{-1}</math><br />
<br />
: 2. [10] Consider a linear regression model <math> Y = X \beta + \epsilon</math> where <math>X</math> is a <math>n \times (k+1) </math> matrix whose first column consists of 1's, <math>\beta = ( \beta_0, \beta_1, ... , \beta_k )'</math> and <math>Var(\epsilon)= \sigma^2 I</math>. Let <math>\Sigma_{XX}</math> be the <math>k \times k </math>variance matrix of the predictor variables and let <math>s_E</math> be the residual standard error. Find and prove an expression for <math>Var( (\hat{\beta_1}, ... , \hat{\beta_k} )')</math><br />
<br />
: 3. [10] Let <math> \Sigma </math> be symmetric. Show that <math> \Sigma </math> is positive-definite if and only if there exists a non-singular matrix <math>A</math> such that <math>\Sigma = AA' </math>.<br />
<br />
: 4. [10] Show that a symmetric matrix <math> \Sigma </math> is a variance matrix if and only if there exists a matrix <math>A</math> such that <math>\Sigma = AA' </math>.<br />
<br />
: 5. [10] Let <math>A</math> and <math>B</math> be square matrices such that <math> AA' = BB' = \Sigma </math> with <math>\Sigma</math> positive definite. Show that there exists an orthogonal matrix <math>\Gamma</math> such that <math> A = B \Gamma </math>.<br />
<br />
: 6. [50] Retrieve the "Arrests" data set from "library(effects)" in R. You can get information about the variables in the data set in the usual way with<br />
<blockquote><br />
> ?Arrests<br />
: Note that you might have to download the library first with:<br />
> install.packages('effects')<br />
: The ''Toronto Star'' has published some stories claiming that this data set reveals a pattern of discrimination in police behaviour. You have been hired by the ''National Post'' to study the data set and produce an independent opinion. Your opinion may agree, disagree or otherwise qualify the claim that this data shows a conclusive pattern of discrimination. Write a report with suitable graphs and include the details of your analysis with appropriate graphs as an appendix.<br />
</blockquote><br />
<br />
== Class 2: May 10 ==<br />
* Slides:<br />
** [http://www.math.yorku.ca/people/georges/Files/MATH6643/Class02/Why%20Mixed%20and%20Longitudinal%20Models.PDF Why Mixed and Longitudinal Models] [http://www.math.yorku.ca/people/georges/Files/MATH6643/Class02/Why%20Mixed%20and%20Longitudinal%20Models-n02.PDF (with annotations in class)]<br />
** [http://www.math.yorku.ca/people/georges/Files/MATH6643/Class02/Hierarchical_Models_v2011_02.pdf From Hierarchical to Mixed Models] [http://www.math.yorku.ca/people/georges/Files/MATH6643/Class02/Hierarchical_Models_v2011_02-n02.pdf (with annotations in class)]<br />
* Notes:<br />
** [http://www.math.yorku.ca/people/georges/Files/MATH6643/Class02/Class02-rot.pdf Notes on]<br />
*** Linear transformations of random vectors: mean and variance <br />
*** Multivariate Normal Distribution<br />
**** Marginal and conditional distributions<br />
**** Visualizing the bivariate normal <br />
* [http://blackwell.math.yorku.ca/Files/MATH6643/MATH6643_02.html Screen capture of Class 2] (sorry -- I forgot to turn it back on after the break)<br />
<br />
== Class 3: May 15 ==<br />
<!--<br />
* [http://blackwell.math.yorku.ca/Files/MATH6643/MATH6643_01.html Screen capture of Class 1]<br />
--><br />
=== Assignment 2 ===<br />
Due June 7 before class<br />
<br />
You must write up your own work based on your own understanding but you can do anything you want to develop your understanding.<br />
<br />
: 1. [10] A basis for <math>\mathbb{R}^p</math> that is a conjugate basis with respect to a positive definite matrix <math>M</math> is a sequence of vectors <math>x_1, x_2, ... , x_p</math> in <math>\mathbb{R}^p</math> such that <math>x'_i M x_i = 1</math> and <math>x'_i M x_j = 0</math> if <math>i \neq j</math>. Show that the columns of a non-singular matrix <math>A</math> form a conjugate basis with respect to <math>\Sigma^{-1}</math> if <math>\Sigma = AA'</math>. Note that a conjugate basis is merely an orthogonal basis with respect to the metric defined by <math>||x||^2 = x' \Sigma^{-1}x</math>. <br />
<br />
: 2. [10] We will call a "square root" of a square matrix <math>M</math> any square matrix <math>A</math> such that <math>M = AA'</math>. Show that a square matrix has a square root if and only if it is a variance matrix. <br />
<br />
: 3. [10] Write a function in R that computes a square root of a variance matrix M. Use the 'eigen' function. [Bonus: 2] Get your function to give an informative error message if M does not have a square root for some reason.<br />
<br />
: 4. [10] Using the function in 3, write a multivariate normal random number generator. Write it to parallel the univariate 'rnorm'. The univariate 'rnorm' takes three arguments: n, mean and sd. Consider writing your 'rmvnorm' so the third argument, if given, must be named either 'var' or 'sd' (depending on whether the user is giving a variance or the square root of a variance as input) to avoid confusion with the univariate generator. The default could be the identity -- which doesn't need to be distinguished as 'var' or as 'sd'.<br />
<br />
: 5. [10] Write a simple 'lmfit' function that calculates least squares regression coefficients using an algorithm based on the svd. Ideally, design the function so it takes a formula and a data frame as arguments, e.g. lmfit( y ~ x1 + x2, dd). You can generate the model matrix using the 'model.matrix' function and extract the response using the first column of the model.frame command. <br />
<br />
: 6. [10] Consider a <math>2 \times 2</math> variance matrix <math>\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22}\end{bmatrix}</math> for a random vector <math>\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}</math>. Verify that the Cholesky matrix <math>C = \begin{bmatrix} \sigma_{11}^{1/2} & 0 \\ \sigma_{21}/ \sigma_{11}^{1/2}& \sqrt{\sigma_{22} - \sigma_{12}^2 / \sigma_{11}}\end{bmatrix}</math> is a square root of <math>\Sigma</math>.<br />
:: Show that the Cholesky matrix can be written as <math>\begin{bmatrix} \sigma_1 & 0 \\ \beta_{21} \sigma_1 & \sigma_{2 \cdot 1}\end{bmatrix}</math> where <math>\beta_{21}</math> is the regression coefficient of <math>Y_2</math> on <math>Y_1</math>.<br />
:: Draw a concentration (or data) ellipse and indicate the interpretation of the vectors defined by the columns of <math>C</math> relative to the ellipse.<br />
<br />
: 7. [10] Show that a non-singular <math>2 \times 2</math> variance matrix, <math>\Sigma</math> can be factored so that <math>\Sigma = AA'</math> with <math>A</math> an upper triangular matrix [in contrast with problem 6 where the matrix is lower triangular]. Explain the interpretation of the elements of this matrix as in question 6. <br />
<br />
: 8. [20] Generate 100 observations for three variables <math>Y</math>, <math>X</math> and <math>Z</math> so that in the regression of <math>Y</math> on both <math>X</math> and <math>Z</math> neither regression coefficient is significant (at the 5% level) but a test of the hypothesis that both coefficients are 0 is rejected at the 1% level. Explain your strategy in generating the data. How should the data be generated to produce the required result? Show a data ellipse for <math>X</math> and <math>Z</math> and appropriate confidence ellipses for their two regression coefficients. What does this example illustrate about the appropriateness of scanning regression output for significant p-values and concluding that nothing is happening if none of the p-values achieves significance?<br />
<br />
: 9. [20] Generate 100 observations for three variables <math>Y</math>, <math>X</math> and <math>Z</math> so that in the separate simple regressions of <math>Y</math> on each of <math>X</math> and <math>Z</math> neither regression coefficient is significant (at the 5% level) but a test of the hypothesis that both coefficients are 0 in a multiple regression of <math>Y</math> on both <math>X</math> and <math>Z</math> is rejected at the 5% level. Explain your strategy in generating the data. How should the data be generated to produce the required result? Show a data ellipse for <math>X</math> and <math>Z</math> and appropriate confidence ellipses for their two regression coefficients. Explain the relationship between the ellipses and the phenomenon exhibited in this problem. What does this example illustrate about the appropriateness of forward stepwise regression to identify a suitable model to predict <math>Y</math> using both <math>X</math> and <math>Z</math>?<br />
<br />
== Class 4: May 17 ==<br />
<!--<br />
* [http://blackwell.math.yorku.ca/Files/MATH6643/MATH6643_01.html Screen capture of Class 1]<br />
--><br />
=== Class Notes ===<br />
* [http://www.math.yorku.ca/people/georges/Files/MATH6643/Class04/Notes_2012_05_17-rot2.pdf Class notes]:<br />
** Learning statistics: The How? vs The When?<br />
** What does adding <math>\bar{X}_j</math> do to your model and why. [Ask me why the added-variable plot explains everything -- almost].<br />
** Compositional effect (aka between effect) = Contextual effect + Within effect<br />
** The 'empty' random intercept model<br />
*** Generating data<br />
*** Estimating <math>\beta_{0j}</math> with a BLUE<br />
*** Three ways of estimating <math>\gamma_{00}</math> <br />
**** Overall mean<br />
**** Mean of means<br />
**** Inverse variance weighted mean of means (= mixed model approach)<br />
*** Back to estimating <math>\beta_{0j}</math> -- with a BLUP (= empirical Bayes estimator = shrinkage estimator)<br />
*** Exchangeability: when to shrink and when not to shrink<br />
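The three estimators of <math>\gamma_{00}</math> listed above can be sketched with simulated data (a hedged illustration: the variance components are treated as known here, whereas a mixed model estimates them):<br />

```r
## Empty random-intercept model: three estimators of gamma00
set.seed(1)
J  <- 20
n  <- sample(5:50, J, replace = TRUE)          # unbalanced cluster sizes
b0 <- rnorm(J, mean = 10, sd = 2)              # beta_0j, tau^2 = 4
dd <- data.frame(id = rep(1:J, n),
                 y  = rnorm(sum(n), mean = rep(b0, n), sd = 3))  # sigma^2 = 9

mean(dd$y)                                     # 1. overall mean
means <- tapply(dd$y, dd$id, mean)
mean(means)                                    # 2. mean of means
w <- 1 / (4 + 9 / n)                           # 3. inverse-variance weights
sum(w * means) / sum(w)                        #    ~ the mixed-model estimate
```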
==== To read ====<br />
* Visualizing regression: This material is relevant for questions 7 and 8 of assignment 2. I propose to look at it quickly and to let you explore it at your leisure.<br />
** [http://www.math.yorku.ca/people/georges/Files/MATH6643/Class04/Visualizing_Regression-I-Simple.pdf Simple regression]<br />
** [http://www.math.yorku.ca/people/georges/Files/MATH6643/Class04/Visualizing_Regression-II-Multiple.pdf Multiple regression]<br />
**: [http://www.math.yorku.ca/people/georges/Files/MATH6643/Class04/Visualizing_Regression-II-Multiple-mod20110928.R R script]<br />
* We will continue with [http://www.math.yorku.ca/people/georges/Files/MATH6643/Class02/Hierarchical_Models_v2011_02-n02.pdf From Hierarchical to Mixed Models]<br />
* [http://blackwell.math.yorku.ca/Files/MATH6643/MATH6643_2012_05_17.html Screen capture of Class 4]<br />
<br />
== Class 5: June 7 ==<br />
* [http://blackwell.math.yorku.ca/Files/MATH6643/MATH6643_05.html Screen capture of Class 5]<br />
<br />
== Class 6: June 12 ==<br />
* [http://blackwell.math.yorku.ca/Files/MATH6643/MATH6643_06.html Screen capture of Class 6] (sorry, I messed up with the audio again and your comments are barely audible -- I'm quite sure I will do better on Thursday)<br />
* [[/Testing fixed and random effects in R]]<br />
<br />
== Class 7: June 14 ==<br />
* [http://blackwell.math.yorku.ca/Files/MATH6643/MATH6643_07.html Screen capture of Class 7]<br />
* Two labs: [[SPIDA_2012:_Mixed_Models_with_R/Lab_1|Lab 1]] and [[SPIDA_2012:_Mixed_Models_with_R/Lab_2|Lab 2]], with annotated examples using R to fit mixed models.<br />
* [http://www.math.yorku.ca/people/georges/Files/MATH6643/Class02/Hierarchical_Models_v2011_02-n02.pdf Hierarchical Models to Mixed Models] discusses theory and application of mixed models.<br />
=== Assignment 3 ===<br />
Due: June 21<br />
The purpose of this assignment is to give you a chance to try out what we're learning by working on a dataset and to have the sobering experience of split-half validation.<br />
Using the 'hsfull' data set in 'spida', choose a random sample of half the schools. To make everything replicable, first choose a random seed at random (an odd integer between 1 and 2^32-1), record the number and use 'set.seed' with the number just before selecting the random sample. <br />
Using the half sample, explore the roles of ses, Sex, Minority and Sector in their relationship with math achievement. Discuss your strategy as you go along. At some point(s) in the analysis, illustrate that you can<br />
# Correctly test the significance of interactions and of effects with multiple degrees of freedom.<br />
# Avoid incorrect interpretations of p-values in regression output (you should do this at '''every''' point)<br />
# Test hypotheses involving the random effects structure of the model<br />
# Choose between various random effects structures<br />
# Carry out a test involving setting up a linear hypothesis matrix with more than one row<br />
# Estimate and plot conditional effects of a factor that interacts with ses. You should first plot an 'effect' plot showing the estimated response for each level as a function of ses and, next, graph the estimated gap with standard error bands.<br />
# Run one of your final models on the other half of the data. Comment on differences. Create the 'gap plot' with the other half of the data and comment on the differences.<br />
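The seed-and-split step described above might look like this in R (a sketch: the school-id column name in 'hsfull' is an assumption, and the seed shown is just an example of a recorded value):<br />

```r
## Replicable split-half sample of schools from 'hsfull' in spida
library(spida)
seed <- 2467911                      # odd integer chosen at random, recorded
set.seed(seed)
ids  <- unique(hsfull$school)        # 'school' is an assumed column name
half <- sample(ids, floor(length(ids) / 2))
dd   <- subset(hsfull, school %in% half)
```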
<br />
=== Treasure Hunt ===<br />
Bonus: Find examples of good and bad reports of mixed models and add links with comments here: [http://scs.math.yorku.ca/index.php/MATH_6643_Summer_2012_Applications_of_Mixed_Models/Links#Good.2C_bad_and_ugly_examples_of_the_use_of_mixed_models_in_research]<br />
=== Effect sizes in mixed models ===<br />
* [https://stat.ethz.ch/pipermail/r-help/2006-June/107999.html An interesting message on an R mailing list]: "... in psychology, reviewers will bludgeon you for an effect size..." but with some cause:<br />
*: Quoting from [http://www.hlm-online.com/papers/HLM_effect_size.pdf Roberts (2006)], the APA Publication Manual (2001):<br />
*: <blockquote> The general principle to be followed, however, is to provide the reader not only with information about statistical significance but also with enough information to assess the magnitude of the observed effect or relationship. (p. 26)</blockquote><br />
* See also the comments on effect sizes in [http://www.apa.org/pubs/journals/releases/amp-54-8-594.pdf Wilkinson and the Task Force on Statistical Inference (1999)]. In part:<br />
*:<blockquote> '''Effect sizes.''' Always present effect sizes for primary outcomes. If the units of measurement are meaningful on a practical level (e.g., number of cigarettes smoked per day), then we usually prefer an unstandardized measure (regression coefficient or mean difference) to a standardized measure (r or d). It helps to add brief comments that place these effect sizes in a practical and theoretical context.</blockquote><br />
<br />
== Class 8: June 19 ==<br />
<!--<br />
* [http://blackwell.math.yorku.ca/Files/MATH6643/MATH6643_01.html Screen capture of Class 1]<br />
--><br />
== Class 9: June 21 ==<br />
<!--<br />
* [http://blackwell.math.yorku.ca/Files/MATH6643/MATH6643_01.html Screen capture of Class 1]<br />
--><br />
== Class 10: June 26 ==<br />
<!--<br />
* [http://blackwell.math.yorku.ca/Files/MATH6643/MATH6643_01.html Screen capture of Class 1]<br />
--><br />
== Class 11: June 28 ==<br />
<!--<br />
* [http://blackwell.math.yorku.ca/Files/MATH6643/MATH6643_01.html Screen capture of Class 1]<br />
--><br />
== Class 12: July 3 ==<br />
<!--<br />
* [http://blackwell.math.yorku.ca/Files/MATH6643/MATH6643_01.html Screen capture of Class 1]<br />
-->== Class 13: July 5 ==<br />
== Class 14: July 10 ==<br />
<!--<br />
* [http://blackwell.math.yorku.ca/Files/MATH6643/MATH6643_01.html Screen capture of Class 1]<br />
--><br />
== Class 15: July 12 ==<br />
<!--<br />
* [http://blackwell.math.yorku.ca/Files/MATH6643/MATH6643_01.html Screen capture of Class 1]<br />
--><br />
== Class 16: July 17 ==<br />
<!--<br />
* [http://blackwell.math.yorku.ca/Files/MATH6643/MATH6643_01.html Screen capture of Class 1]<br />
--><br />
== Class 17: July 19 ==<br />
<!--<br />
* [http://blackwell.math.yorku.ca/Files/MATH6643/MATH6643_01.html Screen capture of Class 1]<br />
--><br />
== Class 18: July 24 ==<br />
<!--<br />
* [http://blackwell.math.yorku.ca/Files/MATH6643/MATH6643_01.html Screen capture of Class 1]<br />
--><br />
== Exam: July 26 ==<br />
== Interesting Links ==<br />
* Kaggle.com: [http://www.kaggle.com/ Data Analysis as a sport] (thanks to Ryan for sending this)</div>Smithcehttp://scs.math.yorku.ca/index.php/MATH_6643_Summer_2012_Applications_of_Mixed_Models/Snijders_and_Bosker:_Discussion_QuestionsMATH 6643 Summer 2012 Applications of Mixed Models/Snijders and Bosker: Discussion Questions2012-06-19T20:14:14Z<p>Smithce: </p>
<hr />
<div>== Chapter 5 ==<br />
=== Matthew ===<br />
At the bottom of page 83, Snijders and Bosker outline the process for probing interactions between two level one variables, and how there can be four possibilities for how to model it. If a researcher was to include all four, discuss how each would be interpreted. What might a good selection strategy be if our model had substantially more than two variables?<br />
<br />
=== Qiong === <br />
If we do not have any information about the data set, how should we choose a level-two variable to predict the group-dependent regression coefficients? After we choose the level-two variable z, how should we interpret the cross-level interaction term? <br />
<br />
=== Carrie === <br />
A client arrives with a random slope and intercept model using IQ as a predictor. IQ was measured on the traditional scale with a mean of 100 and standard deviation of 15. What should the client keep in mind about the interpretation of the variance of the intercept and covariance of the slope-intercept?<br />
: This raises the interesting question of how the variance of the random intercept and the covariance of the random intercept with the random slope are changed under a recentering of IQ. Let <br />
:: <math>Var\left[ \begin{matrix}<br />
{{u}_{0j}} \\<br />
{{u}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2} & {{\tau }_{01}} \\<br />
{{\tau }_{10}} & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math> for raw IQ. <br />
: If we recenter IQ with: <math>\tilde{\text{IQ}}=\text{IQ}-c</math> then:<br />
:: <math>\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
1 & c \\<br />
0 & 1 \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
{{u}_{0j}} \\<br />
{{u}_{1j}} \\<br />
\end{matrix} \right]</math><br />
: and<br />
:: <math>Var\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
1 & c \\<br />
0 & 1 \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
\tau _{0}^{2} & {{\tau }_{01}} \\<br />
{{\tau }_{10}} & \tau _{1}^{2} \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
1 & 0 \\<br />
c & 1 \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}+2c{{\tau }_{01}}+{{c}^{2}}\tau _{1}^{2} & {{\tau }_{01}}+c\tau _{1}^{2} \\<br />
{{\tau }_{10}}+c\tau _{1}^{2} & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math><br />
<br />
: If <math>c=-{{\tau }_{01}}/\tau _{1}^{2}</math>, then the variance of the intercept is minimized:<br />
::<math>Var\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}-\tau _{01}^{2}/\tau _{1}^{2} & 0 \\<br />
0 & \tau _{1}^{2} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}\left( 1-\rho _{01}^{2} \right) & 0 \\<br />
0 & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math><br />
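A quick numerical check of this recentering algebra (the <math>\tau</math> values below are arbitrary illustrations, not estimates from any data):<br />

```r
## Verify: with c = -tau01/tau1^2 the transformed covariance is 0
## and the intercept variance is minimized.
G  <- matrix(c(4, 1.2, 1.2, 0.5), 2, 2)  # Var of (u0j, u1j) for raw IQ
cc <- -G[1, 2] / G[2, 2]                 # c = -tau01 / tau1^2
A  <- matrix(c(1, 0, cc, 1), 2, 2)       # rows: (1, c) and (0, 1)
A %*% G %*% t(A)                         # off-diagonal entries are 0
```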
<br />
=== Gilbert === <br />
In chapter 5, they talk about hierarchical linear model where fixed effects and random effects are taken into consideration. Discuss a clear simple example in class which shows both effects and give interpretations of each of the coefficients and their use in real life.<br />
=== Daniela === <br />
In chapter 5, they talk mostly about the two-level nesting structure. Can we have a bigger example, with at least 4 levels, that includes a random intercept and slope, and see how to code it in R?<br />
: In 'lme', multilevel nesting is handled with, e.g.<br />
fit <- lme( Y ~ X * W, dd, random = ~ 1 | idtop/idmiddle/idsmall)<br />
: Contextual variables present an ambiguity. Assuming that the id variables 'idsmall' and 'idmiddle' are coded uniquely overall, then the higher-level, say 'idmiddle', contextual variables could be coded as either:<br />
cvar(X,idmiddle)<br />
: or<br />
capply( dd , ~ id, with, mean( c(tapply( X, idsmall, mean))))<br />
:Here is a table prepared for SPIDA showing how to handle multilevel nesting and crossed structures in a selection of R functions:<br />
<blockquote><br />
<br />
::{| border="1" cellpadding="4"<br />
|-<br />
! Function !! Notes <br />
|-<br />
| lme<br><br />
in package nlme<br />
| Linear mixed effects: normal response<br><br />
G side and R side modelling<br><br />
Model syntax:<br><br />
Y ~ X * W, random = ~ 1 + X | id<br><br />
For nested effect:<br><br />
Y ~ X * W, random = ~ 1 + X | higher/lower<br><br />
or, to have different models at different levels:<br><br />
Y ~ X * W, random = list(higher = ~ 1, lower = ~ 1 + X )<br><br />
<br />
|-<br />
| lmer <br><br />
in package lme4<br />
| Linear mixed models for gaussian response with Laplace approximation <br><br />
G side modeling only, R = <math>\sigma^2 I</math><br><br />
Model syntax: <br><br />
Y ~ X * W +(1+X|id)<br><br />
For nested effect:<br><br />
Y ~ X * W +(1+X|higher) + (1+X|higher:lower)<br> <br />
For crossed effect:<br><br />
Y ~ X * W +(1+X|id1) + (1+X|id2)<br> <br />
|- <br />
| glmer <br><br />
in package lme4<br />
| Generalized linear mixed models with adaptive Gaussian quadrature <br><br />
* family: binomial, Gamma, inverse.gaussian, poisson, gaussian<br><br />
G side only, no R side<br><br />
Model syntax: <br><br />
Y ~ X * W +(1+X|id)<br><br />
For nested effect:<br><br />
Y ~ X * W +(1+X|higher) + (1+X|higher:lower)<br> <br />
For crossed effect:<br><br />
Y ~ X * W +(1+X|id1) + (1+X|id2)<br> <br />
<br />
|- <br />
| glmmPQL<br><br />
in packages MASS/nlme<br />
| Generalized linear mixed models with Penalized Quasi Likelihood <br><br />
* family: binomial, Gamma, inverse.gaussian, poisson, gaussian<br><br />
G side and R side as in lme<br><br />
Model syntax: <br><br />
Y ~ X * W, random = ~ 1 + X | id<br><br />
For nested effect:<br><br />
Y ~ X * W, random = ~ 1 + X | higher/lower<br><br />
|-<br />
| MCMCglmm<br><br />
in package MCMCglmm<br />
| Generalized linear mixed models with MCMC <br><br />
* family: poisson, categorical, multinomial, ordinal, exponential, geometric, cengaussian, cenpoisson,<br />
cenexponential, zipoisson, zapoisson, ztpoisson, hupoisson, zibinomial (cen=censored, zi=zero-inflated, za=zero-altered, hu=hurdle)<br><br />
G side and R side, R side different from 'lme': no autocorrelation but can be used for multivariate response <br><br />
Note: 'poisson' potentially overdispersed by default (good), 'binomial' variance for binary variables is unidentified.<br> <br />
Model syntax: <br><br />
Y ~ X * W, random = ~ us(1 + X):id [Note: id should be a factor, us=unstructured]<br />
For nested effect:<br><br />
Y ~ X * W, random = ~us(1 + X):higher + us(1 + X):higher:lower<br> <br />
For crossed effect:<br><br />
Y ~ X * W, random = ~us(1 + X):id1+ us(1 + X):id2<br> <br />
|-<br />
|}<br />
<br />
</blockquote><br />
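The nested-model syntax in the table can be put side by side for the two main fitting functions (a sketch: 'dd', 'Y', 'X', 'W', 'higher' and 'lower' are placeholder names, not a real data set):<br />

```r
## Same three-level nested random-intercept-and-slope model, two packages
library(nlme)
fit.lme  <- lme(Y ~ X * W, data = dd,
                random = ~ 1 + X | higher/lower)

library(lme4)
fit.lmer <- lmer(Y ~ X * W + (1 + X | higher) + (1 + X | higher:lower),
                 data = dd)
```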
<br />
=== Phil ===<br />
<br />
Excluding fixed effects that are non-significant is common practice in regression analyses, and Snijders and Bosker follow this practice when simplifying the model in Table 5.3 to the model found in Table 5.4. While this practice is used to help make the model more parsimonious, it can ignore the joint effect that these variables have on the model as a whole. Discuss alternative criteria that one should explore when determining whether a predictor should be excluded from the model.<br />
<br />
=== Ryan ===<br />
When using random slopes it is generally the case that the level 1 model contains a fixed effect for what will also be the level 2 random effect. The random effect is then an estimate of the group/cluster/individual departure from the fixed effect. However, non-significant level 1 variables are not determinable as different from zero. Are there cases where a non-significant fixed effect can be excluded from the model while retaining the random effect at level 2? What would be the consequence of this and what might it reveal about the level 1 variable? Would this help control for the '''error of excluding a non-significant but confounding variable'''?<br />
:: This is a very interesting question. It would be interesting to create a simulated data set illustrating the issue so we could consider the consequences of having random effects for a confounding factor whose within-cluster effect changes sign from cluster to cluster. Can we think of a confounding factor that would do that?<br />
<br />
=== and others === <br />
== Chapter 6 ==<br />
=== What is random? ===<br />
At the beginning of the chapter, S&B present two models (Table 6.1). They note that "The variable with the random slope is in both models the grand-mean centered variable IQ". In R: would the random side look like: random = ~ (IQ - IQbar)|school, even though (IQ - IQbar) isn't in the fixed part of the second model? How is this different in interpretation from: random = ~ (group centered IQ - IQbar)|school?<br />
<br />
* NOTE: This question is not necessarily about how to specify a model using nlme, but rather about the terms included in the random part of the model. As a test, I ran two models:<br />
<br />
IQ_dev <- mlbook_red$IQ_verb - mlbook_red$sch_iqv<br />
<br />
mlb612a <- nlme:::lme(langPOST ~ IQ_dev + sch_iqv + ses, random =~ IQ_verb|schoolnr, data = mlbook_red, method="ML")<br />
summary(mlb612a)<br />
<br />
mlb612b <- nlme:::lme(langPOST ~ IQ_dev + sch_iqv + ses, random =~ IQ_dev|schoolnr, data = mlbook_red, method="ML")<br />
summary(mlb612b)<br />
<br />
Note that the basic IQ_verb variable has been grand mean centered. According to the chapter, it sounds like using IQ_verb in the random portion is OK since it is a linear combination of IQ_dev and sch_iqv (the school means of IQ_verb). However, if I compare it against a second model using IQ_dev in the random part, pretty much all of the coefficients change. Is this expected?<br />
<br />
--[[User:Msigal|Msigal]] 12:07, 13 June 2012 (EDT)<br />
<br />
=== Qiong === <br />
The authors introduce the t-test for testing fixed parameters. We can use summary(model) to get the p-value directly in R. In practice, we use the Wald test to test fixed parameters. On page 95, they mention that "for 30 or more groups, the Wald test of fixed effects using REML standard errors have reliable type I error rates." Is this the only reason we use the Wald test in practice?<br />
=== Carrie === <br />
On page 105 in their discussion of modeling within-group variability the authors warn to "keep in mind that including a random slope normally implies inclusion of the fixed effect!". What is an example of a model where one might include a random slope without a fixed effect?<br />
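For concreteness, one such model can be sketched in 'lme' (placeholder names; this constrains the average slope of X to 0, which is rarely sensible unless theory demands it):<br />

```r
## Random slope for X with no fixed slope: average X effect fixed at 0,
## but each cluster gets its own slope departure from 0.
library(nlme)
fit <- lme(Y ~ 1, data = dd, random = ~ 1 + X | id)
```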
=== Gilbert === <br />
On page 104, the authors discuss different approaches to model selection, including working upward from level one and joint consideration of levels one and two. Are there other methods to use if neither provides a satisfactory model? <br />
=== Daniela === <br />
On pages 97 and 99, the authors showed how to test for a random intercept and a random slope independently. What test would you use if you wanted to test for both a random slope and a random intercept in the same model? And what would you test it against: the linear model, or a model with just a random slope or just a random intercept?<br />
=== Phil ===<br />
<br />
Multi-parameter tests are possible for fixed effects, but can they also be applied to predicted random effects? If so, what would be the analog of <math>\hat{\gamma}^{\top} \hat{\Sigma}^{-1}_{\gamma} \hat{\gamma}</math>, used to find the Wald statistic, and how do we find an appropriate denominator df for an F test?<br />
<br />
: Question: Just to be sure, I presume you mean a BLUP? Answer: Yes I do :)<br />
<br />
=== Ryan ===<br />
In Chapter 6 Snijders and Bosker suggest that using deviance tests to evaluate fixed effects in a multilevel model is inappropriate if the estimation method is REML. What is the characteristic of REML vs ML that makes this type of model evaluation incorrect?<br />
:Comment: Likelihood has the form L(data|theta) and is used for inference on theta, for example by comparing the altitude of the log-likelihood at theta.hat (the summit) with the altitude at a null hypothetical value theta.0. This is the basis of deviance tests: Deviance = 2*(logLik(data|theta.hat) - logLik(data|theta.0)).<br />
<br />
: With ML the log-likelihood is logLik( y | beta, G, R) and we can use the likelihood for inference on all parameters. With REML, the data is not the original y but the residual of y on X, say e, and the likelihood is only a function of G and R: logLik( e | G, R). Since 'beta' does not appear in this likelihood, the likelihood cannot be used to answer any questions about 'beta'.<br />
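: In R this distinction shows up directly when comparing nested fixed-effect models (a sketch with placeholder names):<br />

```r
## Deviance tests of fixed effects require method = "ML"
library(nlme)
m1 <- lme(Y ~ X, data = dd, random = ~ 1 | id, method = "ML")
m0 <- lme(Y ~ 1, data = dd, random = ~ 1 | id, method = "ML")
anova(m0, m1)   # valid likelihood-ratio (deviance) test for the effect of X
## Refit with method = "REML" and the two log-'likelihoods' are based on
## residuals from different X matrices, so their difference is meaningless.
```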
<br />
=== and others === <br />
== Chapter 7 ==<br />
=== Explained Variance in Random Slope Models ===<br />
Looking at the proportion of variance explained by a model in a traditional ANOVA/multiple regression framework is something clients are often extremely interested in. In Chapter 7, Snijders and Bosker discuss how we might approach the issue in MLM. Near the end of the chapter, the authors insinuate that getting an estimate of the amount of explained variance in a random effects (intercept and slope) model is a somewhat tedious endeavour.<br />
<br />
The claim is that random slopes don't change prediction very much, so if we re-estimate the model using only random intercepts (no random slopes), this will "normally yield [predicted] values that are very close to values for the random slope models" (p. 114). This statement doesn't quite ring true for me, as in our examples the differences in slope between schools have been fairly striking/substantial. <br />
<br />
Is the authors' statement justifiable? Is obtaining an <math>R^2</math> as important/interesting in MLM as it is in other models?<br />
<br />
=== Explained variance in three-level models === <br />
In example 7.1, we know that how to calculate the explained variance of a level one variable when this variable has a fixed effect only. I want to know how to calculate the explained variance of a level one variable when this variable has a fixed and random effect in the model? <br />
<br />
=== Interpreting <math>R^2</math> as an Effect Size ===<br />
A client fits a multilevel model and comes up with several significant predictors. The client is pleased, but remembers learning that significance alone isn't good enough these days, and needs help producing a measure of effect size. You compute the level-one <math>R^2</math> and come up with a very small value, say 0.01. Is the model then worthless, even if the magnitude of the predicted change in the outcome is substantively meaningful?<br />
<br />
=== Explained variance === <br />
In the example provided on page 110, it is shown that the residual variance at level two increases as the within-group deviation is added as an explanatory variable, in the balanced as well as the unbalanced case. Is this always the case, or is it only for this particular example?<br />
<br />
=== Estimates of <math>R^2</math>=== <br />
On page 113, "it is observed that an estimated value for <math>R^2</math> becomes smaller by the addition of a predictor variable, or larger by the deletion of a predictor variable, there are two possibilities: either this is a chance fluctuation, or the larger model is misspecified." The authors then say that whether the first or second possibility is more likely depends on the size of the change in <math>R^2</math>. Can you give an example of when this occurs based on the size of change in <math>R^2</math>?<br />
<br />
=== Predicted <math>R^2</math> ===<br />
After predicting values for random intercepts and slopes using Bayesian methods it is possible to form composite values, <math>\hat{Y}_{ij}</math>, to predict the observed dependent values, <math>Y_{ij}</math>. Obviously these estimates will be sub-optimal as they will suffer from 'shrinkage' effects, but they may be useful for computing a '<math>R^2_{Predicted}</math>'. Discuss situations where knowledge of the predicted slopes and intercepts could be important, and whether an <math>R^2_{Predicted}</math> could be a useful description.<br />
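One way such an <math>R^2_{Predicted}</math> might be computed in 'nlme' (a sketch with placeholder names; the squared correlation is used here as a simple descriptive measure, not a definitive definition):<br />

```r
## Composite BLUP-based predictions and a descriptive R^2
library(nlme)
fit  <- lme(Y ~ X, data = dd, random = ~ 1 + X | id)
yhat <- fitted(fit, level = 1)   # level-1 fitted values use the BLUPs
cor(dd$Y, yhat)^2                # squared correlation as 'R^2 predicted'
```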
<br />
=== The Size and Direction of <math>R^2</math> Change as a Diagnostic Criterion ===<br />
The suggestion has been made that changes in <math>R^2</math>, where the addition or deletion of a variable creates an unexpected and opposing directional change, can serve as a diagnostic toward determining where the flaw in the model resides. However, the authors do not actually indicate in which scenarios the size and increase/decrease information obtained from the <math>R^2</math> estimate determines the source of the flaw. 'Wrong' directions provide evidence of model misspecification, but what then of the magnitude component mentioned just prior? (p. 113)<br />
<br />
== Chapter 8 ==<br />
You can sign your contributions with <nowiki>--~~~~</nowiki> --[[User:Georges|Georges]] 07:50, 14 June 2012 (EDT)<br />
=== "Correlates of diversity" ===<br />
<br />
Provide an example illustrating how level-two variables can be associated with level-one heteroscedasticity.<br />
--[[User:Gilbert8|Gilbert8]] 11:26, 16 June 2012 (EDT)<br />
<br />
=== "Modeling Heteroscedasticity" ===<br />
<br />
When Snijders and Bosker say they are "modeling heteroscedasticity", is this simply incorporating more random slopes into the model? For instance, on page 127, they added a fixed effect for SA-SES (the school average of SES) and a random slope for it. What kind of plots would let us see if these inclusions are necessary? --[[User:Msigal|Msigal]] 11:26, 18 June 2012 (EDT)<br />
<br />
=== Linear or quadratic variance functions ===<br />
<br />
The level-one residual variance can be expressed as a linear or quadratic function of some variables. How do we decide on the functional form?<br />
Can we say that if a variable has a random effect we use a quadratic form, and otherwise a linear form? Is it the same for the intercept residual variance?<br />
<nowiki>--~~~~</nowiki> --[[User:Georges|Qiong Li]] 12:25, 18 June 2012 (EDT)<br />
<br />
=== On a practical note - How is this done?? ===<br />
It appears that in order to fit a model with a linear/quadratic function for the variance the authors had to use MLwiN. Are there other ways to accomplish this? Could we talk a little about what their demo code is accomplishing? [http://www.stats.ox.ac.uk/~snijders/ch8.r | S&B ExCode] --[[User:Smithce|Smithce]] 14:15, 19 June 2012 (EDT)<br />
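One alternative in R is nlme's 'weights' argument, which accepts a variance function for the level-one residuals. A sketch reusing the mlbook_red names that appear earlier on this page (varExp is only one of several varFunc choices; varIdent, varPower and varConstPower are others):<br />

```r
## Level-one variance modelled as a function of ses via a varFunc
library(nlme)
fit <- lme(langPOST ~ IQ_dev + sch_iqv + ses, data = mlbook_red,
           random  = ~ 1 | schoolnr,
           weights = varExp(form = ~ ses),  # residual sd scaled by exp(delta*ses)
           method  = "ML")
summary(fit)
```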
<br />
=== Variable centering ===<br />
<br />
Since fixed effects variables can be included in the R matrix to model systematic heteroscedasticity discuss the effects of centering variables in this context. Does centering affect the estimation results or numerical stability? --[[User:Rphilip2004|Rphilip2004]] 13:42, 19 June 2012 (EDT)<br />
<br />
What happens to the variance function when there are more than two levels? Do we still only have to choose from linear and quadratic forms or does it become more complicated? --[[User:Dusvat|Dusvat]] 18:43, 18 June 2012 (EDT)<br />
<br />
=== Generic regression question: Treating a factor as continuous for the interaction ===<br />
On page 126, Model 3 (described on page 124): the authors treat SES as a factor for main effects but then, to keep the number of interaction terms manageable, they treat it as numeric in the interaction with other variables. This seems like it could come in useful; are there any caveats we should be aware of in using this technique? --[[User:Smithce|Smithce]] 14:15, 19 June 2012 (EDT)<br />
<br />
=== Generalized Linear Models and Heteroscedasticity ===<br />
<br />
Given a linear model that has level-1 heteroscedasticity related to multiple level-1 predictors, does this not mean that the heteroscedasticity can be thought of as related to the overall mean response? The residuals of a heteroscedastic model would then become functions of the mean response. Generalized linear models often model the variance as a function of the mean response (e.g., Poisson, Gamma, negative binomial). When might it be appropriate to abandon a direct linear relationship in favour of a generalized linear model (which at times retains the desired additive linear properties in the linear predictor) to deal with heteroscedastic issues? Is this even possible? --[[User:Rbarnhar|Rbarnhar]] 14:24, 19 June 2012 (EDT)<br />
<br />
== Chapter 9 ==<br />
<br />
=== "Imputation" ===<br />
<br />
1. The chapter discusses imputation as a way of filling in missing data to form a complete data set. Are there any other methods which can be used to achieve the same goal? Provide a few examples.<br />
<br />
2. Discuss an example with a data set with missing values. Provide R code using multiple imputation to complete the data set.<br />
--[[User:Gilbert8|Gilbert8]] 11:38, 18 June 2012 (EDT)<br />
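A minimal multiple-imputation sketch using the 'mice' package (one of several R options; 'nhanes' is a small example data set shipped with mice):<br />

```r
## Multiple imputation, per-imputation fits, and Rubin's-rules pooling
library(mice)
imp  <- mice(nhanes, m = 5, seed = 1, printFlag = FALSE)
fits <- with(imp, lm(bmi ~ age + chl))
summary(pool(fits))   # pooled estimates across the 5 completed data sets
```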
<br />
== Chapter 10 ==<br />
== Chapter 11 ==<br />
== Chapter 12 ==<br />
== Chapter 13 ==<br />
== Chapter 14 ==<br />
== Chapter 15 ==<br />
== Chapter 16 ==<br />
== Chapter 17 ==<br />
== Chapter 18 ==</div>Smithcehttp://scs.math.yorku.ca/index.php/MATH_6643_Summer_2012_Applications_of_Mixed_Models/Snijders_and_Bosker:_Discussion_QuestionsMATH 6643 Summer 2012 Applications of Mixed Models/Snijders and Bosker: Discussion Questions2012-06-19T18:15:25Z<p>Smithce: </p>
<hr />
<div>== Chapter 5 ==<br />
=== Matthew ===<br />
At the bottom of page 83, Snijders and Bosker outline the process for probing interactions between two level one variables, and how there can be four possibilities for how to model it. If a researcher was to include all four, discuss how each would be interpreted. What might a good selection strategy be if our model had substantially more than two variables?<br />
<br />
=== Qiong === <br />
If we do not have any information about the data set, how to choose a level - two variable to predict the group dependent regression coefficients? After we choose the level - two variable z, how to explain the cross - level interaction term. <br />
<br />
=== Carrie === <br />
A client arrives with a random slope and intercept model using IQ as a predictor. IQ was measured on the traditional scale with a mean of 100 and standard deviation of 15. What should the client keep in mind about the interpretation of the variance of the intercept and covariance of the slope-intercept?<br />
: This raises the interesting question of how the variance of the random intercept and the covariance of the random intercept with the random slope are changed under a recentering of IQ. Let <br />
:: <math>Var\left[ \begin{matrix}<br />
{{u}_{0j}} \\<br />
{{u}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2} & {{\tau }_{01}} \\<br />
{{\tau }_{10}} & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math> for raw IQ. <br />
: If we recenter IQ with: <math>\tilde{\text{IQ}}=\text{IQ}-c</math> then:<br />
:: <math>\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
1 & c \\<br />
0 & 1 \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
{{u}_{0j}} \\<br />
{{u}_{1j}} \\<br />
\end{matrix} \right]</math><br />
: and<br />
:: <math>Var\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
1 & c \\<br />
0 & 1 \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
\tau _{0}^{2} & {{\tau }_{01}} \\<br />
{{\tau }_{10}} & \tau _{1}^{2} \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
1 & 0 \\<br />
c & 1 \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}+2c{{\tau }_{01}}+{{c}^{2}}\tau _{1}^{2} & {{\tau }_{01}}+c\tau _{1}^{2} \\<br />
{{\tau }_{10}}+c\tau _{1}^{2} & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math><br />
<br />
: If <math>c=-{{\tau }_{01}}/\tau _{1}^{2}</math>, then the variance of the intercept is minimized:<br />
::<math>Var\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}-\tau _{01}^{2}/\tau _{1}^{2} & 0 \\<br />
0 & \tau _{1}^{2} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}\left( 1-\rho _{01}^{2} \right) & 0 \\<br />
0 & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math><br />
<br />
=== Gilbert === <br />
In chapter 5, they talk about hierarchical linear model where fixed effects and random effects are taken into consideration. Discuss a clear simple example in class which shows both effects and give interpretations of each of the coefficients and their use in real life.<br />
=== Daniela === <br />
In chapter 5, the discussion is mostly about the two-level nesting structure. Can we have a bigger example, with at least four levels, that includes a random intercept and slope, and see how to fit it in R?<br />
: In 'lme', multilevel nesting is handled with, e.g.,<br />
fit <- lme( Y ~ X * W, dd, random = ~ 1 | idtop/idmiddle/idsmall)<br />
: Contextual variables present an ambiguity. Assuming that the id variables 'idsmall' and 'idmiddle' are coded uniquely overall, then the higher-level, say 'idmiddle', contextual variables could be coded as either:<br />
cvar(X,idmiddle)<br />
: or<br />
capply( dd, ~ idmiddle, with, mean( c(tapply( X, idsmall, mean))))<br />
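: The difference between these two codings (the mean over all observations vs. the mean of the subgroup means) only matters when subgroup sizes are unbalanced; a small sketch, written in Python/pandas for neutrality, with invented column names:<br />

```python
import pandas as pd

# Toy nested data: idsmall nested in idmiddle; sizes are unbalanced on purpose
dd = pd.DataFrame({
    "idmiddle": ["A", "A", "A", "B", "B"],
    "idsmall":  ["a1", "a1", "a2", "b1", "b2"],
    "X":        [1.0, 3.0, 5.0, 2.0, 4.0],
})

# Coding 1 (cvar-style): mean of X over every observation in idmiddle
dd["ctx_obs"] = dd.groupby("idmiddle")["X"].transform("mean")

# Coding 2 (capply-style): first average X within idsmall,
# then average those subgroup means within idmiddle
small_means = dd.groupby(["idmiddle", "idsmall"])["X"].mean()
dd["ctx_grp"] = dd["idmiddle"].map(small_means.groupby("idmiddle").mean())

# In unbalanced group A (a1 has 2 obs, a2 has 1) the codings differ:
# ctx_obs = 3.0 but ctx_grp = 3.5; in balanced group B both equal 3.0
```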
:Here is a table prepared for SPIDA showing how to handle multilevel nesting and crossed structures in a selection of R functions:<br />
<blockquote><br />
<br />
::{| border="1" cellpadding="4"<br />
|-<br />
! Function !! Notes <br />
|-<br />
| lme<br><br />
in package nlme<br />
| Linear mixed effects: normal response<br><br />
G side and R side modelling<br><br />
Model syntax:<br><br />
Y ~ X * W, random = ~ 1 + X | id<br><br />
For nested effect:<br><br />
Y ~ X * W, random = ~ 1 + X | higher/lower<br><br />
or, to have different models at different levels:<br><br />
Y ~ X * W, random = list(higher = ~ 1, lower = ~ 1 + X )<br><br />
<br />
|-<br />
| lmer <br><br />
in package lme4<br />
| Linear mixed models for gaussian response with Laplace approximation <br><br />
G side modeling only, R = <math>\sigma^2 I</math><br><br />
Model syntax: <br><br />
Y ~ X * W +(1+X|id)<br><br />
For nested effect:<br><br />
Y ~ X * W +(1+X|higher) + (1+X|higher:lower)<br> <br />
For crossed effect:<br><br />
Y ~ X * W +(1+X|id1) + (1+X|id2)<br> <br />
|- <br />
| glmer <br><br />
in package lme4<br />
| Generalized linear mixed models with adaptive Gaussian quadrature <br><br />
* family: binomial, Gamma, inverse.gaussian, poisson, gaussian<br><br />
G side only, no R side<br><br />
Model syntax: <br><br />
Y ~ X * W +(1+X|id)<br><br />
For nested effect:<br><br />
Y ~ X * W +(1+X|higher) + (1+X|higher:lower)<br> <br />
For crossed effect:<br><br />
Y ~ X * W +(1+X|id1) + (1+X|id2)<br> <br />
<br />
|- <br />
| glmmPQL<br><br />
in packages MASS/nlme<br />
| Generalized linear mixed models with Penalized Quasi Likelihood <br><br />
* family: binomial, Gamma, inverse.gaussian, poisson, gaussian<br><br />
G side and R side as in lme<br><br />
Model syntax: <br><br />
Y ~ X * W, random = ~ 1 + X | id<br><br />
For nested effect:<br><br />
Y ~ X * W, random = ~ 1 + X | higher/lower<br><br />
|-<br />
| MCMCglmm<br><br />
in package MCMCglmm<br />
| Generalized linear mixed models with MCMC <br><br />
* family: poisson, categorical, multinomial, ordinal, exponential, geometric, cengaussian, cenpoisson,<br />
cenexponential, zipoisson, zapoisson, ztpoisson, hupoisson, zibinomial (cen=censored, zi=zero-inflated, za=zero-altered, hu=hurdle)<br><br />
G side and R side, R side different from 'lme': no autocorrelation but can be used for multivariate response <br><br />
Note: 'poisson' potentially overdispersed by default (good), 'binomial' variance for binary variables is unidentified.<br> <br />
Model syntax: <br><br />
Y ~ X * W, random = ~ us(1 + X):id [Note: id should be a factor, us=unstructured]<br />
For nested effect:<br><br />
Y ~ X * W, random = ~us(1 + X):higher + us(1 + X):higher:lower<br> <br />
For crossed effect:<br><br />
Y ~ X * W, random = ~us(1 + X):id1+ us(1 + X):id2<br> <br />
|-<br />
|}<br />
<br />
</blockquote><br />
<br />
=== Phil ===<br />
<br />
Excluding fixed effects that are non-significant is common practice in regression analyses, and Snijders and Bosker follow this practice when simplifying the model in Table 5.3 to the model in Table 5.4. While this practice helps make the model more parsimonious, it can ignore the joint effect that these variables have on the model as a whole. Discuss alternative criteria that one should explore when determining whether a predictor should be excluded from the model.<br />
<br />
=== Ryan ===<br />
When using random slopes, it is generally the case that the level 1 model contains a fixed effect for what will also be the level 2 random effect. The random effect is then an estimate of the group/cluster/individual departure from the fixed effect. However, a non-significant level 1 fixed effect cannot be distinguished from zero. Are there cases where a non-significant fixed effect can be excluded from the model while retaining the random effect at level 2? What would be the consequence of this, and what might it reveal about the level 1 variable? Would this help control for the '''error of excluding a non-significant but confounding variable'''?<br />
:: This is a very interesting question. It would be interesting to create a simulated data set illustrating the issue so we could consider the consequences of having random effects for a confounding factor whose within-cluster effect changes sign from cluster to cluster. Can we think of a confounding factor that would do that?<br />
<br />
=== and others === <br />
== Chapter 6 ==<br />
=== What is random? ===<br />
At the beginning of the chapter, S&B present two models (Table 6.1). They note that "The variable with the random slope is in both models the grand-mean centered variable IQ". In R, would the random side look like random = ~ (IQ - IQbar)|school, even though (IQ - IQbar) isn't in the fixed part of the second model? How is this different in interpretation from random = ~ (group-centered IQ - IQbar)|school?<br />
<br />
* NOTE: This question is not necessarily about how to specify a model using nlme, but rather about the terms included in the random part of the model. As a test, I ran two models:<br />
<br />
IQ_dev <- mlbook_red$IQ_verb - mlbook_red$sch_iqv<br />
<br />
mlb612a <- nlme::lme(langPOST ~ IQ_dev + sch_iqv + ses, random = ~ IQ_verb|schoolnr, data = mlbook_red, method = "ML")<br />
summary(mlb612a)<br />
<br />
mlb612b <- nlme::lme(langPOST ~ IQ_dev + sch_iqv + ses, random = ~ IQ_dev|schoolnr, data = mlbook_red, method = "ML")<br />
summary(mlb612b)<br />
<br />
Note that the basic IQ_verb variable has been grand mean centered. According to the chapter, it sounds like using IQ_verb in the random portion is OK since it is a linear combination of IQ_dev and sch_iqv (the school means of IQ_verb). However, if I compare it against a second model using IQ_dev in the random part, pretty much all of the coefficients change. Is this expected?<br />
<br />
--[[User:Msigal|Msigal]] 12:07, 13 June 2012 (EDT)<br />
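: One way to see why the estimates change: within school <math>j</math>, IQ_verb differs from IQ_dev by the school constant <math>c_j</math> = sch_iqv for that school. By the same recentering algebra as in the Chapter 5 discussion above (with <math>c</math> now school-specific),<br />
:: <math>{{\tilde{u}}_{0j}}={{u}_{0j}}+{{c}_{j}}{{u}_{1j}},\quad {{\tilde{u}}_{1j}}={{u}_{1j}}</math><br />
: so rewriting a random part in IQ_verb in terms of IQ_dev gives intercept variances and covariances that depend on <math>{{c}_{j}}</math>. Since <math>{{c}_{j}}</math> varies over schools, no single unstructured covariance matrix for the IQ_dev specification can reproduce the IQ_verb specification: the two fits are genuinely different models, and changed coefficients are to be expected. (This is offered as one reading of the discrepancy, not as the book's account.)<br />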
<br />
=== Qiong === <br />
The authors introduce the t-test for testing fixed parameters. We can use summary(model) to get the p-value directly in R. In practice, we use the Wald test to test fixed parameters. On page 95, they mention that "for 30 or more groups, the Wald test of fixed effects using REML standard errors have reliable type I error rates." Is that the only reason we use the Wald test in practice?<br />
=== Carrie === <br />
On page 105, in their discussion of modeling within-group variability, the authors warn to "keep in mind that including a random slope normally implies inclusion of the fixed effect!". What is an example of a model where one might include a random slope without a fixed effect?<br />
=== Gilbert === <br />
On page 104, the authors discuss different approaches to model selection, including working upward from level one and joint consideration of levels one and two. Are there other methods to use if neither approach provides a satisfactory model? <br />
=== Daniela === <br />
On pages 97 and 99, the authors show how to test for a random intercept and for a random slope independently. What test would you use if you wanted to test for both a random slope and a random intercept in the same model, and what would you test it against: the linear model, or a model with just a random slope or just a random intercept?<br />
=== Phil ===<br />
<br />
Multi-parameter tests are possible for fixed effects, but can they also be applied to predicted random effects? If so, what would be the analog of <math>\hat{\gamma}' \hat{\Sigma}_{\gamma}^{-1} \hat{\gamma}</math>, used to find the Wald statistic, and how do we find an appropriate denominator df for an F test?<br />
<br />
: Question: Just to be sure, I presume you mean a BLUP? Answer: Yes I do :)<br />
<br />
=== Ryan ===<br />
In Chapter 6, Snijders and Bosker suggest that using deviance tests to evaluate fixed effects in a multilevel model is inappropriate if the estimation method is REML. What characteristic of REML vs. ML makes this type of model evaluation incorrect?<br />
:Comment: Likelihood has the form L(data|theta) and is used for inference on theta, for example by comparing the altitude of the log-likelihood at theta.hat (the summit) with the altitude at a null hypothetical value theta.0. This is the basis of deviance tests: Deviance = 2*(logLik(data|theta.hat) - logLik(data|theta.0)).<br />
<br />
: With ML, the log-likelihood is logLik( y | beta, G, R ) and we can use the likelihood for inference on all parameters. With REML, the data is not the original y but the residual of y on X, say e, and the likelihood is a function of G and R only: logLik( e | G, R ). Since 'beta' does not appear in this likelihood, it cannot be used to answer any questions about 'beta'.<br />
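: In symbols (the standard REML construction, stated in generic notation rather than the book's): take <math>K</math> of full column rank with <math>K'X=0</math>; then<br />
:: <math>e=K'y\sim N\left( 0,\ K'\left( ZGZ'+R \right)K \right)</math><br />
: and the REML log-likelihood is <math>\ell_{R}\left( G,R \right)=\log f\left( K'y\,|\,G,R \right)</math>, which contains no <math>\beta</math>. Hence REML deviances are only comparable between models sharing the same fixed-effects design matrix <math>X</math>.<br />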
<br />
=== and others === <br />
== Chapter 7 ==<br />
=== Explained Variance in Random Slope Models ===<br />
Looking at the proportion of variance explained by a model in a traditional ANOVA/multiple regression framework is something clients are often extremely interested in. In Chapter 7, Snijders and Bosker discuss how we might approach the issue in MLM. Near the end of the chapter, the authors insinuate that getting an estimate of the amount of explained variance in a random effects (intercept and slope) model is a somewhat tedious endeavour.<br />
<br />
The claim is that random slopes don't change prediction very much, so if we re-estimate the model using only random intercepts (no random slopes), this will "normally yield [predicted] values that are very close to values for the random slope models" (p. 114). This statement doesn't quite ring true for me, as in our examples the differences in slope between schools have been fairly striking/substantial. <br />
<br />
Is the authors' statement justifiable? Is obtaining an <math>R^2</math> as important/interesting in MLM as it is in other models?<br />
<br />
=== Explained variance in three-level models === <br />
In Example 7.1 we see how to calculate the explained variance of a level-one variable when that variable has a fixed effect only. How do we calculate the explained variance of a level-one variable when it has both a fixed and a random effect in the model? <br />
<br />
=== Interpreting <math>R^2</math> as an Effect Size ===<br />
A client fits a multilevel model and comes up with several significant predictors. The client is pleased with themselves, but remembers learning that significance alone isn't good enough these days, and needs help producing a measure of effect size. You compute the level 1 <math>R^2</math> and come up with a very small value, say 0.01. Is the model then worthless, even if the magnitude of the predicted change in the outcome is substantively meaningful?<br />
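For concreteness: in Snijders and Bosker's formulation, the level-one <math>R^2</math> is the proportional reduction in total prediction error <math>\sigma^2+\tau_0^2</math> relative to the empty model. A toy computation, with invented variance estimates, showing how small it can come out:<br />

```python
# Snijders & Bosker level-1 R^2: proportional reduction in sigma^2 + tau_0^2
# relative to the empty (intercept-only) model. All numbers are invented.
sigma2_null, tau2_null = 40.0, 20.0   # empty model: within- and between-variance
sigma2_full, tau2_full = 39.5, 19.9   # model with the significant predictors

r2_level1 = 1 - (sigma2_full + tau2_full) / (sigma2_null + tau2_null)
print(round(r2_level1, 4))  # 0.01
```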
<br />
=== Explained variance === <br />
In the example provided on page 110, it is shown that the residual variance at level two increases when the within-group deviation is added as an explanatory variable, in the balanced as well as the unbalanced case. Is this always the case, or is it only for this particular example?<br />
<br />
=== Estimates of <math>R^2</math>=== <br />
On page 113, the authors write that if "an estimated value for <math>R^2</math> becomes smaller by the addition of a predictor variable, or larger by the deletion of a predictor variable, there are two possibilities: either this is a chance fluctuation, or the larger model is misspecified." They then say that whether the first or the second possibility is more likely depends on the size of the change in <math>R^2</math>. Can you give an example of each case, based on the size of the change in <math>R^2</math>?<br />
<br />
=== Predicted <math>R^2</math> ===<br />
After predicting values for random intercepts and slopes using Bayesian methods, it is possible to form composite values, <math>\hat{Y}_{ij}</math>, to predict the observed dependent values, <math>Y_{ij}</math>. Obviously these estimates will be sub-optimal as they will suffer from 'shrinkage' effects, but they may be useful for computing an '<math>R^2_{Predicted}</math>'. Discuss situations where knowledge of the predicted slopes and intercepts could be important, and whether an <math>R^2_{Predicted}</math> could be a useful description.<br />
<br />
=== The Size and Direction of <math>R^2</math> Change as a Diagnostic Criterion ===<br />
The suggestion has been made that a change in <math>R^2</math> in an unexpected, opposing direction upon the addition or deletion of a variable can serve as a diagnostic for locating the flaw in the model. However, the authors do not actually indicate in which scenarios the size and direction of the change in the <math>R^2</math> estimate determine the source of the flaw. 'Wrong' directions provide evidence of model misspecification, but what then of the magnitude component mentioned just prior? (p. 113)<br />
<br />
== Chapter 8 ==<br />
You can sign your contributions with <nowiki>--~~~~</nowiki> --[[User:Georges|Georges]] 07:50, 14 June 2012 (EDT)<br />
=== "Correlates of diversity" ===<br />
<br />
Provide an example illustrating how level-two variables can be associated with level-one heteroscedasticity.<br />
--[[User:Gilbert8|Gilbert8]] 11:26, 16 June 2012 (EDT)<br />
<br />
=== "Modeling Heteroscedasticity" ===<br />
<br />
When Snijders and Bosker say they are "modeling heteroscedasticity", is this simply incorporating more random slopes into the model? For instance, on page 127, they added a fixed effect for SA-SES (the school average of SES) and a random slope for it. What kind of plots would let us see if these inclusions are necessary? --[[User:Msigal|Msigal]] 11:26, 18 June 2012 (EDT)<br />
<br />
=== Linear or quadratic variance functions ===<br />
<br />
The level-one residual variance can be expressed as a linear or quadratic function of some variables. How do we decide on the functional form?<br />
Can we say that if a variable has a random effect we use a quadratic form, and otherwise a linear form? Is it the same for the intercept residual variance?<br />
--[[User:Georges|Qiong Li]] 12:25, 18 June 2012 (EDT)<br />
<br />
=== On a practical note - How is this done? ===<br />
It appears that in order to fit a model with a linear/quadratic function for the variance, the authors had to use MLwiN. Are there other ways to accomplish this? Could we talk a little about what their demo code is accomplishing? [http://www.stats.ox.ac.uk/~snijders/ch8.r] --[[User:Smithce|Smithce]] 14:15, 19 June 2012 (EDT)<br />
<br />
=== Variable centering ===<br />
<br />
Since fixed effects variables can be included in the R matrix to model systematic heteroscedasticity, discuss the effects of centering variables in this context. Does centering affect the estimation results or numerical stability? --[[User:Rphilip2004|Rphilip2004]] 13:42, 19 June 2012 (EDT)<br />
<br />
=== Variance functions with more than two levels ===<br />
<br />
What happens to the variance function when there are more than two levels? Do we still only have to choose between linear and quadratic forms, or does it become more complicated? --[[User:Dusvat|Dusvat]] 18:43, 18 June 2012 (EDT)<br />
<br />
=== Generic regression question: Treating a factor as continuous for the interaction ===<br />
On page 126, Model 3 (described on page 124), the authors treat SES as a factor for the main effects, but then, to keep the number of interaction terms down, they treat it as numeric in the interactions with other variables. This seems like it could come in useful; are there any caveats we should be aware of in using this technique? --[[User:Smithce|Smithce]] 14:15, 19 June 2012 (EDT)<br />
<br />
== Chapter 9 ==<br />
<br />
=== "Imputation" ===<br />
<br />
1. The chapter discusses imputation as a way of filling in missing data to form a complete data set. Are there any other methods that can be used to achieve the same goal? Provide a few examples.<br />
<br />
2. Discuss an example with a data set containing missing values. Provide R code using multiple imputation to complete the data set.<br />
--[[User:Gilbert8|Gilbert8]] 11:38, 18 June 2012 (EDT)<br />
<br />
== Chapter 10 ==<br />
== Chapter 11 ==<br />
== Chapter 12 ==<br />
== Chapter 13 ==<br />
== Chapter 14 ==<br />
== Chapter 15 ==<br />
== Chapter 16 ==<br />
== Chapter 17 ==<br />
== Chapter 18 ==</div>Smithce
http://scs.math.yorku.ca/index.php/MATH_6643_Summer_2012_Applications_of_Mixed_Models/Snijders_and_Bosker:_Discussion_Questions MATH 6643 Summer 2012 Applications of Mixed Models/Snijders and Bosker: Discussion Questions 2012-06-14T05:08:23Z<p>Smithce: </p>
<hr />
<div>== Chapter 5 ==<br />
=== Alex === <br />
=== Matthew ===<br />
At the bottom of page 83, Snijders and Bosker outline the process for probing interactions between two level one variables, and how there can be four possibilities for how to model it. If a researcher was to include all four, discuss how each would be interpreted. What might a good selection strategy be if our model had substantially more than two variables?<br />
<br />
=== Qiong === <br />
If we do not have any information about the data set, how to choose a level - two variable to predict the group dependent regression coefficients? After we choose the level - two variable z, how to explain the cross - level interaction term. <br />
<br />
=== Carrie === <br />
A client arrives with a random slope and intercept model using IQ as a predictor. IQ was measured on the traditional scale with a mean of 100 and standard deviation of 15. What should the client keep in mind about the interpretation of the variance of the intercept and covariance of the slope-intercept?<br />
: This raises the interesting question of how the variance of the random intercept and the covariance of the random intercept with the random slope are changed under a recentering of IQ. Let <br />
:: <math>Var\left[ \begin{matrix}<br />
{{u}_{0j}} \\<br />
{{u}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2} & {{\tau }_{01}} \\<br />
{{\tau }_{10}} & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math> for raw IQ. <br />
: If we recenter IQ with: <math>\tilde{\text{IQ}}=\text{IQ}-c</math> then:<br />
:: <math>\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
1 & c \\<br />
0 & 1 \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
{{u}_{0j}} \\<br />
{{u}_{1j}} \\<br />
\end{matrix} \right]</math><br />
: and<br />
:: <math>Var\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
1 & c \\<br />
0 & 1 \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
\tau _{0}^{2} & {{\tau }_{01}} \\<br />
{{\tau }_{10}} & \tau _{1}^{2} \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
1 & 0 \\<br />
c & 1 \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}+2c{{\tau }_{01}}+{{c}^{2}}\tau _{1}^{2} & {{\tau }_{01}}+c\tau _{1}^{2} \\<br />
{{\tau }_{10}}+c\tau _{1}^{2} & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math><br />
<br />
: If <math>c=-{{\tau }_{01}}/\tau _{1}^{2}</math>, then the variance of the intercept is minimized:<br />
::<math>Var\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}-\tau _{01}^{2}/\tau _{1}^{2} & 0 \\<br />
0 & \tau _{1}^{2} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}\left( 1-\rho _{01}^{2} \right) & 0 \\<br />
0 & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math><br />
<br />
=== Gilbert === <br />
In chapter 5, they talk about hierarchical linear model where fixed effects and random effects are taken into consideration. Discuss a clear simple example in class which shows both effects and give interpretations of each of the coefficients and their use in real life.<br />
=== Daniela === <br />
In chapter 5, they talked about mostly about the two-level nesting structure. Can we have a bigger example with at least 4 levels that includes the random intercept and slope and how to apply this into R coding?<br />
: In 'lme', multilevel nesting is handled with. e.g.<br />
fit <- lme( Y ~ X * W, dd, random = ~ 1 | idtop/idmiddle/idsmall)<br />
: Contextual variables present an ambiguity. Assuming that the id variables 'idsmall' and 'idmiddle' are coded uniquely overall, then the the higher level, say 'idmiddle', contextual variables could be coded as either:<br />
cvar(X,idmiddle)<br />
: or<br />
capply( dd , ~ id, with, mean( c(tapply( X, idsmall, mean))))<br />
:Here is a table prepared for SPIDA showing how to handle multilevel nesting and crossed structures in a selection of R functions:<br />
<blockquote><br />
<br />
::{| border="1" cellpadding="4"<br />
|-<br />
! Function !! Notes <br />
|-<br />
| lme<br><br />
in package nlme<br />
| Linear mixed effects: normal response<br><br />
G side and R side modelling<br><br />
Model syntax:<br><br />
Y ~ X * W, random = ~ 1 + X | id<br><br />
For nested effect:<br><br />
Y ~ X * W, random = ~ 1 + X | higher/lower<br><br />
or, to have different models at different levels:<br><br />
Y ~ X * W, random = list(higher = ~ 1, lower = ~ 1 + X )<br><br />
<br />
|-<br />
| lmer <br><br />
in package lme4<br />
| Linear mixed models for gaussian response with Laplace approximation <br><br />
G side modeling only, R = <math>\sigma^2 I</math><br><br />
Model syntax: <br><br />
Y ~ X * W +(1+X|id)<br><br />
For nested effect:<br><br />
Y ~ X * W +(1+X|higher) + (1+X|higher:lower)<br> <br />
For crossed effect:<br><br />
Y ~ X * W +(1+X|id1) + (1+X|id2)<br> <br />
|- <br />
| glmer <br><br />
in package lme4<br />
| Generalized linear mixed models with adaptive Gaussian quadrature <br><br />
* family: binomial, Gamma, inverse.gaussian, poisson, gaussian<br><br />
G side only, no R side<br><br />
Model syntax: <br><br />
Y ~ X * W +(1+X|id)<br><br />
For nested effect:<br><br />
Y ~ X * W +(1+X|higher) + (1+X|higher:lower)<br> <br />
For crossed effect:<br><br />
Y ~ X * W +(1+X|id1) + (1+X|id2)<br> <br />
<br />
|- <br />
| glmmPQL<br><br />
in packages MASS/nlme<br />
| Generalized linear mixed models with Penalized Quasi Likelihood <br><br />
* family: binomial, Gamma, inverse.gaussian, poisson, gaussian<br><br />
G side and R side as in lme<br><br />
Model syntax: <br><br />
Y ~ X * W, random = ~ 1 + X | id<br><br />
For nested effect:<br><br />
Y ~ X * W, random = ~ 1 + X | higher/lower<br><br />
|-<br />
| MCMCglmm<br><br />
in package MCMCglmm<br />
| Generalized linear mixed models with MCMC <br><br />
* family: poisson, categorical, multinomial, ordinal, exponential, geometric, cengaussian, cenpoisson,<br />
cenexponential, zipoisson, zapoisson, ztpoisson, hupoisson, zibinomial (cen=censored, zi=zero-inflated, za=zero-altered, hu=hurdle<br><br />
G side and R side, R side different from 'lme': no autocorrelation but can be used for multivariate response <br><br />
Note: 'poisson' potentially overdispersed by default (good), 'binomial' variance for binary variables is unidentified.<br> <br />
Model syntax: <br><br />
Y ~ X * W, random = ~ us(1 + X):id [Note: id should be a factor, us=unstructured]<br />
For nested effect:<br><br />
Y ~ X * W, random = ~us(1 + X):higher + us(1 + X):higher:lower<br> <br />
For crossed effect:<br><br />
Y ~ X * W, random = ~us(1 + X):id1+ us(1 + X):id2<br> <br />
|-<br />
|}<br />
<br />
</blockquote><br />
<br />
=== Phil ===<br />
<br />
Exluding fixed effects that are non-significant is common practice in regression analyses, and Snijders and Bosker follow this practice when simplifying the model in Table 5.3 to the model found in Table 5.4. While this practice is used to help make the model more parsimonious it can ignore the joint effect that these variables have on the model as a whole. Discuss alternative criteria that one should explore when determining whether a predictor should be excluded from the model.<br />
<br />
=== Ryan ===<br />
When using random slopes it is generally the case that the level 1 model contains a fixed effect for what will also be the level 2 random effect. The random effect is then an estimate of the group/cluster/individual departure from the fixed effect. However, non-significant level 1 variables are not determinable as different from zero. Are there cases where a non-significant fixed effect can be excluded from the model while retaining the random effect at level 2. What would be the consequence of this and what might it reveal about the level 1 variable? Would this help control for the '''error of excluding a non-significant but confounding variable'''?<br />
:: This is a very interesting question. It would be interesting to create a simulated data set illustrating the issue so we could consider the consequences of having random effects for a confounding factors whose within cluster effect changes sign from cluster to cluster. Can we think of a confounding factor that would do that?<br />
<br />
=== and others === <br />
== Chapter 6 ==<br />
=== Alex === <br />
=== What is random? ===<br />
At the beginning of the chapter, S&J present two models (Table 6.1). They note that "The variable with the random slope is in both models the grand-mean centered variable IQ". In R: would the random side look like: random = ~ (IQ - IQbar)|school, even though (IQ - IQbar) isn't in the fixed part of the second model? How is this different in interpretation from: random = ~ (group centered IQ - IQbar)|school)?<br />
<br />
* NOTE: This question is not necessarily about how to specify a model using nlme, but rather about the terms included in the random part of the model. As a test, I ran two models:<br />
<br />
IQ_dev <- mlbook_red$IQ_verb - mlbook_red$sch_iqv<br />
<br />
mlb612a <- nlme:::lme(langPOST ~ IQ_dev + sch_iqv + ses, random =~ IQ_verb|schoolnr, data = mlbook_red, method="ML")<br />
summary(mlb612a)<br />
<br />
mlb612b <- nlme:::lme(langPOST ~ IQ_dev + sch_iqv + ses, random =~ IQ_dev|schoolnr, data = mlbook_red, method="ML")<br />
summary(mlb612b)<br />
<br />
Note that the basic IQ_verb variable has been grand mean centered. According to the chapter, it sounds like using IQ_verb in the random portion is OK since it is a linear combination of IQ_dev and sch_iqv (the school means of IQ_verb). However, if I compare it against a second model using IQ_dev in the random part, pretty much all of the coefficients change. Is this expected?<br />
<br />
--[[User:Msigal|Msigal]] 12:07, 13 June 2012 (EDT)<br />
<br />
=== Qiong === <br />
The author introduce the t - test to test fixed parameters. We can use summary(model) to get the p - value directly in R. In practice, we use wald test to test fixed parameter. On page 95, they mentioned that "for 30 or more groups, the Wald test of fixed effects using REML standard errors have reliable type I error rates." Is it the only reason we use the wald test in practice?<br />
=== Carrie === <br />
On page 105 in their discussion of modeling within-group variability the authors warn to "keep in mind that including a random slope normally implies inclusion of the fixed effect!". What is an example of a model where one might include random slope without a fixed effect??<br />
=== Gilbert === <br />
On page 104 , the author discuss different approaches for model selection including working upward from level one, joint consideration of both level one and two. Are there other methods to be used If both methods are not providing a satisfactory model? <br />
=== Daniela === <br />
On page 97 and 99, the authors showed us how the tests for random intercept and random slope independently. What test would you use if you wanted to test for both random slope and intercept in the same model? and what would you test it against? ... the linear model or a model with just a random slope or a random intercept?<br />
=== Phil ===<br />
<br />
Multi-parameter tests are possible for fixed effects, but can they also be applied to predicted random effects? If so what would be the analog to <math>\hat{\gamma}^' \hat{\Sigma}^{-1}_\gamma \hat{\gamma}</math>, used to find the Wald statistic, and how do we find an appropriate df denominator term for an F test?<br />
<br />
: Question: Just to be sure, I presume you mean a BLUP?<br />
<br />
=== Ryan ===<br />
In Chapter 6 Snijders and Bosker suggest that using deviance tests to evaluate fixed effects in a multilevel model is inappropriate if the estimation method is REML. What is the characteristic of REML vs ML that makes this type of model evaluation incorrect?<br />
:Comment: Likelihood has the form L(data|theta) and is used for inference on theta, for example, by comparing the altitude of the log-likelihood at theta.hat (the summit) compared with the altitude at a null hypothetical value theta.0. This is the basis of deviance tests: Deviance = 2*(logLik(data|theta.hat) - logLik(data|theta.0)).<br />
<br />
: With ML the log-likelihood is logLik( y | beta, G, R) and we can use the likelihood for inference on all parameters. With REML, the data is not the original y, but the residual of y on X, say, e. And the likelihood is only a function of G and R: logLik( e | G, R). 'beta' does not appear in the likelihood, thus the likelihood cannot be used to answer any questions about 'beta' since is does not appear in the likelihood.<br />
<br />
=== and others === <br />
== Chapter 7 ==<br />
=== Alex === <br />
=== Explained Variance in Random Slope Models ===<br />
Looking at the proportion of variance explained by a model in a traditional ANOVA/multiple regression framework is something clients are often extremely interested in. In Chapter 7, Snijders and Bosker discuss how we might approach the issue in MLM. Near the end of the chapter, the authors insinuate that getting an estimate of the amount of explained variance in a random effects (intercept and slope) model is a somewhat tedious endeavour.<br />
<br />
The claim is that random slopes don't change prediction very much so if we re-estimate the model using only random intercepts (no random slopes), this will "normally yield [predicted] values that are very close to values for the random slope models" (p. 114). This statement doesn't quite ring true for me, as in our examples the differences in slope between schools has been fairly striking/substantial. <br />
<br />
Is the authors' statement justifiable? Is obtaining an <math>R^2</math> as important/interesting in MLM as it is in other models?<br />
<br />
=== Explained variance in three-level models === <br />
In example 7.1, we know that how to calculate the explained variance of a level one variable when this variable has a fixed effect only. I want to know how to calculate the explained variance of a level one variable when this variable has a fixed and random effect in the model? <br />
<br />
=== Interpreting <math>R^2</math> as an Effect Size ===<br />
A client fits a multilevel model and comes up with several significant predictors. The client is pleased with themselves, but remembers learning that significance alone isn't good enough these days, and needs help producing a measure of effect size. You compute the Level 1 R^2 and come up with a very small value, say 0.01. Is the model then worthless, even if the magnitude of the predicted change in the outcome is substantively meaningful?<br />
<br />
=== Explained variance === <br />
In the example provided on page 110, the residual variance at level two increases when the within-group deviation is added as an explanatory variable to the model, in the balanced as well as the unbalanced case. Is this always the case, or is it only so for this particular example?<br />
<br />
=== Estimates of <math>R^2</math>=== <br />
On page 113, the authors note that if "an estimated value for <math>R^2</math> becomes smaller by the addition of a predictor variable, or larger by the deletion of a predictor variable, there are two possibilities: either this is a chance fluctuation, or the larger model is misspecified." They then say that whether the first or second possibility is more likely depends on the size of the change in <math>R^2</math>. Can you give an example showing how the size of the change in <math>R^2</math> points to one possibility or the other?<br />
<br />
=== Predicted <math>R^2</math> ===<br />
After predicting values for random intercepts and slopes using Bayesian methods, it is possible to form composite values, <math>\hat{Y}_{ij}</math>, to predict the observed dependent values, <math>Y_{ij}</math>. These estimates will be sub-optimal in some respects, as they are subject to 'shrinkage' effects, but they may be useful for computing an '<math>R^2_{Predicted}</math>'. Discuss situations where knowledge of the predicted slopes and intercepts could be important, and whether an <math>R^2_{Predicted}</math> could be a useful description.<br />
<br />
=== The Size and Direction of <math>R^2</math> Change As a Diagnostic Criterion ===<br />
The suggestion has been made that changes in <math>R^2</math>, where the addition or deletion of a variable creates an unexpected and opposing directional change, can serve as a diagnostic for determining where the flaw in the model resides. However, the authors do not actually indicate in which scenarios the size and direction of the change in the <math>R^2</math> estimate determine the source of the flaw. 'Wrong' directions provide evidence of model misspecification, but what then of the magnitude component mentioned just prior? (p. 113)<br />
<br />
=== and others ===</div>Smithcehttp://scs.math.yorku.ca/index.php/MATH_6643_Summer_2012_Applications_of_Mixed_Models/Snijders_and_Bosker:_Discussion_QuestionsMATH 6643 Summer 2012 Applications of Mixed Models/Snijders and Bosker: Discussion Questions2012-06-12T02:00:47Z<p>Smithce: </p>
<hr />
<div>== Chapter 5 ==<br />
=== Alex === <br />
=== Matthew ===<br />
At the bottom of page 83, Snijders and Bosker outline the process for probing interactions between two level-one variables, and the four possibilities for how to model them. If a researcher were to include all four, discuss how each would be interpreted. What might a good selection strategy be if our model had substantially more than two variables?<br />
<br />
=== Qiong === <br />
If we do not have any information about the data set, how should we choose a level-two variable to predict the group-dependent regression coefficients? After we choose the level-two variable Z, how do we interpret the cross-level interaction term? <br />
<br />
=== Carrie === <br />
A client arrives with a random slope and intercept model using IQ as a predictor. IQ was measured on the traditional scale with a mean of 100 and standard deviation of 15. What should the client keep in mind about the interpretation of the variance of the intercept and covariance of the slope-intercept?<br />
: This raises the interesting question of how the variance of the random intercept and the covariance of the random intercept with the random slope are changed under a recentering of IQ. Let <br />
:: <math>Var\left[ \begin{matrix}<br />
{{u}_{0j}} \\<br />
{{u}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2} & {{\tau }_{01}} \\<br />
{{\tau }_{10}} & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math> for raw IQ. <br />
: If we recenter IQ with: <math>\tilde{\text{IQ}}=\text{IQ}-c</math> then:<br />
:: <math>\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
1 & c \\<br />
0 & 1 \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
{{u}_{0j}} \\<br />
{{u}_{1j}} \\<br />
\end{matrix} \right]</math><br />
: and<br />
:: <math>Var\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
1 & c \\<br />
0 & 1 \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
\tau _{0}^{2} & {{\tau }_{01}} \\<br />
{{\tau }_{10}} & \tau _{1}^{2} \\<br />
\end{matrix} \right]\left[ \begin{matrix}<br />
1 & 0 \\<br />
c & 1 \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}+2c{{\tau }_{01}}+{{c}^{2}}\tau _{1}^{2} & {{\tau }_{01}}+c\tau _{1}^{2} \\<br />
{{\tau }_{10}}+c\tau _{1}^{2} & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math><br />
<br />
: If <math>c=-{{\tau }_{01}}/\tau _{1}^{2}</math>, then the variance of the intercept is minimized:<br />
::<math>Var\left[ \begin{matrix}<br />
{{{\tilde{u}}}_{0j}} \\<br />
{{{\tilde{u}}}_{1j}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}-\tau _{01}^{2}/\tau _{1}^{2} & 0 \\<br />
0 & \tau _{1}^{2} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
\tau _{0}^{2}\left( 1-\rho _{01}^{2} \right) & 0 \\<br />
0 & \tau _{1}^{2} \\<br />
\end{matrix} \right]</math><br />
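: The algebra above is easy to check numerically; a small sketch (the <math>\tau</math> values are arbitrary illustrative numbers):<br />

```r
tau <- matrix(c(4.0, 1.2,
                1.2, 0.9), 2, 2)     # [ tau0^2  tau01 ; tau10  tau1^2 ]
cc  <- -tau[1, 2] / tau[2, 2]        # c = -tau01 / tau1^2
Tm  <- matrix(c(1, 0, cc, 1), 2, 2)  # transformation [ 1 c ; 0 1 ]
V   <- Tm %*% tau %*% t(Tm)          # variance of the recentered effects

rho2 <- tau[1, 2]^2 / (tau[1, 1] * tau[2, 2])
V[1, 2]                            # ~0: intercept-slope covariance vanishes
V[1, 1] - tau[1, 1] * (1 - rho2)   # ~0: matches tau0^2 (1 - rho01^2)
```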
<br />
=== Gilbert === <br />
In Chapter 5, the authors present the hierarchical linear model, in which both fixed effects and random effects are taken into consideration. Discuss a clear, simple example in class that shows both effects, and give interpretations of each of the coefficients and their use in real life.<br />
=== Daniela === <br />
In Chapter 5, the authors talked mostly about the two-level nesting structure. Can we have a bigger example, with at least 4 levels, that includes the random intercept and slope, and see how to code it in R?<br />
: In 'lme', multilevel nesting is handled with nested grouping factors, e.g.<br />
fit <- lme( Y ~ X * W, dd, random = ~ 1 | idtop/idmiddle/idsmall)<br />
: Contextual variables present an ambiguity. Assuming that the id variables 'idsmall' and 'idmiddle' are coded uniquely overall, then the higher-level, say 'idmiddle', contextual variables could be coded as either:<br />
cvar(X,idmiddle)<br />
: or<br />
capply( dd , ~ id, with, mean( c(tapply( X, idsmall, mean))))<br />
:Here is a table prepared for SPIDA showing how to handle multilevel nesting and crossed structures in a selection of R functions:<br />
<blockquote><br />
<br />
::{| border="1" cellpadding="4"<br />
|-<br />
! Function !! Notes <br />
|-<br />
| lme<br><br />
in package nlme<br />
| Linear mixed effects: normal response<br><br />
G side and R side modelling<br><br />
Model syntax:<br><br />
Y ~ X * W, random = ~ 1 + X | id<br><br />
For nested effect:<br><br />
Y ~ X * W, random = ~ 1 + X | higher/lower<br><br />
or, to have different models at different levels:<br><br />
Y ~ X * W, random = list(higher = ~ 1, lower = ~ 1 + X )<br><br />
<br />
|-<br />
| lmer <br><br />
in package lme4<br />
| Linear mixed models for gaussian response with Laplace approximation <br><br />
G side modeling only, R = <math>\sigma^2 I</math><br><br />
Model syntax: <br><br />
Y ~ X * W +(1+X|id)<br><br />
For nested effect:<br><br />
Y ~ X * W +(1+X|higher) + (1+X|higher:lower)<br> <br />
For crossed effect:<br><br />
Y ~ X * W +(1+X|id1) + (1+X|id2)<br> <br />
|- <br />
| glmer <br><br />
in package lme4<br />
| Generalized linear mixed models with adaptive Gaussian quadrature <br><br />
* family: binomial, Gamma, inverse.gaussian, poisson, gaussian<br><br />
G side only, no R side<br><br />
Model syntax: <br><br />
Y ~ X * W +(1+X|id)<br><br />
For nested effect:<br><br />
Y ~ X * W +(1+X|higher) + (1+X|higher:lower)<br> <br />
For crossed effect:<br><br />
Y ~ X * W +(1+X|id1) + (1+X|id2)<br> <br />
<br />
|- <br />
| glmmPQL<br><br />
in packages MASS/nlme<br />
| Generalized linear mixed models with Penalized Quasi Likelihood <br><br />
* family: binomial, Gamma, inverse.gaussian, poisson, gaussian<br><br />
G side and R side as in lme<br><br />
Model syntax: <br><br />
Y ~ X * W, random = ~ 1 + X | id<br><br />
For nested effect:<br><br />
Y ~ X * W, random = ~ 1 + X | higher/lower<br><br />
|-<br />
| MCMCglmm<br><br />
in package MCMCglmm<br />
| Generalized linear mixed models with MCMC <br><br />
* family: poisson, categorical, multinomial, ordinal, exponential, geometric, cengaussian, cenpoisson,<br />
cenexponential, zipoisson, zapoisson, ztpoisson, hupoisson, zibinomial (cen=censored, zi=zero-inflated, za=zero-altered, hu=hurdle)<br><br />
G side and R side, R side different from 'lme': no autocorrelation but can be used for multivariate response <br><br />
Note: 'poisson' potentially overdispersed by default (good), 'binomial' variance for binary variables is unidentified.<br> <br />
Model syntax: <br><br />
Y ~ X * W, random = ~ us(1 + X):id [Note: id should be a factor, us=unstructured]<br />
For nested effect:<br><br />
Y ~ X * W, random = ~us(1 + X):higher + us(1 + X):higher:lower<br> <br />
For crossed effect:<br><br />
Y ~ X * W, random = ~us(1 + X):id1+ us(1 + X):id2<br> <br />
|-<br />
|}<br />
<br />
</blockquote><br />
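: To make the nested syntax in the table concrete, here is a minimal three-level sketch using lme from nlme (pupils within 'idmiddle' within 'idtop'; all names and values are invented):<br />

```r
library(nlme)

set.seed(3)
dd <- expand.grid(pupil = 1:5, idmiddle = 1:4, idtop = 1:6)
dd$X <- rnorm(nrow(dd))
dd$Y <- with(dd, X +
             rnorm(6)[idtop] +                        # top-level effects
             rnorm(24)[(idtop - 1) * 4 + idmiddle] +  # middle, nested in top
             rnorm(nrow(dd)))                         # pupil-level noise

fit <- lme(Y ~ X, data = dd, random = ~ 1 | idtop/idmiddle)
VarCorr(fit)   # one variance component per grouping level, plus the residual
```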
<br />
=== Phil ===<br />
<br />
Excluding fixed effects that are non-significant is common practice in regression analyses, and Snijders and Bosker follow this practice when simplifying the model in Table 5.3 to the model found in Table 5.4. While this practice helps make the model more parsimonious, it can ignore the joint effect that these variables have on the model as a whole. Discuss alternative criteria that one should explore when determining whether a predictor should be excluded from the model.<br />
<br />
=== Ryan ===<br />
When using random slopes it is generally the case that the level 1 model contains a fixed effect for what will also be the level 2 random effect. The random effect is then an estimate of the group/cluster/individual departure from the fixed effect. However, the effects of non-significant level 1 variables cannot be distinguished from zero. Are there cases where a non-significant fixed effect can be excluded from the model while retaining the random effect at level 2? What would be the consequence of this, and what might it reveal about the level 1 variable? Would this help control for the '''error of excluding a non-significant but confounding variable'''?<br />
:: This is a very interesting question. It would be worthwhile to create a simulated data set illustrating the issue, so we could consider the consequences of having random effects for a confounding factor whose within-cluster effect changes sign from cluster to cluster. Can we think of a confounding factor that would do that?<br />
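:: Along the lines of the suggested simulation, here is a hedged sketch: within-cluster slopes that flip sign from cluster to cluster, fit with a random slope for x but no fixed slope (nlme; all values invented for illustration):<br />

```r
library(nlme)

set.seed(4)
g <- rep(1:20, each = 30)
b <- rep(c(-1, 1), 10)[g]      # slope alternates sign across clusters
x <- rnorm(600)
y <- b * x + rnorm(20)[g] + rnorm(600)
dd <- data.frame(y, x, g)

# x appears only in the random part: its average (fixed) effect is near zero
fit <- lme(y ~ 1, data = dd, random = ~ 1 + x | g)
fixef(fit)     # no x coefficient in the fixed part
VarCorr(fit)   # but a large slope variance (true value is 1)
```

:: Here the fixed effect of x would be non-significant (its average effect is zero by construction), yet dropping the random slope as well would badly misstate the within-cluster structure.<br />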
<br />
=== and others === <br />
== Chapter 6 ==<br />
=== Alex === <br />
=== Matthew ===<br />
At the beginning of the chapter, S&B present two models (Table 6.1). They note that "The variable with the random slope is in both models the grand-mean centered variable IQ" (in R, the random side would look like: random = ~ 1 + (IQ - IQbar) | school), even though (IQ - IQbar) isn't in the fixed part of the second model. Does this have any ramifications for interpretation? Would the interpretation of any of the coefficients change if IQ had been used on the random side instead of the grand-mean deviation?<br />
<br />
=== Qiong === <br />
The authors introduce the t-test for testing fixed parameters. We can use summary(model) to get the p-value directly in R. In practice, we use the Wald test to test fixed parameters. On page 95, they mention that "for 30 or more groups, the Wald tests of fixed effects using REML standard errors have reliable type I error rates." Is that the only reason we use the Wald test in practice?<br />
=== Carrie === <br />
On page 105, in their discussion of modeling within-group variability, the authors warn to "keep in mind that including a random slope normally implies inclusion of the fixed effect!" What is an example of a model where one might include a random slope without a fixed effect?<br />
=== Gilbert === <br />
On page 104, the authors discuss different approaches to model selection, including working upward from level one and joint consideration of both levels one and two. Are there other methods to be used if neither approach provides a satisfactory model? <br />
=== Daniela === <br />
On pages 97 and 99, the authors showed us how to test for a random intercept and for a random slope independently. What test would you use if you wanted to test for both a random slope and a random intercept in the same model, and what would you test it against: the linear model, or a model with just a random slope or a random intercept?<br />
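: One standard answer is a likelihood-ratio test of the model with both random terms against simpler models, keeping in mind that variance parameters lie on the boundary of the parameter space, so the naive chi-square p-values are conservative (a chi-square mixture correction along the lines discussed in the book roughly halves them). A sketch with nlme (simulated data; names and values invented):<br />

```r
library(nlme)

set.seed(5)
g  <- rep(1:25, each = 20)
x  <- rnorm(500)
y  <- x + rnorm(25)[g] + rnorm(25, sd = 0.5)[g] * x + rnorm(500)
dd <- data.frame(y, x, g)

m.gls <- gls(y ~ x, data = dd)                        # no random effects
m.ri  <- lme(y ~ x, data = dd, random = ~ 1 | g)      # random intercept
m.rs  <- lme(y ~ x, data = dd, random = ~ 1 + x | g)  # + random slope

# Sequential likelihood-ratio tests; fixed parts are identical, so the
# default REML fits are comparable
anova(m.gls, m.ri, m.rs)
```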
=== Phil === <br />
=== Ryan ===<br />
=== and others === <br />
== Chapter 7 ==<br />
=== Alex === <br />
=== Matthew === <br />
=== Qiong === <br />
=== Carrie === <br />
=== Gilbert === <br />
=== Daniela === <br />
=== Phil === <br />
=== Ryan ===<br />
=== and others ===</div>Smithcehttp://scs.math.yorku.ca/index.php/MATH_6643_Summer_2012_Applications_of_Mixed_Models/Students/smithce/A2MATH 6643 Summer 2012 Applications of Mixed Models/Students/smithce/A22012-06-01T23:23:24Z<p>Smithce: </p>
<hr />
<div>=== Assignment 2 ===<br />
<br />
: 1. [10] A basis for <math>\mathbb{R}^p</math> that is a conjugate basis with respect to a positive definite matrix <math>M</math> is a sequence of vectors <math>x_1, x_2, ... , x_p</math> in <math>\mathbb{R}^p</math> such that <math>x'_i M x_i = 1</math> and <math>x'_i M x_j = 0</math> if <math>i \neq j</math>. Show that the columns of a non-singular matrix <math>A</math> form a conjugate basis with respect to <math>\Sigma^{-1}</math> if <math>\Sigma = AA'</math>. Note that a conjugate basis is merely an orthogonal basis with respect to the metric defined by <math>||x||^2 = x' \Sigma^{-1}x</math>. <br />
<br />
: 2. [10] We will call a "square root" of a square matrix <math>M</math> any square matrix <math>A</math> such that <math>M = AA'</math>. Show that a square matrix has a square root if and only if it is a variance matrix. <br />
<br />
: 3. [10] Write a function in R that computes a square root of a variance matrix M. Use the 'eigen' function. [Bonus: 2] Get your function to give an informative error message if M does not have a square root for some reason.<br />
<br />
: 4. [10] Using the function in 3, write a multivariate normal random number generator. Write it to parallel the univariate 'rnorm'. The univariate 'rnorm' takes three arguments: n, mean and sd. Consider writing your 'rmvnorm' so the third argument, if given, must be named either 'var' or 'sd' (depending on whether the user is giving a variance or the square root of a variance as input) to avoid confusion with the univariate generator. The default could be the identity -- which doesn't need to be distinguished as 'var' or as 'sd'.<br />
<br />
: 5. [10] Write a simple 'lmfit' function that calculates least squares regression coefficients using an algorithm based on the svd. Ideally, design the function so it takes a formula and a data frame as arguments, e.g. lmfit( y ~ x1 + x2, dd). You can generate the model matrix using the 'model.matrix' function and extract the response using the first column of the model.frame command. <br />
<br />
: 6. [10] Consider a <math>2 \times 2</math> variance matrix <math>\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22}\end{bmatrix}</math> for a random vector <math>\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}</math>. Verify that the Cholesky matrix <math>C = \begin{bmatrix} \sigma_{11}^{1/2} & 0 \\ \sigma_{21}/ \sigma_{11}^{1/2}& \sqrt{\sigma_{22} - \sigma_{12}^2 / \sigma_{11}}\end{bmatrix}</math> is a square root of <math>\Sigma</math>.<br />
:: Show that the Cholesky matrix can be written as <math>\begin{bmatrix} \sigma_1 & 0 \\ \beta_{21} \sigma_1 & \sigma_{2 \cdot 1}\end{bmatrix}</math> where <math>\beta_{21}</math> is the regression coefficient of <math>Y_2</math> on <math>Y_1</math>.<br />
:: Draw a concentration (or data) ellipse and indicate the interpretation of the vectors defined by the columns of <math>C</math> relative to the ellipse.<br />
<br />
<br />
<br />
: 7. [10] Show that a non-singular <math>2 \times 2</math> variance matrix, <math>\Sigma</math> can be factored so that <math>\Sigma = AA'</math> with <math>A</math> an upper triangular matrix [in contrast with problem 6 where the matrix is lower triangular]. Explain the interpretation of the elements of this matrix as in question 6. <br />
<br />
: 8. [20] Generate 100 observations for three variables <math>Y</math>, <math>X</math> and <math>Z</math> so that in the regression of <math>Y</math> on both <math>X</math> and <math>Z</math> neither regression coefficient is significant (at the 5% level) but a test of the hypothesis that both coefficients are 0 is rejected at the 1% level. Explain your strategy in generating the data. How should the data be generated to produce the required result? Show a data ellipse for <math>X</math> and <math>Z</math> and appropriate confidence ellipses for their two regression coefficients. What does this example illustrate about the appropriateness of scanning regression output for significant p-values and concluding that nothing is happening if none of the p-values achieves significance?<br />
<br />
: 9. [20] Generate 100 observations for three variables <math>Y</math>, <math>X</math> and <math>Z</math> so that in the separate simple regressions of <math>Y</math> on each of <math>X</math> and <math>Z</math> neither regression coefficient is significant (at the 5% level) but a test of the hypothesis that both coefficients are 0 in a multiple regression of <math>Y</math> on both <math>X</math> and <math>Z</math> is rejected at the 5% level. Explain your strategy in generating the data. How should the data be generated to produce the required result? Show a data ellipse for <math>X</math> and <math>Z</math> and appropriate confidence ellipses for their two regression coefficients. Explain the relationship between the ellipses and the phenomenon exhibited in this problem. What does this example illustrate about the appropriatenes of forward stepwise regression to identify a suitable model to predict <math>Y</math> using both <math>X</math> and <math>Z</math>?</div>Smithcehttp://scs.math.yorku.ca/index.php/MATH_6643_Summer_2012_Applications_of_Mixed_Models/Students/smithce/A2MATH 6643 Summer 2012 Applications of Mixed Models/Students/smithce/A22012-06-01T23:22:34Z<p>Smithce: </p>
<hr />
<div>=== Assignment 2 ===<br />
<br />
: 1. [10] A basis for <math>\mathbb{R}^p</math> that is a conjugate basis with respect to a positive definite matrix <math>M</math> is a sequence of vectors <math>x_1, x_2, ... , x_p</math> in <math>\mathbb{R}^p</math> such that <math>x'_i M x_i = 1</math> and <math>x'_i M x_j = 0</math> if <math>i \neq j</math>. Show that the columns of a non-singular matrix <math>A</math> form a conjugate basis with respect to <math>\Sigma^{-1}</math> if <math>\Sigma = AA'</math>. Note that a conjugate basis is merely an orthogonal basis with respect to the metric defined by <math>||x||^2 = x' \Sigma^{-1}x</math>. <br />
<br />
: 2. [10] We will call a "square root" of a square matrix <math>M</math> any square matrix <math>A</math> such that <math>M = AA'</math>. Show that a square matrix has a square root if and only if it is a variance matrix. <br />
<br />
: 3. [10] Write a function in R that computes a square root of a variance matrix M. Use the 'eigen' function. [Bonus: 2] Get your function to give an informative error message if M does not have a square root for some reason.<br />
<br />
: 4. [10] Using the function in 3, write a multivariate normal random number generator. Write it to parallel the univariate 'rnorm'. The univariate 'rnorm' takes three arguments: n, mean and sd. Consider writing your 'rmvnorm' so the third argument, if given, must be named either 'var' or 'sd' (depending on whether the user is giving a variance or the square root of a variance as input) to avoid confusion with the univariate generator. The default could be the identity -- which doesn't need to be distinguished as 'var' or as 'sd'.<br />
<br />
: 5. [10] Write a simple 'lmfit' function that calculates least squares regression coefficients using an algorithm based on the svd. Ideally, design the function so it takes a formula and a data frame as arguments, e.g. lmfit( y ~ x1 + x2, dd). You can generate the model matrix using the 'model.matrix' function and extract the response using the first column of the model.frame command. <br />
<br />
: 6. [10] Consider a <math>2 \times 2</math> variance matrix <math>\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22}\end{bmatrix}</math> for a random vector <math>\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}</math>. Verify that the Cholesky matrix <math>C = \begin{bmatrix} \sigma_{11}^{1/2} & 0 \\ \sigma_{21}/ \sigma_{11}^{1/2}& \sqrt{\sigma_{22} - \sigma_{12}^2 / \sigma_{11}}\end{bmatrix}</math> is a square root of <math>\Sigma</math>.<br />
:: Show that the Cholesky matrix can be written as <math>\begin{bmatrix} \sigma_1 & 0 \\ \beta_{21} \sigma_1 & \sigma_{2 \cdot 1}\end{bmatrix}</math> where <math>\beta_{21}</math> is the regression coefficient of <math>Y_2</math> on <math>Y_1</math>.<br />
:: Draw a concentration (or data) ellipse and indicate the interpretation of the vectors defined by the columns of <math>C</math> relative to the ellipse.<br />
<br />
: 7. [10] Show that a non-singular <math>2 \times 2</math> variance matrix, <math>\Sigma</math> can be factored so that <math>\Sigma = AA'</math> with <math>A</math> an upper triangular matrix [in contrast with problem 6 where the matrix is lower triangular]. Explain the interpretation of the elements of this matrix as in question 6. <br />
<br />
: 8. [20] Generate 100 observations for three variables <math>Y</math>, <math>X</math> and <math>Z</math> so that in the regression of <math>Y</math> on both <math>X</math> and <math>Z</math> neither regression coefficient is significant (at the 5% level) but a test of the hypothesis that both coefficients are 0 is rejected at the 1% level. Explain your strategy in generating the data. How should the data be generated to produce the required result? Show a data ellipse for <math>X</math> and <math>Z</math> and appropriate confidence ellipses for their two regression coefficients. What does this example illustrate about the appropriatenes of scanning regression output for significant p-values and concluding that nothing is happening if none of the p-value achieve significance?<br />
<br />
: 9. [20] Generate 100 observations for three variables <math>Y</math>, <math>X</math> and <math>Z</math> so that in the separate simple regressions of <math>Y</math> on each of <math>X</math> and <math>Z</math> neither regression coefficient is significant (at the 5% level) but a test of the hypothesis that both coefficients are 0 in a multiple regression of <math>Y</math> on both <math>X</math> and <math>Z</math> is rejected at the 5% level. Explain your strategy in generating the data. How should the data be generated to produce the required result? Show a data ellipse for <math>X</math> and <math>Z</math> and appropriate confidence ellipses for their two regression coefficients. Explain the relationship between the ellipses and the phenomenon exhibited in this problem. What does this example illustrate about the appropriatenes of forward stepwise regression to identify a suitable model to predict <math>Y</math> using both <math>X</math> and <math>Z</math>?</div>Smithcehttp://scs.math.yorku.ca/index.php/MATH_6643_Summer_2012_Applications_of_Mixed_Models/Students/smithce/A2MATH 6643 Summer 2012 Applications of Mixed Models/Students/smithce/A22012-06-01T23:22:00Z<p>Smithce: Created page with "=== Assignment 2 === Due June 7 before class You must write up your own work based on your own understanding but you can do anything you want to develop your understanding. : 1..."</p>
<hr />
<div>=== Assignment 2 ===<br />
Due June 7 before class<br />
<br />
You must write up your own work based on your own understanding but you can do anything you want to develop your understanding.<br />
<br />
: 1. [10] A basis for <math>\mathbb{R}^p</math> that is a conjugate basis with respect to a positive definite matrix <math>M</math> is a sequence of vectors <math>x_1, x_2, ... , x_p</math> in <math>\mathbb{R}^p</math> such that <math>x'_i M x_i = 1</math> and <math>x'_i M x_j = 0</math> if <math>i \neq j</math>. Show that the columns of a non-singular matrix <math>A</math> form a conjugate basis with respect to <math>\Sigma^{-1}</math> if <math>\Sigma = AA'</math>. Note that a conjugate basis is merely an orthogonal basis with respect to the metric defined by <math>||x||^2 = x' \Sigma^{-1}x</math>. <br />
<br />
: 2. [10] We will call a "square root" of a square matrix <math>M</math> any square matrix <math>A</math> such that <math>M = AA'</math>. Show that a square matrix has a square root if and only if it is a variance matrix. <br />
<br />
: 3. [10] Write a function in R that computes a square root of a variance matrix M. Use the 'eigen' function. [Bonus: 2] Get your function to give an informative error message if M does not have a square root for some reason.<br />
<br />
: 4. [10] Using the function in 3, write a multivariate normal random number generator. Write it to parallel the univariate 'rnorm'. The univariate 'rnorm' takes three arguments: n, mean and sd. Consider writing your 'rmvnorm' so the third argument, if given, must be named either 'var' or 'sd' (depending on whether the user is giving a variance or the square root of a variance as input) to avoid confusion with the univariate generator. The default could be the identity -- which doesn't need to be distinguished as 'var' or as 'sd'.<br />
<br />
: 5. [10] Write a simple 'lmfit' function that calculates least squares regression coefficients using an algorithm based on the svd. Ideally, design the function so it takes a formula and a data frame as arguments, e.g. lmfit( y ~ x1 + x2, dd). You can generate the model matrix using the 'model.matrix' function and extract the response as the first column of the model frame returned by the 'model.frame' command. <br />
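A sketch along these lines (assuming X has full column rank):

```r
# Least squares via the SVD: with X = U D V', the coefficients are
# b = V D^{-1} U' y, computed without ever forming X'X.
lmfit <- function(formula, data) {
  mf <- model.frame(formula, data)
  X  <- model.matrix(formula, mf)
  y  <- mf[[1]]                  # response: first column of the model frame
  s  <- svd(X)
  b  <- s$v %*% (crossprod(s$u, y) / s$d)
  setNames(drop(b), colnames(X))
}
lmfit(mpg ~ wt + hp, mtcars)     # should agree with coef(lm(mpg ~ wt + hp, mtcars))
```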
<br />
: 6. [10] Consider a <math>2 \times 2</math> variance matrix <math>\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22}\end{bmatrix}</math> for a random vector <math>\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}</math>. Verify that the Cholesky matrix <math>C = \begin{bmatrix} \sigma_{11}^{1/2} & 0 \\ \sigma_{21}/ \sigma_{11}^{1/2}& \sqrt{\sigma_{22} - \sigma_{12}^2 / \sigma_{11}}\end{bmatrix}</math> is a square root of <math>\Sigma</math>.<br />
:: Show that the Cholesky matrix can be written as <math>\begin{bmatrix} \sigma_1 & 0 \\ \beta_{21} \sigma_1 & \sigma_{2 \cdot 1}\end{bmatrix}</math> where <math>\beta_{21}</math> is the regression coefficient of <math>Y_2</math> on <math>Y_1</math>.<br />
:: Draw a concentration (or data) ellipse and indicate the interpretation of the vectors defined by the columns of <math>C</math> relative to the ellipse.<br />
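The identity <math>CC' = \Sigma</math> is easy to confirm numerically (the matrix below is an arbitrary illustrative choice):

```r
# Verify that the 2 x 2 Cholesky factor given above is a square root of Sigma
Sigma <- matrix(c(4, 1.2, 1.2, 2), 2, 2)
s11 <- Sigma[1, 1]; s12 <- Sigma[1, 2]; s22 <- Sigma[2, 2]
C <- rbind(c(sqrt(s11),       0),
           c(s12 / sqrt(s11), sqrt(s22 - s12^2 / s11)))
all.equal(C %*% t(C), Sigma)   # TRUE up to rounding
```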
<br />
: 7. [10] Show that a non-singular <math>2 \times 2</math> variance matrix <math>\Sigma</math> can be factored so that <math>\Sigma = AA'</math> with <math>A</math> an upper triangular matrix [in contrast with problem 6, where the matrix is lower triangular]. Explain the interpretation of the elements of this matrix as in question 6. <br />
<br />
: 8. [20] Generate 100 observations for three variables <math>Y</math>, <math>X</math> and <math>Z</math> so that in the regression of <math>Y</math> on both <math>X</math> and <math>Z</math> neither regression coefficient is significant (at the 5% level) but a test of the hypothesis that both coefficients are 0 is rejected at the 1% level. Explain your strategy in generating the data. How should the data be generated to produce the required result? Show a data ellipse for <math>X</math> and <math>Z</math> and appropriate confidence ellipses for their two regression coefficients. What does this example illustrate about the appropriateness of scanning regression output for significant p-values and concluding that nothing is happening if none of the p-values achieves significance?<br />
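One possible strategy (a sketch only; the sd values are arbitrary and may need tuning to hit the stated significance levels): make X and Z nearly collinear, so that each coefficient is poorly determined individually while their sum predicts Y well.

```r
set.seed(1)                       # seed chosen arbitrarily
x <- rnorm(100)
z <- x + rnorm(100, sd = 0.05)    # Z nearly identical to X: strong collinearity
y <- x + z + rnorm(100)
summary(lm(y ~ x + z))            # expect weak individual t-tests, strong overall F
```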
<br />
: 9. [20] Generate 100 observations for three variables <math>Y</math>, <math>X</math> and <math>Z</math> so that in the separate simple regressions of <math>Y</math> on each of <math>X</math> and <math>Z</math> neither regression coefficient is significant (at the 5% level) but a test of the hypothesis that both coefficients are 0 in a multiple regression of <math>Y</math> on both <math>X</math> and <math>Z</math> is rejected at the 5% level. Explain your strategy in generating the data. How should the data be generated to produce the required result? Show a data ellipse for <math>X</math> and <math>Z</math> and appropriate confidence ellipses for their two regression coefficients. Explain the relationship between the ellipses and the phenomenon exhibited in this problem. What does this example illustrate about the appropriateness of forward stepwise regression to identify a suitable model to predict <math>Y</math> using both <math>X</math> and <math>Z</math>?</div>Smithcehttp://scs.math.yorku.ca/index.php/MATH_6643_Summer_2012_Applications_of_Mixed_Models/Students/smithceMATH 6643 Summer 2012 Applications of Mixed Models/Students/smithce2012-06-01T23:21:26Z<p>Smithce: </p>
<hr />
<div>==About Me==<br />
I am a PhD student in Psychology in the Quantitative Methods Area. I completed my Master's degree in Psychology at York, studying depth perception, and my undergraduate degree at the University of Toronto in Engineering Science, Aerospace option. As you can see, I have quite a varied (and some might say strange) educational background! This year I had the privilege of working as a consultant with the Statistical Consulting Service, which was a fabulous experience. <br />
<br />
I have experience working with R, Matlab and SPSS. I am especially fond of R.<br />
<br />
== '''Discussion Questions''' ==<br />
=== Chapter 2 ===<br />
* I once attended an HLM workshop at a large Education conference. A fellow attendee was skeptical of the entire enterprise. In the example provided, the within-group, between-group and pooled effects for socio-economic status were all positively and statistically significantly related to the outcome measure. His opinion was that, since all three effects were in the same direction and significant, there was no point in the extra complexity, which would simply confuse the pants off his trustees anyway. How would you respond?<br />
* '''Post:''' Consider the macro-micro-micro-macro causal chain (pg. 12, Figure 2.7). What would happen if one were to model this simply as a macro-macro (W -> Z) model, omitting the intermediate variables? What if the true relationship was micro-macro-macro-micro? The potential for errors in causal assumptions has always bothered me, especially in SEM-type analyses, but it still applies here. In Psychology we attribute some meaning to a variable, but it could easily be read another way. Here's a convoluted example: The researcher assumes: Teacher Disciplinarianism -> Student Behaviour -> Student Success -> Teacher Stress. But perhaps the pathways are different, and in fact: Student Behaviour -> Teacher Disciplinarianism -> Teacher Stress -> Student Success.<br />
<br />
=== Chapter 3 ===<br />
* In the course we use the language of "contextual", "compositional", "between", "within" and "pooled" effects. Identify each in the example on pages 28-29 and in Figure 3.4.<br />
<br />
=== Chapter 4 ===<br />
* Describe the relative advantages and disadvantages of REML and ML estimation. When should you choose one over the other?<br />
* Write an R script to simulate appropriate data and fit Models 3 and 4 from page 70.<br />
<br />
<br />
=== SPIDA Models ===<br />
*[[/Model1|Model 1 - fit]]<br />
*[[/Model2|Model 2 with Contextual Variable - fitc]]<br />
*[[/Model3|Model 3 Centered Within Group and Contextual Variable - fitcd]]<br />
*[[/Model4|Model 4 Centered Within Group RE - fitca]]<br />
*[[/Model5|Model 5 Minority and ses - fit]]<br />
*[[/A2]]</div>Smithcehttp://scs.math.yorku.ca/index.php/SPIDA_2012:_Mixed_Models_with_R/LinksSPIDA 2012: Mixed Models with R/Links2012-05-31T13:52:15Z<p>Smithce: </p>
<hr />
<div>* [http://www.rseek.org/ R Search]<br />
* [http://www.ted.com/speakers/hans_rosling.html Hans Rosling's Ted Talks]<br />
** [http://www.ted.com/talks/hans_rosling_religions_and_babies.html Hans Rosling's Asymptotic Global Population]</div>Smithcehttp://scs.math.yorku.ca/index.php/MATH_6643_Summer_2012_Applications_of_Mixed_Models/Students/smithce/Model1MATH 6643 Summer 2012 Applications of Mixed Models/Students/smithce/Model12012-05-29T19:27:21Z<p>Smithce: </p>
<hr />
<div>=== Model 1 ===<br />
<br />
<u> Level 1 Model </u><br />
:<math>mathach_{ij} = {\color{Red}\beta_{0j}} + {\color{Blue}\beta_{1j}}ses_{ij} + r_{ij} </math><br />
<br />
<u> Level 2 Model (Between School Model) </u><br />
:<math>{\color{Red}\beta_{0j}} = {\color{Red}\gamma_{00}} + {\color{Red}\gamma_{01}Sector_j} + {\color{Red}u_{0j}} </math><br />
:<math>{\color{Blue}\beta_{1j}} = {\color{Blue}\gamma_{10}} + {\color{Blue}\gamma_{11}Sector_j} + {\color{Blue}u_{1j}} </math><br />
<br />
<u> Combined Model (by substitution) </u><br />
:<math>mathach_{ij} = {\color{Red}\gamma_{00}} + {\color{Red}\gamma_{01}Sector_j} + {\color{Red}u_{0j}} + ({\color{Blue}\gamma_{10}} + {\color{Blue}\gamma_{11}Sector_j} + {\color{Blue}u_{1j}})ses_{ij} + r_{ij} </math><br />
:<math>mathach_{ij} = \underbrace{{\color{Red}\gamma_{00}} + {\color{Red}\gamma_{01}Sector_j} + {\color{Blue}\gamma_{10}}ses_{ij} + {\color{Blue}\gamma_{11}Sector_j}ses_{ij}}_{Fixed} + \underbrace{{\color{Red}u_{0j}} + {\color{Blue}u_{1j}}ses_{ij} + r_{ij}}_{Random} </math><br />
<br />
<u> Fixed Portion of the Model </u><br />
:<math>{\color{Red}\gamma_{00}} + {\color{Red}\gamma_{01}}Sector_j + {\color{Blue}\gamma_{10}}ses_{ij} + {\color{Blue}\gamma_{11}}Sector_jses_{ij}</math><br />
<br />
<u> Random Portion of the Model </u><br />
:<math>{\color{Red}u_{0j}} + {\color{Blue}u_{1j}}ses_{ij} + r_{ij}</math><br />
<br />
<br />
fit <- lme( mathach ~ ses * Sector, dd, random = ~ 1 + ses | id, control = list(msMaxIter=200, msVerbose=T))<br />
# Note that id refers to schools not students!<br />
<br />
<br />
Linear mixed-effects model fit by REML <br><br />
Data: dd <br><br />
{| {{table}}<br />
| AIC||BIC||logLik <br />
|-<br />
| 23914||23964||-11949 <br />
|}<br />
<br />
<br />
Random effects:<br><br />
Formula: ~1 + ses | id<br><br />
Structure: General positive-definite, Log-Cholesky parametrization<br><br />
{| {{table}}<br />
| ||StdDev||Corr<br />
|-<br />
| (Intercept)||2.151 <font color=DarkOrange><sup>1</sup></font>||(Intr)<br />
|-<br />
| ses||0.355 <font color=DarkOrange><sup>2</sup></font>||0.973 <font color=DarkOrange><sup>3</sup></font><br />
|-<br />
| Residual||6.075<font color=DarkOrange><sup>4</sup></font>||<br />
|}<br />
<br />
<br />
Fixed effects: mathach ~ ses * Sector<br><br />
{| {{table}}<br />
| ||Value||Std.Error||DF||t-value||p-value<br />
|-<br />
| (Intercept)||13.98<font color=DarkOrange><sup>5</sup></font>||0.387||3602||36.1||0<br />
|-<br />
| ses||1.67<font color=DarkOrange><sup>6</sup></font>||0.228||3602||7.3||0<br />
|-<br />
| SectorPublic||-2.37<font color=DarkOrange><sup>7</sup></font>||0.526||78||-4.5||0<br />
|-<br />
| ses:SectorPublic||1.39<font color=DarkOrange><sup>8</sup></font>||0.307||3602||4.5||0<br />
|}<br />
<br />
<br />
Correlation:<br />
{| {{table}}<br />
| ||(Intr)||ses||SctrPb<br />
|-<br />
| ses||0.204||||<br />
|-<br />
| SectorPublic||-0.736||-0.15||<br />
|-<br />
| ses:SectorPublic||-0.152||-0.744||0.252<br />
|}<br />
<br />
<br />
Standardized Within-Group Residuals:<br />
{| {{table}}<br />
| Min||Q1||Med||Q3||Max<br />
|-<br />
| -3.0761||-0.7378||0.0246||0.7565||2.7849<br />
|}<br />
<br />
<br />
Number of Observations: 3684<br><br />
Number of Groups: 80<br />
<br />
<br />
<u><font color=DarkOrange>Notes</font></u><br />
:1. <math>Var({\color{Red}u_{0}}) = 2.151^2</math><br />
:2. <math>Var({\color{Blue}u_{1}}) = 0.355^2</math><br />
:3. <math>Correlation({\color{Red}u_{0}},{\color{Blue}u_{1}}) = 0.973</math><br />
:4. <math>Var({\color{Black}r_{ij}}) = 6.075^2</math><br />
:5. <math>{\color{Red}\gamma_{00}} = 13.98</math> Intercept for Catholic schools<br />
:6. <math>{\color{Blue}\gamma_{10}} = 1.67</math> Slope of ses for Catholic schools<br />
:7. <math>{\color{Red}\gamma_{01}} = -2.37</math> Change in Intercept for Public schools (e.g. Intercept for Public = 13.98 - 2.37 = 11.61)<br />
:8. <math>{\color{Blue}\gamma_{11}} = 1.39</math> Change in Slope for Public schools (e.g. ses slope for Public = 1.67 + 1.39 = 3.06)</div>Smithcehttp://scs.math.yorku.ca/index.php/MATH_6643_Summer_2012_Applications_of_Mixed_Models/Students/smithce/Model2MATH 6643 Summer 2012 Applications of Mixed Models/Students/smithce/Model22012-05-29T18:36:34Z<p>Smithce: </p>
<hr />
<div>=== Model 2 with Contextual Variable ===<br />
<br />
We can easily create a new variable, ses.m, containing the mean ses of the school each student comes from:<br />
dd$ses.m <- with( dd, cvar( ses, id))<br />
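The cvar() function here appears to come from the spida package used in the course (an assumption on my part); in base R the same school means can be obtained with ave():

```r
# Equivalent computation of the school-mean ses in base R
dd$ses.m <- ave(dd$ses, dd$id)    # ave() defaults to the group mean
```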
<br />
<br />
<u> Level 1 Model </u><br />
:<math>mathach_{ij} = {\color{Red}\beta_{0j}} + {\color{Blue}\beta_{1j}}ses_{ij} + r_{ij} </math><br />
<br />
<u> Level 2 Model (Between School Model) </u><br />
:<math>{\color{Red}\beta_{0j}} = {\color{Red}\gamma_{00}} + {\color{Red}\gamma_{01}Sector_j} + {\color{Red}\gamma_{02}ses.m_j} + {\color{Red}u_{0j}} </math><br />
:<math>{\color{Blue}\beta_{1j}} = {\color{Blue}\gamma_{10}} + {\color{Blue}\gamma_{11}Sector_j} + {\color{Blue}u_{1j}} </math><br />
<br />
<u> Combined Model (by substitution) </u><br />
:<math>mathach_{ij} = {\color{Red}\gamma_{00}} + {\color{Red}\gamma_{01}Sector_j} + {\color{Red}\gamma_{02}ses.m_j} + {\color{Red}u_{0j}} + ({\color{Blue}\gamma_{10}} + {\color{Blue}\gamma_{11}Sector_j} + {\color{Blue}u_{1j}})ses_{ij} + r_{ij} </math><br />
:<math>mathach_{ij} = \underbrace{{\color{Red}\gamma_{00}} + {\color{Red}\gamma_{01}Sector_j} + {\color{Red}\gamma_{02}ses.m_j} + {\color{Blue}\gamma_{10}}ses_{ij} + {\color{Blue}\gamma_{11}Sector_j}ses_{ij}}_{Fixed} + \underbrace{{\color{Red}u_{0j}} + {\color{Blue}u_{1j}}ses_{ij} + r_{ij}}_{Random} </math><br />
<br />
<u> Fixed Portion of the Model </u><br />
:<math>{\color{Red}\gamma_{00}} + {\color{Red}\gamma_{01}}Sector_j + {\color{Red}\gamma_{02}}ses.m_j + {\color{Blue}\gamma_{10}}ses_{ij} + {\color{Blue}\gamma_{11}}Sector_jses_{ij}</math><br />
<br />
<u> Random Portion of the Model </u><br />
:<math>{\color{Red}u_{0j}} + {\color{Blue}u_{1j}}ses_{ij} + r_{ij}</math><br />
<br />
<br />
fitc <- lme( mathach ~ ses * Sector + ses.m, dd, random = ~ 1 + ses | id )<br />
# Note that id refers to schools not students!<br />
<br />
<br />
Linear mixed-effects model fit by REML <br><br />
Data: dd <br><br />
{| {{table}}<br />
| AIC||BIC||logLik<br />
|-<br />
| 23891.85||23947.74||-11936.92<br />
|}<br />
<br />
<br />
Random effects:<br><br />
Formula: ~1 + ses | id<br><br />
Structure: General positive-definite, Log-Cholesky parametrization<br><br />
{| {{table}}<br />
| ||StdDev||Corr<br />
|-<br />
| (Intercept)||1.776<font color=DarkOrange><sup>1</sup></font>||(Intr)<br />
|-<br />
| ses||0.365<font color=DarkOrange><sup>2</sup></font>||0.651<font color=DarkOrange><sup>3</sup></font><br />
|-<br />
| Residual||6.074<font color=DarkOrange><sup>4</sup></font>||<br />
|}<br />
<br />
<br />
Fixed effects: mathach ~ ses * Sector + cvar(ses, id)<br />
{| {{table}}<br />
| ||Value||Std.Error||DF||t-value||p-value<br />
|-<br />
| (Intercept)||13.809<font color=DarkOrange><sup>5</sup></font>||0.332||3602||41.592||0.00E+00<br />
|-<br />
| ses||1.517<font color=DarkOrange><sup>6</sup></font>||0.231||3602||6.564||0.00E+00<br />
|-<br />
| SectorPublic||-1.779<font color=DarkOrange><sup>7</sup></font>||0.464||77||-3.836||3.00E-04<br />
|-<br />
| ses.m||3.091<font color=DarkOrange><sup>8</sup></font>||0.590||77||5.241||0.00E+00<br />
|-<br />
| ses:SectorPublic||1.366<font color=DarkOrange><sup>9</sup></font>||0.3057||3602||4.469||0.00E+00<br />
|}<br />
<br />
<br />
Correlation:<br />
{| {{table}}<br />
| ||(Intr)||ses||SctrPb||cv(,i)<br />
|-<br />
| ses||0.134||||||<br />
|-<br />
| SectorPublic||-0.733||-0.126||||<br />
|-<br />
| ses.m||-0.1||-0.183||0.238||<br />
|-<br />
| ses:SectorPublic||-0.088||-0.733||0.172||0.011<br />
|}<br />
<br />
<br />
Standardized Within-Group Residuals:<br />
{| {{table}}<br />
| Min||Q1||Med||Q3||Max<br />
|-<br />
| -3.08703974||-0.7342756||0.02854529||0.74982231||2.85516252<br />
|}<br />
<br />
<br />
Number of Observations: 3684<br><br />
Number of Groups: 80<br />
<br />
<br />
L <- list( 'Effect of ses' = rbind(<br />
"Within-school" = c( 0,1,0,0,0),<br />
"Contextual" = c( 0,0,0,1,0),<br />
"Compositional" = c( 0,1,0,1,0)))<br />
wald ( fitc , L )<br />
<br />
<br />
{| {{table}}<br />
| ||numDF||denDF||F.value||p.value<br />
|-<br />
| Effect of ses||2||77||43.03857||<.00001<br />
|}<br />
<br />
{| {{table}}<br />
| ||Estimate||Std.Error||DF||t-value||p-value||Lower 0.95||Upper 0.95<br />
|-<br />
| Within-school||1.517<font color=DarkOrange><sup>6</sup></font>||0.231||3602||6.564||<.00001||1.064||1.970<br />
|-<br />
| Contextual||3.091<font color=DarkOrange><sup>8</sup></font>||0.590||77||5.241||<.00001||1.917||4.265<br />
|-<br />
| Compositional||4.608<font color=DarkOrange><sup>10</sup></font>||0.593||77||7.776||<.00001||3.428||5.788<br />
|}<br />
<br />
<br />
L <- list( "Within school effect of ses" =<br />
rbind( "Catholic" = c(0,1,0,0,0),<br />
"Public" = c(0,1,0,0,1),<br />
"Pub-Cath" = c(0,0,0,0,1))<br />
 )<br />
 wald ( fitc , L )<br />
<br />
<br />
{| {{table}}<br />
| ||numDF||denDF||F.value||p.value<br />
|-<br />
| Within school effect of ses||2||3602||114.536||<.00001<br />
|}<br />
<br />
{| {{table}}<br />
| ||Estimate||Std.Error||DF||t-value||p-value||Lower 0.95||Upper 0.95<br />
|-<br />
| Catholic||1.517<font color=DarkOrange><sup>6</sup></font>||0.231||3602||6.564||<.00001||1.064||1.970<br />
|-<br />
| Public||2.883<font color=DarkOrange><sup>11</sup></font>||0.208||3602||13.855||<.00001||2.475||3.291<br />
|-<br />
| Pub-Cath||1.366<font color=DarkOrange><sup>9</sup></font>||0.306||3602||4.469||1.00E-05||0.767||1.965<br />
|}<br />
<br />
<br />
<u><font color=DarkOrange>Notes</font></u><br />
:1. <math>Var({\color{Red}u_{0}}) = 1.776^2</math><br />
:2. <math>Var({\color{Blue}u_{1}}) = 0.365^2</math><br />
:3. <math>Correlation({\color{Red}u_{0}},{\color{Blue}u_{1}}) = 0.651</math><br />
:4. <math>Var({\color{Black}r_{ij}}) = 6.074^2</math><br />
:5. <math>{\color{Red}\gamma_{00}} = 13.809</math> Intercept for Catholic schools<br />
:6. <math>{\color{Blue}\gamma_{10}} = 1.517</math> Within-school effect of ses for Catholic schools<br />
:7. <math>{\color{Red}\gamma_{01}} = -1.779</math> Change in Intercept for Public schools (e.g. Intercept for Public = 13.809 - 1.779 = 12.03)<br />
:8. <math>{\color{Red}\gamma_{02}} = 3.091</math> Increase in mathach associated with a 1-unit increase in school mean ses (the contextual effect)<br />
:9. <math>{\color{Blue}\gamma_{11}} = 1.366</math> Change in the within-school effect for Public schools (e.g. ses slope for Public = 1.517 + 1.366 = 2.883). Equivalent interpretation: the difference between the slope for Public schools and the slope for Catholic schools.<br />
:10. <math>{\color{Blue}\gamma_{10}} + {\color{Red}\gamma_{02}} = 1.517 + 3.091 = 4.608 </math> Between-school effect for Catholic schools (i.e. the difference in expected mathach going from a student with ses = X in a school with mean ses = Y to a student with ses = X + 1 in a school with mean ses = Y + 1)<br />
:11. Continued from 9. Within school effect for Public Schools is <math>{\color{Blue}\gamma_{11}} + {\color{Blue}\gamma_{10}} = 1.366 + 1.517 = 2.883</math></div>Smithcehttp://scs.math.yorku.ca/index.php/MATH_6643_Summer_2012_Applications_of_Mixed_Models/Students/smithce/Model2MATH 6643 Summer 2012 Applications of Mixed Models/Students/smithce/Model22012-05-29T18:27:23Z<p>Smithce: </p>
<hr />
<div>=== Model 2 with Contextual Variable ===<br />
<br />
We can easily create a new variable, ses.m, containing the mean ses of the school each student comes from:<br />
dd$ses.m <- with( dd, cvar( ses, id))<br />
<br />
<br />
<u> Level 1 Model </u><br />
:<math>mathach_{ij} = {\color{Red}\beta_{0j}} + {\color{Blue}\beta_{1j}}ses_{ij} + r_{ij} </math><br />
<br />
<u> Level 2 Model (Between School Model) </u><br />
:<math>{\color{Red}\beta_{0j}} = {\color{Red}\gamma_{00}} + {\color{Red}\gamma_{01}Sector_j} + {\color{Red}\gamma_{02}ses.m_j} + {\color{Red}u_{0j}} </math><br />
:<math>{\color{Blue}\beta_{1j}} = {\color{Blue}\gamma_{10}} + {\color{Blue}\gamma_{11}Sector_j} + {\color{Blue}u_{1j}} </math><br />
<br />
<u> Combined Model (by substitution) </u><br />
:<math>mathach_{ij} = {\color{Red}\gamma_{00}} + {\color{Red}\gamma_{01}Sector_j} + {\color{Red}\gamma_{02}ses.m_j} + {\color{Red}u_{0j}} + ({\color{Blue}\gamma_{10}} + {\color{Blue}\gamma_{11}Sector_j} + {\color{Blue}u_{1j}})ses_{ij} + r_{ij} </math><br />
:<math>mathach_{ij} = \underbrace{{\color{Red}\gamma_{00}} + {\color{Red}\gamma_{01}Sector_j} + {\color{Red}\gamma_{02}ses.m_j} + {\color{Blue}\gamma_{10}}ses_{ij} + {\color{Blue}\gamma_{11}Sector_j}ses_{ij}}_{Fixed} + \underbrace{{\color{Red}u_{0j}} + {\color{Blue}u_{1j}}ses_{ij} + r_{ij}}_{Random} </math><br />
<br />
<u> Fixed Portion of the Model </u><br />
:<math>{\color{Red}\gamma_{00}} + {\color{Red}\gamma_{01}}Sector_j + {\color{Red}\gamma_{02}}ses.m_j + {\color{Blue}\gamma_{10}}ses_{ij} + {\color{Blue}\gamma_{11}}Sector_jses_{ij}</math><br />
<br />
<u> Random Portion of the Model </u><br />
:<math>{\color{Red}u_{0j}} + {\color{Blue}u_{1j}}ses_{ij} + r_{ij}</math><br />
<br />
<br />
fitc <- lme( mathach ~ ses * Sector + ses.m, dd, random = ~ 1 + ses | id )<br />
# Note that id refers to schools not students!<br />
<br />
<br />
Linear mixed-effects model fit by REML <br><br />
Data: dd <br><br />
{| {{table}}<br />
| AIC||BIC||logLik<br />
|-<br />
| 23891.85||23947.74||-11936.92<br />
|}<br />
<br />
<br />
Random effects:<br><br />
Formula: ~1 + ses | id<br><br />
Structure: General positive-definite, Log-Cholesky parametrization<br><br />
{| {{table}}<br />
| ||StdDev||Corr<br />
|-<br />
| (Intercept)||1.776<font color=DarkOrange><sup>1</sup></font>||(Intr)<br />
|-<br />
| ses||0.365<font color=DarkOrange><sup>2</sup></font>||0.651<font color=DarkOrange><sup>3</sup></font><br />
|-<br />
| Residual||6.074<font color=DarkOrange><sup>4</sup></font>||<br />
|}<br />
<br />
<br />
Fixed effects: mathach ~ ses * Sector + cvar(ses, id)<br />
{| {{table}}<br />
| ||Value||Std.Error||DF||t-value||p-value<br />
|-<br />
| (Intercept)||13.809<font color=DarkOrange><sup>5</sup></font>||0.332||3602||41.592||0.00E+00<br />
|-<br />
| ses||1.517<font color=DarkOrange><sup>6</sup></font>||0.231||3602||6.564||0.00E+00<br />
|-<br />
| SectorPublic||-1.779<font color=DarkOrange><sup>7</sup></font>||0.464||77||-3.836||3.00E-04<br />
|-<br />
| ses.m||3.091<font color=DarkOrange><sup>8</sup></font>||0.590||77||5.241||0.00E+00<br />
|-<br />
| ses:SectorPublic||1.366<font color=DarkOrange><sup>9</sup></font>||0.3057||3602||4.469||0.00E+00<br />
|}<br />
<br />
<br />
Correlation:<br />
{| {{table}}<br />
| ||(Intr)||ses||SctrPb||cv(,i)<br />
|-<br />
| ses||0.134||||||<br />
|-<br />
| SectorPublic||-0.733||-0.126||||<br />
|-<br />
| ses.m||-0.1||-0.183||0.238||<br />
|-<br />
| ses:SectorPublic||-0.088||-0.733||0.172||0.011<br />
|}<br />
<br />
<br />
Standardized Within-Group Residuals:<br />
{| {{table}}<br />
| Min||Q1||Med||Q3||Max<br />
|-<br />
| -3.087||-0.734||0.029||0.750||2.855<br />
|}<br />
<br />
<br />
Number of Observations: 3684<br><br />
Number of Groups: 80<br />
<br />
<br />
L <- list( 'Effect of ses' = rbind(<br />
"Within-school" = c( 0,1,0,0,0),<br />
"Contextual" = c( 0,0,0,1,0),<br />
"Compositional" = c( 0,1,0,1,0)))<br />
wald( fitc, L )<br />
<br />
<br />
{| {{table}}<br />
| ||numDF||denDF||F.value||p.value<br />
|-<br />
| Effect of ses||2||77||43.03857||<.00001<br />
|}<br />
<br />
{| {{table}}<br />
| ||Estimate||Std.Error||DF||t-value||p-value||Lower 0.95||Upper 0.95<br />
|-<br />
| Within-school||1.517<font color=DarkOrange><sup>6</sup></font>||0.231||3602||6.564||<.00001||1.064||1.970<br />
|-<br />
| Contextual||3.091<font color=DarkOrange><sup>8</sup></font>||0.590||77||5.241||<.00001||1.917||4.265<br />
|-<br />
| Compositional||4.608<font color=DarkOrange><sup>10</sup></font>||0.593||77||7.776||<.00001||3.428||5.788<br />
|}<br />
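The estimates in the wald table above are just linear combinations <math>L\gamma</math> of the fixed effects. A quick sketch reproducing them by hand from the (rounded) coefficients reported in the fixed-effects table, so the numbers are illustrative only:

```r
# Fixed-effect estimates copied (rounded) from the table above.
beta <- c("(Intercept)"      = 13.809,
          "ses"              = 1.517,
          "SectorPublic"     = -1.779,
          "ses.m"            = 3.091,
          "ses:SectorPublic" = 1.366)

# The same L passed to wald(): each row picks a linear combination of beta.
L <- rbind("Within-school" = c(0, 1, 0, 0, 0),
           "Contextual"    = c(0, 0, 0, 1, 0),
           "Compositional" = c(0, 1, 0, 1, 0))

drop(L %*% beta)  # Within-school 1.517, Contextual 3.091, Compositional 4.608
```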
<br />
<br />
L <- list( "Within school effect of ses" =<br />
rbind( "Catholic" = c(0,1,0,0,0),<br />
"Public" = c(0,1,0,0,1),<br />
"Pub-Cath" = c(0,0,0,0,1))<br />
)<br />
wald( fitc, L )<br />
<br />
<br />
{| {{table}}<br />
| ||numDF||denDF||F.value||p.value<br />
|-<br />
| Within school effect of ses||2||3602||114.536||<.00001<br />
|}<br />
<br />
{| {{table}}<br />
| ||Estimate||Std.Error||DF||t-value||p-value||Lower 0.95||Upper 0.95<br />
|-<br />
| Catholic||1.517||0.231||3602||6.564||<.00001||1.064||1.970<br />
|-<br />
| Public||2.883||0.208||3602||13.855||<.00001||2.475||3.291<br />
|-<br />
| Pub-Cath||1.366||0.306||3602||4.469||1.00E-05||0.767||1.965<br />
|}<br />
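Each row of this table is again a linear combination of the reported fixed effects; a quick hand check with the rounded coefficients:

```r
# Rounded fixed effects from the table above, in order:
# (Intercept), ses, SectorPublic, ses.m, ses:SectorPublic
beta <- c(13.809, 1.517, -1.779, 3.091, 1.366)
L <- rbind("Catholic" = c(0, 1, 0, 0, 0),  # ses slope in Catholic schools
           "Public"   = c(0, 1, 0, 0, 1),  # ses slope plus the interaction
           "Pub-Cath" = c(0, 0, 0, 0, 1))  # the difference between them
drop(L %*% beta)  # Catholic 1.517, Public 2.883, Pub-Cath 1.366
```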
<br />
<br />
<u><font color=DarkOrange>Notes</font></u><br />
:1. <math>Var({\color{Red}u_{0}}) = 1.776^2</math><br />
:2. <math>Var({\color{Blue}u_{1}}) = 0.365^2</math><br />
:3. <math>Corr({\color{Red}u_{0}},{\color{Blue}u_{1}}) = 0.651</math><br />
:4. <math>Var({\color{Black}r_{ij}}) = 6.074^2</math><br />
:5. <math>{\color{Red}\gamma_{00}} = 13.809</math> Intercept for Catholic schools<br />
:6. <math>{\color{Blue}\gamma_{10}} = 1.517</math> Within-school effect of ses for Catholic schools<br />
:7. <math>{\color{Red}\gamma_{01}} = -1.779</math> Change in intercept for Public schools (e.g. intercept for Public = 13.809 - 1.779 = 12.030)<br />
:8. <math>{\color{Red}\gamma_{02}} = 3.091</math> Increase in mathach associated with a 1-unit increase in school mean ses, holding the student's own ses constant - the contextual effect<br />
:9. <math>{\color{Blue}\gamma_{11}} = 1.366</math> Change in the within-school effect of ses for Public schools (e.g. ses slope for Public = 1.517 + 1.366 = 2.883)<br />
:10. <math>{\color{Blue}\gamma_{10}} + {\color{Red}\gamma_{02}} = 1.517 + 3.091 = 4.608</math> Compositional (between-school) effect for Catholic schools (i.e. the difference going from a student with ses = X in a school with mean ses = Y to a student with ses = X + 1 in a school with mean ses = Y + 1)</div>
<hr />
<div>==About Me==<br />
I am a PhD student in Psychology in the Quantitative Methods Area. I completed my Masters in Psychology degree at York studying depth perception and my undergraduate degree at the University of Toronto in Engineering Science, Aerospace option. As you can see I have quite a varied (and some might say strange) educational background! This year I had the privilege of working as a consultant with the Statistical Consulting Service, which was a fabulous experience. <br />
<br />
I have experience working with R, Matlab and SPSS. I am especially fond of R.<br />
<br />
== '''Discussion Questions''' ==<br />
=== Chapter 2 ===<br />
* I once attended an HLM workshop at a large Education conference. A fellow attendee was skeptical of the entire enterprise. In the example provided, the within-group, between-group and pooled effects for socio-economic status were all positive and statistically significantly related to the outcome measure. His opinion was that, since all three effects were in the same direction and significant, there was no point bothering with the extra complexity, which would simply confuse the pants off his trustees anyway. How would you respond?<br />
* '''Post:''' Consider the macro-micro-micro-macro causal chain (pg. 12, Figure 2.7). What would happen if one were to model this simply as a macro-macro (W -> Z) model, omitting the intermediate variables? What if the true relationship was micro-macro-macro-micro? The potential for errors in causal assumptions has always bothered me, especially in SEM-type analyses, but it still applies here. In Psychology we are attributing some meaning to a variable, but it could easily be read another way. Here's a convoluted example: The researcher assumes: Teacher Disciplinarianism -> Student Behaviour -> Student Success -> Teacher Stress. But perhaps the pathways are different, and in fact: Student Behaviour -> Teacher Disciplinarianism -> Teacher Stress -> Student Success.<br />
<br />
=== Chapter 3 ===<br />
* In the course we use the language of "contextual", "compositional", "between", "within" and "pooled" effects. Identify each from the example on page 28-29 and on the graph Figure 3.4.<br />
<br />
=== Chapter 4 ===<br />
* Describe the relative advantages and disadvantages of REML and ML estimation. When should you choose one over the other?<br />
* Write an R script to simulate appropriate data and fit Models 3 and 4 from page 70.<br />
<br />
<br />
=== SPIDA Models ===<br />
*[[/Model1|Model 1 - fit]]<br />
*[[/Model2|Model 2 with Contextual Variable - fitc]]<br />
*[[/Model3|Model 3 Centered Within Group and Contextual Variable - fitcd]]<br />
*[[/Model4|Model 4 Centered Within Group RE]]</div>
<hr />
<div><u> Combined Model </u><br />
:<math>mathach_{ij} = \underbrace{{\color{Red}\gamma_{00}} + {\color{Red}\gamma_{01}Sector_j} + {\color{Red}\gamma_{02}ses.m_j} + {\color{Blue}\gamma_{10}}ses_{ij} + {\color{Blue}\gamma_{11}Sector_j}ses_{ij}}_{Fixed} + \underbrace{{\color{Red}u_{0j}} + {\color{Blue}u_{1j}}ses.d_{ij} + r_{ij}}_{Random} </math><br />
<br />
<br />
<u> Fixed Portion of the Model </u><br />
Equivalent to FE model for Model 2 (LINK).<br />
:<math>{\color{Red}\gamma_{00}} + {\color{Red}\gamma_{01}}Sector_j + {\color{Red}\gamma_{02}}ses.m_j + {\color{Blue}\gamma_{10}}ses_{ij} + {\color{Blue}\gamma_{11}}Sector_jses_{ij}</math><br />
<br />
<br />
<u> Random Portion of the Model </u><br />
Non-equivalent RE model as compared to Model 2 (LINK).<br />
:<math>{\color{Red}u_{0j}} + {\color{Blue}u_{1j}}ses.d_{ij} + r_{ij}</math><br />
:<math>{\color{Red}u_{0j}} + {\color{Blue}u_{1j}}(ses_{ij} - ses.m_{j}) + r_{ij}</math><br />
<br />
<br />
fitca <- lme( mathach ~ ses * Sector + ses.m, dd, random = ~ 1 + ses.d | id )  # Model 4<br />
<br />
<br />
Linear mixed-effects model fit by REML <br><br />
Data: dd <br><br />
{| {{table}}<br />
| AIC||BIC||logLik<br />
|-<br />
| 23889.77||23945.66||-11935.88<br />
|}<br />
<br />
<br />
Random effects:<br><br />
Formula: ~1 + ses.d | id<br><br />
Structure: General positive-definite, Log-Cholesky parametrization<br><br />
{| {{table}}<br />
| ||StdDev||Corr<br />
|-<br />
| (Intercept)||1.746||(Intr)<br />
|-<br />
| ses.d||0.644||0.398<br />
|-<br />
| Residual||6.065||<br />
|}<br />
<br />
<br />
Fixed effects: mathach ~ ses * Sector + ses.m<br />
{| {{table}}<br />
| ||Value||Std.Error||DF||t-value||p-value<br />
|-<br />
| (Intercept)||13.841||0.327||3602||42.362||0.00E+00<br />
|-<br />
| ses||1.529||0.246||3602||6.218||0.00E+00<br />
|-<br />
| SectorPublic||-1.830||0.458||77||-3.997||1.00E-04<br />
|-<br />
| ses.m||3.034||0.594||77||5.106||0.00E+00<br />
|-<br />
| ses:SectorPublic||1.354||0.324||3602||4.175||0.00E+00<br />
|}<br />
<br />
<br />
Correlation:<br />
{| {{table}}<br />
| ||(Intr)||ses||SctrPb||ses.m<br />
|-<br />
| ses||0.121||||||<br />
|-<br />
| SectorPublic||-0.734||-0.114||||<br />
|-<br />
| ses.m||-0.157||-0.21||0.244||<br />
|-<br />
| ses:SectorPublic||-0.068||-0.728||0.155||0.014<br />
|}<br />
<br />
<br />
Standardized Within-Group Residuals:<br />
{| {{table}}<br />
| Min||Q1||Med||Q3||Max<br />
|-<br />
| -3.104||-0.734||0.022||0.751||2.851<br />
|}<br />
<br />
<br />
Number of Observations: 3684<br><br />
Number of Groups: 80<br />
<br />
<br />
L <- list( 'Effect of ses' = rbind(<br />
"Within-school" = c( 0,1,0,0,0),<br />
"Contextual" = c( 0,0,0,1,0),<br />
"Compositional" = c( 0,1,0,1,0)))<br />
wald( fitca,L )<br />
<br />
<br />
{| {{table}}<br />
| ||numDF||denDF||F.value||p.value<br />
|-<br />
| Effect of ses||2||77||40.835||<.00001<br />
|}<br />
<br />
<br />
{| {{table}}<br />
| ||Estimate||Std.Error||DF||t-value||p-value||Lower 0.95||Upper 0.95<br />
|-<br />
| Within-school||1.529||0.246||3602||6.218||<.00001||1.047||2.011<br />
|-<br />
| Contextual||3.034||0.594||77||5.106||<.00001||1.851||4.216<br />
|-<br />
| Compositional||4.562||0.593||77||7.690||<.00001||3.381||5.744<br />
|}</div>
<hr />
<div>=== Model 3 Centered Within Group and Contextual Variable ===<br />
<br />
We can easily create a new variable, ses.d, which records how far each student's ses deviates from his or her school's mean ses:<br />
dd$ses.d <- with( dd, dvar(ses,id))<br />
<br />
Or equivalently:<br />
dd$ses.d <- dd$ses - dd$ses.m<br />
<br />
<br />
<u> Combined Model </u><br />
:<math>mathach_{ij} = \underbrace{{\color{Red}\gamma_{00}} + {\color{Red}\gamma_{01}Sector_j} + {\color{Red}\gamma_{02}ses.m_j} + {\color{Blue}\gamma_{10}}ses.d_{ij} + {\color{Blue}\gamma_{11}Sector_j}ses.d_{ij}}_{Fixed} + \underbrace{{\color{Red}u_{0j}} + {\color{Blue}u_{1j}}ses_{ij} + r_{ij}}_{Random} </math><br />
<br />
<u> Fixed Portion of the Model </u><br />
:<math>{\color{Red}\gamma_{00}} + {\color{Red}\gamma_{01}}Sector_j + {\color{Red}\gamma_{02}}ses.m_j + {\color{Blue}\gamma_{10}}ses.d_{ij} + {\color{Blue}\gamma_{11}}Sector_jses.d_{ij}</math><br />
:<math>{\color{Red}\gamma_{00}} + {\color{Red}\gamma_{01}}Sector_j + {\color{Red}\gamma_{02}}ses.m_j + {\color{Blue}\gamma_{10}}(ses_{ij} - ses.m_j) +{\color{Blue}\gamma_{11}}Sector_j(ses_{ij} - ses.m_j)</math><br />
:<math>\underbrace{{\color{Red}\gamma_{00}} + {\color{Red}\gamma_{01}}Sector_j + ({\color{Red}\gamma_{02}} - {\color{Blue}\gamma_{10}})ses.m_j + {\color{Blue}\gamma_{10}}ses_{ij} + {\color{Blue}\gamma_{11}}Sector_j ses_{ij}}_{See \; Model \; 2} - \underbrace{{\color{Blue}\gamma_{11}}Sector_j ses.m_j}_{Add'l \; Interaction}</math><br />
:<math>{\color{Red}\gamma_{00}} + {\color{Red}\gamma_{01}}Sector_j + ({\color{Red}\gamma_{02}} - {\color{Blue}\gamma_{10}} - {\color{Blue}\gamma_{11}}Sector_j)ses.m_j + ({\color{Blue}\gamma_{10}} + {\color{Blue}\gamma_{11}}Sector_j)ses_{ij}</math><br />
<br />
<br />
<u> Random Portion of the Model </u><br />
:<math>{\color{Red}u_{0j}} + {\color{Blue}u_{1j}}ses_{ij} + r_{ij}</math><br />
<br />
<br />
fitcd <- lme( mathach ~ ses.d*Sector + ses.m, dd,random = ~ 1 + ses | id )<br />
# Note that id refers to schools not students!<br />
<br />
<br />
Linear mixed-effects model fit by REML <br><br />
Data: dd <br><br />
{| {{table}}<br />
| AIC||BIC||logLik<br />
|-<br />
| 23891.45||23947.34||-11936.73<br />
|}<br />
<br />
<br />
Random effects:<br><br />
Formula: ~1 + ses | id<br><br />
Structure: General positive-definite, Log-Cholesky parametrization<br><br />
{| {{table}}<br />
| ||StdDev||Corr<br />
|-<br />
| (Intercept)||1.756<font color=DarkOrange><sup>1</sup></font>||(Intr)<br />
|-<br />
| ses||0.352<font color=DarkOrange><sup>2</sup></font>||0.555<font color=DarkOrange><sup>3</sup></font><br />
|-<br />
| Residual||6.075<font color=DarkOrange><sup>4</sup></font>||<br />
|}<br />
<br />
<br />
Fixed effects: mathach ~ ses.d * Sector + ses.m <br />
{| {{table}}<br />
| ||Value||Std.Error||DF||t-value||p-value<br />
|-<br />
| (Intercept)||13.767<font color=DarkOrange><sup>5</sup></font>||0.330||3602||41.721||0.00E+00<br />
|-<br />
| ses.d||1.485<font color=DarkOrange><sup>6</sup></font>||0.235||3602||6.321||0.00E+00<br />
|-<br />
| SectorPublic||-1.825<font color=DarkOrange><sup>7</sup></font>||0.458||77||-3.983||2.00E-04<br />
|-<br />
| ses.m||5.354<font color=DarkOrange><sup>8</sup></font>||0.568||77||9.418||0.00E+00<br />
|-<br />
| ses.d:SectorPublic||1.422<font color=DarkOrange><sup>9</sup></font>||0.316||3602||4.504||0.00E+00<br />
|}<br />
<br />
<br />
Correlation:<br />
{| {{table}}<br />
| ||(Intr)||ses.d||SctrPb||ses.m<br />
|-<br />
| ses.d||0.125||||||<br />
|-<br />
| SectorPublic||-0.736||-0.089||||<br />
|-<br />
| ses.m||-0.085||0.008||0.247||<br />
|-<br />
| ses.d:SectorPublic||-0.093||-0.744||0.118||-0.002<br />
|}<br />
<br />
<br />
Standardized Within-Group Residuals:<br />
{| {{table}}<br />
| Min||Q1||Med||Q3||Max<br />
|-<br />
| -3.095||-0.733||0.026||0.748||2.829<br />
|}<br />
<br />
Number of Observations: 3684<br><br />
Number of Groups: 80<br />
<br />
<br />
L <- list( 'Effect of ses' = rbind(<br />
"Within-school" = c( 0,1,0,0,0),<br />
"Contextual" = c( 0,-1,0,1,0),<br />
"Compositional" = c( 0,0,0,1,0)))<br />
wald( fitcd,L )<br />
<br />
{| {{table}}<br />
| ||numDF||denDF||F.value||p.value<br />
|-<br />
| Effect of ses||2||77||63.84225||<.00001<br />
|}<br />
<br />
{| {{table}}<br />
| ||Estimate||Std.Error||DF||t-value||p-value||Lower 0.95||Upper 0.95<br />
|-<br />
| Within-school||1.485<font color=DarkOrange><sup>6</sup></font>||0.235||3602||6.321||<.00001||1.024||1.946<br />
|-<br />
| Contextual||3.869<font color=DarkOrange><sup>10</sup></font>||0.613||77||6.308||<.00001||2.648||5.090<br />
|-<br />
| Compositional||5.354<font color=DarkOrange><sup>8</sup></font>||0.568||77||9.418||<.00001||4.222||6.486<br />
|}<br />
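In this parametrization the contextual effect is the compositional effect minus the within-school effect; a quick arithmetic check with the reported (rounded) values:

```r
within        <- 1.485  # gamma_10, coefficient on ses.d
compositional <- 5.354  # gamma_02, coefficient on ses.m
contextual    <- compositional - within
contextual  # 3.869, matching the Contextual row above
```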
<br />
<u><font color=DarkOrange>Notes</font></u><br />
:1. <math>Var({\color{Red}u_{0}}) = 1.756^2</math><br />
:2. <math>Var({\color{Blue}u_{1}}) = 0.352^2</math><br />
:3. <math>Cov({\color{Red}u_{0}},{\color{Blue}u_{1}}) = 0.555</math><br />
:4. <math>Var({\color{Black}r_{ij}}) = 6.075^2</math><br />
:5 <math>{\color{Red}\gamma_{00}} = 13.767</math> Intercept for Catholic schools<br />
:6 <math>{\color{Blue}\gamma_{10}} = 1.485</math> Within school effect of ses.d (student deviation from their school mean) for Catholic schools<br />
:7 <math>{\color{Red}\gamma_{01}} = -1.825</math> Change in Intercept for Public schools (e.g. Intercept for Public = 13.767 - 1.825 = 11.942)<br />
:8 <math>{\color{Red}\gamma_{02}} = 5.354</math> Increase in mathach associated with 1 unit increase in school mean ses holding the student's school relative position constant (e.g. increase in mathach from a student 1 unit below his school mean of X compared to a student 1 unit below a school mean of X + 1). This is the between school effect for Catholic schools (e.g. This is the difference going from a student with ses = X in a school with mean ses = Y to a student with ses = X + 1 in a school with mean ses = Y + 1)!<br />
:9 <math>{\color{Blue}\gamma_{11}} = 1.422</math> Change in within-school ses.d for Public schools (e.g. ses.d slope for Public = 1.485 + 1.422 = 2.907)<br />
:10 To compute the contextual effect (taking a student with a constant ses and shifting them to a school with ses.m + 1) we need to take the compositional effect and subtract the within school effect, <math>{\color{Red}\gamma_{02}} - {\color{Blue}\gamma_{10}} = 5.354 - 1.485 = 3.869</math></div>Smithcehttp://scs.math.yorku.ca/index.php/MATH_6643_Summer_2012_Applications_of_Mixed_Models/Students/smithce/Model3MATH 6643 Summer 2012 Applications of Mixed Models/Students/smithce/Model32012-05-29T14:30:04Z<p>Smithce: </p>
<hr />
<div>=== Model 3 Centered Within Group and Contextual Variable ===<br />
<br />
We can easily create a new variable, ses.d, which we assign how far the student deviates in ses from his or her school's mean ses:<br />
dd$ses.d <- with( dd, dvar(ses,id))<br />
<br />
Or equivalently:<br />
dd$ses.d <- dd$ses - dd$ses.m<br />
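For readers without the dvar helper, the centering step is plain arithmetic: subtract each school's mean ses from the student's ses. A minimal sketch in Python (toy data; the names mirror ses and id, but the values are invented for illustration):<br />

```python
# Group-mean centering: ses.d = ses - school mean of ses.
# Toy data only; the values are invented for illustration.
from collections import defaultdict

ses = [0.2, 0.6, -0.4, 1.0, 0.0]
school = ["A", "A", "A", "B", "B"]

# Compute each school's mean ses (ses.m).
totals, counts = defaultdict(float), defaultdict(int)
for s, g in zip(ses, school):
    totals[g] += s
    counts[g] += 1
ses_m = {g: totals[g] / counts[g] for g in totals}

# Deviation of each student from his or her school mean (ses.d).
ses_d = [s - ses_m[g] for s, g in zip(ses, school)]
print([round(x, 4) for x in ses_d])
```

By construction the deviations sum to zero within each school, which is what makes ses.d and ses.m orthogonal pieces of raw ses.<br />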
<br />
<br />
<u> Combined Model </u><br />
:<math>mathach_{ij} = \underbrace{{\color{Red}\gamma_{00}} + {\color{Red}\gamma_{01}}Sector_j + {\color{Red}\gamma_{02}}ses.m_j + {\color{Blue}\gamma_{10}}ses.d_{ij} + {\color{Blue}\gamma_{11}}Sector_j \, ses.d_{ij}}_{Fixed} + \underbrace{{\color{Red}u_{0j}} + {\color{Blue}u_{1j}}ses_{ij} + r_{ij}}_{Random} </math><br />
<br />
<u> Fixed Portion of the Model </u><br />
:<math>{\color{Red}\gamma_{00}} + {\color{Red}\gamma_{01}}Sector_j + {\color{Red}\gamma_{02}}ses.m_j + {\color{Blue}\gamma_{10}}ses.d_{ij} + {\color{Blue}\gamma_{11}}Sector_j \, ses.d_{ij}</math><br />
:<math>{\color{Red}\gamma_{00}} + {\color{Red}\gamma_{01}}Sector_j + {\color{Red}\gamma_{02}}ses.m_j + {\color{Blue}\gamma_{10}}(ses_{ij} - ses.m_j) +{\color{Blue}\gamma_{11}}Sector_j(ses_{ij} - ses.m_j)</math><br />
:<math>{\color{Red}\gamma_{00}} + {\color{Red}\gamma_{01}}Sector_j + {\color{Red}\gamma_{02}}ses.m_j + {\color{Blue}\gamma_{10}}ses_{ij} - {\color{Blue}\gamma_{10}}ses.m_j + {\color{Blue}\gamma_{11}}Sector_j ses_{ij} - {\color{Blue}\gamma_{11}}Sector_j ses.m_j</math><br />
:<math>{\color{Red}\gamma_{00}} + {\color{Red}\gamma_{01}}Sector_j + ({\color{Red}\gamma_{02}} - {\color{Blue}\gamma_{10}} - {\color{Blue}\gamma_{11}}Sector_j)ses.m_j + ({\color{Blue}\gamma_{10}} + {\color{Blue}\gamma_{11}}Sector_j)ses_{ij}</math><br />
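Plugging the REML estimates reported below into the last line of this derivation shows how the centered parameterization maps back to raw ses. A sketch of the arithmetic in Python (coefficient values are taken from the lme output further down; Sector is the 0/1 Public indicator, with Catholic as reference):<br />

```python
# Fixed-effect estimates from the lme output (Catholic = reference sector).
g00, g01, g02 = 13.767, -1.825, 5.354  # intercept, Public shift, ses.m slope
g10, g11 = 1.485, 1.422                # ses.d slope, Public shift in ses.d slope

for sector, name in [(0, "Catholic"), (1, "Public")]:
    raw_ses_slope = g10 + g11 * sector      # coefficient on ses_ij
    ses_m_slope = g02 - g10 - g11 * sector  # coefficient on ses.m_j
    print(name, round(raw_ses_slope, 3), round(ses_m_slope, 3))
```

For Catholic schools the ses.m coefficient in the raw-ses parameterization, 5.354 - 1.485 = 3.869, is exactly the contextual effect computed in the notes below.<br />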
<br />
<br />
<u> Random Portion of the Model </u><br />
:<math>{\color{Red}u_{0j}} + {\color{Blue}u_{1j}}ses_{ij} + r_{ij}</math><br />
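One way to read the random portion: the variance of mathach around the fixed part is a quadratic in ses. Using the REML estimates reported below, and treating 0.555 as the correlation (not the covariance) between the two random effects, a sketch of the standard mixed-model identity Var(u0 + u1*ses + r) = Var(u0) + 2*ses*Cov(u0,u1) + ses^2*Var(u1) + Var(r):<br />

```python
import math

# Standard deviations and correlation from the lme output below.
sd_u0, sd_u1, corr, sd_r = 1.756, 0.352, 0.555, 6.075
cov_u0u1 = corr * sd_u0 * sd_u1

def total_var(ses):
    # Var(u0) + 2*ses*Cov(u0,u1) + ses^2*Var(u1) + Var(r)
    return sd_u0**2 + 2 * ses * cov_u0u1 + ses**2 * sd_u1**2 + sd_r**2

# Total standard deviation of mathach around the fixed part at a few ses values.
for ses in (-1, 0, 1):
    print(ses, round(math.sqrt(total_var(ses)), 3))
```

Because the estimated correlation is positive, the total variance is larger for students with higher ses.<br />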
<br />
<br />
fitcd <- lme( mathach ~ ses.d*Sector + ses.m, dd, random = ~ 1 + ses | id )<br />
# Note that id refers to schools not students!<br />
<br />
<br />
Linear mixed-effects model fit by REML <br><br />
Data: dd <br><br />
{| {{table}}<br />
| AIC||BIC||logLik<br />
|-<br />
| 23891.45||23947.34||-11936.73<br />
|}<br />
<br />
<br />
Random effects:<br><br />
Formula: ~1 + ses | id<br><br />
Structure: General positive-definite, Log-Cholesky parametrization<br><br />
{| {{table}}<br />
| ||StdDev||Corr<br />
|-<br />
| (Intercept)||1.756<font color=DarkOrange><sup>1</sup></font>||(Intr)<br />
|-<br />
| ses||0.352<font color=DarkOrange><sup>2</sup></font>||0.555<font color=DarkOrange><sup>3</sup></font><br />
|-<br />
| Residual||6.075<font color=DarkOrange><sup>4</sup></font>||<br />
|}<br />
<br />
<br />
Fixed effects: mathach ~ ses.d * Sector + ses.m <br />
{| {{table}}<br />
| ||Value||Std.Error||DF||t-value||p-value<br />
|-<br />
| (Intercept)||13.767<font color=DarkOrange><sup>5</sup></font>||0.330||3602||41.721||0.00E+00<br />
|-<br />
| ses.d||1.485<font color=DarkOrange><sup>6</sup></font>||0.235||3602||6.321||0.00E+00<br />
|-<br />
| SectorPublic||-1.825<font color=DarkOrange><sup>7</sup></font>||0.458||77||-3.983||2.00E-04<br />
|-<br />
| ses.m||5.354<font color=DarkOrange><sup>8</sup></font>||0.568||77||9.418||0.00E+00<br />
|-<br />
| ses.d:SectorPublic||1.422<font color=DarkOrange><sup>9</sup></font>||0.316||3602||4.504||0.00E+00<br />
|}<br />
<br />
<br />
Correlation:<br />
{| {{table}}<br />
| ||(Intr)||ses.d||SctrPb||ses.m<br />
|-<br />
| ses.d||0.125||||||<br />
|-<br />
| SectorPublic||-0.736||-0.089||||<br />
|-<br />
| ses.m||-0.085||0.008||0.247||<br />
|-<br />
| ses.d:SectorPublic||-0.093||-0.744||0.118||-0.002<br />
|}<br />
<br />
<br />
Standardized Within-Group Residuals:<br />
{| {{table}}<br />
| Min||Q1||Med||Q3||Max<br />
|-<br />
| -3.095||-0.733||0.026||0.748||2.829<br />
|}<br />
<br />
Number of Observations: 3684<br><br />
Number of Groups: 80<br />
<br />
<br />
L <- list( 'Effect of ses' = rbind(<br />
"Within-school" = c( 0,1,0,0,0),<br />
"Contextual" = c( 0,-1,0,1,0),<br />
"Compositional" = c( 0,0,0,1,0)))<br />
wald( fitcd,L )<br />
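The wald call simply applies the contrast matrix L to the vector of fixed effects (the standard errors additionally use the coefficient covariance matrix, omitted here). A sketch of the point estimates in Python, assuming the coefficient order (Intercept, ses.d, SectorPublic, ses.m, ses.d:SectorPublic):<br />

```python
# Fixed effects in the order printed by lme.
beta = [13.767, 1.485, -1.825, 5.354, 1.422]

# Same contrasts as the L matrix passed to wald().
L = {
    "Within-school": [0, 1, 0, 0, 0],   # ses.d slope
    "Contextual":    [0, -1, 0, 1, 0],  # ses.m minus ses.d
    "Compositional": [0, 0, 0, 1, 0],   # ses.m slope
}

estimates = {}
for name, row in L.items():
    estimates[name] = sum(l * b for l, b in zip(row, beta))
    print(name, round(estimates[name], 3))
```

These reproduce the three estimates in the table below: 1.485, 3.869, and 5.354.<br />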
<br />
{| {{table}}<br />
| ||numDF||denDF||F.value||p.value<br />
|-<br />
| Effect of ses||2||77||63.84225||<.00001<br />
|}<br />
<br />
{| {{table}}<br />
| ||Estimate||Std.Error||DF||t-value||p-value||Lower 0.95||Upper 0.95<br />
|-<br />
| Within-school||1.485<font color=DarkOrange><sup>6</sup></font>||0.235||3602||6.321||<.00001||1.024||1.946<br />
|-<br />
| Contextual||3.869<font color=DarkOrange><sup>10</sup></font>||0.613||77||6.308||<.00001||2.648||5.090<br />
|-<br />
| Compositional||5.354<font color=DarkOrange><sup>8</sup></font>||0.568||77||9.418||<.00001||4.222||6.486<br />
|}<br />
<br />
<u><font color=DarkOrange>Notes</font></u><br />
:1. <math>Var({\color{Red}u_{0}}) = 1.756^2</math><br />
:2. <math>Var({\color{Blue}u_{1}}) = 0.352^2</math><br />
:3. <math>Cor({\color{Red}u_{0}},{\color{Blue}u_{1}}) = 0.555</math> (the lme output reports the correlation, not the covariance, of the random effects)<br />
:4. <math>Var({\color{Black}r_{ij}}) = 6.075^2</math><br />
:5. <math>{\color{Red}\gamma_{00}} = 13.767</math> Intercept for Catholic schools<br />
:6. <math>{\color{Blue}\gamma_{10}} = 1.485</math> Within-school effect of ses.d (a student's deviation from his or her school's mean ses) for Catholic schools<br />
:7. <math>{\color{Red}\gamma_{01}} = -1.825</math> Change in intercept for Public schools (i.e. intercept for Public = 13.767 - 1.825 = 11.942)<br />
:8. <math>{\color{Red}\gamma_{02}} = 5.354</math> Increase in mathach associated with a 1-unit increase in school mean ses, holding the student's relative position within the school constant (e.g. a student 1 unit below the mean in a school with mean ses X, compared with a student 1 unit below the mean in a school with mean ses X + 1). This is the between-school (compositional) effect for Catholic schools: the difference between a student with ses = X in a school with mean ses = Y and a student with ses = X + 1 in a school with mean ses = Y + 1.<br />
:9. <math>{\color{Blue}\gamma_{11}} = 1.422</math> Change in the within-school ses.d slope for Public schools (i.e. ses.d slope for Public = 1.485 + 1.422 = 2.907)<br />
:10. The contextual effect (holding a student's ses constant while moving him or her to a school with mean ses one unit higher) is the compositional effect minus the within-school effect, <math>{\color{Red}\gamma_{02}} - {\color{Blue}\gamma_{10}} = 5.354 - 1.485 = 3.869</math></div>Smithce