http://scs.math.yorku.ca/index.php?title=Special:Contributions/Georges&feed=atom&limit=50&target=Georges&year=&month=Wiki1 - User contributions [en]2019-10-15T14:12:53ZFrom Wiki1MediaWiki 1.16.1http://scs.math.yorku.ca/index.php/Wiki_for_statistical_consultingWiki for statistical consulting2019-10-03T16:08:33Z<p>Georges: /* SCS Administration */</p>
<hr />
<div>This is the wiki for the [http://www.yorku.ca/isr/scs/ Statistical Consulting Service] at York University. You can make [http://www.appointmentquest.com/provider/2000199121 appointments online] to see one of our consultants.<br />
<br />
Anyone can read the content of this wiki. If you are interested in contributing, please let [mailto:georges@yorku.ca?subject=Account%20on%20SCS%20wiki Georges Monette] know that you would like to have an account.<br />
== Hot topics ==<br />
<!--<br />
* Contribute to the list of [[Programs_in_Data_Sciences|programs in Data Sciences]].<br />
--><br />
* [[Sometimes Asked Questions]]<br />
* It would be a great contribution to compile a list of exemplary subject-matter papers reporting statistical methods, especially modern methods such as longitudinal data analysis, etc. <br />
__TOC__<br />
<br />
== Seminars ==<br />
* [[SCS Reads 2019-2020|SCS Reads 2019-2020 ????????]]<br />
* [[SCS Reads 2018-2019|SCS Reads 2018-2019 Causality mainly]]<br />
* [[SCS Reads 2017-2018|SCS Reads 2017-2018 Counterfactuals and Causal Inference]]<br />
* [[SCS Reads 2016-2017|SCS Reads 2016-2017 Rethinking Statistics]]<br />
* [[SCS Reads 2015-2016|SCS Reads 2015-2016 Bayesian Statistics in the Social Sciences]]<br />
* [[SCS Reads 2014-2015|SCS Reads 2014-2015 Frank Harrell's Regression Modeling Strategies]]<br />
* [[SCS Reads Nominations|Past and current nominations for SCS Reads]]<br />
* [[SCS Reads -- past seminars]]<br />
<br />
== Statistical Seminars in the Toronto Area ==<br />
* [http://qm.info.yorku.ca/ York Quantitative Methods Program in Psychology]<br />
* [http://www.math.yorku.ca/Who/Faculty/Rensburg/Colloquium/Colloquium2013.html York Department of Mathematics and Statistics]<br />
<br />
== Workshops and Courses ==<br />
* [http://blackwell.math.yorku.ca/ICPSR2017 ICPSR 2017 Course in Longitudinal Data Analysis with Mixed and Bayesian Models]<br />
* [[SCS_2017:_Longitudinal_and_Nested_Data|SCS 2017 Models and Analysis for Longitudinal and Nested Data]]<br />
* [[SCS 2014: Visualizing Regression]]<br />
* [[Mixed Models with R]]<br />
* [[SPIDA 2012: Mixed Models with R]]<br />
* [[SCS 2012: Mixed Models with R]]<br />
* [[SCS 2011: Statistical Analysis and Programming with R]]<br />
* [[SCS 2012: A Gentle Introduction to R]]<br />
* [[MATH 6627|MATH 6627 Practicum in Statistical Consulting]]<br />
* [[MATH 6643 Summer 2012 Applications of Mixed Models]]<br />
<br />
== Methods ==<br />
<br />
=== Data Analysis ===<br />
*[[Data Cleaning]]<br />
<br />
* [[Survival Analysis]]<br />
<br />
* [[Latent Variable Models]] (e.g. SEM, CFA, IRT, LCA)<br />
<br />
* [[Multilevel/Mixed Models]]<br />
<br />
* [[General Linear Models]] (e.g. multiple regression, ANOVA)<br />
<br />
* [[Categorical Data Analysis]] (e.g. contingency tables, chi-square, logistic regression)<br />
<br />
* [[Aggregate Data]] (e.g. meta-analysis)<br />
=== Displaying Data and Reporting ===<br />
* Good papers on graphs:<br />
*:[http://www.ruf.rice.edu/~lane/papers/designing_better_graphs.pdf Lane, D.M., & Sandor, A. (2009). Designing better graphs by including distributional information and integrating words, numbers, and images. ''Psychological Methods, 14,'' 239-257.]<br />
*:[http://euclid.psych.yorku.ca/www/lab/psy6140/papers/kastellec-using-graphs.pdf Kastellec, J. P., & Leoni, E. L. (2007). Using graphs instead of tables in political science. ''Perspectives on Politics, 5,'' 755-771.]<br />
*:See also the related web site for Kastellec & Leoni, [http://tables2graphs.com/doku.php Using Graphs Instead of Tables].<br />
<br />
* A [http://biostat.mc.vanderbilt.edu/wiki/Main/ManuscriptChecklist checklist for statistical reporting]<br />
<br />
* [http://www.statlit.org/pdf/2001SchieldBusOfComm.pdf Describing Rates and Percentages in Tables]<br />
<br />
=== Statistical Topics ===<br />
*[[Causality]]<br />
*[[Model Selection]]<br />
*[[Programs in Data Sciences]]<br />
*[[Professional Accreditation]]<br />
*[[Statistics]]<br />
*[[Statistics in the News]]<br />
<br />
=== Statistical Consulting Support ===<br />
* [http://wiki.math.yorku.ca/index.php/MATH_6627_2007-08 York's Statistical Consulting Practicum Wiki 2007-2008]<br />
<br />
* [http://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting Statistical Consulting Practicum Wiki 2011]<br />
<br />
* [http://www.stat.columbia.edu/~cook/movabletype/archives/2008/01/rindskopfs_rule.html David Rindskopf's Rules for Consultants]<br />
<br />
* [http://www.rci.rutgers.edu/~cabrera/sc/ Javier Cabrera's Statistical Consulting Course]<br />
<br />
* [http://www.statsci.org/smyth/pubs/training.html Gordon Smyth on Training Students to be Consultants]<br />
<br />
* [http://www.amstat.org/sections/cnsl/ The American Statistical Association's Consulting Section] <br />
<br />
* Janice Derr on the Qualities of an Effective Statistical Consultant [[File: Janice_Derr_on_Consulting.pdf]]<br />
<br />
* [http://www.stat.purdue.edu/scs/help/notes_for_consultants.html Purdue University's Guide for Statistical Consultants]<br />
<br />
* [[Links to other Statistical Consulting Services]]<br />
<br />
== Software ==<br />
* [[R]]<br />
* [[SAS]]<br />
* [[SPSS/PASW]]<br />
* [http://www.gnu.org/software/pspp/ PSPP: open-source package inspired by SPSS]<br />
* [http://www.statmodel.com MPlus]<br />
* How to access software remotely from York's WebFAS system: <br />
** [[Media:Webfas_handout_SAS.pdf|SAS version]]<br />
=== Local R packages ===<br />
The 'spida' package (now 'spida2') and the 'p3d' package are available from GitHub with<br />
* devtools::install_github('gmonette/spida2') and<br />
* devtools::install_github('gmonette/p3d')<br />
respectively.<br />
<br />
== SCS Administration ==<br />
* Information for SCS TAs<br />
** [[SCS TAships: Allocation of time]]<br />
** Your first term as a TA: <br />
**: [http://scs.math.yorku.ca/images/8/8f/Guidelines_for_new_TAs.pdf Using AppointmentQuest to attend consulting sessions (prepared by Gabriela Gonzalez)]<br />
<!--<br />
* [[SCS Staff Meetings, 2011]]<br />
--><br />
<br />
== Interesting Links ==<br />
=== Statistical consulting services at other universities === <br />
*[http://cscar.research.umich.edu/about/ University of Michigan]<br />
* [[Slides]]<br />
* [[Blogs]]<br />
* [[Books]]<br />
*[[Statistics-specific Job Boards]]<br />
<br />
'''Did you know?'''<br />
<br />
http://www12.statcan.gc.ca/census-recensement/index-eng.cfm</div>Georgeshttp://scs.math.yorku.ca/index.php/Wiki_for_statistical_consultingWiki for statistical consulting2019-10-03T16:07:18Z<p>Georges: /* SCS Administration */</p>
<hr />
<div>This is the wiki for the [http://www.yorku.ca/isr/scs/ Statistical Consulting Service] at York University. You can make [http://www.appointmentquest.com/provider/2000199121 appointments online] to see one of our consultants.<br />
<br />
Anyone can read the content of this wiki. If you are interested in contributing, please let [mailto:georges@yorku.ca?subject=Account%20on%20SCS%20wiki Georges Monette] know that you would like to have an account.<br />
== Hot topics ==<br />
<!--<br />
* Contribute to the list of [[Programs_in_Data_Sciences|programs in Data Sciences]].<br />
--><br />
* [[Sometimes Asked Questions]]<br />
* It would be a great contribution to compile a list of exemplary subject-matter papers reporting statistical methods, especially modern methods such as longitudinal data analysis, etc. <br />
__TOC__<br />
<br />
== Seminars ==<br />
* [[SCS Reads 2019-2020|SCS Reads 2019-2020 ????????]]<br />
* [[SCS Reads 2018-2019|SCS Reads 2018-2019 Causality mainly]]<br />
* [[SCS Reads 2017-2018|SCS Reads 2017-2018 Counterfactuals and Causal Inference]]<br />
* [[SCS Reads 2016-2017|SCS Reads 2016-2017 Rethinking Statistics]]<br />
* [[SCS Reads 2015-2016|SCS Reads 2015-2016 Bayesian Statistics in the Social Sciences]]<br />
* [[SCS Reads 2014-2015|SCS Reads 2014-2015 Frank Harrell's Regression Modeling Strategies]]<br />
* [[SCS Reads Nominations|Past and current nominations for SCS Reads]]<br />
* [[SCS Reads -- past seminars]]<br />
<br />
== Statistical Seminars in the Toronto Area ==<br />
* [http://qm.info.yorku.ca/ York Quantitative Methods Program in Psychology]<br />
* [http://www.math.yorku.ca/Who/Faculty/Rensburg/Colloquium/Colloquium2013.html York Department of Mathematics and Statistics]<br />
<br />
== Workshops and Courses ==<br />
* [http://blackwell.math.yorku.ca/ICPSR2017 ICPSR 2017 Course in Longitudinal Data Analysis with Mixed and Bayesian Models]<br />
* [[SCS_2017:_Longitudinal_and_Nested_Data|SCS 2017 Models and Analysis for Longitudinal and Nested Data]]<br />
* [[SCS 2014: Visualizing Regression]]<br />
* [[Mixed Models with R]]<br />
* [[SPIDA 2012: Mixed Models with R]]<br />
* [[SCS 2012: Mixed Models with R]]<br />
* [[SCS 2011: Statistical Analysis and Programming with R]]<br />
* [[SCS 2012: A Gentle Introduction to R]]<br />
* [[MATH 6627|MATH 6627 Practicum in Statistical Consulting]]<br />
* [[MATH 6643 Summer 2012 Applications of Mixed Models]]<br />
<br />
== Methods ==<br />
<br />
=== Data Analysis ===<br />
*[[Data Cleaning]]<br />
<br />
* [[Survival Analysis]]<br />
<br />
* [[Latent Variable Models]] (e.g. SEM, CFA, IRT, LCA)<br />
<br />
* [[Multilevel/Mixed Models]]<br />
<br />
* [[General Linear Models]] (e.g. multiple regression, ANOVA)<br />
<br />
* [[Categorical Data Analysis]] (e.g. contingency tables, chi-square, logistic regression)<br />
<br />
* [[Aggregate Data]] (e.g. meta-analysis)<br />
=== Displaying Data and Reporting ===<br />
* Good papers on graphs:<br />
*:[http://www.ruf.rice.edu/~lane/papers/designing_better_graphs.pdf Lane, D.M., & Sandor, A. (2009). Designing better graphs by including distributional information and integrating words, numbers, and images. ''Psychological Methods, 14,'' 239-257.]<br />
*:[http://euclid.psych.yorku.ca/www/lab/psy6140/papers/kastellec-using-graphs.pdf Kastellec, J. P., & Leoni, E. L. (2007). Using graphs instead of tables in political science. ''Perspectives on Politics, 5,'' 755-771.]<br />
*:See also the related web site for Kastellec & Leoni, [http://tables2graphs.com/doku.php Using Graphs Instead of Tables].<br />
<br />
* A [http://biostat.mc.vanderbilt.edu/wiki/Main/ManuscriptChecklist checklist for statistical reporting]<br />
<br />
* [http://www.statlit.org/pdf/2001SchieldBusOfComm.pdf Describing Rates and Percentages in Tables]<br />
<br />
=== Statistical Topics ===<br />
*[[Causality]]<br />
*[[Model Selection]]<br />
*[[Programs in Data Sciences]]<br />
*[[Professional Accreditation]]<br />
*[[Statistics]]<br />
*[[Statistics in the News]]<br />
<br />
=== Statistical Consulting Support ===<br />
* [http://wiki.math.yorku.ca/index.php/MATH_6627_2007-08 York's Statistical Consulting Practicum Wiki 2007-2008]<br />
<br />
* [http://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting Statistical Consulting Practicum Wiki 2011]<br />
<br />
* [http://www.stat.columbia.edu/~cook/movabletype/archives/2008/01/rindskopfs_rule.html David Rindskopf's Rules for Consultants]<br />
<br />
* [http://www.rci.rutgers.edu/~cabrera/sc/ Javier Cabrera's Statistical Consulting Course]<br />
<br />
* [http://www.statsci.org/smyth/pubs/training.html Gordon Smyth on Training Students to be Consultants]<br />
<br />
* [http://www.amstat.org/sections/cnsl/ The American Statistical Association's Consulting Section] <br />
<br />
* Janice Derr on the Qualities of an Effective Statistical Consultant [[File: Janice_Derr_on_Consulting.pdf]]<br />
<br />
* [http://www.stat.purdue.edu/scs/help/notes_for_consultants.html Purdue University's Guide for Statistical Consultants]<br />
<br />
* [[Links to other Statistical Consulting Services]]<br />
<br />
== Software ==<br />
* [[R]]<br />
* [[SAS]]<br />
* [[SPSS/PASW]]<br />
* [http://www.gnu.org/software/pspp/ PSPP: open-source package inspired by SPSS]<br />
* [http://www.statmodel.com MPlus]<br />
* How to access software remotely from York's WebFAS system: <br />
** [[Media:Webfas_handout_SAS.pdf|SAS version]]<br />
=== Local R packages ===<br />
The 'spida' package (now 'spida2') and the 'p3d' package are available from GitHub with<br />
* devtools::install_github('gmonette/spida2') and<br />
* devtools::install_github('gmonette/p3d')<br />
respectively.<br />
<br />
== SCS Administration ==<br />
* Information for SCS TAs<br />
** [[SCS TAships: Allocation of time]]<br />
** Your first term as a TA: [http://scs.math.yorku.ca/images/8/8f/Guidelines_for_new_TAs.pdf Using AppointmentQuest to attend consulting sessions (prepared by Gabriela Gonzalez)]<br />
<!--<br />
* [[SCS Staff Meetings, 2011]]<br />
--><br />
<br />
== Interesting Links ==<br />
=== Statistical consulting services at other universities === <br />
*[http://cscar.research.umich.edu/about/ University of Michigan]<br />
* [[Slides]]<br />
* [[Blogs]]<br />
* [[Books]]<br />
*[[Statistics-specific Job Boards]]<br />
<br />
'''Did you know?'''<br />
<br />
http://www12.statcan.gc.ca/census-recensement/index-eng.cfm</div>Georgeshttp://scs.math.yorku.ca/index.php/File:Guidelines_for_new_TAs.pdfFile:Guidelines for new TAs.pdf2019-10-03T16:06:10Z<p>Georges: </p>
<hr />
<div></div>Georgeshttp://scs.math.yorku.ca/index.php/SCS_Reads_2019-2020SCS Reads 2019-20202019-09-26T13:53:54Z<p>Georges: /* Seminar on Friday, September 20, 2019 */</p>
<hr />
<div>* A topic need not occupy the entire academic year and we could plan to consider more than one topic.<br />
* To get an account to edit this wiki send a message to [mailto:georges@yorku.ca?subject=Account%20on%20SCS%20wiki Georges Monette].<br />
== Seminars 2019-20 ==<br />
<br />
== Seminar on Friday, September 20, 2019 ==<br />
At our meeting on September 13, we agreed to read and discuss the first two chapters of ''An Introduction to Statistical Learning'', which can be downloaded from [http://faculty.marshall.usc.edu/gareth-james/ISL/ this link]. We will then decide what to do next.<br />
== Seminar on Friday, October 4, 2019 ==<br />
We will read and discuss Chapter 3 of [http://faculty.marshall.usc.edu/gareth-james/ISL/ ''An Introduction to Statistical Learning''].<br>Data for the exercises are available by installing the ''ISLR'' package in R.<br />
<br />
== New candidate topics for 2019-2020 ==<br />
* '''Add suggestions here or send them to [mailto:georges@yorku.ca?subject=Account%20on%20SCS%20wiki Georges Monette] who can add them for you.'''<br />
<br />
* "Statistical learning": We could read either ''An Introduction to Statistical Learning'' [http://faculty.marshall.usc.edu/gareth-james/ISL/] or ''The Elements of Statistical Learning'' [https://web.stanford.edu/~hastie/ElemStatLearn/]; free .pdfs are available for both.<br />
"The books are based on the concept of “statistical learning,” a mashup of stats and machine learning. The field of machine learning is all about feeding huge amounts of data into algorithms to make accurate predictions. Statistics is concerned with predictions as well, says Tibshirani, but also with determining how confident we can be about the importance of certain inputs. This is important in areas like medicine, where a researcher doesn’t just want to know whether a medicine worked, but also why it worked. Statistical learning is meant to take the best ideas from machine learning and computer science, and explain how they can be used and interpreted through a statistician’s lens." See [https://getpocket.com/explore/item/these-are-the-best-books-for-learning-modern-statistics-and-they-re-all-free]<br />
<br />
== Candidates from 2018-2019 ==<br />
* '''Reproducibility of research: a crisis in Statistics??''': How should statistical practice be informed by current controversies about the ''replication crisis'' and countervailing moves toward open science and reproducibility? This was a big topic at the recent JSM 2018 conference. <br />
** Roger Peng, Reproducible Research in Computational Science, ''Science'', Vol. 334, Issue 6060, pp. 1226-1227 [http://science.sciencemag.org/content/334/6060/1226]<br />
** RC: This is timely in a couple of ways. Chris Green is teaching a course on the topic, so he or one of the students might be able to come to a meeting to discuss their views. We also have a QM Forum speaker on the topic in the Winter.<br />
** We can use articles in the special issue of the American Statistician devoted to current problems in statistical inference, particularly the use and interpretation of p-values and the concept of statistical significance. For an overview, see the [https://www.tandfonline.com/doi/full/10.1080/00031305.2019.1583913 editorial published in the special issue].<br />
* '''Big Data problems''': Another hot topic, but perhaps too broad. [MF: I don't know enough to specify it more clearly as a useful seminar topic.] [This would be interesting, but I would hope we could find materials that address the issue from a social science perspective]<br />
* Some statistical methods topics, not yet clearly articulated:<br />
** Clustering methods [RC: Could this fall under the Big Data label?]<br />
** Robustness<br />
** Mediation<br />
** Meta analysis in medicine: how can you tell whether a lit review is complete with logistic regression?! [RC: I think Meta-Analysis could be a good topic, including M-L/J-A discussing their research]<br />
* '''Consulting issues''': Practical aspects of statistical consulting [MF: Perhaps this would be a better topic for the SCS staff meetings ??] [RC: That would make a good topic for a business/staff meeting, if there were no consulting cases to discuss]<br />
* '''Survey Sampling''': Elucidating the mystery of bootstrap weights and how to use them when analyzing survey data, e.g. from Statistics Canada. <br />
* '''Evidence-based medicine''': Ideas and implications<br />
* '''Machine Learning, AI, Deep Learning''': An Overview <br />
* '''Disseminating technical information to non-technical audiences''': in consulting and in teaching. [Could this be put together with 'consulting issues'?]<br />
* '''Missing Data'''<br />
* '''Statistical Paradoxes and Fallacies''' [RC: There are some good "summary" articles related to statistical paradoxes and Georges' examples are always helpful]<br />
** Initial attempt at creating a list: [[Paradoxes, Fallacies and Other Surprises]]. Please add, modify or comment.<br />
<br />
== Links to past episodes of SCS Reads ==<br />
* [[SCS Reads 2018-2019|SCS Reads 2018-2019 Causality mainly]]<br />
* [[SCS Reads 2017-2018|SCS Reads 2017-2018 Counterfactuals and Causal Inference]]<br />
* [[SCS Reads 2016-2017|SCS Reads 2016-2017 Rethinking Statistics]]<br />
* [[SCS Reads 2015-2016|SCS Reads 2015-2016 Bayesian Statistics in the Social Sciences]]<br />
* [[SCS Reads 2014-2015]]<br />
* [[SCS Reads Nominations|Past and current nominations for SCS Reads]]<br />
* [[SCS Reads -- past seminars]]</div>Georgeshttp://scs.math.yorku.ca/index.php/SCS_Reads_2019-2020SCS Reads 2019-20202019-09-13T20:30:31Z<p>Georges: /* Seminar on Friday, September 20, 2019 */</p>
<hr />
<div>* A topic need not occupy the entire academic year and we could plan to consider more than one topic.<br />
* To get an account to edit this wiki send a message to [mailto:georges@yorku.ca?subject=Account%20on%20SCS%20wiki Georges Monette].<br />
== Seminars 2019-20 ==<br />
<br />
== Seminar on Friday, September 20, 2019 ==<br />
At our meeting on September 13, we agreed to read and discuss the first two chapters of ''An Introduction to Statistical Learning'', which can be downloaded from [http://faculty.marshall.usc.edu/gareth-james/ISL/ this link]. We will then decide what to do next.<br />
<br />
== New candidate topics for 2019-2020 ==<br />
* '''Add suggestions here or send them to [mailto:georges@yorku.ca?subject=Account%20on%20SCS%20wiki Georges Monette] who can add them for you.'''<br />
<br />
* "Statistical learning": We could read either ''An Introduction to Statistical Learning'' [http://faculty.marshall.usc.edu/gareth-james/ISL/] or ''The Elements of Statistical Learning'' [https://web.stanford.edu/~hastie/ElemStatLearn/]; free .pdfs are available for both.<br />
"The books are based on the concept of “statistical learning,” a mashup of stats and machine learning. The field of machine learning is all about feeding huge amounts of data into algorithms to make accurate predictions. Statistics is concerned with predictions as well, says Tibshirani, but also with determining how confident we can be about the importance of certain inputs. This is important in areas like medicine, where a researcher doesn’t just want to know whether a medicine worked, but also why it worked. Statistical learning is meant to take the best ideas from machine learning and computer science, and explain how they can be used and interpreted through a statistician’s lens." See [https://getpocket.com/explore/item/these-are-the-best-books-for-learning-modern-statistics-and-they-re-all-free]<br />
<br />
== Candidates from 2018-2019 ==<br />
* '''Reproducibility of research: a crisis in Statistics??''': How should statistical practice be informed by current controversies about the ''replication crisis'' and countervailing moves toward open science and reproducibility? This was a big topic at the recent JSM 2018 conference. <br />
** Roger Peng, Reproducible Research in Computational Science, ''Science'', Vol. 334, Issue 6060, pp. 1226-1227 [http://science.sciencemag.org/content/334/6060/1226]<br />
** RC: This is timely in a couple of ways. Chris Green is teaching a course on the topic, so he or one of the students might be able to come to a meeting to discuss their views. We also have a QM Forum speaker on the topic in the Winter.<br />
** We can use articles in the special issue of the American Statistician devoted to current problems in statistical inference, particularly the use and interpretation of p-values and the concept of statistical significance. For an overview, see the [https://www.tandfonline.com/doi/full/10.1080/00031305.2019.1583913 editorial published in the special issue].<br />
* '''Big Data problems''': Another hot topic, but perhaps too broad. [MF: I don't know enough to specify it more clearly as a useful seminar topic.] [This would be interesting, but I would hope we could find materials that address the issue from a social science perspective]<br />
* Some statistical methods topics, not yet clearly articulated:<br />
** Clustering methods [RC: Could this fall under the Big Data label?]<br />
** Robustness<br />
** Mediation<br />
** Meta analysis in medicine: how can you tell whether a lit review is complete with logistic regression?! [RC: I think Meta-Analysis could be a good topic, including M-L/J-A discussing their research]<br />
* '''Consulting issues''': Practical aspects of statistical consulting [MF: Perhaps this would be a better topic for the SCS staff meetings ??] [RC: That would make a good topic for a business/staff meeting, if there were no consulting cases to discuss]<br />
* '''Survey Sampling''': Elucidating the mystery of bootstrap weights and how to use them when analyzing survey data, e.g. from Statistics Canada. <br />
* '''Evidence-based medicine''': Ideas and implications<br />
* '''Machine Learning, AI, Deep Learning''': An Overview <br />
* '''Disseminating technical information to non-technical audiences''': in consulting and in teaching. [Could this be put together with 'consulting issues'?]<br />
* '''Missing Data'''<br />
* '''Statistical Paradoxes and Fallacies''' [RC: There are some good "summary" articles related to statistical paradoxes and Georges' examples are always helpful]<br />
** Initial attempt at creating a list: [[Paradoxes, Fallacies and Other Surprises]]. Please add, modify or comment.<br />
<br />
== Links to past episodes of SCS Reads ==<br />
* [[SCS Reads 2018-2019|SCS Reads 2018-2019 Causality mainly]]<br />
* [[SCS Reads 2017-2018|SCS Reads 2017-2018 Counterfactuals and Causal Inference]]<br />
* [[SCS Reads 2016-2017|SCS Reads 2016-2017 Rethinking Statistics]]<br />
* [[SCS Reads 2015-2016|SCS Reads 2015-2016 Bayesian Statistics in the Social Sciences]]<br />
* [[SCS Reads 2014-2015]]<br />
* [[SCS Reads Nominations|Past and current nominations for SCS Reads]]<br />
* [[SCS Reads -- past seminars]]</div>Georgeshttp://scs.math.yorku.ca/index.php/SCS_Reads_2019-2020SCS Reads 2019-20202019-09-13T20:29:52Z<p>Georges: /* Seminar on Friday, September 20, 2019 */</p>
<hr />
<div>* A topic need not occupy the entire academic year and we could plan to consider more than one topic.<br />
* To get an account to edit this wiki send a message to [mailto:georges@yorku.ca?subject=Account%20on%20SCS%20wiki Georges Monette].<br />
== Seminars 2019-20 ==<br />
<br />
== Seminar on Friday, September 20, 2019 ==<br />
At our meeting on September 13, we agreed to read and discuss the first two chapters of ''An Introduction to Statistical Learning'', which can be downloaded from [http://faculty.marshall.usc.edu/gareth-james/ISL/ this link]. We plan to then decide what to do next.<br />
<br />
== New candidate topics for 2019-2020 ==<br />
* '''Add suggestions here or send them to [mailto:georges@yorku.ca?subject=Account%20on%20SCS%20wiki Georges Monette] who can add them for you.'''<br />
<br />
* "Statistical learning": We could read either ''An Introduction to Statistical Learning'' [http://faculty.marshall.usc.edu/gareth-james/ISL/] or ''The Elements of Statistical Learning'' [https://web.stanford.edu/~hastie/ElemStatLearn/]; free .pdfs are available for both.<br />
"The books are based on the concept of “statistical learning,” a mashup of stats and machine learning. The field of machine learning is all about feeding huge amounts of data into algorithms to make accurate predictions. Statistics is concerned with predictions as well, says Tibshirani, but also with determining how confident we can be about the importance of certain inputs. This is important in areas like medicine, where a researcher doesn’t just want to know whether a medicine worked, but also why it worked. Statistical learning is meant to take the best ideas from machine learning and computer science, and explain how they can be used and interpreted through a statistician’s lens." See [https://getpocket.com/explore/item/these-are-the-best-books-for-learning-modern-statistics-and-they-re-all-free]<br />
<br />
== Candidates from 2018-2019 ==<br />
* '''Reproducibility of research: a crisis in Statistics??''': How should statistical practice be informed by current controversies about the ''replication crisis'' and countervailing moves toward open science and reproducibility? This was a big topic at the recent JSM 2018 conference. <br />
** Roger Peng, Reproducible Research in Computational Science, ''Science'', Vol. 334, Issue 6060, pp. 1226-1227 [http://science.sciencemag.org/content/334/6060/1226]<br />
** RC: This is timely in a couple of ways. Chris Green is teaching a course on the topic, so he or one of the students might be able to come to a meeting to discuss their views. We also have a QM Forum speaker on the topic in the Winter.<br />
** We can use articles in the special issue of the American Statistician devoted to current problems in statistical inference, particularly the use and interpretation of p-values and the concept of statistical significance. For an overview, see the [https://www.tandfonline.com/doi/full/10.1080/00031305.2019.1583913 editorial published in the special issue].<br />
* '''Big Data problems''': Another hot topic, but perhaps too broad. [MF: I don't know enough to specify it more clearly as a useful seminar topic.] [This would be interesting, but I would hope we could find materials that address the issue from a social science perspective]<br />
* Some statistical methods topics, not yet clearly articulated:<br />
** Clustering methods [RC: Could this fall under the Big Data label?]<br />
** Robustness<br />
** Mediation<br />
** Meta analysis in medicine: how can you tell whether a lit review is complete with logistic regression?! [RC: I think Meta-Analysis could be a good topic, including M-L/J-A discussing their research]<br />
* '''Consulting issues''': Practical aspects of statistical consulting [MF: Perhaps this would be a better topic for the SCS staff meetings ??] [RC: That would make a good topic for a business/staff meeting, if there were no consulting cases to discuss]<br />
* '''Survey Sampling''': Elucidating the mystery of bootstrap weights and how to use them when analyzing survey data, e.g. from Statistics Canada. <br />
* '''Evidence-based medicine''': Ideas and implications<br />
* '''Machine Learning, AI, Deep Learning''': An Overview <br />
* '''Disseminating technical information to non-technical audiences''': in consulting and in teaching. [Could this be put together with 'consulting issues'?]<br />
* '''Missing Data'''<br />
* '''Statistical Paradoxes and Fallacies''' [RC: There are some good "summary" articles related to statistical paradoxes and Georges' examples are always helpful]<br />
** Initial attempt at creating a list: [[Paradoxes, Fallacies and Other Surprises]]. Please add, modify or comment.<br />
<br />
== Links to past episodes of SCS Reads ==<br />
* [[SCS Reads 2018-2019|SCS Reads 2018-2019 Causality mainly]]<br />
* [[SCS Reads 2017-2018|SCS Reads 2017-2018 Counterfactuals and Causal Inference]]<br />
* [[SCS Reads 2016-2017|SCS Reads 2016-2017 Rethinking Statistics]]<br />
* [[SCS Reads 2015-2016|SCS Reads 2015-2016 Bayesian Statistics in the Social Sciences]]<br />
* [[SCS Reads 2014-2015]]<br />
* [[SCS Reads Nominations|Past and current nominations for SCS Reads]]<br />
* [[SCS Reads -- past seminars]]</div>Georgeshttp://scs.math.yorku.ca/index.php/SCS_Reads_2019-2020SCS Reads 2019-20202019-09-13T20:28:29Z<p>Georges: /* New candidates topics for 2019-2020 */</p>
<hr />
<div>* A topic need not occupy the entire academic year and we could plan to consider more than one topic.<br />
* To get an account to edit this wiki send a message to [mailto:georges@yorku.ca?subject=Account%20on%20SCS%20wiki Georges Monette].<br />
== Seminars 2019-20 ==<br />
<br />
== Seminar on Friday, September 20, 2019 ==<br />
At our meeting on September 13, we agreed to read and discuss the first two chapters of ''An Introduction to Statistical Learning'' [http://faculty.marshall.usc.edu/gareth-james/ISL/]. We plan to then decide what to do next.<br />
<br />
== New candidate topics for 2019-2020 ==<br />
* '''Add suggestions here or send them to [mailto:georges@yorku.ca?subject=Account%20on%20SCS%20wiki Georges Monette] who can add them for you.'''<br />
<br />
* "Statistical learning": We could read either ''An Introduction to Statistical Learning'' [http://faculty.marshall.usc.edu/gareth-james/ISL/] or ''The Elements of Statistical Learning'' [https://web.stanford.edu/~hastie/ElemStatLearn/]; free .pdfs are available for both.<br />
"The books are based on the concept of “statistical learning,” a mashup of stats and machine learning. The field of machine learning is all about feeding huge amounts of data into algorithms to make accurate predictions. Statistics is concerned with predictions as well, says Tibshirani, but also with determining how confident we can be about the importance of certain inputs. This is important in areas like medicine, where a researcher doesn’t just want to know whether a medicine worked, but also why it worked. Statistical learning is meant to take the best ideas from machine learning and computer science, and explain how they can be used and interpreted through a statistician’s lens." See [https://getpocket.com/explore/item/these-are-the-best-books-for-learning-modern-statistics-and-they-re-all-free]<br />
<br />
== Candidates from 2018-2019==<br />
* '''Reproducibility of research: a crisis in Statistics??''': How should statistical practice be informed by current controversies about the ''replication crisis'' and countervailing moves toward open science and reproducibility? This was a big topic at the recent JSM 2018 conference. <br />
** Roger Peng, Reproducible Research in Computational Science, ''Science'', Vol. 334, Issue 6060, pp. 1226-1227 [http://science.sciencemag.org/content/334/6060/1226]<br />
** RC: This is timely in a couple of ways. Chris Green is teaching a course on the topic, so he or one of the students might be able to come to a meeting to discuss their views. We also have a QM Forum speaker on the topic in the Winter.
** We can use articles in the special issue of the American Statistician devoted to current problems in statistical inference, particularly the use and interpretation of p-values and the concept of statistical significance. For an overview, see the [https://www.tandfonline.com/doi/full/10.1080/00031305.2019.1583913 editorial published in the special issue].<br />
* '''Big Data problems''': Another hot topic, but perhaps too broad. [MF: I don't know enough to specify it more clearly as a useful seminar topic.] [This would be interesting, but I would hope we could find materials that address the issue from a social science perspective]<br />
* Some statistical methods topics, not yet clearly articulated:<br />
** Clustering methods [RC: Could this fall under the Big Data label?]<br />
** Robustness<br />
** Mediation<br />
** Meta analysis in medicine: how can you tell whether a lit review is complete with logistic regression?! [RC: I think Meta-Analysis could be a good topic, including M-L/J-A discussing their research]<br />
* '''Consulting issues''': Practical aspects of statistical consulting [MF: Perhaps this would be a better topic for the SCS staff meetings ??] [RC: That would make a good topic for a business/staff meeting, if there were no consulting cases to discuss]<br />
* '''Survey Sampling''': Elucidating the mystery of bootstrap weights and how to use them when analyzing survey data, e.g. from Statistics Canada. <br />
* '''Evidence-based medicine''': Ideas and implications<br />
* '''Machine Learning, AI, Deep Learning''': An Overview <br />
* '''Disseminating technical information to non-technical audiences''': in consulting and in teaching. [Could this be put together with 'consulting issues'?]<br />
* '''Missing Data'''<br />
* '''Statistical Paradoxes and Fallacies''' [RC: There are some good "summary" articles related to statistical paradoxes and Georges' examples are always helpful]<br />
** Initial attempt at creating a list: [[Paradoxes, Fallacies and Other Surprises]]. Please add, modify or comment.<br />
<br />
== Links to past episodes of SCS Reads ==<br />
* [[SCS Reads 2018-2019|SCS Reads 2018-2019 Causality mainly]]<br />
* [[SCS Reads 2017-2018|SCS Reads 2017-2018 Counterfactuals and Causal Inference]]<br />
* [[SCS Reads 2016-2017|SCS Reads 2016-2017 Rethinking Statistics]]<br />
* [[SCS Reads 2015-2016|SCS Reads 2015-2016 Bayesian Statistics in the Social Sciences]]<br />
* [[SCS Reads 2014-2015]]<br />
* [[SCS Reads Nominations|Past and current nominations for SCS Reads]]<br />
* [[SCS Reads -- past seminars]]</div>Georgeshttp://scs.math.yorku.ca/index.php/SCS_Reads_2019-2020SCS Reads 2019-20202019-09-02T17:05:39Z<p>Georges: /* Candidates from 2018-2019 */</p>
<hr />
<div>* A topic need not occupy the entire academic year and we could plan to consider more than one topic.<br />
* To get an account to edit this wiki send a message to [mailto:georges@yorku.ca?subject=Account%20on%20SCS%20wiki Georges Monette].<br />
== New candidates topics for 2019-2020 ==<br />
* '''Add suggestions here or send them to [mailto:georges@yorku.ca?subject=Account%20on%20SCS%20wiki Georges Monette] who can add them for you.'''<br />
<br />
== Candidates from 2018-2019==<br />
* '''Reproducibility of research: a crisis in Statistics??''': How should statistical practice be informed by current controversies about the ''replication crisis'' and countervailing moves toward open science and reproducibility? This was a big topic at the recent JSM 2018 conference. <br />
** Roger Peng, Reproducible Research in Computational Science, ''Science'', Vol. 334, Issue 6060, pp. 1226-1227 [http://science.sciencemag.org/content/334/6060/1226]<br />
** RC: This is timely in a couple of ways. Chris Green is teaching a course on the topic, so he or one of the students might be able to come to a meeting to discuss their views. We also have a QM Forum speaker on the topic in the Winter.
** We can use articles in the special issue of the American Statistician devoted to current problems in statistical inference, particularly the use and interpretation of p-values and the concept of statistical significance. For an overview, see the [https://www.tandfonline.com/doi/full/10.1080/00031305.2019.1583913 editorial published in the special issue].<br />
* '''Big Data problems''': Another hot topic, but perhaps too broad. [MF: I don't know enough to specify it more clearly as a useful seminar topic.] [This would be interesting, but I would hope we could find materials that address the issue from a social science perspective]<br />
* Some statistical methods topics, not yet clearly articulated:<br />
** Clustering methods [RC: Could this fall under the Big Data label?]<br />
** Robustness<br />
** Mediation<br />
** Meta analysis in medicine: how can you tell whether a lit review is complete with logistic regression?! [RC: I think Meta-Analysis could be a good topic, including M-L/J-A discussing their research]<br />
* '''Consulting issues''': Practical aspects of statistical consulting [MF: Perhaps this would be a better topic for the SCS staff meetings ??] [RC: That would make a good topic for a business/staff meeting, if there were no consulting cases to discuss]<br />
* '''Survey Sampling''': Elucidating the mystery of bootstrap weights and how to use them when analyzing survey data, e.g. from Statistics Canada. <br />
* '''Evidence-based medicine''': Ideas and implications<br />
* '''Machine Learning, AI, Deep Learning''': An Overview <br />
* '''Disseminating technical information to non-technical audiences''': in consulting and in teaching. [Could this be put together with 'consulting issues'?]<br />
* '''Missing Data'''<br />
* '''Statistical Paradoxes and Fallacies''' [RC: There are some good "summary" articles related to statistical paradoxes and Georges' examples are always helpful]<br />
** Initial attempt at creating a list: [[Paradoxes, Fallacies and Other Surprises]]. Please add, modify or comment.<br />
<br />
== Links to past episodes of SCS Reads ==<br />
* [[SCS Reads 2018-2019|SCS Reads 2018-2019 Causality mainly]]<br />
* [[SCS Reads 2017-2018|SCS Reads 2017-2018 Counterfactuals and Causal Inference]]<br />
* [[SCS Reads 2016-2017|SCS Reads 2016-2017 Rethinking Statistics]]<br />
* [[SCS Reads 2015-2016|SCS Reads 2015-2016 Bayesian Statistics in the Social Sciences]]<br />
* [[SCS Reads 2014-2015]]<br />
* [[SCS Reads Nominations|Past and current nominations for SCS Reads]]<br />
* [[SCS Reads -- past seminars]]</div>Georgeshttp://scs.math.yorku.ca/index.php/SCS_Reads_2019-2020SCS Reads 2019-20202019-09-02T17:02:33Z<p>Georges: /* Candidates from 2018-2019 */</p>
<hr />
<div>* A topic need not occupy the entire academic year and we could plan to consider more than one topic.<br />
* To get an account to edit this wiki send a message to [mailto:georges@yorku.ca?subject=Account%20on%20SCS%20wiki Georges Monette].<br />
== New candidates topics for 2019-2020 ==<br />
* '''Add suggestions here or send them to [mailto:georges@yorku.ca?subject=Account%20on%20SCS%20wiki Georges Monette] who can add them for you.'''<br />
<br />
== Candidates from 2018-2019==<br />
* '''Reproducibility of research: a crisis in Statistics??''': How should statistical practice be informed by current controversies about the ''replication crisis'' and countervailing moves toward open science and reproducibility? This was a big topic at the recent JSM 2018 conference. <br />
** Roger Peng, Reproducible Research in Computational Science, ''Science'', Vol. 334, Issue 6060, pp. 1226-1227 [http://science.sciencemag.org/content/334/6060/1226]<br />
** RC: This is timely in a couple of ways. Chris Green is teaching a course on the topic, so he or one of the students might be able to come to a meeting to discuss their views. We also have a QM Forum speaker on the topic in the Winter.
** We can use articles in the special issue of the American Statistician devoted to current problems in statistical inference, particularly the use and interpretation of p-values and the concept of statistical significance.<br />
* '''Big Data problems''': Another hot topic, but perhaps too broad. [MF: I don't know enough to specify it more clearly as a useful seminar topic.] [This would be interesting, but I would hope we could find materials that address the issue from a social science perspective]<br />
* Some statistical methods topics, not yet clearly articulated:<br />
** Clustering methods [RC: Could this fall under the Big Data label?]<br />
** Robustness<br />
** Mediation<br />
** Meta analysis in medicine: how can you tell whether a lit review is complete with logistic regression?! [RC: I think Meta-Analysis could be a good topic, including M-L/J-A discussing their research]<br />
* '''Consulting issues''': Practical aspects of statistical consulting [MF: Perhaps this would be a better topic for the SCS staff meetings ??] [RC: That would make a good topic for a business/staff meeting, if there were no consulting cases to discuss]<br />
* '''Survey Sampling''': Elucidating the mystery of bootstrap weights and how to use them when analyzing survey data, e.g. from Statistics Canada. <br />
* '''Evidence-based medicine''': Ideas and implications<br />
* '''Machine Learning, AI, Deep Learning''': An Overview <br />
* '''Disseminating technical information to non-technical audiences''': in consulting and in teaching. [Could this be put together with 'consulting issues'?]<br />
* '''Missing Data'''<br />
* '''Statistical Paradoxes and Fallacies''' [RC: There are some good "summary" articles related to statistical paradoxes and Georges' examples are always helpful]<br />
** Initial attempt at creating a list: [[Paradoxes, Fallacies and Other Surprises]]. Please add, modify or comment.<br />
<br />
== Links to past episodes of SCS Reads ==<br />
* [[SCS Reads 2018-2019|SCS Reads 2018-2019 Causality mainly]]<br />
* [[SCS Reads 2017-2018|SCS Reads 2017-2018 Counterfactuals and Causal Inference]]<br />
* [[SCS Reads 2016-2017|SCS Reads 2016-2017 Rethinking Statistics]]<br />
* [[SCS Reads 2015-2016|SCS Reads 2015-2016 Bayesian Statistics in the Social Sciences]]<br />
* [[SCS Reads 2014-2015]]<br />
* [[SCS Reads Nominations|Past and current nominations for SCS Reads]]<br />
* [[SCS Reads -- past seminars]]</div>Georgeshttp://scs.math.yorku.ca/index.php/Paradoxes,_Fallacies_and_Other_SurprisesParadoxes, Fallacies and Other Surprises2019-09-01T17:44:06Z<p>Georges: /* Base rate paradoxes */</p>
<hr />
<div>An attempt at a taxonomy of statistical paradoxes, fallacies and other surprises.<br />
<br />
== Simpson's Paradox ==<br />
Classical examples:<br />
* '''Florida capital sentencing''' of convicted murderers: Suppression effect of a confounder: No marginal relationship between race of accused and rate of capital sentencing but the relationship becomes very strong when controlling for a confounding factor: the race of victim.<br />
* '''Berkeley graduate admissions''': Overall lower rate of acceptance for female candidates but little or no gender effect within departments. Women tend to apply to departments that are harder to get into (have low admission rates) for both men and women. By controlling for departments the appearance of gender discrimination disappears. But is this the right analysis? That depends on the mechanism through which discrimination occurs. If university budgeting decisions systematically favour departments that teach topics that are more appealing to men, then ''department'' is a mediator and controlling for departments masks the effect of discrimination. We need different models to identify 'micro-discrimination' at the departmental level and 'macro-discrimination' at the university level. This is analogous to asking whether ''department'' should be treated as a confounder (and included in the model) or as a mediator (and excluded) in order to identify a causal effect of gender. Conditioning is not automatically the right thing to do. So the right model to estimate discrimination depends on the mechanism of discrimination. For mechanisms in which department is a mediator or a collider, it must be omitted. For mechanisms in which it is a confounder, it must be included.<br />
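
The admissions pattern described above can be reproduced with a small table. The sketch below uses made-up counts (not the actual Berkeley data) for two hypothetical departments, and compares admission rates within departments with the rates obtained by pooling over departments.

```python
# Hypothetical counts chosen for illustration (NOT the actual Berkeley data).
# Each entry is (applicants, admitted).
applications = {
    "easy department": {"men": (80, 48), "women": (20, 12)},
    "hard department": {"men": (20, 4), "women": (80, 16)},
}


def admission_rate(applicants, admitted):
    return admitted / applicants


# Conditional (within-department) admission rates by gender.
within_rates = {
    dept: {gender: admission_rate(*counts) for gender, counts in groups.items()}
    for dept, groups in applications.items()
}


def marginal_rate(gender):
    """Admission rate for one gender, pooling over departments."""
    applicants = sum(groups[gender][0] for groups in applications.values())
    admitted = sum(groups[gender][1] for groups in applications.values())
    return admission_rate(applicants, admitted)


overall_rates = {gender: marginal_rate(gender) for gender in ("men", "women")}

for dept, rates in within_rates.items():
    print(dept, rates)
print("overall", overall_rates)
```

With these counts men and women are admitted at the same rate within each department (60% and 20%), yet the pooled rates are 52% for men and 28% for women, because most women apply to the harder department. Whether the within-department or the pooled comparison answers the question of interest is the causal issue discussed above; the arithmetic alone cannot settle it.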
<br />
== Birth-Weight Paradox ==<br />
Example of conditioning, or selection, on a collider variable.<br />
<br />
== Regression Paradox ==<br />
Discrepancy between global and local perceptions of relationships. Some classical examples:<br />
* Globally the distribution of heights can remain the same from generation to generation although tall parents get the impression that their children are shorter than themselves and short parents get the impression that their children are taller than themselves on average. Also, tall children get the impression that their parents are shorter than themselves and short children get the impression that their parents are taller than themselves on average. Thus parents get the impression, perfectly legitimately from their point of view, that the distribution of heights is being compressed towards the mean '''and''' children get the impression, also perfectly legitimately, that their parents' heights were more compressed towards the mean.<br />
* Kahneman's pilot instructors got the impression that criticism improved performance of student pilots while praise made it worse. Kahneman thought the causal effect should be in the opposite direction. Regression to the mean allows one to see how Kahneman's belief and the pilot instructors' impression are not inconsistent. The resolution of the paradox lies partly in realizing that the instructors are noticing an 'observational' relationship that is in the opposite direction to the possible causal relationship. Thus there's a connection with Simpson's Paradox.<br />
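
The heights example can be checked by simulation. The sketch below is only an illustration: the mean, standard deviation and parent-child correlation are assumed values, and heights are drawn from a bivariate normal distribution with identical margins in both generations.

```python
import random
import statistics

# All numbers are hypothetical: heights in cm, parent-child correlation 0.5.
MEAN, SD, RHO, N = 170.0, 7.0, 0.5, 20_000
rng = random.Random(1)

pairs = []
for _ in range(N):
    z_parent = rng.gauss(0, 1)
    z_noise = rng.gauss(0, 1)
    # The child has the same marginal distribution as the parent,
    # with correlation RHO between the two.
    z_child = RHO * z_parent + (1 - RHO ** 2) ** 0.5 * z_noise
    pairs.append((MEAN + SD * z_parent, MEAN + SD * z_child))

parents = [p for p, _ in pairs]
children = [c for _, c in pairs]
tall = MEAN + SD  # "tall" means more than one SD above the mean

# Parents' view: tall parents and their children.
tall_parent_mean = statistics.fmean([p for p, c in pairs if p > tall])
their_children_mean = statistics.fmean([c for p, c in pairs if p > tall])

# Children's view: tall children and their parents.
tall_child_mean = statistics.fmean([c for p, c in pairs if c > tall])
their_parents_mean = statistics.fmean([p for p, c in pairs if c > tall])

print("SD of parents and children:", statistics.stdev(parents), statistics.stdev(children))
print("tall parents vs their children:", tall_parent_mean, their_children_mean)
print("tall children vs their parents:", tall_child_mean, their_parents_mean)
```

Both conditional comparisons show regression towards the mean, while the two marginal standard deviations stay essentially equal, so neither generation is actually more compressed than the other.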
<br />
== Lord's Paradox ==<br />
When comparing two groups using a pretest and a posttest, should we compare gain scores between groups or should we regress the posttest on groups using the pretest as a covariate?<br />
<br />
Lord (1967) originally graphed a hypothetical scenario in which males and females were weighed at the beginning and end of an academic year, with gender tested as a predictor of weight gain in response to the cafeteria diet provided at the school. At the start of the school year, females had a lower average weight than males, but the group averages were about the same at the end of the school year. <br />
A difference score model, which uses post-pre as the outcome, concluded that there was no difference in weight gain between males and females, and therefore no systematic influence of gender on changes in weight. <br />
An ANCOVA—that regresses posttest weight on pretest scores as well as the gender predictor—concluded that wherever males and females start the school year with the same initial weight, males are predicted to gain significantly more weight by end of year, leading to the conclusion that gender has a substantial influence on weight gain. <br />
<br />
A more recent example from MLB 2016 data:<br />
Wright (2017) compared the change in batting averages from the first half of Major League Baseball’s 2016 season to the second half, comparing pitchers and position players. ANCOVA that covaries the initial average out concluded that wherever a pitcher and a position player start with the same first half batting average, the position players are predicted to have a higher second half average. <br />
However, the data itself indicates that pitchers actually improve slightly, from .143 in the first half to .166 in the second half, while the position players get slightly worse, from .267 in the first half to .262 in the second half. The gain score approach concludes no difference in the change in batting averages between position players and pitchers. <br />
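
Lord's hypothetical weight scenario can be imitated with simulated data. The sketch below assumes two groups whose mean weights differ but do not change over the year, with a within-group pre/post correlation below 1; all numerical values are invented for illustration. The ANCOVA group effect is computed from the standard identity for a common-slope model: the difference in posttest means minus the pooled within-group slope times the difference in pretest means.

```python
import random
import statistics

rng = random.Random(2)
N, SD, RHO = 5_000, 8.0, 0.6                  # hypothetical values
GROUP_MEANS = {"female": 60.0, "male": 75.0}   # hypothetical weights in kg


def simulate(mean):
    """Pre/post weights with the same mean and SD at both times."""
    rows = []
    for _ in range(N):
        z_pre = rng.gauss(0, 1)
        z_post = RHO * z_pre + (1 - RHO ** 2) ** 0.5 * rng.gauss(0, 1)
        rows.append((mean + SD * z_pre, mean + SD * z_post))
    return rows


data = {g: simulate(m) for g, m in GROUP_MEANS.items()}
pre_mean = {g: statistics.fmean([pre for pre, _ in rows]) for g, rows in data.items()}
post_mean = {g: statistics.fmean([post for _, post in rows]) for g, rows in data.items()}

# Gain-score analysis: difference between groups in mean (post - pre).
gain_diff = (post_mean["male"] - pre_mean["male"]) - (
    post_mean["female"] - pre_mean["female"]
)

# ANCOVA with a common slope: pooled within-group slope of post on pre.
sxy = sum(
    (pre - pre_mean[g]) * (post - post_mean[g])
    for g, rows in data.items()
    for pre, post in rows
)
sxx = sum((pre - pre_mean[g]) ** 2 for g, rows in data.items() for pre, _ in rows)
pooled_slope = sxy / sxx

# Group coefficient of the ANCOVA (male minus female at equal pretest weight).
ancova_diff = (post_mean["male"] - post_mean["female"]) - pooled_slope * (
    pre_mean["male"] - pre_mean["female"]
)

print("gain-score difference:", gain_diff)
print("ANCOVA-adjusted difference:", ancova_diff)
```

Under these assumptions the gain-score difference is close to zero while the ANCOVA-adjusted difference is clearly positive, which is the disagreement Lord described. The two analyses answer different questions, so neither number is wrong as arithmetic.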
<br />
== Base rate paradoxes ==<br />
'''Prosecutor's Fallacy''': The p-value to test the hypothesis that Sally Clark was innocent of the murder of her two children could legitimately be interpreted as being in the vicinity of 1/100,000. However there's also a legitimate argument that the probability of her innocence is very close to 1. These two seemingly contradictory results are not inconsistent with each other.<br />
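
The way a tiny p-value can coexist with a high probability of innocence follows from Bayes' rule. The sketch below uses purely illustrative inputs: only the 1/100,000 figure comes from the text above, and the prior probability of guilt and the likelihood under guilt are assumptions made up for this example, not estimates from the case.

```python
def posterior_innocence(p_evidence_if_innocent, p_evidence_if_guilty, prior_guilt):
    """Bayes' rule for P(innocent | evidence)."""
    prior_innocence = 1 - prior_guilt
    innocent_path = p_evidence_if_innocent * prior_innocence
    guilty_path = p_evidence_if_guilty * prior_guilt
    return innocent_path / (innocent_path + guilty_path)


# Illustrative inputs only; none of these is an estimate from the actual case.
p_if_innocent = 1 / 100_000  # probability of the evidence under innocence (figure from the text)
p_if_guilty = 1.0            # assumed: the evidence is certain under guilt
rare_prior_guilt = 1e-8      # assumed: the alternative explanation is rarer still

print(posterior_innocence(p_if_innocent, p_if_guilty, rare_prior_guilt))

# Treating guilt and innocence as equally likely a priori ignores the base rate.
print(posterior_innocence(p_if_innocent, p_if_guilty, 0.5))
```

With the rare prior, the posterior probability of innocence stays above 0.99; with a 50:50 prior it collapses to about 1/100,000. The evidence is the same in both cases, and only the base rate differs.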
<br />
'''Representativeness heuristic''': This is a concept formulated by Tversky and Kahneman. One could view the heuristic as amounting to forming judgements based on relative likelihood, thus ignoring the base rate or 'prior', the foundational fallacy of frequentism. <br />
<br />
'''Stereotyping''': [https://www.sciencedirect.com/science/article/pii/0022103182900798 Social stereotypes and judgments of individuals: An instance of the base-rate fallacy]<br />
<br />
== Prisoner's Paradox==<br />
Also known as the Monty Hall problem or the Principle of Restricted Choice in bridge. This is a very revealing paradox that illustrates the importance of taking into account the probabilistic mechanism that generated information, in addition to the information itself, if the information does not induce a partition of the space of possibilities.<br />
<br />
It illustrates the crucial role of statistical modelling, but perhaps not in a way that is supportive of frequentist inference.<br />
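
The role of the mechanism that generated the information can be seen in a short simulation of the Monty Hall version. The sketch below compares two hypothetical hosts: one who knows where the car is and always opens a goat door, and one who opens one of the other two doors at random, with games in which the car is revealed being discarded. The function name and parameters are ours.

```python
import random


def switch_win_rate(host_knows, trials=100_000, seed=0):
    """Fraction of retained games in which switching doors wins the car."""
    rng = random.Random(seed)
    wins = kept = 0
    for _ in range(trials):
        car = rng.randrange(3)
        pick = rng.randrange(3)
        others = [d for d in range(3) if d != pick]
        if host_knows:
            # The host deliberately opens a door hiding a goat.
            opened = rng.choice([d for d in others if d != car])
        else:
            # The host opens one of the other doors at random.
            opened = rng.choice(others)
            if opened == car:
                continue  # the car was revealed; this game is discarded
        kept += 1
        switched_to = next(d for d in others if d != opened)
        wins += switched_to == car
    return wins / kept


print("host knows:", switch_win_rate(True))
print("host guesses:", switch_win_rate(False))
```

In both versions the contestant sees exactly the same thing, a goat behind an opened door, but switching wins about 2/3 of the time when the host knows and about 1/2 of the time when the host guesses.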
<br />
== Weighting Paradoxes ==<br />
Students at a university report an average class size of 100. Professors report an average class size of 50. Are students likely to be exaggerating and professors underestimating the size of their classes?<br />
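
Both reports can be accurate. The sketch below uses a hypothetical set of five class sizes, chosen so that the two averages match the figures in the question: professors average over classes, while students average over students, which weights each class by its own size.

```python
# Hypothetical class sizes, chosen so the two averages match the text.
class_sizes = [25, 25, 25, 25, 150]

# Professors' view: each class counts once.
professor_average = sum(class_sizes) / len(class_sizes)

# Students' view: a class of size n is reported by n students,
# so each class is weighted by its own size.
student_average = sum(n * n for n in class_sizes) / sum(class_sizes)

print("average reported by professors:", professor_average)
print("average reported by students:", student_average)
```

Here the professors' average is 50 and the students' average is 100 without anyone misreporting.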
<br />
== Paradoxes of measures of central tendency ==<br />
A random sample of 100 taxpayers reveals an average income of $30,000 although the government knows that the average income is $60,000. Is the sample likely to be biased and/or respondents understating their income? Or is there another plausible explanation?<br />
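
One plausible explanation is extreme skewness. The sketch below assumes a deliberately stylized population, invented for illustration, in which a very small fraction of taxpayers has very large incomes, and computes how often a random sample of 100 would contain none of them.

```python
# Hypothetical population: 99.9% earn $30,000 and 0.1% earn $30,030,000.
p_high = 0.001
low_income, high_income = 30_000, 30_030_000

population_mean = (1 - p_high) * low_income + p_high * high_income
population_median = low_income  # more than half the population earns this

# Probability that a random sample of 100 contains no high earner,
# in which case the sample mean is exactly $30,000.
sample_size = 100
p_no_high_earner = (1 - p_high) ** sample_size

print("population mean:", population_mean)
print("population median:", population_median)
print("P(sample of 100 has mean $30,000):", p_no_high_earner)
```

In this stylized population the mean is $60,000, yet roughly 90% of unbiased random samples of 100 would have a mean of exactly $30,000, so the discrepancy need not indicate bias or misreporting.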
<br />
== Inference Paradoxes ==<br />
It is possible to build a model in which the parameter space and the support are both equal to the natural numbers and in which a confidence procedure that has at least 2/3 probability of coverage for all <math>\theta</math> (i.e. 2/3 confidence) has only 1/3 posterior probability for all <math>y</math> under a uniform prior for <math>\theta</math>. Thus confidence and credibility can be strongly inconsistent, in contrast with the intuition based on compact models in which mean credibility must equal mean confidence.<br />
<br />
<!-- Conditioning paradoxes --></div>Georgeshttp://scs.math.yorku.ca/index.php/Paradoxes,_Fallacies_and_Other_SurprisesParadoxes, Fallacies and Other Surprises2019-09-01T17:43:22Z<p>Georges: /* Regression Paradox */</p>
<hr />
<div>An attempt at a taxonomy of statistical paradoxes, fallacies and other surprises.<br />
<br />
== Simpson's Paradox ==<br />
Classical examples:<br />
* '''Florida capital sentencing''' of convicted murderers: Suppression effect of a confounder: No marginal relationship between race of accused and rate of capital sentencing but the relationship becomes very strong when controlling for a confounding factor: the race of victim.<br />
* '''Berkeley graduate admissions''': Overall lower rate of acceptance for female candidates but little or no gender effect within departments. Women tend to apply to departments that are harder to get into (have low admission rates) for both men and women. By controlling for departments the appearance of gender discrimination disappears. But is this the right analysis? That depends on the mechanism through which discrimination occurs. If university budgeting decisions systematically favour departments that teach topics that are more appealing to men, then ''department'' is a mediator and controlling for departments masks the effect of discrimination. We need different models to identify 'micro-discrimination' at the departmental level and 'macro-discrimination' at the university level. This is analogous to asking whether ''department'' should be treated as a confounder (and included in the model) or as a mediator (and excluded) in order to identify a causal effect of gender. Conditioning is not automatically the right thing to do. So the right model to estimate discrimination depends on the mechanism of discrimination. For mechanisms in which department is a mediator or a collider, it must be omitted. For mechanisms in which it is a confounder, it must be included.<br />
<br />
== Birth-Weight Paradox ==<br />
Example of conditioning, or selection, on a collider variable.<br />
<br />
== Regression Paradox ==<br />
Discrepancy between global and local perceptions of relationships. Some classical examples:<br />
* Globally the distribution of heights can remain the same from generation to generation although tall parents get the impression that their children are shorter than themselves and short parents get the impression that their children are taller than themselves on average. Also, tall children get the impression that their parents are shorter than themselves and short children get the impression that their parents are taller than themselves on average. Thus parents get the impression, perfectly legitimately from their point of view, that the distribution of heights is being compressed towards the mean '''and''' children get the impression, also perfectly legitimately, that their parents' heights were more compressed towards the mean.<br />
* Kahneman's pilot instructors got the impression that criticism improved performance of student pilots while praise made it worse. Kahneman thought the causal effect should be in the opposite direction. Regression to the mean allows one to see how Kahneman's belief and the pilot instructors' impression are not inconsistent. The resolution of the paradox lies partly in realizing that the instructors are noticing an 'observational' relationship that is in the opposite direction to the possible causal relationship. Thus there's a connection with Simpson's Paradox.<br />
<br />
== Lord's Paradox ==<br />
When comparing two groups using a pretest and a posttest, should we compare gain scores between groups or should we regress the posttest on groups using the pretest as a covariate?<br />
<br />
Lord (1967) originally graphed a hypothetical scenario in which males and females were weighed at the beginning and end of an academic year, with gender tested as a predictor of weight gain in response to the cafeteria diet provided at the school. At the start of the school year, females had a lower average weight than males, but the group averages were about the same at the end of the school year. <br />
A difference score model, which uses post-pre as the outcome, concluded that there was no difference in weight gain between males and females, and therefore no systematic influence of gender on changes in weight. <br />
An ANCOVA—that regresses posttest weight on pretest scores as well as the gender predictor—concluded that wherever males and females start the school year with the same initial weight, males are predicted to gain significantly more weight by end of year, leading to the conclusion that gender has a substantial influence on weight gain. <br />
<br />
A more recent example from MLB 2016 data:<br />
Wright (2017) compared the change in batting averages from the first half of Major League Baseball’s 2016 season to the second half, comparing pitchers and position players. ANCOVA that covaries the initial average out concluded that wherever a pitcher and a position player start with the same first half batting average, the position players are predicted to have a higher second half average. <br />
However, the data itself indicates that pitchers actually improve slightly, from .143 in the first half to .166 in the second half, while the position players get slightly worse, from .267 in the first half to .262 in the second half. The gain score approach concludes no difference in the change in batting averages between position players and pitchers. <br />
<br />
== Base rate paradoxes ==<br />
'''Prosecutor's Fallacy''': The p-value to test the hypothesis that Sally Clark was innocent of the murder of her two children could legitimately be interpreted as being in the vicinity of 1/100,000. However there's also a legitimate argument that the probability of her innocence is very close to 1. These two seemingly contradictory results are not inconsistent with each other.<br />
<br />
'''Representativeness heuristic''': This is a concept formulated by Tversky and Kahneman. One could view the heuristic as amounting to forming judgements based on relative likelihood, thus ignoring the base rate or 'prior', the foundational fallacy of frequentism. <br />
<br />
'''Stereotyping''': [https://www.sciencedirect.com/science/article/pii/0022103182900798 Social stereotypes and judgments of individuals: An instance of the base-rate fallacy]<br />
<br />
== Prisoner's Paradox==<br />
Also known as the Monty Hall problem or the Principle of Restricted Choice in bridge. This is a very revealing paradox that illustrates the importance of taking into account the probabilistic mechanism that generated information, in addition to the information itself, if the information does not induce a partition of the space of possibilities.<br />
<br />
It illustrates the crucial role of statistical modelling, but perhaps not in a way that is supportive of frequentist inference.<br />
<br />
== Weighting Paradoxes ==<br />
Students at a university report an average class size of 100. Professors report an average class size of 50. Are students likely to be exaggerating and professors underestimating the size of their classes?<br />
<br />
== Paradoxes of measures of central tendency ==<br />
A random sample of 100 taxpayers reveals an average income of $30,000 although the government knows that the average income is $60,000. Is the sample likely to be biased and/or respondents understating their income? Or is there another plausible explanation?<br />
<br />
== Inference Paradoxes ==<br />
It is possible to build a model in which the parameter space and the support are both equal to the natural numbers and in which a confidence procedure that has at least 2/3 probability of coverage for all <math>\theta</math> (i.e. 2/3 confidence) has only 1/3 posterior probability for all <math>y</math> under a uniform prior for <math>\theta</math>. Thus confidence and credibility can be strongly inconsistent, in contrast with the intuition based on compact models in which mean credibility must equal mean confidence.<br />
<br />
<!-- Conditioning paradoxes --></div>Georgeshttp://scs.math.yorku.ca/index.php/SCS_Reads_2019-2020SCS Reads 2019-20202019-09-01T17:40:56Z<p>Georges: /* New candidates topics for 2019-2020 */</p>
<hr />
<div>* A topic need not occupy the entire academic year and we could plan to consider more than one topic.<br />
* To get an account to edit this wiki send a message to [mailto:georges@yorku.ca?subject=Account%20on%20SCS%20wiki Georges Monette].<br />
== New candidates topics for 2019-2020 ==<br />
* '''Add suggestions here or send them to [mailto:georges@yorku.ca?subject=Account%20on%20SCS%20wiki Georges Monette] who can add them for you.'''<br />
<br />
== Candidates from 2018-2019==<br />
* '''Reproducibility of research: a crisis in Statistics??''': How should statistical practice be informed by current controversies about the ''replication crisis'' and countervailing moves toward open science and reproducibility? This was a big topic at the recent JSM 2018 conference. <br />
** Roger Peng, Reproducible Research in Computational Science, ''Science'', Vol. 334, Issue 6060, pp. 1226-1227 [http://science.sciencemag.org/content/334/6060/1226]<br />
** RC: This is timely in a couple of ways. Chris Green is teaching a course on the topic, so he or one of the students might be able to come to a meeting to discuss their views. We also have a QM Forum speaker on the topic in the Winter.
** We can use articles in the special issue of the American Statistician devoted to current problems in statistical inference, particularly the use and interpretation of p-values and the concept of statistical significance.<br />
* '''Big Data problems''': Another hot topic, but perhaps too broad. [MF: I don't know enough to specify it more clearly as a useful seminar topic.] [This would be interesting, but I would hope we could find materials that address the issue from a social science perspective]<br />
* Some statistical methods topics, not yet clearly articulated:<br />
** Clustering methods [RC: Could this fall under the Big Data label?]<br />
** Robustness<br />
** Mediation<br />
** Meta analysis in medicine: how can you tell whether a lit review is complete with logistic regression?! [RC: I think Meta-Analysis could be a good topic, including M-L/J-A discussing their research]<br />
* '''Consulting issues''': Practical aspects of statistical consulting [MF: Perhaps this would be a better topic for the SCS staff meetings ??] [RC: That would make a good topic for a business/staff meeting, if there were no consulting cases to discuss]<br />
* '''Survey Sampling''': Elucidating the mystery of bootstrap weights and how to use them when analyzing survey data, e.g. from Statistics Canada. <br />
* '''Evidence-based medicine''': Ideas and implications<br />
* '''Machine Learning, AI, Deep Learning''': An Overview <br />
* '''Disseminating technical information to non-technical audiences''': in consulting and in teaching. [Could this be put together with 'consulting issues'?]<br />
* '''Missing Data'''<br />
* '''Statistical Paradoxes and Fallacies''' [RC: There are some good "summary" articles related to statistical paradoxes and Georges' examples are always helpful]<br />
** Initial attempt at creating a list: [[Paradoxes, Fallacies and Other Surprises]]. Please add, modify or comment.<br />
== Links to past episodes of SCS Reads ==<br />
* [[SCS Reads 2018-2019|SCS Reads 2018-2019 Causality mainly]]<br />
* [[SCS Reads 2017-2018|SCS Reads 2017-2018 Counterfactuals and Causal Inference]]<br />
* [[SCS Reads 2016-2017|SCS Reads 2016-2017 Rethinking Statistics]]<br />
* [[SCS Reads 2015-2016|SCS Reads 2015-2016 Bayesian Statistics in the Social Sciences]]<br />
* [[SCS Reads 2014-2015]]<br />
* [[SCS Reads Nominations|Past and current nominations for SCS Reads]]<br />
* [[SCS Reads -- past seminars]]</div>Georgeshttp://scs.math.yorku.ca/index.php/SCS_Reads_2019-2020SCS Reads 2019-20202019-09-01T17:40:36Z<p>Georges: /* New candidates topics for 2019-2020 */</p>
<hr />
<div>* A topic need not occupy the entire academic year and we could plan to consider more than one topic.<br />
* To get an account to edit this wiki send a message to [mailto:georges@yorku.ca?subject=Account%20on%20SCS%20wiki Georges Monette].<br />
== New candidates topics for 2019-2020 ==<br />
* '''Add suggestions here or send them to [mailto:georges@yorku.ca?subject=Account%20on%20SCS%20wiki Georges Monette].'''<br />
<br />
== Candidates from 2018-2019==<br />
* '''Reproducibility of research: a crisis in Statistics??''': How should statistical practice be informed by current controversies about the ''replication crisis'' and countervailing moves toward open science and reproducibility? This was a big topic at the recent JSM 2018 conference. <br />
** Roger Peng, Reproducible Research in Computational Science, ''Science'', Vol. 334, Issue 6060, pp. 1226-1227 [http://science.sciencemag.org/content/334/6060/1226]<br />
** RC: This is timely in a couple of ways. Chris Green is teaching a course on the topic, so he or one of the students might be able to come to a meeting to discuss their views. We also have a QM Forum speaker on the topic in the Winter.<br />
** We can use articles in the special issue of ''The American Statistician'' devoted to current problems in statistical inference, particularly the use and interpretation of p-values and the concept of statistical significance.<br />
* '''Big Data problems''': Another hot topic, but perhaps too broad. [MF: I don't know enough to specify it more clearly as a useful seminar topic.] [This would be interesting, but I would hope we could find materials that address the issue from a social science perspective]<br />
* Some statistical methods topics, not yet clearly articulated:<br />
** Clustering methods [RC: Could this fall under the Big Data label?]<br />
** Robustness<br />
** Mediation<br />
** Meta analysis in medicine: how can you tell whether a lit review is complete with logistic regression?! [RC: I think Meta-Analysis could be a good topic, including M-L/J-A discussing their research]<br />
* '''Consulting issues''': Practical aspects of statistical consulting [MF: Perhaps this would be a better topic for the SCS staff meetings ??] [RC: That would make a good topic for a business/staff meeting, if there were no consulting cases to discuss]<br />
* '''Survey Sampling''': Elucidating the mystery of bootstrap weights and how to use them when analyzing survey data, e.g. from Statistics Canada. <br />
* '''Evidence-based medicine''': Ideas and implications<br />
* '''Machine Learning, AI, Deep Learning''': An Overview <br />
* '''Disseminating technical information to non-technical audiences''': in consulting and in teaching. [Could this be put together with 'consulting issues'?]<br />
* '''Missing Data'''<br />
* '''Statistical Paradoxes and Fallacies''' [RC: There are some good "summary" articles related to statistical paradoxes and Georges' examples are always helpful]<br />
** Initial attempt at creating a list: [[Paradoxes, Fallacies and Other Surprises]]. Please add, modify or comment.<br />
== Links to past episodes of SCS Reads ==<br />
* [[SCS Reads 2018-2019|SCS Reads 2018-2019 Causality mainly]]<br />
* [[SCS Reads 2017-2018|SCS Reads 2017-2018 Counterfactuals and Causal Inference]]<br />
* [[SCS Reads 2016-2017|SCS Reads 2016-2017 Rethinking Statistics]]<br />
* [[SCS Reads 2015-2016|SCS Reads 2015-2016 Bayesian Statistics in the Social Sciences]]<br />
* [[SCS Reads 2014-2015]]<br />
* [[SCS Reads Nominations|Past and current nominations for SCS Reads]]<br />
* [[SCS Reads -- past seminars]]</div>Georgeshttp://scs.math.yorku.ca/index.php/SCS_Reads_2019-2020SCS Reads 2019-20202019-09-01T17:40:07Z<p>Georges: </p>
<hr />
<div>* A topic need not occupy the entire academic year and we could plan to consider more than one topic.<br />
* To get an account to edit this wiki send a message to [mailto:georges@yorku.ca?subject=Account%20on%20SCS%20wiki Georges Monette].<br />
== New candidate topics for 2019-2020 ==<br />
* '''Add other suggestions here or send them to [mailto:georges@yorku.ca?subject=Account%20on%20SCS%20wiki Georges Monette].'''<br />
== Candidates from 2018-2019==<br />
* '''Reproducibility of research: a crisis in Statistics??''': How should statistical practice be informed by current controversies about the ''replication crisis'' and countervailing moves toward open science and reproducibility? This was a big topic at the recent JSM 2018 conference. <br />
** Roger Peng, Reproducible Research in Computational Science, ''Science'', Vol. 334, Issue 6060, pp. 1226-1227 [http://science.sciencemag.org/content/334/6060/1226]<br />
** RC: This is timely in a couple of ways. Chris Green is teaching a course on the topic, so he or one of the students might be able to come to a meeting to discuss their views. We also have a QM Forum speaker on the topic in the Winter.<br />
** We can use articles in the special issue of ''The American Statistician'' devoted to current problems in statistical inference, particularly the use and interpretation of p-values and the concept of statistical significance.<br />
* '''Big Data problems''': Another hot topic, but perhaps too broad. [MF: I don't know enough to specify it more clearly as a useful seminar topic.] [This would be interesting, but I would hope we could find materials that address the issue from a social science perspective]<br />
* Some statistical methods topics, not yet clearly articulated:<br />
** Clustering methods [RC: Could this fall under the Big Data label?]<br />
** Robustness<br />
** Mediation<br />
** Meta analysis in medicine: how can you tell whether a lit review is complete with logistic regression?! [RC: I think Meta-Analysis could be a good topic, including M-L/J-A discussing their research]<br />
* '''Consulting issues''': Practical aspects of statistical consulting [MF: Perhaps this would be a better topic for the SCS staff meetings ??] [RC: That would make a good topic for a business/staff meeting, if there were no consulting cases to discuss]<br />
* '''Survey Sampling''': Elucidating the mystery of bootstrap weights and how to use them when analyzing survey data, e.g. from Statistics Canada. <br />
* '''Evidence-based medicine''': Ideas and implications<br />
* '''Machine Learning, AI, Deep Learning''': An Overview <br />
* '''Disseminating technical information to non-technical audiences''': in consulting and in teaching. [Could this be put together with 'consulting issues'?]<br />
* '''Missing Data'''<br />
* '''Statistical Paradoxes and Fallacies''' [RC: There are some good "summary" articles related to statistical paradoxes and Georges' examples are always helpful]<br />
** Initial attempt at creating a list: [[Paradoxes, Fallacies and Other Surprises]]. Please add, modify or comment.<br />
== Links to past episodes of SCS Reads ==<br />
* [[SCS Reads 2018-2019|SCS Reads 2018-2019 Causality mainly]]<br />
* [[SCS Reads 2017-2018|SCS Reads 2017-2018 Counterfactuals and Causal Inference]]<br />
* [[SCS Reads 2016-2017|SCS Reads 2016-2017 Rethinking Statistics]]<br />
* [[SCS Reads 2015-2016|SCS Reads 2015-2016 Bayesian Statistics in the Social Sciences]]<br />
* [[SCS Reads 2014-2015]]<br />
* [[SCS Reads Nominations|Past and current nominations for SCS Reads]]<br />
* [[SCS Reads -- past seminars]]</div>Georgeshttp://scs.math.yorku.ca/index.php/SCS_Reads_2019-2020SCS Reads 2019-20202019-09-01T17:37:25Z<p>Georges: </p>
<hr />
<div>* A topic need not occupy the entire academic year and we could plan to consider more than one topic.<br />
* To get an account to edit this wiki send a message to georges@yorku.ca.<br />
== New candidate topics for 2019-2020 ==<br />
* '''Add other suggestions here or send them to georges@yorku.ca.'''<br />
== Candidates from 2018-2019==<br />
* '''Reproducibility of research: a crisis in Statistics??''': How should statistical practice be informed by current controversies about the ''replication crisis'' and countervailing moves toward open science and reproducibility? This was a big topic at the recent JSM 2018 conference. <br />
** Roger Peng, Reproducible Research in Computational Science, ''Science'', Vol. 334, Issue 6060, pp. 1226-1227 [http://science.sciencemag.org/content/334/6060/1226]<br />
** RC: This is timely in a couple of ways. Chris Green is teaching a course on the topic, so he or one of the students might be able to come to a meeting to discuss their views. We also have a QM Forum speaker on the topic in the Winter.<br />
** We can use articles in the special issue of ''The American Statistician'' devoted to current problems in statistical inference, particularly the use and interpretation of p-values and the concept of statistical significance.<br />
* '''Big Data problems''': Another hot topic, but perhaps too broad. [MF: I don't know enough to specify it more clearly as a useful seminar topic.] [This would be interesting, but I would hope we could find materials that address the issue from a social science perspective]<br />
* Some statistical methods topics, not yet clearly articulated:<br />
** Clustering methods [RC: Could this fall under the Big Data label?]<br />
** Robustness<br />
** Mediation<br />
** Meta analysis in medicine: how can you tell whether a lit review is complete with logistic regression?! [RC: I think Meta-Analysis could be a good topic, including M-L/J-A discussing their research]<br />
* '''Consulting issues''': Practical aspects of statistical consulting [MF: Perhaps this would be a better topic for the SCS staff meetings ??] [RC: That would make a good topic for a business/staff meeting, if there were no consulting cases to discuss]<br />
* '''Survey Sampling''': Elucidating the mystery of bootstrap weights and how to use them when analyzing survey data, e.g. from Statistics Canada. <br />
* '''Evidence-based medicine''': Ideas and implications<br />
* '''Machine Learning, AI, Deep Learning''': An Overview <br />
* '''Disseminating technical information to non-technical audiences''': in consulting and in teaching. [Could this be put together with 'consulting issues'?]<br />
* '''Missing Data'''<br />
* '''Statistical Paradoxes and Fallacies''' [RC: There are some good "summary" articles related to statistical paradoxes and Georges' examples are always helpful]<br />
** Initial attempt at creating a list: [[Paradoxes, Fallacies and Other Surprises]]. Please add, modify or comment.<br />
== Links to past episodes of SCS Reads ==<br />
* [[SCS Reads 2018-2019|SCS Reads 2018-2019 Causality mainly]]<br />
* [[SCS Reads 2017-2018|SCS Reads 2017-2018 Counterfactuals and Causal Inference]]<br />
* [[SCS Reads 2016-2017|SCS Reads 2016-2017 Rethinking Statistics]]<br />
* [[SCS Reads 2015-2016|SCS Reads 2015-2016 Bayesian Statistics in the Social Sciences]]<br />
* [[SCS Reads 2014-2015]]<br />
* [[SCS Reads Nominations|Past and current nominations for SCS Reads]]<br />
* [[SCS Reads -- past seminars]]</div>Georgeshttp://scs.math.yorku.ca/index.php/SCS_Reads_2019-2020SCS Reads 2019-20202019-09-01T17:36:39Z<p>Georges: </p>
<hr />
<div>* Note that a topic need not occupy the entire academic year and we could plan to consider more than one topic.<br />
* Add other suggestions here or send them to georges@yorku.ca.<br />
* To get an account to edit this wiki send a message to georges@yorku.ca.<br />
== New candidate topics for 2019-2020 ==<br />
* '''Add other suggestions here or send them to georges@yorku.ca.'''<br />
== Candidates from 2018-2019==<br />
* '''Reproducibility of research: a crisis in Statistics??''': How should statistical practice be informed by current controversies about the ''replication crisis'' and countervailing moves toward open science and reproducibility? This was a big topic at the recent JSM 2018 conference. <br />
** Roger Peng, Reproducible Research in Computational Science, ''Science'', Vol. 334, Issue 6060, pp. 1226-1227 [http://science.sciencemag.org/content/334/6060/1226]<br />
** RC: This is timely in a couple of ways. Chris Green is teaching a course on the topic, so he or one of the students might be able to come to a meeting to discuss their views. We also have a QM Forum speaker on the topic in the Winter.<br />
** We can use articles in the special issue of ''The American Statistician'' devoted to current problems in statistical inference, particularly the use and interpretation of p-values and the concept of statistical significance.<br />
* '''Big Data problems''': Another hot topic, but perhaps too broad. [MF: I don't know enough to specify it more clearly as a useful seminar topic.] [This would be interesting, but I would hope we could find materials that address the issue from a social science perspective]<br />
* Some statistical methods topics, not yet clearly articulated:<br />
** Clustering methods [RC: Could this fall under the Big Data label?]<br />
** Robustness<br />
** Mediation<br />
** Meta analysis in medicine: how can you tell whether a lit review is complete with logistic regression?! [RC: I think Meta-Analysis could be a good topic, including M-L/J-A discussing their research]<br />
* '''Consulting issues''': Practical aspects of statistical consulting [MF: Perhaps this would be a better topic for the SCS staff meetings ??] [RC: That would make a good topic for a business/staff meeting, if there were no consulting cases to discuss]<br />
* '''Survey Sampling''': Elucidating the mystery of bootstrap weights and how to use them when analyzing survey data, e.g. from Statistics Canada. <br />
* '''Evidence-based medicine''': Ideas and implications<br />
* '''Machine Learning, AI, Deep Learning''': An Overview <br />
* '''Disseminating technical information to non-technical audiences''': in consulting and in teaching. [Could this be put together with 'consulting issues'?]<br />
* '''Missing Data'''<br />
* '''Statistical Paradoxes and Fallacies''' [RC: There are some good "summary" articles related to statistical paradoxes and Georges' examples are always helpful]<br />
** Initial attempt at creating a list: [[Paradoxes, Fallacies and Other Surprises]]. Please add, modify or comment.<br />
== Links to past episodes of SCS Reads ==<br />
* [[SCS Reads 2018-2019|SCS Reads 2018-2019 Causality mainly]]<br />
* [[SCS Reads 2017-2018|SCS Reads 2017-2018 Counterfactuals and Causal Inference]]<br />
* [[SCS Reads 2016-2017|SCS Reads 2016-2017 Rethinking Statistics]]<br />
* [[SCS Reads 2015-2016|SCS Reads 2015-2016 Bayesian Statistics in the Social Sciences]]<br />
* [[SCS Reads 2014-2015]]<br />
* [[SCS Reads Nominations|Past and current nominations for SCS Reads]]<br />
* [[SCS Reads -- past seminars]]</div>Georgeshttp://scs.math.yorku.ca/index.php/SCS_Reads_2019-2020SCS Reads 2019-20202019-09-01T17:25:17Z<p>Georges: /* New candidates for 2019-2020 */</p>
<hr />
<div>== Links to past episodes of SCS Reads ==<br />
* [[SCS Reads 2018-2019|SCS Reads 2018-2019 Causality mainly]]<br />
* [[SCS Reads 2017-2018|SCS Reads 2017-2018 Counterfactuals and Causal Inference]]<br />
* [[SCS Reads 2016-2017|SCS Reads 2016-2017 Rethinking Statistics]]<br />
* [[SCS Reads 2015-2016|SCS Reads 2015-2016 Bayesian Statistics in the Social Sciences]]<br />
* [[SCS Reads 2014-2015]]<br />
* [[SCS Reads Nominations|Past and current nominations for SCS Reads]]<br />
* [[SCS Reads -- past seminars]]<br />
== New candidate topics for 2019-2020 ==<br />
* '''The Crisis in Statistics??''' Use the special issue of ''The American Statistician'' devoted to current problems in statistical inference, particularly the use and interpretation of p-values and the concept of statistical significance, as a springboard for selecting articles, from the special issue or elsewhere, to discuss whether there is a crisis in statistics and what we should do about it.<br />
* Add other suggestions here or send them to georges@yorku.ca.<br />
<br />
== Candidates from 2018-2019==<br />
<br />
== Initial discussion of topics ==<br />
A view was expressed that we should consider this seminar series more broadly, with a view to:<br />
* wider, and more active participation by the seminar members<br />
* topics or book chapters that would:<br />
** enlist a volunteer as discussion leader, or organize the topic<br />
** perhaps involve some concrete, practical examples used to illustrate the topic<br />
<br />
Without prejudice to the choice of ''The Book of Why'' on causal inference, some suggested topics
mentioned are listed below, though the general view was that, if we took this route, only 3-4<br />
should be considered for this year. I'm just listing these, but they deserve to be fleshed out<br />
more for us to consider how & whether they would work.<br />
<br />
* '''Reproducibility of research''': How should statistical practice be informed by current controversies about the ''replication crisis'' and countervailing moves toward open science and reproducibility? This was a big topic at the recent JSM 2018 conference. [MF: I should add some references here.]<br />
** Roger Peng, Reproducible Research in Computational Science, ''Science'', Vol. 334, Issue 6060, pp. 1226-1227 [http://science.sciencemag.org/content/334/6060/1226]<br />
** RC: This is timely in a couple of ways. Chris Green is teaching a course on the topic, so he or one of the students might be able to come to a meeting to discuss their views. We also have a QM Forum speaker on the topic in the Winter.<br />
<br />
* '''Big Data problems''': Another hot topic, but perhaps too broad. [MF: I don't know enough to specify it more clearly as a useful seminar topic.] [This would be interesting, but I would hope we could find materials that address the issue from a social science perspective]<br />
<br />
* Some statistical methods topics, not yet clearly articulated:<br />
** Clustering methods [RC: Could this fall under the Big Data label?]<br />
** Robustness<br />
** Mediation<br />
** Meta analysis in medicine: how can you tell whether a lit review is complete with logistic regression?! [RC: I think Meta-Analysis could be a good topic, including M-L/J-A discussing their research]<br />
<br />
* '''Consulting issues''': Practical aspects of statistical consulting [MF: Perhaps this would be a better topic for the SCS staff meetings ??] [RC: That would make a good topic for a business/staff meeting, if there were no consulting cases to discuss]<br />
<br />
* '''Survey Sampling''': Elucidating the mystery of bootstrap weights and how to use them when analyzing survey data, e.g. from Statistics Canada. <br />
<br />
* '''Evidence-based medicine''': Ideas and implications<br />
<br />
* '''Machine Learning, AI, Deep Learning''': An Overview <br />
<br />
* '''Disseminating technical information to non-technical audiences''': in consulting and in teaching. [Could this be put together with 'consulting issues'?]<br />
<br />
* '''Missing Data'''<br />
<br />
* '''Statistical Paradoxes and Fallacies''' [RC: There are some good "summary" articles related to statistical paradoxes and Georges' examples are always helpful]<br />
<br />
** Initial attempt at creating a list: [[Paradoxes, Fallacies and Other Surprises]]. Please add, modify or comment.<br />
<br />
== September 21 ==<br />
<br />
Talk: Michael Friendly, ''100+ Years of Titanic Graphs''<br />
<br />
* Slides: [[File:SCS-TitanicGraphs-2x2.pdf]]<br />
<br />
== October 5 ==<br />
<br />
* Introduction and Chapter 1 of the Book of Why<br />
* [http://bayes.cs.ucla.edu/WHY/ Website for the book]<br />
* [http://bayes.cs.ucla.edu/WHY/why-ch1.pdf PDF of first chapter]<br />
<br />
== October 19 ==<br />
<br />
* Chapters 2 and 3 of the Book of Why<br />
<br />
== November 2 ==<br />
<br />
* Chapters 4 and 5 of the Book of Why<br />
<br />
....<br />
<br />
== January 4 ==<br />
<br />
* We plan to continue a discussion of Chapter 7 of the Book of Why focusing on details of the do-calculus.<br />
* An interesting reference might be this [https://www.ssc.wisc.edu/soc/faculty/pages/docs/elwert/Elwert%202013.pdf 2013 article by Felix Elwert on Graphical Causal Models].<br />
<br />
== February 1 ==<br />
<br />
* We continue with Chapter 8 of ''The Book of Why'', '''Counterfactuals: Mining Worlds That Could Have Been'''<br />
* MF offered some examples of papers describing R packages for learning/doing causal modeling<br />
** [http://dagitty.net/primer Causal Inference in Statistics: A Companion for R Users]. This is designed to accompany the book [http://bayes.cs.ucla.edu/PRIMER/ Causal Inference in Statistics: A Primer] by Pearl, Glymour and Jewell.<br />
** [https://cran.r-project.org/web/packages/CausalImpact/vignettes/CausalImpact.html Vignette for the CausalImpact package]. This package implements a Bayesian approach to causal impact estimation in time series data. It is designed for a specific situation, when you have time series pre- and post- some intervention. I mention it here only because it produces nice graphs of <br />
** [https://www.jstatsoft.org/article/view/v082i02 A Recipe for Interference: Start with Causal Inference. Add Interference. Mix Well with R.] by Bradley Saul & Michael Hudgens</div>Georgeshttp://scs.math.yorku.ca/index.php/SCS_Reads_2019-2020SCS Reads 2019-20202019-09-01T17:24:51Z<p>Georges: /* Candidates for 2019-2020 */</p>
<hr />
<div>== Links to past episodes of SCS Reads ==<br />
* [[SCS Reads 2018-2019|SCS Reads 2018-2019 Causality mainly]]<br />
* [[SCS Reads 2017-2018|SCS Reads 2017-2018 Counterfactuals and Causal Inference]]<br />
* [[SCS Reads 2016-2017|SCS Reads 2016-2017 Rethinking Statistics]]<br />
* [[SCS Reads 2015-2016|SCS Reads 2015-2016 Bayesian Statistics in the Social Sciences]]<br />
* [[SCS Reads 2014-2015]]<br />
* [[SCS Reads Nominations|Past and current nominations for SCS Reads]]<br />
* [[SCS Reads -- past seminars]]<br />
== New candidates for 2019-2020 ==<br />
* '''The Crisis in Statistics??''' Use the special issue of ''The American Statistician'' devoted to current problems in statistical inference, particularly the use and interpretation of p-values and the concept of statistical significance, as a springboard for selecting articles, from the special issue or elsewhere, to discuss whether there is a crisis in statistics and what we should do about it.<br />
* Add other suggestions here or send them to georges@yorku.ca.<br />
== Candidates from 2018-2019==<br />
<br />
== Initial discussion of topics ==<br />
A view was expressed that we should consider this seminar series more broadly, with a view to:<br />
* wider, and more active participation by the seminar members<br />
* topics or book chapters that would:<br />
** enlist a volunteer as discussion leader, or organize the topic<br />
** perhaps involve some concrete, practical examples used to illustrate the topic<br />
<br />
Without prejudice to the choice of ''The Book of Why'' on causal inference, some suggested topics
mentioned are listed below, though the general view was that, if we took this route, only 3-4<br />
should be considered for this year. I'm just listing these, but they deserve to be fleshed out<br />
more for us to consider how & whether they would work.<br />
<br />
* '''Reproducibility of research''': How should statistical practice be informed by current controversies about the ''replication crisis'' and countervailing moves toward open science and reproducibility? This was a big topic at the recent JSM 2018 conference. [MF: I should add some references here.]<br />
** Roger Peng, Reproducible Research in Computational Science, ''Science'', Vol. 334, Issue 6060, pp. 1226-1227 [http://science.sciencemag.org/content/334/6060/1226]<br />
** RC: This is timely in a couple of ways. Chris Green is teaching a course on the topic, so he or one of the students might be able to come to a meeting to discuss their views. We also have a QM Forum speaker on the topic in the Winter.<br />
<br />
* '''Big Data problems''': Another hot topic, but perhaps too broad. [MF: I don't know enough to specify it more clearly as a useful seminar topic.] [This would be interesting, but I would hope we could find materials that address the issue from a social science perspective]<br />
<br />
* Some statistical methods topics, not yet clearly articulated:<br />
** Clustering methods [RC: Could this fall under the Big Data label?]<br />
** Robustness<br />
** Mediation<br />
** Meta analysis in medicine: how can you tell whether a lit review is complete with logistic regression?! [RC: I think Meta-Analysis could be a good topic, including M-L/J-A discussing their research]<br />
<br />
* '''Consulting issues''': Practical aspects of statistical consulting [MF: Perhaps this would be a better topic for the SCS staff meetings ??] [RC: That would make a good topic for a business/staff meeting, if there were no consulting cases to discuss]<br />
<br />
* '''Survey Sampling''': Elucidating the mystery of bootstrap weights and how to use them when analyzing survey data, e.g. from Statistics Canada. <br />
<br />
* '''Evidence-based medicine''': Ideas and implications<br />
<br />
* '''Machine Learning, AI, Deep Learning''': An Overview <br />
<br />
* '''Disseminating technical information to non-technical audiences''': in consulting and in teaching. [Could this be put together with 'consulting issues'?]<br />
<br />
* '''Missing Data'''<br />
<br />
* '''Statistical Paradoxes and Fallacies''' [RC: There are some good "summary" articles related to statistical paradoxes and Georges' examples are always helpful]<br />
<br />
** Initial attempt at creating a list: [[Paradoxes, Fallacies and Other Surprises]]. Please add, modify or comment.<br />
<br />
== September 21 ==<br />
<br />
Talk: Michael Friendly, ''100+ Years of Titanic Graphs''<br />
<br />
* Slides: [[File:SCS-TitanicGraphs-2x2.pdf]]<br />
<br />
== October 5 ==<br />
<br />
* Introduction and Chapter 1 of the Book of Why<br />
* [http://bayes.cs.ucla.edu/WHY/ Website for the book]<br />
* [http://bayes.cs.ucla.edu/WHY/why-ch1.pdf PDF of first chapter]<br />
<br />
== October 19 ==<br />
<br />
* Chapters 2 and 3 of the Book of Why<br />
<br />
== November 2 ==<br />
<br />
* Chapters 4 and 5 of the Book of Why<br />
<br />
....<br />
<br />
== January 4 ==<br />
<br />
* We plan to continue a discussion of Chapter 7 of the Book of Why focusing on details of the do-calculus.<br />
* An interesting reference might be this [https://www.ssc.wisc.edu/soc/faculty/pages/docs/elwert/Elwert%202013.pdf 2013 article by Felix Elwert on Graphical Causal Models].<br />
<br />
== February 1 ==<br />
<br />
* We continue with Chapter 8 of ''The Book of Why'', '''Counterfactuals: Mining Worlds That Could Have Been'''<br />
* MF offered some examples of papers describing R packages for learning/doing causal modeling<br />
** [http://dagitty.net/primer Causal Inference in Statistics: A Companion for R Users]. This is designed to accompany the book [http://bayes.cs.ucla.edu/PRIMER/ Causal Inference in Statistics: A Primer] by Pearl, Glymour and Jewell.<br />
** [https://cran.r-project.org/web/packages/CausalImpact/vignettes/CausalImpact.html Vignette for the CausalImpact package]. This package implements a Bayesian approach to causal impact estimation in time series data. It is designed for a specific situation, when you have time series pre- and post- some intervention. I mention it here only because it produces nice graphs of <br />
** [https://www.jstatsoft.org/article/view/v082i02 A Recipe for Interference: Start with Causal Inference. Add Interference. Mix Well with R.] by Bradley Saul & Michael Hudgens</div>Georgeshttp://scs.math.yorku.ca/index.php/SCS_Reads_2019-2020SCS Reads 2019-20202019-09-01T17:22:54Z<p>Georges: /* Candidates for 2019-2020 */</p>
<hr />
<div>== Links to past episodes of SCS Reads ==<br />
* [[SCS Reads 2018-2019|SCS Reads 2018-2019 Causality mainly]]<br />
* [[SCS Reads 2017-2018|SCS Reads 2017-2018 Counterfactuals and Causal Inference]]<br />
* [[SCS Reads 2016-2017|SCS Reads 2016-2017 Rethinking Statistics]]<br />
* [[SCS Reads 2015-2016|SCS Reads 2015-2016 Bayesian Statistics in the Social Sciences]]<br />
* [[SCS Reads 2014-2015]]<br />
* [[SCS Reads Nominations|Past and current nominations for SCS Reads]]<br />
* [[SCS Reads -- past seminars]]<br />
== Candidates for 2019-2020 ==<br />
* '''The Crisis in Statistics??''' Use the special issue of ''The American Statistician'' devoted to current problems in statistical inference, particularly the use and interpretation of p-values and the concept of statistical significance, as a springboard for selecting articles, from the special issue or elsewhere, to discuss whether there is a crisis in statistics and what we should do about it.<br />
* Add other suggestions here or send them to georges@yorku.ca.<br />
<br />
== Initial discussion of topics ==<br />
A view was expressed that we should consider this seminar series more broadly, with a view to:<br />
* wider, and more active participation by the seminar members<br />
* topics or book chapters that would:<br />
** enlist a volunteer as discussion leader, or organize the topic<br />
** perhaps involve some concrete, practical examples used to illustrate the topic<br />
<br />
Without prejudice to the choice of ''The Book of Why'' on causal inference, some suggested topics
mentioned are listed below, though the general view was that, if we took this route, only 3-4<br />
should be considered for this year. I'm just listing these, but they deserve to be fleshed out<br />
more for us to consider how & whether they would work.<br />
<br />
* '''Reproducibility of research''': How should statistical practice be informed by current controversies about the ''replication crisis'' and countervailing moves toward open science and reproducibility? This was a big topic at the recent JSM 2018 conference. [MF: I should add some references here.]<br />
** Roger Peng, Reproducible Research in Computational Science, ''Science'', Vol. 334, Issue 6060, pp. 1226-1227 [http://science.sciencemag.org/content/334/6060/1226]<br />
** RC: This is timely in a couple of ways. Chris Green is teaching a course on the topic, so he or one of the students might be able to come to a meeting to discuss their views. We also have a QM Forum speaker on the topic in the Winter.<br />
<br />
* '''Big Data problems''': Another hot topic, but perhaps too broad. [MF: I don't know enough to specify it more clearly as a useful seminar topic.] [This would be interesting, but I would hope we could find materials that address the issue from a social science perspective]<br />
<br />
* Some statistical methods topics, not yet clearly articulated:<br />
** Clustering methods [RC: Could this fall under the Big Data label?]<br />
** Robustness<br />
** Mediation<br />
** Meta analysis in medicine: how can you tell whether a lit review is complete with logistic regression?! [RC: I think Meta-Analysis could be a good topic, including M-L/J-A discussing their research]<br />
<br />
* '''Consulting issues''': Practical aspects of statistical consulting [MF: Perhaps this would be a better topic for the SCS staff meetings ??] [RC: That would make a good topic for a business/staff meeting, if there were no consulting cases to discuss]<br />
<br />
* '''Survey Sampling''': Elucidating the mystery of bootstrap weights and how to use them when analyzing survey data, e.g. from Statistics Canada. <br />
<br />
* '''Evidence-based medicine''': Ideas and implications<br />
<br />
* '''Machine Learning, AI, Deep Learning''': An Overview <br />
<br />
* '''Disseminating technical information to non-technical audiences''': in consulting and in teaching. [Could this be put together with 'consulting issues'?]<br />
<br />
* '''Missing Data'''<br />
<br />
* '''Statistical Paradoxes and Fallacies''' [RC: There are some good "summary" articles related to statistical paradoxes and Georges' examples are always helpful]<br />
<br />
** Initial attempt at creating a list: [[Paradoxes, Fallacies and Other Surprises]]. Please add, modify or comment.<br />
<br />
== September 21 ==<br />
<br />
Talk: Michael Friendly, ''100+ Years of Titanic Graphs''<br />
<br />
* Slides: [[File:SCS-TitanicGraphs-2x2.pdf]]<br />
<br />
== October 5 ==<br />
<br />
* Introduction and Chapter 1 of the Book of Why<br />
* [http://bayes.cs.ucla.edu/WHY/ Website for the book]<br />
* [http://bayes.cs.ucla.edu/WHY/why-ch1.pdf PDF of first chapter]<br />
<br />
== October 19 ==<br />
<br />
* Chapters 2 and 3 of the Book of Why<br />
<br />
== November 2 ==<br />
<br />
* Chapters 4 and 5 of the Book of Why<br />
<br />
....<br />
<br />
== January 4 ==<br />
<br />
* We plan to continue a discussion of Chapter 7 of the Book of Why focusing on details of the do-calculus.<br />
* An interesting reference might be this [https://www.ssc.wisc.edu/soc/faculty/pages/docs/elwert/Elwert%202013.pdf 2013 article by Felix Elwert on Graphical Causal Models].<br />
<br />
== February 1 ==<br />
<br />
* We continue with Chapter 8 of ''The Book of Why'', '''Counterfactuals: Mining Worlds That Could Have Been'''<br />
* MF offered some examples of papers describing R packages for learning/doing causal modeling<br />
** [http://dagity.net/primer Causal Inference in Statistics: A Companion for R Users]. This is designed to accompany the book [http://bayes.cs.ucla.edu/PRIMER/ Causal Inference in Statistics: A Primer] by Pearl, Glymour and Jewell.<br />
** [https://cran.r-project.org/web/packages/CausalImpact/vignettes/CausalImpact.html Vignette for the CausalImpact package]. This package implements a Bayesian approach to causal impact estimation in time series data. It is designed for a specific situation, when you have time series pre- and post- some intervention. I mention it here only because it produces nice graphs of <br />
** [https://www.jstatsoft.org/article/view/v082i02 A Recipe for Interference: Start with Causal Inference. Add Interference. Mix Well with R.] by Bradley Saul & Michael Hudgens</div>Georgeshttp://scs.math.yorku.ca/index.php/Wiki_for_statistical_consultingWiki for statistical consulting2019-08-30T17:13:09Z<p>Georges: </p>
<hr />
<div>This is the wiki for the [http://www.yorku.ca/isr/scs/ Statistical Consulting Service] at York University. You can make [http://www.appointmentquest.com/provider/2000199121 appointments online] to see one of our consultants.<br />
<br />
Anyone can read the content of this wiki. If you are interested in contributing, please let Georges Monette [mailto:georges@yorku.ca?subject=Account%20on%20SCS%20wiki] know that you would like to have an account.<br />
== Hot topics ==<br />
<!--<br />
* Contribute to the list of [[Programs_in_Data_Sciences|programs in Data Sciences]].<br />
--><br />
* [[Sometimes Asked Questions]]<br />
* It would be a great contribution to compile a list of exemplary subject-matter papers reporting statistical methods, especially modern methods such as longitudinal data analysis, etc. <br />
__TOC__<br />
<br />
== Seminars ==<br />
* [[SCS Reads 2019-2020|SCS Reads 2019-2020 ????????]]<br />
* [[SCS Reads 2018-2019|SCS Reads 2018-2019 Causality mainly]]<br />
* [[SCS Reads 2017-2018|SCS Reads 2017-2018 Counterfactuals and Causal Inference]]<br />
* [[SCS Reads 2016-2017|SCS Reads 2016-2017 Rethinking Statistics]]<br />
* [[SCS Reads 2015-2016|SCS Reads 2015-2016 Bayesian Statistics in the Social Sciences]]<br />
* [[SCS Reads 2014-2015|SCS Reads 2014-2015 Frank Harrell's Regression Modeling Strategies]]<br />
* [[SCS Reads Nominations|Past and current nominations for SCS Reads]]<br />
* [[SCS Reads -- past seminars]]<br />
<br />
== Statistical Seminars in the Toronto Area ==<br />
* [http://qm.info.yorku.ca/ York Quantitative Methods Program in Psychology]<br />
* [http://www.math.yorku.ca/Who/Faculty/Rensburg/Colloquium/Colloquium2013.html York Department of Mathematics and Statistics]<br />
<br />
== Workshops and Courses ==<br />
* [http://blackwell.math.yorku.ca/ICPSR2017 ICPSR 2017 Course in Longitudinal Data Analysis with Mixed and Bayesian Models]<br />
* [[SCS_2017:_Longitudinal_and_Nested_Data|SCS 2017 Models and Analysis for Longitudinal and Nested Data]]<br />
* [[SCS 2014: Visualizing Regression]]<br />
* [[Mixed Models with R]]<br />
* [[SPIDA 2012: Mixed Models with R]]<br />
* [[SCS 2012: Mixed Models with R]]<br />
* [[SCS 2011: Statistical Analysis and Programming with R]]<br />
* [[SCS 2012: A Gentle Introduction to R]]<br />
* [[MATH 6627|MATH 6627 Practicum in Statistical Consulting]]<br />
* [[MATH 6643 Summer 2012 Applications of Mixed Models]]<br />
<br />
== Methods ==<br />
<br />
=== Data Analysis ===<br />
*[[Data Cleaning]]<br />
<br />
* [[Survival Analysis]]<br />
<br />
* [[Latent Variable Models]] (e.g. SEM, CFA, IRT, LCA)<br />
<br />
* [[Multilevel/Mixed Models]]<br />
<br />
* [[General Linear Models]] (e.g. multiple regression, ANOVA)<br />
<br />
* [[Categorical Data Analysis]] (e.g. contingency tables, chi-square, logistic regression)<br />
<br />
* [[Aggregate Data]] (e.g. meta-analysis)<br />
=== Displaying Data and Reporting===<br />
* Good papers on graphs:<br />
*:[http://www.ruf.rice.edu/~lane/papers/designing_better_graphs.pdf Lane, D.M., & Sandor, A. (2009). Designing better graphs by including distributional information and integrating words, numbers, and images. ''Psychological Methods, 14,'' 239-257.]<br />
*:[http://euclid.psych.yorku.ca/www/lab/psy6140/papers/kastellec-using-graphs.pdf Kastellec, J. P. and Leoni, E. L. (2007) Using Graphs Instead of Tables in Political Science. ''Perspectives on Politics'', 5, 755--771]. <br />
*:See also the related web site for Kastellec & Leoni, [http://tables2graphs.com/doku.php Using Graphs Instead of Tables].<br />
<br />
* A [http://biostat.mc.vanderbilt.edu/wiki/Main/ManuscriptChecklist checklist for statistical reporting]<br />
<br />
* [http://www.statlit.org/pdf/2001SchieldBusOfComm.pdf Describing Rates and Percentages in Tables]<br />
<br />
=== Statistical Topics ===<br />
*[[Causality]]<br />
*[[Model Selection]]<br />
*[[Programs in Data Sciences]]<br />
*[[Professional Accreditation]]<br />
*[[Statistics]]<br />
*[[Statistics in the News]]<br />
<br />
=== Statistical Consulting Support ===<br />
* [http://wiki.math.yorku.ca/index.php/MATH_6627_2007-08 York's Statistical Consulting Practicum Wiki 2007-2008]<br />
<br />
* [http://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting Statistical Consulting Practicum Wiki 2011]<br />
<br />
* [http://www.stat.columbia.edu/~cook/movabletype/archives/2008/01/rindskopfs_rule.html David Rindskopf's Rules for Consultants]<br />
<br />
* [http://www.rci.rutgers.edu/~cabrera/sc/ Javier Cabrera's Statistical Consulting Course]<br />
<br />
* [http://www.statsci.org/smyth/pubs/training.html Gordon Smyth on Training Students to be Consultants]<br />
<br />
* [http://www.amstat.org/sections/cnsl/ The American Statistical Association's Consulting Section] <br />
<br />
* Janice Derr on the Qualities of an Effective Statistical Consultant [[File: Janice_Derr_on_Consulting.pdf]]<br />
<br />
* [http://www.stat.purdue.edu/scs/help/notes_for_consultants.html Purdue University's Guide for Statistical Consultants]<br />
<br />
* [[Links to other Statistical Consulting Services]]<br />
<br />
== Software ==<br />
* [[R]]<br />
* [[SAS]]<br />
* [[SPSS/PASW]]<br />
* [http://www.gnu.org/software/pspp/ PSPP: open-source package inspired by SPSS]<br />
* [http://www.statmodel.com MPlus]<br />
* How to access software remotely from York's WebFAS system: <br />
** [[Media:Webfas_handout_SAS.pdf|SAS version]]<br />
=== Local R packages ===<br />
The 'spida' and 'p3d' packages are now available through github with<br />
* devtools::install_github('gmonette/spida2') and<br />
* devtools::install_github('gmonette/p3d')<br />
respectively.<br />
<br />
== SCS Administration ==<br />
* [[SCS TAships: Allocation of time]]<br />
* [[SCS Staff Meetings, 2011]]<br />
<br />
== Interesting Links ==<br />
=== Statistical consulting services at other universities === <br />
*[http://cscar.research.umich.edu/about/ University of Michigan]<br />
* [[Slides]]<br />
* [[Blogs]]<br />
* [[Books]]<br />
*[[Statistics-specific Job Boards]]<br />
<br />
'''Did you know?'''<br />
<br />
http://www12.statcan.gc.ca/census-recensement/index-eng.cfm</div>Georgeshttp://scs.math.yorku.ca/index.php/Wiki_for_statistical_consultingWiki for statistical consulting2019-08-30T17:11:12Z<p>Georges: </p>
<hr />
<div>This is the wiki for the [http://www.yorku.ca/isr/scs/ Statistical Consulting Service] at York University. You can make [http://www.appointmentquest.com/provider/2000199121 appointments online] to see one of our consultants.<br />
<br />
Change<br />
<br />
Suggestion: to discuss the structure of the wiki, click on the discussion tab above.<br />
<br />
If you would like to have an account on this wiki, please contact Georges Monette [mailto:georges@yorku.ca?subject=Account%20on%20SCS%20wiki].<br />
<!--<br />
== Hot topics ==<br />
* Contribute to the list of [[Programs_in_Data_Sciences|programs in Data Sciences]].<br />
--><br />
* [[Sometimes Asked Questions]]<br />
* It would be a great contribution to compile a list of exemplary subject-matter papers reporting statistical methods, especially modern methods such as longitudinal data analysis, etc. <br />
__TOC__<br />
<br />
== Seminars ==<br />
* [[SCS Reads 2019-2020|SCS Reads 2019-2020 ????????]]<br />
* [[SCS Reads 2018-2019|SCS Reads 2018-2019 Causality mainly]]<br />
* [[SCS Reads 2017-2018|SCS Reads 2017-2018 Counterfactuals and Causal Inference]]<br />
* [[SCS Reads 2016-2017|SCS Reads 2016-2017 Rethinking Statistics]]<br />
* [[SCS Reads 2015-2016|SCS Reads 2015-2016 Bayesian Statistics in the Social Sciences]]<br />
* [[SCS Reads 2014-2015|SCS Reads 2014-2015 Frank Harrell's Regression Modeling Strategies]]<br />
* [[SCS Reads Nominations|Past and current nominations for SCS Reads]]<br />
* [[SCS Reads -- past seminars]]<br />
<br />
== Statistical Seminars in the Toronto Area ==<br />
* [http://qm.info.yorku.ca/ York Quantitative Methods Program in Psychology]<br />
* [http://www.math.yorku.ca/Who/Faculty/Rensburg/Colloquium/Colloquium2013.html York Department of Mathematics and Statistics]<br />
<br />
== Workshops and Courses ==<br />
* [http://blackwell.math.yorku.ca/ICPSR2017 ICPSR 2017 Course in Longitudinal Data Analysis with Mixed and Bayesian Models]<br />
* [[SCS_2017:_Longitudinal_and_Nested_Data|SCS 2017 Models and Analysis for Longitudinal and Nested Data]]<br />
* [[SCS 2014: Visualizing Regression]]<br />
* [[Mixed Models with R]]<br />
* [[SPIDA 2012: Mixed Models with R]]<br />
* [[SCS 2012: Mixed Models with R]]<br />
* [[SCS 2011: Statistical Analysis and Programming with R]]<br />
* [[SCS 2012: A Gentle Introduction to R]]<br />
* [[MATH 6627|MATH 6627 Practicum in Statistical Consulting]]<br />
* [[MATH 6643 Summer 2012 Applications of Mixed Models]]<br />
<br />
== Methods ==<br />
<br />
=== Data Analysis ===<br />
*[[Data Cleaning]]<br />
<br />
* [[Survival Analysis]]<br />
<br />
* [[Latent Variable Models]] (e.g. SEM, CFA, IRT, LCA)<br />
<br />
* [[Multilevel/Mixed Models]]<br />
<br />
* [[General Linear Models]] (e.g. multiple regression, ANOVA)<br />
<br />
* [[Categorical Data Analysis]] (e.g. contingency tables, chi-square, logistic regression)<br />
<br />
* [[Aggregate Data]] (e.g. meta-analysis)<br />
=== Displaying Data and Reporting===<br />
* Good papers on graphs:<br />
*:[http://www.ruf.rice.edu/~lane/papers/designing_better_graphs.pdf Lane, D.M., & Sandor, A. (2009). Designing better graphs by including distributional information and integrating words, numbers, and images. ''Psychological Methods, 14,'' 239-257.]<br />
*:[http://euclid.psych.yorku.ca/www/lab/psy6140/papers/kastellec-using-graphs.pdf Kastellec, J. P. and Leoni, E. L. (2007) Using Graphs Instead of Tables in Political Science. ''Perspectives on Politics'', 5, 755--771]. <br />
*:See also the related web site for Kastellec & Leoni, [http://tables2graphs.com/doku.php Using Graphs Instead of Tables].<br />
<br />
* A [http://biostat.mc.vanderbilt.edu/wiki/Main/ManuscriptChecklist checklist for statistical reporting]<br />
<br />
* [http://www.statlit.org/pdf/2001SchieldBusOfComm.pdf Describing Rates and Percentages in Tables]<br />
<br />
=== Statistical Topics ===<br />
*[[Causality]]<br />
*[[Model Selection]]<br />
*[[Programs in Data Sciences]]<br />
*[[Professional Accreditation]]<br />
*[[Statistics]]<br />
*[[Statistics in the News]]<br />
<br />
=== Statistical Consulting Support ===<br />
* [http://wiki.math.yorku.ca/index.php/MATH_6627_2007-08 York's Statistical Consulting Practicum Wiki 2007-2008]<br />
<br />
* [http://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting Statistical Consulting Practicum Wiki 2011]<br />
<br />
* [http://www.stat.columbia.edu/~cook/movabletype/archives/2008/01/rindskopfs_rule.html David Rindskopf's Rules for Consultants]<br />
<br />
* [http://www.rci.rutgers.edu/~cabrera/sc/ Javier Cabrera's Statistical Consulting Course]<br />
<br />
* [http://www.statsci.org/smyth/pubs/training.html Gordon Smyth on Training Students to be Consultants]<br />
<br />
* [http://www.amstat.org/sections/cnsl/ The American Statistical Association's Consulting Section] <br />
<br />
* Janice Derr on the Qualities of an Effective Statistical Consultant [[File: Janice_Derr_on_Consulting.pdf]]<br />
<br />
* [http://www.stat.purdue.edu/scs/help/notes_for_consultants.html Purdue University's Guide for Statistical Consultants]<br />
<br />
* [[Links to other Statistical Consulting Services]]<br />
<br />
== Software ==<br />
* [[R]]<br />
* [[SAS]]<br />
* [[SPSS/PASW]]<br />
* [http://www.gnu.org/software/pspp/ PSPP: open-source package inspired by SPSS]<br />
* [http://www.statmodel.com MPlus]<br />
* How to access software remotely from York's WebFAS system: <br />
** [[Media:Webfas_handout_SAS.pdf|SAS version]]<br />
=== Local R packages ===<br />
The 'spida' and 'p3d' packages are now available through github with<br />
* devtools::install_github('gmonette/spida2') and<br />
* devtools::install_github('gmonette/p3d')<br />
respectively.<br />
<br />
== SCS Administration ==<br />
* [[SCS TAships: Allocation of time]]<br />
* [[SCS Staff Meetings, 2011]]<br />
<br />
== Interesting Links ==<br />
=== Statistical consulting services at other universities === <br />
*[http://cscar.research.umich.edu/about/ University of Michigan]<br />
* [[Slides]]<br />
* [[Blogs]]<br />
* [[Books]]<br />
*[[Statistics-specific Job Boards]]<br />
<br />
'''Did you know?'''<br />
<br />
http://www12.statcan.gc.ca/census-recensement/index-eng.cfm</div>Georgeshttp://scs.math.yorku.ca/index.php/MediaWiki:SidebarMediaWiki:Sidebar2019-08-30T17:09:04Z<p>Georges: </p>
<hr />
<div>* navigation<br />
** mainpage|mainpage-description<br />
** http://www.appointmentquest.com/scheduler/2000199121|SCS Appointments<br />
** http://scs.math.yorku.ca/index.php/SCS_Reads_2019-2020|SCS Reads 2019-20<br />
** recentchanges-url|recentchanges<br />
** randompage-url|randompage<br />
** https://www.mediawiki.org/wiki/Help:Contents|help<br />
* SEARCH<br />
* TOOLBOX<br />
* LANGUAGES</div>Georgeshttp://scs.math.yorku.ca/index.php/Wiki_for_statistical_consultingWiki for statistical consulting2019-08-30T17:03:25Z<p>Georges: /* Seminars */</p>
<hr />
<div>This is the wiki for the [http://www.yorku.ca/isr/scs/ Statistical Consulting Service] at York University. You can make [http://www.appointmentquest.com/provider/2000199121 appointments online] to see one of our consultants.<br />
<br />
Change<br />
<br />
Suggestion: to discuss the structure of the wiki, click on the discussion tab above.<br />
<br />
If you would like to have an account on this wiki, please contact Georges Monette [mailto:georges@yorku.ca?subject=Account%20on%20SCS%20wiki].<br />
== Hot topics ==<br />
* Contribute to the list of [[Programs_in_Data_Sciences|programs in Data Sciences]].<br />
* [[Sometimes Asked Questions]]<br />
* It would be a great contribution to compile a list of exemplary subject-matter papers reporting statistical methods, especially modern methods such as longitudinal data analysis, etc. <br />
__TOC__<br />
<br />
== Seminars ==<br />
* [[SCS Reads 2019-2020|SCS Reads 2019-2020 ????????]]<br />
* [[SCS Reads 2018-2019|SCS Reads 2018-2019 Causality mainly]]<br />
* [[SCS Reads 2017-2018|SCS Reads 2017-2018 Counterfactuals and Causal Inference]]<br />
* [[SCS Reads 2016-2017|SCS Reads 2016-2017 Rethinking Statistics]]<br />
* [[SCS Reads 2015-2016|SCS Reads 2015-2016 Bayesian Statistics in the Social Sciences]]<br />
* [[SCS Reads 2014-2015|SCS Reads 2014-2015 Frank Harrell's Regression Modeling Strategies]]<br />
* [[SCS Reads Nominations|Past and current nominations for SCS Reads]]<br />
* [[SCS Reads -- past seminars]]<br />
<br />
== Statistical Seminars in the Toronto Area ==<br />
* [http://qm.info.yorku.ca/ York Quantitative Methods Program in Psychology]<br />
* [http://www.math.yorku.ca/Who/Faculty/Rensburg/Colloquium/Colloquium2013.html York Department of Mathematics and Statistics]<br />
<br />
== Workshops and Courses ==<br />
* [http://blackwell.math.yorku.ca/ICPSR2017 ICPSR 2017 Course in Longitudinal Data Analysis with Mixed and Bayesian Models]<br />
* [[SCS_2017:_Longitudinal_and_Nested_Data|SCS 2017 Models and Analysis for Longitudinal and Nested Data]]<br />
* [[SCS 2014: Visualizing Regression]]<br />
* [[Mixed Models with R]]<br />
* [[SPIDA 2012: Mixed Models with R]]<br />
* [[SCS 2012: Mixed Models with R]]<br />
* [[SCS 2011: Statistical Analysis and Programming with R]]<br />
* [[SCS 2012: A Gentle Introduction to R]]<br />
* [[MATH 6627|MATH 6627 Practicum in Statistical Consulting]]<br />
* [[MATH 6643 Summer 2012 Applications of Mixed Models]]<br />
<br />
== Methods ==<br />
<br />
=== Data Analysis ===<br />
*[[Data Cleaning]]<br />
<br />
* [[Survival Analysis]]<br />
<br />
* [[Latent Variable Models]] (e.g. SEM, CFA, IRT, LCA)<br />
<br />
* [[Multilevel/Mixed Models]]<br />
<br />
* [[General Linear Models]] (e.g. multiple regression, ANOVA)<br />
<br />
* [[Categorical Data Analysis]] (e.g. contingency tables, chi-square, logistic regression)<br />
<br />
* [[Aggregate Data]] (e.g. meta-analysis)<br />
=== Displaying Data and Reporting===<br />
* Good papers on graphs:<br />
*:[http://www.ruf.rice.edu/~lane/papers/designing_better_graphs.pdf Lane, D.M., & Sandor, A. (2009). Designing better graphs by including distributional information and integrating words, numbers, and images. ''Psychological Methods, 14,'' 239-257.]<br />
*:[http://euclid.psych.yorku.ca/www/lab/psy6140/papers/kastellec-using-graphs.pdf Kastellec, J. P. and Leoni, E. L. (2007) Using Graphs Instead of Tables in Political Science. ''Perspectives on Politics'', 5, 755--771]. <br />
*:See also the related web site for Kastellec & Leoni, [http://tables2graphs.com/doku.php Using Graphs Instead of Tables].<br />
<br />
* A [http://biostat.mc.vanderbilt.edu/wiki/Main/ManuscriptChecklist checklist for statistical reporting]<br />
<br />
* [http://www.statlit.org/pdf/2001SchieldBusOfComm.pdf Describing Rates and Percentages in Tables]<br />
<br />
=== Statistical Topics ===<br />
*[[Causality]]<br />
*[[Model Selection]]<br />
*[[Programs in Data Sciences]]<br />
*[[Professional Accreditation]]<br />
*[[Statistics]]<br />
*[[Statistics in the News]]<br />
<br />
=== Statistical Consulting Support ===<br />
* [http://wiki.math.yorku.ca/index.php/MATH_6627_2007-08 York's Statistical Consulting Practicum Wiki 2007-2008]<br />
<br />
* [http://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting Statistical Consulting Practicum Wiki 2011]<br />
<br />
* [http://www.stat.columbia.edu/~cook/movabletype/archives/2008/01/rindskopfs_rule.html David Rindskopf's Rules for Consultants]<br />
<br />
* [http://www.rci.rutgers.edu/~cabrera/sc/ Javier Cabrera's Statistical Consulting Course]<br />
<br />
* [http://www.statsci.org/smyth/pubs/training.html Gordon Smyth on Training Students to be Consultants]<br />
<br />
* [http://www.amstat.org/sections/cnsl/ The American Statistical Association's Consulting Section] <br />
<br />
* Janice Derr on the Qualities of an Effective Statistical Consultant [[File: Janice_Derr_on_Consulting.pdf]]<br />
<br />
* [http://www.stat.purdue.edu/scs/help/notes_for_consultants.html Purdue University's Guide for Statistical Consultants]<br />
<br />
* [[Links to other Statistical Consulting Services]]<br />
<br />
== Software ==<br />
* [[R]]<br />
* [[SAS]]<br />
* [[SPSS/PASW]]<br />
* [http://www.gnu.org/software/pspp/ PSPP: open-source package inspired by SPSS]<br />
* [http://www.statmodel.com MPlus]<br />
* How to access software remotely from York's WebFAS system: <br />
** [[Media:Webfas_handout_SAS.pdf|SAS version]]<br />
=== Local R packages ===<br />
The 'spida' and 'p3d' packages are now available through github with<br />
* devtools::install_github('gmonette/spida2') and<br />
* devtools::install_github('gmonette/p3d')<br />
respectively.<br />
<br />
== SCS Administration ==<br />
* [[SCS TAships: Allocation of time]]<br />
* [[SCS Staff Meetings, 2011]]<br />
<br />
== Interesting Links ==<br />
=== Statistical consulting services at other universities === <br />
*[http://cscar.research.umich.edu/about/ University of Michigan]<br />
* [[Slides]]<br />
* [[Blogs]]<br />
* [[Books]]<br />
*[[Statistics-specific Job Boards]]<br />
<br />
'''Did you know?'''<br />
<br />
http://www12.statcan.gc.ca/census-recensement/index-eng.cfm</div>Georgeshttp://scs.math.yorku.ca/index.php/Wiki_for_statistical_consultingWiki for statistical consulting2019-08-30T17:01:05Z<p>Georges: /* Seminars */</p>
<hr />
<div>This is the wiki for the [http://www.yorku.ca/isr/scs/ Statistical Consulting Service] at York University. You can make [http://www.appointmentquest.com/provider/2000199121 appointments online] to see one of our consultants.<br />
<br />
Change<br />
<br />
Suggestion: to discuss the structure of the wiki, click on the discussion tab above.<br />
<br />
If you would like to have an account on this wiki, please contact Georges Monette [mailto:georges@yorku.ca?subject=Account%20on%20SCS%20wiki].<br />
== Hot topics ==<br />
* Contribute to the list of [[Programs_in_Data_Sciences|programs in Data Sciences]].<br />
* [[Sometimes Asked Questions]]<br />
* It would be a great contribution to compile a list of exemplary subject-matter papers reporting statistical methods, especially modern methods such as longitudinal data analysis, etc. <br />
__TOC__<br />
<br />
== Seminars ==<br />
* [[SCS Reads 2019-2020|SCS Reads 2019-2020 ????????]]<br />
* [[SCS Reads 2018-2019|SCS Reads 2018-2019 Causality mainly]]<br />
* [[SCS Reads 2017-2018|SCS Reads 2017-2018 Counterfactuals and Causal Inference]]<br />
* [[SCS Reads 2016-2017|SCS Reads 2016-2017 Rethinking Statistics]]<br />
* [[SCS Reads 2015-2016|SCS Reads 2015-2016 Bayesian Statistics in the Social Sciences]]<br />
* [[SCS Reads 2014-2015]]<br />
* [[SCS Reads Nominations|Past and current nominations for SCS Reads]]<br />
* [[SCS Reads -- past seminars]]<br />
<br />
== Statistical Seminars in the Toronto Area ==<br />
* [http://qm.info.yorku.ca/ York Quantitative Methods Program in Psychology]<br />
* [http://www.math.yorku.ca/Who/Faculty/Rensburg/Colloquium/Colloquium2013.html York Department of Mathematics and Statistics]<br />
<br />
== Workshops and Courses ==<br />
* [http://blackwell.math.yorku.ca/ICPSR2017 ICPSR 2017 Course in Longitudinal Data Analysis with Mixed and Bayesian Models]<br />
* [[SCS_2017:_Longitudinal_and_Nested_Data|SCS 2017 Models and Analysis for Longitudinal and Nested Data]]<br />
* [[SCS 2014: Visualizing Regression]]<br />
* [[Mixed Models with R]]<br />
* [[SPIDA 2012: Mixed Models with R]]<br />
* [[SCS 2012: Mixed Models with R]]<br />
* [[SCS 2011: Statistical Analysis and Programming with R]]<br />
* [[SCS 2012: A Gentle Introduction to R]]<br />
* [[MATH 6627|MATH 6627 Practicum in Statistical Consulting]]<br />
* [[MATH 6643 Summer 2012 Applications of Mixed Models]]<br />
<br />
== Methods ==<br />
<br />
=== Data Analysis ===<br />
*[[Data Cleaning]]<br />
<br />
* [[Survival Analysis]]<br />
<br />
* [[Latent Variable Models]] (e.g. SEM, CFA, IRT, LCA)<br />
<br />
* [[Multilevel/Mixed Models]]<br />
<br />
* [[General Linear Models]] (e.g. multiple regression, ANOVA)<br />
<br />
* [[Categorical Data Analysis]] (e.g. contingency tables, chi-square, logistic regression)<br />
<br />
* [[Aggregate Data]] (e.g. meta-analysis)<br />
=== Displaying Data and Reporting===<br />
* Good papers on graphs:<br />
*:[http://www.ruf.rice.edu/~lane/papers/designing_better_graphs.pdf Lane, D.M., & Sandor, A. (2009). Designing better graphs by including distributional information and integrating words, numbers, and images. ''Psychological Methods, 14,'' 239-257.]<br />
*:[http://euclid.psych.yorku.ca/www/lab/psy6140/papers/kastellec-using-graphs.pdf Kastellec, J. P. and Leoni, E. L. (2007) Using Graphs Instead of Tables in Political Science. ''Perspectives on Politics'', 5, 755--771]. <br />
*:See also the related web site for Kastellec & Leoni, [http://tables2graphs.com/doku.php Using Graphs Instead of Tables].<br />
<br />
* A [http://biostat.mc.vanderbilt.edu/wiki/Main/ManuscriptChecklist checklist for statistical reporting]<br />
<br />
* [http://www.statlit.org/pdf/2001SchieldBusOfComm.pdf Describing Rates and Percentages in Tables]<br />
<br />
=== Statistical Topics ===<br />
*[[Causality]]<br />
*[[Model Selection]]<br />
*[[Programs in Data Sciences]]<br />
*[[Professional Accreditation]]<br />
*[[Statistics]]<br />
*[[Statistics in the News]]<br />
<br />
=== Statistical Consulting Support ===<br />
* [http://wiki.math.yorku.ca/index.php/MATH_6627_2007-08 York's Statistical Consulting Practicum Wiki 2007-2008]<br />
<br />
* [http://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting Statistical Consulting Practicum Wiki 2011]<br />
<br />
* [http://www.stat.columbia.edu/~cook/movabletype/archives/2008/01/rindskopfs_rule.html David Rindskopf's Rules for Consultants]<br />
<br />
* [http://www.rci.rutgers.edu/~cabrera/sc/ Javier Cabrera's Statistical Consulting Course]<br />
<br />
* [http://www.statsci.org/smyth/pubs/training.html Gordon Smyth on Training Students to be Consultants]<br />
<br />
* [http://www.amstat.org/sections/cnsl/ The American Statistical Association's Consulting Section] <br />
<br />
* Janice Derr on the Qualities of an Effective Statistical Consultant [[File: Janice_Derr_on_Consulting.pdf]]<br />
<br />
* [http://www.stat.purdue.edu/scs/help/notes_for_consultants.html Purdue University's Guide for Statistical Consultants]<br />
<br />
* [[Links to other Statistical Consulting Services]]<br />
<br />
== Software ==<br />
* [[R]]<br />
* [[SAS]]<br />
* [[SPSS/PASW]]<br />
* [http://www.gnu.org/software/pspp/ PSPP: open-source package inspired by SPSS]<br />
* [http://www.statmodel.com MPlus]<br />
* How to access software remotely from York's WebFAS system: <br />
** [[Media:Webfas_handout_SAS.pdf|SAS version]]<br />
=== Local R packages ===<br />
The 'spida' and 'p3d' packages are now available through github with<br />
* devtools::install_github('gmonette/spida2') and<br />
* devtools::install_github('gmonette/p3d')<br />
respectively.<br />
<br />
== SCS Administration ==<br />
* [[SCS TAships: Allocation of time]]<br />
* [[SCS Staff Meetings, 2011]]<br />
<br />
== Interesting Links ==<br />
=== Statistical consulting services at other universities === <br />
*[http://cscar.research.umich.edu/about/ University of Michigan]<br />
* [[Slides]]<br />
* [[Blogs]]<br />
* [[Books]]<br />
*[[Statistics-specific Job Boards]]<br />
<br />
'''Did you know?'''<br />
<br />
http://www12.statcan.gc.ca/census-recensement/index-eng.cfm</div>Georgeshttp://scs.math.yorku.ca/index.php/Paradoxes,_Fallacies_and_Other_SurprisesParadoxes, Fallacies and Other Surprises2019-08-30T16:59:05Z<p>Georges: /* Inference Paradoxes */</p>
<hr />
<div>An attempt at a taxonomy of statistical paradoxes, fallacies and other surprises.<br />
<br />
== Simpson's Paradox ==<br />
Classical examples:<br />
* '''Florida capital sentencing''' of convicted murderers: Suppression effect of a confounder: No marginal relationship between race of accused and rate of capital sentencing but the relationship becomes very strong when controlling for a confounding factor: the race of victim.<br />
* '''Berkeley graduate admissions''': Overall lower rate of acceptance for female candidates but little or no gender effect within departments. Women tend to apply to departments that are harder to get into (have low admission rates) for both men and women. By controlling for departments the appearance of gender discrimination disappears. But is this the right analysis? That depends on the mechanism through which discrimination occurs. If university budgeting decisions systematically favour departments that teach topics that are more appealing to men, then ''department'' is a mediator and controlling for departments masks the effect of discrimination. We need different models to identify 'micro-discrimination' at the departmental level and 'macro-discrimination' at the university level. This is analogous to asking whether ''department'' should be treated as a confounder (and included in the model) or as a mediator (and excluded) in order to identify a causal effect of gender. Conditioning is not automatically the right thing to do. So the right model to estimate discrimination depends on the mechanism of discrimination. For mechanisms in which department is a mediator or a collider, it must be omitted. For mechanisms in which it is a confounder, it must be included.<br />
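<br />
The reversal in the admissions example can be reproduced with a small table. The sketch below uses made-up counts (two hypothetical departments, A and B, not the Berkeley data) chosen so that admission rates are identical for men and women within each department, yet differ markedly when departments are pooled.<br />

```python
from fractions import Fraction

# Hypothetical counts, NOT the Berkeley data: (applicants, admitted).
# Department A admits 60% of applicants, department B admits 20%,
# and within each department the rate is the same for both genders.
applications = {
    "A": {"men": (80, 48), "women": (20, 12)},
    "B": {"men": (20, 4), "women": (80, 16)},
}


def rate(applied, admitted):
    """Exact admission rate as a fraction."""
    return Fraction(admitted, applied)


# Admission rate within each department, by gender.
within = {
    dept: {gender: rate(*counts) for gender, counts in by_gender.items()}
    for dept, by_gender in applications.items()
}

# Admission rate pooled over departments, by gender.
overall = {}
for gender in ("men", "women"):
    applied = sum(by_gender[gender][0] for by_gender in applications.values())
    admitted = sum(by_gender[gender][1] for by_gender in applications.values())
    overall[gender] = rate(applied, admitted)

for dept, rates in within.items():
    print(f"Department {dept}: men {float(rates['men']):.0%}, "
          f"women {float(rates['women']):.0%}")
print(f"Overall: men {float(overall['men']):.0%}, "
      f"women {float(overall['women']):.0%}")
```

The pooled gap arises only because, in these invented counts, most women apply to the department with the lower admission rate. Whether the pooled or the within-department comparison is the relevant one still depends on the assumed mechanism, as discussed above.<br />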
<br />
== Birth-Weight Paradox ==<br />
Example of conditioning, or selection, on a collider variable.<br />
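<br />
A minimal sketch of how conditioning on a collider creates an association, assuming a deliberately simplified toy model: two independent binary causes, <code>a</code> (say, maternal smoking) and <code>b</code> (say, some other risk factor), each present with probability 1/2, and a collider <code>c</code> (say, low birth weight) that occurs whenever either cause is present. The variable names and probabilities are illustrative assumptions, not estimates from any study.<br />

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)


def prob(event, given=lambda a, b, c: True):
    """Exact P(event | given) in the toy model.

    a and b are independent fair coin flips; the collider c equals 1
    whenever at least one of a or b equals 1.
    """
    numerator = denominator = Fraction(0)
    for a, b in product((0, 1), repeat=2):
        c = int(a or b)
        p = HALF * HALF
        if given(a, b, c):
            denominator += p
            if event(a, b, c):
                numerator += p
    return numerator / denominator


def a_is_present(a, b, c):
    return a == 1


# Unconditionally, knowing b tells us nothing about a.
p_a = prob(a_is_present)
p_a_given_b = prob(a_is_present, given=lambda a, b, c: b == 1)

# Within the stratum c == 1, a and b become dependent.
p_a_given_c_b1 = prob(a_is_present, given=lambda a, b, c: c == 1 and b == 1)
p_a_given_c_b0 = prob(a_is_present, given=lambda a, b, c: c == 1 and b == 0)

print("P(a)            =", p_a)
print("P(a | b=1)      =", p_a_given_b)
print("P(a | c=1, b=1) =", p_a_given_c_b1)
print("P(a | c=1, b=0) =", p_a_given_c_b0)
```

In the selected stratum, the absence of one cause makes the other certain, so the two causes are negatively associated even though they are independent in the whole population.<br />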
<br />
== Regression Paradox ==<br />
Discrepancy between global and local relationships. Some classical examples:<br />
* Globally the distribution of heights can remain the same from generation to generation although tall parents get the impression that their children are shorter than themselves and short parents get the impression that their children are taller than themselves on average. Also, tall children get the impression that their parents are shorter than themselves and short children get the impression that their parents are taller than themselves on average. Thus parents get the impression, perfectly legitimately from their point of view, that the distribution of heights is being compressed towards the mean '''and''' children get the impression, also perfectly legitimately, that their parents' heights were more compressed towards the mean.<br />
* Kahneman's pilot instructors got the impression that criticism improved performance of student pilots while praise made it worse. Kahneman thought the causal effect should be in the opposite direction. Regression to the mean allows one to see how Kahneman's belief and the pilot instructors' impression are not inconsistent. The resolution of the paradox lies partly in realizing that the instructors are noticing an 'observational' relationship that is in the opposite direction to the possible causal relationship. Thus there's a connection with Simpson's Paradox.<br />
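<br />
The pilot-instructor example can be sketched with an exact enumeration. The toy model below is an assumption made for illustration: performance on each flight is a fixed skill plus independent 'luck' that takes the values -1, 0 or +1 with equal probability, and feedback has no causal effect at all. Skill cancels out of the change between two flights, so only luck needs to be enumerated.<br />

```python
from itertools import product
from statistics import mean

# Hypothetical luck values, equally likely and independent across flights.
LUCK = (-1, 0, 1)


def average_change(first_luck):
    """Mean change in performance from flight 1 to flight 2,
    among flights whose first-flight luck equals first_luck."""
    changes = [
        second - first
        for first, second in product(LUCK, repeat=2)
        if first == first_luck
    ]
    return mean(changes)


after_bad_flight = average_change(-1)   # the flights that get criticised
after_good_flight = average_change(+1)  # the flights that get praised

print("Average change after a bad flight: ", after_bad_flight)
print("Average change after a good flight:", after_good_flight)
```

Performance improves on average after the flights that would attract criticism and deteriorates after the flights that would attract praise, although nothing in the model lets feedback influence the next flight.<br />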
<br />
== Lord's Paradox ==<br />
When comparing two groups using a pretest and a posttest, should we compare gain scores between groups or should we regress the posttest on groups using the pretest as a covariate?<br />
<br />
Lord (1967) originally graphed a hypothetical scenario in which males and females were weighed at the beginning and end of an academic year, with gender tested as a predictor of weight gain in response to the cafeteria diet provided at the school. At the start of the school year, females had a lower average weight than males, but the group averages were about the same at the end of the school year. <br />
A difference score model, which uses post-pre as the outcome, concluded that there was no difference in weight gain between males and females, and therefore no systematic influence of gender on changes in weight. <br />
An ANCOVA—that regresses posttest weight on pretest scores as well as the gender predictor—concluded that wherever males and females start the school year with the same initial weight, males are predicted to gain significantly more weight by end of year, leading to the conclusion that gender has a substantial influence on weight gain. <br />
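<br />
The two analyses can be compared on a small constructed data set. The numbers below are invented for illustration (they are not Lord's): within each group the posttest regresses halfway towards the group mean, and the group means do not change. The ANCOVA group effect is computed by hand as the difference in posttest means minus the pooled within-group slope times the difference in pretest means, which is what least squares gives for a model with a common slope.<br />

```python
from statistics import mean


def make_group(centre):
    """Hypothetical (pre, post) weights around a group mean.

    Each post value keeps half of the pre deviation from the group mean
    and adds noise that is balanced within every pre value.
    """
    deviations = (-10, 0, 10)
    noise = (-5, 5)
    return [
        (centre + d, centre + 0.5 * d + e)
        for d in deviations
        for e in noise
    ]


females = make_group(50)
males = make_group(70)


def mean_gain(group):
    return mean(post - pre for pre, post in group)


def pooled_within_slope(groups):
    """Slope of post on pre after centring each group at its own means."""
    sxy = sxx = 0.0
    for group in groups:
        pre_mean = mean(pre for pre, _ in group)
        post_mean = mean(post for _, post in group)
        sxy += sum((pre - pre_mean) * (post - post_mean) for pre, post in group)
        sxx += sum((pre - pre_mean) ** 2 for pre, _ in group)
    return sxy / sxx


# Gain-score analysis: difference between groups in mean gain.
gain_difference = mean_gain(males) - mean_gain(females)

# ANCOVA: difference in posttest means adjusted for pretest.
slope = pooled_within_slope([females, males])
pre_difference = mean(p for p, _ in males) - mean(p for p, _ in females)
post_difference = mean(q for _, q in males) - mean(q for _, q in females)
ancova_difference = post_difference - slope * pre_difference

print("Gain-score difference (males - females):", gain_difference)
print("Pooled within-group slope:", slope)
print("ANCOVA-adjusted difference (males - females):", ancova_difference)
```

On these data the gain-score comparison finds no group difference while the covariate-adjusted comparison finds a difference of 10 units, so the two analyses answer different questions about the same numbers.<br />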
<br />
A more recent example from MLB 2016 data:<br />
Wright (2017) compared the change in batting averages from the first half of Major League Baseball’s 2016 season to the second half, comparing pitchers and position players. ANCOVA that covaries the initial average out concluded that wherever a pitcher and a position player start with the same first half batting average, the position players are predicted to have a higher second half average. <br />
However, the data itself indicates that pitchers actually improve slightly, from .143 in the first half to .166 in the second half, while the position players get slightly worse, from .267 in the first half to .262 in the second half. The gain score approach concludes no difference in the change in batting averages between position players and pitchers. <br />
<br />
== Base rate paradoxes ==<br />
'''Prosecutor's Fallacy''': The p-value to test the hypothesis that Sally Clark was innocent of the murder of her two children could legitimately be interpreted as being in the vicinity of 1/100,000. However there's also a legitimate argument that the probability of her innocence is very close to 1. These two seemingly contradictory results are not inconsistent with each other.<br />
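<br />
A short Bayes calculation shows how both statements can hold. The probability of two natural deaths uses the 1/100,000 figure mentioned above; the probability of a double murder is a purely hypothetical value chosen for illustration, not an estimate from any source.<br />

```python
from fractions import Fraction

# Probability that two infants in one family die of natural causes,
# using the 1/100,000 figure mentioned in the text.
p_two_natural_deaths = Fraction(1, 100_000)

# Hypothetical assumption for illustration only: the probability that
# a mother murders two of her children.
p_double_murder = Fraction(1, 10_000_000)

# Given that two infants died, compare the two competing explanations.
p_innocent_given_deaths = p_two_natural_deaths / (p_two_natural_deaths + p_double_murder)

print(p_innocent_given_deaths, float(p_innocent_given_deaths))
```

Under these assumptions the event is very improbable for an innocent mother, yet the probability of innocence given the deaths is 100/101, because the alternative explanation is rarer still.<br />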
<br />
'''Representativeness heuristic''': This is a concept formulated by Tversky and Kahneman. One could view<br />
the heuristic as amounting to forming judgements based on relative likelihood, thus ignoring the base rate or 'prior', the foundational fallacy of frequentism. <br />
<br />
'''Stereotyping''': [https://www.sciencedirect.com/science/article/pii/0022103182900798 Social stereotypes and judgments of individuals: An instance of the base-rate fallacy]<br />
<br />
== Prisoner's Paradox==<br />
Also known as the Monty Hall problem or the Principle of Restricted Choice in bridge. This is a very revealing paradox that illustrates the importance of taking into account the probabilistic mechanism that generated information, in addition to the information itself, if the information does not induce a partition of the space of possibilities.<br />
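<br />
The role of the mechanism can be seen by simulating the three-door version under two hosts who produce the same observed information (a door opened to reveal a goat). This is a sketch: the function name and the 'random host' variant are illustrative assumptions, not part of the original statement of the problem.<br />

```python
import random

def switch_win_rate(host_knows, trials=100_000, seed=3):
    """Estimate how often switching wins, given that a goat was revealed."""
    rng = random.Random(seed)
    wins = kept = 0
    for _ in range(trials):
        car = rng.randrange(3)
        pick = rng.randrange(3)
        others = [d for d in range(3) if d != pick]
        if host_knows:
            # Informed host: always opens a door that hides a goat.
            opened = rng.choice([d for d in others if d != car])
        else:
            # Uninformed host: opens one of the other doors at random.
            opened = rng.choice(others)
            if opened == car:
                continue  # the car was revealed, so this is a different observation
        kept += 1
        remaining = next(d for d in others if d != opened)
        wins += remaining == car
    return wins / kept

informed_rate = switch_win_rate(host_knows=True)
random_rate = switch_win_rate(host_knows=False)
print(informed_rate, random_rate)
```

The observation is the same in both cases, but switching wins about two thirds of the time with the informed host and about half the time with the uninformed host.<br />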
<br />
It illustrates the crucial role of statistical modelling, but perhaps not in a way that is supportive of frequentist inference.<br />
<br />
== Weighting Paradoxes ==<br />
Students at a university report an average class size of 100. Professors report an average class size of 50. Are students likely to be exaggerating and professors underestimating the size of their classes?<br />
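<br />
Both reports can be exactly right. The sketch below uses a hypothetical set of five classes chosen so that the numbers match the question; the class sizes are an assumption for illustration.<br />

```python
# Hypothetical class sizes: four small classes and one large one.
class_sizes = [25, 25, 25, 25, 150]

# Professors' view: each class counts once.
professor_mean = sum(class_sizes) / len(class_sizes)

# Students' view: each class is reported once per student enrolled in it,
# so a class of size n is weighted by n.
student_mean = sum(n * n for n in class_sizes) / sum(class_sizes)

print(professor_mean, student_mean)
```

The student average is a size-weighted average, so it exceeds the per-class average whenever class sizes vary.<br />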
<br />
== Paradoxes of measures of central tendency ==<br />
A random sample of 100 taxpayers reveals an average income of $30,000 although the government knows that the average income is $60,000. Is the sample likely to be biased and/or respondents understating their income? Or is there another plausible explanation?<br />
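<br />
One plausible explanation is skewness. In the sketch below the income distribution is a hypothetical two-point distribution chosen to match the numbers in the question, and the sample is treated as 100 independent draws from a large population.<br />

```python
# Hypothetical population: per 1000 taxpayers, 999 earn 30,000
# and one earns 30,030,000.
typical_income = 30_000
top_income = 30_030_000

population_mean = (999 * typical_income + 1 * top_income) / 1000

# Probability that 100 independent draws contain no top earner,
# in which case the sample mean is exactly 30,000.
p_no_top_earner = (999 / 1000) ** 100

print(population_mean, p_no_top_earner)
```

Under these assumptions the population mean is 60,000, yet about nine samples in ten have a mean of exactly 30,000 without any bias or misreporting.<br />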
<br />
== Inference Paradoxes ==<br />
It is possible to build a model in which the parameter space and the support are both equal to the natural numbers and in which a confidence procedure that has at least 2/3 probability of coverage for all <math>\theta</math> (i.e. 2/3 confidence) has only 1/3 posterior probability for all <math>y</math> under a uniform prior for <math>\theta</math>. Thus confidence and credibility can be strongly inconsistent in contrast with the intuition based on compact models in which mean credibility must equal mean confidence.<br />
<br />
<!-- Conditioning paradoxes --></div>Georgeshttp://scs.math.yorku.ca/index.php/SCS_Reads_2019-2020SCS Reads 2019-20202019-08-30T16:57:09Z<p>Georges: /* Candidates for 2018-2019 */</p>
<hr />
<div>== Links to past episodes of SCS Reads ==<br />
* [[SCS Reads 2018-2019|SCS Reads 2018-2019 Causality mainly]]<br />
* [[SCS Reads 2017-2018|SCS Reads 2017-2018 Counterfactuals and Causal Inference]]<br />
* [[SCS Reads 2016-2017|SCS Reads 2016-2017 Rethinking Statistics]]<br />
* [[SCS Reads 2015-2016|SCS Reads 2015-2016 Bayesian Statistics in the Social Sciences]]<br />
* [[SCS Reads 2014-2015]]<br />
* [[SCS Reads Nominations|Past and current nominations for SCS Reads]]<br />
* [[SCS Reads -- past seminars]]<br />
== Candidates for 2019-2020 ==<br />
* '''The Crisis in Statistics??''' Use the special issue of the American Statistician devoted to current problems in statistical inference, particularly the use and interpretation of p-values and the concept of statistical significance, as a springboard to select articles from the special issue or elsewhere to discuss whether there is a crisis in statistics and what we should do.<br />
* Add other suggestions here or send them to georges@yorku.ca.<br />
<br />
== Initial discussion of topics ==<br />
A view was expressed that we should consider this seminar series more broadly, with a view to:<br />
* wider, and more active participation by the seminar members<br />
* topics or book chapters that would:<br />
** enlist a volunteer as discussion leader, or organize the topic<br />
** perhaps involve some concrete, practical examples used to illustrate the topic<br />
<br />
Without prejudice to the choice of The ''Book of Why'' on causal inference, some suggested topics<br />
mentioned are listed below, though the general view was that, if we took this route, only 3-4<br />
should be considered for this year. I'm just listing these, but they deserve to be fleshed out<br />
more for us to consider how & whether they would work.<br />
<br />
* '''Reproducibility of research''': How should statistical practice be informed by current controversies about the ''replication crisis'' and countervailing moves toward open science and reproducibility? This was a big topic at the recent JSM 2018 conference. [MF: I should add some references here.]<br />
** Roger Peng, Reproducible Research in Computational Science, ''Science'', Vol. 334, Issue 6060, pp. 1226-1227 [http://science.sciencemag.org/content/334/6060/1226]<br />
**RC: This is timely in a couple ways. Chris Green is teaching a course on the topic, so he or one of the students might be able to come to a meeting to discuss their views. We also have a QM Forum speaker on topic in the Winter<br />
<br />
* '''Big Data problems''': Another hot topic, but perhaps too broad. [MF: I don't know enough to specify it more clearly as a useful seminar topic.] [This would be interesting, but I would hope we could find materials that address the issue from a social science perspective]<br />
<br />
* Some statistical methods topics, not yet clearly articulated:<br />
** Clustering methods [RC: Could this fall under the Big Data label?]<br />
** Robustness<br />
** Mediation<br />
** Meta analysis in medicine: how can you tell whether a lit review is complete with logistic regression?! [RC: I think Meta-Analysis could be a good topic, including M-L/J-A discussing their research]<br />
<br />
* '''Consulting issues''': Practical aspects of statistical consulting [MF: Perhaps this would be a better topic for the SCS staff meetings ??] [RC: That would make a good topic for a business/staff meeting, if there were no consulting cases to discuss]<br />
<br />
* '''Survey Sampling''': Elucidating the mystery of bootstrap weights and how to use them when analyzing survey data, e.g. from Statistics Canada. <br />
<br />
* '''Evidence-based medicine''': Ideas and implications<br />
<br />
* '''Machine Learning, AI, Deep Learning''': An Overview <br />
<br />
* '''Disseminating technical information to non-technical audiences''': in consulting and in teaching. [Could this be put together with 'consulting issues'?]<br />
<br />
* '''Missing Data'''<br />
<br />
* '''Statistical Paradoxes and Fallacies''' [RC: There are some good "summary" articles related to statistical paradoxes and Georges' examples are always helpful]<br />
<br />
** Initial attempt at creating a list: [[Paradoxes, Fallacies and Other Surprises]]. Please add, modify or comment.<br />
<br />
== September 21 ==<br />
<br />
Talk: Michael Friendly, ''100+ Years of Titanic Graphs''<br />
<br />
* Slides: [[File:SCS-TitanicGraphs-2x2.pdf]]<br />
<br />
== October 5 ==<br />
<br />
* Introduction and Chapter 1 of the Book of Why<br />
* [http://bayes.cs.ucla.edu/WHY/ Website for the book]<br />
* [http://bayes.cs.ucla.edu/WHY/why-ch1.pdf PDF of first chapter]<br />
<br />
== October 19 ==<br />
<br />
* Chapters 2 and 3 of the Book of Why<br />
<br />
== November 2 ==<br />
<br />
* Chapters 4 and 5 of the Book of Why<br />
<br />
....<br />
<br />
== January 4 ==<br />
<br />
* We plan to continue a discussion of Chapter 7 of the Book of Why focusing on details of the do-calculus.<br />
* An interesting reference might be this [https://www.ssc.wisc.edu/soc/faculty/pages/docs/elwert/Elwert%202013.pdf 2013 article by Felix Elwert on Graphical Causal Models].<br />
<br />
== February 1 ==<br />
<br />
* We continue with Chapter 8 of ''The Book of Why'', '''Counterfactuals: Mining Worlds That Could Have Been'''<br />
* MF offered some examples of papers describing R packages for learning/doing causal modeling<br />
** [http://dagity.net/primer Causal Inference in Statistics: A Companion for R Users]. This is designed to accompany the book [http://bayes.cs.ucla.edu/PRIMER/ Causal Inference in Statistics: A Primer] by Pearl, Glymour and Jewell.<br />
** [https://cran.r-project.org/web/packages/CausalImpact/vignettes/CausalImpact.html Vignette for the CausalImpact package]. This package implements a Bayesian approach to causal impact estimation in time series data. It is designed for a specific situation, when you have time series pre- and post- some intervention. I mention it here only because it produces nice graphs of <br />
** [https://www.jstatsoft.org/article/view/v082i02 A Recipe for Interference: Start with Causal Inference. Add Interference. Mix Well with R.] by Bradley Saul & Michael Hudgens</div>Georgeshttp://scs.math.yorku.ca/index.php/SCS_Reads_2019-2020SCS Reads 2019-20202019-08-30T16:48:26Z<p>Georges: /* Converging to a plan this year */</p>
<hr />
<div>== Links to past episodes of SCS Reads ==<br />
* [[SCS Reads 2018-2019|SCS Reads 2018-2019 Causality mainly]]<br />
* [[SCS Reads 2017-2018|SCS Reads 2017-2018 Counterfactuals and Causal Inference]]<br />
* [[SCS Reads 2016-2017|SCS Reads 2016-2017 Rethinking Statistics]]<br />
* [[SCS Reads 2015-2016|SCS Reads 2015-2016 Bayesian Statistics in the Social Sciences]]<br />
* [[SCS Reads 2014-2015]]<br />
* [[SCS Reads Nominations|Past and current nominations for SCS Reads]]<br />
* [[SCS Reads -- past seminars]]<br />
== Candidates for 2018-2019 ==<br />
* Pearl and Mackenzie (2018) The Book of Why, in the fall term followed by the remaining chapters (8 to 12) of Morgan and Winship (2015) Counterfactuals and Causal Inference, 2nd ed. in the winter term.<br />
** [https://d2r55xnwy6nx47.cloudfront.net/uploads/2018/05/to-build-truly-intelligent-machines-teach-them-cause-and-effect-20180515.pdf Kevin Hartnett (2018) To Build Truly Intelligent Machines, Teach Them Cause and Effect, ''Quanta Magazine'']: An interview with Judea Pearl on the Book of Why<br />
** The Book of Why is not technically difficult and provides a broad overview including interesting historical details. We could cover this in one term. It is intended as a trade book so it's cheap and accessible. It's also very useful as a source of ideas for anyone who would like to include more causal ideas in lower level quantitative courses. Here's the [https://www.nytimes.com/2018/06/01/business/dealbook/review-the-book-of-why-examines-the-science-of-cause-and-effect.html review in the New York Times] and a [https://www.amazon.ca/Book-Why-Science-Cause-Effect/dp/046509760X link to Amazon].<br />
** Suggested by Georges<br />
** Pros: Great synthesis including counterfactual and graphical approaches. Discusses related concepts and history. It covers, less formally, many of the ideas in the first portion of Morgan and Winship so it would allow new members to visit this material before reading the last 5 chapters of Morgan and Winship.<br />
** Cons: We've just spent a year on causal models. Would we prefer to do something else?<br />
<br />
== Initial discussion of topics ==<br />
A view was expressed that we should consider this seminar series more broadly, with a view to:<br />
* wider, and more active participation by the seminar members<br />
* topics or book chapters that would:<br />
** enlist a volunteer as discussion leader, or organize the topic<br />
** perhaps involve some concrete, practical examples used to illustrate the topic<br />
<br />
Without prejudice to the choice of The ''Book of Why'' on causal inference, some suggested topics<br />
mentioned are listed below, though the general view was that, if we took this route, only 3-4<br />
should be considered for this year. I'm just listing these, but they deserve to be fleshed out<br />
more for us to consider how & whether they would work.<br />
<br />
* '''Reproducibility of research''': How should statistical practice be informed by current controversies about the ''replication crisis'' and countervailing moves toward open science and reproducibility? This was a big topic at the recent JSM 2018 conference. [MF: I should add some references here.]<br />
** Roger Peng, Reproducible Research in Computational Science, ''Science'', Vol. 334, Issue 6060, pp. 1226-1227 [http://science.sciencemag.org/content/334/6060/1226]<br />
**RC: This is timely in a couple ways. Chris Green is teaching a course on the topic, so he or one of the students might be able to come to a meeting to discuss their views. We also have a QM Forum speaker on topic in the Winter<br />
<br />
* '''Big Data problems''': Another hot topic, but perhaps too broad. [MF: I don't know enough to specify it more clearly as a useful seminar topic.] [This would be interesting, but I would hope we could find materials that address the issue from a social science perspective]<br />
<br />
* Some statistical methods topics, not yet clearly articulated:<br />
** Clustering methods [RC: Could this fall under the Big Data label?]<br />
** Robustness<br />
** Mediation<br />
** Meta analysis in medicine: how can you tell whether a lit review is complete with logistic regression?! [RC: I think Meta-Analysis could be a good topic, including M-L/J-A discussing their research]<br />
<br />
* '''Consulting issues''': Practical aspects of statistical consulting [MF: Perhaps this would be a better topic for the SCS staff meetings ??] [RC: That would make a good topic for a business/staff meeting, if there were no consulting cases to discuss]<br />
<br />
* '''Survey Sampling''': Elucidating the mystery of bootstrap weights and how to use them when analyzing survey data, e.g. from Statistics Canada. <br />
<br />
* '''Evidence-based medicine''': Ideas and implications<br />
<br />
* '''Machine Learning, AI, Deep Learning''': An Overview <br />
<br />
* '''Disseminating technical information to non-technical audiences''': in consulting and in teaching. [Could this be put together with 'consulting issues'?]<br />
<br />
* '''Missing Data'''<br />
<br />
* '''Statistical Paradoxes and Fallacies''' [RC: There are some good "summary" articles related to statistical paradoxes and Georges' examples are always helpful]<br />
<br />
** Initial attempt at creating a list: [[Paradoxes, Fallacies and Other Surprises]]. Please add, modify or comment.<br />
<br />
== September 21 ==<br />
<br />
Talk: Michael Friendly, ''100+ Years of Titanic Graphs''<br />
<br />
* Slides: [[File:SCS-TitanicGraphs-2x2.pdf]]<br />
<br />
== October 5 ==<br />
<br />
* Introduction and Chapter 1 of the Book of Why<br />
* [http://bayes.cs.ucla.edu/WHY/ Website for the book]<br />
* [http://bayes.cs.ucla.edu/WHY/why-ch1.pdf PDF of first chapter]<br />
<br />
== October 19 ==<br />
<br />
* Chapters 2 and 3 of the Book of Why<br />
<br />
== November 2 ==<br />
<br />
* Chapters 4 and 5 of the Book of Why<br />
<br />
....<br />
<br />
== January 4 ==<br />
<br />
* We plan to continue a discussion of Chapter 7 of the Book of Why focusing on details of the do-calculus.<br />
* An interesting reference might be this [https://www.ssc.wisc.edu/soc/faculty/pages/docs/elwert/Elwert%202013.pdf 2013 article by Felix Elwert on Graphical Causal Models].<br />
<br />
== February 1 ==<br />
<br />
* We continue with Chapter 8 of ''The Book of Why'', '''Counterfactuals: Mining Worlds That Could Have Been'''<br />
* MF offered some examples of papers describing R packages for learning/doing causal modeling<br />
** [http://dagity.net/primer Causal Inference in Statistics: A Companion for R Users]. This is designed to accompany the book [http://bayes.cs.ucla.edu/PRIMER/ Causal Inference in Statistics: A Primer] by Pearl, Glymour and Jewell.<br />
** [https://cran.r-project.org/web/packages/CausalImpact/vignettes/CausalImpact.html Vignette for the CausalImpact package]. This package implements a Bayesian approach to causal impact estimation in time series data. It is designed for a specific situation, when you have time series pre- and post- some intervention. I mention it here only because it produces nice graphs of <br />
** [https://www.jstatsoft.org/article/view/v082i02 A Recipe for Interference: Start with Causal Inference. Add Interference. Mix Well with R.] by Bradley Saul & Michael Hudgens</div>Georgeshttp://scs.math.yorku.ca/index.php/SCS_Reads_2019-2020SCS Reads 2019-20202019-08-30T16:47:03Z<p>Georges: Created page with "== Links to past episodes of SCS Reads == * SCS Reads 2018-2019 Causality mainly * [[SCS Reads 2017-2018|SCS Reads 2017-2018 Counterfactuals and Causal In..."</p>
<hr />
<div>== Links to past episodes of SCS Reads ==<br />
* [[SCS Reads 2018-2019|SCS Reads 2018-2019 Causality mainly]]<br />
* [[SCS Reads 2017-2018|SCS Reads 2017-2018 Counterfactuals and Causal Inference]]<br />
* [[SCS Reads 2016-2017|SCS Reads 2016-2017 Rethinking Statistics]]<br />
* [[SCS Reads 2015-2016|SCS Reads 2015-2016 Bayesian Statistics in the Social Sciences]]<br />
* [[SCS Reads 2014-2015]]<br />
* [[SCS Reads Nominations|Past and current nominations for SCS Reads]]<br />
* [[SCS Reads -- past seminars]]<br />
== Converging to a plan this year ==<br />
We will adopt a hybrid plan combining reading the Book of Why (first chapter to be discussed on October 5: [http://bayes.cs.ucla.edu/WHY/why-ch1.pdf PDF of first chapter]) with other topics.<br />
<br />
On September 21, Michael Friendly will present a talk on the history of the visualization of the Titanic data.<br />
<br />
== Candidates for 2018-2019 ==<br />
* Pearl and Mackenzie (2018) The Book of Why, in the fall term followed by the remaining chapters (8 to 12) of Morgan and Winship (2015) Counterfactuals and Causal Inference, 2nd ed. in the winter term.<br />
** [https://d2r55xnwy6nx47.cloudfront.net/uploads/2018/05/to-build-truly-intelligent-machines-teach-them-cause-and-effect-20180515.pdf Kevin Hartnett (2018) To Build Truly Intelligent Machines, Teach Them Cause and Effect, ''Quanta Magazine'']: An interview with Judea Pearl on the Book of Why<br />
** The Book of Why is not technically difficult and provides a broad overview including interesting historical details. We could cover this in one term. It is intended as a trade book so it's cheap and accessible. It's also very useful as a source of ideas for anyone who would like to include more causal ideas in lower level quantitative courses. Here's the [https://www.nytimes.com/2018/06/01/business/dealbook/review-the-book-of-why-examines-the-science-of-cause-and-effect.html review in the New York Times] and a [https://www.amazon.ca/Book-Why-Science-Cause-Effect/dp/046509760X link to Amazon].<br />
** Suggested by Georges<br />
** Pros: Great synthesis including counterfactual and graphical approaches. Discusses related concepts and history. It covers, less formally, many of the ideas in the first portion of Morgan and Winship so it would allow new members to visit this material before reading the last 5 chapters of Morgan and Winship.<br />
** Cons: We've just spent a year on causal models. Would we prefer to do something else?<br />
<br />
== Initial discussion of topics ==<br />
A view was expressed that we should consider this seminar series more broadly, with a view to:<br />
* wider, and more active participation by the seminar members<br />
* topics or book chapters that would:<br />
** enlist a volunteer as discussion leader, or organize the topic<br />
** perhaps involve some concrete, practical examples used to illustrate the topic<br />
<br />
Without prejudice to the choice of The ''Book of Why'' on causal inference, some suggested topics<br />
mentioned are listed below, though the general view was that, if we took this route, only 3-4<br />
should be considered for this year. I'm just listing these, but they deserve to be fleshed out<br />
more for us to consider how & whether they would work.<br />
<br />
* '''Reproducibility of research''': How should statistical practice be informed by current controversies about the ''replication crisis'' and countervailing moves toward open science and reproducibility? This was a big topic at the recent JSM 2018 conference. [MF: I should add some references here.]<br />
** Roger Peng, Reproducible Research in Computational Science, ''Science'', Vol. 334, Issue 6060, pp. 1226-1227 [http://science.sciencemag.org/content/334/6060/1226]<br />
**RC: This is timely in a couple ways. Chris Green is teaching a course on the topic, so he or one of the students might be able to come to a meeting to discuss their views. We also have a QM Forum speaker on topic in the Winter<br />
<br />
* '''Big Data problems''': Another hot topic, but perhaps too broad. [MF: I don't know enough to specify it more clearly as a useful seminar topic.] [This would be interesting, but I would hope we could find materials that address the issue from a social science perspective]<br />
<br />
* Some statistical methods topics, not yet clearly articulated:<br />
** Clustering methods [RC: Could this fall under the Big Data label?]<br />
** Robustness<br />
** Mediation<br />
** Meta analysis in medicine: how can you tell whether a lit review is complete with logistic regression?! [RC: I think Meta-Analysis could be a good topic, including M-L/J-A discussing their research]<br />
<br />
* '''Consulting issues''': Practical aspects of statistical consulting [MF: Perhaps this would be a better topic for the SCS staff meetings ??] [RC: That would make a good topic for a business/staff meeting, if there were no consulting cases to discuss]<br />
<br />
* '''Survey Sampling''': Elucidating the mystery of bootstrap weights and how to use them when analyzing survey data, e.g. from Statistics Canada. <br />
<br />
* '''Evidence-based medicine''': Ideas and implications<br />
<br />
* '''Machine Learning, AI, Deep Learning''': An Overview <br />
<br />
* '''Disseminating technical information to non-technical audiences''': in consulting and in teaching. [Could this be put together with 'consulting issues'?]<br />
<br />
* '''Missing Data'''<br />
<br />
* '''Statistical Paradoxes and Fallacies''' [RC: There are some good "summary" articles related to statistical paradoxes and Georges' examples are always helpful]<br />
<br />
** Initial attempt at creating a list: [[Paradoxes, Fallacies and Other Surprises]]. Please add, modify or comment.<br />
<br />
== September 21 ==<br />
<br />
Talk: Michael Friendly, ''100+ Years of Titanic Graphs''<br />
<br />
* Slides: [[File:SCS-TitanicGraphs-2x2.pdf]]<br />
<br />
== October 5 ==<br />
<br />
* Introduction and Chapter 1 of the Book of Why<br />
* [http://bayes.cs.ucla.edu/WHY/ Website for the book]<br />
* [http://bayes.cs.ucla.edu/WHY/why-ch1.pdf PDF of first chapter]<br />
<br />
== October 19 ==<br />
<br />
* Chapters 2 and 3 of the Book of Why<br />
<br />
== November 2 ==<br />
<br />
* Chapters 4 and 5 of the Book of Why<br />
<br />
....<br />
<br />
== January 4 ==<br />
<br />
* We plan to continue a discussion of Chapter 7 of the Book of Why focusing on details of the do-calculus.<br />
* An interesting reference might be this [https://www.ssc.wisc.edu/soc/faculty/pages/docs/elwert/Elwert%202013.pdf 2013 article by Felix Elwert on Graphical Causal Models].<br />
<br />
== February 1 ==<br />
<br />
* We continue with Chapter 8 of ''The Book of Why'', '''Counterfactuals: Mining Worlds That Could Have Been'''<br />
* MF offered some examples of papers describing R packages for learning/doing causal modeling<br />
** [http://dagity.net/primer Causal Inference in Statistics: A Companion for R Users]. This is designed to accompany the book [http://bayes.cs.ucla.edu/PRIMER/ Causal Inference in Statistics: A Primer] by Pearl, Glymour and Jewell.<br />
** [https://cran.r-project.org/web/packages/CausalImpact/vignettes/CausalImpact.html Vignette for the CausalImpact package]. This package implements a Bayesian approach to causal impact estimation in time series data. It is designed for a specific situation, when you have time series pre- and post- some intervention. I mention it here only because it produces nice graphs of <br />
** [https://www.jstatsoft.org/article/view/v082i02 A Recipe for Interference: Start with Causal Inference. Add Interference. Mix Well with R.] by Bradley Saul & Michael Hudgens</div>Georgeshttp://scs.math.yorku.ca/index.php/Wiki_for_statistical_consultingWiki for statistical consulting2019-08-30T16:45:43Z<p>Georges: /* Seminars */</p>
<hr />
<div>This is the wiki for the [http://www.yorku.ca/isr/scs/ Statistical Consulting Service] at York University. You can make [http://www.appointmentquest.com/provider/2000199121 appointments online] to see one of our consultants.<br />
<br />
Change<br />
<br />
Suggestion: to discuss the structure of the wiki, click on the discussion tab above.<br />
<br />
If you would like to have an account on this wiki, please contact Georges Monette [mailto:georges@yorku.ca?subject=Account%20on%20SCS%20wiki].<br />
== Hot topics ==<br />
* Contribute to the list of [[Programs_in_Data_Sciences|programs in Data Sciences]].<br />
* [[Sometimes Asked Questions]]<br />
* It would be a great contribution to compile a list of exemplary subject-matter papers reporting statistical methods, especially modern methods such as longitudinal data analysis, etc. <br />
__TOC__<br />
<br />
== Seminars ==<br />
* [[SCS Reads 2019-2020|SCS Reads 2019-2020 ????????]]<br />
* [[SCS Reads 2018-2019|SCS Reads 2018-2019 ????????]]<br />
* [[SCS Reads 2017-2018|SCS Reads 2017-2018 Counterfactuals and Causal Inference]]<br />
* [[SCS Reads 2016-2017|SCS Reads 2016-2017 Rethinking Statistics]]<br />
* [[SCS Reads 2015-2016|SCS Reads 2015-2016 Bayesian Statistics in the Social Sciences]]<br />
* [[SCS Reads 2014-2015]]<br />
* [[SCS Reads Nominations|Past and current nominations for SCS Reads]]<br />
* [[SCS Reads -- past seminars]]<br />
<br />
== Statistical Seminars in the Toronto Area ==<br />
* [http://qm.info.yorku.ca/ York Quantitative Methods Program in Psychology]<br />
* [http://www.math.yorku.ca/Who/Faculty/Rensburg/Colloquium/Colloquium2013.html York Department of Mathematics and Statistics]<br />
<br />
== Workshops and Courses ==<br />
* [http://blackwell.math.yorku.ca/ICPSR2017 ICPSR 2017 Course in Longitudinal Data Analysis with Mixed and Bayesian Models]<br />
* [[SCS_2017:_Longitudinal_and_Nested_Data|SCS 2017 Models and Analysis for Longitudinal and Nested Data]]<br />
* [[SCS 2014: Visualizing Regression]]<br />
* [[Mixed Models with R]]<br />
* [[SPIDA 2012: Mixed Models with R]]<br />
* [[SCS 2012: Mixed Models with R]]<br />
* [[SCS 2011: Statistical Analysis and Programming with R]]<br />
* [[SCS 2012: A Gentle Introduction to R]]<br />
* [[MATH 6627|MATH 6627 Practicum in Statistical Consulting]]<br />
* [[MATH 6643 Summer 2012 Applications of Mixed Models]]<br />
<br />
== Methods ==<br />
<br />
=== Data Analysis ===<br />
*[[Data Cleaning]]<br />
<br />
* [[Survival Analysis]]<br />
<br />
* [[Latent Variable Models]] (e.g. SEM, CFA, IRT, LCA)<br />
<br />
* [[Multilevel/Mixed Models]]<br />
<br />
* [[General Linear Models]] (e.g. multiple regression, ANOVA)<br />
<br />
* [[Categorical Data Analysis]] (e.g. contingency tables, chi-square, logistic regression)<br />
<br />
* [[Aggregate Data]] (e.g. meta-analysis)<br />
=== Displaying Data and Reporting===<br />
* Good papers on graphs:<br />
*:[http://www.ruf.rice.edu/~lane/papers/designing_better_graphs.pdf Lane, D.M., & Sandor, A. (2009). Designing better graphs by including distributional information and integrating words, numbers, and images. ''Psychological Methods, 14,'' 239-257.]<br />
*:[http://euclid.psych.yorku.ca/www/lab/psy6140/papers/kastellec-using-graphs.pdf Kastellec, J. P. and Leoni, E. L. (2007) Using Graphs Instead of Tables in ''Political Science, Perspectives on Politics'', 5, 755--771]. <br />
*:See also the related web site for Kastellec & Leoni, [http://tables2graphs.com/doku.php Using Graphs Instead of Tables].<br />
<br />
* A [http://biostat.mc.vanderbilt.edu/wiki/Main/ManuscriptChecklist checklist for statistical reporting]<br />
<br />
* [http://www.statlit.org/pdf/2001SchieldBusOfComm.pdf Describing Rates and Percentages in Tables]<br />
<br />
=== Statistical Topics ===<br />
*[[Causality]]<br />
*[[Model Selection]]<br />
*[[Programs in Data Sciences]]<br />
*[[Professional Accreditation]]<br />
*[[Statistics]]<br />
*[[Statistics in the News]]<br />
<br />
=== Statistical Consulting Support ===<br />
* [http://wiki.math.yorku.ca/index.php/MATH_6627_2007-08 York's Statistical Consulting Practicum Wiki 2007-2008]<br />
<br />
* [http://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting Statistical Consulting Practicum Wiki 2011]<br />
<br />
* [http://www.stat.columbia.edu/~cook/movabletype/archives/2008/01/rindskopfs_rule.html David Rindskopf's Rules for Consultants]<br />
<br />
* [http://www.rci.rutgers.edu/~cabrera/sc/ Javier Cabrera's Statistical Consulting Course]<br />
<br />
* [http://www.statsci.org/smyth/pubs/training.html Gordon Smyth on Training Students to be Consultants]<br />
<br />
* [http://www.amstat.org/sections/cnsl/ The American Statistical Association's Consulting Section] <br />
<br />
* Janice Derr on the Qualities of an Effective Statistical Consultant [[File: Janice_Derr_on_Consulting.pdf]]<br />
<br />
* [http://www.stat.purdue.edu/scs/help/notes_for_consultants.html Purdue University's Guide for Statistical Consultants]<br />
<br />
* [[Links to other Statistical Consulting Services]]<br />
<br />
== Software ==<br />
* [[R]]<br />
* [[SAS]]<br />
* [[SPSS/PASW]]<br />
* [http://www.gnu.org/software/pspp/ PSPP: open-source package inspired by SPSS]<br />
* [http://www.statmodel.com MPlus]<br />
* How to access software remotely from York's WebFAS system: <br />
** [[Media:Webfas_handout_SAS.pdf|SAS version]]<br />
=== Local R packages ===<br />
The 'spida' and 'p3d' packages are now available through github with<br />
* devtools::install_github('gmonette/spida2') and<br />
* devtools::install_github('gmonette/p3d')<br />
respectively.<br />
<br />
== SCS Administration ==<br />
* [[SCS TAships: Allocation of time]]<br />
* [[SCS Staff Meetings, 2011]]<br />
<br />
== Interesting Links ==<br />
=== Statistical consulting services at other universities === <br />
*[http://cscar.research.umich.edu/about/ University of Michigan]<br />
* [[Slides]]<br />
* [[Blogs]]<br />
* [[Books]]<br />
*[[Statistics-specific Job Boards]]<br />
<br />
'''Did you know?'''<br />
<br />
http://www12.statcan.gc.ca/census-recensement/index-eng.cfm</div>Georgeshttp://scs.math.yorku.ca/index.php/SCS_Reads_2018-2019SCS Reads 2018-20192019-08-19T23:53:27Z<p>Georges: /* February 1 */</p>
<hr />
<div>== Links to past episodes of SCS Reads ==<br />
* [[SCS Reads 2017-2018|SCS Reads 2017-2018 Counterfactuals and Causal Inference]]<br />
* [[SCS Reads 2016-2017|SCS Reads 2016-2017 Rethinking Statistics]]<br />
* [[SCS Reads 2015-2016|SCS Reads 2015-2016 Bayesian Statistics in the Social Sciences]]<br />
* [[SCS Reads 2014-2015]]<br />
* [[SCS Reads Nominations|Past and current nominations for SCS Reads]]<br />
* [[SCS Reads -- past seminars]]<br />
== Converging to a plan this year ==<br />
We will adopt a hybrid plan combining reading the Book of Why (first chapter to be discussed on October 5: [http://bayes.cs.ucla.edu/WHY/why-ch1.pdf PDF of first chapter]) with other topics.<br />
<br />
On September 21, Michael Friendly will present a talk on the history of the visualization of the Titanic data.<br />
<br />
== Candidates for 2018-2019 ==<br />
* Pearl and Mackenzie (2018) The Book of Why, in the fall term followed by the remaining chapters (8 to 12) of Morgan and Winship (2015) Counterfactuals and Causal Inference, 2nd ed. in the winter term.<br />
** [https://d2r55xnwy6nx47.cloudfront.net/uploads/2018/05/to-build-truly-intelligent-machines-teach-them-cause-and-effect-20180515.pdf Kevin Hartnett (2018) To Build Truly Intelligent Machines, Teach Them Cause and Effect, ''Quanta Magazine'']: An interview with Judea Pearl on the Book of Why<br />
** The Book of Why is not technically difficult and provides a broad overview including interesting historical details. We could cover this in one term. It is intended as a trade book so it's cheap and accessible. It's also very useful as a source of ideas for anyone who would like to include more causal ideas in lower level quantitative courses. Here's the [https://www.nytimes.com/2018/06/01/business/dealbook/review-the-book-of-why-examines-the-science-of-cause-and-effect.html review in the New York Times] and a [https://www.amazon.ca/Book-Why-Science-Cause-Effect/dp/046509760X link to Amazon].<br />
** Suggested by Georges<br />
** Pros: Great synthesis including counterfactual and graphical approaches. Discusses related concepts and history. It covers, less formally, many of the ideas in the first portion of Morgan and Winship so it would allow new members to visit this material before reading the last 5 chapters of Morgan and Winship.<br />
** Cons: We've just spent a year on causal models. Would we prefer to do something else?<br />
<br />
== Initial discussion of topics ==<br />
A view was expressed that we should consider this seminar series more broadly, with a view to:<br />
* wider, and more active participation by the seminar members<br />
* topics or book chapters that would:<br />
** enlist a volunteer as discussion leader, or organize the topic<br />
** perhaps involve some concrete, practical examples used to illustrate the topic<br />
<br />
Without prejudice to the choice of ''The Book of Why'' on causal inference, some suggested topics
mentioned are listed below, though the general view was that, if we took this route, only 3-4<br />
should be considered for this year. I'm just listing these, but they deserve to be fleshed out<br />
more for us to consider how & whether they would work.<br />
<br />
* '''Reproducibility of research''': How should statistical practice be informed by current controversies about the ''replication crisis'' and countervailing moves toward open science and reproducibility? This was a big topic at the recent JSM 2018 conference. [MF: I should add some references here.]<br />
** Roger Peng, Reproducible Research in Computational Science, ''Science'', Vol. 334, Issue 6060, pp. 1226-1227 [http://science.sciencemag.org/content/334/6060/1226]<br />
**RC: This is timely in a couple ways. Chris Green is teaching a course on the topic, so he or one of the students might be able to come to a meeting to discuss their views. We also have a QM Forum speaker on topic in the Winter<br />
<br />
* '''Big Data problems''': Another hot topic, but perhaps too broad. [MF: I don't know enough to specify it more clearly as a useful seminar topic.] [This would be interesting, but I would hope we could find materials that address the issue from a social science perspective]<br />
<br />
* Some statistical methods topics, not yet clearly articulated:<br />
** Clustering methods [RC: Could this fall under the Big Data label?]<br />
** Robustness<br />
** Mediation<br />
** Meta analysis in medicine: how can you tell whether a lit review is complete with logistic regression?! [RC: I think Meta-Analysis could be a good topic, including M-L/J-A discussing their research]<br />
<br />
* '''Consulting issues''': Practical aspects of statistical consulting [MF: Perhaps this would be a better topic for the SCS staff meetings ??] [RC: That would make a good topic for a business/staff meeting, if there were no consulting cases to discuss]<br />
<br />
* '''Survey Sampling''': Elucidating the mystery of bootstrap weights and how to use them when analyzing survey data, e.g. from Statistics Canada. <br />
<br />
* '''Evidence-based medicine''': Ideas and implications<br />
<br />
* '''Machine Learning, AI, Deep Learning''': An Overview <br />
<br />
* '''Disseminating technical information to non-technical audiences''': in consulting and in teaching. [Could this be put together with 'consulting issues'?]<br />
<br />
* '''Missing Data'''<br />
<br />
* '''Statistical Paradoxes and Fallacies''' [RC: There are some good "summary" articles related to statistical paradoxes and Georges' examples are always helpful]<br />
<br />
** Initial attempt at creating a list: [[Paradoxes, Fallacies and Other Surprises]]. Please add, modify or comment.
<br />
== September 21 ==<br />
<br />
Talk: Michael Friendly, ''100+ Years of Titanic Graphs''<br />
<br />
* Slides: [[File:SCS-TitanicGraphs-2x2.pdf]]<br />
<br />
== October 5 ==<br />
<br />
* Introduction and Chapter 1 of the Book of Why<br />
* [http://bayes.cs.ucla.edu/WHY/ Website for the book]<br />
* [http://bayes.cs.ucla.edu/WHY/why-ch1.pdf PDF of first chapter]<br />
<br />
== October 19 ==<br />
<br />
* Chapters 2 and 3 of the Book of Why<br />
<br />
== November 2 ==<br />
<br />
* Chapters 4 and 5 of the Book of Why<br />
<br />
....<br />
<br />
== January 4 ==<br />
<br />
* We plan to continue a discussion of Chapter 7 of the Book of Why focusing on details of the do-calculus.<br />
* An interesting reference might be this [https://www.ssc.wisc.edu/soc/faculty/pages/docs/elwert/Elwert%202013.pdf 2013 article by Felix Elwert on Graphical Causal Models].<br />
<br />
== February 1 ==<br />
<br />
* We continue with Chapter 8 of ''The Book of Why'', '''Counterfactuals: Mining Worlds That Could Have Been'''<br />
* MF offered some examples of papers describing R packages for learning/doing causal modeling<br />
** [http://dagity.net/primer Causal Inference in Statistics: A Companion for R Users]. This is designed to accompany the book [http://bayes.cs.ucla.edu/PRIMER/ Causal Inference in Statistics: A Primer] by Pearl, Glymour and Jewell.<br />
** [https://cran.r-project.org/web/packages/CausalImpact/vignettes/CausalImpact.html Vignette for the CausalImpact package]. This package implements a Bayesian approach to causal impact estimation in time series data. It is designed for a specific situation, when you have time series pre- and post- some intervention. I mention it here only because it produces nice graphs of <br />
** [https://www.jstatsoft.org/article/view/v082i02 A Recipe for inferference: Start with Causal Inference. Add Interference. Mix Well with R.] by Bradley Saul & Michael Hudgens</div>Georgeshttp://scs.math.yorku.ca/index.php/SCS_Reads_2018-2019SCS Reads 2018-20192019-08-19T23:52:38Z<p>Georges: /* February 1 */</p>
<hr />
<div>== Links to past episodes of SCS Reads ==<br />
* [[SCS Reads 2017-2018|SCS Reads 2017-2018 Counterfactuals and Causal Inference]]<br />
* [[SCS Reads 2016-2017|SCS Reads 2016-2017 Rethinking Statistics]]<br />
* [[SCS Reads 2015-2016|SCS Reads 2015-2016 Bayesian Statistics in the Social Sciences]]<br />
* [[SCS Reads 2014-2015]]<br />
* [[SCS Reads Nominations|Past and current nominations for SCS Reads]]<br />
* [[SCS Reads -- past seminars]]<br />
== Converging to a plan this year ==<br />
We will adopt a hybrid plan combining reading the Book of Why (first chapter to be discussed on October 5: [http://bayes.cs.ucla.edu/WHY/why-ch1.pdf PDF of first chapter]) with other topics.<br />
<br />
On September 21, Michael Friendly will present a talk on the history of the visualization of the Titanic data.<br />
<br />
== Candidates for 2018-2019 ==<br />
* Pearl and Mackenzie (2018) The Book of Why, in the fall term followed by the remaining chapters (8 to 12) of Morgan and Winship (2015) Counterfactuals and Causal Inference, 2nd ed. in the winter term.<br />
** [https://d2r55xnwy6nx47.cloudfront.net/uploads/2018/05/to-build-truly-intelligent-machines-teach-them-cause-and-effect-20180515.pdf Kevin Hartnett (2018) To Build Truly Intelligent Machines, Teach Them Cause and Effect, ''Quanta Magazine'']: An interview with Judea Pearl on the Book of Why<br />
** The Book of Why is not technically difficult and provides a broad overview including interesting historical details. We could cover this in one term. It is intended as a trade book so it's cheap and accessible. It's also very useful as a source of ideas for anyone who would like to include more causal ideas in lower level quantitative courses. Here's the [https://www.nytimes.com/2018/06/01/business/dealbook/review-the-book-of-why-examines-the-science-of-cause-and-effect.html review in the New York Times] and a [https://www.amazon.ca/Book-Why-Science-Cause-Effect/dp/046509760X link to Amazon].<br />
** Suggested by Georges<br />
** Pros: Great synthesis including counterfactual and graphical approaches. Discusses related concepts and history. It covers, less formally, many of the ideas in the first portion of Morgan and Winship so it would allow new members to visit this material before reading the last 5 chapters of Morgan and Winship.<br />
** Cons: We've just spent a year on causal models. Would we prefer to do something else?<br />
<br />
== Initial discussion of topics ==<br />
A view was expressed that we should consider this seminar series more broadly, with a view to:<br />
* wider, and more active participation by the seminar members<br />
* topics or book chapters that would:<br />
** enlist a volunteer as discussion leader, or organize the topic<br />
** perhaps involve some concrete, practical examples used to illustrate the topic<br />
<br />
Without prejudice to the choice of ''The Book of Why'' on causal inference, some suggested topics
mentioned are listed below, though the general view was that, if we took this route, only 3-4<br />
should be considered for this year. I'm just listing these, but they deserve to be fleshed out<br />
more for us to consider how & whether they would work.<br />
<br />
* '''Reproducibility of research''': How should statistical practice be informed by current controversies about the ''replication crisis'' and countervailing moves toward open science and reproducibility? This was a big topic at the recent JSM 2018 conference. [MF: I should add some references here.]<br />
** Roger Peng, Reproducible Research in Computational Science, ''Science'', Vol. 334, Issue 6060, pp. 1226-1227 [http://science.sciencemag.org/content/334/6060/1226]<br />
**RC: This is timely in a couple ways. Chris Green is teaching a course on the topic, so he or one of the students might be able to come to a meeting to discuss their views. We also have a QM Forum speaker on topic in the Winter<br />
<br />
* '''Big Data problems''': Another hot topic, but perhaps too broad. [MF: I don't know enough to specify it more clearly as a useful seminar topic.] [This would be interesting, but I would hope we could find materials that address the issue from a social science perspective]<br />
<br />
* Some statistical methods topics, not yet clearly articulated:<br />
** Clustering methods [RC: Could this fall under the Big Data label?]<br />
** Robustness<br />
** Mediation<br />
** Meta analysis in medicine: how can you tell whether a lit review is complete with logistic regression?! [RC: I think Meta-Analysis could be a good topic, including M-L/J-A discussing their research]<br />
<br />
* '''Consulting issues''': Practical aspects of statistical consulting [MF: Perhaps this would be a better topic for the SCS staff meetings ??] [RC: That would make a good topic for a business/staff meeting, if there were no consulting cases to discuss]<br />
<br />
* '''Survey Sampling''': Elucidating the mystery of bootstrap weights and how to use them when analyzing survey data, e.g. from Statistics Canada. <br />
<br />
* '''Evidence-based medicine''': Ideas and implications<br />
<br />
* '''Machine Learning, AI, Deep Learning''': An Overview <br />
<br />
* '''Disseminating technical information to non-technical audiences''': in consulting and in teaching. [Could this be put together with 'consulting issues'?]<br />
<br />
* '''Missing Data'''<br />
<br />
* '''Statistical Paradoxes and Fallacies''' [RC: There are some good "summary" articles related to statistical paradoxes and Georges' examples are always helpful]<br />
<br />
** Initial attempt at creating a list: [[Paradoxes, Fallacies and Other Surprises]]. Please add, modify or comment.
<br />
== September 21 ==<br />
<br />
Talk: Michael Friendly, ''100+ Years of Titanic Graphs''<br />
<br />
* Slides: [[File:SCS-TitanicGraphs-2x2.pdf]]<br />
<br />
== October 5 ==<br />
<br />
* Introduction and Chapter 1 of the Book of Why<br />
* [http://bayes.cs.ucla.edu/WHY/ Website for the book]<br />
* [http://bayes.cs.ucla.edu/WHY/why-ch1.pdf PDF of first chapter]<br />
<br />
== October 19 ==<br />
<br />
* Chapters 2 and 3 of the Book of Why<br />
<br />
== November 2 ==<br />
<br />
* Chapters 4 and 5 of the Book of Why<br />
<br />
....<br />
<br />
== January 4 ==<br />
<br />
* We plan to continue a discussion of Chapter 7 of the Book of Why focusing on details of the do-calculus.<br />
* An interesting reference might be this [https://www.ssc.wisc.edu/soc/faculty/pages/docs/elwert/Elwert%202013.pdf 2013 article by Felix Elwert on Graphical Causal Models].<br />
<br />
== February 1 ==<br />
<br />
* We continue with Chapter 8 of ''The Book of Why'', '''Counterfactuals: Mining Worlds That Could Have Been'''<br />
* MF offered some examples of papers describing R packages for learning/doing causal modeling<br />
** [http://dagity.net/primer Causal Inference in Statistics: A Companion for R Users]. This is designed to accompany the book [http://bayes.cs.ucla.edu/PRIMER/ Causal Inference in Statistics: A Primer] by Pearl, Glymour and Jewell.<br />
** [https://cran.r-project.org/web/packages/CausalImpact/vignettes/CausalImpact.html Vignette for the CausalImpact package]. This package implements a Bayesian approach to causal impact estimation in time series data. It is designed for a specific situation, when you have time series pre- and post- some intervention. I mention it here only because it produces nice graphs of <br />
** [https://www.jstatsoft.org/article/view/v082i02 A Recipe for Inference: Start with Causal Inference. Add Interference. Mix Well with R.] by Bradley Saul & Michael Hudgens</div>Georgeshttp://scs.math.yorku.ca/index.php/SCS_TAships:_Allocation_of_timeSCS TAships: Allocation of time2019-08-19T23:33:15Z<p>Georges: /* Senior TA (3rd or later years) */</p>
<hr />
<div>SCS teaching assistants are typically assigned either a 'half TAship' (135 hours) or a 'full TAship' (270 hours) over the fall and winter terms.
<br />
Depending on the level of experience of the TA, these hours can be allocated in different ways. The following are three suggested assignments suitable for a TA in their first year with SCS, a TA in their second year, or a TA in subsequent years.<br />
<br />
== First-year TAs ==<br />
<br />
In the first term in which a TA is appointed to SCS, it is expected that they will devote their time to preparation for consulting by attending consulting sessions of experienced consultants with a balance between faculty/staff consultants and experienced TA consultants.<br />
<br />
A new SCS TA will be expected to focus attention on those aspects of the consulting session (eliciting a problem<br />
description from the client, asking background questions, determining an appropriate framework for advice, etc.)<br />
that can contribute to a successful outcome for the client. In these training sessions, it is often useful to discuss<br />
the session with the SCS consultant after it is concluded.<br />
<br />
With a 0.5 TA, which amounts to 5 hours/week x 27 weeks = 135 hours for the full year, you would, during the fall term, attend weekly SCS meetings (1 hour), attend 3 hours of consulting sessions per week and devote one hour to other activities including organization, preparation and development. Note that it can take some time and effort to make arrangements to sit in on consulting sessions. This is recognized in the provision of time for that purpose. In the winter term, you would follow the schedule of a regular TA:<br />
<br />
 Fall term: (13 weeks)
 13 1-hour SCS meetings                               13
 11 weeks attending consulting sessions      11 x 3   33
 Organization, preparation and development            19

 Winter term: (14 weeks)
 14 1-hour SCS meetings                               14
 14 weeks of consulting                      14 x 2   28
 Preparation and development                          28
 Total                                               135
<br />
== Experienced TAs (2nd year) ==<br />
<br />
For TAs who have already observed consulting sessions, a 0.5 TA amounts to 5 hours/week x 27 weeks = 135 hours. Each week you would attend the SCS meeting (1 hour), do 2 hours of consulting, and devote the remaining 2 hours to preparation and development:
<br />
 27 1-hour SCS meetings                               27
 27 weeks of consulting                      27 x 2   54
 Preparation and development                          54
 Total                                               135
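
The arithmetic behind the first-year and second-year allocations above can be checked with a short R sketch. The vector and element names are illustrative only; the hours are taken directly from the tables above.

```r
# Budget for a 0.5 TA: 5 hours/week over 27 weeks.
budget <- 5 * 27

# First-year allocation (fall: 13 weeks, winter: 14 weeks).
first_year <- c(
  fall_meetings      = 13 * 1,
  fall_observing     = 11 * 3,
  fall_organization  = 19,
  winter_meetings    = 14 * 1,
  winter_consulting  = 14 * 2,
  winter_preparation = 28
)

# Second-year allocation (same weekly pattern all year).
second_year <- c(
  meetings    = 27 * 1,
  consulting  = 27 * 2,
  preparation = 54
)

# Each allocation should use exactly the budgeted hours.
totals <- c(first_year = sum(first_year), second_year = sum(second_year))
print(totals)
stopifnot(all(totals == budget))
```

A full TAship (270 hours) could be checked the same way by doubling the budget and the entries.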
<br />
== Senior TA (3rd or later years) ==<br />
<br />
With more experience you might like to teach an SCS short course. Part of the appeal of teaching an SCS course is that it provides valuable teaching experience to cite in future job applications.
<br />
Generally TAs stop consulting during the period of the course so here's a possible rationale for a 0.5 TA over a year (total 135 hours = 27 x 5).<br />
<br />
 27 1-hour SCS meetings                               27
 23 weeks of consulting                      23 x 2   46
 23 weeks of prep at 1 hr                             23
 4 weeks of a 3-hour course                           12
 Course preparation                                   27
 Total                                               135</div>Georgeshttp://scs.math.yorku.ca/index.php/SCS_TAships:_Allocation_of_timeSCS TAships: Allocation of time2019-08-19T23:31:56Z<p>Georges: /* Experienced TAs (2nd year) */</p>
<hr />
<div>SCS teaching assistants are typically assigned either a 'half TAship' (135 hours) or a 'full TAship' (270 hours) over the fall and winter terms.
<br />
Depending on the level of experience of the TA, these hours can be allocated in different ways. The following are three suggested assignments suitable for a TA in their first year with SCS, a TA in their second year, or a TA in subsequent years.<br />
<br />
== First-year TAs ==<br />
<br />
In the first term in which a TA is appointed to SCS, it is expected that they will devote their time to preparation for consulting by attending consulting sessions of experienced consultants with a balance between faculty/staff consultants and experienced TA consultants.<br />
<br />
A new SCS TA will be expected to focus attention on those aspects of the consulting session (eliciting a problem<br />
description from the client, asking background questions, determining an appropriate framework for advice, etc.)<br />
that can contribute to a successful outcome for the client. In these training sessions, it is often useful to discuss<br />
the session with the SCS consultant after it is concluded.<br />
<br />
With a 0.5 TA, which amounts to 5 hours/week x 27 weeks = 135 hours for the full year, you would, during the fall term, attend weekly SCS meetings (1 hour), attend 3 hours of consulting sessions per week and devote one hour to other activities including organization, preparation and development. Note that it can take some time and effort to make arrangements to sit in on consulting sessions. This is recognized in the provision of time for that purpose. In the winter term, you would follow the schedule of a regular TA:<br />
<br />
 Fall term: (13 weeks)
 13 1-hour SCS meetings                               13
 11 weeks attending consulting sessions      11 x 3   33
 Organization, preparation and development            19

 Winter term: (14 weeks)
 14 1-hour SCS meetings                               14
 14 weeks of consulting                      14 x 2   28
 Preparation and development                          28
 Total                                               135
<br />
== Experienced TAs (2nd year) ==<br />
<br />
For TAs who have already observed consulting sessions, a 0.5 TA amounts to 5 hours/week x 27 weeks = 135 hours. Each week you would attend the SCS meeting (1 hour), do 2 hours of consulting, and devote the remaining 2 hours to preparation and development:
<br />
 27 1-hour SCS meetings                               27
 27 weeks of consulting                      27 x 2   54
 Preparation and development                          54
 Total                                               135
<br />
== Senior TA (3rd or later years) ==<br />
<br />
With more experience you might like to teach an SCS short course. The appeal of teaching an SCS course is the experience it provides and also that it constitutes a form of teaching experience for future job applications.<br />
<br />
Generally TAs stop consulting during the period of the course so here's a possible rationale for a 0.5 TA over a year (total 135 hours = 27 x 5).<br />
<br />
 27 1-hour SCS meetings                               27
 23 weeks of consulting                      23 x 2   46
 23 weeks of prep at 1 hr                             23
 4 weeks of a 3-hour course                           12
 Course preparation                                   27
 Total                                               135</div>Georgeshttp://scs.math.yorku.ca/index.php/R/Comparing_tidyverse_with_base_RR/Comparing tidyverse with base R2019-05-10T15:35:53Z<p>Georges: /* filter vs subset */</p>
<hr />
<div>== filter vs subset ==<br />
* see [https://stackoverflow.com/questions/39882463/difference-between-subset-and-filter-from-dplyr a discussion on stackoverflow]<br />
* filter works on external SQL data without importing all the data<br />
* filter_ is a safe alternative that avoids non-standard evaluation (note that it has been deprecated since dplyr 0.7 in favour of tidy evaluation)<br />
* filter works with tibbles<br />
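
A minimal sketch of the comparison above, using a made-up data frame (the column names id, group and score are hypothetical). It assumes dplyr may or may not be installed, so the dplyr call is guarded and the base R part runs on its own.

```r
# Toy data; the column names are made up for illustration.
df <- data.frame(
  id    = 1:6,
  group = c("a", "a", "b", "b", "c", "c"),
  score = c(10, NA, 30, 40, 50, 60)
)

# Base R: one logical expression; rows where it evaluates to NA are dropped.
base_rows <- subset(df, score > 20 & group != "c")
print(base_rows)

# dplyr: comma-separated conditions are combined with AND, and rows where
# the condition is NA are dropped as well.
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr_rows <- dplyr::filter(df, score > 20, group != "c")
  # Both approaches should keep the same observations.
  stopifnot(identical(dplyr_rows$id, base_rows$id))
}
```

On an in-memory data frame the two give the same rows; the practical differences listed above (SQL back ends, tibbles) only show up with other kinds of input.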
== Curiosities ==<br />
* The additional arguments in 'filter' are intersected while the additional arguments in 'select' are unioned.</div>Georgeshttp://scs.math.yorku.ca/index.php/R/Comparing_tidyverse_with_base_RR/Comparing tidyverse with base R2019-05-10T15:23:06Z<p>Georges: Created page with "== filter vs subset == * see [https://stackoverflow.com/questions/39882463/difference-between-subset-and-filter-from-dplyr a discussion on stackoverflow] * filter works on extern..."</p>
<hr />
<div>== filter vs subset ==<br />
* see [https://stackoverflow.com/questions/39882463/difference-between-subset-and-filter-from-dplyr a discussion on stackoverflow]<br />
* filter works on external SQL data without importing all the data<br />
* filter_ is a safe alternative that avoids non-standard evaluation<br />
* filter works with tibbles</div>Georgeshttp://scs.math.yorku.ca/index.php/RR2019-05-10T15:20:03Z<p>Georges: /* Working with R */</p>
<hr />
<div>== Getting started with R ==<br />
* [[R: Getting started with R|Getting started with R]]<br />
* [[R: R tutorials and courses|Tutorials, courses and blogs]]<br />
* [http://www.rseek.org Searching in the R universe]<br />
* [http://www.r-project.org/mail.html R mailing lists]<br />
* [http://www.personality-project.org/r/psych/short_courses/wcp-short.pdf William Revelle (2013) An introduction to R in Personality Research]<br />
*:An outstanding introduction to R for researchers in any field.<br />
=== Local courses ===<br />
* [[SCS_2012:_A_Gentle_Introduction_to_R|A Gentle Introduction to R by Carrie Smith and Rob Cribbie]]<br />
* [[SCS 2011: Statistical Analysis and Programming with R]]<br />
=== TABA Talk, Jan. 15, 2013 ===<br />
* [http://blackwell.math.yorku.ca/SCS/R/TABA%20Talk.pdf First part of talk]<br />
* [http://blackwell.math.yorku.ca/SCS/R/TABA_Talk.html R Markdown output]<br />
* [http://blackwell.math.yorku.ca/SCS/R/TABA_Talk.Rmd R Markdown script]<br />
<br />
== Working with R ==<br />
* [[/Comparing tidyverse with base R]]<br />
* Finding all the functions in all the packages that will do a particular task:<br />
** Install package 'sos' and use its function 'findFn': e.g. findFn("Item Response Theory")<br />
* [[R/FAQ|FAQ]]<br />
* [[/Big Data with R|Big Data with R]]<br />
* [[/Building packages|Building packages]]<br />
* [[/C functions in R code|C functions in R code]]<br />
* [[/Date|Date objects in R]]<br />
* Setting up an [[Rprofile]]<br />
* [[High Performance Computing with R]]<br />
* Importing data into R<br />
** [[/Importing data from Excel|Importing data from Excel]]<br />
** [[/Importing data from SPSS|Importing data from SPSS]]<br />
** [[R: Importing dates from SPSS|Importing dates from SPSS]]<br />
** [[/Importing dates from .csv files|Importing dates from .csv files]]<br />
** [[/Importing dates and times|Importing dates and times]]<br />
** [[/Importing data from XML files|Importing data from XML files]]<br />
** [[/Reading a file one line at a time|Reading a file one line at a time]]<br />
* [[/packages|Local packages]]<br />
* [[Mixed Models with R]]<br />
* [[/Predict with expand.grid|Predict with expand.grid]]<br />
* [[/Reshaping data|Reshaping data]]<br />
* [[/Traps and pitfalls|Traps and pitfalls]]<br />
* [[/Updating R]]<br />
* [[/Faster Basic Linear Algebra Subprograms (BLAS)|Faster Basic Linear Algebra Subprograms (BLAS)]]<br />
<br />
== Graphics in R ==<br />
* [[R Graphs Gallery]]<br />
*: This is a collection of useful graphs. Please add to it.<br />
* [http://rgm2.lab.nig.ac.jp/RGM2/images.php?show=all&pageID=1657 R Graphical Manual]<br />
* [[/Lattice tricks|Lattice tricks]]<br />
* [http://www.yaksis.com/posts/r-chart-chooser.html Chart chooser for ggplot2]<br />
<br />
== Links ==<br />
* An aggregation of lots of current R web content: [http://www.r-bloggers.com/ R bloggers]<br />
* [http://www.murdoch-sutherland.com/Rtools/installer.html Compiling packages under Windows]<br />
* [http://www.stat.nus.edu.sg/~stachenz/Rsurv.pdf Using R for Survival Analysis]<br />
* A blog about statistics and R: [http://blog.revolutionanalytics.com/ Revolutions]<br />
* [http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html Google's style guide for R]<br />
* [[/test page|test]]</div>Georgeshttp://scs.math.yorku.ca/index.php/User:GeorgesUser:Georges2019-03-18T04:28:41Z<p>Georges: /* Notes on Mixed Models */</p>
<hr />
<div>__TOC__<br />
== Notes on Mixed Models ==<br />
=== Table of Topics ===<br />
* diagnostics<br />
* weights, varClasses, pdClasses, corClasses, pdTriangular<br />
* designs: nested, etc.<br />
=== Notes ===<br />
* Look at influence in car: uses dropone with parallelization and few iterations<br />
* The LR test assumes the log-likelihood is quadratic on 'some' transformation of the parameter, while the Wald test assumes it is quadratic on the scale of the parameter tested.<br />
* varPower(form = ~ fitted(.), fixed = 1)<br />
** The value of 'form' is on the SD scale: the SD of the response is taken to be proportional to |form| raised to a power, and 'fixed' sets that power to a given value instead of estimating it.<br />
** the default level for 'fitted' is the finest level. <br />
** The variance function must be expressed in terms of the expected value of the re-expressed response.<br />
* [http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DemingMi/2007-04-27.pdf Slides]<br />
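
The varPower note above can be tried out with a short sketch. It assumes the BodyWeight example data that ships with nlme; the object names fm1 and fm2 are illustrative, and the model is only one plausible choice for these data.

```r
library(nlme)  # a recommended package included with standard R installations

# Baseline model: rat body weight over time by diet, with random intercepts
# and slopes per rat (grouping comes from the groupedData object BodyWeight).
fm1 <- lme(weight ~ Time * Diet, data = BodyWeight, random = ~ Time)

# Same model with within-group SD proportional to |fitted value|^power.
# form = ~ fitted(.) is the default for varPower(); it is spelled out here.
# Supplying 'fixed' would hold the power at a given value instead of
# estimating it.
fm2 <- update(fm1, weights = varPower(form = ~ fitted(.)))

# Estimated power of the variance function.
print(coef(fm2$modelStruct$varStruct))

# Likelihood ratio comparison of the homoscedastic and heteroscedastic fits.
print(anova(fm1, fm2))
```

Because fitted(.) defaults to the finest grouping level, the variance function here depends on the within-rat fitted values.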
<br />
== Paradoxes, Fallacies and Other Surprises ==<br />
[[Paradoxes, Fallacies and Other Surprises]]<br />
== Bayes ==<br />
* Two consecutive issues of Statistical Science in 2011 have many interesting articles that are related to Bayesian inference:<br />
** http://www.jstor.org.ezproxy.library.yorku.ca/stable/i23059127<br />
** http://www.jstor.org.ezproxy.library.yorku.ca/stable/i23059971<br />
* Experimenting with files:<br />
** jpg file<br />
*** Using the wiki link to the uploaded name: [[File:2013-12-29 18.20.34.jpg|thumb]]<br />
*** Using the wiki link to the uploaded name as media: [[Media:2013-12-29 18.20.34.jpg]]<br />
** .R file<br />
*** File wiki link [[File:Tcells.R]]<br />
*** Media wiki link [[Media:Tcells.R]]<br />
* [[Useful formulas]]<br />
* SEM with STAN<br />
** [https://groups.google.com/forum/#!topic/stan-users/dVjm8iES54k Forum discussion re slow convergence]<br />
* Interaction fallacy in a presentation:<br />
*:If you think two variables affect each other then you should include an interaction between them. (Fooled by the word 'interaction').<br />
*[http://www.stat.columbia.edu/~gelman/research/published/feller8.pdf Gelman and Robert on Bayes]<br />
*[http://en.wikipedia.org/wiki/Multivariate_adaptive_regression_splines MARS: Multivariate Adaptive Regression Splines]<br />
*[http://cran.r-project.org/web/packages/mosaic/vignettes/V2StartTeaching.pdf Teaching with R using MOSAIC by ... and D. Kaplan]<br />
*[http://vudlab.com/simpsons/ Causality: interactive app illustrating Simpson's Paradox]<br />
*[http://www.metafor-project.org/doku.php/tips:testing_factors_lincoms metafor: Tutorial using mixed models for meta analysis]<br />
* [http://andrewgelman.com/2006/06/11/survey_weights/ Andrew Gelman on survey weights with multilevel models]: he suggests unweighted modeling (or a 'variance weighted' analysis, e.g. replication weights) followed by poststratification. <br />
* [https://stat.duke.edu/courses/Fall11/sta101.02/labs/lab1.pdf Intro to R and Rstudio in an intro course at Duke]<br />
* [http://cran.r-project.org/doc/contrib/Torfs+Brauer-Short-R-Intro.pdf Intro to R in RStudio]<br />
* [http://www.crcpress.com/product/isbn/9781466515857 Multilevel Modeling Using R]<br />
* [http://cran.r-project.org/doc/contrib/Bliese_Multilevel.pdf Multilevel Modeling in R by Paul Bliese]<br />
* [http://blog.revolutionanalytics.com/2014/08/statistics-losing-ground-to-cs-losing-image-among-students.html Losing ground to CS?]<br />
* [https://www.youtube.com/watch?v=Is1Ej0Vj0Mw Interview with David Smith at UseR 2014]<br />
* [http://www.cookbook-r.com/Graphs/Colors_(ggplot2)/#mapping-variable-values-to-colors On colour]<br />
* [http://cran.r-project.org/web/packages/plot3D/vignettes/plot3D.pdf 3d plotting packages]<br />
* [http://blogs.sas.com/content/iml/2014/08/05/stiglers-seven-pillars-of-statistical-wisdom/ Stigler's Seven Pillars of Statistical Wisdom]<br />
* [http://glmm.wikidot.com/faq#modelspec FAQ on GLMMs]<br />
* [http://www.math.uah.edu/stat/ Virtual Labs in Probability and Statistics]<br />
* [http://tryr.codeschool.com/ R code school with O'Reilly]<br />
* [http://cran.r-project.org/web/packages/pastecs/pastecs.pdf pastecs package for time series]<br />
* [http://www.bbc.com/news/magazine-28166019 Do doctors understand test results? By William Kremer -- about Gerd Gigerenzer]<br />
* [[/Multiple Testing -- a comment]]<br />
* [[/MOOCs for Data Science]]<br />
* [http://artssquared.wordpress.com/2012/03/21/letter-to-the-provost-and-vp-academic-professor-carl-amrhein/ Arts Squared]<br />
* [[/Using R Markdown]]<br />
* [[/Statistics Links for Courses]]<br />
* [[/Lee Lorch]]<br />
* [http://arxiv.org/pdf/1402.1894v1.pdf Baumer et al. (2014) Using R Markdown in Intro Stats]<br />
* [[/Job ratings]]<br />
* [http://www.stat.cmu.edu/~hseltman/PIER/Bayes/data/schools.R Using WinBUGS on the Netherlands data]<br />
* [[/Climate Change|/Climate Change]]<br />
* [[/Standardize or Not]]<br />
*[[/Mixed Models -- papers]]<br />
*[[/MCMCglmm]]<br />
*[[/Wiki tests]]<br />
<br />
== Cause, correlation, or ... ==<br />
*[http://p.nytimes.com/email/re?location=4z5Q7LhI+KUPcT7snurzN09anQA2MM49IhNWGFarU5GcvOIXzFz0cSuazUKJK97uTp6+uuRfTEhO+dnMZordtiC8Du17IY2zzXWzY7etdKdkq0H3sffQdSh+6YpVRsJcs8ZeVrfzShCpIEB5wXVC1g==&campaign_id=23&instance_id=45715&segment_id=62874&user_id=ecb65bd8a646c4ab6214f51d21246fce&regi_id=7495274 Instant noodles and metabolic syndrome]<br />
== Notepad ==<br />
* [http://www.r-bloggers.com/some-r-resources-for-glms/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+RBloggers+%28R+bloggers%29 R resources for GLMs]<br />
* [https://www.stat.auckland.ac.nz/~ihaka/120/Lectures/lecture17.pdf On Mosaic plots]<br />
* [[/Academic and Administrative Program Review]]<br />
* [[/Statistics programs]]<br />
* [http://www.yorku.ca/careers/gpse/2013/ Careers Expo at York] with 13 booths from UofT such as Dalla Lana but only one generic FGS booth from York.<br />
* [https://www.researchgate.net/publication/257516014_One_paradox_in_statistical_decision_making#! Schervish's p-value paradox]<br />
* [[/Data]]<br />
* [http://www.universityaffairs.ca/course-evaluations-the-good-the-bad-and-the-ugly.aspx Course evaluations: the good, the bad and the ugly]<br />
* Larry Wasserman on<br />
** [http://normaldeviate.wordpress.com/2013/04/27/the-perils-of-hypothesis-testing-again/ Perils of Hypothesis Testing]<br />
**[http://normaldeviate.wordpress.com/2013/04/13/data-science-the-end-of-statistics/ Data Science]<br />
**[http://normaldeviate.wordpress.com/2012/06/18/48/ Causality]<br />
*[[/HOA]]<br />
*[[/Big data]]<br />
*[[/Student satisfaction]]<br />
*[http://normaldeviate.wordpress.com/2012/12/21/guest-post-rob-tibshirani/ Rob Tibshirani's list of 9 great statistics papers]<br />
*[http://www.newyorker.com/online/blogs/johncassidy/2013/04/the-rogoff-and-reinhart-controversy-a-summing-up.html?mobify=0 Cassidy on the Reinhart-Rogoff controversy.]<br />
*[http://clinicaltrials.gov/ Clinical Trials registry in the US]<br />
*[http://en.wikipedia.org/wiki/Cochrane_Collaboration The Cochrane Collaboration]<br />
* 2004 ICMJE: policy of registration:<br />
<br />
__TOC__<br />
Recommended sources on statistics:<br />
<br />
There are many excellent sources for information on current statistical issues (Psychonomic Society Journals):<br />
<br />
* Confidence Intervals:<br />
** Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York, NY: Routledge/Taylor & Francis Group. (see www.latrobe.edu.au/psy/research/projects/esci).<br />
** Masson, M. E. J., & Loftus, G. R. (2003). Using confidence intervals for graphically based data interpretation. Canadian Journal of Experimental Psychology/Revue Canadienne de Psychologie Expérimentale, 57, 203-220. doi:10.1037/h0087426<br />
* Effect Size Estimates:<br />
** Ellis, P. D. (2010). The essential guide to effect sizes: Statistical power, meta-analysis and the interpretation of research results. Cambridge University Press. ISBN 978-0-521-14246-5.<br />
** Fritz, C. O., Morris, P. E., & Richler, J. J. (2011). Effect size estimates: Current use, calculations and interpretation. Journal of Experimental Psychology: General, 141, 2-18.<br />
** Grissom, R. J., & Kim, J. J. (2012). Effect sizes for research: Univariate and multivariate applications (2nd ed.). New York, NY: Routledge/Taylor & Francis Group.<br />
* Meta-analysis:<br />
** Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York, NY US: Routledge/Taylor & Francis Group. (see www.latrobe.edu.au/psy/research/projects/esci ).<br />
** Littell, J. H., Corcoran, J., & Pillai, V. (2008). Systematic reviews and meta-analysis. New York: Oxford University Press.<br />
* Bayesian Data Analysis:<br />
** Kruschke, J. K. (2011). Doing Bayesian data analysis: A tutorial with R and BUGS. San Diego, CA: Elsevier Academic Press. (See www.indiana.edu/~kruschke/DoingBayesianDataAnalysis/)<br />
** Kruschke, J. K. (in press). Bayesian estimation supersedes the t test. Journal of Experimental Psychology: General. (For a preprint see http://www.indiana.edu/~kruschke/BEST/BEST.pdf).<br />
* Power Analysis:<br />
** Faul, F., Erdfelder, E., Lang, A., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175-191. (See http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/)<br />
=== Blogs ===<br />
* [http://jeromyanglim.blogspot.ca Jeromy Anglim]<br />
<br />
== Pythagoras Diagram ==<br />
* [http://pages.uoregon.edu/stevensj/MRA/partial.pdf Venn diagram 'fallacy' example]<br />
<br />
==Recent changes==<br />
[[/Links|Links]]<br><br />
[[/Recent Changes]] [[/Contributions]]<br><br />
[[/DO]]<br />
==Topics==<br />
* [[/R packages]]<br />
* [[/Curriculum]]<br />
* [[/HLM links]]<br />
* [[/Education links]]<br />
* Death of Evidence<br />
** [http://www.nature.com/nature/journal/v483/n7387/full/483006a.html Article in Nature: Frozen Out, March 1, 2012]<br />
** [http://www.deathofevidence.ca/ Death of Evidence website]<br />
* [[/Mixed effects for multinomial responses]]<br />
* [[/Ellipse paper comments]]<br />
* On Tobacco (from Matt)<br />
** [http://io9.com/5899612/low+income-countries-are-a-cigarettes-best-friend Low income countries and tobacco]<br />
** [http://www.tobaccoatlas.org/uploads/Images/PDFs/Tobacco_Atlas_4_entire.pdf The Tobacco Atlas]<br />
*[[/fda.R]]<br />
*[[/FSE Scholars evening]]<br />
*[[/MATH 6627 student contributions]] <br />
:[http://scs.math.yorku.ca/index.php?title=Special:UserLogin&type=signup Create new account]<br />
:[http://scs.math.yorku.ca/index.php/SCS_2011:_Statistical_Analysis_and_Programming_with_R SCS R course]<br />
:[[/R packages]]<br />
:[[/SPIDA 20102 preparation]]<br />
__TOC__<br />
== Data scraping ==<br />
* [http://www.r-bloggers.com/preparing-public-data-for-analysis-with-r/ Example from Ministry of Transportation]<br />
== RStudio: Shiny ==<br />
* [http://www.premiersoccerstats.com/wordpress/?p=1273 On Shiny]<br />
* [http://demo.rapporter.net/?sport=ATH-170&weight=0 Rapporter.net]<br />
<br />
== Notes for 6643 ==<br />
* Assignment: Can we produce an estimate of AIC based just on the Wald test?<br />
<br />
== On Pedagogy ==<br />
* [http://www.guardian.co.uk/higher-education-network/blog/2012/oct/18/social-sciences-quantative-skills-training On the importance of quantitative skills in social science]<br />
* http://www.matstat.com/teach/<br />
* [http://www.ma.utexas.edu/users/mks/statmistakes/StatisticsMistakes.html Common Misteaks in Statistics]<br />
=== Advice for students ===<br />
* [http://www.universityaffairs.ca/how-to-ask-for-a-reference-letter.aspx How to ask for a reference letter]<br />
<br />
== Questions (e.g. for survey papers) ==<br />
* Implement more diagnostics in R for lme models<br />
* Explore duality of the whole data matrix<br />
* Extend the UD representation to hyperbola, etc., and include a way of plotting osculation loci<br />
* Explore the geometry of harmonic combinations and its implications for mixed model estimates. What happens as you shift weight from G to <math>(X'X)^{-1}</math>? How does the result wander outside the convex combination? When does it happen and what does it mean?<br />
* Refine Lform and related tools<br />
<br />
== Read ==<br />
* [http://www.ams.org/notices/201001/rtx100100030p.pdf Music: Broken Symmetry, Geometry, and Complexity]<br />
* https://sites.google.com/site/r4statistics/<br />
<br />
== R course ==<br />
* [https://github.com/hadley/devtools/wiki/ Hadley Wickham Advanced R Development]<br />
* [http://courses.had.co.nz/11-devtools/ Hadley Wickham's R development courses]<br />
Day 2 - add<br />
* final recap of 'lm' interface: subset, na.action, etc., etc.<br />
* discuss formula syntax<br />
* final recap of methods for 'lm'<br />
* note easy extension to 'glm', 'lme', etc.<br />
* note that many 'new' functions do not use this interface, only more 'mature' functions<br />
** lm.formula<br />
* discuss OO showing methods and dispatching<br />
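A minimal sketch of the 'lm' interface points listed above. It assumes the built-in <tt>mtcars</tt> data purely for illustration (the course may have used other data), and shows <tt>subset</tt>, <tt>na.action</tt>, formula expansion, method dispatch, and the carry-over to 'glm':<br />

```r
# Illustrative data: mtcars with two missing values planted in 'hp'
dd <- mtcars
dd$hp[c(1, 5)] <- NA          # rows 1 and 5 both have cyl > 4

# Common interface: formula, data, subset, na.action
fit <- lm(mpg ~ wt + hp, data = dd, subset = cyl > 4, na.action = na.exclude)

n_used   <- nobs(fit)         # rows actually used in the fit
res      <- residuals(fit)    # na.exclude pads the dropped rows with NA
n_padded <- length(res)

# Formula syntax: '*' expands to main effects plus the interaction
term_labels <- attr(terms(mpg ~ wt * hp), "term.labels")

# OO dispatch: the generic summary() calls summary.lm() because class(fit) is "lm"
fit_class  <- class(fit)
lm_methods <- methods(class = "lm")   # lists the methods available for 'lm'
s <- summary(fit)

# Easy extension: 'glm' takes the same formula / data / na.action arguments
gfit <- glm(am ~ wt, data = dd, family = binomial, na.action = na.exclude)
glm_is_lm <- inherits(gfit, "lm")     # class(gfit) is c("glm", "lm")
```

With <tt>na.exclude</tt>, residuals line up with the rows selected by <tt>subset</tt>, which is convenient when adding them back to the data frame.<br />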
Day 3<br />
* the most useful tools:<br />
** seq<br />
** rep<br />
** replacement functions<br />
* data input<br />
* more programming<br />
** object oriented programming<br />
** using a function in C<br />
* using attributes<br />
* systematic treatment of graphics, including<br />
** par<br />
** xyplot<br />
[[/Day2 Guided Tour of Linear Models.R]]<br />
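The 'most useful tools' in the Day 3 list can be sketched in a few lines of base R. The function name <tt>second<-</tt> is hypothetical, invented here only to show how a replacement function is defined:<br />

```r
# seq and rep: building index and design vectors
evens  <- seq(2, 10, by = 2)                  # 2 4 6 8 10
idx    <- seq_len(3)                          # 1 2 3
groups <- rep(c("a", "b"), times = c(2, 1))   # "a" "a" "b"
each2  <- rep(1:2, each = 2)                  # 1 1 2 2

# Replacement functions: names(x)[2] <- "z" calls `names<-` behind the scenes
x <- c(a = 1, b = 2)
names(x)[2] <- "z"

# A user-defined replacement function (hypothetical name)
`second<-` <- function(x, value) {
  x[2] <- value
  x
}
y <- c(1, 2, 3)
second(y) <- 10

# Attributes: arbitrary metadata attached to an object
attr(y, "units") <- "cm"
```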
<br />
== SCS Reads 2011 Links ==<br />
* [http://www.math.yorku.ca/people/georges/Files/Ellipse_Seminar/Visualizing_Regression-I-Simple-v2.pdf Notes on visualizing simple regression]<br />
* [http://www.math.yorku.ca/people/georges/Files/Ellipse_Seminar/Visualizing_Regression-II.r R script for multiple regression]<br />
== Capstone courses ==<br />
* [http://www.amstat.org/publications/jse/v9n1/spurrier.html Course at the U. of South Carolina, 2001]<br />
== Links to recent courses ==<br />
*[http://scs.math.yorku.ca/index.php/SCS_2011:_Mixed_Models_with_R SCS 2011: Mixed Models with R]<br />
*[http://statswiki.math.yorku.ca/index.php/SCS:_Mixed_Models_in_R SCS 2010]<br />
*[http://scs.math.yorku.ca/index.php/MATH_6627_2008-09_Practicum_in_Statistical_Consulting MATH 6627 2008-09]<br />
*[http://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting MATH 6627 2010-11]<br />
*[http://wiki.math.yorku.ca/index.php/SPIDA_2010:_Mixed_Models_with_R SPIDA 2010]<br />
*[http://wiki.math.yorku.ca/index.php/SPIDA_2009:_Mixed_Models_with_R SPIDA 2009]<br />
<br />
*[http://scs.math.yorku.ca/index.php/Spida Spida package]<br />
<br />
== Links to add somewhere ==<br />
* [http://www.ted.com/talks/ben_goldacre_battling_bad_science.html?utm_source=newsletter_weekly_2011-10-04&utm_campaign=newsletter_weekly&utm_medium=email Battling bad science]<br />
*D W Hosmer, S Taber and S Lemeshow (1991) "The importance of assessing the fit of logistic regression models: a case study." ''American Journal of Public Health'', Vol. 81, Issue 12 1630-1635<br />
* [http://www.statmethods.net/ Quick R] for SPSS, SAS and Stata users.<br />
=== Graphics ===<br />
* [http://www.edwardtufte.com/tufte/ ET Modern]<br />
* Striking graphics:<br />
** [http://en.wikipedia.org/wiki/File:U.S._incarceration_rates_1925_onwards.png Incarceration rate in the United States]<br />
<br />
=== Matrices ===<br />
*[http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/intro.html Matrix Reference Manual]<br />
*[http://en.wikipedia.org/wiki/Matrix_determinant_lemma Matrix determinant lemma]<br />
*[http://en.wikipedia.org/wiki/Woodbury_matrix_identity Woodbury Matrix Identity]<br />
<br />
=== Simpson's Paradox ===<br />
In the [http://en.wikipedia.org/wiki/Canadian_federal_election,_1979 1979 Canadian federal election] <br />
an unusual event occurred in the Northwest Territories: the Liberals won the popular vote in the territory, but won [http://en.wikipedia.org/wiki/Canadian_federal_election,_1979#National_results neither seat.]<br />
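A small sketch of how this kind of aggregation reversal can arise. The vote counts below are hypothetical, not the actual 1979 results, and the party labels other than 'Liberal' are placeholders. The effect needs at least three parties: one party places second in every riding yet has the largest total.<br />

```r
# Hypothetical vote counts (NOT the actual 1979 results); rows are ridings
votes <- rbind(
  riding1 = c(Liberal = 40, PartyB = 45, PartyC = 15),
  riding2 = c(Liberal = 40, PartyB = 15, PartyC = 45)
)

# Winner of each seat: the party with the most votes in that riding
seat_winners <- colnames(votes)[apply(votes, 1, which.max)]

# Popular vote: totals over both ridings
totals <- colSums(votes)
popular_winner <- names(which.max(totals))
```

Here <tt>seat_winners</tt> contains only the two placeholder parties, while <tt>popular_winner</tt> is the party that won neither seat.<br />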
<br />
=== Lee Lorch ===<br />
* [http://aer.sagepub.com/content/36/4/739.abstract Marybeth Gasman (1999)] "Scylla and Charybdis: Navigating the Waters of Academic Freedom at Fisk University During Charles S. Johnson's Administration (1946–1956)" ''American Educational Research Journal''<br />
*: A prominent sociologist and race relations activist, Charles S. Johnson dedicated his life to the advancement of Blacks. His presidency at Fisk University, a historically Black college, was the culmination of his career. During the latter part of his administration, he faced a dilemma involving an outspoken professor named Lee Lorch, who, in 1954, was accused of being a communist. Johnson and the Board of Trustees dismissed Lorch because he refused to answer a congressional committee's questions about his previous political affiliations. In 1959, the American Association of University Professors found the late President Johnson guilty of violating the principles of academic freedom. This article explores the ways in which academic freedom, civil liberties, and civil rights clashed in the Lee Lorch case. Furthermore, it examines the ways in which the setting of a historically Black college alters traditional assumptions about the application of these principles.<br />
* [http://www.nytimes.com/2010/11/22/nyregion/22stuyvesant.html Charles V. Bagli (November 21, 2010)], "A New Light on a Fight to Integrate Stuyvesant Town", ''New York Times''.<br />
<br />
== Multilevel Models ==<br />
<br />
=== Expository ===<br />
* [http://glmm.wikidot.com/faq R FAQ]<br />
*[https://perswww.kuleuven.be/~u0018341/documents/ldafda.pdf Verbeke and Molenberghs (2005): Longitudinal Data Analysis Notes]<br />
=== Missing Data ===<br />
* [http://cran.r-project.org/web/packages/Amelia/vignettes/amelia.pdf King et al. (2012) Amelia II]<br />
<br />
=== Evaluation ===<br />
* Green, M.J., Medley, G.F., & Browne, W.J. (2009). Use of posterior predictive assessments to evaluate model fit in multilevel logistic regression. Veterinary Research, 40(4):30. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2675184/pdf/vetres-40-30.pdf<br />
<br />
=== Software for multilevel models ===<br />
{| class="wikitable"<br />
|-<br />
! Package<br />
! Description<br />
! Notes<br />
|-<br />
| R: clmm {ordinal}<br />
| Ordinal response: Fits cumulative link mixed models, i.e. cumulative link models with random effects via the Laplace approximation or the standard and the adaptive Gauss-Hermite quadrature approximation. The functionality in clm is also implemented here. Currently only a single random term is allowed in the location-part of the model.<br />
| <br />
|-<br />
| R: {lme4a}<br />
| Development version of lme4<br />
Download: svn checkout svn://svn.r-forge.r-project.org/svnroot/lme4<br />
<br />
|<br />
|-<br />
|R: {MCMCglmm}<br />
|MCMC Methods for Multi-response Generalized Linear Mixed Models<br />
| <br />
|-<br />
|R: {plm}<br />
|Econometric Analysis of Panel Survey Data<br />
|[http://cran.r-project.org/web/packages/plm/vignettes/plm.pdf Vignette]<br><br />
See p. 3 for comments on first-differencing.<br />
|-<br />
|<br />
|See Snijders and Bosker (2012) for longer list<br />
|<br />
|-<br />
|R: {lme4:nlmm}<br />
|Non-linear models with lme4<br />
|[http://lme4.r-forge.r-project.org/slides/2011-01-11-Madison/6NLMMH.pdf Presentation by Doug Bates]<br><br />
|}<br />
<br />
== Clones ==<br />
Check for changes and reconcile<br />
* Lab 1<br />
** [[SCS_2011:_Mixed_Models_with_R/Lab_1]]<br />
** [[MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Lab_1]]<br />
<br />
== Read ==<br />
On the age-period-cohort problem:<br />
* see bibliography by Yang: http://home.uchicago.edu/~yangy/research.html<br />
<br />
== Do ==<br />
*[[/spida|spida to do list]]<br />
*[[/p3d|p3d to do list]]<br />
== Read ==<br />
* [http://www.stat.berkeley.edu/~freedman/ Links to recent papers by David Freedman]<br />
* Links to material by Chris Wild:<br />
** http://www.stat.auckland.ac.nz/~wild/StatThink/<br />
** http://www.stat.auckland.ac.nz/showperson.php?uid=wild<br />
* [http://www3.hku.hk/statistics/staff/kaing/ Kai Ng's converse]<br />
<br />
== Notes ==<br />
* [http://www.chrp.org/love/ASACleveland2003Propensity.pdf Good presentation on use of propensity scores]<br />
* [http://sportsillustrated.cnn.com/2011/writers/scorecasting/03/24/simpson-paradox/index.html?eref=sihp Simpson's Paradox]<br />
<br />
== R notes ==<br />
* [http://strimmerlab.org/notes/fdr.html False Discovery Rates in R]<br />
=== Items to cover ===<br />
* Wrap up language:<br />
** Selection (give context): indices: index, names, logical, matrix of coordinates, 'subset'<br />
*** Example: dropping NAs from selected variables. Necessary because functions that are most sophisticated methodologically are generally least sophisticated in their interface<br />
**** contrast sophisticated program: lm with unsophisticated lowess<br />
* Using variables in data frames: <br />
** formula oriented functions: xyplot( y ~ x, data = dd )<br />
** explicit: plot( dd$x, dd$y )<br />
** with: with( dd, plot(x,y)); with(dd, xyplot( y ~ x, dd))<br />
** attach: As usual the easiest is deprecated! (why is it only easy and pleasurable things that are ever deprecated)<br />
**: <tt> attach(dd) </tt><br />
**: <tt> plot(x, y) </tt><br />
**: <tt> detach(dd) </tt><br />
*** Problem with 'attach': <br />
**** names in data frame may be masked by names in workspace<br />
**** assignments in workspace not saved in data frame<br />
* Overview of graphics<br />
** Link to http://addictedtor.free.fr/graphiques/<br />
* Programming structures<br />
* Add to graphics:<br />
*:Colours: <tt>pal(grepv('red',colors())); pals() # for all</tt><br />
*:modified tablemissing<br />
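A sketch of the selection methods, the NA-dropping example, and the two problems with 'attach' listed above. The small data frame is made up for illustration:<br />

```r
dd <- data.frame(
  x = c(1, 2, NA, 4, 5, 6),
  y = c(2.1, NA, 3.2, 4.8, 5.1, 6.3),
  row.names = paste0("r", 1:6)
)

# Selection: by index, by name, by logical, by matrix of coordinates, by subset()
by_index   <- dd[4, 2]
by_name    <- dd["r4", "y"]
by_logical <- dd[!is.na(dd$x) & dd$x > 3, ]
by_coords  <- as.matrix(dd)[cbind(c(1, 4), c(1, 2))]   # cells (1,1) and (4,2)
by_subset  <- subset(dd, x > 3)                         # rows where x is NA are dropped

# lm handles missing values through its na.action argument ...
fit <- lm(y ~ x, data = dd)

# ... but lowess has no na.action argument, so drop NAs by hand
ok <- complete.cases(dd[, c("x", "y")])
lo <- lowess(dd$x[ok], dd$y[ok])

# Problems with attach: workspace names mask data frame names,
# and assignments create workspace copies instead of changing dd
x <- "workspace x"
attach(dd)
masked <- is.character(x)     # TRUE: the workspace x is found first
y <- y * 2                    # creates a workspace y; dd$y is untouched
detach(dd)
unchanged <- identical(dd$y, c(2.1, NA, 3.2, 4.8, 5.1, 6.3))
```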
==== debugging in R ====<br />
* http://www.stats.uwo.ca/faculty/murdoch/software/debuggingR/debug.shtml<br />
<br />
=== Links ===<br />
*[http://www.stats.ox.ac.uk/pub/MASS4/ MASS 4th ed.] [http://www.stats.ox.ac.uk/pub/MASS4/#Exercises Exercises]<br />
<br />
=== Importing files ===<br />
==== From Excel ====<br />
* Easy: save file in Excel as .csv, then read into R with read.csv<br />
* If you have a lot of files, or get the files from some other source that edits .xls or .xlsx files:<br />
* The winner: package gdata: <br />
** First install perl. <br />
** read.xls in gdata handles both .xls and .xlsx files<br />
** works on both 32-bit and 64-bit machines<br />
* package XLConnect seems to work only on xlsx files <br />
* the smaller xlsx package also works only on xlsx files<br />
* Package xlsReadWrite works on xls files but only on 32-bit systems<br />
* Use xls2csv, a Perl script to convert files to csv first.<br />
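A minimal sketch of the 'easy' route, reading .csv files with <tt>read.csv</tt>. The two files are written to a temporary folder here only so that the example is self-contained; the file and column names are made up. The same loop applies to a folder of files exported from Excel:<br />

```r
# Stand-ins for files saved from Excel as .csv
dir_csv <- file.path(tempdir(), "excel_exports")
dir.create(dir_csv, showWarnings = FALSE)
write.csv(data.frame(id = 1:3, score = c(71, 85, 64)),
          file.path(dir_csv, "class_a.csv"), row.names = FALSE)
write.csv(data.frame(id = 4:5, score = c(90, 58)),
          file.path(dir_csv, "class_b.csv"), row.names = FALSE)

# One file
one <- read.csv(file.path(dir_csv, "class_a.csv"))

# Many files: read each one and stack them, remembering the source file
files <- list.files(dir_csv, pattern = "\\.csv$", full.names = TRUE)
all_rows <- do.call(rbind, lapply(files, function(f) {
  data.frame(file = basename(f), read.csv(f))
}))
```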
<br />
=== Getting lines vs points for different groups in xyplot ===<br />
Ideally, type = c('l','p') would work but it doesn't seem to. So one way is to use type = 'b' with an invisible line for one group and an invisible point for the other:<br />
<br />
library(spida.beta) # also loads 'car'<br />
dd <- Prestige<br />
dd$income.pred <- predict( lm( income ~ education*type, dd), newdata = dd)<br />
td( lty = c(1,0), pch = c(32, 16), lwd = 2) <br />
# lty = 0 produces an invisible line<br />
# and pch = 32 seems to be an invisible point<br />
xyplot( income.pred + income ~ education|type, dd[order(dd$education),], type = 'b',<br />
auto.key = list( columns = 2, lines = T, points = T))<br />
<br />
Also show example using panel.superpose.2<br />
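A possible alternative to the invisible-line trick, offered as a sketch: lattice's <tt>panel.superpose</tt> documents a <tt>distribute.type</tt> argument, and <tt>panel.superpose.2</tt> is the variant that uses it by default. With <tt>distribute.type = TRUE</tt> the elements of <tt>type</tt> are assigned to groups in turn rather than all being applied to every group. The example assumes the built-in <tt>mtcars</tt> data in place of <tt>Prestige</tt> so that it needs only lattice:<br />

```r
library(lattice)

dd <- mtcars
dd$am <- factor(dd$am, labels = c("automatic", "manual"))
dd$mpg.pred <- predict(lm(mpg ~ wt * am, dd), newdata = dd)
dd <- dd[order(dd$wt), ]      # sort so the fitted lines are drawn left to right

# First response (mpg.pred) gets type 'l', second (mpg) gets type 'p'
p <- xyplot(mpg.pred + mpg ~ wt | am, dd,
            type = c("l", "p"), distribute.type = TRUE,
            auto.key = list(columns = 2, lines = TRUE, points = TRUE))
```

Use <tt>print(p)</tt> to draw the plot from a script; at the console, typing <tt>p</tt> is enough.<br />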
=== Bugs ===<br />
<pre><br />
grade <- function(x ,<br />
cos = c(-Inf,40,50,55,60,65,70,75,80,90,Inf) - 0,<br />
grade = c("F","E","D","D+","C","C+","B","B+","A","A+")) {<br />
factor(cut(x, cos, grade, right = FALSE), levels = grade)<br />
}<br />
dg$Grade <- grade( dg$Final )<br />
tab(dg, ~ Grade)<br />
# gets indexing of levels wrong<br />
# the following seems to work correctly<br />
grade <- function(x ,<br />
cos = c(-Inf,40,50,55,60,65,70,75,80,90,Inf) - 0,<br />
grade = c("F","E","D","D+","C","C+","B","B+","A","A+")) {<br />
ret <- cut(x, cos, grade, right = FALSE)<br />
factor(ret, levels = grade)<br />
}<br />
</pre> <br />
==== Getting the G matrix in nlme ====<br />
fit <- lme( y ~ x, dd, random = ~1+x |id)<br />
 G <- pdMatrix( fit$modelStruct$reStruct)$id * fit$sigma^2  # reStruct is stored relative to the residual variance<br />
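A sketch for cross-checking the extracted matrix, using the <tt>Orthodont</tt> data that ships with nlme as a stand-in for <tt>dd</tt>. As far as I can tell from nlme, the matrices in <tt>reStruct</tt> are stored relative to the residual variance, so multiplying by <tt>sigma^2</tt> puts G on the variance scale; <tt>getVarCov</tt> provides an independent check:<br />

```r
library(nlme)

# Orthodont ships with nlme; distance, age and Subject play the roles of y, x and id
fit <- lme(distance ~ age, data = Orthodont, random = ~ 1 + age | Subject)

G_rel <- pdMatrix(fit$modelStruct$reStruct)$Subject   # relative to sigma^2
G     <- G_rel * fit$sigma^2                          # on the variance scale

G_check <- getVarCov(fit)                             # random-effects covariance matrix
```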
==== Building R packages in 2.14 ====<br />
# Install R<br />
# Install tools: http://robjhyndman.com/researchtips/building-r-packages-for-windows/<br />
<br />
# Info: http://cran.r-project.org/doc/contrib/Graves+DoraiRaj-RPackageDevelopment.pdf<br />
<br />
==== Notes ====<br />
* [http://ipsur.r-forge.r-project.org/book/ IPSUR: Introduction to Probability and Statistics using R]<br />
* [[/test of slash]]<br />
* [[/schedule]]<br />
* [http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=72 Addicted to R Graph Gallery]<br />
* [http://rwiki.sciviews.org/doku.php R Wiki]<br />
* [http://rwiki.sciviews.org/doku.php?id=guides:demos:stata_demo_with_r Stata demo in R]<br />
<br />
== Thumbnail test ==<br />
Here is a graphic file in raw form:<br />
<br />
[[File:UN-missing1.jpg]]<br />
<br />
And here is the same file with a thumbnail:<br />
[[File:UN-missing1.jpg|thumb]]<br />
<br />
== Math check ==<br />
<br />
<font color="red">'''Please click on the 'discussion' tab above'''</font><br />
<br />
:Test how math renders:<br />
<br />
:<math>\begin{align}<br />
f(x) & = (a+b)^2 \\<br />
& = a^2+2ab+b^2 \\<br />
\end{align}</math><br />
<br />
<math> x \perp y </math><br />
<br />
== glmmPQL etc ==<br />
Good discussion between Doug and Ben: https://stat.ethz.ch/pipermail/r-sig-mixed-models/2008q4/001457.html<br />
== Combining unbiased estimators ==<br />
This is an example:<br />
* bullet 1<br />
* bullet 2<br />
** again<br />
**: indented <br />
* bullet3<br />
numbered bullets:<br />
# one<br />
# two<br />
## dkjkdj<br />
##* djkdj<br />
#* djfkd<br />
=== subheading ===<br />
new stuff<br />
==== sub sub ====<br />
more stuff<br />
<br />
<br />
<br />
Let <math>{{\hat{\phi }}_{1}}</math> and <math>{{\hat{\phi }}_{2}}</math> be uncorrelated unbiased estimators of <math>\phi \in {{\mathbb{R}}^{k}}</math> with non-singular variances <math>{{V}_{1}}</math> and <math>{{V}_{2}}</math> respectively.<br />
<br />
Then the minimum variance linear unbiased estimator of<br />
<math>\phi </math> is obtained by combining <math>{{\hat{\phi }}_{1}}</math> and <math>{{\hat{\phi }}_{2}}</math> using weights that are proportional to the inverses of their variances. The result can be expressed in a variety of ways:<br />
<br />
<math>\begin{align}<br />
\hat{\phi } &= {{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}\left( V_{1}^{-1}{{{\hat{\phi }}}_{1}}+V_{2}^{-1}{{{\hat{\phi }}}_{2}} \right) \\ <br />
 & = {{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}\left( V_{1}^{-1}{{{\hat{\phi }}}_{1}}+V_{2}^{-1}{{{\hat{\phi }}}_{2}} \right)+ \left[ {{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}V_{2}^{-1}{{{\hat{\phi }}}_{1}}-{{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}V_{2}^{-1}{{{\hat{\phi }}}_{1}} \right] \\ <br />
& = {{{\hat{\phi }}}_{1}}+ {{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}V_{2}^{-1}\left( {{{\hat{\phi }}}_{2}}-{{{\hat{\phi }}}_{1}} \right) \\ <br />
& = {{{\hat{\phi }}}_{1}}+ {{\left( I+{{V}_{2}}V_{1}^{-1} \right)}^{-1}}\left( {{{\hat{\phi }}}_{2}}-{{{\hat{\phi }}}_{1}} \right) \\ <br />
& = {{\left( I+{{V}_{1}}V_{2}^{-1} \right)}^{-1}}\left( {{{\hat{\phi }}}_{1}}+{{V}_{1}}V_{2}^{-1}{{{\hat{\phi }}}_{2}} \right) <br />
\end{align}</math><br />
The proof is an application of the principle of Generalized Least-Squares. The problem can be formulated as a GLS problem by considering that:<br />
<math>\left[ \begin{matrix}<br />
{{{\hat{\phi }}}_{1}} \\<br />
{{{\hat{\phi }}}_{2}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
I \\<br />
I \\<br />
\end{matrix} \right]\phi +\left[ \begin{matrix}<br />
{{\varepsilon }_{1}} \\<br />
   {{\varepsilon }_{2}} \\<br />
\end{matrix} \right]</math> with <math>\operatorname{Var}\left( \left[ \begin{matrix}<br />
{{\varepsilon }_{1}} \\<br />
   {{\varepsilon }_{2}} \\<br />
\end{matrix} \right] \right)=\left[ \begin{matrix}<br />
{{V}_{1}} & 0 \\<br />
0 & {{V}_{2}} \\<br />
\end{matrix} \right]</math><br />
<br />
Applying the GLS formula yields:<br />
<math>\begin{align}<br />
\hat{\phi } & ={{\left( {{\left[ \begin{matrix}<br />
I \\<br />
I \\<br />
\end{matrix} \right]}^{\prime }}{{\left[ \begin{matrix}<br />
{{V}_{1}} & 0 \\<br />
0 & {{V}_{2}} \\<br />
\end{matrix} \right]}^{-1}}\left[ \begin{matrix}<br />
I \\<br />
I \\<br />
\end{matrix} \right] \right)}^{-1}}{{\left[ \begin{matrix}<br />
I \\<br />
I \\<br />
\end{matrix} \right]}^{\prime }}{{\left[ \begin{matrix}<br />
{{V}_{1}} & 0 \\<br />
0 & {{V}_{2}} \\<br />
\end{matrix} \right]}^{-1}}\left[ \begin{matrix}<br />
{{{\hat{\phi }}}_{1}} \\<br />
{{{\hat{\phi }}}_{2}} \\<br />
\end{matrix} \right] \\ <br />
& ={{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}\left( V_{1}^{-1}{{{\hat{\phi }}}_{1}}+V_{2}^{-1}{{{\hat{\phi }}}_{2}} \right) <br />
\end{align}</math><br />
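A numerical check of the identities above, as a sketch: it draws arbitrary positive-definite <math>V_1</math> and <math>V_2</math> (the helper <tt>rand_var</tt> and all the numbers are made up for illustration) and confirms that the different expressions and the GLS formula give the same combined estimate.<br />

```r
set.seed(123)
k <- 3

# Hypothetical helper: a random symmetric positive-definite k x k matrix
rand_var <- function(k) {
  A <- matrix(rnorm(k * k), k)
  crossprod(A) + diag(k)
}
V1 <- rand_var(k); V2 <- rand_var(k)
phi1 <- rnorm(k);  phi2 <- rnorm(k)

I   <- diag(k)
V1i <- solve(V1); V2i <- solve(V2)
W   <- solve(V1i + V2i)                 # (V1^-1 + V2^-1)^-1

# The four distinct expressions for the combined estimator
form_a <- W %*% (V1i %*% phi1 + V2i %*% phi2)
form_b <- phi1 + W %*% V2i %*% (phi2 - phi1)
form_c <- phi1 + solve(I + V2 %*% V1i) %*% (phi2 - phi1)
form_d <- solve(I + V1 %*% V2i) %*% (phi1 + V1 %*% V2i %*% phi2)

# GLS with stacked design [I; I] and block-diagonal variance
X  <- rbind(I, I)
Z0 <- matrix(0, k, k)
V  <- rbind(cbind(V1, Z0), cbind(Z0, V2))
Vi <- solve(V)
gls <- solve(t(X) %*% Vi %*% X, t(X) %*% Vi %*% c(phi1, phi2))
```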
<br />
== From Nassif Ghoussoub ==<br />
Beware the “useful idiocy” of Mr. Morgan<br />
<br />
The latest commentary of Gwyn Morgan in the Globe and Mail, “If universities were in business, they’d be out of business”, http://www.theglobeandmail.com/report-on-business/commentary/gwyn-morgan/ has crossed another line. Far from being an analysis of the state of Canadian universities, his rant is personal, bitter, demeaning, and insulting to university professors across the country.<br />
<br />
Back in his Globe article of April 29, 2009, “Not all research deserves public funding“, the retired CEO of EnCana Corp. proceeded to rip into the “ivory towers of academia”, attack “esoteric research” and disparage any graduate degree not hailing from medicine or engineering. He also dismissed the 2300 scientists who joined the “Don’t Leave Canada Behind” campaign, which called on the government to include R&D, the lifeblood of the new economy, in its stimulus budget. To its credit, the government of Canada responded positively to the call of its scientists, but from his -dubiously earned- platform at the Globe and Mail, Mr. Morgan kept at it.<br />
<br />
In Saturday’s paper, Mr. Morgan employs sweeping generalizations and ghost statistics to come to the conclusion that, among other things, Canada’s university professors are “poorly prepared” for their lectures, “show up occasionally” to class, and give “poorly thought out assignments”. He claims “the reaction of universities to widespread student dissatisfaction is to blame insufficient financing, rather than their own dysfunction”. He offers that in the new age, formal lectures should be altogether ended.<br />
<br />
His commentary provides neither data about student learning, nor any direct quotations from professors or students. A 1991 study is cited, and then baptized as the truth with a simple "Nineteen years later, little has changed." The article does not attack a particular university, faculty, or teaching method, but rather an apparently archetypal "university professor".<br />
<br />
So what if he hasn't been on campus in 40 years? He knows how it is. Even then, he "stopped going to classes and dedicated his time to learning from textbooks and reviewing friends’ notes". But Mr. Morgan ignores that a professor somewhere, sometime, must have produced and dictated these textbooks and notes. He finds "no reason why all written course material can’t be delivered via the Internet", obviously not aware that since the 90’s, most course material has been made available on the Internet, thanks to dedicated professors. Morgan's suggestion that we replace large classes with "small informal discussions" sounds great, but how does the CEO propose we pay for the much larger number of professors required to do the job? He wants universities to run like businesses, but as one reader suggested: “If Universities were run like the oil and gas industry we would be back in the dark ages where the only skill required would be to count your money... at least until the oil runs out”.<br />
<br />
It is obvious that we embattled post-secondary teachers and researchers need to worry more about the “very useful idiocy” of Mr. Morgan, the permanent platform he has been provided, and the damage that drivel like this can cause to higher education and advanced research in Canada.<br />
<br />
For the Globe and Mail, Mr. Morgan has been pure comic gold for years. Writing on a variety of subjects, ranging from environmental issues and health care, to research and post-secondary education, he has been a bottomless trove of shameless misrepresentations, extreme views and sheer wackiness. But ultimately, this is not only about Gwyn Morgan nor about the Globe and Mail. It is about us.<br />
<br />
It is about Canada’s University Presidents countering his dangerous Tea Party style rhetoric on our post-secondary institutions.<br />
<br />
It is about the Deans of Canada’s Faculties facing up to Mr. Morgan when he writes: “Many qualified applicants are turned away from areas such as engineering and medicine, while universities continue to graduate thousands with knowledge that is neither useful in getting a job, nor in helping our country succeed in a competitive world.”<br />
<br />
It is about the Royal Society of Canada, and other learned societies responding to his views about “esoteric research that doesn’t have the slightest chance of yielding any real value”.<br />
<br />
It is also up to our schools of journalism, to point out to mainstream media the irresponsibility in printing shallow, empty articles full of generalizations and devoid of facts.<br />
<br />
Mr. Morgan may be one of those individuals who get so many things wrong at once that the thought of challenging them or setting the record straight is just too daunting. But it is incumbent upon us not to let his rhetoric negate the exemplary contributions of thousands of Canada’s scholars, teachers and researchers.<br />
<br />
Nassif Ghoussoub, Professor of Mathematics, The University of British Columbia<br />
<br />
== Cell Phones ==<br />
<br />
Date: Sun, 10 Oct 2010 20:02:29 -0400<br />
From: Stuart Newman <newman@NYMC.EDU><br />
Reply-To: Science for the People Discussion List<br />
<SCIENCE-FOR-THE-PEOPLE@LIST.UVM.EDU><br />
To: SCIENCE-FOR-THE-PEOPLE@LIST.UVM.EDU<br />
Subject: "Disconnect": Why cellphones may be killing us<br />
<br />
Though I haven't yet read it, this book is presumably not based on<br />
anecdotal evidence. The author, Devra Davis, is the founding director<br />
of the toxicology and environmental studies board at the U.S. National<br />
Academy of Sciences.<br />
<br />
http://tinyurl.com/2fvycxc [Salon.com]<br />
<br />
"Disconnect": Why cellphones may be killing us<br />
A new book probes the connection between mobile devices and a host<br />
of health problems -- with frightening results<br />
By Thomas Rogers<br />
<br />
== Links ==<br />
* [http://www.ted.com/talks/hans_rosling_the_good_news_of_the_decade.html?utm_source=newsletter_weekly_2010-10-12&utm_campaign=newsletter_weekly&utm_medium=email 2010 TED talk by Hans Rosling]<br />
* [http://en.wikipedia.org/wiki/Apophenia Apophenia]<br />
* [http://en.wikipedia.org/wiki/Pareidolia Pareidolia]<br />
<br />
== Notes on mediation ==<br />
The question of mediation is essentially a question about causality. Is the putative mediator, M say, caused by X and, in turn, a cause of Y? But M, in a mediational analysis, cannot have been randomized even if X has been. The question of mediation is essentially a question about causality with observational, not experimental, data.<br />
<br />
To get a perspective on the problem we need to start by considering the general problem of causality with observational data. Let Y be the response variable and let X be the 'target' variable which is seen as a possible 'cause' of Y. For X to cause Y means that the expected value of Y would change in some target experimental condition in which X was manipulated (perhaps through random allocation) while other variables were left untouched -- not necessarily unchanged.<br />
<br />
For causal inference with observational data, we are interested in what would happen under circumstances that are different from those we have actually observed. Our analysis of our observational data will yield an accurate estimate of the causal effect of X if the model for the observational data has the same coefficient for X as it would have if it were applied to data gathered under the target experimental condition. The challenge is to specify and estimate a model that is 'transferable' from the observational condition to the experimental condition. We need a set of concepts to help us critically assess whether a model is transferable. It is not sufficient to have a model that 'fits' well. It may be necessary to include potential confounding factors even if they are not significant in the prediction model for Y. And it may be necessary to exclude strong predictors that are potential mediators -- variables that must not be held constant as one examines the causal relationship between X and Y. One needs a good understanding of the causal model that is valid under experimental conditions in order to properly specify a transferable observational model.<br />
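A simulated sketch of the point about confounders and mediators. All coefficients are invented for illustration. Z is a confounder, M is a mediator, and the total causal effect of X on Y is 0.5 + 0.8 × 1.0 = 1.3 by construction:<br />

```r
set.seed(42)
n <- 10000
z <- rnorm(n)                        # confounder: affects both x and y
x <- 0.7 * z + rnorm(n)              # observational assignment of x
m <- 0.8 * x + rnorm(n)              # mediator: caused by x, and a cause of y
y <- 0.5 * x + 1.0 * m + 2 * z + rnorm(n)
# Total causal effect of x on y: 0.5 (direct) + 0.8 * 1.0 (through m) = 1.3

b_naive   <- coef(lm(y ~ x))[["x"]]           # omits the confounder: biased
b_total   <- coef(lm(y ~ x + z))[["x"]]       # includes z, leaves m free: near 1.3
b_overadj <- coef(lm(y ~ x + z + m))[["x"]]   # holds the mediator constant: near 0.5
```

The third model fits Y best, yet its coefficient for X is not the total effect, which illustrates why a good fit alone does not make a model transferable.<br />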
<br />
The problem can be approached in a surprisingly different way, which is the basis for propensity scores. Instead of focusing on a 'transferable' model for Y, one focuses on a model for the assignment of the target causal variable X using potential confounding variables. As in models for Y, it is important to avoid potential mediators between X and Y. However, the model for X based on confounding factors is a prediction model. Confounding variables may be included, raw or transformed, as long as they are predictive of X. It is not necessary to include variables that are not predictive of X. The criterion for developing the model is statistical fit, a criterion that -- apart from the actual selection of confounding predictors -- is empirical, i.e. it is based on the analysis of the data at hand without reference to external theory that is not verifiable with the data. The assignment model need only be valid for the observational condition. Its validity for the experimental condition is irrelevant.<br />
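<br />
As a rough illustration of the two approaches, here is a small R simulation. Everything in it is an assumption made for the sketch: the variable names, the single confounder Z, the true effect of 1, and the use of inverse-probability weighting, which is only one of several ways to use a propensity score.<br />
<pre>
set.seed(2024)
n <- 5000
Z <- rnorm(n)                    # confounder
X <- rbinom(n, 1, plogis(Z))     # assignment depends on Z (observational condition)
Y <- 1 * X + Z + rnorm(n)        # true causal effect of X is 1

# Naive comparison ignores Z
naive <- mean(Y[X == 1]) - mean(Y[X == 0])

# Approach 1: a 'transferable' model for Y that includes the confounder
y_model <- unname(coef(lm(Y ~ X + Z))["X"])

# Approach 2: model the assignment of X, then weight by the inverse propensity
ps  <- fitted(glm(X ~ Z, family = binomial))
w   <- ifelse(X == 1, 1 / ps, 1 / (1 - ps))
ipw <- weighted.mean(Y[X == 1], w[X == 1]) -
       weighted.mean(Y[X == 0], w[X == 0])

round(c(naive = naive, y_model = y_model, ipw = ipw), 2)
</pre>
With these settings the naive difference should come out well above 1, while the two adjusted estimates should both be close to 1. Note that the assignment model only has to predict X in the data at hand.<br />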
<br />
What are some of the pros and cons of the two approaches? A good transferable model for Y may provide more precise estimates of the effect of X because more of the variability in Y is accounted for in the model. On the other hand, the validity of a causal estimate based on the propensity score approach depends on assumptions that may be much easier to sustain than those required for the approach based on modeling Y. Broadly, the propensity score approach offers lower bias but not necessarily lower variability. Note that the two approaches are not mutually exclusive. They may be better viewed as two sets of concepts that could be combined in an analysis that draws from both.<br />
<br />
How does this all relate to the analysis of mediation? The Baron and Kenny approach and its variants -- in which I include the various ways of estimating direct and indirect causal effects -- are all based on methods analogous to models for Y. As mentioned earlier, estimating the causal effect of the mediator involves causal inference with observational data -- even in the context of an experiment randomizing X. This invites the question whether propensity score methods could be used in assessing more accurately the causal effect of M. The answer lies in the relatively recent theory of principal stratification [Constantine Frangakis and Donald Rubin (2002) "Principal Stratification in Causal Inference", ''Biometrics'', '''58''', 21--29].<br />
<br />
An accessible reference for the concepts behind propensity scores is Donald Rubin (1997) "Estimating Causal Effects from Large Data Sets Using Propensity Scores," ''Annals of Internal Medicine'', '''127''', 757--763.<br />
<br />
A recent treatment of mediation using principal stratification is given in Chapter 8, "Intermediate Causal Factors," of Herbert Weisberg (2010) ''Bias and Causation: Models and Judgment for Valid Comparisons'', Wiley.<br />
<br />
With the large number of seemingly competing approaches to causal inference, students as well as experienced researchers may feel quite puzzled as to which approach they should use. The answer, possibly, is all of them. Each approach seems to shed light on some aspect of the challenge of causal inference in the absence of pristine randomization. They do not offer recipes so much as sets of concepts that can be applied to help understand research projects and analyses.<br />
<br />
== Shock of the New ==<br />
[http://en.wikipedia.org/wiki/Robert_Hughes_(critic) Robert Hughes (1980)]<br />
* [http://www.youtube.com/watch?v=GFn4UmkBcaQ Surrealism Part 1]<br />
* [http://www.youtube.com/watch?v=2uaA8CfZKRs Surrealism Part 2]<br />
* [http://www.youtube.com/watch?v=dActaAa-teM Surrealism Part 3]<br />
* [http://www.youtube.com/watch?v=ZLmCh0xw4h0 Surrealism Part 4]<br />
* [http://www.youtube.com/watch?v=s-SsWPNNBC4 Surrealism Part 5]<br />
== York AODA ==<br />
* [http://aoda.yorku.ca/cs/interactive/en/# AODA training]<br />
== Notes for NATS 1500 ==<br />
* Topics<br />
** single-sex schools?<br />
<br />
== Notes for MATH 6627 ==<br />
* [http://mediamatters.org/research/2012/10/01/a-history-of-dishonest-fox-charts/190225 Collection of misleading graphs]<br />
* [http://www.amstat.org/sections/cnsl/BooksJournals.cfm ASA consulting page]<br />
* set up student home page<br />
* first assignment. Find and explore a dataset using<br />
** Ernest Kwan's correlogram<br />
** Lattice (use panels and groups)<br />
** p3d<br />
** gapminder<br />
** should have included candisc<br />
* give a 15-minute (crucial) presentation on the data set and on the method<br />
* prepare a wiki page with links and materials <br />
* Address a few questions:<br />
** What are the strengths and weaknesses?<br />
** For what kind of dataset is it well suited and what kind not?<br />
** Can you find a dataset that illustrates well the features of this approach?<br />
** Can you compare your approach with other approaches?<br />
[[/test page|test]]<br />
<br />
* develop checklists:<br />
* initial exploration of data<br />
* missing data (explicit and implicit)<br />
* do simulation of parallel methods: check estimation of variance parameters<br />
* use nlme to estimate knot placement in gsp<br />
=== Links ===<br />
* [http://healthland.time.com/2011/09/02/mind-reading-why-bad-math-can-ruin-your-health/ Why bad math can ruin your health]<br />
* [http://www.ted.com/talks/david_mccandless_the_beauty_of_data_visualization.html TED talk: David McCandless, The Beauty of Data Visualization]<br />
* Hans Rosling<br />
* Great intro to Gapminder: http://www.mrbartonmaths.com/gapminder.htm<br />
<br />
* [http://scs.math.yorku.ca/index.php?title=Special:Contributions&dir=prev&contribs=user&target=Georges Contributions]<br />
* [http://www.scribd.com/doc/17378132/The-Fallacy-of-Personal-Validation-a-Classroom-Demonstration-of-Gullibility The Forer Effect - a classroom example]<br />
<br />
== Notes for R course ==<br />
* Start: It had to be U ... on the SVD [http://www.youtube.com/StatisticalSongs#p/u/4/JEYLfIVvR9I]<br />
* Use SPSS dates both ways to illustrate <br />
** sub using regular expressions<br />
** import: reading dates into 'Date' format using formats: Include all %a %b %Y and others?<br />
** export: writing a date into a character string using format( Date.object, "%d-%m-%Y") to create variable SPSS can read<br />
* Variable references<br />
*: deal with plethora of ways used differently in different places: <br />
*:* formula ( ~id ), good for variables in different roles ( y ~ log(x) + x2 | id)<br />
*:* interpreted in data: (id), good for single var but can use list : (list(x1,x2))<br />
*:*:Examples:<br />
*:*:* <br />
*:* fully reference: dd$id <br />
*Beware:<br />
* aggregate with a formula drops rows with NAs even though the FUN might be able to handle them<br />
* multiple barplot: http://rtricks.wordpress.com/2009/10/26/multbar-advanced-multiple-barplot-with-sem/<br />
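The <tt>aggregate</tt> warning above can be checked with a toy data frame (the data and names are made up for illustration). The formula method defaults to <tt>na.action = na.omit</tt>, so a row with an NA in any variable used is dropped before FUN is called:<br />
<pre>
dd <- data.frame(id = c("a", "a", "b", "b"),
                 x  = c(1, NA, 3, 4),
                 y  = c(10, 20, 30, NA))

# Formula interface: rows 2 and 4 are removed first, so na.rm never sees an NA
# and the y = 20 observation is lost from group "a"
agg_formula <- aggregate(cbind(x, y) ~ id, data = dd, FUN = mean, na.rm = TRUE)

# Keeping the rows lets FUN handle the NAs variable by variable
agg_keep <- aggregate(cbind(x, y) ~ id, data = dd, FUN = mean, na.rm = TRUE,
                      na.action = na.pass)

agg_formula
agg_keep
</pre>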
=== Add ===<br />
* Discussion of memory issues: what happens when you work on two computers<br />
=== Links ===<br />
* [http://www.r-bloggers.com/r-popularity-%E2%80%93-steady-growth-and-new-york-times/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+RBloggers+%28R+bloggers%29 R blog]<br />
* John Fox: ICPSR: 2010: [http://socserv.mcmaster.ca/jfox/Courses/R-course/Slides-handout.pdf Overview including slides on building R packages]<br />
* [http://www.icpsr.umich.edu/files/sumprog/biblio/2010/Fox.pdf Introduction to the R computing environment]<br />
* ICPSR 2011:<br />
** [http://socserv.mcmaster.ca/jfox/Courses/R/ICPSR/index.html Overall page]<br />
**<br />
<br />
== Notes for High School Talks ==<br />
=== Climate change ===<br />
* http://www.cbc.ca/news/technology/story/2011/09/09/pol-climate-adaptation.html<br />
<br />
== Excel techniques ==<br />
* Regular expressions and string substitution<br />
**<br />
* [http://www.contextures.com/xldataval08.html]<br />
<br />
== Ellipse Seminar ==<br />
[[/Ellipse Seminar]]<br />
== Setting up mathstat email in Thunderbird ==<br />
IMAP mailserver: mathstat.yorku.ca Port: 143 Security: STARTTLS<br />
<br />
Outgoing: mathstat.yorku.ca Port: 587 Security: none?<br />
<br />
== Statistical amusement ==<br />
* Statistical Song channel on youtube: http://www.youtube.com/user/StatisticalSongs<br />
** It had to be U ... on the SVD: http://www.youtube.com/StatisticalSongs#p/u/4/JEYLfIVvR9I<br />
** It don't mean a thing if you don't do modelling: http://www.youtube.com/user/StatisticalSongs#p/u/0/Jzm2hrEfNdY<br />
== On Careers in Statistics and Mathematics==<br />
*[https://sites.google.com/site/statsr4us/intro/2-the-joy-of-stats/statsfuture The Joy of Statistics]<br />
*[http://www.bbc.co.uk/news/business-14631547 How Mathematicians Rule the Markets: Quant Trading]<br />
<br />
== On Teaching Science ==<br />
<br />
* [http://www.nats.yorku.ca/index.shtml Natural Science Web Site]<br />
* [https://sites.google.com/site/changingourteaching/Home/tips-for-teaching-asistants Collection of links on teaching]<br />
<br />
<br />
=== A few videos ===<br />
* [http://www.youtube.com/watch?v=ccReLF6M62Y Why Teach Science by James Randi]<br />
* [http://www.youtube.com/watch?v=BlpyGhABXRA Teaching Introductory Physics]<br />
* [http://www.ted.com/talks/brian_goldman_doctors_make_mistakes_can_we_talk_about_that.html?utm_source=newsletter_weekly_2012-01-25&utm_campaign=newsletter_weekly&utm_medium=email Brian Goldman on learning from mistakes]<br />
<br />
== LOG ==<br />
* [[/a]]</div>Georgeshttp://scs.math.yorku.ca/index.php/User:GeorgesUser:Georges2019-03-18T04:25:59Z<p>Georges: </p>
<hr />
<div>__TOC__<br />
== Notes on Mixed Models ==<br />
* Look at influence in car: uses dropone with parallelization and few iterations<br />
* The LR test assumes that the log-likelihood is quadratic on 'some' transformation of the parameter; the Wald test assumes that it is quadratic on the scale of the parameter tested.<br />
* varPower(form = ~ fitted(.), fixed = 1)<br />
** The value of 'form' is on the SD scale and 'fixed' by default provides a power to raise 'form' to yield a value proportional to the SD of the response.<br />
** the default level for 'fitted' is the finest level. <br />
** The variance function must be expressed in terms of the expected value of the re-expressed response.<br />
http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DemingMi/2007-04-27.pdf<br />
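A minimal sketch of the <tt>varPower</tt> notes above, using the <tt>BodyWeight</tt> data that ships with nlme (the choice of data set and model is an assumption for illustration, not part of the original notes). Here the power is estimated; adding <tt>fixed = 1</tt> would instead hold it at 1, making the SD proportional to the fitted value:<br />
<pre>
library(nlme)

# Baseline: constant within-group variance
fit0 <- lme(weight ~ Time * Diet, data = BodyWeight, random = ~ Time | Rat)

# SD of the response proportional to |fitted value|^power, power estimated
fit1 <- update(fit0, weights = varPower(form = ~ fitted(.)))

# Estimated power on the natural (constrained) scale
power_hat <- coef(fit1$modelStruct$varStruct, unconstrained = FALSE)
power_hat

# Likelihood-ratio comparison of the two variance specifications
anova(fit0, fit1)
</pre>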
== Paradoxes, Fallacies and Other Surprises ==<br />
[[Paradoxes, Fallacies and Other Surprises]]<br />
== Bayes ==<br />
* Two consecutive issues of Statistical Science in 2011 have many interesting articles that are related to Bayesian inference:<br />
** http://www.jstor.org.ezproxy.library.yorku.ca/stable/i23059127<br />
** http://www.jstor.org.ezproxy.library.yorku.ca/stable/i23059971<br />
* Experimenting with files:<br />
** jpg file<br />
*** Using the wiki link to the uploaded name: [[File:2013-12-29 18.20.34.jpg|thumb]]<br />
*** Using the wiki link to the uploaded name as media: [[Media:2013-12-29 18.20.34.jpg]]<br />
** .R file<br />
*** File wiki link [[File:Tcells.R]]<br />
*** Media wiki link [[Media:Tcells.R]]<br />
* [[Useful formulas]]<br />
* SEM with STAN<br />
** [https://groups.google.com/forum/#!topic/stan-users/dVjm8iES54k Forum discussion re slow convergence]<br />
* Interaction fallacy in a presentation:<br />
*:If you think two variables affect each other then you should include an interaction between them. (Fooled by the word 'interaction').<br />
*[http://www.stat.columbia.edu/~gelman/research/published/feller8.pdf Gelman and Robert on Bayes]<br />
*[http://en.wikipedia.org/wiki/Multivariate_adaptive_regression_splines MARS: Multivariate Adaptive Regression Splines]<br />
*[http://cran.r-project.org/web/packages/mosaic/vignettes/V2StartTeaching.pdf Teaching with R using MOSAIC by ... and D. Kaplan]<br />
*[http://vudlab.com/simpsons/ Causality: interactive app illustrating Simpson's Paradox]<br />
*[http://www.metafor-project.org/doku.php/tips:testing_factors_lincoms metafor: Tutorial using mixed models for meta analysis]<br />
* [http://andrewgelman.com/2006/06/11/survey_weights/ Andrew Gelman on survey weights with multilevel models]: he suggests unweighted modeling (or a 'variance weighted' analysis, e.g. replication weights) followed by poststratification. <br />
* [https://stat.duke.edu/courses/Fall11/sta101.02/labs/lab1.pdf Intro to R and Rstudio in an intro course at Duke]<br />
* [http://cran.r-project.org/doc/contrib/Torfs+Brauer-Short-R-Intro.pdf Intro to R in RStudio]<br />
* [http://www.crcpress.com/product/isbn/9781466515857 Multilevel Modeling Using R]<br />
* [http://cran.r-project.org/doc/contrib/Bliese_Multilevel.pdf Multilevel Modeling in R by Paul Bliese]<br />
* [http://blog.revolutionanalytics.com/2014/08/statistics-losing-ground-to-cs-losing-image-among-students.html Losing ground to CS?]<br />
* [https://www.youtube.com/watch?v=Is1Ej0Vj0Mw Interview with David Smith at UseR 2014]<br />
* [http://www.cookbook-r.com/Graphs/Colors_(ggplot2)/#mapping-variable-values-to-colors On colour]<br />
* [http://cran.r-project.org/web/packages/plot3D/vignettes/plot3D.pdf 3d plotting packages]<br />
* [http://blogs.sas.com/content/iml/2014/08/05/stiglers-seven-pillars-of-statistical-wisdom/ Stigler's Seven Pillars of Statistics]<br />
* [http://glmm.wikidot.com/faq#modelspec FAQ on GLMMs]<br />
* [http://www.math.uah.edu/stat/ Virtual Labs in Probability and Statistics]<br />
* [http://tryr.codeschool.com/ R code school with O'Reilly]<br />
* [http://cran.r-project.org/web/packages/pastecs/pastecs.pdf pastecs package for time series]<br />
* [http://www.bbc.com/news/magazine-28166019 Do doctors understand test results? By William Kremer -- about Gerd Gigerenzer]<br />
* [[/Multiple Testing -- a comment]]<br />
* [[/MOOCs for Data Science]]<br />
* [http://artssquared.wordpress.com/2012/03/21/letter-to-the-provost-and-vp-academic-professor-carl-amrhein/ Arts Squared]<br />
* [[/Using R Markdown]]<br />
* [[/Statistics Links for Courses]]<br />
* [[/Lee Lorch]]<br />
* [http://arxiv.org/pdf/1402.1894v1.pdf Baumer et al. (2014) Using R Markdown in Intro Stats]<br />
* [[/Job ratings]]<br />
* [http://www.stat.cmu.edu/~hseltman/PIER/Bayes/data/schools.R Using WinBUGS on the Netherlands data]<br />
* [[/Climate Change|/Climate Change]]<br />
* [[/Standardize or Not]]<br />
*[[/Mixed Models -- papers]]<br />
*[[/MCMCglmm]]<br />
*[[/Wiki tests]]<br />
<br />
== Cause, correlation, or ... ==<br />
*[http://p.nytimes.com/email/re?location=4z5Q7LhI+KUPcT7snurzN09anQA2MM49IhNWGFarU5GcvOIXzFz0cSuazUKJK97uTp6+uuRfTEhO+dnMZordtiC8Du17IY2zzXWzY7etdKdkq0H3sffQdSh+6YpVRsJcs8ZeVrfzShCpIEB5wXVC1g==&campaign_id=23&instance_id=45715&segment_id=62874&user_id=ecb65bd8a646c4ab6214f51d21246fce&regi_id=7495274 Instant noodles and metabolic syndrome]<br />
== Notepad ==<br />
* [http://www.r-bloggers.com/some-r-resources-for-glms/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+RBloggers+%28R+bloggers%29 R resources for GLMs]<br />
* [https://www.stat.auckland.ac.nz/~ihaka/120/Lectures/lecture17.pdf On Mosaic plots]<br />
* [[/Academic and Administrative Program Review]]<br />
* [[/Statistics programs]]<br />
* [http://www.yorku.ca/careers/gpse/2013/ Careers Expo at York] with 13 booths from UofT such as Dalla Lana but only one generic FGS booth from York.<br />
* [https://www.researchgate.net/publication/257516014_One_paradox_in_statistical_decision_making#! Schervish's p-value paradox]<br />
* [[/Data]]<br />
* [http://www.universityaffairs.ca/course-evaluations-the-good-the-bad-and-the-ugly.aspx Course evaluations: the good, the bad and the ugly]<br />
* Larry Wasserman on<br />
** [http://normaldeviate.wordpress.com/2013/04/27/the-perils-of-hypothesis-testing-again/ Perils of Hypothesis Testing]<br />
**[http://normaldeviate.wordpress.com/2013/04/13/data-science-the-end-of-statistics/ Data Science]<br />
**[http://normaldeviate.wordpress.com/2012/06/18/48/ Causality]<br />
*[[/HOA]]<br />
*[[/Big data]]<br />
*[[/Student satisfaction]]<br />
*[http://normaldeviate.wordpress.com/2012/12/21/guest-post-rob-tibshirani/ Rob Tibshirani's list of 9 great statistics papers]<br />
*[http://www.newyorker.com/online/blogs/johncassidy/2013/04/the-rogoff-and-reinhart-controversy-a-summing-up.html?mobify=0 Cassidy on the Reinhart-Rogoff controversy.]<br />
*[http://clinicaltrials.gov/ Clinical Trials registry in the US]<br />
*[http://en.wikipedia.org/wiki/Cochrane_Collaboration The Cochrane Collaboration]<br />
* 2004 ICMJE: policy of registration:<br />
<br />
Recommended sources on statistics:<br />
<br />
There are many excellent sources for information on current statistical issues (Psychonomic Society Journals):<br />
<br />
* Confidence Intervals:<br />
** Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York, NY: Routledge/Taylor & Francis Group. (see www.latrobe.edu.au/psy/research/projects/esci).<br />
** Masson, M. E. J., & Loftus, G. R. (2003). Using confidence intervals for graphically based data interpretation. Canadian Journal of Experimental Psychology/Revue Canadienne de Psychologie Expérimentale, 57, 203-220. doi:10.1037/h0087426<br />
* Effect Size Estimates:<br />
** Ellis, P. D. (2010). The essential guide to effect sizes: Statistical power, meta-analysis and the interpretation of research results. Cambridge University Press. ISBN 978-0-521-14246-5.<br />
** Fritz, C. O., Morris, P. E., & Richler, J. J. (2011). Effect size estimates: Current use, calculations and interpretation. Journal of Experimental Psychology: General, 141, 2-18.<br />
** Grissom, R. J., & Kim, J. J. (2012). Effect sizes for research: Univariate and multivariate applications (2nd ed.). New York, NY: Routledge/Taylor & Francis Group.<br />
* Meta-analysis:<br />
** Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York, NY: Routledge/Taylor & Francis Group. (see www.latrobe.edu.au/psy/research/projects/esci).<br />
** Littell, J. H., Corcoran, J., & Pillai, V. (2008). Systematic reviews and meta-analysis. New York: Oxford University Press.<br />
* Bayesian Data Analysis:<br />
** Kruschke, J. K. (2011). Doing Bayesian data analysis: A tutorial with R and BUGS. San Diego, CA: Elsevier Academic Press. (See www.indiana.edu/~kruschke/DoingBayesianDataAnalysis/)<br />
** Kruschke, J. K. (in press). Bayesian estimation supersedes the t test. Journal of Experimental Psychology: General. (For a preprint see http://www.indiana.edu/~kruschke/BEST/BEST.pdf).<br />
* Power Analysis:<br />
** Faul, F., Erdfelder, E., Lang, A., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175-191. (See http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/)<br />
=== Blogs ===<br />
* [http://jeromyanglim.blogspot.ca Jeromy Anglim]<br />
<br />
== Pythagoras Diagram ==<br />
* [http://pages.uoregon.edu/stevensj/MRA/partial.pdf Venn diagram 'fallacy' example]<br />
<br />
==Recent changes==<br />
[[/Links|Links]]<br><br />
[[/Recent Changes]] [[/Contributions]]<br><br />
[[/DO]]<br />
==Topics==<br />
* [[/R packages]]<br />
* [[/Curriculum]]<br />
* [[/HLM links]]<br />
* [[/Education links]]<br />
* Death of Evidence<br />
** [http://www.nature.com/nature/journal/v483/n7387/full/483006a.html Article in Nature: Frozen Out, March 1, 2012]<br />
** [http://www.deathofevidence.ca/ Death of Evidence website]<br />
* [[/Mixed effects for multinomial responses]]<br />
* [[/Ellipse paper comments]]<br />
* On Tobacco (from Matt)<br />
** [http://io9.com/5899612/low+income-countries-are-a-cigarettes-best-friend Low income countries and tobacco]<br />
** [http://www.tobaccoatlas.org/uploads/Images/PDFs/Tobacco_Atlas_4_entire.pdf The Tobacco Atlas]<br />
*[[/fda.R]]<br />
*[[/FSE Scholars evening]]<br />
*[[/MATH 6627 student contributions]] <br />
:[http://scs.math.yorku.ca/index.php?title=Special:UserLogin&type=signup Create new account]<br />
:[http://scs.math.yorku.ca/index.php/SCS_2011:_Statistical_Analysis_and_Programming_with_R SCS R course]<br />
:[[/R packages]]<br />
:[[/SPIDA 20102 preparation]]<br />
== Data scraping ==<br />
* [http://www.r-bloggers.com/preparing-public-data-for-analysis-with-r/ Example from Ministry of Transportation]<br />
== RStudio: Shiny ==<br />
* [http://www.premiersoccerstats.com/wordpress/?p=1273 On Shiny]<br />
* [http://demo.rapporter.net/?sport=ATH-170&weight=0 Rapporter.net]<br />
<br />
== Notes for 6643 ==<br />
* Assignment: Can we produce an estimate of AIC based just on the Wald test?<br />
<br />
== On Pedagogy ==<br />
* [http://www.guardian.co.uk/higher-education-network/blog/2012/oct/18/social-sciences-quantative-skills-training On the importance of quantitative skills in social science]<br />
* http://www.matstat.com/teach/<br />
* [http://www.ma.utexas.edu/users/mks/statmistakes/StatisticsMistakes.html Common Misteaks in Statistics]<br />
=== Advice for students ===<br />
* [http://www.universityaffairs.ca/how-to-ask-for-a-reference-letter.aspx How to ask for a reference letter]<br />
<br />
== Questions (e.g. for survey papers) ==<br />
* Implement more diagnostics in R for lme models<br />
* Explore duality of the whole data matrix<br />
* Extend the UD representation to hyperbola, etc., and include a way of plotting osculation loci<br />
* Explore the geometry of harmonic combinations and its implications for mixed model estimates. What happens as you shift weight from G to <math>(X'X)^{-1}</math>? How does the result wander outside the convex combination? When does it happen and what does it mean?<br />
* Refine Lform and related tools<br />
<br />
== Read ==<br />
* [http://www.ams.org/notices/201001/rtx100100030p.pdf Music: Broken Symmetry, Geometry, and Complexity]<br />
* https://sites.google.com/site/r4statistics/<br />
<br />
== R course ==<br />
* [https://github.com/hadley/devtools/wiki/ Hadley Wickham Advanced R Development]<br />
* [http://courses.had.co.nz/11-devtools/ Hadley Wickham's R development courses]<br />
Day 2 - add<br />
* final recap of 'lm' interface: subset, na.action, etc., etc.<br />
* discuss formula syntax<br />
* final recap of methods for 'lm'<br />
* note easy extension to 'glm', 'lme', etc.<br />
* note that many 'new' functions do not use this interface, only more 'mature' functions<br />
** lm.formula<br />
* discuss OO showing methods and dispatching<br />
Day 3<br />
* the most useful tools:<br />
** seq<br />
** rep<br />
** replacement functions<br />
* data input<br />
* more programming<br />
** object oriented programming<br />
** using a function in C<br />
* using attributes<br />
* systematic treatment of graphics, including<br />
** par<br />
** xyplot<br />
[[/Day2 Guided Tour of Linear Models.R]]<br />
<br />
== SCS Reads 2011 Links ==<br />
* [http://www.math.yorku.ca/people/georges/Files/Ellipse_Seminar/Visualizing_Regression-I-Simple-v2.pdf Notes on visualizing simple regression]<br />
* [http://www.math.yorku.ca/people/georges/Files/Ellipse_Seminar/Visualizing_Regression-II.r R script for multiple regression]<br />
== Capstone courses ==<br />
* [http://www.amstat.org/publications/jse/v9n1/spurrier.html Course at the U. of South Carolina, 2001]<br />
* <br />
== Links to recent courses ==<br />
*[http://scs.math.yorku.ca/index.php/SCS_2011:_Mixed_Models_with_R SCS 2011: Mixed Models with R]<br />
*[http://statswiki.math.yorku.ca/index.php/SCS:_Mixed_Models_in_R SCS 2010]<br />
*[http://scs.math.yorku.ca/index.php/MATH_6627_2008-09_Practicum_in_Statistical_Consulting MATH 6627 2008-09]<br />
*[http://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting MATH 6627 2010-11]<br />
*[http://wiki.math.yorku.ca/index.php/SPIDA_2010:_Mixed_Models_with_R SPIDA 2010]<br />
*[http://wiki.math.yorku.ca/index.php/SPIDA_2009:_Mixed_Models_with_R SPIDA 2009]<br />
<br />
*[http://scs.math.yorku.ca/index.php/Spida Spida package]<br />
<br />
== Links to add somewhere ==<br />
* [http://www.ted.com/talks/ben_goldacre_battling_bad_science.html?utm_source=newsletter_weekly_2011-10-04&utm_campaign=newsletter_weekly&utm_medium=email Battling bad science]<br />
*D W Hosmer, S Taber and S Lemeshow () "The importance of assessing the fit of logistic regression models: a case study." ''American Journal of Public Health'', Vol. 81, Issue 12 1630-1635<br />
* [http://www.statmethods.net/ Quick R] for SPSS, SAS and Stata users.<br />
=== Graphics ===<br />
* [http://www.edwardtufte.com/tufte/ ET Modern]<br />
* Striking graphics:<br />
** [http://en.wikipedia.org/wiki/File:U.S._incarceration_rates_1925_onwards.png Incarceration rate in the United States]<br />
<br />
=== Matrices ===<br />
*[http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/intro.html Matrix Reference Manual]<br />
*[http://en.wikipedia.org/wiki/Matrix_determinant_lemma Matrix determinant lemma]<br />
*[http://en.wikipedia.org/wiki/Woodbury_matrix_identity Woodbury Matrix Identity]<br />
<br />
=== Simpson's Paradox ===<br />
In the [http://en.wikipedia.org/wiki/Canadian_federal_election,_1979 1979 Canadian federal election] an unusual event occurred in the Northwest Territories: the Liberals won the popular vote in the territory, but won [http://en.wikipedia.org/wiki/Canadian_federal_election,_1979#National_results neither seat.]<br />
<br />
=== Lee Lorch ===<br />
* [http://aer.sagepub.com/content/36/4/739.abstract Marybeth Gasman (1999)] "Scylla and Charybdis: Navigating the Waters of Academic Freedom at Fisk University During Charles S. Johnson's Administration (1946–1956)" ''American Educational Research Journal''<br />
*: A prominent sociologist and race relations activist, Charles S. Johnson dedicated his life to the advancement of Blacks. His presidency at Fisk University, a historically Black college, was the culmination of his career. During the latter part of his administration, he faced a dilemma involving an outspoken professor named Lee Lorch, who, in 1954, was accused of being a communist. Johnson and the Board of Trustees dismissed Lorch because he refused to answer a congressional committee's questions about his previous political affiliations. In 1959, the American Association of University Professors found the late President Johnson guilty of violating the principles of academic freedom. This article explores the ways in which academic freedom, civil liberties, and civil rights clashed in the Lee Lorch case. Furthermore, it examines the ways in which the setting of a historically Black college alters traditional assumptions about the application of these principles.<br />
* [http://www.nytimes.com/2010/11/22/nyregion/22stuyvesant.html Charles V. Bagli (November 21, 2010)] "A New Light on a Fight to Integrate Stuyvesant Town", ''New York Times''.<br />
<br />
== Multilevel Models ==<br />
<br />
=== Expository ===<br />
* [http://glmm.wikidot.com/faq R FAQ]<br />
*[https://perswww.kuleuven.be/~u0018341/documents/ldafda.pdf Verbeke and Molenberghs (2005): Longitudinal Data Analysis Notes]<br />
=== Missing Data ===<br />
* [http://cran.r-project.org/web/packages/Amelia/vignettes/amelia.pdf King et al. (2012) Amelia II]<br />
*<br />
<br />
=== Evaluation ===<br />
* Green, M.J., Medley, G.F., & Browne, W.J. (2009). Use of posterior predictive assessments to evaluate model fit in multilevel logistic regression. Veterinary Research, 40(4):30. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2675184/pdf/vetres-40-30.pdf<br />
<br />
=== Software for multilevel models ===<br />
{| class="wikitable"<br />
|-<br />
! Package<br />
! Function<br />
! Notes<br />
|-<br />
| R: clmm {ordinal}<br />
| Ordinal response: Fits cumulative link mixed models, i.e. cumulative link models with random effects via the Laplace approximation or the standard and the adaptive Gauss-Hermite quadrature approximation. The functionality in clm is also implemented here. Currently only a single random term is allowed in the location-part of the model.<br />
| <br />
|-<br />
| R: {lme4a}<br />
| Development version of lme4<br />
Download: svn checkout svn://svn.r-forge.r-project.org/svnroot/lme4<br />
<br />
|<br />
|-<br />
|R: {MCMCglmm}<br />
|MCMC Methods for Multi-response Generalized Linear Mixed Models<br />
| <br />
|-<br />
|R: {plm}<br />
|Econometric Analysis of Panel Survey Data<br />
|[http://cran.r-project.org/web/packages/plm/vignettes/plm.pdf Vignette]<br><br />
See p. 3 for comments on first-differencing.<br />
|-<br />
|<br />
|See Snijders and Bosker (2012) for longer list<br />
|<br />
|-<br />
|R: {lme4:nlmm}<br />
|Non-linear models with lme4<br />
|[http://lme4.r-forge.r-project.org/slides/2011-01-11-Madison/6NLMMH.pdf Presentation by Doug Bates]<br><br />
|}<br />
<br />
== Clones ==<br />
Check for changes and reconcile<br />
* Lab 1<br />
** [[SCS_2011:_Mixed_Models_with_R/Lab_1]]<br />
** [[MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Lab_1]]<br />
<br />
== Read ==<br />
On the age-period-cohort problem:<br />
* see bibliography by Yang: http://home.uchicago.edu/~yangy/research.html<br />
<br />
== Do ==<br />
*[[/spida|spida to do list]]<br />
*[[/p3d|p3d to do list]]<br />
== Read ==<br />
* [http://www.stat.berkeley.edu/~freedman/ Links to recent papers by David Freedman]<br />
* Links to material by Chris Wild:<br />
** http://www.stat.auckland.ac.nz/~wild/StatThink/<br />
** http://www.stat.auckland.ac.nz/showperson.php?uid=wild<br />
* [http://www3.hku.hk/statistics/staff/kaing/ Kai Ng's converse]<br />
<br />
== Notes ==<br />
* [http://www.chrp.org/love/ASACleveland2003Propensity.pdf Good presentation on use of propensity scores]<br />
* [http://sportsillustrated.cnn.com/2011/writers/scorecasting/03/24/simpson-paradox/index.html?eref=sihp Simpson's Paradox]<br />
<br />
== R notes ==<br />
* [http://strimmerlab.org/notes/fdr.html False Discovery Rates in R]<br />
=== Items to cover ===<br />
* Wrap up language:<br />
** Selection (give context): indices: index, names, logical, matrix of coordinates, 'subset'<br />
*** Example: dropping NAs from selected variables. Necessary because functions that are most sophisticated methodologically are generally least sophisticated in their interface<br />
**** contrast sophisticated program: lm with unsophisticated lowess<br />
* Using variables in data frames: <br />
** formula oriented functions: xyplot( y ~ x, data = dd )<br />
** explicit: plot( dd$x, dd$y )<br />
** with: with( dd, plot(x,y)); with(dd, xyplot( y ~ x, dd))<br />
** attach: As usual the easiest is deprecated! (why is it only easy and pleasurable things that are ever deprecated)<br />
**: <tt> attach(dd) </tt><br />
**: <tt> plot(x, y) </tt><br />
**: <tt> detach(dd) </tt><br />
*** Problem with 'attach': <br />
**** names in data frame may be masked by names in workspace<br />
**** assignments in workspace not saved in data frame<br />
* Overview of graphics<br />
** Link to http://addictedtor.free.fr/graphiques/<br />
* Programming structures<br />
* Add to graphics:<br />
*:Colours: <tt>pal(grepv('red',colors())); pals() # for all</tt><br />
*:modified tablemissing<br />
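The two problems with <tt>attach</tt> noted above can be seen in a few lines (toy data, names made up for illustration):<br />
<pre>
dd <- data.frame(x = 1:3, y = c(2, 4, 6))
x <- c(100, 200, 300)          # a workspace object with the same name as a column

attach(dd)                     # at top level R reports that 'x' is masked
mean_attached <- mean(x)       # uses the workspace x, not dd$x
y <- y * 10                    # creates a new workspace y; dd$y is unchanged
detach(dd)

mean_with <- with(dd, mean(x)) # with() looks in dd first
c(attached = mean_attached, with = mean_with, dd_y_max = max(dd$y))
</pre>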
==== debugging in R ====<br />
* http://www.stats.uwo.ca/faculty/murdoch/software/debuggingR/debug.shtml<br />
<br />
=== Links ===<br />
*[http://www.stats.ox.ac.uk/pub/MASS4/ MASS 4th ed.] [http://www.stats.ox.ac.uk/pub/MASS4/#Exercises Exercises]<br />
<br />
=== Importing files ===<br />
==== From Excel ====<br />
* Easy: save file in Excel as .csv, then read into R with read.csv<br />
* If you have a lot of files, or get the files from some other source that edits .xls or .xlsx files:<br />
* The winner: package gdata: <br />
** First install perl. <br />
** read.xls in gdata handles both .xls and .xlsx files<br />
** works on both 32-bit and 64-bit machines<br />
* package XLConnect seems to work only on xlsx files <br />
* the smaller xlsx package also works only on xlsx files<br />
* Package xlsReadWrite works on xls files but only on 32-bit systems<br />
* Use xls2csv, a Perl script to convert files to csv first.<br />
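For the 'easy' route with many files, a minimal sketch in base R follows. It assumes the sheets have already been saved as .csv; the folder name <tt>excel_export</tt> and the file <tt>sheet1.csv</tt> are hypothetical and are created here only so the example runs on its own:<br />
<pre>
csv_dir <- file.path(tempdir(), "excel_export")   # hypothetical export folder
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, score = c(3.5, 4)),
          file.path(csv_dir, "sheet1.csv"), row.names = FALSE)

# Read every .csv in the folder into a named list of data frames
files  <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
sheets <- lapply(files, read.csv, stringsAsFactors = FALSE)
names(sheets) <- basename(files)
sheets[["sheet1.csv"]]
</pre><br />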
<br />
=== Getting lines vs points for different groups in xyplot ===<br />
Ideally, type = c('l','p') would work but it doesn't seem to. So one way is to use type = 'b' with an invisible line for one group and an invisible point for the other:<br />
<br />
library(spida.beta) # also loads 'car'<br />
dd <- Prestige<br />
dd$income.pred <- predict( lm( income ~ education*type, dd), newdata = dd)<br />
td( lty = c(1,0), pch = c(32, 16), lwd = 2) <br />
# lty = 0 produces an invisible line<br />
# and pch = 32 seems to be an invisible point<br />
xyplot( income.pred + income ~ education|type, dd[order(dd$education),], type = 'b',<br />
auto.key = list( columns = 2, lines = T, points = T))<br />
<br />
Also show example using panel.superpose.2<br />
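A possible version of the panel.superpose.2 example, as a sketch only: it uses the built-in <tt>cars</tt> data instead of Prestige so that only lattice is needed, and <tt>dist.pred</tt> is a name made up here. In panel.superpose.2 the elements of 'type' are distributed over the groups, so the first series is drawn as a line and the second as points:<br />
<pre>
library(lattice)

dd <- cars[order(cars$speed), ]                  # sort so the line is drawn left to right
dd$dist.pred <- predict(lm(dist ~ speed, dd))    # fitted values to draw as a line

p <- xyplot(dist.pred + dist ~ speed, dd,
            panel = panel.superpose.2,           # one element of 'type' per group
            type = c("l", "p"),
            auto.key = list(columns = 2, lines = TRUE, points = TRUE))
print(p)
# Alternative with the default panel: add distribute.type = TRUE to the call
</pre><br />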
=== Bugs ===<br />
<pre><br />
grade <- function(x ,<br />
cos = c(-Inf,40,50,55,60,65,70,75,80,90,Inf) - 0,<br />
grade = c("F","E","D","D+","C","C+","B","B+","A","A+")) {<br />
factor(cut(x, cos, grade, right = FALSE), levels = grade)<br />
}<br />
dg$Grade <- grade( dg$Final )<br />
tab(dg, ~ Grade)<br />
# gets indexing of levels wrong<br />
# the following seems to work correctly<br />
grade <- function(x ,<br />
cos = c(-Inf,40,50,55,60,65,70,75,80,90,Inf) - 0,<br />
grade = c("F","E","D","D+","C","C+","B","B+","A","A+")) {<br />
ret <- cut(x, cos, grade, right = FALSE)<br />
factor(ret, levels = grade)<br />
}<br />
</pre> <br />
==== Getting the G matrix in nlme ====<br />
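 library(nlme)<br />
 fit <- lme( y ~ x, dd, random = ~1+x |id)<br />
 # the pdMatrix in reStruct is stored relative to the residual variance, so rescale by sigma^2<br />
 G <- pdMatrix( fit$modelStruct$reStruct)$id * fit$sigma^2 # same values as getVarCov(fit)<br />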
==== Building R packages in 2.14 ====<br />
# Install R<br />
# Install tools: http://robjhyndman.com/researchtips/building-r-packages-for-windows/<br />
<br />
# Info: http://cran.r-project.org/doc/contrib/Graves+DoraiRaj-RPackageDevelopment.pdf<br />
<br />
==== Notes ====<br />
* [http://ipsur.r-forge.r-project.org/book/ IPSUR: Introduction to Probability and Statistics using R]<br />
* [[/test of slash]]<br />
* [[/schedule]]<br />
* [http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=72 Addicted to R Graph Gallery]<br />
* [http://rwiki.sciviews.org/doku.php R Wiki]<br />
* [http://rwiki.sciviews.org/doku.php?id=guides:demos:stata_demo_with_r Stata demo in R]<br />
<br />
== Thumbnail test ==<br />
Here is a graphic file in raw form:<br />
<br />
[[File:UN-missing1.jpg]]<br />
<br />
And here is the same file with a thumbnail:<br />
[[File:UN-missing1.jpg|thumb]]<br />
<br />
== Math check ==<br />
<br />
<font color="red">'''Please click on the 'discussion' tab above'''</font><br />
<br />
:Test how math renders:<br />
<br />
:<math>\begin{align}<br />
f(x) & = (a+b)^2 \\<br />
& = a^2+2ab+b^2 \\<br />
\end{align}</math><br />
<br />
<math> x \perp y </math><br />
<br />
== glmmPQL etc ==<br />
Good discussion between Doug and Ben: https://stat.ethz.ch/pipermail/r-sig-mixed-models/2008q4/001457.html<br />
== Combining unbiased estimators ==<br />
This is an example:<br />
* bullet 1<br />
* bullet 2<br />
** again<br />
**: indented <br />
* bullet3<br />
numbered bullets:<br />
# one<br />
# two<br />
## dkjkdj<br />
##* djkdj<br />
#* djfkd<br />
=== subheading ===<br />
new stuff<br />
==== sub sub ====<br />
more stuff<br />
<br />
<br />
<br />
Let <math>{{\hat{\phi }}_{1}}</math> and <math>{{\hat{\phi }}_{2}}</math> be uncorrelated unbiased estimators of <math>\phi \in {{\mathbb{R}}^{k}}</math> with non-singular variances <math>{{V}_{1}}</math> and <math>{{V}_{2}}</math> respectively.<br />
<br />
Then the minimum variance linear unbiased estimator of<br />
<math>\phi </math> is obtained by combining <math>{{\hat{\phi }}_{1}}</math> and <math>{{\hat{\phi }}_{2}}</math> using weights that are proportional to the inverses of their variances. The result can be expressed in a variety of ways:<br />
<br />
<math>\begin{align}<br />
\hat{\phi } &= {{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}\left( V_{1}^{-1}{{{\hat{\phi }}}_{1}}+V_{2}^{-1}{{{\hat{\phi }}}_{2}} \right) \\ <br />
 & = {{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}\left( V_{1}^{-1}{{{\hat{\phi }}}_{1}}+V_{2}^{-1}{{{\hat{\phi }}}_{2}} \right)+ \left[ {{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}V_{2}^{-1}{{{\hat{\phi }}}_{1}}-{{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}V_{2}^{-1}{{{\hat{\phi }}}_{1}} \right] \\ <br />
& = {{{\hat{\phi }}}_{1}}+ {{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}V_{2}^{-1}\left( {{{\hat{\phi }}}_{2}}-{{{\hat{\phi }}}_{1}} \right) \\ <br />
& = {{{\hat{\phi }}}_{1}}+ {{\left( I+{{V}_{2}}V_{1}^{-1} \right)}^{-1}}\left( {{{\hat{\phi }}}_{2}}-{{{\hat{\phi }}}_{1}} \right) \\ <br />
& = {{\left( I+{{V}_{1}}V_{2}^{-1} \right)}^{-1}}\left( {{{\hat{\phi }}}_{1}}+{{V}_{1}}V_{2}^{-1}{{{\hat{\phi }}}_{2}} \right) <br />
\end{align}</math><br />
The proof is an application of the principle of Generalized Least-Squares. The problem can be formulated as a GLS problem by considering that:<br />
<math>\left[ \begin{matrix}<br />
{{{\hat{\phi }}}_{1}} \\<br />
{{{\hat{\phi }}}_{2}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
I \\<br />
I \\<br />
\end{matrix} \right]\phi +\left[ \begin{matrix}<br />
{{\varepsilon }_{1}} \\<br />
{{\varepsilon }_{2}} \\<br />
\end{matrix} \right]</math> with <math>\operatorname{Var}\left( \left[ \begin{matrix}<br />
{{\varepsilon }_{1}} \\<br />
{{\varepsilon }_{2}} \\<br />
\end{matrix} \right] \right)=\left[ \begin{matrix}<br />
{{V}_{1}} & 0 \\<br />
0 & {{V}_{2}} \\<br />
\end{matrix} \right]</math><br />
<br />
Applying the GLS formula yields:<br />
<math>\begin{align}<br />
\hat{\phi } & ={{\left( {{\left[ \begin{matrix}<br />
I \\<br />
I \\<br />
\end{matrix} \right]}^{\prime }}{{\left[ \begin{matrix}<br />
{{V}_{1}} & 0 \\<br />
0 & {{V}_{2}} \\<br />
\end{matrix} \right]}^{-1}}\left[ \begin{matrix}<br />
I \\<br />
I \\<br />
\end{matrix} \right] \right)}^{-1}}{{\left[ \begin{matrix}<br />
I \\<br />
I \\<br />
\end{matrix} \right]}^{\prime }}{{\left[ \begin{matrix}<br />
{{V}_{1}} & 0 \\<br />
0 & {{V}_{2}} \\<br />
\end{matrix} \right]}^{-1}}\left[ \begin{matrix}<br />
{{{\hat{\phi }}}_{1}} \\<br />
{{{\hat{\phi }}}_{2}} \\<br />
\end{matrix} \right] \\ <br />
& ={{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}\left( V_{1}^{-1}{{{\hat{\phi }}}_{1}}+V_{2}^{-1}{{{\hat{\phi }}}_{2}} \right) <br />
\end{align}</math><br />
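<br />
The equivalent expressions above and the GLS form can be checked numerically. This is a sketch in base R with made-up values for <tt>V1</tt>, <tt>V2</tt>, <tt>phi1</tt> and <tt>phi2</tt>; the variable names are illustrative only:<br />
<pre>
k    <- 2
V1   <- matrix(c(2, 0.5, 0.5, 1), k, k)     # variance of the first estimator
V2   <- matrix(c(1, -0.3, -0.3, 3), k, k)   # variance of the second estimator
phi1 <- c(1.0, 2.0)
phi2 <- c(1.4, 1.5)
I    <- diag(k)

W <- solve(solve(V1) + solve(V2))           # (V1^-1 + V2^-1)^-1
est_a <- W %*% (solve(V1) %*% phi1 + solve(V2) %*% phi2)
est_b <- phi1 + W %*% solve(V2) %*% (phi2 - phi1)
est_c <- phi1 + solve(I + V2 %*% solve(V1)) %*% (phi2 - phi1)
est_d <- solve(I + V1 %*% solve(V2)) %*% (phi1 + V1 %*% solve(V2) %*% phi2)

# GLS formulation: stack the two estimators and regress on [I; I]
X <- rbind(I, I)
V <- rbind(cbind(V1, matrix(0, k, k)), cbind(matrix(0, k, k), V2))
y <- c(phi1, phi2)
est_gls <- solve(t(X) %*% solve(V) %*% X, t(X) %*% solve(V) %*% y)

agree <- c(isTRUE(all.equal(est_a, est_b)), isTRUE(all.equal(est_a, est_c)),
           isTRUE(all.equal(est_a, est_d)), isTRUE(all.equal(est_a, est_gls)))
agree
</pre><br />
Under the GLS formulation the variance of the combined estimator is <tt>W</tt>, that is <math>{{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}</math>.<br />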
<br />
== From Nassif Ghoussoub ==<br />
Beware the “useful idiocy” of Mr. Morgan<br />
<br />
The latest commentary of Gwyn Morgan in the Globe and Mail, “If universities were in business, they’d be out of business”, http://www.theglobeandmail.com/report-on-business/commentary/gwyn-morgan/ has crossed another line. Far from being an analysis of the state of Canadian universities, his rant is personal, bitter, demeaning, and insulting to university professors across the country.<br />
<br />
Back in his Globe article of April 29, 2009, “Not all research deserves public funding“, the retired CEO of EnCana Corp. proceeded to rip into the “ivory towers of academia”, attack “esoteric research” and disparage any graduate degree not hailing from medicine or engineering. He also dismissed the 2300 scientists who joined the “Don’t Leave Canada Behind” campaign, which called on the government to include R&D, the lifeblood of the new economy, in its stimulus budget. To its credit, the government of Canada responded positively to the call of its scientists, but from his -dubiously earned- platform at the Globe and Mail, Mr. Morgan kept at it.<br />
<br />
In Saturday’s paper, Mr. Morgan employs sweeping generalizations and ghost statistics to come to the conclusion that, among other things, Canada’s university professors are “poorly prepared” for their lectures, “show up occasionally” to class, and give “poorly thought out assignments”. He claims “the reaction of universities to widespread student dissatisfaction is to blame insufficient financing, rather than their own dysfunction”. He offers that in the new age, formal lectures should be altogether ended.<br />
<br />
His commentary provides neither data about student learning, nor any direct quotations from professors or students. A 1991 study is cited, and then baptized as the truth with a simple "Nineteen years later, little has changed." The article does not attack a particular university, faculty, or teaching method, but rather an apparently archetypal "university professor".<br />
<br />
So what if he hasn't been on campus in 40 years? He knows how it is. Even then, he "stopped going to classes and dedicated his time to learning from textbooks and reviewing friends’ notes". But Mr. Morgan ignores that a professor somewhere, sometime, must have produced and dictated these textbooks and notes. He finds "no reason why all written course material can’t be delivered via the Internet", obviously not aware that since the 90’s, most course material has been made available on the Internet, thanks to dedicated professors. Morgan's suggestion that we replace large classes with "small informal discussions" sounds great, but how does the CEO propose we pay for the much larger number of professors required to do the job? He wants universities to run like businesses, but as one reader suggested: “If Universities were run like the oil and gas industry we would be back in the dark ages where the only skill required would be to count your money... at least until the oil runs out”.<br />
<br />
It is obvious that we embattled post-secondary teachers and researchers need to worry more about the “very useful idiocy” of Mr. Morgan, the permanent platform he has been provided, and the damage that drivel like this can cause to higher education and advanced research in Canada.<br />
<br />
For the Globe and Mail, Mr. Morgan has been pure comic gold for years. Writing on a variety of subjects, ranging from environmental issues and health care, to research and post-secondary education, he has been a bottomless trove of shameless misrepresentations, extreme views and sheer wackiness. But ultimately, this is not only about Gwyn Morgan nor about the Globe and Mail. It is about us.<br />
<br />
It is about Canada’s University Presidents countering his dangerous Tea Party style rhetoric on our post-secondary institutions.<br />
<br />
It is about the Deans of Canada’s Faculties facing up to Mr. Morgan when he writes: “Many qualified applicants are turned away from areas such as engineering and medicine, while universities continue to graduate thousands with knowledge that is neither useful in getting a job, nor in helping our country succeed in a competitive world.”<br />
<br />
It is about the Royal Society of Canada, and other learned societies responding to his views about “esoteric research that doesn’t have the slightest chance of yielding any real value”.<br />
<br />
It is also up to our schools of journalism, to point out to mainstream media the irresponsibility in printing shallow, empty articles full of generalizations and devoid of facts.<br />
<br />
Mr. Morgan may be one of those individuals who get so many things wrong at once that the thought of challenging them or setting the record straight is just too daunting. But it is incumbent upon us not to let his rhetoric negate the exemplary contributions of thousands of Canada’s scholars, teachers and researchers.<br />
<br />
Nassif Ghoussoub, Professor of Mathematics, The University of British Columbia<br />
<br />
== Cell Phones ==<br />
<br />
Date: Sun, 10 Oct 2010 20:02:29 -0400<br />
From: Stuart Newman <newman@NYMC.EDU><br />
Reply-To: Science for the People Discussion List<br />
<SCIENCE-FOR-THE-PEOPLE@LIST.UVM.EDU><br />
To: SCIENCE-FOR-THE-PEOPLE@LIST.UVM.EDU<br />
Subject: "Disconnect": Why cellphones may be killing us<br />
<br />
Though I haven't yet read it, this book is presumably not based on<br />
anecdotal evidence. The author, Devra Davis, is the founding director<br />
of the toxicology and environmental studies board at the U.S. National<br />
Academy of Sciences.<br />
<br />
http://tinyurl.com/2fvycxc [Salon.com]<br />
<br />
"Disconnect": Why cellphones may be killing us<br />
A new book probes the connection between mobile devices and a host<br />
of health problems -- with frightening results<br />
By Thomas Rogers<br />
<br />
== Links ==<br />
* [http://www.ted.com/talks/hans_rosling_the_good_news_of_the_decade.html?utm_source=newsletter_weekly_2010-10-12&utm_campaign=newsletter_weekly&utm_medium=email 2010 TED talk by Hans Rosling]<br />
* [http://en.wikipedia.org/wiki/Apophenia Apophenia]<br />
* [http://en.wikipedia.org/wiki/Pareidolia Pareidolia]<br />
<br />
== Notes on mediation ==<br />
The question of mediation is essentially a question about causality. Is the putative mediator, M say, caused by X and, in turn, a cause of Y? But M, in a mediational analysis, cannot have been randomized even if X has been. The question of mediation is essentially a question about causality with observational, not experimental, data.<br />
<br />
To get a perspective on the problem we need to start by considering the general problem of causality with observational data. Let Y be the response variable and let X be the 'target' variable which is seen as a possible 'cause' of Y. For X to cause Y means that the expected value of Y would change in some target experimental condition in which X was manipulated (perhaps through random allocation) while other variables were left untouched -- not necessarily unchanged.<br />
<br />
For causal inference with observational data, we are interested in what would happen under circumstances that are different from those we have actually observed. Our analysis of our observational data will yield an accurate estimate of the causal effect of X if the model for the observational data has the same coefficient for X as it would have if it were applied to data gathered under the target experimental condition. The challenge is to specify and estimate a model that is 'transferable' from the observational condition to the experimental condition. We need a set of concepts to help us critically assess whether a model is transferable. It is not sufficient to have a model that 'fits' well. It may be necessary to include potential confounding factors even if they are not significant in the prediction model for Y. And it may be necessary to exclude strong predictors that are potential mediators -- variables that must not be held constant as one examines the causal relationship between X and Y. One needs a good understanding of the causal model that is valid under experimental conditions in order to properly specify a transferable observational model.<br />
<br />
The problem can be approached in a surprisingly different way, which is the basis for propensity scores. Instead of focusing on a 'transferable' model for Y, one focuses on a model for the assignment of the target causal variable X using potential confounding variables. As in models for Y, it is important to avoid potential mediators between X and Y. However, the model for X based on confounding factors is a prediction model. Confounding variables may be included, raw or transformed, as long as they are predictive of X. It is not necessary to include variables that are not predictive of X. The criterion for developing the model is statistical fit, a criterion that -- apart from the actual selection of confounding predictors -- is empirical, i.e. it is based on the analysis of the data at hand without reference to external theory that is not verifiable with the data. The assignment model need only be valid for the observational condition. Its validity for the experimental condition is irrelevant.<br />
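<br />
A small simulation may make the idea of modeling the assignment of X concrete. This is only a sketch under strong assumptions: a single observed confounder <tt>z</tt>, a binary X, a true causal effect of 1, and inverse-probability weighting as one of several ways (matching and stratification are others) of using the estimated propensity score. All names and values are invented for illustration:<br />
<pre>
set.seed(2024)
n <- 5000
z <- rnorm(n)                            # observed confounder
x <- rbinom(n, 1, plogis(z))             # assignment of X depends on z
y <- 1 * x + 2 * z + rnorm(n)            # true causal effect of X on Y is 1

# Unadjusted comparison: confounded by z
naive <- mean(y[x == 1]) - mean(y[x == 0])

# Assignment model: a prediction model for X, valid for the observational condition
ps <- fitted(glm(x ~ z, family = binomial))
w  <- ifelse(x == 1, 1 / ps, 1 / (1 - ps))   # inverse-probability weights

ipw <- weighted.mean(y[x == 1], w[x == 1]) -
       weighted.mean(y[x == 0], w[x == 0])

c(naive = naive, ipw = ipw)
</pre><br />
In this setup the weighted estimate should land much closer to 1 than the unadjusted difference, without any model for Y having been specified.<br />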
<br />
What are some of the pros and cons of the two approaches? A good transferable model for Y may provide more precise estimates of the effect of X because more of the variability in Y is accounted for in the model. On the other hand, the validity of a causal estimate based on the propensity score approach depends on assumptions that may be much easier to sustain than those required for the approach based on modeling Y. Broadly, the propensity score approach offers lower bias but not necessarily lower variability. Note that the two approaches are not mutually exclusive. They may be better viewed as two sets of concepts that could be combined in an analysis that draws from both.<br />
<br />
How does this all relate to the analysis of mediation? The Baron and Kenny approach and its variants -- in which I include the various ways of estimating direct and indirect causal effects -- are all based on methods analogous to models for Y. As mentioned earlier, estimating the causal effect of the mediator involves causal inference with observational data -- even in the context of an experiment randomizing X. This invites the question whether propensity score methods could be used in assessing more accurately the causal effect of M. The answer lies in the relatively recent theory of principal stratification [Constantine Frangakis and Donald Rubin (2002) "Principal Stratification in Causal Inference", ''Biometrics'', '''58''', 21--29].<br />
<br />
An accessible reference for the concepts behind propensity scores is Donald Rubin (1997) "Estimating Causal Effects from Large Data Sets Using Propensity Scores," ''Annals of Internal Medicine'', '''127''', 757--763.<br />
<br />
A recent treatment of mediation using principal stratification is given in Chapter 8, "Intermediate Causal Factors," of Herbert Weisberg (2010) ''Bias and Causation: Models and Judgment for Valid Comparisons'', Wiley.<br />
<br />
With the large number of seemingly competing approaches to causal inference, students as well as experienced researchers may feel quite puzzled as to which approach they should use. The answer, possibly, is all of them. Each approach seems to shed light on some aspect of the challenge of causal inference in the absence of pristine randomization. They do not offer recipes so much as sets of concepts that can be applied to help understand research projects and analyses.<br />
<br />
== Shock of the New ==<br />
[http://en.wikipedia.org/wiki/Robert_Hughes_(critic) Robert Hughes (1980)]<br />
* [http://www.youtube.com/watch?v=GFn4UmkBcaQ Surrealism Part 1]<br />
* [http://www.youtube.com/watch?v=2uaA8CfZKRs Surrealism Part 2]<br />
* [http://www.youtube.com/watch?v=dActaAa-teM Surrealism Part 3]<br />
* [http://www.youtube.com/watch?v=ZLmCh0xw4h0 Surrealism Part 4]<br />
* [http://www.youtube.com/watch?v=s-SsWPNNBC4 Surrealism Part 5]<br />
== York AODA ==<br />
* [http://aoda.yorku.ca/cs/interactive/en/# AODA training]<br />
== Notes for NATS 1500 ==<br />
* Topics<br />
** single-sex schools?<br />
<br />
== Notes for MATH 6627 ==<br />
* [http://mediamatters.org/research/2012/10/01/a-history-of-dishonest-fox-charts/190225 Collection of misleading graphs]<br />
* [http://www.amstat.org/sections/cnsl/BooksJournals.cfm ASA consulting page]<br />
* set up student home page<br />
* first assignment. Find and explore a dataset using<br />
** Ernest Kwan's correlagram<br />
** Lattice (use panels and groups)<br />
** p3d<br />
** gapminder<br />
** should have included candisc<br />
* give a 15-minute (crucial) presentation on the data set and on the method<br />
* prepare a wiki page with links and materials <br />
* Address a few questions:<br />
** What are the strengths and weaknesses?<br />
** For what kind of dataset is it well suited and what kind not?<br />
** Can you find a dataset that illustrates well the features of this approach?<br />
** Can you compare your approach with other approaches?<br />
[[/test page|test]]<br />
<br />
* develop checklists:<br />
* initial exploration of data<br />
* missing data (explicit and implicit)<br />
* do simulation of parallel methods: check estimation of variance parameters<br />
* use nlme to estimate knot placement in gsp<br />
=== Links ===<br />
* [http://healthland.time.com/2011/09/02/mind-reading-why-bad-math-can-ruin-your-health/ Why bad math can ruin your health]<br />
* [http://www.ted.com/talks/david_mccandless_the_beauty_of_data_visualization.html TED talk: David McCandless, The Beauty of Data Visualization]<br />
* Hans Rosling<br />
* Great intro to Gapminder: http://www.mrbartonmaths.com/gapminder.htm<br />
<br />
* [http://scs.math.yorku.ca/index.php?title=Special:Contributions&dir=prev&contribs=user&target=Georges Contributions]<br />
* [http://www.scribd.com/doc/17378132/The-Fallacy-of-Personal-Validation-a-Classroom-Demonstration-of-Gullibility The Forer Effect - a classroom example]<br />
<br />
== Notes for R course ==<br />
* Start: It had to be U ... on the SVD [http://www.youtube.com/StatisticalSongs#p/u/4/JEYLfIVvR9I]<br />
* Use SPSS dates both ways to illustrate <br />
** sub using regular expressions<br />
** import: reading dates into 'Date' format using formats: Include all %a %b %Y and others?<br />
** export: writing a date into a character string using format( Date.object, "%d-%m-%Y") to create variable SPSS can read<br />
* Variable references<br />
*: deal with plethora of ways used differently in different places: <br />
*:* formula ( ~id ), good for variables in different roles ( y ~ log(x) + x2 | id)<br />
*:* interpreted in data: (id), good for single var but can use list : (list(x1,x2))<br />
*:*:Examples:<br />
*:*:* <br />
*:* fully reference: dd$id <br />
*Beware:<br />
* aggregate with a formula drops rows with NAs even though the FUN might be able to handle them<br />
* multiple barplot: http://rtricks.wordpress.com/2009/10/26/multbar-advanced-multiple-barplot-with-sem/<br />
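A sketch of two of the items above in base R: moving dates in and out of 'Date' format, and the way aggregate with a formula drops rows with NAs. The date strings and the data frame are made up for illustration:<br />
<pre>
# Dates: reorder with a regular expression, import with a format, export as text
raw <- c("10/15/2019", "01/03/2020")                             # month/day/year strings
iso <- sub("^([0-9]+)/([0-9]+)/([0-9]+)$", "\\3-\\1-\\2", raw)   # "2019-10-15" "2020-01-03"
d   <- as.Date(raw, format = "%m/%d/%Y")                         # import into 'Date' format
out <- format(d, "%d-%m-%Y")                                     # "15-10-2019" "03-01-2020"

# aggregate with a formula: the default na.action = na.omit removes any row
# with an NA before FUN is called, so na.rm = TRUE never gets a chance
dd <- data.frame(g = c("a", "a", "b"), y = c(1, NA, 3), z = c(1, 2, NA))
dropped <- aggregate(cbind(y, z) ~ g, dd, mean, na.rm = TRUE)    # group "b" disappears
kept    <- aggregate(cbind(y, z) ~ g, dd, mean, na.rm = TRUE,
                     na.action = na.pass)                        # both groups kept
</pre><br />
Formats with month or weekday names (%b, %a) depend on the locale, so numeric formats are the safer choice when exchanging dates with other software.<br />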
=== Add ===<br />
* Discussion of memory issues: what happens when you work on two computers<br />
=== Links ===<br />
* [http://www.r-bloggers.com/r-popularity-%E2%80%93-steady-growth-and-new-york-times/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+RBloggers+%28R+bloggers%29 R blog]<br />
* John Fox: ICPSR: 2010: [http://socserv.mcmaster.ca/jfox/Courses/R-course/Slides-handout.pdf Overview including slides on building R packages]<br />
* [http://www.icpsr.umich.edu/files/sumprog/biblio/2010/Fox.pdf Introduction to the R computing environment]<br />
* ICPSR 2011:<br />
** [http://socserv.mcmaster.ca/jfox/Courses/R/ICPSR/index.html Overall page]<br />
**<br />
<br />
== Notes for High School Talks ==<br />
=== Climate change ===<br />
* http://www.cbc.ca/news/technology/story/2011/09/09/pol-climate-adaptation.html<br />
<br />
== Excel techniques ==<br />
* Regular expressions and string substitution<br />
**<br />
* [http://www.contextures.com/xldataval08.html]<br />
<br />
== Ellipse Seminar ==<br />
[[/Ellipse Seminar]]<br />
== Setting up mathstat email in Thunderbird ==<br />
IMAP mailserver: mathstat.yorku.ca Port: 143 Security: STARTTLS<br />
<br />
Outgoing: mathstat.yorku.ca Port: 587 Security: none?<br />
<br />
== Statistical amusement ==<br />
* Statistical Song channel on youtube: http://www.youtube.com/user/StatisticalSongs<br />
** It had to be U ... on the SVD: http://www.youtube.com/StatisticalSongs#p/u/4/JEYLfIVvR9I<br />
** It don't mean a thing if you don't do modelling: http://www.youtube.com/user/StatisticalSongs#p/u/0/Jzm2hrEfNdY<br />
== On Careers in Statistics and Mathematics==<br />
*[https://sites.google.com/site/statsr4us/intro/2-the-joy-of-stats/statsfuture The Joy of Statistics]<br />
*[http://www.bbc.co.uk/news/business-14631547 How Mathematicians Rule the Markets: Quant Trading]<br />
<br />
== On Teaching Science ==<br />
<br />
* [http://www.nats.yorku.ca/index.shtml Natural Science Web Site]<br />
* [https://sites.google.com/site/changingourteaching/Home/tips-for-teaching-asistants Collection of links on teaching]<br />
<br />
<br />
=== A few videos ===<br />
* [http://www.youtube.com/watch?v=ccReLF6M62Y Why Teach Science by James Randi]<br />
* [http://www.youtube.com/watch?v=BlpyGhABXRA Teaching Introductory Physics]<br />
* [http://www.ted.com/talks/brian_goldman_doctors_make_mistakes_can_we_talk_about_that.html?utm_source=newsletter_weekly_2012-01-25&utm_campaign=newsletter_weekly&utm_medium=email Brian Goldman on learning from mistakes]<br />
<br />
== LOG ==<br />
* [[/a]]</div>Georgeshttp://scs.math.yorku.ca/index.php/User:GeorgesUser:Georges2019-03-16T18:56:27Z<p>Georges: </p>
<hr />
<div>__TOC__<br />
== Notes on Mixed Models ==<br />
* LR tests assume the log-likelihood is quadratic on 'some' transformation of the parameter; Wald tests assume it is quadratic on the scale of the parameter tested.<br />
* varPower(form = ~ fitted(.), fixed = 1)<br />
** The value of 'form' is on the SD scale and 'fixed' sets the power to which 'form' is raised, so that the result is proportional to the SD of the response.<br />
** the default level for 'fitted' is the finest level. <br />
** The variance function must be expressed in terms of the expected value of the re-expressed response.<br />
http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DemingMi/2007-04-27.pdf<br />
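A sketch of how the varPower specification above might be used, assuming the <tt>Orthodont</tt> data that ships with nlme; the model itself is chosen only for illustration. With <tt>fixed = 1</tt> the within-group SD is taken to be proportional to the fitted value, and no power parameter is estimated:<br />
<pre>
library(nlme)

# Baseline: constant within-group variance
fit0 <- lme(distance ~ age, data = Orthodont, random = ~ 1 | Subject)

# SD proportional to the fitted value (power fixed at 1);
# fitted(.) uses the finest grouping level by default
fit1 <- update(fit0, weights = varPower(form = ~ fitted(.), fixed = 1))

fit1$modelStruct$varStruct   # shows the variance function and its fixed power
anova(fit0, fit1)            # compare the two variance structures
</pre><br />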
== Paradoxes, Fallacies and Other Surprises ==<br />
[[Paradoxes, Fallacies and Other Surprises]]<br />
== Bayes ==<br />
* Two consecutive issues of Statistical Science in 2011 have many interesting articles that are related to Bayesian inference:<br />
** http://www.jstor.org.ezproxy.library.yorku.ca/stable/i23059127<br />
** http://www.jstor.org.ezproxy.library.yorku.ca/stable/i23059971<br />
* Experimenting with files:<br />
** jpg file<br />
*** Using the wiki link to the uploaded name: [[File:2013-12-29 18.20.34.jpg|thumb]]<br />
*** Using the wiki link to the uploaded name as media: [[Media:2013-12-29 18.20.34.jpg]]<br />
** .R file<br />
*** File wiki link [[File:Tcells.R]]<br />
*** Media wiki link [[Media:Tcells.R]]<br />
* [[Useful formulas]]<br />
* SEM with STAN<br />
** [https://groups.google.com/forum/#!topic/stan-users/dVjm8iES54k Forum discussion re slow convergence]<br />
* Interaction fallacy in a presentation:<br />
*:If you think two variables affect each other then you should include an interaction between them. (Fooled by the word 'interaction').<br />
*[http://www.stat.columbia.edu/~gelman/research/published/feller8.pdf Gelman and Robert on Bayes]<br />
*[http://en.wikipedia.org/wiki/Multivariate_adaptive_regression_splines MARS: Multivariate Adaptive Regression Splines]<br />
*[http://cran.r-project.org/web/packages/mosaic/vignettes/V2StartTeaching.pdf Teaching with R using MOSAIC by ... and D. Kaplan]<br />
*[http://vudlab.com/simpsons/ Causality: interactive app illustrating Simpson's Paradox]<br />
*[http://www.metafor-project.org/doku.php/tips:testing_factors_lincoms metafor: Tutorial using mixed models for meta analysis]<br />
* [http://andrewgelman.com/2006/06/11/survey_weights/ Andrew Gelman on survey weights with multilevel models]: he suggests unweighted modeling (or a 'variance weighted' analysis, e.g. replication weights) followed by poststratification. <br />
* [https://stat.duke.edu/courses/Fall11/sta101.02/labs/lab1.pdf Intro to R and Rstudio in an intro course at Duke]<br />
* [http://cran.r-project.org/doc/contrib/Torfs+Brauer-Short-R-Intro.pdf Intro to R in RStudio]<br />
* [http://www.crcpress.com/product/isbn/9781466515857 Multilevel Modeling Using R]<br />
* [http://cran.r-project.org/doc/contrib/Bliese_Multilevel.pdf Multilevel Modeling in R by Paul Bliese]<br />
* [http://blog.revolutionanalytics.com/2014/08/statistics-losing-ground-to-cs-losing-image-among-students.html Losing ground to CS?]<br />
* [https://www.youtube.com/watch?v=Is1Ej0Vj0Mw Interview with David Smith at UseR 2014]<br />
* [http://www.cookbook-r.com/Graphs/Colors_(ggplot2)/#mapping-variable-values-to-colors On colour]<br />
* [http://cran.r-project.org/web/packages/plot3D/vignettes/plot3D.pdf 3d plotting packages]<br />
* [http://blogs.sas.com/content/iml/2014/08/05/stiglers-seven-pillars-of-statistical-wisdom/ Stigler's Seven Pillars of Statistics]<br />
* [http://glmm.wikidot.com/faq#modelspec FAQ on GLMMs]<br />
* [http://www.math.uah.edu/stat/ Virtual Labs in Probability and Statistics]<br />
* [http://tryr.codeschool.com/ R code school with O'Reilly]<br />
* [http://cran.r-project.org/web/packages/pastecs/pastecs.pdf pastecs package for time series]<br />
* [http://www.bbc.com/news/magazine-28166019 Do doctors understand test results? By William Kremer -- about Gerd Gigerenzer]<br />
* [[/Multiple Testing -- a comment]]<br />
* [[/MOOCs for Data Science]]<br />
* [http://artssquared.wordpress.com/2012/03/21/letter-to-the-provost-and-vp-academic-professor-carl-amrhein/ Arts Squared]<br />
* [[/Using R Markdown]]<br />
* [[/Statistics Links for Courses]]<br />
* [[/Lee Lorch]]<br />
* [http://arxiv.org/pdf/1402.1894v1.pdf Baumer et al. (2014) Using R Markdown in Intro Stats]<br />
* [[/Job ratings]]<br />
* [http://www.stat.cmu.edu/~hseltman/PIER/Bayes/data/schools.R Using WinBUGS on the Netherlands data]<br />
* [[/Climate Change|/Climate Change]]<br />
* [[/Standardize or Not]]<br />
*[[/Mixed Models -- papers]]<br />
*[[/MCMCglmm]]<br />
*[[/Wiki tests]]<br />
<br />
== Cause, correlation, or ... ==<br />
*[http://p.nytimes.com/email/re?location=4z5Q7LhI+KUPcT7snurzN09anQA2MM49IhNWGFarU5GcvOIXzFz0cSuazUKJK97uTp6+uuRfTEhO+dnMZordtiC8Du17IY2zzXWzY7etdKdkq0H3sffQdSh+6YpVRsJcs8ZeVrfzShCpIEB5wXVC1g==&campaign_id=23&instance_id=45715&segment_id=62874&user_id=ecb65bd8a646c4ab6214f51d21246fce&regi_id=7495274 Instant noodles and metabolic syndrome]<br />
== Notepad ==<br />
* [http://www.r-bloggers.com/some-r-resources-for-glms/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+RBloggers+%28R+bloggers%29 R resources for GLMs]<br />
* [https://www.stat.auckland.ac.nz/~ihaka/120/Lectures/lecture17.pdf On Mosaic plots]<br />
* [[/Academic and Administrative Program Review]]<br />
* [[/Statistics programs]]<br />
* [http://www.yorku.ca/careers/gpse/2013/ Careers Expo at York] with 13 booths from UofT such as Dalla Lana but only one generic FGS booth from York.<br />
* [https://www.researchgate.net/publication/257516014_One_paradox_in_statistical_decision_making#! Schervish's p-value paradox]<br />
* [[/Data]]<br />
* [http://www.universityaffairs.ca/course-evaluations-the-good-the-bad-and-the-ugly.aspx Course evaluations: the good, the bad and the ugly]<br />
* Larry Wasserman on<br />
** [http://normaldeviate.wordpress.com/2013/04/27/the-perils-of-hypothesis-testing-again/ Perils of Hypothesis Testing]<br />
**[http://normaldeviate.wordpress.com/2013/04/13/data-science-the-end-of-statistics/ Data Science]<br />
**[http://normaldeviate.wordpress.com/2012/06/18/48/ Causality]<br />
*[[/HOA]]<br />
*[[/Big data]]<br />
*[[/Student satisfaction]]<br />
*[http://normaldeviate.wordpress.com/2012/12/21/guest-post-rob-tibshirani/ Rob Tibshirani's list of 9 great statistics papers]<br />
*[http://www.newyorker.com/online/blogs/johncassidy/2013/04/the-rogoff-and-reinhart-controversy-a-summing-up.html?mobify=0 Cassidy on the Reinhart-Rogoff controversy.]<br />
*[http://clinicaltrials.gov/ Clinical Trials registry in the US]<br />
*[http://en.wikipedia.org/wiki/Cochrane_Collaboration The Cochrane Collaboration]<br />
* 2004 ICMJE: policy of registration:<br />
<br />
Recommended sources on statistics:<br />
<br />
There are many excellent sources for information on current statistical issues (Psychonomic Society Journals):<br />
<br />
* Confidence Intervals:<br />
** Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York, NY: Routledge/Taylor & Francis Group. (see www.latrobe.edu.au/psy/research/projects/esci).<br />
** Masson, M. E. J., & Loftus, G. R. (2003). Using confidence intervals for graphically based data interpretation. Canadian Journal of Experimental Psychology/Revue Canadienne de Psychologie Expérimentale, 57, 203-220. doi:10.1037/h0087426<br />
* Effect Size Estimates:<br />
** Ellis, P. D. (2010). The essential guide to effect sizes: Statistical power, meta-analysis and the interpretation of research results. Cambridge University Press. ISBN 978-0-521-14246-5.<br />
** Fritz, C. O., Morris, P. E., & Richler, J. J. (2011). Effect size estimates: Current use, calculations and interpretation. Journal of Experimental Psychology: General, 141, 2-18.<br />
** Grissom, R. J., & Kim, J. J. (2012). Effect sizes for research: Univariate and multivariate applications (2nd ed.). New York, NY: Routledge/Taylor & Francis Group.<br />
* Meta-analysis:<br />
** Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York, NY: Routledge/Taylor & Francis Group. (see www.latrobe.edu.au/psy/research/projects/esci).<br />
** Littell, J. H., Corcoran, J., & Pillai, V. (2008). Systematic reviews and meta-analysis. New York: Oxford University Press.<br />
* Bayesian Data Analysis:<br />
** Kruschke, J. K. (2011). Doing Bayesian data analysis: A tutorial with R and BUGS. San Diego, CA: Elsevier Academic Press. (See www.indiana.edu/~kruschke/DoingBayesianDataAnalysis/)<br />
** Kruschke, J. K. (in press). Bayesian estimation supersedes the t test. Journal of Experimental Psychology: General. (For a preprint see http://www.indiana.edu/~kruschke/BEST/BEST.pdf).<br />
* Power Analysis:<br />
** Faul, F., Erdfelder, E., Lang, A., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175-191. (See http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/)<br />
=== Blogs ===<br />
* [http://jeromyanglim.blogspot.ca Jeromy Anglim]<br />
<br />
== Pythagoras Diagram ==<br />
* [http://pages.uoregon.edu/stevensj/MRA/partial.pdf Venn diagram 'fallacy' example]<br />
<br />
==Recent changes==<br />
[[/Links|Links]]<br><br />
[[/Recent Changes]] [[/Contributions]]<br><br />
[[/DO]]<br />
==Topics==<br />
* [[/R packages]]<br />
* [[/Curriculum]]<br />
* [[/HLM links]]<br />
* [[/Education links]]<br />
* Death of Evidence<br />
** [http://www.nature.com/nature/journal/v483/n7387/full/483006a.html Article in Nature: Frozen Out, March 1, 2012]<br />
** [http://www.deathofevidence.ca/ Death of Evidence website]<br />
* [[/Mixed effects for multinomial responses]]<br />
* [[/Ellipse paper comments]]<br />
* On Tobacco (from Matt)<br />
** [http://io9.com/5899612/low+income-countries-are-a-cigarettes-best-friend Low income countries and tobacco]<br />
** [http://www.tobaccoatlas.org/uploads/Images/PDFs/Tobacco_Atlas_4_entire.pdf The Tobacco Atlas]<br />
*[[/fda.R]]<br />
*[[/FSE Scholars evening]]<br />
*[[/MATH 6627 student contributions]] <br />
:[http://scs.math.yorku.ca/index.php?title=Special:UserLogin&type=signup Create new account]<br />
:[http://scs.math.yorku.ca/index.php/SCS_2011:_Statistical_Analysis_and_Programming_with_R SCS R course]<br />
:[[/R packages]]<br />
:[[/SPIDA 20102 preparation]]<br />
__TOC__<br />
== Data scraping ==<br />
* [http://www.r-bloggers.com/preparing-public-data-for-analysis-with-r/ Example from Ministry of Transportation]<br />
== RStudio: Shiny ==<br />
* [http://www.premiersoccerstats.com/wordpress/?p=1273 On Shiny]<br />
* [http://demo.rapporter.net/?sport=ATH-170&weight=0 Rapporter.net]<br />
<br />
== Notes for 6643 ==<br />
* Assignment: Can we produce an estimate of AIC based just on the Wald test?<br />
<br />
== On Pedagogy ==<br />
* [http://www.guardian.co.uk/higher-education-network/blog/2012/oct/18/social-sciences-quantative-skills-training On the importance of quantitative skills in social science]<br />
* http://www.matstat.com/teach/<br />
* [http://www.ma.utexas.edu/users/mks/statmistakes/StatisticsMistakes.html Common Misteaks in Statistics]<br />
=== Advice for students ===<br />
* [http://www.universityaffairs.ca/how-to-ask-for-a-reference-letter.aspx How to ask for a reference letter]<br />
<br />
== Questions (e.g. for survey papers) ==<br />
* Implement more diagnostics in R for lme models<br />
* Explore duality of the whole data matrix<br />
* Extend the UD representation to hyperbola, etc., and include a way of plotting osculation loci<br />
* Explore the geometry of harmonic combinations and its implications for mixed model estimates. What happens as you shift weight from G to <math>(X'X)^{-1}</math>? How does the result wander outside the convex combination? When does it happen and what does it mean?<br />
* Refine Lform and related tools<br />
<br />
== Read ==<br />
* [http://www.ams.org/notices/201001/rtx100100030p.pdf Music: Broken Symmetry, Geometry, and Complexity]<br />
* https://sites.google.com/site/r4statistics/<br />
<br />
== R course ==<br />
* [https://github.com/hadley/devtools/wiki/ Hadley Wickham Advanced R Development]<br />
* [http://courses.had.co.nz/11-devtools/ Hadley Wickham's R development courses]<br />
Day 2 - add<br />
* final recap of 'lm' interface: subset, na.action, etc., etc.<br />
* discuss formula syntax<br />
* final recap of methods for 'lm'<br />
* note easy extension to 'glm', 'lme', etc.<br />
* note that many 'new' functions do not use this interface, only more 'mature' functions<br />
** lm.formula<br />
* discuss OO showing methods and dispatching<br />
Day 3<br />
* the most useful tools:<br />
** seq<br />
** rep<br />
** replacement functions<br />
* data input<br />
* more programming<br />
** object oriented programming<br />
** using a function in C<br />
* using attributes<br />
* systematic treatment of graphics, including<br />
** par<br />
** xyplot<br />
[[/Day2 Guided Tour of Linear Models.R]]<br />
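A minimal sketch of some of the Day 3 tools listed above: seq, rep, replacement functions and method dispatch. The names <tt>second<-</tt>, <tt>area</tt> and its methods are made up for illustration.<br />
```r
# seq and rep
s1 <- seq(0, 1, by = 0.25)          # 0.00 0.25 0.50 0.75 1.00
s2 <- seq_len(4)                    # 1 2 3 4
r1 <- rep(c("a", "b"), times = 2)   # "a" "b" "a" "b"
r2 <- rep(c("a", "b"), each = 2)    # "a" "a" "b" "b"

# A built-in replacement function: names<-
x <- c(a = 1, b = 2, c = 3)
names(x)[2] <- "B"

# A user-defined replacement function (hypothetical name 'second<-');
# its last argument must be called 'value'
`second<-` <- function(x, value) {
  x[2] <- value
  x
}
second(x) <- 20                     # x is now c(a = 1, B = 20, c = 3)

# S3 object orientation: a generic dispatches on the class of its first argument
area <- function(shape, ...) UseMethod("area")   # hypothetical generic
area.circle  <- function(shape, ...) pi * shape$r^2
area.default <- function(shape, ...) stop("no 'area' method for this class")

circ <- structure(list(r = 2), class = "circle")
a <- area(circ)                     # dispatches to area.circle

# Methods available for 'lm' objects, for the recap of the 'lm' interface
lm_methods <- methods(class = "lm")
```
Calling <tt>area(1)</tt> would fall through to <tt>area.default</tt> and stop with an error, which is one way to show what dispatch does when no method matches.<br />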
<br />
== SCS Reads 2011 Links ==<br />
* [http://www.math.yorku.ca/people/georges/Files/Ellipse_Seminar/Visualizing_Regression-I-Simple-v2.pdf Notes on visualizing simple regression]<br />
* [http://www.math.yorku.ca/people/georges/Files/Ellipse_Seminar/Visualizing_Regression-II.r R script for multiple regression]<br />
== Capstone courses ==<br />
* [http://www.amstat.org/publications/jse/v9n1/spurrier.html Course at the U. of South Carolina, 2001]<br />
== Links to recent courses ==<br />
*[http://scs.math.yorku.ca/index.php/SCS_2011:_Mixed_Models_with_R SCS 2011: Mixed Models with R]<br />
*[http://statswiki.math.yorku.ca/index.php/SCS:_Mixed_Models_in_R SCS 2010]<br />
*[http://scs.math.yorku.ca/index.php/MATH_6627_2008-09_Practicum_in_Statistical_Consulting MATH 6627 2008-09]<br />
*[http://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting MATH 6627 2010-11]<br />
*[http://wiki.math.yorku.ca/index.php/SPIDA_2010:_Mixed_Models_with_R SPIDA 2010]<br />
*[http://wiki.math.yorku.ca/index.php/SPIDA_2009:_Mixed_Models_with_R SPIDA 2009]<br />
<br />
*[http://scs.math.yorku.ca/index.php/Spida Spida package]<br />
<br />
== Links to add somewhere ==<br />
* [http://www.ted.com/talks/ben_goldacre_battling_bad_science.html?utm_source=newsletter_weekly_2011-10-04&utm_campaign=newsletter_weekly&utm_medium=email Battling bad science]<br />
*D W Hosmer, S Taber and S Lemeshow () "The importance of assessing the fit of logistic regression models: a case study." ''American Journal of Public Health'', Vol. 81, Issue 12 1630-1635<br />
* [http://www.statmethods.net/ Quick R] for SPSS, SAS and Stata users.<br />
=== Graphics ===<br />
* [http://www.edwardtufte.com/tufte/ ET Modern]<br />
* Striking graphics:<br />
** [http://en.wikipedia.org/wiki/File:U.S._incarceration_rates_1925_onwards.png Incarceration rate in the United States]<br />
<br />
=== Matrices ===<br />
*[http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/intro.html Matrix Reference Manual]<br />
*[http://en.wikipedia.org/wiki/Matrix_determinant_lemma Matrix determinant lemma]<br />
*[http://en.wikipedia.org/wiki/Woodbury_matrix_identity Woodbury Matrix Identity]<br />
<br />
=== Simpson's Paradox ===<br />
In the [http://en.wikipedia.org/wiki/Canadian_federal_election,_1979 1979 Canadian federal election] <br />
an unusual event occurred in the Northwest Territories: the Liberals won the popular vote in the territory, but won [http://en.wikipedia.org/wiki/Canadian_federal_election,_1979#National_results neither seat.]<br />
<br />
=== Lee Lorch ===<br />
* [http://aer.sagepub.com/content/36/4/739.abstract Marybeth Gasman (1999)] "Scylla and Charybdis: Navigating the Waters of Academic Freedom at Fisk University During Charles S. Johnson's Administration (1946–1956)" ''American Educational Research Journal''<br />
*: A prominent sociologist and race relations activist, Charles S. Johnson dedicated his life to the advancement of Blacks. His presidency at Fisk University, a historically Black college, was the culmination of his career. During the latter part of his administration, he faced a dilemma involving an outspoken professor named Lee Lorch, who, in 1954, was accused of being a communist. Johnson and the Board of Trustees dismissed Lorch because he refused to answer a congressional committee's questions about his previous political affiliations. In 1959, the American Association of University Professors found the late President Johnson guilty of violating the principles of academic freedom. This article explores the ways in which academic freedom, civil liberties, and civil rights clashed in the Lee Lorch case. Furthermore, it examines the ways in which the setting of a historically Black college alters traditional assumptions about the application of these principles.<br />
* [http://www.nytimes.com/2010/11/22/nyregion/22stuyvesant.html Charles V. Bagli (November 21, 2010)], "A New Light on a Fight to Integrate Stuyvesant Town", ''New York Times''.<br />
<br />
== Multilevel Models ==<br />
<br />
=== Expository ===<br />
* [http://glmm.wikidot.com/faq R FAQ]<br />
*[https://perswww.kuleuven.be/~u0018341/documents/ldafda.pdf Verbeke and Molenberghs (2005): Longitudinal Data Analysis Notes]<br />
=== Missing Data ===<br />
* [http://cran.r-project.org/web/packages/Amelia/vignettes/amelia.pdf King et al. (2012) Amelia II]<br />
*<br />
<br />
=== Evaluation ===<br />
* Green, M.J., Medley, G.F., & Browne, W.J. (2009). Use of posterior predictive assessments to evaluate model fit in multilevel logistic regression. Veterinary Research, 40(4):30. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2675184/pdf/vetres-40-30.pdf<br />
<br />
=== Software for multilevel models ===<br />
{| class="wikitable"<br />
|-<br />
! Package<br />
! Description<br />
! Notes<br />
|-<br />
| R: clmm {ordinal}<br />
| Ordinal response: Fits cumulative link mixed models, i.e. cumulative link models with random effects via the Laplace approximation or the standard and the adaptive Gauss-Hermite quadrature approximation. The functionality in clm is also implemented here. Currently only a single random term is allowed in the location-part of the model.<br />
| <br />
|-<br />
| R: {lme4a}<br />
| Development version of lme4<br />
Download: svn checkout svn://svn.r-forge.r-project.org/svnroot/lme4<br />
<br />
|<br />
|-<br />
|R: {MCMCglmm}<br />
|MCMC Methods for Multi-response Generalized Linear Mixed Models<br />
| <br />
|-<br />
|R: {plm}<br />
|Econometric Analysis of Panel Survey Data<br />
|[http://cran.r-project.org/web/packages/plm/vignettes/plm.pdf Vignette]<br><br />
See p. 3 for comments on first-differencing.<br />
|-<br />
|<br />
|See Snijders and Bosker (2012) for longer list<br />
|<br />
|-<br />
|R: {lme4:nlmm}<br />
|Non-linear models with lme4<br />
|[http://lme4.r-forge.r-project.org/slides/2011-01-11-Madison/6NLMMH.pdf Presentation by Doug Bates]<br><br />
|}<br />
<br />
== Clones ==<br />
Check for changes and reconcile<br />
* Lab 1<br />
** [[SCS_2011:_Mixed_Models_with_R/Lab_1]]<br />
** [[MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Lab_1]]<br />
<br />
== Read ==<br />
On the age-period-cohort problem:<br />
* see bibliography by Yang: http://home.uchicago.edu/~yangy/research.html<br />
<br />
== Do ==<br />
*[[/spida|spida to do list]]<br />
*[[/p3d|p3d to do list]]<br />
== Read ==<br />
* [http://www.stat.berkeley.edu/~freedman/ Links to recent papers by David Freedman]<br />
* Links to material by Chris Wild:<br />
** http://www.stat.auckland.ac.nz/~wild/StatThink/<br />
** http://www.stat.auckland.ac.nz/showperson.php?uid=wild<br />
* [http://www3.hku.hk/statistics/staff/kaing/ Kai Ng's converse]<br />
<br />
== Notes ==<br />
* [http://www.chrp.org/love/ASACleveland2003Propensity.pdf Good presentation on use of propensity scores]<br />
* [http://sportsillustrated.cnn.com/2011/writers/scorecasting/03/24/simpson-paradox/index.html?eref=sihp Simpson's Paradox]<br />
<br />
== R notes ==<br />
* [http://strimmerlab.org/notes/fdr.html False Discovery Rates in R]<br />
=== Items to cover ===<br />
* Wrap up language:<br />
** Selection (give context): indices: index, names, logical, matrix of coordinates, 'subset'<br />
*** Example: dropping NAs from selected variables. Necessary because functions that are most sophisticated methodologically are generally least sophisticated in their interface<br />
**** contrast sophisticated program: lm with unsophisticated lowess<br />
* Using variables in data frames: <br />
** formula oriented functions: xyplot( y ~ x, data = dd )<br />
** explicit: plot( dd$x, dd$y )<br />
** with: with( dd, plot(x,y)); with( dd, xyplot( y ~ x ))<br />
** attach: As usual the easiest is deprecated! (Why is it only easy and pleasurable things that are ever deprecated?)<br />
**: <tt> attach(dd) </tt><br />
**: <tt> plot(x, y) </tt><br />
**: <tt> detach(dd) </tt><br />
*** Problem with 'attach': <br />
**** names in data frame may be masked by names in workspace<br />
**** assignments in workspace not saved in data frame<br />
* Overview of graphics<br />
** Link to http://addictedtor.free.fr/graphiques/<br />
* Programming structures<br />
* Add to graphics:<br />
*:Colours: <tt>pal(grepv('red',colors())); pals() # for all</tt><br />
*:modified tablemissing<br />
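The items on selection, on dropping NAs and on the pitfalls of 'attach' can be sketched together as follows. The data are made up, and the sketch is only one possible way to present these points.<br />
```r
# Hypothetical data with missing values
dd <- data.frame(x = c(1, 2, NA, 4, 5, 6),
                 y = c(2.1, NA, 6.2, 8.1, 9.9, 12.2),
                 z = c(NA, "a", "b", "a", "b", "a"))

# Selection by index, by name, by logical vector, by matrix of coordinates, and with subset()
by_index   <- dd[c(1, 4), ]
by_name    <- dd[, c("x", "y")]
by_logical <- dd[!is.na(dd$x) & dd$x > 3, ]
by_coords  <- as.matrix(by_name)[cbind(c(1, 4), c(2, 1))]  # (row 1, y) and (row 4, x)
by_subset  <- subset(dd, x > 3, select = c(x, y))          # rows where the condition is NA are dropped

# Drop rows with NAs in the selected variables only ('z' is ignored)
dd_ok <- dd[complete.cases(dd[, c("x", "y")]), ]

fit <- lm(y ~ x, data = dd)       # lm drops incomplete rows itself (see its 'na.action' argument)
sm  <- lowess(dd_ok$x, dd_ok$y)   # lowess has no 'data' or 'na.action' argument

# 'with' evaluates an expression inside the data frame
m_with <- with(dd_ok, mean(y))

# Pitfalls of 'attach'
x <- c(100, 200, 300, 400)        # workspace variable with the same name as a column
attach(dd_ok)                     # R reports that 'x' is masked by the workspace
m_masked <- mean(x)               # uses the workspace 'x', not dd_ok$x
y <- y * 2                        # creates a workspace 'y'; dd_ok$y is unchanged
detach(dd_ok)
```
After <tt>detach</tt>, <tt>dd_ok$y</tt> still holds the original values while the workspace <tt>y</tt> holds the doubled ones, which is the second problem listed for 'attach'.<br />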
==== debugging in R ====<br />
* http://www.stats.uwo.ca/faculty/murdoch/software/debuggingR/debug.shtml<br />
<br />
=== Links ===<br />
*[http://www.stats.ox.ac.uk/pub/MASS4/ MASS 4th ed.] [http://www.stats.ox.ac.uk/pub/MASS4/#Exercises Exercises]<br />
<br />
=== Importing files ===<br />
==== From Excel ====<br />
* Easy: save file in Excel as .csv, then read into R with read.csv<br />
* If you have a lot of files, or get the files from some other source that edits .xls or .xlsx files:<br />
* The winner: package gdata: <br />
** First install perl. <br />
** read.xls in gdata handles both .xls and .xlsx files<br />
** works on both 32-bit and 64-bit machines<br />
* package XLConnect seems to work only on xlsx files <br />
* the smaller xlsx package also works only on xlsx files<br />
* Package xlsReadWrite works on xls files but only on 32-bit systems<br />
* Use xls2csv, a Perl script to convert files to csv first.<br />
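A sketch of the easy route above, for one file and for a folder of files. The two .csv files are written to a temporary folder to stand in for sheets saved from Excel; their names and contents are made up.<br />
```r
# Stand-ins for two sheets saved from Excel as .csv (hypothetical contents)
dir_csv <- file.path(tempdir(), "excel_export")
dir.create(dir_csv, showWarnings = FALSE)
writeLines(c("id,score,group", "1,72.5,A", "2,,B", "3,88,A"),
           file.path(dir_csv, "sheet1.csv"))
writeLines(c("id,score,group", "4,64,B", "5,91.5,A"),
           file.path(dir_csv, "sheet2.csv"))

# One file: the empty 'score' cell is read as NA
dd <- read.csv(file.path(dir_csv, "sheet1.csv"), stringsAsFactors = FALSE)

# Many files: read each one into a list, then stack them
files  <- list.files(dir_csv, pattern = "\\.csv$", full.names = TRUE)
sheets <- lapply(files, read.csv, stringsAsFactors = FALSE)
names(sheets) <- basename(files)
all_dd <- do.call(rbind, sheets)
```
The same <tt>lapply</tt> pattern should carry over to <tt>read.xls</tt> in gdata for .xls and .xlsx files, provided perl is installed as noted above.<br />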
<br />
=== Getting lines vs points for different groups in xyplot ===<br />
Ideally, type = c('l','p') would work but it doesn't seem to. So one way is to use type = 'b' with an invisible line for one group and an invisible point for the other:<br />
<br />
library(spida.beta) # also loads 'car'<br />
dd <- Prestige<br />
dd$income.pred <- predict( lm( income ~ education*type, dd), newdata = dd)<br />
td( lty = c(1,0), pch = c(32, 16), lwd = 2) <br />
# lty = 0 produces an invisible line<br />
# and pch = 32 seems to be an invisible point<br />
xyplot( income.pred + income ~ education|type, dd[order(dd$education),], type = 'b',<br />
 auto.key = list( columns = 2, lines = TRUE, points = TRUE))<br />
<br />
Also show example using panel.superpose.2<br />
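A possible alternative, sketched with lattice alone and the built-in mtcars data instead of Prestige: the <tt>distribute.type</tt> argument of panel.superpose asks for 'type' to be distributed over groups, and panel.superpose.2 does the same by default. The variable names here are illustrative.<br />
```r
library(lattice)

dd <- mtcars
dd$am <- factor(dd$am, labels = c("automatic", "manual"))
dd$mpg.pred <- predict(lm(mpg ~ wt * am, dd), newdata = dd)
dd <- dd[order(dd$wt), ]           # sort so that lines are drawn left to right

# distribute.type = TRUE: first group (mpg.pred) drawn as lines, second (mpg) as points
p1 <- xyplot(mpg.pred + mpg ~ wt | am, dd,
             type = c("l", "p"), distribute.type = TRUE,
             auto.key = list(columns = 2, lines = TRUE, points = TRUE))

# panel.superpose.2 distributes 'type' over groups in the same way
p2 <- xyplot(mpg.pred + mpg ~ wt | am, dd,
             panel = panel.superpose.2, type = c("l", "p"),
             auto.key = list(columns = 2, lines = TRUE, points = TRUE))
```
Use <tt>print(p1)</tt> or <tt>print(p2)</tt> to draw the plots. The key still shows both a line and a point for each group, so it may need further tuning.<br />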
=== Bugs ===<br />
<pre><br />
grade <- function(x ,<br />
cos = c(-Inf,40,50,55,60,65,70,75,80,90,Inf) - 0,<br />
grade = c("F","E","D","D+","C","C+","B","B+","A","A+")) {<br />
factor(cut(x, cos, grade, right = FALSE), levels = grade)<br />
}<br />
dg$Grade <- grade( dg$Final )<br />
tab(dg, ~ Grade)<br />
# gets indexing of levels wrong<br />
# the following seems to work correctly<br />
grade <- function(x ,<br />
cos = c(-Inf,40,50,55,60,65,70,75,80,90,Inf) - 0,<br />
grade = c("F","E","D","D+","C","C+","B","B+","A","A+")) {<br />
ret <- cut(x, cos, grade, right = FALSE)<br />
factor(ret, levels = grade)<br />
}<br />
</pre> <br />
==== Getting the G matrix in nlme ====<br />
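 fit <- lme( y ~ x, dd, random = ~1+x |id)<br />
 # the matrix in modelStruct is relative to the residual variance, so rescale by sigma^2<br />
 G <- pdMatrix( fit$modelStruct$reStruct)$id * fit$sigma^2<br />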
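A check on the scaling of the extracted matrix, sketched with the Orthodont data that ships with nlme in place of 'dd'. As far as I can tell, the matrices held in the model structure are expressed relative to the residual variance, so the sketch compares them with getVarCov.<br />
```r
library(nlme)

# Orthodont stands in for the data frame 'dd' used above
fit <- lme(distance ~ age, data = Orthodont, random = ~ 1 + age | Subject)

G_rel  <- pdMatrix(fit$modelStruct$reStruct)$Subject  # as stored in the model structure
G      <- getVarCov(fit)                              # variance matrix of the random effects
sigma2 <- fit$sigma^2                                 # residual variance

# TRUE if the stored matrix, rescaled by sigma^2, reproduces getVarCov
same <- isTRUE(all.equal(as.numeric(G), as.numeric(G_rel) * sigma2))
```
If <tt>same</tt> is TRUE, either <tt>getVarCov(fit)</tt> or the rescaled <tt>pdMatrix</tt> result can be used as G.<br />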
==== Building R packages in 2.14 ====<br />
# Install R<br />
# Install tools: http://robjhyndman.com/researchtips/building-r-packages-for-windows/<br />
<br />
# Info: http://cran.r-project.org/doc/contrib/Graves+DoraiRaj-RPackageDevelopment.pdf<br />
<br />
==== Notes ====<br />
* [http://ipsur.r-forge.r-project.org/book/ IPSUR: Introduction to Probability and Statistics using R]<br />
* [[/test of slash]]<br />
* [[/schedule]]<br />
* [http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=72 Addicted to R Graph Gallery]<br />
* [http://rwiki.sciviews.org/doku.php R Wiki]<br />
* [http://rwiki.sciviews.org/doku.php?id=guides:demos:stata_demo_with_r Stata demo in R]<br />
<br />
== Thumbnail test ==<br />
Here is a graphic file in raw form:<br />
<br />
[[File:UN-missing1.jpg]]<br />
<br />
And here is the same file with a thumbnail:<br />
[[File:UN-missing1.jpg|thumb]]<br />
<br />
== Math check ==<br />
<br />
<font color="red">'''Please click on the 'discussion' tab above'''</font><br />
<br />
:Test how math renders:<br />
<br />
:<math>\begin{align}<br />
f(x) & = (a+b)^2 \\<br />
& = a^2+2ab+b^2 \\<br />
\end{align}</math><br />
<br />
<math> x \perp y </math><br />
<br />
== glmmPQL etc ==<br />
Good discussion between Doug and Ben: https://stat.ethz.ch/pipermail/r-sig-mixed-models/2008q4/001457.html<br />
== Combining unbiased estimators ==<br />
This is an example:<br />
* bullet 1<br />
* bullet 2<br />
** again<br />
**: indented <br />
* bullet3<br />
numbered bullets:<br />
# one<br />
# two<br />
## dkjkdj<br />
##* djkdj<br />
#* djfkd<br />
=== subheading ===<br />
new stuff<br />
==== sub sub ====<br />
more stuff<br />
<br />
<br />
<br />
Let <math>{{\hat{\phi }}_{1}}</math> and <math>{{\hat{\phi }}_{2}}</math> be uncorrelated unbiased estimators of <math>\phi \in {{\mathbb{R}}^{k}}</math> with non-singular variances <math>{{V}_{1}}</math> and <math>{{V}_{2}}</math> respectively.<br />
<br />
Then the minimum variance linear unbiased estimator of<br />
<math>\phi </math> is obtained by combining <math>{{\hat{\phi }}_{1}}</math> and <math>{{\hat{\phi }}_{2}}</math> using weights that are proportional to the inverses of their variances. The result can be expressed in a variety of ways:<br />
<br />
<math>\begin{align}<br />
\hat{\phi } &= {{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}\left( V_{1}^{-1}{{{\hat{\phi }}}_{1}}+V_{2}^{-1}{{{\hat{\phi }}}_{2}} \right) \\ <br />
 & = {{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}\left( V_{1}^{-1}{{{\hat{\phi }}}_{1}}+V_{2}^{-1}{{{\hat{\phi }}}_{2}} \right)+ \left[ {{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}V_{2}^{-1}{{{\hat{\phi }}}_{1}}-{{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}V_{2}^{-1}{{{\hat{\phi }}}_{1}} \right] \\ <br />
& = {{{\hat{\phi }}}_{1}}+ {{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}V_{2}^{-1}\left( {{{\hat{\phi }}}_{2}}-{{{\hat{\phi }}}_{1}} \right) \\ <br />
& = {{{\hat{\phi }}}_{1}}+ {{\left( I+{{V}_{2}}V_{1}^{-1} \right)}^{-1}}\left( {{{\hat{\phi }}}_{2}}-{{{\hat{\phi }}}_{1}} \right) \\ <br />
& = {{\left( I+{{V}_{1}}V_{2}^{-1} \right)}^{-1}}\left( {{{\hat{\phi }}}_{1}}+{{V}_{1}}V_{2}^{-1}{{{\hat{\phi }}}_{2}} \right) <br />
\end{align}</math><br />
The proof is an application of the principle of Generalized Least-Squares. The problem can be formulated as a GLS problem by considering that:<br />
<math>\left[ \begin{matrix}<br />
{{{\hat{\phi }}}_{1}} \\<br />
{{{\hat{\phi }}}_{2}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
I \\<br />
I \\<br />
\end{matrix} \right]\phi +\left[ \begin{matrix}<br />
 {{\varepsilon }_{1}} \\<br />
 {{\varepsilon }_{2}} \\<br />
\end{matrix} \right]</math> with <math>\operatorname{Var}\left( \left[ \begin{matrix}<br />
 {{\varepsilon }_{1}} \\<br />
 {{\varepsilon }_{2}} \\<br />
\end{matrix} \right] \right)=\left[ \begin{matrix}<br />
{{V}_{1}} & 0 \\<br />
0 & {{V}_{2}} \\<br />
\end{matrix} \right]</math><br />
<br />
Applying the GLS formula yields:<br />
<math>\begin{align}<br />
\hat{\phi } & ={{\left( {{\left[ \begin{matrix}<br />
I \\<br />
I \\<br />
\end{matrix} \right]}^{\prime }}{{\left[ \begin{matrix}<br />
{{V}_{1}} & 0 \\<br />
0 & {{V}_{2}} \\<br />
\end{matrix} \right]}^{-1}}\left[ \begin{matrix}<br />
I \\<br />
I \\<br />
\end{matrix} \right] \right)}^{-1}}{{\left[ \begin{matrix}<br />
I \\<br />
I \\<br />
\end{matrix} \right]}^{\prime }}{{\left[ \begin{matrix}<br />
{{V}_{1}} & 0 \\<br />
0 & {{V}_{2}} \\<br />
\end{matrix} \right]}^{-1}}\left[ \begin{matrix}<br />
{{{\hat{\phi }}}_{1}} \\<br />
{{{\hat{\phi }}}_{2}} \\<br />
\end{matrix} \right] \\ <br />
& ={{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}\left( V_{1}^{-1}{{{\hat{\phi }}}_{1}}+V_{2}^{-1}{{{\hat{\phi }}}_{2}} \right) <br />
\end{align}</math><br />
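A numerical check of the equivalent forms above, and of the GLS formulation, can be sketched in R. The variance matrices and estimates are randomly generated and purely illustrative.<br />
```r
set.seed(1)
k <- 3
I <- diag(k)

# Hypothetical non-singular variance matrices and estimates
V1 <- crossprod(matrix(rnorm(k * k), k)) + I
V2 <- crossprod(matrix(rnorm(k * k), k)) + I
phi1 <- rnorm(k)
phi2 <- rnorm(k)

# Inverse-variance weighted form
W <- solve(solve(V1) + solve(V2))
est_a <- W %*% (solve(V1) %*% phi1 + solve(V2) %*% phi2)

# phi1 plus a correction towards phi2
est_b <- phi1 + solve(I + V2 %*% solve(V1)) %*% (phi2 - phi1)

# Last form of the derivation
est_c <- solve(I + V1 %*% solve(V2)) %*% (phi1 + V1 %*% solve(V2) %*% phi2)

# GLS with the stacked design [I; I] and block-diagonal variance
X <- rbind(I, I)
V <- rbind(cbind(V1, matrix(0, k, k)),
           cbind(matrix(0, k, k), V2))
y <- c(phi1, phi2)
est_gls <- solve(t(X) %*% solve(V) %*% X, t(X) %*% solve(V) %*% y)

# Largest discrepancy between the forms; should be at rounding-error level
max_diff <- max(abs(est_a - est_b), abs(est_a - est_c), abs(est_a - est_gls))
```
By the usual GLS result, the matrix <tt>W</tt> is also the variance of the combined estimator.<br />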
<br />
== From Nassif Ghoussoub ==<br />
Beware the “useful idiocy” of Mr. Morgan<br />
<br />
The latest commentary of Gwyn Morgan in the Globe and Mail, “If universities were in business, they’d be out of business”, http://www.theglobeandmail.com/report-on-business/commentary/gwyn-morgan/ has crossed another line. Far from being an analysis of the state of Canadian universities, his rant is personal, bitter, demeaning, and insulting to university professors across the country.<br />
<br />
Back in his Globe article of April 29, 2009, “Not all research deserves public funding“, the retired CEO of EnCana Corp. proceeded to rip into the “ivory towers of academia”, attack “esoteric research” and disparage any graduate degree not hailing from medicine or engineering. He also dismissed the 2300 scientists who joined the “Don’t Leave Canada Behind” campaign, which called on the government to include R&D, the lifeblood of the new economy, in its stimulus budget. To its credit, the government of Canada responded positively to the call of its scientists, but from his -dubiously earned- platform at the Globe and Mail, Mr. Morgan kept at it.<br />
<br />
In Saturday’s paper, Mr. Morgan employs sweeping generalizations and ghost statistics to come to the conclusion that, among other things, Canada’s university professors are “poorly prepared” for their lectures, “show up occasionally” to class, and give “poorly thought out assignments”. He claims “the reaction of universities to widespread student dissatisfaction is to blame insufficient financing, rather than their own dysfunction”. He offers that in the new age, formal lectures should be altogether ended.<br />
<br />
His commentary provides neither data about student learning, nor any direct quotations from professors or students. A 1991 study is cited, and then baptized as the truth with a simple "Nineteen years later, little has changed." The article does not attack a particular university, faculty, or teaching method, but rather an apparently archetypal "university professor".<br />
<br />
So what if he hasn't been on campus in 40 years? He knows how it is. Even then, he "stopped going to classes and dedicated his time to learning from textbooks and reviewing friends’ notes". But Mr. Morgan ignores that a professor somewhere, sometime, must have produced and dictated these textbooks and notes. He finds "no reason why all written course material can’t be delivered via the Internet", obviously not aware that since the 90’s, most course material has been made available on the Internet, thanks to dedicated professors. Morgan's suggestion that we replace large classes with "small informal discussions" sounds great, but how does the CEO propose we pay for the much larger number of professors required to do the job? He wants universities to run like businesses, but as one reader suggested: “If Universities were run like the oil and gas industry we would be back in the dark ages where the only skill required would be to count your money... at least until the oil runs out”.<br />
<br />
It is obvious that we embattled post-secondary teachers and researchers need to worry more about the “very useful idiocy” of Mr. Morgan, the permanent platform he has been provided, and the damage that drivel like this can cause to higher education and advanced research in Canada.<br />
<br />
For the Globe and Mail, Mr. Morgan has been pure comic gold for years. Writing on a variety of subjects, ranging from environmental issues and health care, to research and post-secondary education, he has been a bottomless trove of shameless misrepresentations, extreme views and sheer wackiness. But ultimately, this is not only about Gwyn Morgan nor about the Globe and Mail. It is about us.<br />
<br />
It is about Canada’s University Presidents countering his dangerous Tea Party style rhetoric on our post-secondary institutions.<br />
<br />
It is about the Deans of Canada’s Faculties facing up to Mr. Morgan when he writes: “Many qualified applicants are turned away from areas such as engineering and medicine, while universities continue to graduate thousands with knowledge that is neither useful in getting a job, nor in helping our country succeed in a competitive world.”<br />
<br />
It is about the Royal Society of Canada, and other learned societies responding to his views about “esoteric research that doesn’t have the slightest chance of yielding any real value”.<br />
<br />
It is also up to our schools of journalism, to point out to mainstream media the irresponsibility in printing shallow, empty articles full of generalizations and devoid of facts.<br />
<br />
Mr. Morgan may be one of those individuals who get so many things wrong at once that the thought of challenging them or setting the record straight is just too daunting. But it is incumbent upon us not to let his rhetoric negate the exemplary contributions of thousands of Canada’s scholars, teachers and researchers.<br />
<br />
Nassif Ghoussoub, Professor of Mathematics, The University of British Columbia<br />
<br />
== Cell Phones ==<br />
<br />
Date: Sun, 10 Oct 2010 20:02:29 -0400<br />
From: Stuart Newman <newman@NYMC.EDU><br />
Reply-To: Science for the People Discussion List<br />
<SCIENCE-FOR-THE-PEOPLE@LIST.UVM.EDU><br />
To: SCIENCE-FOR-THE-PEOPLE@LIST.UVM.EDU<br />
Subject: "Disconnect": Why cellphones may be killing us<br />
<br />
Though I haven't yet read it, this book is presumably not based on<br />
anecdotal evidence. The author, Devra Davis, is the founding director<br />
of the toxicology and environmental studies board at the U.S. National<br />
Academy of Sciences.<br />
<br />
http://tinyurl.com/2fvycxc [Salon.com]<br />
<br />
"Disconnect": Why cellphones may be killing us<br />
A new book probes the connection between mobile devices and a host<br />
of health problems -- with frightening results<br />
By Thomas Rogers<br />
<br />
== Links ==<br />
* [http://www.ted.com/talks/hans_rosling_the_good_news_of_the_decade.html?utm_source=newsletter_weekly_2010-10-12&utm_campaign=newsletter_weekly&utm_medium=email 2010 TED talk by Hans Rosling]<br />
* [http://en.wikipedia.org/wiki/Apophenia Apophenia]<br />
* [http://en.wikipedia.org/wiki/Pareidolia Pareidolia]<br />
<br />
== Notes on mediation ==<br />
The question of mediation is essentially a question about causality. Is the putative mediator, M say, caused by X and, in turn, a cause of Y? But M, in a mediational analysis, cannot have been randomized even if X has been. Mediation is therefore a question of causal inference with observational, not experimental, data.<br />
<br />
To get a perspective on the problem we need to start by considering the general problem of causality with observational data. Let Y be the response variable and let X be the 'target' variable which is seen as a possible 'cause' of Y. For X to cause Y means that the expected value of Y would change in some target experimental condition in which X was manipulated (perhaps through random allocation) while other variables were left untouched -- not necessarily unchanged.<br />
<br />
For causal inference with observational data, we are interested in what would happen under circumstances that are different from those we have actually observed. Our analysis of our observational data will yield an accurate estimate of the causal effect of X if the model for the observational data has the same coefficient for X as it would have if it were applied to data gathered under the target experimental condition. The challenge is to specify and estimate a model that is 'transferable' from the observational condition to the experimental condition. We need a set of concepts to help us critically assess whether a model is transferable. It is not sufficient to have a model that 'fits' well. It may be necessary to include potential confounding factors even if they are not significant in the prediction model for Y. And it may be necessary to exclude strong predictors that are potential mediators -- variables that must not be held constant as one examines the causal relationship between X and Y. One needs a good understanding of the causal model that is valid under experimental conditions in order to properly specify a transferable observational model.<br />
<br />
The problem can be approached in a surprisingly different way, which is the basis for propensity scores. Instead of focusing on a 'transferable' model for Y, one focuses on a model for the assignment of the target causal variable X using potential confounding variables. As in models for Y, it is important to avoid potential mediators between X and Y. However, the model for X based on confounding factors is a prediction model. Confounding variables may be included, raw or transformed, as long as they are predictive of X. It is not necessary to include variables that are not predictive of X. The criterion for developing the model is statistical fit, a criterion that -- apart from the actual selection of confounding predictors -- is empirical, i.e. it is based on the analysis of the data at hand without reference to external theory that is not verifiable with the data. The assignment model need only be valid for the observational condition. Its validity for the experimental condition is irrelevant.<br />
<br />
What are some of the pros and cons of the two approaches? A good transferable model for Y may provide more precise estimates of the effect of X because more of the variability in Y is accounted for in the model. On the other hand, the validity of a causal estimate based on the propensity score approach depends on assumptions that may be much easier to sustain than those required for the approach based on modeling Y. Broadly, the propensity score approach offers lower bias but not necessarily lower variability. Note that the two approaches are not mutually exclusive. They may be better viewed as two sets of concepts that could be combined in an analysis that draws from both.<br />
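The two approaches can be compared on simulated data. This is only a sketch under strong assumptions: a single observed confounder C, a true effect of X equal to 1, and inverse-probability weighting as one of several ways of using a propensity score (the notes above do not single out a method).<br />
```r
set.seed(2010)
n <- 5000
C <- rnorm(n)                     # confounder
X <- rbinom(n, 1, plogis(C))      # assignment of X depends on C: not randomized
Y <- 1 * X + 2 * C + rnorm(n)     # true causal effect of X is 1

# Naive comparison that ignores the confounder
naive <- unname(coef(lm(Y ~ X))["X"])

# Approach 1: a model for Y that includes the confounder
reg <- unname(coef(lm(Y ~ X + C))["X"])

# Approach 2: a model for the assignment of X (the propensity score),
# followed by inverse-probability weighting
ps  <- fitted(glm(X ~ C, family = binomial))
w   <- ifelse(X == 1, 1 / ps, 1 / (1 - ps))
ipw <- unname(coef(lm(Y ~ X, weights = w))["X"])

estimates <- c(naive = naive, outcome.model = reg, propensity = ipw)
```
In this setup the naive estimate is badly biased upward, while both adjusted estimates should be near 1, the outcome model typically with less variability because it accounts for more of the variation in Y.<br />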
<br />
How does this all relate to the analysis of mediation? The Baron and Kenny approach and its variants -- in which I include the various ways of estimating direct and indirect causal effects -- are all based on methods analogous to models for Y. As mentioned earlier, estimating the causal effect of the mediator involves causal inference with observational data -- even in the context of an experiment randomizing X. This invites the question whether propensity score methods could be used in assessing more accurately the causal effect of M. The answer lies in the relatively recent theory of principal stratification [Constantine Frangakis and Donald Rubin (2002) "Principal Stratification in Causal Inference", ''Biometrics'', '''58''', 21--29].<br />
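A small simulation can illustrate why randomizing X does not protect the estimated effect of M. The setup is hypothetical: U is an unobserved common cause of M and Y, and M has no effect on Y at all.<br />
```r
set.seed(2002)
n <- 5000
X <- rbinom(n, 1, 0.5)             # randomized
U <- rnorm(n)                      # unobserved common cause of M and Y
M <- 0.5 * X + U + rnorm(n)        # putative mediator, affected by X
Y <- 1 * X + 0 * M + U + rnorm(n)  # M has no effect on Y; the effect of X is entirely direct

total <- unname(coef(lm(Y ~ X))["X"])   # total effect of X: unbiased because X is randomized
fit_m <- coef(lm(Y ~ X + M))            # regression of Y on X and M
b_M   <- unname(fit_m["M"])             # apparent effect of the mediator
b_X   <- unname(fit_m["X"])             # apparent direct effect of X
```
With these values the coefficient of M works out to about 0.5 and the apparent direct effect of X to about 0.75, although the true values are 0 and 1: the regression reports mediation where there is none.<br />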
<br />
An accessible reference for the concepts behind propensity scores is Donald Rubin (1997) "Estimating Causal Effects from Large Data Sets Using Propensity Scores," ''Annals of Internal Medicine'', '''127''', 757--763.<br />
<br />
A recent treatment of mediation using principal stratification is given in Chapter 8, "Intermediate Causal Factors," of Herbert Weisberg (2010) ''Bias and Causation: Models and Judgment for Valid Comparisons'', Wiley.<br />
<br />
With the large number of seemingly competing approaches to causal inference, students as well as experienced researchers may feel quite puzzled as to which approach they should use. The answer, possibly, is all of them. Each approach seems to shed light on some aspect of the challenge of causal inference in the absence of pristine randomization. They do not offer recipes so much as sets of concepts that can be applied to help understand research projects and analyses.<br />
<br />
== Shock of the New ==<br />
[http://en.wikipedia.org/wiki/Robert_Hughes_(critic) Robert Hughes (1980)]<br />
* [http://www.youtube.com/watch?v=GFn4UmkBcaQ Surrealism Part 1]<br />
* [http://www.youtube.com/watch?v=2uaA8CfZKRs Surrealism Part 2]<br />
* [http://www.youtube.com/watch?v=dActaAa-teM Surrealism Part 3]<br />
* [http://www.youtube.com/watch?v=ZLmCh0xw4h0 Surrealism Part 4]<br />
* [http://www.youtube.com/watch?v=s-SsWPNNBC4 Surrealism Part 5]<br />
== York AODA ==<br />
* [http://aoda.yorku.ca/cs/interactive/en/# AODA training]<br />
== Notes for NATS 1500 ==<br />
* Topics<br />
** single-sex schools?<br />
<br />
== Notes for MATH 6627 ==<br />
* [http://mediamatters.org/research/2012/10/01/a-history-of-dishonest-fox-charts/190225 Collection of misleading graphs]<br />
* [http://www.amstat.org/sections/cnsl/BooksJournals.cfm ASA consulting page]<br />
* set up student home page<br />
* first assignment. Find and explore a dataset using<br />
** Ernest Kwan's correlagram<br />
** Lattice (use panels and groups)<br />
** p3d<br />
** gapminder<br />
** should have included candisc<br />
* give a 15-minute (crucial) presentation on the data set and on the method<br />
* prepare a wiki page with links and materials <br />
* Address a few questions:<br />
** What are its strengths and weaknesses?<br />
** For what kind of dataset is it well suited and what kind not?<br />
** Can you find a dataset that illustrates well the features of this approach?<br />
** Can you compare your approach with other approaches?<br />
[[/test page|test]]<br />
<br />
* develop checklists:<br />
* initial exploration of data<br />
* missing data (explicit and implicit)<br />
* do simulation of parallel methods: check estimation of variance parameters<br />
* use nlme to estimate knot placement in gsp<br />
=== Links ===<br />
* [http://healthland.time.com/2011/09/02/mind-reading-why-bad-math-can-ruin-your-health/ Why bad math can ruin your health]<br />
* [http://www.ted.com/talks/david_mccandless_the_beauty_of_data_visualization.html TED talk: David McCandless, The Beauty of Data Visualization]<br />
* Hans Rosling<br />
* Great intro to Gapminder: http://www.mrbartonmaths.com/gapminder.htm<br />
<br />
* [http://scs.math.yorku.ca/index.php?title=Special:Contributions&dir=prev&contribs=user&target=Georges Contributions]<br />
* [http://www.scribd.com/doc/17378132/The-Fallacy-of-Personal-Validation-a-Classroom-Demonstration-of-Gullibility The Forer Effect - a classroom example]<br />
<br />
== Notes for R course ==<br />
* Start: It had to be U ... on the SVD [http://www.youtube.com/StatisticalSongs#p/u/4/JEYLfIVvR9I]<br />
* Use SPSS dates both ways to illustrate <br />
** sub using regular expressions<br />
** import: reading dates into 'Date' format using formats: Include all %a %b %Y and others?<br />
** export: writing a date into a character string using format( Date.object, "%d-%m-%Y") to create variable SPSS can read<br />
* Variable references<br />
*: deal with plethora of ways used differently in different places: <br />
*:* formula ( ~id ), good for variables in different roles ( y ~ log(x) + x2 | id)<br />
*:* interpreted in data: (id), good for single var but can use list : (list(x1,x2))<br />
*:*:Examples:<br />
*:*:* <br />
*:* fully reference: dd$id <br />
*Beware:<br />
* aggregate with a formula drops rows with NAs even though the FUN might be able to handle them<br />
* multiple barplot: http://rtricks.wordpress.com/2009/10/26/multbar-advanced-multiple-barplot-with-sem/<br />
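<br />
A minimal sketch of two of the notes above: moving dates between character strings and the 'Date' class, and the way aggregate() with a formula drops rows with NAs before FUN sees them. The date strings and the small data frame are made up. Numeric formats are used because %a and %b depend on the locale.<br />

```r
# Dates: character -> Date -> character
raw <- c("02.09.2011", "25.12.2012")               # day.month.year strings

# (1) rearrange the pieces with a regular expression and sub()
iso <- sub("^([0-9]{2})\\.([0-9]{2})\\.([0-9]{4})$", "\\3-\\2-\\1", raw)

# (2) read directly into 'Date' using a format string
d <- as.Date(raw, format = "%d.%m.%Y")
all(d == as.Date(iso))                             # both routes agree

# Export: a character variable in day-month-year order
out <- format(d, "%d-%m-%Y")

# aggregate() with a formula uses na.action = na.omit by default,
# so FUN never sees the NA unless na.action = na.pass is given
dd <- data.frame(g = c("a", "a", "b", "b"), y = c(1, NA, 3, 5))
n_na_default <- aggregate(y ~ g, dd, function(v) sum(is.na(v)))
n_na_pass    <- aggregate(y ~ g, dd, function(v) sum(is.na(v)),
                          na.action = na.pass)
n_na_default
n_na_pass
```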
=== Add ===<br />
* Discussion of memory issues: what happens when you work on two computers<br />
=== Links ===<br />
* [http://www.r-bloggers.com/r-popularity-%E2%80%93-steady-growth-and-new-york-times/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+RBloggers+%28R+bloggers%29 R blog]<br />
* John Fox: ICPSR: 2010: [http://socserv.mcmaster.ca/jfox/Courses/R-course/Slides-handout.pdf Overview including slides on building R packages]<br />
* [http://www.icpsr.umich.edu/files/sumprog/biblio/2010/Fox.pdf Introduction to the R computing environment]<br />
* ICPSR 2011:<br />
** [http://socserv.mcmaster.ca/jfox/Courses/R/ICPSR/index.html Overall page]<br />
**<br />
<br />
== Notes for High School Talks ==<br />
=== Climate change ===<br />
* http://www.cbc.ca/news/technology/story/2011/09/09/pol-climate-adaptation.html<br />
<br />
== Excel techniques ==<br />
* Regular expressions and string substitution<br />
**<br />
* [http://www.contextures.com/xldataval08.html]<br />
<br />
== Ellipse Seminar ==<br />
[[/Ellipse Seminar]]<br />
== Setting up mathstat email in Thunderbird ==<br />
IMAP mailserver: mathstat.yorku.ca Port: 143 Security: STARTTLS<br />
<br />
Outgoing: mathstat.yorku.ca Port: 587 Security: none?<br />
<br />
== Statistical amusement ==<br />
* Statistical Song channel on youtube: http://www.youtube.com/user/StatisticalSongs<br />
** It had to be U ... on the SVD: http://www.youtube.com/StatisticalSongs#p/u/4/JEYLfIVvR9I<br />
** It don't mean a thing if you don't do modelling: http://www.youtube.com/user/StatisticalSongs#p/u/0/Jzm2hrEfNdY<br />
== On Careers in Statistics and Mathematics==<br />
*[https://sites.google.com/site/statsr4us/intro/2-the-joy-of-stats/statsfuture The Joy of Statistics]<br />
*[http://www.bbc.co.uk/news/business-14631547 How Mathematicians Rule the Markets: Quant Trading]<br />
<br />
== On Teaching Science ==<br />
<br />
* [http://www.nats.yorku.ca/index.shtml Natural Science Web Site]<br />
* [https://sites.google.com/site/changingourteaching/Home/tips-for-teaching-asistants Collection of links on teaching]<br />
<br />
<br />
=== A few videos ===<br />
* [http://www.youtube.com/watch?v=ccReLF6M62Y Why Teach Science by James Randi]<br />
* [http://www.youtube.com/watch?v=BlpyGhABXRA Teaching Introductory Physics]<br />
* [http://www.ted.com/talks/brian_goldman_doctors_make_mistakes_can_we_talk_about_that.html?utm_source=newsletter_weekly_2012-01-25&utm_campaign=newsletter_weekly&utm_medium=email Brian Goldman on learning from mistakes]<br />
<br />
== LOG ==<br />
* [[/a]]</div>Georgeshttp://scs.math.yorku.ca/index.php/User:GeorgesUser:Georges2019-03-13T01:41:16Z<p>Georges: </p>
<hr />
<div>__TOC__<br />
== Notes on Mixed Models ==<br />
* varPower(form = ~ fitted(.), fixed = 1)<br />
** The value of 'form' is on the SD scale and 'fixed' by default provides a power to raise 'form' to yield a value proportional to the SD of the response.<br />
** the default level for 'fitted' is the finest level. <br />
** The variance function must be expressed in terms of the expected value of the re-expressed response.<br />
http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DemingMi/2007-04-27.pdf<br />
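<br />
A small sketch of what a power of 1 means, following the pattern of the example on the nlme help page for varWeights. The covariate 'mu' is a hypothetical stand-in for fitted values. In a model the same structure would be requested with something like weights = varPower(form = ~ fitted(.), fixed = 1).<br />

```r
library(nlme)

# Hypothetical covariate values standing in for fitted values
v <- data.frame(mu = c(1, 2, 4))

vf <- varPower(form = ~ mu)       # variance function with covariate 'mu'
vf <- Initialize(vf, data = v)    # attach the covariate values
coef(vf) <- 1                     # power 1: SD proportional to |mu|

# varWeights() returns inverse SDs up to a constant: 1 / |mu|^power
w <- as.numeric(varWeights(vf))
cbind(mu = v$mu, weight = w, relative_SD = 1 / w)
```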
== Paradoxes, Fallacies and Other Surprises ==<br />
[[Paradoxes, Fallacies and Other Surprises]]<br />
== Bayes ==<br />
* Two consecutive issues of Statistical Science in 2011 have many interesting articles related to Bayesian inference:<br />
** http://www.jstor.org.ezproxy.library.yorku.ca/stable/i23059127<br />
** http://www.jstor.org.ezproxy.library.yorku.ca/stable/i23059971<br />
* Experimenting with files:<br />
** jpg file<br />
*** Using the wiki link to the uploaded name: [[File:2013-12-29 18.20.34.jpg|thumb]]<br />
*** Using the wiki link to the uploaded name as media: [[Media:2013-12-29 18.20.34.jpg]]<br />
** .R file<br />
*** File wiki link [[File:Tcells.R]]<br />
*** Media wiki link [[Media:Tcells.R]]<br />
* [[Useful formulas]]<br />
* SEM with STAN<br />
** [https://groups.google.com/forum/#!topic/stan-users/dVjm8iES54k Forum discussion re slow convergence]<br />
* Interaction fallacy in a presentation:<br />
*:If you think two variables affect each other then you should include an interaction between them. (Fooled by the word 'interaction').<br />
*[http://www.stat.columbia.edu/~gelman/research/published/feller8.pdf Gelman and Robert on Bayes]<br />
*[http://en.wikipedia.org/wiki/Multivariate_adaptive_regression_splines MARS: Multivariate Adaptive Regression Splines]<br />
*[http://cran.r-project.org/web/packages/mosaic/vignettes/V2StartTeaching.pdf Teaching with R using MOSAIC by ... and D. Kaplan]<br />
*[http://vudlab.com/simpsons/ Causality: interactive app illustrating Simpson's Paradox]<br />
*[http://www.metafor-project.org/doku.php/tips:testing_factors_lincoms metafor: Tutorial using mixed models for meta analysis]<br />
* [http://andrewgelman.com/2006/06/11/survey_weights/ Andrew Gelman on survey weights with multilevel models]: he suggests unweighted modeling (or a 'variance weighted' analysis, e.g. replication weights) followed by poststratification. <br />
* [https://stat.duke.edu/courses/Fall11/sta101.02/labs/lab1.pdf Intro to R and Rstudio in an intro course at Duke]<br />
* [http://cran.r-project.org/doc/contrib/Torfs+Brauer-Short-R-Intro.pdf Intro to R in RStudio]<br />
* [http://www.crcpress.com/product/isbn/9781466515857 Multilevel Modeling Using R]<br />
* [http://cran.r-project.org/doc/contrib/Bliese_Multilevel.pdf Multilevel Modeling in R by Paul Bliese]<br />
* [http://blog.revolutionanalytics.com/2014/08/statistics-losing-ground-to-cs-losing-image-among-students.html Losing ground to CS?]<br />
* [https://www.youtube.com/watch?v=Is1Ej0Vj0Mw Interview with David Smith at UseR 2014]<br />
* [http://www.cookbook-r.com/Graphs/Colors_(ggplot2)/#mapping-variable-values-to-colors On colour]<br />
* [http://cran.r-project.org/web/packages/plot3D/vignettes/plot3D.pdf 3d plotting packages]<br />
* [http://blogs.sas.com/content/iml/2014/08/05/stiglers-seven-pillars-of-statistical-wisdom/ Stigler's Seven Pillars of Statistics]<br />
* [http://glmm.wikidot.com/faq#modelspec FAQ on GLMMs]<br />
* [http://www.math.uah.edu/stat/ Virtual Labs in Probability and Statistics]<br />
* [http://tryr.codeschool.com/ R code school with O'Reilly]<br />
* [http://cran.r-project.org/web/packages/pastecs/pastecs.pdf pastecs package for time series]<br />
* [http://www.bbc.com/news/magazine-28166019 Do doctors understand test results? By William Kremer -- about Gerd Gigerenzer]<br />
* [[/Multiple Testing -- a comment]]<br />
* [[/MOOCs for Data Science]]<br />
* [http://artssquared.wordpress.com/2012/03/21/letter-to-the-provost-and-vp-academic-professor-carl-amrhein/ Arts Squared]<br />
* [[/Using R Markdown]]<br />
* [[/Statistics Links for Courses]]<br />
* [[/Lee Lorch]]<br />
* [http://arxiv.org/pdf/1402.1894v1.pdf Baumer et al. (2014) Using R Markdown in Intro Stats]<br />
* [[/Job ratings]]<br />
* [http://www.stat.cmu.edu/~hseltman/PIER/Bayes/data/schools.R Using WinBUGS on the Netherlands data]<br />
* [[/Climate Change|/Climate Change]]<br />
* [[/Standardize or Not]]<br />
*[[/Mixed Models -- papers]]<br />
*[[/MCMCglmm]]<br />
*[[/Wiki tests]]<br />
<br />
== Cause, correlation, or ... ==<br />
*[http://p.nytimes.com/email/re?location=4z5Q7LhI+KUPcT7snurzN09anQA2MM49IhNWGFarU5GcvOIXzFz0cSuazUKJK97uTp6+uuRfTEhO+dnMZordtiC8Du17IY2zzXWzY7etdKdkq0H3sffQdSh+6YpVRsJcs8ZeVrfzShCpIEB5wXVC1g==&campaign_id=23&instance_id=45715&segment_id=62874&user_id=ecb65bd8a646c4ab6214f51d21246fce&regi_id=7495274 Instant noodles and metabolic syndrome]<br />
== Notepad ==<br />
* [http://www.r-bloggers.com/some-r-resources-for-glms/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+RBloggers+%28R+bloggers%29 R resources for GLMs]<br />
* [https://www.stat.auckland.ac.nz/~ihaka/120/Lectures/lecture17.pdf On Mosaic plots]<br />
* [[/Academic and Administrative Program Review]]<br />
* [[/Statistics programs]]<br />
* [http://www.yorku.ca/careers/gpse/2013/ Careers Expo at York] with 13 booths from UofT such as Dalla Lana but only one generic FGS booth from York.<br />
* [https://www.researchgate.net/publication/257516014_One_paradox_in_statistical_decision_making#! Schervish's p-value paradox]<br />
* [[/Data]]<br />
* [http://www.universityaffairs.ca/course-evaluations-the-good-the-bad-and-the-ugly.aspx Course evaluations: the good, the bad and the ugly]<br />
* Larry Wasserman on<br />
** [http://normaldeviate.wordpress.com/2013/04/27/the-perils-of-hypothesis-testing-again/ Perils of Hypothesis Testing]<br />
**[http://normaldeviate.wordpress.com/2013/04/13/data-science-the-end-of-statistics/ Data Science]<br />
**[http://normaldeviate.wordpress.com/2012/06/18/48/ Causality]<br />
*[[/HOA]]<br />
*[[/Big data]]<br />
*[[/Student satisfaction]]<br />
*[http://normaldeviate.wordpress.com/2012/12/21/guest-post-rob-tibshirani/ Rob Tibshirani's list of 9 great statistics papers]<br />
*[http://www.newyorker.com/online/blogs/johncassidy/2013/04/the-rogoff-and-reinhart-controversy-a-summing-up.html?mobify=0 Cassidy on the Reinhart-Rogoff controversy.]<br />
*[http://clinicaltrials.gov/ Clinical Trials registry in the US]<br />
*[http://en.wikipedia.org/wiki/Cochrane_Collaboration The Cochrane Collaboration]<br />
* 2004 ICMJE: policy of registration:<br />
<br />
__TOC__<br />
Recommended sources on statistics:<br />
<br />
There are many excellent sources for information on current statistical issues (Psychonomic Society Journals):<br />
<br />
* Confidence Intervals:<br />
** Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York, NY: Routledge/Taylor & Francis Group. (see www.latrobe.edu.au/psy/research/projects/esci).<br />
** Masson, M. E. J., & Loftus, G. R. (2003). Using confidence intervals for graphically based data interpretation. Canadian Journal of Experimental Psychology/Revue Canadienne de Psychologie Expérimentale, 57, 203-220. doi:10.1037/h0087426<br />
* Effect Size Estimates:<br />
** Ellis, P. D. (2010). The essential guide to effect sizes: Statistical power, meta-analysis and the interpretation of research results. Cambridge University Press. ISBN 978-0-521-14246-5.<br />
** Fritz, C. O., Morris, P. E., & Richler, J. J. (2011). Effect size estimates: Current use, calculations and interpretation. Journal of Experimental Psychology: General, 141, 2-18.<br />
** Grissom, R. J., & Kim, J. J. (2012). Effect sizes for research: Univariate and multivariate applications (2nd ed.). New York, NY: Routledge/Taylor & Francis Group.<br />
* Meta-analysis:<br />
** Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York, NY US: Routledge/Taylor & Francis Group. (see www.latrobe.edu.au/psy/research/projects/esci ).<br />
** Littell, J. H., Corcoran, J., & Pillai, V. (2008). Systematic reviews and meta-analysis. New York: Oxford University Press.<br />
* Bayesian Data Analysis:<br />
** Kruschke, J. K. (2011). Doing Bayesian data analysis: A tutorial with R and BUGS. San Diego, CA: Elsevier Academic Press. (See www.indiana.edu/~kruschke/DoingBayesianDataAnalysis/)<br />
** Kruschke, J. K. (in press). Bayesian estimation supersedes the t test. Journal of Experimental Psychology: General. (For a preprint see http://www.indiana.edu/~kruschke/BEST/BEST.pdf).<br />
* Power Analysis:<br />
** Faul, F., Erdfelder, E., Lang, A., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175-191. (See http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/)<br />
=== Blogs ===<br />
* [http://jeromyanglim.blogspot.ca Jeromy Anglim]<br />
<br />
== Pythagoras Diagram ==<br />
* [http://pages.uoregon.edu/stevensj/MRA/partial.pdf Venn diagram 'fallacy' example]<br />
<br />
==Recent changes==<br />
[[/Links|Links]]<br><br />
[[/Recent Changes]] [[/Contributions]]<br><br />
[[/DO]]<br />
==Topics==<br />
* [[/R packages]]<br />
* [[/Curriculum]]<br />
* [[/HLM links]]<br />
* [[/Education links]]<br />
* Death of Evidence<br />
** [http://www.nature.com/nature/journal/v483/n7387/full/483006a.html Article in Nature: Frozen Out, March 1, 2012]<br />
** [http://www.deathofevidence.ca/ Death of Evidence website]<br />
* [[/Mixed effects for multinomial responses]]<br />
* [[/Ellipse paper comments]]<br />
* On Tobacco (from Matt)<br />
** [http://io9.com/5899612/low+income-countries-are-a-cigarettes-best-friend Low income countries and tobacco]<br />
** [http://www.tobaccoatlas.org/uploads/Images/PDFs/Tobacco_Atlas_4_entire.pdf The Tobacco Atlas]<br />
*[[/fda.R]]<br />
*[[/FSE Scholars evening]]<br />
*[[/MATH 6627 student contributions]] <br />
:[http://scs.math.yorku.ca/index.php?title=Special:UserLogin&type=signup Create new account]<br />
:[http://scs.math.yorku.ca/index.php/SCS_2011:_Statistical_Analysis_and_Programming_with_R SCS R course]<br />
:[[/R packages]]<br />
:[[/SPIDA 20102 preparation]]<br />
__TOC__<br />
== Data scraping ==<br />
* [http://www.r-bloggers.com/preparing-public-data-for-analysis-with-r/ Example from Ministry of Transportation]<br />
== RStudio: Shiny ==<br />
* [http://www.premiersoccerstats.com/wordpress/?p=1273 On Shiny]<br />
* [http://demo.rapporter.net/?sport=ATH-170&weight=0 Rapporter.net]<br />
<br />
== Notes for 6643 ==<br />
* Assignment: Can we produce an estimate of AIC based just on the Wald test?<br />
<br />
== On Pedagogy ==<br />
* [http://www.guardian.co.uk/higher-education-network/blog/2012/oct/18/social-sciences-quantative-skills-training On the importance of quantitative skills in social science]<br />
* http://www.matstat.com/teach/<br />
* [http://www.ma.utexas.edu/users/mks/statmistakes/StatisticsMistakes.html Common Misteaks in Statistics]<br />
=== Advice for students ===<br />
* [http://www.universityaffairs.ca/how-to-ask-for-a-reference-letter.aspx How to ask for a reference letter]<br />
<br />
== Questions (e.g. for survey papers) ==<br />
* Implement more diagnostics in R for lme models<br />
* Explore duality of the whole data matrix<br />
* Extend the UD representation to hyperbola, etc., and include a way of plotting osculation loci<br />
* Explore the geometry of harmonic combinations and its implications for mixed model estimates. What happens as you shift weight from G to <math>(X'X)^{-1}</math>? How does the result wander outside the convex combination? When does it happen and what does it mean?<br />
* Refine Lform and related tools<br />
<br />
== Read ==<br />
* [http://www.ams.org/notices/201001/rtx100100030p.pdf Music: Broken Symmetry, Geometry, and Complexity]<br />
* https://sites.google.com/site/r4statistics/<br />
<br />
== R course ==<br />
* [https://github.com/hadley/devtools/wiki/ Hadley Wickham Advanced R Development]<br />
* [http://courses.had.co.nz/11-devtools/ Hadley Wickham's R development courses]<br />
Day 2 - add<br />
* final recap of 'lm' interface: subset, na.action, etc., etc.<br />
* discuss formula syntax<br />
* final recap of methods for 'lm'<br />
* note easy extension to 'glm', 'lme', etc.<br />
* note that many 'new' functions do not use this interface, only more 'mature' functions<br />
** lm.formula<br />
* discuss OO showing methods and dispatching<br />
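<br />
A possible script for the recap above, using the built-in mtcars data as a stand-in. The generic 'describe' and its methods are hypothetical names invented here to show S3 dispatch.<br />

```r
# The 'lm' interface: formula, data, subset, na.action
fit <- lm(mpg ~ wt + hp, data = mtcars, subset = cyl != 8,
          na.action = na.exclude)

# The same interface carries over to glm()
fit_glm <- glm(am ~ wt, data = mtcars, family = binomial)

# Methods available for objects of class 'lm'
methods(class = "lm")

# S3 dispatch: a toy generic with a default method and an 'lm' method
describe <- function(obj, ...) UseMethod("describe")
describe.default <- function(obj, ...) "some object"
describe.lm <- function(obj, ...) {
  paste("linear model with", length(coef(obj)), "coefficients")
}

describe(1:3)       # no method for integers, so describe.default is used
describe(fit)       # class "lm", so describe.lm is used
describe(fit_glm)   # class c("glm", "lm"); no describe.glm, so describe.lm
```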
Day 3<br />
* the most useful tools:<br />
** seq<br />
** rep<br />
** replacement functions<br />
* data input<br />
* more programming<br />
** object oriented programming<br />
** using a function in C<br />
* using attributes<br />
* systematic treatment of graphics, including<br />
** par<br />
** xyplot<br />
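<br />
A short sketch of some of the Day 3 tools listed above. The replacement function 'second<-' is a made-up example.<br />

```r
# seq() and rep()
seq(0, 1, by = 0.25)
seq_len(4)
rep(c("a", "b"), times = 2)     # repeats the whole vector
rep(c("a", "b"), each = 2)      # repeats each element in turn

# A replacement function: its name ends in '<-' and its last argument is 'value'
`second<-` <- function(x, value) {
  x[2] <- value
  x
}
z <- 1:5
second(z) <- 100L               # same as z <- `second<-`(z, value = 100L)
z

# Attributes: metadata attached to an object
attr(z, "units") <- "cm"
attributes(z)
```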
[[/Day2 Guided Tour of Linear Models.R]]<br />
<br />
== SCS Reads 2011 Links ==<br />
* [http://www.math.yorku.ca/people/georges/Files/Ellipse_Seminar/Visualizing_Regression-I-Simple-v2.pdf Notes on visualizing simple regression]<br />
* [http://www.math.yorku.ca/people/georges/Files/Ellipse_Seminar/Visualizing_Regression-II.r R script for multiple regression]<br />
== Capstone courses ==<br />
* [http://www.amstat.org/publications/jse/v9n1/spurrier.html Course at the U. of South Carolina, 2001]<br />
* <br />
== Links to recent courses ==<br />
*[http://scs.math.yorku.ca/index.php/SCS_2011:_Mixed_Models_with_R SCS 2011: Mixed Models with R]<br />
*[http://statswiki.math.yorku.ca/index.php/SCS:_Mixed_Models_in_R SCS 2010]<br />
*[http://scs.math.yorku.ca/index.php/MATH_6627_2008-09_Practicum_in_Statistical_Consulting MATH 6627 2008-09]<br />
*[http://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting MATH 6627 2010-11]<br />
*[http://wiki.math.yorku.ca/index.php/SPIDA_2010:_Mixed_Models_with_R SPIDA 2010]<br />
*[http://wiki.math.yorku.ca/index.php/SPIDA_2009:_Mixed_Models_with_R SPIDA 2009]<br />
<br />
*[http://scs.math.yorku.ca/index.php/Spida Spida package]<br />
<br />
== Links to add somewhere ==<br />
* [http://www.ted.com/talks/ben_goldacre_battling_bad_science.html?utm_source=newsletter_weekly_2011-10-04&utm_campaign=newsletter_weekly&utm_medium=email Battling bad science]<br />
*D W Hosmer, S Taber and S Lemeshow () "The importance of assessing the fit of logistic regression models: a case study." ''American Journal of Public Health'', '''81'''(12), 1630--1635<br />
* [http://www.statmethods.net/ Quick R] for SPSS, SAS and Stata users.<br />
=== Graphics ===<br />
* [http://www.edwardtufte.com/tufte/ ET Modern]<br />
* Striking graphics:<br />
** [http://en.wikipedia.org/wiki/File:U.S._incarceration_rates_1925_onwards.png Incarceration rate in the United States]<br />
<br />
=== Matrices ===<br />
*[http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/intro.html Matrix Reference Manual]<br />
*[http://en.wikipedia.org/wiki/Matrix_determinant_lemma Matrix determinant lemma]<br />
*[http://en.wikipedia.org/wiki/Woodbury_matrix_identity Woodbury Matrix Identity]<br />
<br />
=== Simpson's Paradox ===<br />
In the [http://en.wikipedia.org/wiki/Canadian_federal_election,_1979 1979 Canadian federal election] <br />
an unusual event occurred in the Northwest Territories: the Liberals won the popular vote in the territory, but won [http://en.wikipedia.org/wiki/Canadian_federal_election,_1979#National_results neither seat.]<br />
<br />
=== Lee Lorch ===<br />
* [http://aer.sagepub.com/content/36/4/739.abstract Marybeth Gasman (1999)] "Scylla and Charybdis: Navigating the Waters of Academic Freedom at Fisk University During Charles S. Johnson's Administration (1946–1956)" ''American Educational Research Journal''<br />
*: A prominent sociologist and race relations activist, Charles S. Johnson dedicated his life to the advancement of Blacks. His presidency at Fisk University, a historically Black college, was the culmination of his career. During the latter part of his administration, he faced a dilemma involving an outspoken professor named Lee Lorch, who, in 1954, was accused of being a communist. Johnson and the Board of Trustees dismissed Lorch because he refused to answer a congressional committee's questions about his previous political affiliations. In 1959, the American Association of University Professors found the late President Johnson guilty of violating the principles of academic freedom. This article explores the ways in which academic freedom, civil liberties, and civil rights clashed in the Lee Lorch case. Furthermore, it examines the ways in which the setting of a historically Black college alters traditional assumptions about the application of these principles.<br />
* [http://www.nytimes.com/2010/11/22/nyregion/22stuyvesant.html Charles V. Bagli (November 21, 2010)] "A New Light on a Fight to Integrate Stuyvesant Town", ''New York Times''.<br />
<br />
== Multilevel Models ==<br />
<br />
=== Expository ===<br />
* [http://glmm.wikidot.com/faq R FAQ]<br />
*[https://perswww.kuleuven.be/~u0018341/documents/ldafda.pdf Verbeke and Molenberghs (2005): Longitudinal Data Analysis Notes]<br />
=== Missing Data ===<br />
* [http://cran.r-project.org/web/packages/Amelia/vignettes/amelia.pdf King et al. (2012) Amelia II]<br />
*<br />
<br />
=== Evaluation ===<br />
* Green, M.J., Medley G.F., & Browne, W.J. (2009). Use of posterior predictive assessments to evaluate model fit in multilevel logistic regression. Veterinary Research, 40(4):30. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2675184/pdf/vetres-40-30.pdf<br />
<br />
=== Software for multilevel models ===<br />
{| class="wikitable"<br />
|-<br />
! Package<br />
! Function<br />
! Notes<br />
|-<br />
| R: clmm {ordinal}<br />
| Ordinal response: Fits cumulative link mixed models, i.e. cumulative link models with random effects via the Laplace approximation or the standard and the adaptive Gauss-Hermite quadrature approximation. The functionality in clm is also implemented here. Currently only a single random term is allowed in the location-part of the model.<br />
| <br />
|-<br />
| R: {lme4a}<br />
| Development version of lme4<br />
Download: svn checkout svn://svn.r-forge.r-project.org/svnroot/lme4<br />
<br />
|<br />
|-<br />
|R: {MCMCglmm}<br />
|MCMC Methods for Multi-response Generalized Linear Mixed Models<br />
| <br />
|-<br />
|R: {plm}<br />
|Econometric Analysis of Panel Survey Data<br />
|[http://cran.r-project.org/web/packages/plm/vignettes/plm.pdf Vignette]<br><br />
See p. 3 for comments on first-differencing.<br />
|-<br />
|<br />
|See Snijders and Bosker (2012) for longer list<br />
|<br />
|-<br />
|R: {lme4:nlmm}<br />
|Non-linear models with lme4<br />
|[http://lme4.r-forge.r-project.org/slides/2011-01-11-Madison/6NLMMH.pdf Presentation by Doug Bates]<br><br />
|}<br />
<br />
== Clones ==<br />
Check for changes and reconcile<br />
* Lab 1<br />
** [[SCS_2011:_Mixed_Models_with_R/Lab_1]]<br />
** [[MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Lab_1]]<br />
<br />
== Read ==<br />
On the age-period-cohort problem:<br />
* see bibliography by Yang: http://home.uchicago.edu/~yangy/research.html<br />
<br />
== Do ==<br />
*[[/spida|spida to do list]]<br />
*[[/p3d|p3d to do list]]<br />
== Read ==<br />
* [http://www.stat.berkeley.edu/~freedman/ Links to recent papers by David Freedman]<br />
* Links to material by Chris Wild:<br />
** http://www.stat.auckland.ac.nz/~wild/StatThink/<br />
** http://www.stat.auckland.ac.nz/showperson.php?uid=wild<br />
* [http://www3.hku.hk/statistics/staff/kaing/ Kai Ng's converse]<br />
<br />
== Notes ==<br />
* [http://www.chrp.org/love/ASACleveland2003Propensity.pdf Good presentation on use of propensity scores]<br />
* [http://sportsillustrated.cnn.com/2011/writers/scorecasting/03/24/simpson-paradox/index.html?eref=sihp Simpson's Paradox]<br />
<br />
== R notes ==<br />
* [http://strimmerlab.org/notes/fdr.html False Discovery Rates in R]<br />
=== Items to cover ===<br />
* Wrap up language:<br />
** Selection (give context): indices: index, names, logical, matrix of coordinates, 'subset'<br />
*** Example: dropping NAs from selected variables. Necessary because functions that are most sophisticated methodologically are generally least sophisticated in their interface<br />
**** contrast sophisticated program: lm with unsophisticated lowess<br />
* Using variables in data frames: <br />
** formula oriented functions: xyplot( y ~ x, data = dd )<br />
** explicit: plot( dd$x, dd$y )<br />
** with: with( dd, plot(x,y)); with( dd, xyplot( y ~ x))<br />
** attach: As usual the easiest is deprecated! (why is it only easy and pleasurable things that are ever deprecated)<br />
**: <tt> attach(dd) </tt><br />
**: <tt> plot(x, y) </tt><br />
**: <tt> detach(dd) </tt><br />
*** Problem with 'attach': <br />
**** names in data frame may be masked by names in workspace<br />
**** assignments in workspace not saved in data frame<br />
* Overview of graphics<br />
** Link to http://addictedtor.free.fr/graphiques/<br />
* Programming structures<br />
* Add to graphics:<br />
*:Colours: <tt>pal(grepv('red',colors())); pals() # for all</tt><br />
*:modified tablemissing<br />
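<br />
One way the selection and variable-reference items above might be demonstrated. The data frame is invented for the purpose, and attach() is left out on purpose.<br />

```r
dd <- data.frame(x = 1:6,
                 y = c(2, 4, NA, 8, 10, 12),
                 z = c(NA, 1, 1, 0, 0, 1))

# Selection by index, by name, by logical vector, and with subset()
dd[2:3, ]
dd[, c("x", "y")]
dd[dd$x > 3, ]
subset(dd, x > 3, select = c(x, y))

# Selection with a matrix of (row, column) coordinates
m <- as.matrix(dd)
m[cbind(c(1, 2), c(1, 2))]        # elements (1,1) and (2,2)

# Dropping NAs from selected variables only (x and y, ignoring z)
ok <- complete.cases(dd[, c("x", "y")])
sm <- lowess(dd$x[ok], dd$y[ok])  # lowess() is given complete cases only

# Referring to variables in a data frame
fit <- lm(y ~ x, data = dd)                        # formula; NAs handled by na.action
r1 <- cor(dd$x, dd$y, use = "complete.obs")        # explicit reference
r2 <- with(dd, cor(x, y, use = "complete.obs"))    # with()
```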
==== debugging in R ====<br />
* http://www.stats.uwo.ca/faculty/murdoch/software/debuggingR/debug.shtml<br />
<br />
=== Links ===<br />
*[http://www.stats.ox.ac.uk/pub/MASS4/ MASS 4th ed.] [http://www.stats.ox.ac.uk/pub/MASS4/#Exercises Exercises]<br />
<br />
=== Importing files ===<br />
==== From Excel ====<br />
* Easy: save file in Excel as .csv, then read into R with read.csv<br />
* If you have a lot of files, or get the files from some other source that edits .xls or .xlsx files:<br />
* The winner: package gdata: <br />
** First install perl. <br />
** read.xls in gdata handles both .xls and .xlsx files<br />
** works on both 32-bit and 64-bit machines<br />
* package XLConnect seems to work only on xlsx files <br />
* the smaller xlsx package also works only on xlsx files<br />
* Package xlsReadWrite works on xls files but only on 32-bit systems<br />
* Use xls2csv, a Perl script to convert files to csv first.<br />
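<br />
A minimal sketch of the 'easy' route above. A small .csv file is written to a temporary location to stand in for a sheet saved from Excel; its column names and values are made up. The gdata route would use read.xls on the workbook itself and is not run here because it needs Perl.<br />

```r
# Stand-in for a sheet saved from Excel as .csv
csv_file <- tempfile(fileext = ".csv")
writeLines(c("id,score,date",
             "1,87.5,2011-09-02",
             "2,,2011-09-09"), csv_file)

dat <- read.csv(csv_file, stringsAsFactors = FALSE)
dat$date <- as.Date(dat$date)     # dates arrive as character strings
str(dat)                          # the empty 'score' cell is read as NA
```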
<br />
=== Getting lines vs points for different groups in xyplot ===<br />
Ideally, type = c('l','p') would work but it doesn't seem to. So one way is to use type = 'b' with an invisible line for one group and an invisible point for the other:<br />
<br />
library(spida.beta) # also loads 'car'<br />
dd <- Prestige<br />
dd$income.pred <- predict( lm( income ~ education*type, dd), newdata = dd)<br />
td( lty = c(1,0), pch = c(32, 16), lwd = 2) <br />
# lty = 0 produces an invisible line<br />
# and pch = 32 seems to be an invisible point<br />
xyplot( income.pred + income ~ education|type, dd[order(dd$education),], type = 'b',<br />
auto.key = list( columns = 2, lines = T, points = T))<br />
<br />
Also show example using panel.superpose.2<br />
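<br />
A possible alternative, sketched with lattice only and the built-in mtcars data standing in for Prestige: with distribute.type = TRUE the elements of 'type' are matched to groups rather than all applied to every group. As far as I know, panel.superpose.2 is a wrapper around panel.superpose with this behaviour as its default.<br />

```r
library(lattice)

# Stand-in data, sorted so the fitted line is drawn from left to right
dd <- mtcars[order(mtcars$wt), ]
dd$mpg.pred <- predict(lm(mpg ~ wt, data = dd))

# 'type' is distributed over groups: a line for mpg.pred, points for mpg
p <- xyplot(mpg.pred + mpg ~ wt, data = dd,
            type = c("l", "p"), distribute.type = TRUE,
            auto.key = list(columns = 2, lines = TRUE, points = TRUE))
# print(p) draws the plot
```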
=== Bugs ===<br />
<pre><br />
grade <- function(x ,<br />
cos = c(-Inf,40,50,55,60,65,70,75,80,90,Inf) - 0,<br />
grade = c("F","E","D","D+","C","C+","B","B+","A","A+")) {<br />
factor(cut(x, cos, grade, right = FALSE), levels = grade)<br />
}<br />
dg$Grade <- grade( dg$Final )<br />
tab(dg, ~ Grade)<br />
# gets indexing of levels wrong<br />
# the following seems to work correctly<br />
grade <- function(x ,<br />
cos = c(-Inf,40,50,55,60,65,70,75,80,90,Inf) - 0,<br />
grade = c("F","E","D","D+","C","C+","B","B+","A","A+")) {<br />
ret <- cut(x, cos, grade, right = FALSE)<br />
factor(ret, levels = grade)<br />
}<br />
</pre> <br />
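A quick check of the second version on a few made-up scores at and near the cutoffs, so that the level order and the left-closed intervals can be inspected directly:<br />

```r
grade <- function(x,
                  cos = c(-Inf, 40, 50, 55, 60, 65, 70, 75, 80, 90, Inf),
                  grade = c("F", "E", "D", "D+", "C", "C+", "B", "B+", "A", "A+")) {
  ret <- cut(x, cos, grade, right = FALSE)   # intervals closed on the left: [40, 50), ...
  factor(ret, levels = grade)
}

scores <- c(39, 40, 55, 89.9, 90, 100)       # made-up scores
g <- grade(scores)
data.frame(score = scores, grade = g)
table(g)                                     # counts listed in the order F, E, ..., A+
```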
==== Getting the G matrix in nlme ====<br />
 fit <- lme( y ~ x, dd, random = ~1+x |id)<br />
 # reStruct is stored relative to sigma^2, so rescale (or use getVarCov(fit))<br />
 G <- pdMatrix( fit$modelStruct$reStruct)$id * fit$sigma^2<br />
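<br />
A runnable check using the Orthodont data that ships with nlme, where 'Subject' plays the role of 'id'. As far as I can tell the matrix held in reStruct is expressed relative to the residual variance, so the sketch rescales it by sigma^2 and compares the result with getVarCov().<br />

```r
library(nlme)

fit <- lme(distance ~ age, data = Orthodont, random = ~ 1 + age | Subject)

G_rel <- pdMatrix(fit$modelStruct$reStruct)$Subject   # relative to sigma^2
G     <- G_rel * fit$sigma^2                          # on the variance scale
G_chk <- getVarCov(fit)                               # extractor for comparison

all.equal(as.numeric(G), as.numeric(G_chk))
VarCorr(fit)                                          # variances and SDs
```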
==== Building R packages in 2.14 ====<br />
# Install R<br />
# Install tools: http://robjhyndman.com/researchtips/building-r-packages-for-windows/<br />
<br />
# Info: http://cran.r-project.org/doc/contrib/Graves+DoraiRaj-RPackageDevelopment.pdf<br />
<br />
==== Notes ====<br />
* [http://ipsur.r-forge.r-project.org/book/ IPSUR: Introduction to Probability and Statistics using R]<br />
* [[/test of slash]]<br />
* [[/schedule]]<br />
* [http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=72 Addicted to R Graph Gallery]<br />
* [http://rwiki.sciviews.org/doku.php R Wiki]<br />
* [http://rwiki.sciviews.org/doku.php?id=guides:demos:stata_demo_with_r Stata demo in R]<br />
<br />
== Thumbnail test ==<br />
Here is a graphic file in raw form:<br />
<br />
[[File:UN-missing1.jpg]]<br />
<br />
And here is the same file with a thumbnail:<br />
[[File:UN-missing1.jpg|thumb]]<br />
<br />
== Math check ==<br />
<br />
<font color="red">'''Please click on the 'discussion' tab above'''</font><br />
<br />
:Test how math renders:<br />
<br />
:<math>\begin{align}<br />
f(x) & = (a+b)^2 \\<br />
& = a^2+2ab+b^2 \\<br />
\end{align}</math><br />
<br />
<math> x \perp y </math><br />
<br />
== glmmPQL etc ==<br />
Good discussion between Doug and Ben: https://stat.ethz.ch/pipermail/r-sig-mixed-models/2008q4/001457.html<br />
== Combining unbiased estimators ==<br />
This is an example:<br />
* bullet 1<br />
* bullet 2<br />
** again<br />
**: indented <br />
* bullet3<br />
numbered bullets:<br />
# one<br />
# two<br />
## dkjkdj<br />
##* djkdj<br />
#* djfkd<br />
=== subheading ===<br />
new stuff<br />
==== sub sub ====<br />
more stuff<br />
<br />
<br />
<br />
Let <math>{{\hat{\phi }}_{1}}</math> and <math>{{\hat{\phi }}_{2}}</math> be uncorrelated unbiased estimators of <math>\phi \in {{\mathbb{R}}^{k}}</math> with non-singular variances <math>{{V}_{1}}</math> and <math>{{V}_{2}}</math> respectively.<br />
<br />
Then the minimum variance linear unbiased estimator of<br />
<math>\phi </math> is obtained by combining <math>{{\hat{\phi }}_{1}}</math> and <math>{{\hat{\phi }}_{2}}</math> using weights that are proportional to the inverses of their variances. The result can be expressed in a variety of ways:<br />
<br />
<math>\begin{align}<br />
\hat{\phi } &= {{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}\left( V_{1}^{-1}{{{\hat{\phi }}}_{1}}+V_{2}^{-1}{{{\hat{\phi }}}_{2}} \right) \\ <br />
& = {{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}\left( V_{1}^{-1}{{{\hat{\phi }}}_{1}}+V_{2}^{-1}{{{\hat{\phi }}}_{2}} \right)+ \left[ {{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}V_{2}^{-1}{{{\hat{\phi }}}_{1}}-{{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}V_{2}^{-1}{{{\hat{\phi }}}_{1}} \right] \\ <br />
& = {{{\hat{\phi }}}_{1}}+ {{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}V_{2}^{-1}\left( {{{\hat{\phi }}}_{2}}-{{{\hat{\phi }}}_{1}} \right) \\ <br />
& = {{{\hat{\phi }}}_{1}}+ {{\left( I+{{V}_{2}}V_{1}^{-1} \right)}^{-1}}\left( {{{\hat{\phi }}}_{2}}-{{{\hat{\phi }}}_{1}} \right) \\ <br />
& = {{\left( I+{{V}_{1}}V_{2}^{-1} \right)}^{-1}}\left( {{{\hat{\phi }}}_{1}}+{{V}_{1}}V_{2}^{-1}{{{\hat{\phi }}}_{2}} \right) <br />
\end{align}</math><br />
The proof is an application of the principle of Generalized Least-Squares. The problem can be formulated as a GLS problem by considering that:<br />
<math>\left[ \begin{matrix}<br />
{{{\hat{\phi }}}_{1}} \\<br />
{{{\hat{\phi }}}_{2}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
I \\<br />
I \\<br />
\end{matrix} \right]\phi +\left[ \begin{matrix}<br />
{{\varepsilon }_{1}} \\<br />
{{\varepsilon }_{2}} \\<br />
\end{matrix} \right]</math> with <math>\operatorname{Var}\left( \left[ \begin{matrix}<br />
{{\varepsilon }_{1}} \\<br />
{{\varepsilon }_{2}} \\<br />
\end{matrix} \right] \right)=\left[ \begin{matrix}<br />
{{V}_{1}} & 0 \\<br />
0 & {{V}_{2}} \\<br />
\end{matrix} \right]</math><br />
<br />
Applying the GLS formula yields:<br />
<math>\begin{align}<br />
\hat{\phi } & ={{\left( {{\left[ \begin{matrix}<br />
I \\<br />
I \\<br />
\end{matrix} \right]}^{\prime }}{{\left[ \begin{matrix}<br />
{{V}_{1}} & 0 \\<br />
0 & {{V}_{2}} \\<br />
\end{matrix} \right]}^{-1}}\left[ \begin{matrix}<br />
I \\<br />
I \\<br />
\end{matrix} \right] \right)}^{-1}}{{\left[ \begin{matrix}<br />
I \\<br />
I \\<br />
\end{matrix} \right]}^{\prime }}{{\left[ \begin{matrix}<br />
{{V}_{1}} & 0 \\<br />
0 & {{V}_{2}} \\<br />
\end{matrix} \right]}^{-1}}\left[ \begin{matrix}<br />
{{{\hat{\phi }}}_{1}} \\<br />
{{{\hat{\phi }}}_{2}} \\<br />
\end{matrix} \right] \\ <br />
& ={{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}\left( V_{1}^{-1}{{{\hat{\phi }}}_{1}}+V_{2}^{-1}{{{\hat{\phi }}}_{2}} \right) <br />
\end{align}</math><br />
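<br />
The equivalence of these expressions can be checked numerically. The following is a minimal sketch, assuming arbitrary simulated values for the two estimates and their variances; the helper 'rand_pd' and all object names are made up for illustration:<br />

```r
set.seed(123)
k <- 3

# Hypothetical helper: a random symmetric positive-definite k x k matrix
rand_pd <- function(k) {
  A <- matrix(rnorm(k * k), k)
  crossprod(A) + diag(k)
}

V1 <- rand_pd(k); V2 <- rand_pd(k)    # variances of the two estimators
phi1 <- rnorm(k); phi2 <- rnorm(k)    # the two estimates
I <- diag(k)

# Inverse-variance weighted combination and its algebraic variants
W <- solve(solve(V1) + solve(V2))     # variance of the combined estimator
form1 <- W %*% (solve(V1) %*% phi1 + solve(V2) %*% phi2)
form3 <- phi1 + W %*% solve(V2) %*% (phi2 - phi1)
form4 <- phi1 + solve(I + V2 %*% solve(V1)) %*% (phi2 - phi1)
form5 <- solve(I + V1 %*% solve(V2)) %*% (phi1 + V1 %*% solve(V2) %*% phi2)

# The same estimate from the GLS formula with stacked design rbind(I, I)
X <- rbind(I, I)
V <- rbind(cbind(V1, matrix(0, k, k)), cbind(matrix(0, k, k), V2))
y <- c(phi1, phi2)
gls <- solve(t(X) %*% solve(V) %*% X, t(X) %*% solve(V) %*% y)

agree <- c(isTRUE(all.equal(c(form1), c(form3))),
           isTRUE(all.equal(c(form1), c(form4))),
           isTRUE(all.equal(c(form1), c(form5))),
           isTRUE(all.equal(c(form1), c(gls))))
```

All elements of 'agree' should be TRUE. The matrix 'W' is the variance of the combined estimator, and its diagonal is smaller than that of either 'V1' or 'V2'.<br />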
<br />
== From Nassif Ghoussoub ==<br />
Beware the “useful idiocy” of Mr. Morgan<br />
<br />
The latest commentary of Gwyn Morgan in the Globe and Mail, “If universities were in business, they’d be out of business”, http://www.theglobeandmail.com/report-on-business/commentary/gwyn-morgan/ has crossed another line. Far from being an analysis of the state of Canadian universities, his rant is personal, bitter, demeaning, and insulting to university professors across the country.<br />
<br />
Back in his Globe article of April 29, 2009, “Not all research deserves public funding“, the retired CEO of EnCana Corp. proceeded to rip into the “ivory towers of academia”, attack “esoteric research” and disparage any graduate degree not hailing from medicine or engineering. He also dismissed the 2300 scientists who joined the “Don’t Leave Canada Behind” campaign, which called on the government to include R&D, the lifeblood of the new economy, in its stimulus budget. To its credit, the government of Canada responded positively to the call of its scientists, but from his -dubiously earned- platform at the Globe and Mail, Mr. Morgan kept at it.<br />
<br />
In Saturday’s paper, Mr. Morgan employs sweeping generalizations and ghost statistics to come to the conclusion that, among other things, Canada’s university professors are “poorly prepared” for their lectures, “show up occasionally” to class, and give “poorly thought out assignments”. He claims “the reaction of universities to widespread student dissatisfaction is to blame insufficient financing, rather than their own dysfunction”. He offers that in the new age, formal lectures should be altogether ended.<br />
<br />
His commentary provides neither data about student learning, nor any direct quotations from professors or students. A 1991 study is cited, and then baptized as the truth with a simple "Nineteen years later, little has changed." The article does not attack a particular university, faculty, or teaching method, but rather an apparently archetypal "university professor".<br />
<br />
So what if he hasn't been on campus in 40 years? He knows how it is. Even then, he "stopped going to classes and dedicated his time to learning from textbooks and reviewing friends’ notes". But Mr. Morgan ignores that a professor somewhere, sometime, must have produced and dictated these textbooks and notes. He finds "no reason why all written course material can’t be delivered via the Internet", obviously not aware that since the 90’s, most course material has been made available on the Internet, thanks to dedicated professors. Morgan's suggestion that we replace large classes with "small informal discussions" sounds great, but how does the CEO propose we pay for the much larger number of professors required to do the job? He wants universities to run like businesses, but as one reader suggested: “If Universities were run like the oil and gas industry we would be back in the dark ages where the only skill required would be to count your money... at least until the oil runs out”.<br />
<br />
It is obvious that we embattled post-secondary teachers and researchers need to worry more about the “very useful idiocy” of Mr. Morgan, the permanent platform he has been provided, and the damage that drivel like this can cause to higher education and advanced research in Canada.<br />
<br />
For the Globe and Mail, Mr. Morgan has been pure comic gold for years. Writing on a variety of subjects, ranging from environmental issues and health care, to research and post-secondary education, he has been a bottomless trove of shameless misrepresentations, extreme views and sheer wackiness. But ultimately, this is not only about Gwyn Morgan nor about the Globe and Mail. It is about us.<br />
<br />
It is about Canada’s University Presidents countering his dangerous Tea Party style rhetoric on our post-secondary institutions.<br />
<br />
It is about the Deans of Canada’s Faculties facing up to Mr. Morgan when he writes: “Many qualified applicants are turned away from areas such as engineering and medicine, while universities continue to graduate thousands with knowledge that is neither useful in getting a job, nor in helping our country succeed in a competitive world.”<br />
<br />
It is about the Royal Society of Canada, and other learned societies responding to his views about “esoteric research that doesn’t have the slightest chance of yielding any real value”.<br />
<br />
It is also up to our schools of journalism, to point out to mainstream media the irresponsibility in printing shallow, empty articles full of generalizations and devoid of facts.<br />
<br />
Mr. Morgan may be one of those individuals who get so many things wrong at once that the thought of challenging them or setting the record straight is just too daunting. But it is incumbent upon us not to let his rhetoric negate the exemplary contributions of thousands of Canada’s scholars, teachers and researchers.<br />
<br />
Nassif Ghoussoub, Professor of Mathematics, The University of British Columbia<br />
<br />
== Cell Phones ==<br />
<br />
Date: Sun, 10 Oct 2010 20:02:29 -0400<br />
From: Stuart Newman <newman@NYMC.EDU><br />
Reply-To: Science for the People Discussion List<br />
<SCIENCE-FOR-THE-PEOPLE@LIST.UVM.EDU><br />
To: SCIENCE-FOR-THE-PEOPLE@LIST.UVM.EDU<br />
Subject: "Disconnect": Why cellphones may be killing us<br />
<br />
Though I haven't yet read it, this book is presumably not based on<br />
anecdotal evidence. The author, Devra Davis, is the founding director<br />
of the toxicology and environmental studies board at the U.S. National<br />
Academy of Sciences.<br />
<br />
http://tinyurl.com/2fvycxc [Salon.com]<br />
<br />
"Disconnect": Why cellphones may be killing us<br />
A new book probes the connection between mobile devices and a host<br />
of health problems -- with frightening results<br />
By Thomas Rogers<br />
<br />
== Links ==<br />
* [http://www.ted.com/talks/hans_rosling_the_good_news_of_the_decade.html?utm_source=newsletter_weekly_2010-10-12&utm_campaign=newsletter_weekly&utm_medium=email 2010 TED talk by Hans Rosling]<br />
* [http://en.wikipedia.org/wiki/Apophenia Apophenia]<br />
* [http://en.wikipedia.org/wiki/Pareidolia Pareidolia]<br />
<br />
== Notes on mediation ==<br />
The question of mediation is essentially a question about causality. Is the putative mediator, M say, caused by X and, in turn, a cause of Y? But M, in a mediational analysis, cannot have been randomized even if X has been. The question of mediation is essentially a question about causality with observational, not experimental, data.<br />
<br />
To get a perspective on the problem we need to start by considering the general problem of causality with observational data. Let Y be the response variable and let X be the 'target' variable which is seen as a possible 'cause' of Y. For X to cause Y means that the expected value of Y would change in some target experimental condition in which X was manipulated (perhaps through random allocation) while other variables were left untouched -- not necessarily unchanged.<br />
<br />
For causal inference with observational data, we are interested in what would happen under circumstances that are different from those we have actually observed. Our analysis of our observational data will yield an accurate estimate of the causal effect of X if the model for the observational data has the same coefficient for X as it would have if it were applied to data gathered under the target experimental condition. The challenge is to specify and estimate a model that is 'transferable' from the observational condition to the experimental condition. We need a set of concepts to help us critically assess whether a model is transferable. It is not sufficient to have a model that 'fits' well. It may be necessary to include potential confounding factors even if they are not significant in the prediction model for Y. And it may be necessary to exclude strong predictors that are potential mediators -- variables that must not be held constant as one examines the causal relationship between X and Y. One needs a good understanding of the causal model that is valid under experimental conditions in order to properly specify a transferable observational model.<br />
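<br />
A small simulation can make the roles of confounders and mediators concrete. This is only a sketch under assumed linear relationships: the variables C (confounder) and M (mediator) and all coefficients are invented for illustration, and the total causal effect of X on Y is 1 by construction (0.5 directly plus 0.5 through M):<br />

```r
set.seed(1)
n <- 10000
C <- rnorm(n)                               # confounder: affects both X and Y
X <- 0.8 * C + rnorm(n)                     # target variable
M <- 0.5 * X + rnorm(n)                     # mediator: caused by X, causes Y
Y <- 1.0 * M + 0.5 * X + 1.0 * C + rnorm(n) # total effect of X: 0.5 + 1.0 * 0.5 = 1

# Omitting the confounder: the coefficient of X is biased
b_naive <- unname(coef(lm(Y ~ X))["X"])

# Including the confounder, excluding the mediator: estimates the total effect
b_adj <- unname(coef(lm(Y ~ X + C))["X"])

# Also holding the mediator constant: only the direct effect remains
b_med <- unname(coef(lm(Y ~ X + C + M))["X"])

round(c(naive = b_naive, adjusted = b_adj, with_mediator = b_med), 2)
```

In this setup the model with M fits Y better than the model without it, yet it is the model without M whose coefficient for X would transfer to an experiment that manipulates X.<br />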
<br />
The problem can be approached in a surprisingly different way, which is the basis for propensity scores. Instead of focusing on a 'transferable' model for Y, one focuses on a model for the assignment of the target causal variable X using potential confounding variables. As in models for Y, it is important to avoid potential mediators between X and Y. However, the model for X based on confounding factors is a prediction model. Confounding variables may be included, raw or transformed, as long as they are predictive of X. It is not necessary to include variables that are not predictive of X. The criterion for developing the model is statistical fit, a criterion that -- apart from the actual selection of confounding predictors -- is empirical, i.e. it is based on the analysis of the data at hand without reference to external theory that is not verifiable with the data. The assignment model need only be valid for the observational condition. Its validity for the experimental condition is irrelevant.<br />
<br />
What are some of the pros and cons of the two approaches? A good transferable model for Y may provide more precise estimates of the effect of X because more of the variability in Y is accounted for in the model. On the other hand, the validity of a causal estimate based on the propensity score approach depends on assumptions that may be much easier to sustain than those required for the approach based on modeling Y. Broadly, the propensity score approach offers lower bias but not necessarily lower variability. Note that the two approaches are not mutually exclusive. They may be better viewed as two sets of concepts that could be combined in an analysis that draws from both.<br />
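<br />
The two approaches can be compared in a small simulation. Everything in this sketch is an assumption made for illustration: a binary X, a single confounder C, a true causal effect of 1, and inverse-probability weighting as one of several ways of using an estimated propensity score:<br />

```r
set.seed(2)
n <- 20000
C <- rnorm(n)                        # hypothetical confounder
X <- rbinom(n, 1, plogis(0.8 * C))   # assignment of X depends on C
Y <- 1.0 * X + 2.0 * C + rnorm(n)    # true causal effect of X is 1

# Naive comparison of group means ignores C
naive <- mean(Y[X == 1]) - mean(Y[X == 0])

# Approach 1: a model for Y that includes the confounder
b_model <- unname(coef(lm(Y ~ X + C))["X"])

# Approach 2: a model for the assignment of X, then inverse-probability weights
ps <- fitted(glm(X ~ C, family = binomial))
w <- ifelse(X == 1, 1 / ps, 1 / (1 - ps))
b_ipw <- weighted.mean(Y[X == 1], w[X == 1]) -
  weighted.mean(Y[X == 0], w[X == 0])

round(c(naive = naive, model = b_model, ipw = b_ipw), 2)
```

Both adjusted estimates should land near 1 while the naive difference does not. The model for Y relies on how C affects Y being specified correctly; the weighted estimate relies only on the model for how C affects X, and is typically the more variable of the two.<br />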
<br />
How does this all relate to the analysis of mediation? The Baron and Kenny approach and its variants -- in which I include the various ways of estimating direct and indirect causal effects -- are all based on methods analogous to models for Y. As mentioned earlier, estimating the causal effect of the mediator involves causal inference with observational data -- even in the context of an experiment randomizing X. This invites the question whether propensity score methods could be used in assessing more accurately the causal effect of M. The answer lies in the relatively recent theory of principal stratification [Constantine Frangakis and Donald Rubin (2002) "Principal Stratification in Causal Inference", ''Biometrics'', '''58''', 21--29].<br />
<br />
An accessible reference for the concepts behind propensity scores is Donald Rubin (1997) "Estimating Causal Effects from Large Data Sets Using Propensity Scores," ''Annals of Internal Medicine'', '''127''', 757--763.<br />
<br />
A recent treatment of mediation using principal stratification is given in Chapter 8, "Intermediate Causal Factors," of Herbert Weisberg (2010) ''Bias and Causation: Models and Judgment for Valid Comparisons'', Wiley.<br />
<br />
With the large number of seemingly competing approaches to causal inference, students as well as experienced researchers may feel quite puzzled as to which approach they should use. The answer, possibly, is all of them. Each approach seems to shed light on some aspect of the challenge of causal inference in the absence of pristine randomization. They do not offer recipes so much as sets of concepts that can be applied to help understand research projects and analyses.<br />
<br />
== Shock of the New ==<br />
[http://en.wikipedia.org/wiki/Robert_Hughes_(critic) Robert Hughes (1980)]<br />
* [http://www.youtube.com/watch?v=GFn4UmkBcaQ Surrealism Part 1]<br />
* [http://www.youtube.com/watch?v=2uaA8CfZKRs Surrealism Part 2]<br />
* [http://www.youtube.com/watch?v=dActaAa-teM Surrealism Part 3]<br />
* [http://www.youtube.com/watch?v=ZLmCh0xw4h0 Surrealism Part 4]<br />
* [http://www.youtube.com/watch?v=s-SsWPNNBC4 Surrealism Part 5]<br />
== York AODA ==<br />
* [http://aoda.yorku.ca/cs/interactive/en/# AODA training]<br />
== Notes for NATS 1500 ==<br />
* Topics<br />
** single-sex schools?<br />
<br />
== Notes for MATH 6627 ==<br />
* [http://mediamatters.org/research/2012/10/01/a-history-of-dishonest-fox-charts/190225 Collection of misleading graphs]<br />
* [http://www.amstat.org/sections/cnsl/BooksJournals.cfm ASA consulting page]<br />
* set up student home page<br />
* first assignment. Find and explore a dataset using<br />
** Ernest Kwan's correlagram<br />
** Lattice (use panels and groups)<br />
** p3d<br />
** gapminder<br />
** should have included candisc<br />
* give a 15-minute (crucial) presentation on the data set and on the method<br />
* prepare a wiki page with links and materials <br />
* Address a few questions:<br />
** What are its strengths and weaknesses?<br />
** For what kind of dataset is it well suited and what kind not?<br />
** Can you find a dataset that illustrates well the features of this approach?<br />
** Can you compare your approach with other approaches?<br />
[[/test page|test]]<br />
<br />
* develop checklists:<br />
* initial exploration of data<br />
* missing data (explicit and implicit)<br />
* do simulation of parallel methods: check estimation of variance parameters<br />
* use nlme to estimate knot placement in gsp<br />
=== Links ===<br />
* [http://healthland.time.com/2011/09/02/mind-reading-why-bad-math-can-ruin-your-health/ Why bad math can ruin your health]<br />
* [http://www.ted.com/talks/david_mccandless_the_beauty_of_data_visualization.html TED talk: David McCandless, The Beauty of Data Visualization]<br />
* Hans Rosling<br />
* Great intro to Gapminder: http://www.mrbartonmaths.com/gapminder.htm<br />
<br />
* [http://scs.math.yorku.ca/index.php?title=Special:Contributions&dir=prev&contribs=user&target=Georges Contributions]<br />
* [http://www.scribd.com/doc/17378132/The-Fallacy-of-Personal-Validation-a-Classroom-Demonstration-of-Gullibility The Forer Effect - a classroom example]<br />
<br />
== Notes for R course ==<br />
* Start: It had to be U ... on the SVD [http://www.youtube.com/StatisticalSongs#p/u/4/JEYLfIVvR9I]<br />
* Use SPSS dates both ways to illustrate <br />
** sub using regular expressions<br />
** import: reading dates into 'Date' format using formats: Include all %a %b %Y and others?<br />
** export: writing a date into a character string using format( Date.object, "%d-%m-%Y") to create variable SPSS can read<br />
* Variable references<br />
*: deal with plethora of ways used differently in different places: <br />
*:* formula ( ~id ), good for variables in different roles ( y ~ log(x) + x2 | id)<br />
*:* interpreted in data: (id), good for single var but can use list : (list(x1,x2))<br />
*:*:Examples:<br />
*:*:* <br />
*:* fully reference: dd$id <br />
*Beware:<br />
* aggregate with a formula drops rows with NAs even though the FUN might be able to handle them<br />
* multiple barplot: http://rtricks.wordpress.com/2009/10/26/multbar-advanced-multiple-barplot-with-sem/<br />
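<br />
A minimal sketch of the date, regular-expression and aggregate points above. The data frame 'dd' and its variables are made up for illustration. Numeric date formats are used because %a and %b depend on the locale:<br />

```r
# Import: character -> Date, stating the input format explicitly
d <- as.Date("15-10-2019", format = "%d-%m-%Y")

# Export: Date -> character string in day-month-year order
out <- format(d, "%d-%m-%Y")

# sub() with a regular expression: reorder day-month-year to year-month-day
iso <- sub("^(\\d{2})-(\\d{2})-(\\d{4})$", "\\3-\\2-\\1", "15-10-2019")

# aggregate() with a formula uses na.action = na.omit by default, so a row
# with an NA in any variable is dropped before FUN is called
dd <- data.frame(g  = c("a", "a", "b", "b"),
                 y1 = c(1, 2, 3, 4),
                 y2 = c(10, NA, 30, 40))
agg_default <- aggregate(cbind(y1, y2) ~ g, data = dd, FUN = mean, na.rm = TRUE)

# na.action = na.pass keeps those rows and leaves the NAs to FUN
agg_pass <- aggregate(cbind(y1, y2) ~ g, data = dd, FUN = mean, na.rm = TRUE,
                      na.action = na.pass)
```

With the default, the mean of y1 in group 'a' uses only the first row because the second row has a missing y2. With na.pass both rows are used.<br />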
=== Add ===<br />
* Discussion of memory issues: what happens when you work on two computers<br />
=== Links ===<br />
* [http://www.r-bloggers.com/r-popularity-%E2%80%93-steady-growth-and-new-york-times/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+RBloggers+%28R+bloggers%29 R blog]<br />
* John Fox: ICPSR: 2010: [http://socserv.mcmaster.ca/jfox/Courses/R-course/Slides-handout.pdf Overview including slides on building R packages]<br />
* [http://www.icpsr.umich.edu/files/sumprog/biblio/2010/Fox.pdf Introduction to the R computing environment]<br />
* ICPSR 2011:<br />
** [http://socserv.mcmaster.ca/jfox/Courses/R/ICPSR/index.html Overall page]<br />
**<br />
<br />
== Notes for High School Talks ==<br />
=== Climate change ===<br />
* http://www.cbc.ca/news/technology/story/2011/09/09/pol-climate-adaptation.html<br />
<br />
== Excel techniques ==<br />
* Regular expressions and string substitution<br />
**<br />
* [http://www.contextures.com/xldataval08.html]<br />
<br />
== Ellipse Seminar ==<br />
[[/Ellipse Seminar]]<br />
== Setting up mathstat email in Thunderbird ==<br />
IMAP mailserver: mathstat.yorku.ca Port: 143 Security: STARTTLS<br />
<br />
Outgoing: mathstat.yorku.ca Port: 587 Security: none?<br />
<br />
== Statistical amusement ==<br />
* Statistical Song channel on youtube: http://www.youtube.com/user/StatisticalSongs<br />
** It had to be U ... on the SVD: http://www.youtube.com/StatisticalSongs#p/u/4/JEYLfIVvR9I<br />
** It don't mean a thing if you don't do modelling: http://www.youtube.com/user/StatisticalSongs#p/u/0/Jzm2hrEfNdY<br />
== On Careers in Statistics and Mathematics==<br />
*[https://sites.google.com/site/statsr4us/intro/2-the-joy-of-stats/statsfuture The Joy of Statistics]<br />
*[http://www.bbc.co.uk/news/business-14631547 How Mathematicians Rule the Markets: Quant Trading]<br />
<br />
== On Teaching Science ==<br />
<br />
* [http://www.nats.yorku.ca/index.shtml Natural Science Web Site]<br />
* [https://sites.google.com/site/changingourteaching/Home/tips-for-teaching-asistants Collection of links on teaching]<br />
<br />
<br />
=== A few videos ===<br />
* [http://www.youtube.com/watch?v=ccReLF6M62Y Why Teach Science by James Randi]<br />
* [http://www.youtube.com/watch?v=BlpyGhABXRA Teaching Introductory Physics]<br />
* [http://www.ted.com/talks/brian_goldman_doctors_make_mistakes_can_we_talk_about_that.html?utm_source=newsletter_weekly_2012-01-25&utm_campaign=newsletter_weekly&utm_medium=email Brian Goldman on learning from mistakes]<br />
<br />
== LOG ==<br />
* [[/a]]</div>Georgeshttp://scs.math.yorku.ca/index.php/User:GeorgesUser:Georges2019-03-12T20:19:34Z<p>Georges: </p>
<hr />
<div>__TOC__<br />
== Notes on Mixed Models ==<br />
* varPower(form = ~ fitted(.), fixed = 1)<br />
** The covariate given by 'form' is raised to the power given by 'fixed' (held at that value rather than estimated) to yield a value proportional to the SD, not the variance, of the response.<br />
** the default level for 'fitted' is the finest level. <br />
* http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DemingMi/2007-04-27.pdf<br />
== Paradoxes, Fallacies and Other Surprises ==<br />
[[Paradoxes, Fallacies and Other Surprises]]<br />
== Bayes ==<br />
* Two consecutive issues of Statistical Science in 2011 have many interesting articles related to Bayesian inference:<br />
** http://www.jstor.org.ezproxy.library.yorku.ca/stable/i23059127<br />
** http://www.jstor.org.ezproxy.library.yorku.ca/stable/i23059971<br />
* Experimenting with files:<br />
** jpg file<br />
*** Using the wiki link to the uploaded name: [[File:2013-12-29 18.20.34.jpg|thumb]]<br />
*** Using the wiki link to the uploaded name as media: [[Media:2013-12-29 18.20.34.jpg]]<br />
** .R file<br />
*** File wiki link [[File:Tcells.R]]<br />
*** Media wiki link [[Media:Tcells.R]]<br />
* [[Useful formulas]]<br />
* SEM with STAN<br />
** [https://groups.google.com/forum/#!topic/stan-users/dVjm8iES54k Forum discussion re slow convergence]<br />
* Interaction fallacy in a presentation:<br />
*:If you think two variables affect each other then you should include an interaction between them. (Fooled by the word 'interaction').<br />
*[http://www.stat.columbia.edu/~gelman/research/published/feller8.pdf Gelman and Robert on Bayes]<br />
*[http://en.wikipedia.org/wiki/Multivariate_adaptive_regression_splines MARS: Multivariate Adaptive Regression Splines]<br />
*[http://cran.r-project.org/web/packages/mosaic/vignettes/V2StartTeaching.pdf Teaching with R using MOSAIC by ... and D. Kaplan]<br />
*[http://vudlab.com/simpsons/ Causality: interactive app illustrating Simpson's Paradox]<br />
*[http://www.metafor-project.org/doku.php/tips:testing_factors_lincoms metafor: Tutorial using mixed models for meta analysis]<br />
* [http://andrewgelman.com/2006/06/11/survey_weights/ Andrew Gelman on survey weights with multilevel models]: he suggests unweighted modeling (or a 'variance weighted' analysis, e.g. replication weights) followed by poststratification. <br />
* [https://stat.duke.edu/courses/Fall11/sta101.02/labs/lab1.pdf Intro to R and Rstudio in an intro course at Duke]<br />
* [http://cran.r-project.org/doc/contrib/Torfs+Brauer-Short-R-Intro.pdf Intro to R in RStudio]<br />
* [http://www.crcpress.com/product/isbn/9781466515857 Multilevel Modeling Using R]<br />
* [http://cran.r-project.org/doc/contrib/Bliese_Multilevel.pdf Multilevel Modeling in R by Paul Bliese]<br />
* [http://blog.revolutionanalytics.com/2014/08/statistics-losing-ground-to-cs-losing-image-among-students.html Losing ground to CS?]<br />
* [https://www.youtube.com/watch?v=Is1Ej0Vj0Mw Interview with David Smith at UseR 2014]<br />
* [http://www.cookbook-r.com/Graphs/Colors_(ggplot2)/#mapping-variable-values-to-colors On colour]<br />
* [http://cran.r-project.org/web/packages/plot3D/vignettes/plot3D.pdf 3d plotting packages]<br />
* [http://blogs.sas.com/content/iml/2014/08/05/stiglers-seven-pillars-of-statistical-wisdom/ Stigler's Seven Pillars of Statistics]<br />
* [http://glmm.wikidot.com/faq#modelspec FAQ on GLMMs]<br />
* [http://www.math.uah.edu/stat/ Virtual Labs in Probability and Statistics]<br />
* [http://tryr.codeschool.com/ R code school with O'Reilly]<br />
* [http://cran.r-project.org/web/packages/pastecs/pastecs.pdf pastecs package for time series]<br />
* [http://www.bbc.com/news/magazine-28166019 Do doctors understand test results? By William Kremer -- about Gerd Gigerenzer]<br />
* [[/Multiple Testing -- a comment]]<br />
* [[/MOOCs for Data Science]]<br />
* [http://artssquared.wordpress.com/2012/03/21/letter-to-the-provost-and-vp-academic-professor-carl-amrhein/ Arts Squared]<br />
* [[/Using R Markdown]]<br />
* [[/Statistics Links for Courses]]<br />
* [[/Lee Lorch]]<br />
* [http://arxiv.org/pdf/1402.1894v1.pdf Baumer et al. (2014) Using R Markdown in Intro Stats]<br />
* [[/Job ratings]]<br />
* [http://www.stat.cmu.edu/~hseltman/PIER/Bayes/data/schools.R Using WinBUGS on the Netherlands data]<br />
* [[/Climate Change|/Climate Change]]<br />
* [[/Standardize or Not]]<br />
*[[/Mixed Models -- papers]]<br />
*[[/MCMCglmm]]<br />
*[[/Wiki tests]]<br />
<br />
== Cause, correlation, or ... ==<br />
*[http://p.nytimes.com/email/re?location=4z5Q7LhI+KUPcT7snurzN09anQA2MM49IhNWGFarU5GcvOIXzFz0cSuazUKJK97uTp6+uuRfTEhO+dnMZordtiC8Du17IY2zzXWzY7etdKdkq0H3sffQdSh+6YpVRsJcs8ZeVrfzShCpIEB5wXVC1g==&campaign_id=23&instance_id=45715&segment_id=62874&user_id=ecb65bd8a646c4ab6214f51d21246fce&regi_id=7495274 Instant noodles and metabolic syndrome]<br />
== Notepad ==<br />
* [http://www.r-bloggers.com/some-r-resources-for-glms/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+RBloggers+%28R+bloggers%29 R resources for GLMs]<br />
* [https://www.stat.auckland.ac.nz/~ihaka/120/Lectures/lecture17.pdf On Mosaic plots]<br />
* [[/Academic and Administrative Program Review]]<br />
* [[/Statistics programs]]<br />
* [http://www.yorku.ca/careers/gpse/2013/ Careers Expo at York] with 13 booths from UofT such as Dalla Lana but only one generic FGS booth from York.<br />
* [https://www.researchgate.net/publication/257516014_One_paradox_in_statistical_decision_making#! Schervish's p-value paradox]<br />
* [[/Data]]<br />
* [http://www.universityaffairs.ca/course-evaluations-the-good-the-bad-and-the-ugly.aspx Course evaluations: the good, the bad and the ugly]<br />
* Larry Wasserman on<br />
** [http://normaldeviate.wordpress.com/2013/04/27/the-perils-of-hypothesis-testing-again/ Perils of Hypothesis Testing]<br />
**[http://normaldeviate.wordpress.com/2013/04/13/data-science-the-end-of-statistics/ Data Science]<br />
**[http://normaldeviate.wordpress.com/2012/06/18/48/ Causality]<br />
*[[/HOA]]<br />
*[[/Big data]]<br />
*[[/Student satisfaction]]<br />
*[http://normaldeviate.wordpress.com/2012/12/21/guest-post-rob-tibshirani/ Rob Tibshirani's list of 9 great statistics papers]<br />
*[http://www.newyorker.com/online/blogs/johncassidy/2013/04/the-rogoff-and-reinhart-controversy-a-summing-up.html?mobify=0 Cassidy on the Reinhart-Rogoff controversy.]<br />
*[http://clinicaltrials.gov/ Clinical Trials registry in the US]<br />
*[http://en.wikipedia.org/wiki/Cochrane_Collaboration The Cochrane Collaboration]<br />
* 2004 ICMJE: policy of registration:<br />
<br />
__TOC__<br />
Recommended sources on statistics:<br />
<br />
There are many excellent sources for information on current statistical issues (Psychonomic Society Journals):<br />
<br />
* Confidence Intervals:<br />
** Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York, NY: Routledge/Taylor & Francis Group. (see www.latrobe.edu.au/psy/research/projects/esci).<br />
** Masson, M. E. J., & Loftus, G. R. (2003). Using confidence intervals for graphically based data interpretation. Canadian Journal of Experimental Psychology/Revue Canadienne de Psychologie Expérimentale, 57, 203-220. doi:10.1037/h0087426<br />
* Effect Size Estimates:<br />
** Ellis, P. D. (2010). The essential guide to effect sizes: Statistical power, meta-analysis and the interpretation of research results. Cambridge University Press. ISBN 978-0-521-14246-5.<br />
** Fritz, C. O., Morris, P. E., & Richler, J. J. (2011). Effect size estimates: Current use, calculations and interpretation. Journal of Experimental Psychology: General, 141, 2-18.<br />
** Grissom, R. J., & Kim, J. J. (2012). Effect sizes for research: Univariate and multivariate applications (2nd ed.). New York, NY: Routledge/Taylor & Francis Group.<br />
* Meta-analysis:<br />
** Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York, NY: Routledge/Taylor & Francis Group. (see www.latrobe.edu.au/psy/research/projects/esci).<br />
** Littell, J. H., Corcoran, J., & Pillai, V. (2008). Systematic reviews and meta-analysis. New York: Oxford University Press.<br />
* Bayesian Data Analysis:<br />
** Kruschke, J. K. (2011). Doing Bayesian data analysis: A tutorial with R and BUGS. San Diego, CA: Elsevier Academic Press. (See www.indiana.edu/~kruschke/DoingBayesianDataAnalysis/)<br />
** Kruschke, J. K. (in press). Bayesian estimation supersedes the t test. Journal of Experimental Psychology: General. (For a preprint see http://www.indiana.edu/~kruschke/BEST/BEST.pdf).<br />
* Power Analysis:<br />
** Faul, F., Erdfelder, E., Lang, A., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175-191. (See http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/)<br />
=== Blogs ===<br />
* [http://jeromyanglim.blogspot.ca Jeromy Anglim]<br />
<br />
== Pythagoras Diagram ==<br />
* [http://pages.uoregon.edu/stevensj/MRA/partial.pdf Venn diagram 'fallacy' example]<br />
<br />
==Recent changes==<br />
[[/Links|Links]]<br><br />
[[/Recent Changes]] [[/Contributions]]<br><br />
[[/DO]]<br />
==Topics==<br />
* [[/R packages]]<br />
* [[/Curriculum]]<br />
* [[/HLM links]]<br />
* [[/Education links]]<br />
* Death of Evidence<br />
** [http://www.nature.com/nature/journal/v483/n7387/full/483006a.html Article in Nature: Frozen Out, March 1, 2012]<br />
** [http://www.deathofevidence.ca/ Death of Evidence website]<br />
* [[/Mixed effects for multinomial responses]]<br />
* [[/Ellipse paper comments]]<br />
* On Tobacco (from Matt)<br />
** [http://io9.com/5899612/low+income-countries-are-a-cigarettes-best-friend Low income countries and tobacco]<br />
** [http://www.tobaccoatlas.org/uploads/Images/PDFs/Tobacco_Atlas_4_entire.pdf The Tobacco Atlas]<br />
*[[/fda.R]]<br />
*[[/FSE Scholars evening]]<br />
*[[/MATH 6627 student contributions]] <br />
:[http://scs.math.yorku.ca/index.php?title=Special:UserLogin&type=signup Create new account]<br />
:[http://scs.math.yorku.ca/index.php/SCS_2011:_Statistical_Analysis_and_Programming_with_R SCS R course]<br />
:[[/R packages]]<br />
:[[/SPIDA 20102 preparation]]<br />
__TOC__<br />
== Data scraping ==<br />
* [http://www.r-bloggers.com/preparing-public-data-for-analysis-with-r/ Example from Ministry of Transportation]<br />
== RStudio: Shiny ==<br />
* [http://www.premiersoccerstats.com/wordpress/?p=1273 On Shiny]<br />
* [http://demo.rapporter.net/?sport=ATH-170&weight=0 Rapporter.net]<br />
<br />
== Notes for 6643 ==<br />
* Assignment: Can we produce an estimate of AIC based just on the Wald test?<br />
<br />
== On Pedagogy ==<br />
* [http://www.guardian.co.uk/higher-education-network/blog/2012/oct/18/social-sciences-quantative-skills-training On the importance of quantitative skills in social science]<br />
* http://www.matstat.com/teach/<br />
* [http://www.ma.utexas.edu/users/mks/statmistakes/StatisticsMistakes.html Common Misteaks in Statistics]<br />
=== Advice for students ===<br />
* [http://www.universityaffairs.ca/how-to-ask-for-a-reference-letter.aspx How to ask for a reference letter]<br />
<br />
== Questions (e.g. for survey papers) ==<br />
* Implement more diagnostics in R for lme models<br />
* Explore duality of the whole data matrix<br />
* Extend the UD representation to hyperbola, etc., and include a way of plotting osculation loci<br />
* Explore the geometry of harmonic combinations and its implications for mixed model estimates. What happens as you shift weight from G to <math>(X'X)^{-1}</math>? How does the result wander outside the convex combination? When does it happen and what does it mean?<br />
* Refine Lform and related tools<br />
<br />
== Read ==<br />
* [http://www.ams.org/notices/201001/rtx100100030p.pdf Music: Broken Symmetry, Geometry, and Complexity]<br />
* https://sites.google.com/site/r4statistics/<br />
<br />
== R course ==<br />
* [https://github.com/hadley/devtools/wiki/ Hadley Wickham Advanced R Development]<br />
* [http://courses.had.co.nz/11-devtools/ Hadley Wickham's R development courses]<br />
Day 2 - add<br />
* final recap of 'lm' interface: subset, na.action, etc., etc.<br />
* discuss formula syntax<br />
* final recap of methods for 'lm'<br />
* note easy extension to 'glm', 'lme', etc.<br />
* note that many 'new' functions do not use this interface, only more 'mature' functions<br />
** lm.formula<br />
* discuss OO showing methods and dispatching<br />
Day 3<br />
* the most useful tools:<br />
** seq<br />
** rep<br />
** replacement functions<br />
* data input<br />
* more programming<br />
** object oriented programming<br />
** using a function in C<br />
* using attributes<br />
* systematic treatment of graphics, including<br />
** par<br />
** xyplot<br />
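The Day 3 items on 'seq', 'rep', replacement functions and object-oriented programming could be illustrated with a sketch along these lines. The names <tt>second<-</tt>, <tt>area</tt> and <tt>circ</tt> are made up for the illustration and are not part of the course materials:<br />
<pre><br />
# seq and rep combine to build regular patterns<br />
rep(seq(0, 1, by = 0.5), times = 2)<br />
<br />
# A replacement function: its name ends in '<-' and its last argument is 'value'<br />
"second<-" <- function(x, value) {<br />
  x[2] <- value<br />
  x<br />
}<br />
v <- 1:5<br />
second(v) <- 10L      # R evaluates this as: v <- "second<-"(v, value = 10L)<br />
<br />
# S3 dispatch: the generic calls UseMethod(), which selects a method by class<br />
area <- function(shape, ...) UseMethod("area")<br />
area.circle  <- function(shape, ...) pi * shape$r^2<br />
area.default <- function(shape, ...) stop("no 'area' method for this class")<br />
<br />
circ <- structure(list(r = 2), class = "circle")<br />
area(circ)            # dispatches to area.circle<br />
methods(area)         # lists the methods defined for the generic<br />
</pre><br />
The same mechanism is what makes <tt>summary</tt>, <tt>plot</tt> and <tt>predict</tt> behave differently for 'lm', 'glm' and 'lme' objects.<br />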
[[/Day2 Guided Tour of Linear Models.R]]<br />
<br />
== SCS Reads 2011 Links ==<br />
* [http://www.math.yorku.ca/people/georges/Files/Ellipse_Seminar/Visualizing_Regression-I-Simple-v2.pdf Notes on visualizing simple regression]<br />
* [http://www.math.yorku.ca/people/georges/Files/Ellipse_Seminar/Visualizing_Regression-II.r R script for multiple regression]<br />
== Capstone courses ==<br />
* [http://www.amstat.org/publications/jse/v9n1/spurrier.html Course at the U. of South Carolina, 2001]<br />
== Links to recent courses ==<br />
*[http://scs.math.yorku.ca/index.php/SCS_2011:_Mixed_Models_with_R SCS 2011: Mixed Models with R]<br />
*[http://statswiki.math.yorku.ca/index.php/SCS:_Mixed_Models_in_R SCS 2010]<br />
*[http://scs.math.yorku.ca/index.php/MATH_6627_2008-09_Practicum_in_Statistical_Consulting MATH 6627 2008-09]<br />
*[http://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting MATH 6627 2010-11]<br />
*[http://wiki.math.yorku.ca/index.php/SPIDA_2010:_Mixed_Models_with_R SPIDA 2010]<br />
*[http://wiki.math.yorku.ca/index.php/SPIDA_2009:_Mixed_Models_with_R SPIDA 2009]<br />
<br />
*[http://scs.math.yorku.ca/index.php/Spida Spida package]<br />
<br />
== Links to add somewhere ==<br />
* [http://www.ted.com/talks/ben_goldacre_battling_bad_science.html?utm_source=newsletter_weekly_2011-10-04&utm_campaign=newsletter_weekly&utm_medium=email Battling bad science]<br />
*D W Hosmer, S Taber and S Lemeshow () "The importance of assessing the fit of logistic regression models: a case study." ''American Journal of Public Health'', Vol. 81, Issue 12 1630-1635<br />
* [http://www.statmethods.net/ Quick R] for SPSS, SAS and Stata users.<br />
=== Graphics ===<br />
* [http://www.edwardtufte.com/tufte/ ET Modern]<br />
* Striking graphics:<br />
** [http://en.wikipedia.org/wiki/File:U.S._incarceration_rates_1925_onwards.png Incarceration rate in the United States]<br />
<br />
=== Matrices ===<br />
*[http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/intro.html Matrix Reference Manual]<br />
*[http://en.wikipedia.org/wiki/Matrix_determinant_lemma Matrix determinant lemma]<br />
*[http://en.wikipedia.org/wiki/Woodbury_matrix_identity Woodbury Matrix Identity]<br />
<br />
=== Simpson's Paradox ===<br />
In the [http://en.wikipedia.org/wiki/Canadian_federal_election,_1979 1979 Canadian federal election] <br />
an unusual event occurred in the Northwest Territories: the Liberals won the popular vote in the territory, but won [http://en.wikipedia.org/wiki/Canadian_federal_election,_1979#National_results neither seat.]<br />
<br />
=== Lee Lorch ===<br />
* [http://aer.sagepub.com/content/36/4/739.abstract Marybeth Gasman (1999)] "Scylla and Charybdis: Navigating the Waters of Academic Freedom at Fisk University During Charles S. Johnson's Administration (1946–1956)" ''American Educational Research Journal''<br />
*: A prominent sociologist and race relations activist, Charles S. Johnson dedicated his life to the advancement of Blacks. His presidency at Fisk University, a historically Black college, was the culmination of his career. During the latter part of his administration, he faced a dilemma involving an outspoken professor named Lee Lorch, who, in 1954, was accused of being a communist. Johnson and the Board of Trustees dismissed Lorch because he refused to answer a congressional committee's questions about his previous political affiliations. In 1959, the American Association of University Professors found the late President Johnson guilty of violating the principles of academic freedom. This article explores the ways in which academic freedom, civil liberties, and civil rights clashed in the Lee Lorch case. Furthermore, it examines the ways in which the setting of a historically Black college alters traditional assumptions about the application of these principles.<br />
* [http://www.nytimes.com/2010/11/22/nyregion/22stuyvesant.html Charles V. Bagli (November 21, 2010)], "A New Light on a Fight to Integrate Stuyvesant Town", ''New York Times''.<br />
<br />
== Multilevel Models ==<br />
<br />
=== Expository ===<br />
* [http://glmm.wikidot.com/faq R FAQ]<br />
*[https://perswww.kuleuven.be/~u0018341/documents/ldafda.pdf Verbeke and Molenberghs (2005): Longitudinal Data Analysis Notes]<br />
=== Missing Data ===<br />
* [http://cran.r-project.org/web/packages/Amelia/vignettes/amelia.pdf King et al. (2012) Amelia II]<br />
<br />
=== Evaluation ===<br />
* Green, M.J., Medley G.F., & Browne, W.J. (2009). Use of posterior predictive assessments to evaluate model fit in multilevel logistic regression. Veterinary Research, 40(4):30. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2675184/pdf/vetres-40-30.pdf<br />
<br />
=== Software for multilevel models ===<br />
{| class="wikitable"<br />
|-<br />
! Package<br />
! Function<br />
! Notes<br />
|-<br />
| R: clmm {ordinal}<br />
| Ordinal response: Fits cumulative link mixed models, i.e. cumulative link models with random effects via the Laplace approximation or the standard and the adaptive Gauss-Hermite quadrature approximation. The functionality in clm is also implemented here. Currently only a single random term is allowed in the location-part of the model.<br />
| <br />
|-<br />
| R: {lme4a}<br />
| Development version of lme4<br />
Download: svn checkout svn://svn.r-forge.r-project.org/svnroot/lme4<br />
<br />
|<br />
|-<br />
|R: {MCMCglmm}<br />
|MCMC Methods for Multi-response Generalized Linear Mixed Models<br />
| <br />
|-<br />
|R: {plm}<br />
|Econometric Analysis of Panel Survey Data<br />
|[http://cran.r-project.org/web/packages/plm/vignettes/plm.pdf Vignette]<br><br />
See p. 3 for comments on first-differencing.<br />
|-<br />
|<br />
|See Snijders and Bosker (2012) for longer list<br />
|<br />
|-<br />
|R: {lme4:nlmm}<br />
|Non-linear models with lme4<br />
|[http://lme4.r-forge.r-project.org/slides/2011-01-11-Madison/6NLMMH.pdf Presentation by Doug Bates]<br><br />
|}<br />
<br />
== Clones ==<br />
Check for changes and reconcile<br />
* Lab 1<br />
** [[SCS_2011:_Mixed_Models_with_R/Lab_1]]<br />
** [[MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Lab_1]]<br />
<br />
== Read ==<br />
On the age-period-cohort problem:<br />
* see bibliography by Yang: http://home.uchicago.edu/~yangy/research.html<br />
<br />
== Do ==<br />
*[[/spida|spida to do list]]<br />
*[[/p3d|p3d to do list]]<br />
== Read ==<br />
* [http://www.stat.berkeley.edu/~freedman/ Links to recent papers by David Freedman]<br />
* Links to material by Chris Wild:<br />
** http://www.stat.auckland.ac.nz/~wild/StatThink/<br />
** http://www.stat.auckland.ac.nz/showperson.php?uid=wild<br />
* [http://www3.hku.hk/statistics/staff/kaing/ Kai Ng's converse]<br />
<br />
== Notes ==<br />
* [http://www.chrp.org/love/ASACleveland2003Propensity.pdf Good presentation on use of propensity scores]<br />
* [http://sportsillustrated.cnn.com/2011/writers/scorecasting/03/24/simpson-paradox/index.html?eref=sihp Simpson's Paradox]<br />
<br />
== R notes ==<br />
* [http://strimmerlab.org/notes/fdr.html False Discovery Rates in R]<br />
=== Items to cover ===<br />
* Wrap up language:<br />
** Selection (give context): indices: index, names, logical, matrix of coordinates, 'subset'<br />
*** Example: dropping NAs from selected variables. Necessary because functions that are most sophisticated methodologically are generally least sophisticated in their interface<br />
**** contrast sophisticated program: lm with unsophisticated lowess<br />
* Using variables in data frames: <br />
** formula oriented functions: xyplot( y ~ x, data = dd )<br />
** explicit: plot( dd$x, dd$y )<br />
** with: with( dd, plot(x,y)); with( dd, xyplot( y ~ x))<br />
** attach: As usual the easiest is deprecated! (why is it only easy and pleasurable things that are ever deprecated)<br />
**: <tt> attach(dd) </tt><br />
**: <tt> plot(x, y) </tt><br />
**: <tt> detach(dd) </tt><br />
*** Problem with 'attach': <br />
**** names in data frame may be masked by names in workspace<br />
**** assignments in workspace not saved in data frame<br />
* Overview of graphics<br />
** Link to http://addictedtor.free.fr/graphiques/<br />
* Programming structures<br />
* Add to graphics:<br />
*:Colours: <tt>pal(grepv('red',colors())); pals() # for all</tt><br />
*:modified tablemissing<br />
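A small sketch of the selection methods and the 'attach' masking problem listed above, using a made-up data frame <tt>dd</tt> (the variable names are assumptions for the illustration):<br />
<pre><br />
dd <- data.frame(x = c(1, 2, NA, 4), y = c(2, NA, 6, 8), g = c("a", "b", "a", "b"))<br />
<br />
dd[2, ]                                  # by index<br />
dd[, "x"]                                # by name<br />
dd[dd$g == "a", ]                        # by logical vector<br />
m <- as.matrix(dd[, c("x", "y")])<br />
m[cbind(c(1, 4), c(1, 2))]               # by matrix of coordinates: m[1,1] and m[4,2]<br />
subset(dd, g == "b", select = c(x, y))   # with 'subset'<br />
<br />
# Dropping NAs from selected variables only (here x and y)<br />
ok <- complete.cases(dd[, c("x", "y")])<br />
dd.ok <- dd[ok, ]<br />
<br />
# Masking with attach(): a workspace 'x' hides the data frame's 'x'<br />
x <- c(100, 200)<br />
attach(dd)<br />
masked.mean <- mean(x)                   # uses the workspace x, not dd$x<br />
detach(dd)<br />
</pre><br />
R prints a message about the masked object when <tt>attach(dd)</tt> runs, which is easy to overlook in a long session.<br />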
==== debugging in R ====<br />
* http://www.stats.uwo.ca/faculty/murdoch/software/debuggingR/debug.shtml<br />
<br />
=== Links ===<br />
*[http://www.stats.ox.ac.uk/pub/MASS4/ MASS 4th ed.] [http://www.stats.ox.ac.uk/pub/MASS4/#Exercises Exercises]<br />
<br />
=== Importing files ===<br />
==== From Excel ====<br />
* Easy: save file in Excel as .csv, then read into R with read.csv<br />
* If you have a lot of files, or get the files from some other source that edits .xls or .xlsx files:<br />
* The winner: package gdata: <br />
** First install perl. <br />
** read.xls in gdata handles both .xls and .xlsx files<br />
** works on both 32-bit and 64-bit machines<br />
* package XLConnect seems to work only on xlsx files <br />
* the smaller xlsx package also works only on xlsx files<br />
* Package xlsReadWrite works on xls files but only on 32-bit systems<br />
* Use xls2csv, a Perl script to convert files to csv first.<br />
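The 'easy' route can be sketched with base R alone. The file below is a temporary stand-in for a sheet saved from Excel as .csv; its contents and column names are invented for the illustration:<br />
<pre><br />
# Stand-in for a file saved from Excel as .csv (hypothetical contents)<br />
csv.file <- tempfile(fileext = ".csv")<br />
writeLines(c("id,score,group", "1,72.5,A", "2,,B", "3,88,A"), csv.file)<br />
<br />
dd <- read.csv(csv.file, stringsAsFactors = FALSE)<br />
str(dd)               # the empty numeric cell is read as NA<br />
<br />
# Many files: read each one and stack them (assumes identical columns)<br />
files <- c(csv.file, csv.file)<br />
all.dd <- do.call(rbind, lapply(files, read.csv, stringsAsFactors = FALSE))<br />
</pre><br />
In practice <tt>files</tt> would come from something like <tt>list.files(pattern = "\\.csv$")</tt>.<br />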
<br />
=== Getting lines vs points for different groups in xyplot ===<br />
Ideally, type = c('l','p') would work but it doesn't seem to. So one way is to use type = 'b' with an invisible line for one group and an invisible point for the other:<br />
<br />
library(spida.beta) # also loads 'car'<br />
dd <- Prestige<br />
dd$income.pred <- predict( lm( income ~ education*type, dd), newdata = dd)<br />
td( lty = c(1,0), pch = c(32, 16), lwd = 2) <br />
# lty = 0 produces an invisible line<br />
# and pch = 32 seems to be an invisible point<br />
xyplot( income.pred + income ~ education|type, dd[order(dd$education),], type = 'b',<br />
auto.key = list( columns = 2, lines = T, points = T))<br />
<br />
Also show example using panel.superpose.2<br />
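One possible version of the panel.superpose.2 example, using simulated data and only the lattice package so that it does not depend on spida.beta. In lattice, <tt>panel.superpose.2</tt> distributes the elements of <tt>type</tt> over the groups, so the first series is drawn as a line and the second as points:<br />
<pre><br />
library(lattice)<br />
<br />
set.seed(1)<br />
dd <- data.frame(x = rep(1:10, 2), g = rep(c("A", "B"), each = 10))<br />
dd$y <- 2 + 0.5 * dd$x + rnorm(20)<br />
dd$y.pred <- predict(lm(y ~ x * g, dd))<br />
<br />
# type = c('l','p') is distributed over the two series:<br />
# a line for y.pred, points for y<br />
p <- xyplot(y.pred + y ~ x | g, dd,<br />
            panel = panel.superpose.2, type = c('l', 'p'),<br />
            auto.key = list(columns = 2, lines = TRUE, points = TRUE))<br />
print(p)<br />
</pre><br />
Passing <tt>distribute.type = TRUE</tt> to the default panel function should have the same effect. The key still shows both a line and a point for each series unless the settings are changed as in the 'td' example above.<br />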
=== Bugs ===<br />
<pre><br />
grade <- function(x ,<br />
cos = c(-Inf,40,50,55,60,65,70,75,80,90,Inf) - 0,<br />
grade = c("F","E","D","D+","C","C+","B","B+","A","A+")) {<br />
factor(cut(x, cos, grade, right = FALSE), levels = grade)<br />
}<br />
dg$Grade <- grade( dg$Final )<br />
tab(dg, ~ Grade)<br />
# gets indexing of levels wrong<br />
# the following seems to work correctly<br />
grade <- function(x ,<br />
cos = c(-Inf,40,50,55,60,65,70,75,80,90,Inf) - 0,<br />
grade = c("F","E","D","D+","C","C+","B","B+","A","A+")) {<br />
ret <- cut(x, cos, grade, right = FALSE)<br />
factor(ret, levels = grade)<br />
}<br />
</pre> <br />
==== Getting the G matrix in nlme ====<br />
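 fit <- lme( y ~ x, dd, random = ~1+x |id)<br />
 # reStruct stores the random-effects variance relative to sigma^2, so rescale<br />
 G <- pdMatrix( fit$modelStruct$reStruct)$id * fit$sigma^2<br />
 # getVarCov( fit ) should give the same matrix<br />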
==== Building R packages in 2.14 ====<br />
# Install R<br />
# Install tools: http://robjhyndman.com/researchtips/building-r-packages-for-windows/<br />
<br />
# Info: http://cran.r-project.org/doc/contrib/Graves+DoraiRaj-RPackageDevelopment.pdf<br />
<br />
==== Notes ====<br />
* [http://ipsur.r-forge.r-project.org/book/ IPSUR: Introduction to Probability and Statistics using R]<br />
* [[/test of slash]]<br />
* [[/schedule]]<br />
* [http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=72 Addicted to R Graph Gallery]<br />
* [http://rwiki.sciviews.org/doku.php R Wiki]<br />
* [http://rwiki.sciviews.org/doku.php?id=guides:demos:stata_demo_with_r Stata demo in R]<br />
<br />
== Thumbnail test ==<br />
Here is a graphic file in raw form:<br />
<br />
[[File:UN-missing1.jpg]]<br />
<br />
And here is the same file with a thumbnail:<br />
[[File:UN-missing1.jpg|thumb]]<br />
<br />
== Math check ==<br />
<br />
<font color="red">'''Please click on the 'discussion' tab above'''</font><br />
<br />
:Test how math renders:<br />
<br />
:<math>\begin{align}<br />
f(x) & = (a+b)^2 \\<br />
& = a^2+2ab+b^2 \\<br />
\end{align}</math><br />
<br />
<math> x \perp y </math><br />
<br />
== glmmPQL etc ==<br />
Good discussion between Doug and Ben: https://stat.ethz.ch/pipermail/r-sig-mixed-models/2008q4/001457.html<br />
== Combining unbiased estimators ==<br />
This is an example:<br />
* bullet 1<br />
* bullet 2<br />
** again<br />
**: indented <br />
* bullet3<br />
numbered bullets:<br />
# one<br />
# two<br />
## dkjkdj<br />
##* djkdj<br />
#* djfkd<br />
=== subheading ===<br />
new stuff<br />
==== sub sub ====<br />
more stuff<br />
<br />
<br />
<br />
Let <math>{{\hat{\phi }}_{1}}</math> and <math>{{\hat{\phi }}_{2}}</math> be unbiased estimators of <math>\phi \in {{\mathbb{R}}^{k}}</math> with non-singular variances <math>{{V}_{1}}</math> and <math>{{V}_{2}}</math> respectively.<br />
<br />
Then the minimum variance linear unbiased estimator of<br />
<math>\phi </math> is obtained by combining <math>{{\hat{\phi }}_{1}}</math> and <math>{{\hat{\phi }}_{2}}</math> using weights that are proportional to the inverses of their variances. The result can be expressed in a variety of ways:<br />
<br />
<math>\begin{align}<br />
\hat{\phi } &= {{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}\left( V_{1}^{-1}{{{\hat{\phi }}}_{1}}+V_{2}^{-1}{{{\hat{\phi }}}_{2}} \right) \\ <br />
 & = {{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}\left( V_{1}^{-1}{{{\hat{\phi }}}_{1}}+V_{2}^{-1}{{{\hat{\phi }}}_{2}} \right)+ \left[ {{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}V_{2}^{-1}{{{\hat{\phi }}}_{1}}-{{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}V_{2}^{-1}{{{\hat{\phi }}}_{1}} \right] \\ <br />
& = {{{\hat{\phi }}}_{1}}+ {{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}V_{2}^{-1}\left( {{{\hat{\phi }}}_{2}}-{{{\hat{\phi }}}_{1}} \right) \\ <br />
& = {{{\hat{\phi }}}_{1}}+ {{\left( I+{{V}_{2}}V_{1}^{-1} \right)}^{-1}}\left( {{{\hat{\phi }}}_{2}}-{{{\hat{\phi }}}_{1}} \right) \\ <br />
& = {{\left( I+{{V}_{1}}V_{2}^{-1} \right)}^{-1}}\left( {{{\hat{\phi }}}_{1}}+{{V}_{1}}V_{2}^{-1}{{{\hat{\phi }}}_{2}} \right) <br />
\end{align}</math><br />
The proof is an application of the principle of Generalized Least-Squares. The problem can be formulated as a GLS problem by considering that:<br />
<math>\left[ \begin{matrix}<br />
{{{\hat{\phi }}}_{1}} \\<br />
{{{\hat{\phi }}}_{2}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
I \\<br />
I \\<br />
\end{matrix} \right]\phi +\left[ \begin{matrix}<br />
 {{\varepsilon }_{1}} \\<br />
 {{\varepsilon }_{2}} \\<br />
\end{matrix} \right]</math> with <math>\operatorname{Var}\left( \left[ \begin{matrix}<br />
 {{\varepsilon }_{1}} \\<br />
 {{\varepsilon }_{2}} \\<br />
\end{matrix} \right] \right)=\left[ \begin{matrix}<br />
{{V}_{1}} & 0 \\<br />
0 & {{V}_{2}} \\<br />
\end{matrix} \right]</math><br />
<br />
Applying the GLS formula yields:<br />
<math>\begin{align}<br />
\hat{\phi } & ={{\left( {{\left[ \begin{matrix}<br />
I \\<br />
I \\<br />
\end{matrix} \right]}^{\prime }}{{\left[ \begin{matrix}<br />
{{V}_{1}} & 0 \\<br />
0 & {{V}_{2}} \\<br />
\end{matrix} \right]}^{-1}}\left[ \begin{matrix}<br />
I \\<br />
I \\<br />
\end{matrix} \right] \right)}^{-1}}{{\left[ \begin{matrix}<br />
I \\<br />
I \\<br />
\end{matrix} \right]}^{\prime }}{{\left[ \begin{matrix}<br />
{{V}_{1}} & 0 \\<br />
0 & {{V}_{2}} \\<br />
\end{matrix} \right]}^{-1}}\left[ \begin{matrix}<br />
{{{\hat{\phi }}}_{1}} \\<br />
{{{\hat{\phi }}}_{2}} \\<br />
\end{matrix} \right] \\ <br />
& ={{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}\left( V_{1}^{-1}{{{\hat{\phi }}}_{1}}+V_{2}^{-1}{{{\hat{\phi }}}_{2}} \right) <br />
\end{align}</math><br />
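<br />
The equivalence of the expressions above can be checked numerically. The following is only a sketch with randomly generated positive-definite variances. The helper <tt>rpd</tt> and the dimension <tt>k = 3</tt> are arbitrary choices for the illustration:<br />
<pre><br />
set.seed(42)<br />
k <- 3<br />
rpd <- function(k) crossprod(matrix(rnorm(k * k), k)) + diag(k)  # random positive-definite matrix<br />
V1 <- rpd(k); V2 <- rpd(k)<br />
phi1 <- rnorm(k); phi2 <- rnorm(k)<br />
I <- diag(k)<br />
<br />
W  <- solve(solve(V1) + solve(V2))                 # (V1^-1 + V2^-1)^-1<br />
f1 <- W %*% (solve(V1) %*% phi1 + solve(V2) %*% phi2)<br />
f2 <- phi1 + W %*% solve(V2) %*% (phi2 - phi1)<br />
f3 <- phi1 + solve(I + V2 %*% solve(V1)) %*% (phi2 - phi1)<br />
f4 <- solve(I + V1 %*% solve(V2)) %*% (phi1 + V1 %*% solve(V2) %*% phi2)<br />
<br />
# GLS form: stack the two estimators and use a block-diagonal variance<br />
X <- rbind(I, I)<br />
V <- rbind(cbind(V1, 0 * I), cbind(0 * I, V2))<br />
y <- c(phi1, phi2)<br />
gls <- solve(t(X) %*% solve(V) %*% X, t(X) %*% solve(V) %*% y)<br />
<br />
all.equal(f1, f2); all.equal(f1, f3); all.equal(f1, f4); all.equal(f1, gls)<br />
</pre><br />
By the usual GLS result, <tt>W</tt> is also the variance of the combined estimator.<br />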
<br />
== From Nassif Ghoussoub ==<br />
Beware the “useful idiocy” of Mr. Morgan<br />
<br />
The latest commentary of Gwyn Morgan in the Globe and Mail, “If universities were in business, they’d be out of business”, http://www.theglobeandmail.com/report-on-business/commentary/gwyn-morgan/ has crossed another line. Far from being an analysis of the state of Canadian universities, his rant is personal, bitter, demeaning, and insulting to university professors across the country.<br />
<br />
Back in his Globe article of April 29, 2009, “Not all research deserves public funding“, the retired CEO of EnCana Corp. proceeded to rip into the “ivory towers of academia”, attack “esoteric research” and disparage any graduate degree not hailing from medicine or engineering. He also dismissed the 2300 scientists who joined the “Don’t Leave Canada Behind” campaign, which called on the government to include R&D, the lifeblood of the new economy, in its stimulus budget. To its credit, the government of Canada responded positively to the call of its scientists, but from his -dubiously earned- platform at the Globe and Mail, Mr. Morgan kept at it.<br />
<br />
In Saturday’s paper, Mr. Morgan employs sweeping generalizations and ghost statistics to come to the conclusion that, among other things, Canada’s university professors are “poorly prepared” for their lectures, “show up occasionally” to class, and give “poorly thought out assignments”. He claims “the reaction of universities to widespread student dissatisfaction is to blame insufficient financing, rather than their own dysfunction”. He offers that in the new age, formal lectures should be altogether ended.<br />
<br />
His commentary provides neither data about student learning, nor any direct quotations from professors or students. A 1991 study is cited, and then baptized as the truth with a simple "Nineteen years later, little has changed." The article does not attack a particular university, faculty, or teaching method, but rather an apparently archetypal "university professor".<br />
<br />
So what if he hasn't been on campus in 40 years? He knows how it is. Even then, he "stopped going to classes and dedicated his time to learning from textbooks and reviewing friends’ notes". But Mr. Morgan ignores that a professor somewhere, sometime, must have produced and dictated these textbooks and notes. He finds "no reason why all written course material can’t be delivered via the Internet", obviously not aware that since the 90’s, most course material has been made available on the Internet, thanks to dedicated professors. Morgan's suggestion that we replace large classes with "small informal discussions" sounds great, but how does the CEO propose we pay for the much larger number of professors required to do the job? He wants universities to run like businesses, but as one reader suggested: “If Universities were run like the oil and gas industry we would be back in the dark ages where the only skill required would be to count your money... at least until the oil runs out”.<br />
<br />
It is obvious that we embattled post-secondary teachers and researchers need to worry more about the “very useful idiocy” of Mr. Morgan, the permanent platform he has been provided, and the damage that drivel like this can cause to higher education and advanced research in Canada.<br />
<br />
For the Globe and Mail, Mr. Morgan has been pure comic gold for years. Writing on a variety of subjects, ranging from environmental issues and health care, to research and post-secondary education, he has been a bottomless trove of shameless misrepresentations, extreme views and sheer wackiness. But ultimately, this is not only about Gwyn Morgan nor about the Globe and Mail. It is about us.<br />
<br />
It is about Canada’s University Presidents countering his dangerous Tea Party style rhetoric on our post-secondary institutions.<br />
<br />
It is about the Deans of Canada’s Faculties facing up to Mr. Morgan when he writes: “Many qualified applicants are turned away from areas such as engineering and medicine, while universities continue to graduate thousands with knowledge that is neither useful in getting a job, nor in helping our country succeed in a competitive world.”<br />
<br />
It is about the Royal Society of Canada, and other learned societies responding to his views about “esoteric research that doesn’t have the slightest chance of yielding any real value”.<br />
<br />
It is also up to our schools of journalism, to point out to mainstream media the irresponsibility in printing shallow, empty articles full of generalizations and devoid of facts.<br />
<br />
Mr. Morgan may be one of those individuals who get so many things wrong at once that the thought of challenging them or setting the record straight is just too daunting. But it is incumbent upon us not to let his rhetoric negate the exemplary contributions of thousands of Canada’s scholars, teachers and researchers.<br />
<br />
Nassif Ghoussoub, Professor of Mathematics, The University of British Columbia<br />
<br />
== Cell Phones ==<br />
<br />
Date: Sun, 10 Oct 2010 20:02:29 -0400<br />
From: Stuart Newman <newman@NYMC.EDU><br />
Reply-To: Science for the People Discussion List<br />
<SCIENCE-FOR-THE-PEOPLE@LIST.UVM.EDU><br />
To: SCIENCE-FOR-THE-PEOPLE@LIST.UVM.EDU<br />
Subject: "Disconnect": Why cellphones may be killing us<br />
<br />
Though I haven't yet read it, this book is presumably not based on<br />
anecdotal evidence. The author, Devra Davis, is the founding director<br />
of the toxicology and environmental studies board at the U.S. National<br />
Academy of Sciences.<br />
<br />
http://tinyurl.com/2fvycxc [Salon.com]<br />
<br />
"Disconnect": Why cellphones may be killing us<br />
A new book probes the connection between mobile devices and a host<br />
of health problems -- with frightening results<br />
By Thomas Rogers<br />
<br />
== Links ==<br />
* [http://www.ted.com/talks/hans_rosling_the_good_news_of_the_decade.html?utm_source=newsletter_weekly_2010-10-12&utm_campaign=newsletter_weekly&utm_medium=email 2010 TED talk by Hans Rosling]<br />
* [http://en.wikipedia.org/wiki/Apophenia Apophenia]<br />
* [http://en.wikipedia.org/wiki/Pareidolia Pareidolia]<br />
<br />
== Notes on mediation ==<br />
The question of mediation is essentially a question about causality. Is the putative mediator, M say, caused by X and, in turn, a cause of Y? But M, in a mediational analysis, cannot have been randomized even if X has been. The question of mediation is essentially a question about causality with observational, not experimental, data.<br />
<br />
To get a perspective on the problem we need to start by considering the general problem of causality with observational data. Let Y be the response variable and let X be the 'target' variable which is seen as a possible 'cause' of Y. For X to cause Y means that the expected value of Y would change in some target experimental condition in which X was manipulated (perhaps through random allocation) while other variables were left untouched -- not necessarily unchanged.<br />
<br />
For causal inference with observational data, we are interested in what would happen under circumstances that are different from those we have actually observed. Our analysis of our observational data will yield an accurate estimate of the causal effect of X if the model for the observational data has the same coefficient for X as it would have if it were applied to data gathered under the target experimental condition. The challenge is to specify and estimate a model that is 'transferable' from the observational condition to the experimental condition. We need a set of concepts to help us critically assess whether a model is transferable. It is not sufficient to have a model that 'fits' well. It may be necessary to include potential confounding factors even if they are not significant in the prediction model for Y. And it may be necessary to exclude strong predictors that are potential mediators -- variables that must not be held constant as one examines the causal relationship between X and Y. One needs a good understanding of the causal model that is valid under experimental conditions in order to properly specify a transferable observational model.<br />
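<br />
A simulated illustration of the last two points, in which including a confounder matters and holding a mediator constant changes what is estimated. The data-generating coefficients are invented for the sketch, with Z as a confounder and M as a mediator:<br />
<pre><br />
set.seed(123)<br />
n <- 10000<br />
Z <- rnorm(n)                        # confounder: affects both X and Y<br />
X <- 0.8 * Z + rnorm(n)<br />
M <- 0.5 * X + rnorm(n)              # mediator: on the causal path from X to Y<br />
Y <- 1.0 * X + 1.0 * M + 1.0 * Z + rnorm(n)<br />
# total causal effect of X on Y: 1.0 + 0.5 * 1.0 = 1.5<br />
<br />
b.none <- coef(lm(Y ~ X))["X"]           # omits the confounder: biased upward<br />
b.conf <- coef(lm(Y ~ X + Z))["X"]       # adjusts for Z only: close to 1.5<br />
b.med  <- coef(lm(Y ~ X + Z + M))["X"]   # also holds M constant: close to 1.0, the direct effect only<br />
round(c(b.none, b.conf, b.med), 2)<br />
</pre><br />
All three models may fit well. Only the second has a coefficient for X that transfers to an experiment manipulating X.<br />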
<br />
The problem can be approached in a surprisingly different way, which is the basis for propensity scores. Instead of focusing on a 'transferable' model for Y, one focuses on a model for the assignment of the target causal variable X using potential confounding variables. As in models for Y, it is important to avoid potential mediators between X and Y. However, the model for X based on confounding factors is a prediction model. Confounding variables may be included, raw or transformed, as long as they are predictive of X. It is not necessary to include variables that are not predictive of X. The criterion for developing the model is statistical fit, a criterion that -- apart from the actual selection of confounding predictors -- is empirical, i.e. it is based on the analysis of the data at hand without reference to external theory that is not verifiable with the data. The assignment model need only be valid for the observational condition. Its validity for the experimental condition is irrelevant.<br />
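<br />
A minimal sketch of the assignment-model approach, assuming a binary X, a single confounder Z and inverse-probability weighting as the way of using the propensity score. None of these choices come from the notes above. Matching or stratification on the score would be alternatives:<br />
<pre><br />
set.seed(2024)<br />
n <- 20000<br />
Z <- rnorm(n)                                 # confounder<br />
X <- rbinom(n, 1, plogis(-0.5 + 1.0 * Z))     # assignment depends on Z<br />
Y <- 2.0 * X + 1.5 * Z + rnorm(n)             # simulated causal effect of X is 2<br />
<br />
# Assignment (propensity) model: a prediction model for X, judged by fit<br />
ps.fit <- glm(X ~ Z, family = binomial)<br />
ps <- fitted(ps.fit)<br />
<br />
naive <- mean(Y[X == 1]) - mean(Y[X == 0])    # confounded comparison<br />
w <- ifelse(X == 1, 1 / ps, 1 / (1 - ps))     # inverse-probability weights<br />
ipw <- weighted.mean(Y[X == 1], w[X == 1]) -<br />
       weighted.mean(Y[X == 0], w[X == 0])<br />
round(c(naive = naive, ipw = ipw), 2)<br />
</pre><br />
No model for Y is specified anywhere in the sketch. The weighted comparison relies only on the model for X.<br />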
<br />
What are some of the pros and cons of the two approaches? A good transferable model for Y may provide more precise estimates of the effect of X because more of the variability in Y is accounted for in the model. On the other hand, the validity of a causal estimate based on the propensity score approach depends on assumptions that may be much easier to sustain than those required for the approach based on modeling Y. Broadly, the propensity score approach offers lower bias but not necessarily lower variability. Note that the two approaches are not mutually exclusive. They may be better viewed as two sets of concepts that could be combined in an analysis that draws from both.<br />
<br />
How does this all relate to the analysis of mediation? The Baron and Kenny approach and its variants -- in which I include the various ways of estimating direct and indirect causal effects -- are all based on methods analogous to models for Y. As mentioned earlier, estimating the causal effect of the mediator involves causal inference with observational data -- even in the context of an experiment randomizing X. This invites the question whether propensity score methods could be used in assessing more accurately the causal effect of M. The answer lies in the relatively recent theory of principal stratification [Constantine Frangakis and Donald Rubin (2002) "Principal Stratification in Causal Inference", ''Biometrics'', '''58''', 21--29].<br />
<br />
An accessible reference for the concepts behind propensity scores is Donald Rubin (1997) "Estimating Causal Effects from Large Data Sets Using Propensity Scores," ''Annals of Internal Medicine'', '''127''', 757--763.<br />
<br />
A recent treatment of mediation using principal stratification is given in Chapter 8, "Intermediate Causal Factors," of Herbert Weisberg (2010) ''Bias and Causation: Models and Judgment for Valid Comparisons'', Wiley.<br />
<br />
With the large number of seemingly competing approaches to causal inference, students as well as experienced researchers may feel quite puzzled as to which approach they should use. The answer, possibly, is all of them. Each approach seems to shed light on some aspect of the challenge of causal inference in the absence of pristine randomization. They do not offer recipes so much as sets of concepts that can be applied to help understand research projects and analyses.<br />
<br />
== Shock of the New ==<br />
[http://en.wikipedia.org/wiki/Robert_Hughes_(critic) Robert Hughes (1980)]<br />
* [http://www.youtube.com/watch?v=GFn4UmkBcaQ Surrealism Part 1]<br />
* [http://www.youtube.com/watch?v=2uaA8CfZKRs Surrealism Part 2]<br />
* [http://www.youtube.com/watch?v=dActaAa-teM Surrealism Part 3]<br />
* [http://www.youtube.com/watch?v=ZLmCh0xw4h0 Surrealism Part 4]<br />
* [http://www.youtube.com/watch?v=s-SsWPNNBC4 Surrealism Part 5]<br />
== York AODA ==<br />
* [http://aoda.yorku.ca/cs/interactive/en/# AODA training]<br />
== Notes for NATS 1500 ==<br />
* Topics<br />
** single-sex schools?<br />
<br />
== Notes for MATH 6627 ==<br />
* [http://mediamatters.org/research/2012/10/01/a-history-of-dishonest-fox-charts/190225 Collection of misleading graphs]<br />
* [http://www.amstat.org/sections/cnsl/BooksJournals.cfm ASA consulting page]<br />
* set up student home page<br />
* first assignment. Find and explore a dataset using<br />
** Ernest Kwan's correlagram<br />
** Lattice (use panels and groups)<br />
** p3d<br />
** gapminder<br />
** should have included candisc<br />
* give a 15-minute (crucial) presentation on the data set and on the method<br />
* prepare a wiki page with links and materials <br />
* Address a few questions:<br />
** What are its strengths and weaknesses?<br />
** For what kind of dataset is it well suited and what kind not?<br />
** Can you find a dataset that illustrates well the features of this approach?<br />
** Can you compare your approach with other approaches?<br />
[[/test page|test]]<br />
<br />
* develop checklists:<br />
* initial exploration of data<br />
* missing data (explicit and implicit)<br />
* do simulation of parallel methods: check estimation of variance parameters<br />
* use nlme to estimate knot placement in gsp<br />
=== Links ===<br />
* [http://healthland.time.com/2011/09/02/mind-reading-why-bad-math-can-ruin-your-health/ Why bad math can ruin your health]<br />
* [http://www.ted.com/talks/david_mccandless_the_beauty_of_data_visualization.html TED talk: David McCandless, The Beauty of Data Visualization]<br />
* Hans Rosling<br />
* Great intro to Gapminder: http://www.mrbartonmaths.com/gapminder.htm<br />
<br />
* [http://scs.math.yorku.ca/index.php?title=Special:Contributions&dir=prev&contribs=user&target=Georges Contributions]<br />
* [http://www.scribd.com/doc/17378132/The-Fallacy-of-Personal-Validation-a-Classroom-Demonstration-of-Gullibility The Forer Effect - a classroom example]<br />
<br />
== Notes for R course ==<br />
* Start: It had to be U ... on the SVD [http://www.youtube.com/StatisticalSongs#p/u/4/JEYLfIVvR9I]<br />
* Use SPSS dates both ways to illustrate <br />
** sub using regular expressions<br />
** import: reading dates into 'Date' format using formats: Include all %a %b %Y and others?<br />
** export: writing a date into a character string using format( Date.object, "%d-%m-%Y") to create variable SPSS can read<br />
* Variable references<br />
*: deal with plethora of ways used differently in different places: <br />
*:* formula ( ~id ), good for variables in different roles ( y ~ log(x) + x2 | id)<br />
*:* interpreted in data: (id), good for single var but can use list : (list(x1,x2))<br />
*:*:Examples:<br />
*:*:* <br />
*:* fully reference: dd$id <br />
*Beware:<br />
* aggregate with a formula drops rows with NAs even though the FUN might be able to handle them<br />
* multiple barplot: http://rtricks.wordpress.com/2009/10/26/multbar-advanced-multiple-barplot-with-sem/<br />
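The date round trip and the <tt>aggregate</tt> caveat noted above can be sketched in base R. The data values and the names <tt>spss.date</tt>, <tt>iso</tt>, <tt>out</tt> and <tt>dd</tt> are made up for illustration, and numeric formats are used because <tt>%a</tt> and <tt>%b</tt> depend on the locale.<br />

```r
# Import: turn a character date into a 'Date' object.
spss.date <- c("02.09.2011", "15.10.2012")        # hypothetical exported strings

# Option 1: reorder the pieces with a regular expression, then use the default format
iso <- sub("^([0-9]{2})\\.([0-9]{2})\\.([0-9]{4})$", "\\3-\\2-\\1", spss.date)
d  <- as.Date(iso)                                # default format is %Y-%m-%d

# Option 2: read the original strings directly with an explicit format
d2 <- as.Date(spss.date, format = "%d.%m.%Y")

# Export: write the dates as character strings in day-month-year order
out <- format(d, "%d-%m-%Y")

# Beware: the formula method of aggregate() uses na.action = na.omit,
# so a row with an NA in ANY variable is dropped before FUN sees it.
dd <- data.frame(g = c("a", "a", "b", "b"),
                 y = c(1, 2, 3, 5),
                 z = c(NA, 4, 6, 8))
agg.default <- aggregate(cbind(y, z) ~ g, data = dd, FUN = mean, na.rm = TRUE)
agg.pass    <- aggregate(cbind(y, z) ~ g, data = dd, FUN = mean, na.rm = TRUE,
                         na.action = na.pass)
agg.default   # the mean of y in group "a" uses only the second row
agg.pass      # the mean of y in group "a" uses both rows
```

Passing <tt>na.action = na.pass</tt> leaves the handling of missing values to <tt>FUN</tt>.<br />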
=== Add ===<br />
* Discussion of memory issues: what happens when you work on two computers<br />
=== Links ===<br />
* [http://www.r-bloggers.com/r-popularity-%E2%80%93-steady-growth-and-new-york-times/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+RBloggers+%28R+bloggers%29 R blog]<br />
* John Fox: ICPSR: 2010: [http://socserv.mcmaster.ca/jfox/Courses/R-course/Slides-handout.pdf Overview including slides on building R packages]<br />
* [http://www.icpsr.umich.edu/files/sumprog/biblio/2010/Fox.pdf Introduction to the R computing environment]<br />
* ICPSR 2011:<br />
** [http://socserv.mcmaster.ca/jfox/Courses/R/ICPSR/index.html Overall page]<br />
**<br />
<br />
== Notes for High School Talks ==<br />
=== Climate change ===<br />
* http://www.cbc.ca/news/technology/story/2011/09/09/pol-climate-adaptation.html<br />
<br />
== Excel techniques ==<br />
* Regular expressions and string substitution<br />
**<br />
* [http://www.contextures.com/xldataval08.html]<br />
<br />
== Ellipse Seminar ==<br />
[[/Ellipse Seminar]]<br />
== Setting up mathstat email in Thunderbird ==<br />
IMAP mailserver: mathstat.yorku.ca Port: 143 Security: STARTTLS<br />
<br />
Outgoing: mathstat.yorku.ca Port: 587 Security: none?<br />
<br />
== Statistical amusement ==<br />
* Statistical Song channel on youtube: http://www.youtube.com/user/StatisticalSongs<br />
** It had to be U ... on the SVD: http://www.youtube.com/StatisticalSongs#p/u/4/JEYLfIVvR9I<br />
** It don't mean a thing if you don't do modelling: http://www.youtube.com/user/StatisticalSongs#p/u/0/Jzm2hrEfNdY<br />
== On Careers in Statistics and Mathematics==<br />
*[https://sites.google.com/site/statsr4us/intro/2-the-joy-of-stats/statsfuture The Joy of Statistics]<br />
*[http://www.bbc.co.uk/news/business-14631547 How Mathematicians Rule the Markets: Quant Trading]<br />
<br />
== On Teaching Science ==<br />
<br />
* [http://www.nats.yorku.ca/index.shtml Natural Science Web Site]<br />
* [https://sites.google.com/site/changingourteaching/Home/tips-for-teaching-asistants Collection of links on teaching]<br />
<br />
<br />
=== A few videos ===<br />
* [http://www.youtube.com/watch?v=ccReLF6M62Y Why Teach Science by James Randi]<br />
* [http://www.youtube.com/watch?v=BlpyGhABXRA Teaching Introductory Physics]<br />
* [http://www.ted.com/talks/brian_goldman_doctors_make_mistakes_can_we_talk_about_that.html?utm_source=newsletter_weekly_2012-01-25&utm_campaign=newsletter_weekly&utm_medium=email Brian Goldman on learning from mistakes]<br />
<br />
== LOG ==<br />
* [[/a]]</div>Georgeshttp://scs.math.yorku.ca/index.php/User:GeorgesUser:Georges2019-03-12T20:03:34Z<p>Georges: </p>
<hr />
<div>__TOC__<br />
== Notes on Mixed Models ==<br />
* varPower(form = ~ fitted(.), fixed = 1)<br />
** The covariate in 'form' is taken to be on the SD scale: the SD of the response is modelled as proportional to |form|^power. 'fixed = 1' holds the power at 1 instead of estimating it, so the SD is proportional to the absolute fitted value.<br />
** the default level for 'fitted' is the finest level. <br />
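A minimal sketch of this variance function in use, assuming the <tt>nlme</tt> package and its <tt>Orthodont</tt> data; the model itself is chosen only for illustration and is not from these notes.<br />

```r
library(nlme)   # recommended package distributed with R

# Random-intercept model in which the residual SD is taken to be
# proportional to the fitted value (power held fixed at 1).
fit <- lme(distance ~ age, data = Orthodont, random = ~ 1 | Subject,
           weights = varPower(form = ~ fitted(.), fixed = 1))

# The variance structure stored with the fit
fit$modelStruct$varStruct
```

Dropping <tt>fixed = 1</tt> would let <tt>lme</tt> estimate the power instead.<br />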
== Paradoxes, Fallacies and Other Surprises ==<br />
[[Paradoxes, Fallacies and Other Surprises]]<br />
== Bayes ==<br />
* Two consecutive issues of Statistical Science in 2011 have many interesting articles related to Bayesian inference:<br />
** http://www.jstor.org.ezproxy.library.yorku.ca/stable/i23059127<br />
** http://www.jstor.org.ezproxy.library.yorku.ca/stable/i23059971<br />
* Experimenting with files:<br />
** jpg file<br />
*** Using the wiki link to the uploaded name: [[File:2013-12-29 18.20.34.jpg|thumb]]<br />
*** Using the wiki link to the uploaded name as media: [[Media:2013-12-29 18.20.34.jpg]]<br />
** .R file<br />
*** File wiki link [[File:Tcells.R]]<br />
*** Media wiki link [[Media:Tcells.R]]<br />
* [[Useful formulas]]<br />
* SEM with STAN<br />
** [https://groups.google.com/forum/#!topic/stan-users/dVjm8iES54k Forum discussion re slow convergence]<br />
* Interaction fallacy in a presentation:<br />
*:If you think two variables affect each other then you should include an interaction between them. (Fooled by the word 'interaction').<br />
*[http://www.stat.columbia.edu/~gelman/research/published/feller8.pdf Gelman and Robert on Bayes]<br />
*[http://en.wikipedia.org/wiki/Multivariate_adaptive_regression_splines MARS: Multivariate Adaptive Regression Splines]<br />
*[http://cran.r-project.org/web/packages/mosaic/vignettes/V2StartTeaching.pdf Teaching with R using MOSAIC by ... and D. Kaplan]<br />
*[http://vudlab.com/simpsons/ Causality: interactive app illustrating Simpson's Paradox]<br />
*[http://www.metafor-project.org/doku.php/tips:testing_factors_lincoms metafor: Tutorial using mixed models for meta analysis]<br />
* [http://andrewgelman.com/2006/06/11/survey_weights/ Andrew Gelman on survey weights with multilevel models]: he suggests unweighted modeling (or a 'variance weighted' analysis, e.g. replication weights) followed by poststratification. <br />
* [https://stat.duke.edu/courses/Fall11/sta101.02/labs/lab1.pdf Intro to R and Rstudio in an intro course at Duke]<br />
* [http://cran.r-project.org/doc/contrib/Torfs+Brauer-Short-R-Intro.pdf Intro to R in RStudio]<br />
* [http://www.crcpress.com/product/isbn/9781466515857 Multilevel Modeling Using R]<br />
* [http://cran.r-project.org/doc/contrib/Bliese_Multilevel.pdf Multilevel Modeling in R by Paul Bliese]<br />
* [http://blog.revolutionanalytics.com/2014/08/statistics-losing-ground-to-cs-losing-image-among-students.html Losing ground to CS?]<br />
* [https://www.youtube.com/watch?v=Is1Ej0Vj0Mw Interview with David Smith at UseR 2014]<br />
* [http://www.cookbook-r.com/Graphs/Colors_(ggplot2)/#mapping-variable-values-to-colors On colour]<br />
* [http://cran.r-project.org/web/packages/plot3D/vignettes/plot3D.pdf 3d plotting packages]<br />
* [http://blogs.sas.com/content/iml/2014/08/05/stiglers-seven-pillars-of-statistical-wisdom/ Stigler's Seven Pillars of Statistics]<br />
* [http://glmm.wikidot.com/faq#modelspec FAQ on GLMMs]<br />
* [http://www.math.uah.edu/stat/ Virtual Labs in Probability and Statistics]<br />
* [http://tryr.codeschool.com/ R code school with O'Reilly]<br />
* [http://cran.r-project.org/web/packages/pastecs/pastecs.pdf pastecs package for time series]<br />
* [http://www.bbc.com/news/magazine-28166019 Do doctors understand test results? By William Kremer -- about Gerd Gigerenzer]<br />
* [[/Multiple Testing -- a comment]]<br />
* [[/MOOCs for Data Science]]<br />
* [http://artssquared.wordpress.com/2012/03/21/letter-to-the-provost-and-vp-academic-professor-carl-amrhein/ Arts Squared]<br />
* [[/Using R Markdown]]<br />
* [[/Statistics Links for Courses]]<br />
* [[/Lee Lorch]]<br />
* [http://arxiv.org/pdf/1402.1894v1.pdf Baumer et al. (2014) Using R Markdown in Intro Stats]<br />
* [[/Job ratings]]<br />
* [http://www.stat.cmu.edu/~hseltman/PIER/Bayes/data/schools.R Using WinBUGS on the Netherlands data]<br />
* [[/Climate Change|/Climate Change]]<br />
* [[/Standardize or Not]]<br />
*[[/Mixed Models -- papers]]<br />
*[[/MCMCglmm]]<br />
*[[/Wiki tests]]<br />
<br />
== Cause, correlation, or ... ==<br />
*[http://p.nytimes.com/email/re?location=4z5Q7LhI+KUPcT7snurzN09anQA2MM49IhNWGFarU5GcvOIXzFz0cSuazUKJK97uTp6+uuRfTEhO+dnMZordtiC8Du17IY2zzXWzY7etdKdkq0H3sffQdSh+6YpVRsJcs8ZeVrfzShCpIEB5wXVC1g==&campaign_id=23&instance_id=45715&segment_id=62874&user_id=ecb65bd8a646c4ab6214f51d21246fce&regi_id=7495274 Instant noodles and metabolic syndrome]<br />
== Notepad ==<br />
* [http://www.r-bloggers.com/some-r-resources-for-glms/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+RBloggers+%28R+bloggers%29 R resources for GLMs]<br />
* [https://www.stat.auckland.ac.nz/~ihaka/120/Lectures/lecture17.pdf On Mosaic plots]<br />
* [[/Academic and Administrative Program Review]]<br />
* [[/Statistics programs]]<br />
* [http://www.yorku.ca/careers/gpse/2013/ Careers Expo at York] with 13 booths from UofT such as Dalla Lana but only one generic FGS booth from York.<br />
* [https://www.researchgate.net/publication/257516014_One_paradox_in_statistical_decision_making#! Schervish's p-value paradox]<br />
* [[/Data]]<br />
* [http://www.universityaffairs.ca/course-evaluations-the-good-the-bad-and-the-ugly.aspx Course evaluations: the good, the bad and the ugly]<br />
* Larry Wasserman on<br />
** [http://normaldeviate.wordpress.com/2013/04/27/the-perils-of-hypothesis-testing-again/ Perils of Hypothesis Testing]<br />
**[http://normaldeviate.wordpress.com/2013/04/13/data-science-the-end-of-statistics/ Data Science]<br />
**[http://normaldeviate.wordpress.com/2012/06/18/48/ Causality]<br />
*[[/HOA]]<br />
*[[/Big data]]<br />
*[[/Student satisfaction]]<br />
*[http://normaldeviate.wordpress.com/2012/12/21/guest-post-rob-tibshirani/ Rob Tibshirani's list of 9 great statistics papers]<br />
*[http://www.newyorker.com/online/blogs/johncassidy/2013/04/the-rogoff-and-reinhart-controversy-a-summing-up.html?mobify=0 Cassidy on the Reinhart-Rogoff controversy.]<br />
*[http://clinicaltrials.gov/ Clinical Trials registry in the US]<br />
*[http://en.wikipedia.org/wiki/Cochrane_Collaboration The Cochrane Collaboration]<br />
* 2004 ICMJE: policy of registration:<br />
<br />
Recommended sources on statistics:<br />
<br />
There are many excellent sources for information on current statistical issues (Psychonomic Society Journals):<br />
<br />
* Confidence Intervals:<br />
** Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York, NY: Routledge/Taylor & Francis Group. (see www.latrobe.edu.au/psy/research/projects/esci).<br />
** Masson, M. E. J., & Loftus, G. R. (2003). Using confidence intervals for graphically based data interpretation. Canadian Journal of Experimental Psychology/Revue Canadienne de Psychologie Expérimentale, 57, 203-220. doi:10.1037/h0087426<br />
* Effect Size Estimates:<br />
** Ellis, P. D. (2010). The essential guide to effect sizes: Statistical power, meta-analysis and the interpretation of research results. Cambridge University Press. ISBN 978-0-521-14246-5.<br />
** Fritz, C. O., Morris, P. E., & Richler, J. J. (2011). Effect size estimates: Current use, calculations and interpretation. Journal of Experimental Psychology: General, 141, 2-18.<br />
** Grissom, R. J., & Kim, J. J. (2012). Effect sizes for research: Univariate and multivariate applications (2nd ed.). New York, NY: Routledge/Taylor & Francis Group.<br />
* Meta-analysis:<br />
** Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York, NY US: Routledge/Taylor & Francis Group. (see www.latrobe.edu.au/psy/research/projects/esci ).<br />
** Littell, J. H., Corcoran, J., & Pillai, V. (2008). Systematic reviews and meta-analysis. New York: Oxford University Press.<br />
* Bayesian Data Analysis:<br />
** Kruschke, J. K. (2011). Doing Bayesian data analysis: A tutorial with R and BUGS. San Diego, CA: Elsevier Academic Press. (See www.indiana.edu/~kruschke/DoingBayesianDataAnalysis/)<br />
** Kruschke, J. K. (in press). Bayesian estimation supersedes the t test. Journal of Experimental Psychology: General. (For a preprint see http://www.indiana.edu/~kruschke/BEST/BEST.pdf).<br />
* Power Analysis:<br />
** Faul, F., Erdfelder, E., Lang, A., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175-191. (See http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/)<br />
=== Blogs ===<br />
* [http://jeromyanglim.blogspot.ca Jeromy Anglim]<br />
<br />
== Pythagoras Diagram ==<br />
* [http://pages.uoregon.edu/stevensj/MRA/partial.pdf Venn diagram 'fallacy' example]<br />
<br />
==Recent changes==<br />
[[/Links|Links]]<br><br />
[[/Recent Changes]] [[/Contributions]]<br><br />
[[/DO]]<br />
==Topics==<br />
* [[/R packages]]<br />
* [[/Curriculum]]<br />
* [[/HLM links]]<br />
* [[/Education links]]<br />
* Death of Evidence<br />
** [http://www.nature.com/nature/journal/v483/n7387/full/483006a.html Article in Nature: Frozen Out, March 1, 2012]<br />
** [http://www.deathofevidence.ca/ Death of Evidence website]<br />
* [[/Mixed effects for multinomial responses]]<br />
* [[/Ellipse paper comments]]<br />
* On Tobacco (from Matt)<br />
** [http://io9.com/5899612/low+income-countries-are-a-cigarettes-best-friend Low income countries and tobacco]<br />
** [http://www.tobaccoatlas.org/uploads/Images/PDFs/Tobacco_Atlas_4_entire.pdf The Tobacco Atlas]<br />
*[[/fda.R]]<br />
*[[/FSE Scholars evening]]<br />
*[[/MATH 6627 student contributions]] <br />
:[http://scs.math.yorku.ca/index.php?title=Special:UserLogin&type=signup Create new account]<br />
:[http://scs.math.yorku.ca/index.php/SCS_2011:_Statistical_Analysis_and_Programming_with_R SCS R course]<br />
:[[/R packages]]<br />
:[[/SPIDA 20102 preparation]]<br />
== Data scraping ==<br />
* [http://www.r-bloggers.com/preparing-public-data-for-analysis-with-r/ Example from Ministry of Transportation]<br />
== RStudio: Shiny ==<br />
* [http://www.premiersoccerstats.com/wordpress/?p=1273 On Shiny]<br />
* [http://demo.rapporter.net/?sport=ATH-170&weight=0 Rapporter.net]<br />
<br />
== Notes for 6643 ==<br />
* Assignment: Can we produce an estimate of AIC based just on the Wald test?<br />
<br />
== On Pedagogy ==<br />
* [http://www.guardian.co.uk/higher-education-network/blog/2012/oct/18/social-sciences-quantative-skills-training On the importance of quantitative skills in social science]<br />
* http://www.matstat.com/teach/<br />
* [http://www.ma.utexas.edu/users/mks/statmistakes/StatisticsMistakes.html Common Misteaks in Statistics]<br />
=== Advice for students ===<br />
* [http://www.universityaffairs.ca/how-to-ask-for-a-reference-letter.aspx How to ask for a reference letter]<br />
<br />
== Questions (e.g. for survey papers) ==<br />
* Implement more diagnostics in R for lme models<br />
* Explore duality of the whole data matrix<br />
* Extend the UD representation to hyperbola, etc., and include a way of plotting osculation loci<br />
* Explore the geometry of harmonic combinations and its implications for mixed model estimates. What happens as you shift weight from G to <math>(X'X)^{-1}</math>? How does the result wander outside the convex combination? When does it happen and what does it mean?<br />
* Refine Lform and related tools<br />
<br />
== Read ==<br />
* [http://www.ams.org/notices/201001/rtx100100030p.pdf Music: Broken Symmetry, Geometry, and Complexity]<br />
* https://sites.google.com/site/r4statistics/<br />
<br />
== R course ==<br />
* [https://github.com/hadley/devtools/wiki/ Hadley Wickham Advanced R Development]<br />
* [http://courses.had.co.nz/11-devtools/ Hadley Wickham's R development courses]<br />
Day 2 - add<br />
* final recap of 'lm' interface: subset, na.action, etc., etc.<br />
* discuss formula syntax<br />
* final recap of methods for 'lm'<br />
* note easy extension to 'glm', 'lme', etc.<br />
* note that many 'new' functions do not use this interface, only more 'mature' functions<br />
** lm.formula<br />
* discuss OO showing methods and dispatching<br />
Day 3<br />
* the most useful tools:<br />
** seq<br />
** rep<br />
** replacement functions<br />
* data input<br />
* more programming<br />
** object oriented programming<br />
** using a function in C<br />
* using attributes<br />
* systematic treatment of graphics, including<br />
** par<br />
** xyplot<br />
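A few of the Day 3 tools listed above, sketched in base R. The replacement function <tt>second&lt;-</tt>, the generic <tt>area</tt> and the class <tt>circle</tt> are hypothetical names invented for this example.<br />

```r
# seq and rep
s <- seq(0, 1, by = 0.25)                   # 0.00 0.25 0.50 0.75 1.00
r <- rep(c("a", "b"), times = c(2, 3))      # "a" "a" "b" "b" "b"

# A replacement function: its name ends in '<-' and its last argument is 'value'
"second<-" <- function(x, value) {
  x[2] <- value
  x
}
v <- 1:5
second(v) <- 100L                           # same as v <- `second<-`(v, value = 100L)

# Object-oriented programming with S3: a generic and a method
area <- function(shape, ...) UseMethod("area")
area.circle  <- function(shape, ...) pi * shape$r^2
area.default <- function(shape, ...) stop("no 'area' method for this class")

circ <- structure(list(r = 2), class = "circle")
a <- area(circ)                             # dispatches to area.circle
```

The same dispatch mechanism is what lets <tt>summary</tt>, <tt>plot</tt> and <tt>predict</tt> behave differently for 'lm', 'glm' and 'lme' objects.<br />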
[[/Day2 Guided Tour of Linear Models.R]]<br />
<br />
== SCS Reads 2011 Links ==<br />
* [http://www.math.yorku.ca/people/georges/Files/Ellipse_Seminar/Visualizing_Regression-I-Simple-v2.pdf Notes on visualizing simple regression]<br />
* [http://www.math.yorku.ca/people/georges/Files/Ellipse_Seminar/Visualizing_Regression-II.r R script for multiple regression]<br />
== Capstone courses ==<br />
* [http://www.amstat.org/publications/jse/v9n1/spurrier.html Course at the U. of South Carolina, 2001]<br />
* <br />
== Links to recent courses ==<br />
*[http://scs.math.yorku.ca/index.php/SCS_2011:_Mixed_Models_with_R SCS 2011: Mixed Models with R]<br />
*[http://statswiki.math.yorku.ca/index.php/SCS:_Mixed_Models_in_R SCS 2010]<br />
*[http://scs.math.yorku.ca/index.php/MATH_6627_2008-09_Practicum_in_Statistical_Consulting MATH 6627 2008-09]<br />
*[http://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting MATH 6627 2010-11]<br />
*[http://wiki.math.yorku.ca/index.php/SPIDA_2010:_Mixed_Models_with_R SPIDA 2010]<br />
*[http://wiki.math.yorku.ca/index.php/SPIDA_2009:_Mixed_Models_with_R SPIDA 2009]<br />
<br />
*[http://scs.math.yorku.ca/index.php/Spida Spida package]<br />
<br />
== Links to add somewhere ==<br />
* [http://www.ted.com/talks/ben_goldacre_battling_bad_science.html?utm_source=newsletter_weekly_2011-10-04&utm_campaign=newsletter_weekly&utm_medium=email Battling bad science]<br />
*D W Hosmer, S Taber and S Lemeshow () "The importance of assessing the fit of logistic regression models: a case study." ''American Journal of Public Health'', Vol. 81, Issue 12 1630-1635<br />
* [http://www.statmethods.net/ Quick R] for SPSS, SAS and Stata users.<br />
=== Graphics ===<br />
* [http://www.edwardtufte.com/tufte/ ET Modern]<br />
* Striking graphics:<br />
** [http://en.wikipedia.org/wiki/File:U.S._incarceration_rates_1925_onwards.png Incarceration rate in the United States]<br />
<br />
=== Matrices ===<br />
*[http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/intro.html Matrix Reference Manual]<br />
*[http://en.wikipedia.org/wiki/Matrix_determinant_lemma Matrix determinant lemma]<br />
*[http://en.wikipedia.org/wiki/Woodbury_matrix_identity Woodbury Matrix Identity]<br />
<br />
=== Simpson's Paradox ===<br />
In the [http://en.wikipedia.org/wiki/Canadian_federal_election,_1979 1979 Canadian federal election] <br />
an unusual event occurred in the Northwest Territories: the Liberals won the popular vote in the territory, but won [http://en.wikipedia.org/wiki/Canadian_federal_election,_1979#National_results neither seat.]<br />
<br />
=== Lee Lorch ===<br />
* [http://aer.sagepub.com/content/36/4/739.abstract Marybeth Gasman (1999)] "Scylla and Charybdis: Navigating the Waters of Academic Freedom at Fisk University During Charles S. Johnson's Administration (1946–1956)" ''American Educational Research Journal''<br />
*: A prominent sociologist and race relations activist, Charles S. Johnson dedicated his life to the advancement of Blacks. His presidency at Fisk University, a historically Black college, was the culmination of his career. During the latter part of his administration, he faced a dilemma involving an outspoken professor named Lee Lorch, who, in 1954, was accused of being a communist. Johnson and the Board of Trustees dismissed Lorch because he refused to answer a congressional committee's questions about his previous political affiliations. In 1959, the American Association of University Professors found the late President Johnson guilty of violating the principles of academic freedom. This article explores the ways in which academic freedom, civil liberties, and civil rights clashed in the Lee Lorch case. Furthermore, it examines the ways in which the setting of a historically Black college alters traditional assumptions about the application of these principles.<br />
* [http://www.nytimes.com/2010/11/22/nyregion/22stuyvesant.html Charles V. Bagli (November 21, 2010)] "A New Light on a Fight to Integrate Stuyvesant Town", ''New York Times''.<br />
<br />
== Multilevel Models ==<br />
<br />
=== Expository ===<br />
* [http://glmm.wikidot.com/faq R FAQ]<br />
*[https://perswww.kuleuven.be/~u0018341/documents/ldafda.pdf Verbeke and Molenberghs (2005): Longitudinal Data Analysis Notes]<br />
=== Missing Data ===<br />
* [http://cran.r-project.org/web/packages/Amelia/vignettes/amelia.pdf King et al. (2012) Amelia II]<br />
*<br />
<br />
=== Evaluation ===<br />
* Green, M.J., Medley G.F., & Browne, W.J. (2009). Use of posterior predictive assessments to evaluate model fit in multilevel logistic regression. Veterinary Research, 40(4):30. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2675184/pdf/vetres-40-30.pdf<br />
<br />
=== Software for multilevel models ===<br />
{| class="wikitable"<br />
|-<br />
! Package<br />
! Function<br />
! Notes<br />
|-<br />
| R: clmm {ordinal}<br />
| Ordinal response: Fits cumulative link mixed models, i.e. cumulative link models with random effects via the Laplace approximation or the standard and the adaptive Gauss-Hermite quadrature approximation. The functionality in clm is also implemented here. Currently only a single random term is allowed in the location-part of the model.<br />
| <br />
|-<br />
| R: {lme4a}<br />
| Development version of lme4<br />
Download: svn checkout svn://svn.r-forge.r-project.org/svnroot/lme4<br />
<br />
|<br />
|-<br />
|R: {MCMCglmm}<br />
|MCMC Methods for Multi-response Generalized Linear Mixed Models<br />
| <br />
|-<br />
|R: {plm}<br />
|Econometric Analysis of Panel Survey Data<br />
|[http://cran.r-project.org/web/packages/plm/vignettes/plm.pdf Vignette]<br><br />
See p. 3 for comments on first-differencing.<br />
|-<br />
|<br />
|See Snijders and Bosker (2012) for longer list<br />
|<br />
|-<br />
|R: {lme4:nlmm}<br />
|Non-linear models with lme4<br />
|[http://lme4.r-forge.r-project.org/slides/2011-01-11-Madison/6NLMMH.pdf Presentation by Doug Bates]<br><br />
|}<br />
<br />
== Clones ==<br />
Check for changes and reconcile<br />
* Lab 1<br />
** [[SCS_2011:_Mixed_Models_with_R/Lab_1]]<br />
** [[MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Lab_1]]<br />
<br />
== Read ==<br />
On the age-period-cohort problem:<br />
* see bibliography by Yang: http://home.uchicago.edu/~yangy/research.html<br />
<br />
== Do ==<br />
*[[/spida|spida to do list]]<br />
*[[/p3d|p3d to do list]]<br />
== Read ==<br />
* [http://www.stat.berkeley.edu/~freedman/ Links to recent papers by David Freedman]<br />
* Links to material by Chris Wild:<br />
** http://www.stat.auckland.ac.nz/~wild/StatThink/<br />
** http://www.stat.auckland.ac.nz/showperson.php?uid=wild<br />
* [http://www3.hku.hk/statistics/staff/kaing/ Kai Ng's converse]<br />
<br />
== Notes ==<br />
* [http://www.chrp.org/love/ASACleveland2003Propensity.pdf Good presentation on use of propensity scores]<br />
* [http://sportsillustrated.cnn.com/2011/writers/scorecasting/03/24/simpson-paradox/index.html?eref=sihp Simpson's Paradox]<br />
<br />
== R notes ==<br />
* [http://strimmerlab.org/notes/fdr.html False Discovery Rates in R]<br />
=== Items to cover ===<br />
* Wrap up language:<br />
** Selection (give context): indices: index, names, logical, matrix of coordinates, 'subset'<br />
*** Example: dropping NAs from selected variables. Necessary because functions that are most sophisticated methodologically are generally least sophisticated in their interface<br />
**** contrast sophisticated program: lm with unsophisticated lowess<br />
* Using variables in data frames: <br />
** formula oriented functions: xyplot( y ~ x, data = dd )<br />
** explicit: plot( dd$x, dd$y )<br />
** with: with( dd, plot(x,y)); with( dd, xyplot( y ~ x, dd))<br />
** attach: As usual the easiest is deprecated! (why is it only easy and pleasurable things that are ever deprecated)<br />
**: <tt> attach(dd) </tt><br />
**: <tt> plot(x, y) </tt><br />
**: <tt> detach(dd) </tt><br />
*** Problem with 'attach': <br />
**** names in data frame may be masked by names in workspace<br />
**** assignments in workspace not saved in data frame<br />
* Overview of graphics<br />
** Link to http://addictedtor.free.fr/graphiques/<br />
* Programming structures<br />
* Add to graphics:<br />
*:Colours: <tt>pal(grepv('red',colors())); pals() # for all</tt><br />
*:modified tablemissing<br />
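The two problems with 'attach' listed above can be reproduced in a few lines. This is only a sketch; the data frame <tt>dd</tt> and its values are made up.<br />

```r
dd <- data.frame(x = 1:3, y = c(2, 4, 6))   # hypothetical data frame

# 'with' evaluates the expression inside the data frame and leaves nothing attached
m.with <- with(dd, mean(y / x))

# Problem 1: a name in the workspace masks the same name in the attached data frame
x <- c(10, 20, 30)
attach(dd)            # R reports that 'x' is masked by the workspace
masked <- x           # this is the workspace 'x', not dd$x

# Problem 2: assignments are made in the workspace, not in the data frame
y2 <- y * 2           # 'y' is found in dd, but 'y2' is not added to dd
detach(dd)

names(dd)             # still only "x" and "y"
```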
==== debugging in R ====<br />
* http://www.stats.uwo.ca/faculty/murdoch/software/debuggingR/debug.shtml<br />
<br />
=== Links ===<br />
*[http://www.stats.ox.ac.uk/pub/MASS4/ MASS 4th ed.] [http://www.stats.ox.ac.uk/pub/MASS4/#Exercises Exercises]<br />
<br />
=== Importing files ===<br />
==== From Excel ====<br />
* Easy: save file in Excel as .csv, then read into R with read.csv<br />
* If you have a lot of files, or get the files from some other source that edits .xls or .xlsx files:<br />
* The winner: package gdata: <br />
** First install perl. <br />
** read.xls in gdata handles both .xls and .xlsx files<br />
** works on both 32-bit and 64-bit machines<br />
* package XLConnect seems to work only on xlsx files <br />
* the smaller xlsx package also works only on xlsx files<br />
* Package xlsReadWrite works on xls files but only on 32-bit systems<br />
* Use xls2csv, a Perl script to convert files to csv first.<br />
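The 'easy' route above, sketched with a temporary file standing in for a .csv saved from Excel. The file and its contents are invented for the example.<br />

```r
# Stand-in for a sheet saved from Excel as .csv (hypothetical contents)
csv.file <- tempfile(fileext = ".csv")
sheet <- data.frame(id = 1:3, score = c(71.5, NA, 88))
write.csv(sheet, csv.file, row.names = FALSE)

# Read it into R; empty cells and the string "NA" come back as NA
dd <- read.csv(csv.file)
str(dd)

# With gdata (and Perl) installed, the notes above suggest reading a workbook
# directly with read.xls(); it is not run here because it needs an actual file.
```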
<br />
=== Getting lines vs points for different groups in xyplot ===<br />
Ideally, type = c('l','p') would work but it doesn't seem to. So one way is to use type = 'b' with an invisible line for one group and an invisible point for the other:<br />
<br />
library(spida.beta) # also loads 'car'<br />
dd <- Prestige<br />
dd$income.pred <- predict( lm( income ~ education*type, dd), newdata = dd)<br />
td( lty = c(1,0), pch = c(32, 16), lwd = 2) <br />
# lty = 0 produces an invisible line<br />
# and pch = 32 seems to be an invisible point<br />
xyplot( income.pred + income ~ education|type, dd[order(dd$education),], type = 'b',<br />
auto.key = list( columns = 2, lines = T, points = T))<br />
<br />
Also show example using panel.superpose.2<br />
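A possible version of that example, using only <tt>lattice</tt> and the built-in <tt>cars</tt> data instead of 'Prestige'. It relies on the <tt>distribute.type</tt> argument of <tt>panel.superpose</tt>, which is the default behaviour of <tt>panel.superpose.2</tt>: when it is TRUE, the elements of 'type' are assigned to successive groups instead of all being applied to every group. The variable name <tt>dist.pred</tt> is made up.<br />

```r
library(lattice)

dd <- cars[order(cars$speed), ]                   # sort so the line is drawn left to right
dd$dist.pred <- predict(lm(dist ~ speed, dd))     # fitted values to draw as a line

# First group (dist.pred) gets type 'l', second group (dist) gets type 'p'
p <- xyplot(dist.pred + dist ~ speed, dd,
            type = c("l", "p"), distribute.type = TRUE,
            auto.key = list(columns = 2, lines = TRUE, points = TRUE))
print(p)
```

Using <tt>panel = panel.superpose.2</tt> in place of <tt>distribute.type = TRUE</tt> should give the same plot.<br />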
=== Bugs ===<br />
<pre><br />
grade <- function(x ,<br />
cos = c(-Inf,40,50,55,60,65,70,75,80,90,Inf) - 0,<br />
grade = c("F","E","D","D+","C","C+","B","B+","A","A+")) {<br />
factor(cut(x, cos, grade, right = FALSE), levels = grade)<br />
}<br />
dg$Grade <- grade( dg$Final )<br />
tab(dg, ~ Grade)<br />
# gets indexing of levels wrong<br />
# the following seems to work correctly<br />
grade <- function(x ,<br />
cos = c(-Inf,40,50,55,60,65,70,75,80,90,Inf) - 0,<br />
grade = c("F","E","D","D+","C","C+","B","B+","A","A+")) {<br />
ret <- cut(x, cos, grade, right = FALSE)<br />
factor(ret, levels = grade)<br />
}<br />
</pre> <br />
==== Getting the G matrix in nlme ====<br />
fit <- lme( y ~ x, dd, random = ~1+x |id)<br />
G <- pdMatrix( fit$modelStruct$reStruct)$id<br />
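A check worth running on this: as far as I can tell, the matrix stored in 'reStruct' is expressed relative to the residual variance, so it has to be multiplied by sigma^2 to obtain the variance of the random effects. The sketch below uses the <tt>Orthodont</tt> data from nlme (so the grouping factor is 'Subject' instead of 'id') and compares the result with <tt>getVarCov</tt>.<br />

```r
library(nlme)

fit <- lme(distance ~ age, data = Orthodont, random = ~ 1 + age | Subject)

G.rel <- pdMatrix(fit$modelStruct$reStruct)$Subject   # relative to residual variance
G     <- G.rel * fit$sigma^2                          # on the variance scale

# getVarCov() returns the random-effects variance matrix directly
all.equal(as.numeric(G), as.numeric(getVarCov(fit)))
```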
==== Building R packages in 2.14 ====<br />
# Install R<br />
# Install tools: http://robjhyndman.com/researchtips/building-r-packages-for-windows/<br />
<br />
# Info: http://cran.r-project.org/doc/contrib/Graves+DoraiRaj-RPackageDevelopment.pdf<br />
<br />
==== Notes ====<br />
* [http://ipsur.r-forge.r-project.org/book/ IPSUR: Introduction to Probability and Statistics using R]<br />
* [[/test of slash]]<br />
* [[/schedule]]<br />
* [http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=72 Addicted to R Graph Gallery]<br />
* [http://rwiki.sciviews.org/doku.php R Wiki]<br />
* [http://rwiki.sciviews.org/doku.php?id=guides:demos:stata_demo_with_r Stata demo in R]<br />
<br />
== Thumbnail test ==<br />
Here is a graphic file in raw form:<br />
<br />
[[File:UN-missing1.jpg]]<br />
<br />
And here is the same file with a thumbnail:<br />
[[File:UN-missing1.jpg|thumb]]<br />
<br />
== Math check ==<br />
<br />
<font color="red">'''Please click on the 'discussion' tab above'''</font><br />
<br />
:Test how math renders:<br />
<br />
:<math>\begin{align}<br />
f(x) & = (a+b)^2 \\<br />
& = a^2+2ab+b^2 \\<br />
\end{align}</math><br />
<br />
<math> x \perp y </math><br />
<br />
== glmmPQL etc ==<br />
Good discussion between Doug and Ben: https://stat.ethz.ch/pipermail/r-sig-mixed-models/2008q4/001457.html<br />
== Combining unbiased estimators ==<br />
This is an example:<br />
* bullet 1<br />
* bullet 2<br />
** again<br />
**: indented <br />
* bullet3<br />
numbered bullets:<br />
# one<br />
# two<br />
## dkjkdj<br />
##* djkdj<br />
#* djfkd<br />
=== subheading ===<br />
new stuff<br />
==== sub sub ====<br />
more stuff<br />
<br />
<br />
<br />
Let <math>{{\hat{\phi }}_{1}}</math> and <math>{{\hat{\phi }}_{2}}</math> be unbiased estimators of <math>\phi \in {{\mathbb{R}}^{k}}</math> with non-singular variances <math>{{V}_{1}}</math> and <math>{{V}_{2}}</math> respectively.<br />
<br />
Then the minimum variance linear unbiased estimator of<br />
<math>\phi </math> is obtained by combining <math>{{\hat{\phi }}_{1}}</math> and <math>{{\hat{\phi }}_{2}}</math> using weights that are proportional to the inverses of their variances. The result can be expressed in a variety of ways:<br />
<br />
<math>\begin{align}<br />
\hat{\phi } &= {{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}\left( V_{1}^{-1}{{{\hat{\phi }}}_{1}}+V_{2}^{-1}{{{\hat{\phi }}}_{2}} \right) \\ <br />
 & = {{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}\left( V_{1}^{-1}{{{\hat{\phi }}}_{1}}+V_{2}^{-1}{{{\hat{\phi }}}_{2}} \right)+ \left[ {{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}V_{2}^{-1}{{{\hat{\phi }}}_{1}}-{{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}V_{2}^{-1}{{{\hat{\phi }}}_{1}} \right] \\ <br />
& = {{{\hat{\phi }}}_{1}}+ {{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}V_{2}^{-1}\left( {{{\hat{\phi }}}_{2}}-{{{\hat{\phi }}}_{1}} \right) \\ <br />
& = {{{\hat{\phi }}}_{1}}+ {{\left( I+{{V}_{2}}V_{1}^{-1} \right)}^{-1}}\left( {{{\hat{\phi }}}_{2}}-{{{\hat{\phi }}}_{1}} \right) \\ <br />
& = {{\left( I+{{V}_{1}}V_{2}^{-1} \right)}^{-1}}\left( {{{\hat{\phi }}}_{1}}+{{V}_{1}}V_{2}^{-1}{{{\hat{\phi }}}_{2}} \right) <br />
\end{align}</math><br />
The proof is an application of the principle of Generalized Least-Squares. The problem can be formulated as a GLS problem by considering that:<br />
<math>\left[ \begin{matrix}<br />
{{{\hat{\phi }}}_{1}} \\<br />
{{{\hat{\phi }}}_{2}} \\<br />
\end{matrix} \right]=\left[ \begin{matrix}<br />
I \\<br />
I \\<br />
\end{matrix} \right]\phi +\left[ \begin{matrix}<br />
{{\varepsilon }_{1}} \\<br />
{{\varepsilon }_{2}} \\<br />
\end{matrix} \right]</math> with <math>\operatorname{Var}\left( \left[ \begin{matrix}<br />
{{\varepsilon }_{1}} \\<br />
{{\varepsilon }_{2}} \\<br />
\end{matrix} \right] \right)=\left[ \begin{matrix}<br />
{{V}_{1}} & 0 \\<br />
0 & {{V}_{2}} \\<br />
\end{matrix} \right]</math><br />
<br />
Applying the GLS formula yields:<br />
<math>\begin{align}<br />
\hat{\phi } & ={{\left( {{\left[ \begin{matrix}<br />
I \\<br />
I \\<br />
\end{matrix} \right]}^{\prime }}{{\left[ \begin{matrix}<br />
{{V}_{1}} & 0 \\<br />
0 & {{V}_{2}} \\<br />
\end{matrix} \right]}^{-1}}\left[ \begin{matrix}<br />
I \\<br />
I \\<br />
\end{matrix} \right] \right)}^{-1}}{{\left[ \begin{matrix}<br />
I \\<br />
I \\<br />
\end{matrix} \right]}^{\prime }}{{\left[ \begin{matrix}<br />
{{V}_{1}} & 0 \\<br />
0 & {{V}_{2}} \\<br />
\end{matrix} \right]}^{-1}}\left[ \begin{matrix}<br />
{{{\hat{\phi }}}_{1}} \\<br />
{{{\hat{\phi }}}_{2}} \\<br />
\end{matrix} \right] \\ <br />
& ={{\left( V_{1}^{-1}+V_{2}^{-1} \right)}^{-1}}\left( V_{1}^{-1}{{{\hat{\phi }}}_{1}}+V_{2}^{-1}{{{\hat{\phi }}}_{2}} \right) <br />
\end{align}</math><br />
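<br />
The equivalence of the expressions above can be checked numerically. The following is a minimal sketch, not part of the original notes: the function names (<code>combine</code>, <code>combine_update</code>), the 2x2 helper functions and the numerical values are all hypothetical, chosen only for illustration. It uses exact rational arithmetic so that the first and fourth forms of the estimator can be compared for exact equality.<br />

```python
from fractions import Fraction as F

# Small helpers for 2x2 matrices and 2x1 column vectors stored as nested lists.
def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inv2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def combine(phi1, V1, phi2, V2):
    """First form: (V1^-1 + V2^-1)^-1 (V1^-1 phi1 + V2^-1 phi2)."""
    W1, W2 = inv2(V1), inv2(V2)
    V = inv2(mat_add(W1, W2))  # variance of the combined estimator
    phi = mat_mul(V, mat_add(mat_mul(W1, phi1), mat_mul(W2, phi2)))
    return phi, V

def combine_update(phi1, V1, phi2, V2):
    """Fourth form: phi1 + (I + V2 V1^-1)^-1 (phi2 - phi1)."""
    identity = [[F(1), F(0)], [F(0), F(1)]]
    gain = inv2(mat_add(identity, mat_mul(V2, inv2(V1))))
    diff = [[p2[0] - p1[0]] for p1, p2 in zip(phi1, phi2)]
    return mat_add(phi1, mat_mul(gain, diff))

# Hypothetical estimates and variances (illustration only).
phi1 = [[F(1)], [F(2)]]
V1 = [[F(2), F(1)], [F(1), F(2)]]
phi2 = [[F(3)], [F(0)]]
V2 = [[F(1), F(0)], [F(0), F(4)]]

phi_hat, V_hat = combine(phi1, V1, phi2, V2)
phi_alt = combine_update(phi1, V1, phi2, V2)
print(phi_hat == phi_alt)  # the two forms agree exactly
```

In this example the diagonal entries of <code>V_hat</code> are smaller than the corresponding entries of both <code>V1</code> and <code>V2</code>, as expected when two uncorrelated estimates are pooled.<br />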
<br />
== From Nassif Ghoussoub ==<br />
Beware the “useful idiocy” of Mr. Morgan<br />
<br />
The latest commentary of Gwyn Morgan in the Globe and Mail, “If universities were in business, they’d be out of business”, http://www.theglobeandmail.com/report-on-business/commentary/gwyn-morgan/ has crossed another line. Far from being an analysis of the state of Canadian universities, his rant is personal, bitter, demeaning, and insulting to university professors across the country.<br />
<br />
Back in his Globe article of April 29, 2009, “Not all research deserves public funding”, the retired CEO of EnCana Corp. proceeded to rip into the “ivory towers of academia”, attack “esoteric research” and disparage any graduate degree not hailing from medicine or engineering. He also dismissed the 2300 scientists who joined the “Don’t Leave Canada Behind” campaign, which called on the government to include R&D, the lifeblood of the new economy, in its stimulus budget. To its credit, the government of Canada responded positively to the call of its scientists, but from his -dubiously earned- platform at the Globe and Mail, Mr. Morgan kept at it.<br />
<br />
In Saturday’s paper, Mr. Morgan employs sweeping generalizations and ghost statistics to come to the conclusion that, among other things, Canada’s university professors are “poorly prepared” for their lectures, “show up occasionally” to class, and give “poorly thought out assignments”. He claims “the reaction of universities to widespread student dissatisfaction is to blame insufficient financing, rather than their own dysfunction”. He offers that in the new age, formal lectures should be altogether ended.<br />
<br />
His commentary provides neither data about student learning, nor any direct quotations from professors or students. A 1991 study is cited, and then baptized as the truth with a simple "Nineteen years later, little has changed." The article does not attack a particular university, faculty, or teaching method, but rather an apparently archetypal "university professor".<br />
<br />
So what if he hasn't been on campus in 40 years? He knows how it is. Even then, he "stopped going to classes and dedicated his time to learning from textbooks and reviewing friends’ notes". But Mr. Morgan ignores that a professor somewhere, sometime, must have produced and dictated these textbooks and notes. He finds "no reason why all written course material can’t be delivered via the Internet", obviously not aware that since the 90’s, most course material has been made available on the Internet, thanks to dedicated professors. Morgan's suggestion that we replace large classes with "small informal discussions" sounds great, but how does the CEO propose we pay for the much larger number of professors required to do the job? He wants universities to run like businesses, but as one reader suggested: “If Universities were run like the oil and gas industry we would be back in the dark ages where the only skill required would be to count your money... at least until the oil runs out”.<br />
<br />
It is obvious that we embattled post-secondary teachers and researchers need to worry more about the “very useful idiocy” of Mr. Morgan, the permanent platform he has been provided, and the damage that drivel like this can cause to higher education and advanced research in Canada.<br />
<br />
For the Globe and Mail, Mr. Morgan has been pure comic gold for years. Writing on a variety of subjects, ranging from environmental issues and health care, to research and post-secondary education, he has been a bottomless trove of shameless misrepresentations, extreme views and sheer wackiness. But ultimately, this is not only about Gwyn Morgan nor about the Globe and Mail. It is about us.<br />
<br />
It is about Canada’s University Presidents countering his dangerous Tea Party style rhetoric on our post-secondary institutions.<br />
<br />
It is about the Deans of Canada’s Faculties facing up to Mr. Morgan when he writes: “Many qualified applicants are turned away from areas such as engineering and medicine, while universities continue to graduate thousands with knowledge that is neither useful in getting a job, nor in helping our country succeed in a competitive world.”<br />
<br />
It is about the Royal Society of Canada, and other learned societies responding to his views about “esoteric research that doesn’t have the slightest chance of yielding any real value”.<br />
<br />
It is also up to our schools of journalism, to point out to mainstream media the irresponsibility in printing shallow, empty articles full of generalizations and devoid of facts.<br />
<br />
Mr. Morgan may be one of those individuals who get so many things wrong at once that the thought of challenging them or setting the record straight is just too daunting. But it is incumbent upon us not to let his rhetoric negate the exemplary contributions of thousands of Canada’s scholars, teachers and researchers.<br />
<br />
Nassif Ghoussoub, Professor of Mathematics, The University of British Columbia<br />
<br />
== Cell Phones ==<br />
<br />
Date: Sun, 10 Oct 2010 20:02:29 -0400<br />
From: Stuart Newman <newman@NYMC.EDU><br />
Reply-To: Science for the People Discussion List<br />
<SCIENCE-FOR-THE-PEOPLE@LIST.UVM.EDU><br />
To: SCIENCE-FOR-THE-PEOPLE@LIST.UVM.EDU<br />
Subject: "Disconnect": Why cellphones may be killing us<br />
<br />
Though I haven't yet read it, this book is presumably not based on<br />
anecdotal evidence. The author, Devra Davis, is the founding director<br />
of the toxicology and environmental studies board at the U.S. National<br />
Academy of Sciences.<br />
<br />
http://tinyurl.com/2fvycxc [Salon.com]<br />
<br />
"Disconnect": Why cellphones may be killing us<br />
A new book probes the connection between mobile devices and a host<br />
of health problems -- with frightening results<br />
By Thomas Rogers<br />
<br />
== Links ==<br />
* [http://www.ted.com/talks/hans_rosling_the_good_news_of_the_decade.html?utm_source=newsletter_weekly_2010-10-12&utm_campaign=newsletter_weekly&utm_medium=email 2010 TED talk by Hans Rosling]<br />
* [http://en.wikipedia.org/wiki/Apophenia Apophenia]<br />
* [http://en.wikipedia.org/wiki/Pareidolia Pareidolia]<br />
<br />
== Notes on mediation ==<br />
The question of mediation is essentially a question about causality. Is the putative mediator, M say, caused by X and, in turn, a cause of Y? But M, in a mediational analysis, cannot have been randomized even if X has been. The question of mediation is essentially a question about causality with observational, not experimental, data.<br />
<br />
To get a perspective on the problem we need to start by considering the general problem of causality with observational data. Let Y be the response variable and let X be the 'target' variable which is seen as a possible 'cause' of Y. For X to cause Y means that the expected value of Y would change in some target experimental condition in which X was manipulated (perhaps through random allocation) while other variables were left untouched -- not necessarily unchanged.<br />
<br />
For causal inference with observational data, we are interested in what would happen under circumstances that are different from those we have actually observed. Our analysis of our observational data will yield an accurate estimate of the causal effect of X if the model for the observational data has the same coefficient for X as it would have if it were applied to data gathered under the target experimental condition. The challenge is to specify and estimate a model that is 'transferable' from the observational condition to the experimental condition. We need a set of concepts to help us critically assess whether a model is transferable. It is not sufficient to have a model that 'fits' well. It may be necessary to include potential confounding factors even if they are not significant in the prediction model for Y. And it may be necessary to exclude strong predictors that are potential mediators -- variables that must not be held constant as one examines the causal relationship between X and Y. One needs a good understanding of the causal model that is valid under experimental conditions in order to properly specify a transferable observational model.<br />
<br />
The problem can be approached in a surprisingly different way, which is the basis for propensity scores. Instead of focusing on a 'transferable' model for Y, one focuses on a model for the assignment of the target causal variable X using potential confounding variables. As in models for Y, it is important to avoid potential mediators between X and Y. However, the model for X based on confounding factors is a prediction model. Confounding variables may be included, raw or transformed, as long as they are predictive of X. It is not necessary to include variables that are not predictive of X. The criterion for developing the model is statistical fit, a criterion that -- apart from the actual selection of confounding predictors -- is empirical, i.e. it is based on the analysis of the data at hand without reference to external theory that is not verifiable with the data. The assignment model need only be valid for the observational condition. Its validity for the experimental condition is irrelevant.<br />
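<br />
To make the idea of modeling assignment concrete, here is a minimal sketch using inverse-probability weighting, which is only one of several ways a propensity score can be used (matching and stratification are others); the notes do not commit to any of them. The data are made up: a single binary confounder Z, a binary X, and Y = X + 2Z, so the causal effect of X is 1 by construction. The assignment model is simply the observed proportion with X = 1 within each level of Z.<br />

```python
from collections import defaultdict
from fractions import Fraction as F

# Hypothetical data (z, x, y) with y = x + 2*z, so the true effect of x is 1.
# Units with z = 1 are more likely to have x = 1, which confounds the naive comparison.
rows = ([(0, 1, 1)] * 2 + [(0, 0, 0)] * 6 +
        [(1, 1, 3)] * 6 + [(1, 0, 2)] * 2)

def mean(values):
    values = list(values)
    return F(sum(values), len(values))

# Naive comparison of group means, ignoring the confounder.
naive = (mean(y for z, x, y in rows if x == 1)
         - mean(y for z, x, y in rows if x == 0))

# Assignment model: empirical P(X = 1 | Z = z), fitted to the observed data only.
counts = defaultdict(lambda: [0, 0])  # z -> [number with x = 1, total]
for z, x, y in rows:
    counts[z][0] += x
    counts[z][1] += 1
propensity = {z: F(t, n) for z, (t, n) in counts.items()}

# Inverse-probability-weighted estimate of the effect of X.
n = len(rows)
ipw = (sum(F(x * y) / propensity[z] for z, x, y in rows)
       - sum(F((1 - x) * y) / (1 - propensity[z]) for z, x, y in rows)) / n

print(naive, ipw)
```

In this constructed example the naive difference is 2 while the weighted estimate recovers the built-in effect of 1. Nothing about Y was modeled; only the assignment of X given Z was.<br />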
<br />
What are some of the pros and cons of the two approaches? A good transferable model for Y may provide more precise estimates of the effect of X because more of the variability in Y is accounted for in the model. On the other hand, the validity of a causal estimate based on the propensity score approach depends on assumptions that may be much easier to sustain than those required for the approach based on modeling Y. Broadly, the propensity score approach offers lower bias but not necessarily lower variability. Note that the two approaches are not mutually exclusive. They may be better viewed as two sets of concepts that could be combined in an analysis that draws from both.<br />
<br />
How does this all relate to the analysis of mediation? The Baron and Kenny approach and its variants -- in which I include the various ways of estimating direct and indirect causal effects -- are all based on methods analogous to models for Y. As mentioned earlier, estimating the causal effect of the mediator involves causal inference with observational data -- even in the context of an experiment randomizing X. This invites the question whether propensity score methods could be used in assessing more accurately the causal effect of M. The answer lies in the relatively recent theory of principal stratification [Constantine Frangakis and Donald Rubin (2002) "Principal Stratification in Causal Inference", ''Biometrics'', '''58''', 21--29].<br />
<br />
An accessible reference for the concepts behind propensity scores is Donald Rubin (1997) "Estimating Causal Effects from Large Data Sets Using Propensity Scores," ''Annals of Internal Medicine'', '''127''', 757--763.<br />
<br />
A recent treatment of mediation using principal stratification is given in Chapter 8, "Intermediate Causal Factors," of Herbert Weisberg (2010) ''Bias and Causation: Models and Judgment for Valid Comparisons'', Wiley.<br />
<br />
With the large number of seemingly competing approaches to causal inference, students as well as experienced researchers may feel quite puzzled as to which approach they should use. The answer, possibly, is all of them. Each approach seems to shed light on some aspect of the challenge of causal inference in the absence of pristine randomization. They do not offer recipes so much as sets of concepts that can be applied to help understand research projects and analyses.<br />
<br />
== Shock of the New ==<br />
[http://en.wikipedia.org/wiki/Robert_Hughes_(critic) Robert Hughes (1980)]<br />
* [http://www.youtube.com/watch?v=GFn4UmkBcaQ Surrealism Part 1]<br />
* [http://www.youtube.com/watch?v=2uaA8CfZKRs Surrealism Part 2]<br />
* [http://www.youtube.com/watch?v=dActaAa-teM Surrealism Part 3]<br />
* [http://www.youtube.com/watch?v=ZLmCh0xw4h0 Surrealism Part 4]<br />
* [http://www.youtube.com/watch?v=s-SsWPNNBC4 Surrealism Part 5]<br />
== York AODA ==<br />
* [http://aoda.yorku.ca/cs/interactive/en/# AODA training]<br />
== Notes for NATS 1500 ==<br />
* Topics<br />
** single-sex schools?<br />
<br />
== Notes for MATH 6627 ==<br />
* [http://mediamatters.org/research/2012/10/01/a-history-of-dishonest-fox-charts/190225 Collection of misleading graphs]<br />
* [http://www.amstat.org/sections/cnsl/BooksJournals.cfm ASA consulting page]<br />
* set up student home page<br />
* first assignment. Find and explore a dataset using<br />
** Ernest Kwan's correlagram<br />
** Lattice (use panels and groups)<br />
** p3d<br />
** gapminder<br />
** should have included candisc<br />
* give a 15-minute (crucial) presentation on the data set and on the method<br />
* prepare a wiki page with links and materials <br />
* Address a few questions:<br />
** What are the strengths and weaknesses?<br />
** For what kind of dataset is it well suited and what kind not?<br />
** Can you find a dataset that illustrates well the features of this approach?<br />
** Can you compare your approach with other approaches?<br />
[[/test page|test]]<br />
<br />
* develop checklists:<br />
** initial exploration of data<br />
** missing data (explicit and implicit)<br />
* do simulation of parallel methods: check estimation of variance parameters<br />
* use nlme to estimate knot placement in gsp<br />
=== Links ===<br />
* [http://healthland.time.com/2011/09/02/mind-reading-why-bad-math-can-ruin-your-health/ Why bad math can ruin your health]<br />
* [http://www.ted.com/talks/david_mccandless_the_beauty_of_data_visualization.html TED talk: David McCandless, The Beauty of Data Visualization]<br />
* Hans Rosling<br />
* Great intro to Gapminder: http://www.mrbartonmaths.com/gapminder.htm<br />
<br />
* [http://scs.math.yorku.ca/index.php?title=Special:Contributions&dir=prev&contribs=user&target=Georges Contributions]<br />
* [http://www.scribd.com/doc/17378132/The-Fallacy-of-Personal-Validation-a-Classroom-Demonstration-of-Gullibility The Forer Effect - a classroom example]<br />
<br />
== Notes for R course ==<br />
* Start: It had to be U ... on the SVD [http://www.youtube.com/StatisticalSongs#p/u/4/JEYLfIVvR9I]<br />
* Use SPSS dates both ways to illustrate <br />
** sub using regular expressions<br />
** import: reading dates into 'Date' format using formats: Include all %a %b %Y and others?<br />
** export: writing a date into a character string using format( Date.object, "%d-%m-%Y") to create variable SPSS can read<br />
* Variable references<br />
*: deal with plethora of ways used differently in different places: <br />
*:* formula ( ~id ), good for variables in different roles ( y ~ log(x) + x2 | id)<br />
*:* interpreted in data: (id), good for single var but can use list : (list(x1,x2))<br />
*:*:Examples:<br />
*:*:* <br />
*:* fully reference: dd$id <br />
*Beware:<br />
* aggregate with a formula drops rows with NAs even though the FUN might be able to handle them<br />
* multiple barplot: http://rtricks.wordpress.com/2009/10/26/multbar-advanced-multiple-barplot-with-sem/<br />
=== Add ===<br />
* Discussion of memory issues: what happens when you work on two computers<br />
=== Links ===<br />
* [http://www.r-bloggers.com/r-popularity-%E2%80%93-steady-growth-and-new-york-times/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+RBloggers+%28R+bloggers%29 R blog]<br />
* John Fox: ICPSR: 2010: [http://socserv.mcmaster.ca/jfox/Courses/R-course/Slides-handout.pdf Overview including slides on building R packages]<br />
* [http://www.icpsr.umich.edu/files/sumprog/biblio/2010/Fox.pdf Introduction to the R computing environment]<br />
* ICPSR 2011:<br />
** [http://socserv.mcmaster.ca/jfox/Courses/R/ICPSR/index.html Overall page]<br />
**<br />
<br />
== Notes for High School Talks ==<br />
=== Climate change ===<br />
* http://www.cbc.ca/news/technology/story/2011/09/09/pol-climate-adaptation.html<br />
<br />
== Excel techniques ==<br />
* Regular expressions and string substitution<br />
**<br />
* [http://www.contextures.com/xldataval08.html]<br />
<br />
== Ellipse Seminar ==<br />
[[/Ellipse Seminar]]<br />
== Setting up mathstat email in Thunderbird ==<br />
IMAP mailserver: mathstat.yorku.ca Port: 143 Security: STARTTLS<br />
<br />
Outgoing: mathstat.yorku.ca Port: 587 Security: none?<br />
<br />
== Statistical amusement ==<br />
* Statistical Song channel on youtube: http://www.youtube.com/user/StatisticalSongs<br />
** It had to be U ... on the SVD: http://www.youtube.com/StatisticalSongs#p/u/4/JEYLfIVvR9I<br />
** It don't mean a thing if you don't do modelling: http://www.youtube.com/user/StatisticalSongs#p/u/0/Jzm2hrEfNdY<br />
== On Careers in Statistics and Mathematics ==<br />
*[https://sites.google.com/site/statsr4us/intro/2-the-joy-of-stats/statsfuture The Joy of Statistics]<br />
*[http://www.bbc.co.uk/news/business-14631547 How Mathematicians Rule the Markets: Quant Trading]<br />
<br />
== On Teaching Science ==<br />
<br />
* [http://www.nats.yorku.ca/index.shtml Natural Science Web Site]<br />
* [https://sites.google.com/site/changingourteaching/Home/tips-for-teaching-asistants Collection of links on teaching]<br />
<br />
<br />
=== A few videos ===<br />
* [http://www.youtube.com/watch?v=ccReLF6M62Y Why Teach Science by James Randi]<br />
* [http://www.youtube.com/watch?v=BlpyGhABXRA Teaching Introductory Physics]<br />
* [http://www.ted.com/talks/brian_goldman_doctors_make_mistakes_can_we_talk_about_that.html?utm_source=newsletter_weekly_2012-01-25&utm_campaign=newsletter_weekly&utm_medium=email Brian Goldman on learning from mistakes]<br />
<br />
== LOG ==<br />
* [[/a]]</div>Georgeshttp://scs.math.yorku.ca/index.php/SCS_Reads_2018-2019SCS Reads 2018-20192018-12-09T21:50:59Z<p>Georges: /* November 2 */</p>
<hr />
<div>== Links to past episodes of SCS Reads ==<br />
* [[SCS Reads 2017-2018|SCS Reads 2017-2018 Counterfactuals and Causal Inference]]<br />
* [[SCS Reads 2016-2017|SCS Reads 2016-2017 Rethinking Statistics]]<br />
* [[SCS Reads 2015-2016|SCS Reads 2015-2016 Bayesian Statistics in the Social Sciences]]<br />
* [[SCS Reads 2014-2015]]<br />
* [[SCS Reads Nominations|Past and current nominations for SCS Reads]]<br />
* [[SCS Reads -- past seminars]]<br />
== Converging to a plan this year ==<br />
We will adopt a hybrid plan combining reading the Book of Why (first chapter to be discussed on October 5: [http://bayes.cs.ucla.edu/WHY/why-ch1.pdf PDF of first chapter]) with other topics.<br />
<br />
On September 21, Michael Friendly will present a talk on the history of the visualization of the Titanic data.<br />
<br />
== Candidates for 2018-2019 ==<br />
* Pearl and Mackenzie (2018) The Book of Why, in the fall term followed by the remaining chapters (8 to 12) of Morgan and Winship (2015) Counterfactuals and Causal Inference, 2nd ed. in the winter term.<br />
** [https://d2r55xnwy6nx47.cloudfront.net/uploads/2018/05/to-build-truly-intelligent-machines-teach-them-cause-and-effect-20180515.pdf Kevin Hartnett (2018) To Build Truly Intelligent Machines, Teach Them Cause and Effect, ''Quanta Magazine'']: An interview with Judea Pearl on the Book of Why<br />
** The Book of Why is not technically difficult and provides a broad overview including interesting historical details. We could cover this in one term. It is intended as a trade book so it's cheap and accessible. It's also very useful as a source of ideas for anyone who would like to include more causal ideas in lower level quantitative courses. Here's the [https://www.nytimes.com/2018/06/01/business/dealbook/review-the-book-of-why-examines-the-science-of-cause-and-effect.html review in the New York Times] and a [https://www.amazon.ca/Book-Why-Science-Cause-Effect/dp/046509760X link to Amazon].<br />
** Suggested by Georges<br />
** Pros: Great synthesis including counterfactual and graphical approaches. Discusses related concepts and history. It covers, less formally, many of the ideas in the first portion of Morgan and Winship so it would allow new members to visit this material before reading the last 5 chapters of Morgan and Winship.<br />
** Cons: We've just spent a year on causal models. Would we prefer to do something else?<br />
<br />
== Initial discussion of topics ==<br />
A view was expressed that we should consider this seminar series more broadly, with a view to:<br />
* wider, and more active participation by the seminar members<br />
* topics or book chapters that would:<br />
** enlist a volunteer as discussion leader, or organize the topic<br />
** perhaps involve some concrete, practical examples used to illustrate the topic<br />
<br />
Without prejudice to the choice of The ''Book of Why'' on causal inference, some suggested topics<br />
mentioned are listed below, though the general view was that, if we took this route, only 3-4<br />
should be considered for this year. I'm just listing these, but they deserve to be fleshed out<br />
more for us to consider how & whether they would work.<br />
<br />
* '''Reproducibility of research''': How should statistical practice be informed by current controversies about the ''replication crisis'' and countervailing moves toward open science and reproducibility? This was a big topic at the recent JSM 2018 conference. [MF: I should add some references here.]<br />
** Roger Peng, Reproducible Research in Computational Science, ''Science'', Vol. 334, Issue 6060, pp. 1226-1227 [http://science.sciencemag.org/content/334/6060/1226]<br />
** RC: This is timely in a couple of ways. Chris Green is teaching a course on the topic, so he or one of the students might be able to come to a meeting to discuss their views. We also have a QM Forum speaker on the topic in the winter.<br />
<br />
* '''Big Data problems''': Another hot topic, but perhaps too broad. [MF: I don't know enough to specify it more clearly as a useful seminar topic.] [This would be interesting, but I would hope we could find materials that address the issue from a social science perspective]<br />
<br />
* Some statistical methods topics, not yet clearly articulated:<br />
** Clustering methods [RC: Could this fall under the Big Data label?]<br />
** Robustness<br />
** Mediation<br />
** Meta analysis in medicine: how can you tell whether a lit review is complete with logistic regression?! [RC: I think Meta-Analysis could be a good topic, including M-L/J-A discussing their research]<br />
<br />
* '''Consulting issues''': Practical aspects of statistical consulting [MF: Perhaps this would be a better topic for the SCS staff meetings ??] [RC: That would make a good topic for a business/staff meeting, if there were no consulting cases to discuss]<br />
<br />
* '''Survey Sampling''': Elucidating the mystery of bootstrap weights and how to use them when analyzing survey data, e.g. from Statistics Canada. <br />
<br />
* '''Evidence-based medicine''': Ideas and implications<br />
<br />
* '''Machine Learning, AI, Deep Learning''': An Overview <br />
<br />
* '''Disseminating technical information to non-technical audiences''': in consulting and in teaching. [Could this be put together with 'consulting issues'?]<br />
<br />
* '''Missing Data'''<br />
<br />
* '''Statistical Paradoxes and Fallacies''' [RC: There are some good "summary" articles related to statistical paradoxes and Georges' examples are always helpful]<br />
<br />
** Initial attempt at creating a list: [[Paradoxes, Fallacies and Other Surprises]]. Please add, modify or comment.<br />
<br />
== September 21 ==<br />
<br />
Talk: Michael Friendly, ''100+ Years of Titanic Graphs''<br />
<br />
* Slides: [[File:SCS-TitanicGraphs-2x2.pdf]]<br />
<br />
== October 5 ==<br />
<br />
* Introduction and Chapter 1 of the Book of Why<br />
* [http://bayes.cs.ucla.edu/WHY/ Website for the book]<br />
* [http://bayes.cs.ucla.edu/WHY/why-ch1.pdf PDF of first chapter]<br />
<br />
== October 19 ==<br />
<br />
* Chapters 2 and 3 of the Book of Why<br />
<br />
== November 2 ==<br />
<br />
* Chapters 4 and 5 of the Book of Why<br />
<br />
....<br />
<br />
== January 4 ==<br />
<br />
* We plan to continue a discussion of Chapter 7 of the Book of Why focussing on details of the do-calculus.<br />
* An interesting reference might be this [https://www.ssc.wisc.edu/soc/faculty/pages/docs/elwert/Elwert%202013.pdf 2013 article by Felix Elwert on Graphical Causal Models].</div>Georgeshttp://scs.math.yorku.ca/index.php/SCS_Reads_2018-2019SCS Reads 2018-20192018-10-31T18:56:15Z<p>Georges: /* October 19 */</p>
<hr />
<div></div>Georgeshttp://scs.math.yorku.ca/index.php/SCS_Reads_2018-2019SCS Reads 2018-20192018-10-11T21:19:46Z<p>Georges: /* October 5 */</p>
<hr />
<div>== Links to past episodes of SCS Reads ==<br />
* [[SCS Reads 2017-2018|SCS Reads 2017-2018 Counterfactuals and Causal Inference]]<br />
* [[SCS Reads 2016-2017|SCS Reads 2016-2017 Rethinking Statistics]]<br />
* [[SCS Reads 2015-2016|SCS Reads 2015-2016 Bayesian Statistics in the Social Sciences]]<br />
* [[SCS Reads 2014-2015]]<br />
* [[SCS Reads Nominations|Past and current nominations for SCS Reads]]<br />
* [[SCS Reads -- past seminars]]<br />
== Converging to a plan this year ==<br />
We will adopt a hybrid plan combining reading the Book of Why (first chapter to be discussed on October 5: [http://bayes.cs.ucla.edu/WHY/why-ch1.pdf PDF of first chapter]) with other topics.<br />
<br />
On September 21, Michael Friendly will present a talk on the history of the visualization of the Titanic data.<br />
<br />
== Candidates for 2018-2019 ==<br />
* Pearl and Mackenzie (2018) The Book of Why, in the fall term followed by the remaining chapters (8 to 12) of Morgan and Winship (2015) Counterfactuals and Causal Inference, 2nd ed. in the winter term.<br />
** [https://d2r55xnwy6nx47.cloudfront.net/uploads/2018/05/to-build-truly-intelligent-machines-teach-them-cause-and-effect-20180515.pdf Kevin Hartnett (2018) To Build Truly Intelligent Machines, Teach Them Cause and Effect, ''Quanta Magazine'']: An interview with Judea Pearl on the Book of Why<br />
** The Book of Why is not technically difficult and provides a broad overview including interesting historical details. We could cover this in one term. It is intended as a trade book so it's cheap and accessible. It's also very useful as a source of ideas for anyone who would like to include more causal ideas in lower level quantitative courses. Here's the [https://www.nytimes.com/2018/06/01/business/dealbook/review-the-book-of-why-examines-the-science-of-cause-and-effect.html review in the New York Times] and a [https://www.amazon.ca/Book-Why-Science-Cause-Effect/dp/046509760X link to Amazon].<br />
** Suggested by Georges<br />
** Pros: Great synthesis including counterfactual and graphical approaches. Discusses related concepts and history. It covers, less formally, many of the ideas in the first portion of Morgan and Winship so it would allow new members to visit this material before reading the last 5 chapters of Morgan and Winship.<br />
** Cons: We've just spent a year on causal models. Would we prefer to do something else?<br />
<br />
== Initial discussion of topics ==<br />
A view was expressed that we should consider this seminar series more broadly, with a view to:<br />
* wider, and more active participation by the seminar members<br />
* topics or book chapters that would:<br />
** enlist a volunteer as discussion leader, or organize the topic<br />
** perhaps involve some concrete, practical examples used to illustrate the topic<br />
<br />
Without prejudice to the choice of The ''Book of Why'' on causal inference, some suggested topics<br />
mentioned are listed below, though the general view was that, if we took this route, only 3-4<br />
should be considered for this year. I'm just listing these, but they deserve to be fleshed out<br />
more for us to consider how & whether they would work.<br />
<br />
* '''Reproducibility of research''': How should statistical practice be informed by current controversies about the ''replication crisis'' and countervailing moves toward open science and reproducibility? This was a big topic at the recent JSM 2018 conference. [MF: I should add some references here.]<br />
** Roger Peng, Reproducible Research in Computational Science, ''Science'', Vol. 334, Issue 6060, pp. 1226-1227 [http://science.sciencemag.org/content/334/6060/1226]<br />
** RC: This is timely in a couple of ways. Chris Green is teaching a course on the topic, so he or one of the students might be able to come to a meeting to discuss their views. We also have a QM Forum speaker on the topic in the Winter.<br />
<br />
* '''Big Data problems''': Another hot topic, but perhaps too broad. [MF: I don't know enough to specify it more clearly as a useful seminar topic.] [This would be interesting, but I would hope we could find materials that address the issue from a social science perspective]<br />
<br />
* Some statistical methods topics, not yet clearly articulated:<br />
** Clustering methods [RC: Could this fall under the Big Data label?]<br />
** Robustness<br />
** Mediation<br />
** Meta analysis in medicine: how can you tell whether a lit review is complete with logistic regression?! [RC: I think Meta-Analysis could be a good topic, including M-L/J-A discussing their research]<br />
<br />
* '''Consulting issues''': Practical aspects of statistical consulting [MF: Perhaps this would be a better topic for the SCS staff meetings ??] [RC: That would make a good topic for a business/staff meeting, if there were no consulting cases to discuss]<br />
<br />
* '''Survey Sampling''': Elucidating the mystery of bootstrap weights and how to use them when analyzing survey data, e.g. from Statistics Canada. <br />
<br />
* '''Evidence-based medicine''': Ideas and implications<br />
<br />
* '''Machine Learning, AI, Deep Learning''': An Overview <br />
<br />
* '''Disseminating technical information to non-technical audiences''': in consulting and in teaching. [Could this be put together with 'consulting issues'?]<br />
<br />
* '''Missing Data'''<br />
<br />
* '''Statistical Paradoxes and Fallacies''' [RC: There are some good "summary" articles related to statistical paradoxes and Georges' examples are always helpful]<br />
<br />
** Initial attempt at creating a list: [[Paradoxes, Fallacies and Other Surprises]]. Please add, modify or comment.<br />
<br />
== September 21 ==<br />
<br />
Talk: Michael Friendly, ''100+ Years of Titanic Graphs''<br />
<br />
* Slides: [[File:SCS-TitanicGraphs-2x2.pdf]]<br />
<br />
== October 5 ==<br />
<br />
* Introduction and Chapter 1 of the Book of Why<br />
* [http://bayes.cs.ucla.edu/WHY/ Website for the book]<br />
* [http://bayes.cs.ucla.edu/WHY/why-ch1.pdf PDF of first chapter]<br />
<br />
== October 19 ==<br />
<br />
* Chapters 2 and 3 of the Book of Why</div>Georgeshttp://scs.math.yorku.ca/index.php/SCS_Reads_2018-2019SCS Reads 2018-20192018-09-28T16:50:41Z<p>Georges: /* October 5 */</p>
<hr />
<div>== Links to past episodes of SCS Reads ==<br />
* [[SCS Reads 2017-2018|SCS Reads 2017-2018 Counterfactuals and Causal Inference]]<br />
* [[SCS Reads 2016-2017|SCS Reads 2016-2017 Rethinking Statistics]]<br />
* [[SCS Reads 2015-2016|SCS Reads 2015-2016 Bayesian Statistics in the Social Sciences]]<br />
* [[SCS Reads 2014-2015]]<br />
* [[SCS Reads Nominations|Past and current nominations for SCS Reads]]<br />
* [[SCS Reads -- past seminars]]<br />
== Converging to a plan this year ==<br />
We will adopt a hybrid plan combining reading the Book of Why (first chapter to be discussed on October 5: [http://bayes.cs.ucla.edu/WHY/why-ch1.pdf PDF of first chapter]) with other topics.<br />
<br />
On September 21, Michael Friendly will present a talk on the history of the visualization of the Titanic data.<br />
<br />
== Candidates for 2018-2019 ==<br />
* Pearl and Mackenzie (2018) The Book of Why, in the fall term followed by the remaining chapters (8 to 12) of Morgan and Winship (2015) Counterfactuals and Causal Inference, 2nd ed. in the winter term.<br />
** [https://d2r55xnwy6nx47.cloudfront.net/uploads/2018/05/to-build-truly-intelligent-machines-teach-them-cause-and-effect-20180515.pdf Kevin Hartnett (2018) To Build Truly Intelligent Machines, Teach Them Cause and Effect, ''Quanta Magazine'']: An interview with Judea Pearl on the Book of Why<br />
** The Book of Why is not technically difficult and provides a broad overview including interesting historical details. We could cover this in one term. It is intended as a trade book so it's cheap and accessible. It's also very useful as a source of ideas for anyone who would like to include more causal ideas in lower level quantitative courses. Here's the [https://www.nytimes.com/2018/06/01/business/dealbook/review-the-book-of-why-examines-the-science-of-cause-and-effect.html review in the New York Times] and a [https://www.amazon.ca/Book-Why-Science-Cause-Effect/dp/046509760X link to Amazon].<br />
** Suggested by Georges<br />
** Pros: Great synthesis including counterfactual and graphical approaches. Discusses related concepts and history. It covers, less formally, many of the ideas in the first portion of Morgan and Winship so it would allow new members to visit this material before reading the last 5 chapters of Morgan and Winship.<br />
** Cons: We've just spent a year on causal models. Would we prefer to do something else?<br />
<br />
== Initial discussion of topics ==<br />
A view was expressed that we should consider this seminar series more broadly, with a view to:<br />
* wider, and more active participation by the seminar members<br />
* topics or book chapters that would:<br />
** enlist a volunteer as discussion leader, or organize the topic<br />
** perhaps involve some concrete, practical examples used to illustrate the topic<br />
<br />
Without prejudice to the choice of The ''Book of Why'' on causal inference, some suggested topics<br />
mentioned are listed below, though the general view was that, if we took this route, only 3-4<br />
should be considered for this year. I'm just listing these, but they deserve to be fleshed out<br />
more for us to consider how & whether they would work.<br />
<br />
* '''Reproducibility of research''': How should statistical practice be informed by current controversies about the ''replication crisis'' and countervailing moves toward open science and reproducibility? This was a big topic at the recent JSM 2018 conference. [MF: I should add some references here.]<br />
** Roger Peng, Reproducible Research in Computational Science, ''Science'', Vol. 334, Issue 6060, pp. 1226-1227 [http://science.sciencemag.org/content/334/6060/1226]<br />
** RC: This is timely in a couple of ways. Chris Green is teaching a course on the topic, so he or one of the students might be able to come to a meeting to discuss their views. We also have a QM Forum speaker on the topic in the Winter.<br />
<br />
* '''Big Data problems''': Another hot topic, but perhaps too broad. [MF: I don't know enough to specify it more clearly as a useful seminar topic.] [This would be interesting, but I would hope we could find materials that address the issue from a social science perspective]<br />
<br />
* Some statistical methods topics, not yet clearly articulated:<br />
** Clustering methods [RC: Could this fall under the Big Data label?]<br />
** Robustness<br />
** Mediation<br />
** Meta analysis in medicine: how can you tell whether a lit review is complete with logistic regression?! [RC: I think Meta-Analysis could be a good topic, including M-L/J-A discussing their research]<br />
<br />
* '''Consulting issues''': Practical aspects of statistical consulting [MF: Perhaps this would be a better topic for the SCS staff meetings ??] [RC: That would make a good topic for a business/staff meeting, if there were no consulting cases to discuss]<br />
<br />
* '''Survey Sampling''': Elucidating the mystery of bootstrap weights and how to use them when analyzing survey data, e.g. from Statistics Canada. <br />
<br />
* '''Evidence-based medicine''': Ideas and implications<br />
<br />
* '''Machine Learning, AI, Deep Learning''': An Overview <br />
<br />
* '''Disseminating technical information to non-technical audiences''': in consulting and in teaching. [Could this be put together with 'consulting issues'?]<br />
<br />
* '''Missing Data'''<br />
<br />
* '''Statistical Paradoxes and Fallacies''' [RC: There are some good "summary" articles related to statistical paradoxes and Georges' examples are always helpful]<br />
<br />
** Initial attempt at creating a list: [[Paradoxes, Fallacies and Other Surprises]]. Please add, modify or comment.<br />
<br />
== September 21 ==<br />
<br />
Talk: Michael Friendly, ''100+ Years of Titanic Graphs''<br />
<br />
* Slides: [[File:SCS-TitanicGraphs-2x2.pdf]]<br />
<br />
== October 5 ==<br />
<br />
* Introduction and Chapter 1 of the Book of Why<br />
* [http://bayes.cs.ucla.edu/WHY/ Website for the book]<br />
* [http://bayes.cs.ucla.edu/WHY/why-ch1.pdf PDF of first chapter]</div>Georgeshttp://scs.math.yorku.ca/index.php/SCS_Reads_2018-2019SCS Reads 2018-20192018-09-28T16:49:50Z<p>Georges: /* September 21 */</p>
<hr />
<div>== Links to past episodes of SCS Reads ==<br />
* [[SCS Reads 2017-2018|SCS Reads 2017-2018 Counterfactuals and Causal Inference]]<br />
* [[SCS Reads 2016-2017|SCS Reads 2016-2017 Rethinking Statistics]]<br />
* [[SCS Reads 2015-2016|SCS Reads 2015-2016 Bayesian Statistics in the Social Sciences]]<br />
* [[SCS Reads 2014-2015]]<br />
* [[SCS Reads Nominations|Past and current nominations for SCS Reads]]<br />
* [[SCS Reads -- past seminars]]<br />
== Converging to a plan this year ==<br />
We will adopt a hybrid plan combining reading the Book of Why (first chapter to be discussed on October 5: [http://bayes.cs.ucla.edu/WHY/why-ch1.pdf PDF of first chapter]) with other topics.<br />
<br />
On September 21, Michael Friendly will present a talk on the history of the visualization of the Titanic data.<br />
<br />
== Candidates for 2018-2019 ==<br />
* Pearl and Mackenzie (2018) The Book of Why, in the fall term followed by the remaining chapters (8 to 12) of Morgan and Winship (2015) Counterfactuals and Causal Inference, 2nd ed. in the winter term.<br />
** [https://d2r55xnwy6nx47.cloudfront.net/uploads/2018/05/to-build-truly-intelligent-machines-teach-them-cause-and-effect-20180515.pdf Kevin Hartnett (2018) To Build Truly Intelligent Machines, Teach Them Cause and Effect, ''Quanta Magazine'']: An interview with Judea Pearl on the Book of Why<br />
** The Book of Why is not technically difficult and provides a broad overview including interesting historical details. We could cover this in one term. It is intended as a trade book so it's cheap and accessible. It's also very useful as a source of ideas for anyone who would like to include more causal ideas in lower level quantitative courses. Here's the [https://www.nytimes.com/2018/06/01/business/dealbook/review-the-book-of-why-examines-the-science-of-cause-and-effect.html review in the New York Times] and a [https://www.amazon.ca/Book-Why-Science-Cause-Effect/dp/046509760X link to Amazon].<br />
** Suggested by Georges<br />
** Pros: Great synthesis including counterfactual and graphical approaches. Discusses related concepts and history. It covers, less formally, many of the ideas in the first portion of Morgan and Winship so it would allow new members to visit this material before reading the last 5 chapters of Morgan and Winship.<br />
** Cons: We've just spent a year on causal models. Would we prefer to do something else?<br />
<br />
== Initial discussion of topics ==<br />
A view was expressed that we should consider this seminar series more broadly, with a view to:<br />
* wider, and more active participation by the seminar members<br />
* topics or book chapters that would:<br />
** enlist a volunteer as discussion leader, or organize the topic<br />
** perhaps involve some concrete, practical examples used to illustrate the topic<br />
<br />
Without prejudice to the choice of The ''Book of Why'' on causal inference, some suggested topics<br />
mentioned are listed below, though the general view was that, if we took this route, only 3-4<br />
should be considered for this year. I'm just listing these, but they deserve to be fleshed out<br />
more for us to consider how & whether they would work.<br />
<br />
* '''Reproducibility of research''': How should statistical practice be informed by current controversies about the ''replication crisis'' and countervailing moves toward open science and reproducibility? This was a big topic at the recent JSM 2018 conference. [MF: I should add some references here.]<br />
** Roger Peng, Reproducible Research in Computational Science, ''Science'', Vol. 334, Issue 6060, pp. 1226-1227 [http://science.sciencemag.org/content/334/6060/1226]<br />
** RC: This is timely in a couple of ways. Chris Green is teaching a course on the topic, so he or one of the students might be able to come to a meeting to discuss their views. We also have a QM Forum speaker on the topic in the Winter.<br />
<br />
* '''Big Data problems''': Another hot topic, but perhaps too broad. [MF: I don't know enough to specify it more clearly as a useful seminar topic.] [This would be interesting, but I would hope we could find materials that address the issue from a social science perspective]<br />
<br />
* Some statistical methods topics, not yet clearly articulated:<br />
** Clustering methods [RC: Could this fall under the Big Data label?]<br />
** Robustness<br />
** Mediation<br />
** Meta analysis in medicine: how can you tell whether a lit review is complete with logistic regression?! [RC: I think Meta-Analysis could be a good topic, including M-L/J-A discussing their research]<br />
<br />
* '''Consulting issues''': Practical aspects of statistical consulting [MF: Perhaps this would be a better topic for the SCS staff meetings ??] [RC: That would make a good topic for a business/staff meeting, if there were no consulting cases to discuss]<br />
<br />
* '''Survey Sampling''': Elucidating the mystery of bootstrap weights and how to use them when analyzing survey data, e.g. from Statistics Canada. <br />
<br />
* '''Evidence-based medicine''': Ideas and implications<br />
<br />
* '''Machine Learning, AI, Deep Learning''': An Overview <br />
<br />
* '''Disseminating technical information to non-technical audiences''': in consulting and in teaching. [Could this be put together with 'consulting issues'?]<br />
<br />
* '''Missing Data'''<br />
<br />
* '''Statistical Paradoxes and Fallacies''' [RC: There are some good "summary" articles related to statistical paradoxes and Georges' examples are always helpful]<br />
<br />
** Initial attempt at creating a list: [[Paradoxes, Fallacies and Other Surprises]]. Please add, modify or comment.<br />
<br />
== September 21 ==<br />
<br />
Talk: Michael Friendly, ''100+ Years of Titanic Graphs''<br />
<br />
* Slides: [[File:SCS-TitanicGraphs-2x2.pdf]]<br />
<br />
== October 5 ==<br />
<br />
* Introduction and Chapter 1 of the Book of Why<br />
* [http://bayes.cs.ucla.edu/WHY/ Website for the book]</div>Georgeshttp://scs.math.yorku.ca/index.php/SCS_Reads_2018-2019SCS Reads 2018-20192018-09-19T19:00:51Z<p>Georges: /* Converging to a plan this year */</p>
<hr />
<div>== Links to past episodes of SCS Reads ==<br />
* [[SCS Reads 2017-2018|SCS Reads 2017-2018 Counterfactuals and Causal Inference]]<br />
* [[SCS Reads 2016-2017|SCS Reads 2016-2017 Rethinking Statistics]]<br />
* [[SCS Reads 2015-2016|SCS Reads 2015-2016 Bayesian Statistics in the Social Sciences]]<br />
* [[SCS Reads 2014-2015]]<br />
* [[SCS Reads Nominations|Past and current nominations for SCS Reads]]<br />
* [[SCS Reads -- past seminars]]<br />
== Converging to a plan this year ==<br />
We will adopt a hybrid plan combining reading the Book of Why (first chapter to be discussed on October 5: [http://bayes.cs.ucla.edu/WHY/why-ch1.pdf PDF of first chapter]) with other topics.<br />
<br />
On September 21, Michael Friendly will present a talk on the history of the visualization of the Titanic data.<br />
<br />
== Candidates for 2018-2019 ==<br />
* Pearl and Mackenzie (2018) The Book of Why, in the fall term followed by the remaining chapters (8 to 12) of Morgan and Winship (2015) Counterfactuals and Causal Inference, 2nd ed. in the winter term.<br />
** [https://d2r55xnwy6nx47.cloudfront.net/uploads/2018/05/to-build-truly-intelligent-machines-teach-them-cause-and-effect-20180515.pdf Kevin Hartnett (2018) To Build Truly Intelligent Machines, Teach Them Cause and Effect, ''Quanta Magazine'']: An interview with Judea Pearl on the Book of Why<br />
** The Book of Why is not technically difficult and provides a broad overview including interesting historical details. We could cover this in one term. It is intended as a trade book so it's cheap and accessible. It's also very useful as a source of ideas for anyone who would like to include more causal ideas in lower level quantitative courses. Here's the [https://www.nytimes.com/2018/06/01/business/dealbook/review-the-book-of-why-examines-the-science-of-cause-and-effect.html review in the New York Times] and a [https://www.amazon.ca/Book-Why-Science-Cause-Effect/dp/046509760X link to Amazon].<br />
** Suggested by Georges<br />
** Pros: Great synthesis including counterfactual and graphical approaches. Discusses related concepts and history. It covers, less formally, many of the ideas in the first portion of Morgan and Winship so it would allow new members to visit this material before reading the last 5 chapters of Morgan and Winship.<br />
** Cons: We've just spent a year on causal models. Would we prefer to do something else?<br />
<br />
== Initial discussion of topics ==<br />
A view was expressed that we should consider this seminar series more broadly, with a view to:<br />
* wider, and more active participation by the seminar members<br />
* topics or book chapters that would:<br />
** enlist a volunteer as discussion leader, or organize the topic<br />
** perhaps involve some concrete, practical examples used to illustrate the topic<br />
<br />
Without prejudice to the choice of The ''Book of Why'' on causal inference, some suggested topics<br />
mentioned are listed below, though the general view was that, if we took this route, only 3-4<br />
should be considered for this year. I'm just listing these, but they deserve to be fleshed out<br />
more for us to consider how & whether they would work.<br />
<br />
* '''Reproducibility of research''': How should statistical practice be informed by current controversies about the ''replication crisis'' and countervailing moves toward open science and reproducibility? This was a big topic at the recent JSM 2018 conference. [MF: I should add some references here.]<br />
** Roger Peng, Reproducible Research in Computational Science, ''Science'', Vol. 334, Issue 6060, pp. 1226-1227 [http://science.sciencemag.org/content/334/6060/1226]<br />
** RC: This is timely in a couple of ways. Chris Green is teaching a course on the topic, so he or one of the students might be able to come to a meeting to discuss their views. We also have a QM Forum speaker on the topic in the Winter.<br />
<br />
* '''Big Data problems''': Another hot topic, but perhaps too broad. [MF: I don't know enough to specify it more clearly as a useful seminar topic.] [This would be interesting, but I would hope we could find materials that address the issue from a social science perspective]<br />
<br />
* Some statistical methods topics, not yet clearly articulated:<br />
** Clustering methods [RC: Could this fall under the Big Data label?]<br />
** Robustness<br />
** Mediation<br />
** Meta analysis in medicine: how can you tell whether a lit review is complete with logistic regression?! [RC: I think Meta-Analysis could be a good topic, including M-L/J-A discussing their research]<br />
<br />
* '''Consulting issues''': Practical aspects of statistical consulting [MF: Perhaps this would be a better topic for the SCS staff meetings ??] [RC: That would make a good topic for a business/staff meeting, if there were no consulting cases to discuss]<br />
<br />
* '''Survey Sampling''': Elucidating the mystery of bootstrap weights and how to use them when analyzing survey data, e.g. from Statistics Canada. <br />
<br />
* '''Evidence-based medicine''': Ideas and implications<br />
<br />
* '''Machine Learning, AI, Deep Learning''': An Overview <br />
<br />
* '''Disseminating technical information to non-technical audiences''': in consulting and in teaching. [Could this be put together with 'consulting issues'?]<br />
<br />
* '''Missing Data'''<br />
<br />
* '''Statistical Paradoxes and Fallacies''' [RC: There are some good "summary" articles related to statistical paradoxes and Georges' examples are always helpful]<br />
<br />
** Initial attempt at creating a list: [[Paradoxes, Fallacies and Other Surprises]]. Please add, modify or comment.</div>Georgeshttp://scs.math.yorku.ca/index.php/Paradoxes,_Fallacies_and_Other_SurprisesParadoxes, Fallacies and Other Surprises2018-09-19T01:25:18Z<p>Georges: /* Birth-Weight Paradox */</p>
<hr />
<div>An attempt at a taxonomy of statistical paradoxes, fallacies and other surprises.<br />
<br />
== Simpson's Paradox ==<br />
Classical examples:<br />
* '''Florida capital sentencing''' of convicted murderers: Suppression effect of a confounder: No marginal relationship between race of accused and rate of capital sentencing but the relationship becomes very strong when controlling for a confounding factor: the race of victim.<br />
* '''Berkeley graduate admissions''': Overall lower rate of acceptance for female candidates but little or no gender effect within departments. Women tend to apply to departments that are harder to get into (have low admission rates) for both men and women. By controlling for departments the appearance of gender discrimination disappears. But is this the right analysis? That depends on the mechanism through which discrimination occurs. If university budgeting decisions systematically favour departments that teach topics that are more appealing to men, then ''department'' is a mediator and controlling for departments masks the effect of discrimination. We need different models to identify 'micro-discrimination' at the departmental level and 'macro-discrimination' at the university level. This is analogous to asking whether ''department'' should be treated as a confounder (and included in the model) or as a mediator (and excluded) in order to identify a causal effect of gender. Conditioning is not automatically the right thing to do. So the right model to estimate discrimination depends on the mechanism of discrimination. For mechanisms in which department is a mediator or a collider, it must be omitted. For mechanisms in which it is a confounder, it must be included.<br />
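The pattern described for the admissions example can be reproduced with a small table. The sketch below uses made-up counts (they are not the Berkeley data) for two hypothetical departments, "easy" and "hard", chosen so that men and women are admitted at the same rate within each department while the marginal rates differ. It only illustrates the arithmetic; it says nothing about which analysis is causally appropriate.

```python
# Hypothetical counts, NOT the real Berkeley data.
# department -> gender -> (applicants, admitted)
counts = {
    "easy": {"men": (800, 480), "women": (200, 120)},  # 60% admitted for both
    "hard": {"men": (200, 40), "women": (800, 160)},   # 20% admitted for both
}

def within_department_rates(counts):
    """Admission rate for each gender inside each department."""
    return {
        dept: {g: admitted / applicants
               for g, (applicants, admitted) in by_gender.items()}
        for dept, by_gender in counts.items()
    }

def marginal_rates(counts):
    """Admission rate for each gender, pooling over departments."""
    totals = {}
    for by_gender in counts.values():
        for g, (applicants, admitted) in by_gender.items():
            n, k = totals.get(g, (0, 0))
            totals[g] = (n + applicants, k + admitted)
    return {g: k / n for g, (n, k) in totals.items()}

within = within_department_rates(counts)
marginal = marginal_rates(counts)
print("within departments:", within)
print("marginal:", marginal)
```

With these counts the within-department rates are identical for men and women, yet the pooled rate is lower for women because most of them applied to the "hard" department.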
<br />
== Birth-Weight Paradox ==<br />
Example of conditioning, or selection, on a collider variable.<br />
<br />
== Regression Paradox ==<br />
Discrepancy between global and local relationships. Some classical examples:<br />
* Globally the distribution of heights can remain the same from generation to generation although tall parents get the impression that their children are shorter than themselves and short parents get the impression that their children are taller than themselves on average. Also, tall children get the impression that their parents are shorter than themselves and short children get the impression that their parents are taller than themselves on average. Thus parents get the impression, perfectly legitimately from their point of view, that the distribution of heights is being compressed towards the mean '''and''' children get the impression, also perfectly legitimately, that their parents' heights were more compressed towards the mean.<br />
* Kahneman's pilot instructors got the impression that criticism improved performance of student pilots while praise made it worse. Kahneman thought the causal effect should be in the opposite direction. Regression to the mean allows one to see how Kahneman's belief and the pilot instructors' impression are not inconsistent. The resolution of the paradox lies partly in realizing that the instructors are noticing an 'observational' relationship that is in the opposite direction to the possible causal relationship. Thus there's a connection with Simpson's Paradox.<br />
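The heights example can be checked by simulation. The following is a minimal sketch under stated assumptions: heights are standardized (mean 0, SD 1), parent and child heights are bivariate normal, and the correlation of 0.5 and the cutoff of 1 SD for "tall" are arbitrary illustrative choices, not estimates from data.

```python
import random
import statistics

def simulate_pairs(n=50_000, rho=0.5, seed=1):
    """Return (parent, child) standardized heights with correlation rho."""
    rng = random.Random(seed)
    noise_sd = (1.0 - rho ** 2) ** 0.5  # keeps the child's SD equal to 1
    pairs = []
    for _ in range(n):
        parent = rng.gauss(0.0, 1.0)
        child = rho * parent + noise_sd * rng.gauss(0.0, 1.0)
        pairs.append((parent, child))
    return pairs

pairs = simulate_pairs()
tall_parent = [(p, c) for p, c in pairs if p > 1.0]  # select on the parent
tall_child = [(p, c) for p, c in pairs if c > 1.0]   # select on the child

parent_mean_tp = statistics.fmean([p for p, _ in tall_parent])
child_mean_tp = statistics.fmean([c for _, c in tall_parent])
child_mean_tc = statistics.fmean([c for _, c in tall_child])
parent_mean_tc = statistics.fmean([p for p, _ in tall_child])
sd_parent = statistics.pstdev([p for p, _ in pairs])
sd_child = statistics.pstdev([c for _, c in pairs])

print("tall parents:", parent_mean_tp, "their children:", child_mean_tp)
print("tall children:", child_mean_tc, "their parents:", parent_mean_tc)
print("SD of parents:", sd_parent, "SD of children:", sd_child)
```

Selecting on either generation shows the other generation closer to the mean, while the two marginal standard deviations stay essentially equal.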
<br />
== Lord's Paradox ==<br />
When comparing two groups using a pretest and a posttest, should we compare gain scores between groups or should we regress the posttest on groups using the pretest as a covariate?<br />
<br />
== Base rate paradoxes ==<br />
'''Prosecutor's Fallacy''': The p-value to test the hypothesis that Sally Clark was innocent of the murder of her two children could legitimately be interpreted as being in the vicinity of 1/100,000. However there's also a legitimate argument that the probability of her innocence is very close to 1. These two seemingly contradictory results are not inconsistent with each other.<br />
<br />
'''Representativeness heuristic''': This is a concept formulated by Tversky and Kahneman. One could view<br />
the heuristic as amounting to forming judgements based on relative likelihood, thus ignoring the base rate or 'prior', the foundational fallacy of frequentism. <br />
<br />
'''Stereotyping''': [https://www.sciencedirect.com/science/article/pii/0022103182900798 Social stereotypes and judgments of individuals: An instance of the base-rate fallacy]<br />
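To see how a small p-value and a high probability of innocence can coexist in the Prosecutor's Fallacy example, Bayes' rule can be applied directly. This is only a sketch: the 1/100,000 figure is the one quoted above, but the prior probability of guilt (1 in 10 million) and the assumption that the evidence is certain under guilt are hypothetical values chosen for illustration, not estimates from the case.

```python
def posterior_innocence(p_evidence_given_innocent,
                        p_evidence_given_guilty,
                        prior_guilty):
    """P(innocent | evidence) by Bayes' rule for two hypotheses."""
    prior_innocent = 1.0 - prior_guilty
    numerator = p_evidence_given_innocent * prior_innocent
    denominator = numerator + p_evidence_given_guilty * prior_guilty
    return numerator / denominator

p_innocent = posterior_innocence(
    p_evidence_given_innocent=1 / 100_000,  # the 'p-value' quoted in the text
    p_evidence_given_guilty=1.0,            # assumption: evidence certain if guilty
    prior_guilty=1 / 10_000_000,            # hypothetical base rate of double murder
)
print("posterior probability of innocence:", p_innocent)
```

Under these assumed inputs the posterior probability of innocence is about 0.99, because the base rate of guilt is far smaller than the probability of the evidence under innocence.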
<br />
== Prisoner's Paradox==<br />
Also known as the Monty Hall problem or the Principle of Restricted Choice in bridge. This is a very revealing paradox that illustrates the importance of taking into account the probabilistic mechanism that generated information, in addition to the information itself, if the information does not induce a partition of the space of possibilities.<br />
<br />
It illustrates the crucial role of statistical modelling, but perhaps not in a way that is supportive of frequentist inference.<br />
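A simulation makes the role of the generating mechanism concrete. The sketch below compares two hypothetical hosts in the Monty Hall setting: one who knows where the prize is and always opens an empty door, and one who opens one of the other two doors at random (games in which the prize is revealed are discarded). The contestant sees the same information in both cases, an empty opened door, but the chance of winning by switching differs.

```python
import random

def switch_win_rate(host_knows, trials=100_000, seed=0):
    """Estimate P(win by switching | host opened an empty door)."""
    rng = random.Random(seed)
    wins = 0
    games = 0
    for _ in range(trials):
        prize = rng.randrange(3)
        pick = rng.randrange(3)
        others = [d for d in range(3) if d != pick]
        if host_knows:
            # Host deliberately avoids the prize.
            opened = rng.choice([d for d in others if d != prize])
        else:
            # Host opens one of the other doors at random.
            opened = rng.choice(others)
            if opened == prize:
                continue  # prize revealed: not the situation of interest
        games += 1
        switched = next(d for d in others if d != opened)
        wins += (switched == prize)
    return wins / games

rate_informed_host = switch_win_rate(host_knows=True)
rate_random_host = switch_win_rate(host_knows=False)
print("informed host:", rate_informed_host)
print("random host:", rate_random_host)
```

The estimates come out near 2/3 for the informed host and near 1/2 for the random host.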
<br />
== Weighting Paradoxes ==<br />
Students at a university report an average class size of 100. Professors report an average class size of 50. Are students likely to be exaggerating and professors underestimating the size of their classes?<br />
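Both reports can be accurate. A minimal sketch with hypothetical class sizes (four classes of 25 and one of 150, chosen to reproduce the figures above): professors average over classes, while students average over students, so large classes are counted once per student enrolled.

```python
# Hypothetical class sizes chosen to match the figures in the text.
class_sizes = [25, 25, 25, 25, 150]

# Each class counts once: the average a professor-level survey would report.
professor_average = sum(class_sizes) / len(class_sizes)

# Each class counts once per enrolled student: a class of size s is
# reported by s students, so sizes are weighted by themselves.
student_average = sum(s * s for s in class_sizes) / sum(class_sizes)

print("average over classes:", professor_average)
print("average over students:", student_average)
```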
<br />
== Paradoxes of measures of central tendency ==<br />
A random sample of 100 taxpayers reveals an average income of $30,000 although the government knows that the average income is $60,000. Is the sample likely to be biased and/or respondents understating their income? Or is there another plausible explanation?<br />
<br />
== Inference Paradoxes ==<br />
It is possible to build a model in which the parameter space and the support are both equal to the natural numbers and in which a confidence procedure that has at least 2/3 probability of coverage for all <math>\theta</math> (i.e. 2/3 confidence) has only 1/3 posterior probability for all <math>y</math> under a uniform prior for <math>\theta</math>. Thus confidence and credibility can be strongly inconsistent, in contrast with the intuition based on compact models in which mean credibility must equal mean confidence.<br />
<br />
<!-- Conditioning paradoxes --></div>Georgeshttp://scs.math.yorku.ca/index.php/Paradoxes,_Fallacies_and_Other_SurprisesParadoxes, Fallacies and Other Surprises2018-09-19T01:22:01Z<p>Georges: /* Simpson's Paradox */</p>
<hr />
<div>An attempt at a taxonomy of statistical paradoxes, fallacies and other surprises.<br />
<br />
== Simpson's Paradox ==<br />
Classical examples:<br />
* '''Florida capital sentencing''' of convicted murderers: Suppression effect of a confounder: No marginal relationship between race of accused and rate of capital sentencing but the relationship becomes very strong when controlling for a confounding factor: the race of victim.<br />
* '''Berkeley graduate admissions''': Overall lower rate of acceptance for female candidates but little or no gender effect within departments. Women tend to apply to departments that are harder to get into (have low admission rates) for both men and women. By controlling for departments the appearance of gender discrimination disappears. But is this the right analysis? That depends on the mechanism through which discrimination occurs. If university budgeting decisions systematically favour departments that teach topics that are more appealing to men, then ''department'' is a mediator and controlling for departments masks the effect of discrimination. We need different models to identify 'micro-discrimination' at the departmental level and 'macro-discrimination' at the university level. This is analogous to asking whether ''department'' should be treated as a confounder (and included in the model) or as a mediator (and excluded) in order to identify a causal effect of gender. Conditioning is not automatically the right thing to do. So the right model to estimate discrimination depends on the mechanism of discrimination. For mechanisms in which department is a mediator or a collider, it must be omitted. For mechanisms in which it is a confounder, it must be included.<br />
<br />
== Birth-Weight Paradox ==<br />
Example of conditioning or selection on a collider variable.<br />
<br />
== Regression Paradox ==<br />
Discrepancy between global and local relationships. Some classical examples:<br />
* Globally the distribution of heights can remain the same from generation to generation, although tall parents get the impression that their children are shorter than themselves and short parents get the impression that their children are taller than themselves on average. Also, tall children get the impression that their parents are shorter than themselves and short children get the impression that their parents are taller than themselves on average. Thus parents get the impression, perfectly legitimately from their point of view, that the distribution of heights is being compressed towards the mean '''and''' children get the impression, also perfectly legitimately, that their parents' heights were more compressed towards the mean.<br />
* Kahneman's pilot instructors got the impression that criticism improved performance of student pilots while praise made it worse. Kahneman thought the causal effect should be in the opposite direction. Regression to the mean allows one to see how Kahneman's belief and the pilot instructors' impression are not inconsistent. The resolution of the paradox lies partly in realizing that the instructors are noticing an 'observational' relationship that is in the opposite direction to the possible causal relationship. Thus there's a connection with Simpson's Paradox.<br />
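<br />
The heights example can be checked with a short simulation. This is a sketch under stated assumptions: heights are standardized, parent and child heights are bivariate normal, and the correlation of 0.5 and the cutoff of one standard deviation for 'tall' are illustrative choices, not estimates from data.<br />

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible
rho = 0.5               # assumed parent-child correlation (illustrative)
n = 20_000

# Standardized heights: both generations have mean 0 and variance 1.
parents = [rng.gauss(0, 1) for _ in range(n)]
children = [rho * p + (1 - rho**2) ** 0.5 * rng.gauss(0, 1) for p in parents]

pairs = list(zip(parents, children))
tall_parent_pairs = [(p, c) for p, c in pairs if p > 1]  # select on parent
tall_child_pairs = [(p, c) for p, c in pairs if c > 1]   # select on child

# The marginal spread is (nearly) the same in both generations.
sd_parents = statistics.pstdev(parents)
sd_children = statistics.pstdev(children)

# Tall parents: their children are shorter than they are, on average.
tall_parents_mean = statistics.fmean(p for p, _ in tall_parent_pairs)
their_children_mean = statistics.fmean(c for _, c in tall_parent_pairs)

# Tall children: their parents are shorter than they are, on average.
tall_children_mean = statistics.fmean(c for _, c in tall_child_pairs)
their_parents_mean = statistics.fmean(p for p, _ in tall_child_pairs)

print(sd_parents, sd_children)
print(tall_parents_mean, their_children_mean)
print(tall_children_mean, their_parents_mean)
```

Both conditional comparisons show regression towards the mean even though the two marginal standard deviations agree, which is the discrepancy between local and global relationships described above.<br />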
<br />
== Lord's Paradox ==<br />
When comparing two groups using a pretest and a posttest, should we compare gain scores between groups or should we regress the posttest on groups using the pretest as a covariate?<br />
<br />
== Base rate paradoxes ==<br />
'''Prosecutor's Fallacy''': The p-value to test the hypothesis that Sally Clark was innocent of the murder of her two children could legitimately be interpreted as being in the vicinity of 1/100,000. However, there's also a legitimate argument that the probability of her innocence is very close to 1. These two seemingly contradictory results are not inconsistent with each other.<br />
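<br />
One way to see why the two statements are compatible is the odds form of Bayes' rule. The notation here is introduced only for illustration: <math>I</math> denotes innocence, <math>G</math> guilt and <math>E</math> the evidence (two infant deaths in one family).<br />
<br />
<math>\frac{P(I \mid E)}{P(G \mid E)} = \frac{P(E \mid I)}{P(E \mid G)} \times \frac{P(I)}{P(G)}</math><br />
<br />
The p-value speaks only to <math>P(E \mid I)</math>. If the prior odds <math>P(G)/P(I)</math> of a mother murdering two of her children are smaller still, the posterior odds can favour innocence strongly even though <math>P(E \mid I)</math> is tiny.<br />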
<br />
'''Representativeness heuristic''': This is a concept formulated by Tversky and Kahneman. One could view<br />
the heuristic as amounting to forming judgements based on relative likelihood, thus ignoring the base rate or 'prior', the foundational fallacy of frequentism. <br />
<br />
'''Stereotyping''': [https://www.sciencedirect.com/science/article/pii/0022103182900798 Social stereotypes and judgments of individuals: An instance of the base-rate fallacy]<br />
<br />
== Prisoner's Paradox ==<br />
Also known as the Monty Hall problem or the Principle of Restricted Choice in bridge. This is a very revealing paradox that illustrates the importance of taking into account the probabilistic mechanism that generated information, in addition to the information itself, if the information does not induce a partition of the space of possibilities.<br />
<br />
It illustrates the crucial role of statistical modelling, but perhaps not in a way that is supportive of frequentist inference.<br />
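<br />
The role of the mechanism that generates the information can be made concrete with an exact enumeration of the three-door Monty Hall game. This is a sketch, not a general treatment: the function name and the two host behaviours (a host who knows where the car is and never reveals it, versus a host who opens one of the other two doors at random) are assumptions made for illustration. In both cases the contestant observes the same thing, a goat behind the opened door.<br />

```python
from fractions import Fraction


def switch_win_probability(host_knows: bool) -> Fraction:
    """P(switching wins | contestant picked door 0 and a goat was revealed)."""
    doors = [0, 1, 2]
    pick = 0
    p_goat_revealed = Fraction(0)
    p_goat_revealed_and_switch_wins = Fraction(0)
    for car in doors:
        p_car = Fraction(1, 3)  # car is equally likely behind each door
        others = [d for d in doors if d != pick]
        if host_knows:
            # Host opens only goat doors, uniformly when two are available.
            options = [d for d in others if d != car]
        else:
            # Host opens one of the two unpicked doors at random.
            options = others
        for opened in options:
            p = p_car * Fraction(1, len(options))
            if opened == car:
                continue  # car revealed: excluded by the conditioning
            p_goat_revealed += p
            remaining = next(d for d in doors if d not in (pick, opened))
            if remaining == car:
                p_goat_revealed_and_switch_wins += p
    return p_goat_revealed_and_switch_wins / p_goat_revealed


print(switch_win_probability(host_knows=True))   # informed host
print(switch_win_probability(host_knows=False))  # random host
```

With the informed host, switching wins with probability 2/3; with the random host, the same observation gives only 1/2. The information is identical, and only the mechanism that produced it differs.<br />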
<br />
== Weighting Paradoxes ==<br />
Students at a university report an average class size of 100. Professors report an average class size of 50. Are students likely to be exaggerating and professors underestimating the size of their classes?<br />
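<br />
Both reports can be accurate. The sketch below uses a hypothetical university with five classes (the sizes are invented so that the averages match the numbers above) and assumes that each class has one professor and that every student reports the size of the class they sit in.<br />

```python
from fractions import Fraction

# Hypothetical class sizes, invented to reproduce the averages in the text.
class_sizes = [25, 25, 25, 25, 150]

# Each professor reports one class: a plain average over classes.
professor_average = Fraction(sum(class_sizes), len(class_sizes))

# Each student reports the size of their own class, so a class of size s
# is reported s times: a size-weighted average over classes.
student_average = Fraction(sum(s * s for s in class_sizes), sum(class_sizes))

print(professor_average)  # average over classes
print(student_average)    # average over students
```

Here the professors' average is 50 and the students' average is 100, with nobody misreporting: large classes are over-represented when the sampling unit is the student rather than the class.<br />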
<br />
== Paradoxes of measures of central tendency ==<br />
A random sample of 100 taxpayers reveals an average income of $30,000 although the government knows that the average income is $60,000. Is the sample likely to be biased and/or respondents understating their income? Or is there another plausible explanation?<br />
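<br />
A highly skewed income distribution is one plausible explanation. The following sketch assumes a deliberately extreme two-point population (all values hypothetical): almost everyone earns $30,000, and one taxpayer in a thousand earns enough to bring the population mean up to $60,000. Sampling is treated as independent draws, which is reasonable when the population is much larger than the sample.<br />

```python
typical_income = 30_000   # hypothetical income of almost all taxpayers
population_mean = 60_000  # population mean, as in the text
share_high = 0.001        # hypothetical share of very high earners
sample_size = 100

# Income the high earners must have for the population mean to be 60,000.
high_income = (population_mean - (1 - share_high) * typical_income) / share_high

# Probability that a random sample of 100 contains no high earner at all,
# in which case the sample mean is exactly the typical income.
p_no_high_earner = (1 - share_high) ** sample_size

print(round(high_income))
print(round(p_no_high_earner, 3))
```

In this toy population an unbiased random sample of 100 honest respondents has a sample mean of exactly $30,000 about 90% of the time, because the mean is driven by a few very large incomes that most samples miss.<br />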
<br />
== Inference Paradoxes ==<br />
It is possible to build a model in which the parameter space and the support are both equal to the natural numbers and in which a confidence procedure that has at least 2/3 probability of coverage for all <math>\theta</math> (i.e. 2/3 confidence) has only 1/3 posterior probability for all <math>y</math> under a uniform prior for <math>\theta</math>. Thus confidence and credibility can be strongly inconsistent, in contrast with the intuition based on compact models in which mean credibility must equal mean confidence.<br />
<br />
<!-- Conditioning paradoxes --></div>Georgeshttp://scs.math.yorku.ca/index.php/Paradoxes,_Fallacies_and_Other_SurprisesParadoxes, Fallacies and Other Surprises2018-09-19T01:18:56Z<p>Georges: /* Lord's Paradox */</p>
<hr />
<div>An attempt at a taxonomy of statistical paradoxes, fallacies and other surprises.<br />
<br />
== Simpson's Paradox ==<br />
Classical examples:<br />
* '''Florida capital sentencing''' of convicted murderers: Suppression effect of a confounder: No marginal relationship between race of accused and rate of capital sentencing but the relationship becomes very strong when controlling for a confounding factor: the race of victim.<br />
* '''Berkeley graduate admissions''': Overall lower rate of acceptance for female candidates but little or no gender effect within departments. Women tend to apply to departments that are harder to get into (have low admission rates) for both men and women. By controlling for departments the appearance of gender discrimination disappears. But is this the right analysis? That depends on the mechanism through which discrimination occurs. If university budgeting decisions systematically favour departments that teach topics that are more appealing to men, then ''department'' is a mediator and controlling for departments masks the effect of discrimination. We need different models to identify 'micro-discrimination' at the departmental level and 'macro-discrimination' at the university level. This is analogous to asking whether ''department'' should be treated as a confounder (and included in the model) or as a mediator (and excluded) in order to identify a causal effect of gender. Conditioning is not automatically the right thing to do.<br />
<br />
== Birth-Weight Paradox ==<br />
Example of conditioning or selection on a collider variable.<br />
<br />
== Regression Paradox ==<br />
Discrepancy between global and local relationships. Some classical examples:<br />
* Globally the distribution of heights can remain the same from generation to generation, although tall parents get the impression that their children are shorter than themselves and short parents get the impression that their children are taller than themselves on average. Also, tall children get the impression that their parents are shorter than themselves and short children get the impression that their parents are taller than themselves on average. Thus parents get the impression, perfectly legitimately from their point of view, that the distribution of heights is being compressed towards the mean '''and''' children get the impression, also perfectly legitimately, that their parents' heights were more compressed towards the mean.<br />
* Kahneman's pilot instructors got the impression that criticism improved performance of student pilots while praise made it worse. Kahneman thought the causal effect should be in the opposite direction. Regression to the mean allows one to see how Kahneman's belief and the pilot instructors' impression are not inconsistent. The resolution of the paradox lies partly in realizing that the instructors are noticing an 'observational' relationship that is in the opposite direction to the possible causal relationship. Thus there's a connection with Simpson's Paradox.<br />
<br />
== Lord's Paradox ==<br />
When comparing two groups using a pretest and a posttest, should we compare gain scores between groups or should we regress the posttest on groups using the pretest as a covariate?<br />
<br />
== Base rate paradoxes ==<br />
'''Prosecutor's Fallacy''': The p-value to test the hypothesis that Sally Clark was innocent of the murder of her two children could legitimately be interpreted as being in the vicinity of 1/100,000. However, there's also a legitimate argument that the probability of her innocence is very close to 1. These two seemingly contradictory results are not inconsistent with each other.<br />
<br />
'''Representativeness heuristic''': This is a concept formulated by Tversky and Kahneman. One could view<br />
the heuristic as amounting to forming judgements based on relative likelihood, thus ignoring the base rate or 'prior', the foundational fallacy of frequentism. <br />
<br />
'''Stereotyping''': [https://www.sciencedirect.com/science/article/pii/0022103182900798 Social stereotypes and judgments of individuals: An instance of the base-rate fallacy]<br />
<br />
== Prisoner's Paradox ==<br />
Also known as the Monty Hall problem or the Principle of Restricted Choice in bridge. This is a very revealing paradox that illustrates the importance of taking into account the probabilistic mechanism that generated information, in addition to the information itself, if the information does not induce a partition of the space of possibilities.<br />
<br />
It illustrates the crucial role of statistical modelling, but perhaps not in a way that is supportive of frequentist inference.<br />
<br />
== Weighting Paradoxes ==<br />
Students at a university report an average class size of 100. Professors report an average class size of 50. Are students likely to be exaggerating and professors underestimating the size of their classes?<br />
<br />
== Paradoxes of measures of central tendency ==<br />
A random sample of 100 taxpayers reveals an average income of $30,000 although the government knows that the average income is $60,000. Is the sample likely to be biased and/or respondents understating their income? Or is there another plausible explanation?<br />
<br />
== Inference Paradoxes ==<br />
It is possible to build a model in which the parameter space and the support are both equal to the natural numbers and in which a confidence procedure that has at least 2/3 probability of coverage for all <math>\theta</math> (i.e. 2/3 confidence) has only 1/3 posterior probability for all <math>y</math> under a uniform prior for <math>\theta</math>. Thus confidence and credibility can be strongly inconsistent, in contrast with the intuition based on compact models in which mean credibility must equal mean confidence.<br />
<br />
<!-- Conditioning paradoxes --></div>Georges