MATH 6627 2010-11 Practicum in Statistical Consulting
Current year: MATH_6627_201112_Practicum_in_Statistical_Consulting
Practicum in Statistical Consulting
 News
 The 2-hour in-class final exam will take place from 2 pm to 4 pm on Friday, April 15, 2011, in N627 Ross. Sample questions will be posted on Monday the 11th.
 March 8: Here is the team assignment due March 15
 Jan 20: Our class photo
 Jan 16: I have added a table with links to files that have been uploaded for the course: Links to files for MATH 6627
 Quick links
 Student pages
 Assignment Teams
 Consulting Project Teams
 Links and References
 Statistics: Questions and Answers (including statistical issues that arise in consulting)
 Consulting: Questions and Answers (focusing more on the interpersonal, ethical, organizational and psychological aspects of consulting)
 R: Questions and Answers
 Miscellaneous Questions and Answers
 Links to files for MATH 6627
 Repository of files for MATH 6627 in 2010-11 (userid: fisher password: cohen)
 Home page and repository of files for MATH 6627 in 2008-09 (userid: fisher password: cohen)
 Public wiki: http://wiki.math.yorku.ca/
 Private wiki: http://statswiki.math.yorku.ca/
 Getting started with R: http://wiki.math.yorku.ca/index.php/R:_Getting_started
 Statistical Consulting Service: http://www.yorku.ca/isr/scs/
 /Classlist
General Information
Instructor
 Georges Monette, Ph.D., P.Stat. (http://www.ssc.ca/accreditation/index_e.html)
 N626 Ross
 Email Georges Monette (note: adding '+practicum' increases the priority of your message in my mailbox)
 http://www.math.yorku.ca/~georges
 Office hours: Thursdays 4 pm to 6 pm and other times by appointment
Meetings
The class will meet every week on Wednesdays from 7:00 pm to 10:00 pm in ACE (Accolade East) Room 010.
Goals
As undergraduates we learn statistics through a sequence of courses, each focusing on some part of statistical theory. When we solve problems in these courses, the set of tools we are expected to use is fairly obvious. When you have to solve real-world statistical problems, it is rare that there are clear clues about the appropriate theories or methods that need to be used.
In fact, many problems are best handled with eclectic solutions borrowing from many areas of knowledge. You need to draw not only on your statistical knowledge but also on all your accumulated knowledge and experience in life: your understanding of the subject matter of the problem, your creativity with mathematical models, your ability to visualize and communicate, your interpersonal skills in understanding your clients' possible anxieties with statistics, your insight in working with your own anxieties, etc.
The goal of this course is to help you develop the skills and confidence to solve real-world problems. You will learn about the key role of many statistical concepts that are rarely seen in detail in standard courses. You will also learn the vital role of visualization and graphics, communication (listening even more than talking) and presentation skills.
The course will help you develop skills in a number of areas:
 programming and data management skills in R: Although the emphasis in this course is entirely on R, many jobs expect a strong knowledge of SAS; take every opportunity you can to also learn SAS. Consider, if you are a beginner, the courses offered through the Statistical Consulting Service. If you work with clients who use another package, e.g. SPSS or Stata, you might have to learn enough about them to show your clients how to perform their analyses using their own packages. A recent feature in SPSS and SAS allows R to be called from these packages. If you have a solution that is too advanced for SPSS or SAS, you can prepare code in R and provide a client with the ability to perform the analysis from the package that is familiar to them.
 graphics to visualize data and models
 how to work as a statistical consultant/collaborator in the analysis of scientific problems
 developing presentation skills
 developing an understanding of the role of statistics as a discipline and as a profession in science and business
 acquiring basic concepts and techniques related to the analysis of hierarchical and longitudinal data: a large proportion of the problems our students deal with in consulting can be approached through these techniques
 understanding ethical issues related to statistical practice
References
There is no single textbook this year. Consult and add to our list of useful links and references: Links and References
Course Work
There are 4 components:
 Individual student pages (or 'blogs') and contributions to the course wiki:
 Use your course page, which you create as a subpage of the Student pages for the course, to prepare after each class and before the next class:
 Sample exam questions: A sample exam question and answer on the material of the previous class.
 A posting with links and comments on
 Statistics in the News
 OR
 Statistical paradoxes and fallacies
 OR
 (optional) Reflections on statistical consultations that you arrange to attend in the SCS. The reflections must be constructed to avoid any violation of privacy of either the client or the consultant whose session you attended.
 Statistics in the News
 Questions and comments on group work and class lectures
 In addition, throughout the course, contribute questions, answers, comments, reflections, etc. to the various discussion pages for the course, e.g. Statistics: Questions and Answers, Consulting: Questions and Answers, etc. You can also create new discussion pages.
 See more details.
 Group assignments and short presentations
 Four or five group assignments on material covered in the course. Some assignments will also involve short group presentations to the class. These presentations will be timed (typically 15 minutes) and getting the interesting aspects of your message across in a limited time is challenging and requires good preparation and coordination.
 The groups for assignments are created using a quasi-random algorithm. Their composition will be sent to you shortly after the first class.
 Group consulting project
 You will work on a major consulting project in which you will collaborate with a real client to produce a deep and probing consulting report and analysis. The project is very likely to involve multilevel models. Students who are interested may opt to work on one of two case studies for possible presentation at conferences in the summer of 2011. The case study team may include students who are not currently enrolled in MATH 6627. Further details will be available soon. The consulting project groups are not necessarily identical to the assignment groups.
 A final in-class 2-hour exam on the material of the course. If you produce good sample questions, you might find a question very close to yours on the exam.
The weight of components is 30% for individual pages and contributions to the wiki, 30% for group assignments, 30% for the consulting project and 10% for the final exam. Note that your contributions to the wiki in general are as easy to identify as those on your individual course page, and they receive credit similar to contributions on your own course page or, indeed, to comments added to other students' course pages.
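Written as a formula, the weighting above gives the final mark as a weighted sum of the four components. The letters are introduced here for illustration only: W is the mark for individual pages and wiki contributions, A for group assignments, P for the consulting project and E for the final exam, each assumed to be on a 0-100 scale. The example scores (W = 80, A = 90, P = 70, E = 60) are hypothetical.

```latex
\begin{aligned}
\text{Final} &= 0.30\,W + 0.30\,A + 0.30\,P + 0.10\,E \\
             &= 0.30(80) + 0.30(90) + 0.30(70) + 0.10(60) \\
             &= 24 + 27 + 21 + 6 = 78
\end{aligned}
```

Because the weights sum to 1, the final mark stays on the same 0-100 scale as the components.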
Course organization
 In the first few classes we will look at general statistical questions that are important in consulting. The following five or six weeks will be devoted to multilevel models and the final portion of the course will be devoted to discussions and presentations of your projects.
 You will have the opportunity to organize and give short presentations on assignments starting from the third week of classes.
 Starting from the first week, you should work on your course pages, generating questions, links, comments, etc., and you should start working with your Assignment Team on the first assignment.
 The course assumes a working knowledge of R including trellis (lattice) graphics. We can schedule a few special tutorials for anyone who feels the need.
 The final exam will give you a chance to show how you have reflected on the material of the course. It will consist mainly of 'essay' questions, possibly drawn from your own sample questions if they turn out to be good.
Class list and teams
| No. | Given name | Family name | Email address | Wiki userid and course page | Assignment team page |
|---|---|---|---|---|---|
| 1 | Yurong (Crystal) | Cao | yrcao@yorku.ca | yrcao | Diaconis |
| 2 | Jia Yi (Jessica) | Li | weiwei8588@hotmail.com | jyjli | Rubin |
| 3 | Tianyu (Andy) | Li | andytli@yahoo.ca | andytli | Gray |
| 4 | Constance | Mara | cmara@yorku.ca | cmara | Diaconis |
| 5 | Luis | Palma | eropall@msn.com | luispal | Rubin |
| 6 | Gurpreet (Preety) | Saini | gurksn@yorku.ca | gurksn | Gray |
| 7 | Carrie | Smith | smithce@yorku.ca | smithce | Gray |
| 8 | Bin | Sun | bsun1010@gmail.com | bsun | Diaconis |
| 9 | Laura | Warren | lawarren@yorku.ca | lawarren | Rubin |
| 10 | Yufeng (Sky) | Lin | skylin44@yorku.ca | skylin44 | Auditor |
| 11 | Heather | Krause | heather@datassist.ca | hkrause | Auditor |
| 12 | Hang Ling (Annie) | Wang | hangjing@mathstat.yorku.ca | hangling | Diaconis |
Week 1: January 5, 2011
 Topics

 Course organization
 Participation in SCS seminars
 You are welcome to attend SCS (Statistical Consulting Service) weekly meetings, which consist of biweekly 'staff meetings' and biweekly seminars on a statistical topic of interest to statistical consultants. This year we are reading and discussing the book Lance, C.E., & Vandenberg, R.J. (Eds.) (2009). Statistical and methodological myths and urban legends: Doctrine, verity and fable in the organizational and social sciences. New York, NY: Routledge. Meetings take place every Friday at 1 pm in TEL 5082. Please send an email message to Georges Monette to have your name added to the SCS mailing list. Note that SCS also offers short courses, some of which might be of interest to you.
 Consulting, communication, writing reports

 Statistical consulting environment
 Writing reports: Secret of good writing: write so your reader understands you!
 Notes on writing reports
 Seven basic principles
 Not all consulting activities require a formal report. Often a phone call, a verbal report in a face-to-face meeting, a letter or a memo is the most efficient way of communicating with a client.
 Communication:
 Interpersonal aspects of statistical consulting: Janice Derr, Statistical Consulting Video
 Contributions by Doug Zahn
 The role of statistics in society: understanding evidence

 One of the greatest challenges in understanding evidence is bridging the gap between observational data and causal inference, i.e. understanding the links between statistical significance and statistical meaning.
 Statistics in the news: link to come
 Smoking: Observational vs. Experimental data: link to come
| Type of inference | Experimental data | Observational data |
|---|---|---|
| Causal | Where Fisher wants to be | Where we often are |
| Predictive | Problematic but very rare | Good for 'prediction', not 'causal inference': this is the topic of Frank Harrell's Regression Modeling Strategies |
 Finding meaning in observational data: examples
 Hans Rosling: Myths about the developing world
 Al Gore: An Inconvenient Truth
 Peter Donnelly: How juries get fooled by statistics
 Statistician Peter Donnelly explores the common mistakes humans make in interpreting statistics, and the devastating impact these errors can have on the outcome of criminal trials.
 Piet Groeneboom: Lucia de Berk and the amateur statisticians
 Andrey Feuerverger: The Lost Tomb of Jesus
 Software

 A working statistician should be proficient with at least SAS and R. In addition, you may need to know the package(s) typically used by your clients, e.g. SPSS. This course uses R because just about everything is possible (not necessarily easy) in R. Increasingly, new methodologies are first prototyped as R packages. As a statistician, knowing R may give you your best edge even if in your initial contacts with clients or in your first jobs it doesn't necessarily seem so relevant.
 Getting started with R
 After installing the current version of R, you should install the packages we are likely to use a lot:
install.packages("car")  # do this first to get the current version
install.packages("spida", repos = "http://R-Forge.R-project.org")
install.packages("p3d", repos = "http://R-Forge.R-project.org")
 It is a good idea to use separate project directories for different projects. See Using R with project directories under Windows.
 To begin learning R, work through Maindonald (2008), Using R for Data Analysis and Graphics.
 Another excellent tutorial is Christopher Green: R Primer
 When you're ready to really plunge into R, work your way through the manual that comes with R. From the R window click on Help > Manuals > An Introduction to R.
 Explore graphics in R with a sample script.
 Wikis
 Much of the work on the course will require using a wiki. Editing material in a wiki is very intuitive once you get the hang of it. If you have problems or questions, you can post them on Miscellaneous Questions and Answers.
 Using a wiki for group assignments: see Editing hints for course assignments
Assignment 0 (individual) and things to do
 1. Personal wiki page
 Deadline: noon, Friday, January 7, 2011. Set up your personal wiki page by adding a link to /Students, and fill in some information about yourself, e.g. where you did your previous studies, your current program, your academic interests, your goals, the software you know how to use, etc.
 2. R

 Install R on your computer(s), including your laptop if you have one. See R: Getting started
 See the discussion above on installing additional packages and getting started.
 3. Weekly 'blog'
 Deadline high noon on Wednesday, January 12, 2011. Enter the weekly material in your personal wiki page.
Assignment 1 (team)
Deadline for files on wiki: noon, January 19, 2011; Presentations: 7 pm on January 19, 2011
 1. Simpson's Paradox
 Simpson's Paradox describes a situation where the direction of the association between two variables, X and Y, changes when conditioning on a third variable, Z. A troubling classical example, based on data obtained from the State of Florida, is discussed in Agresti (1990), "Categorical Data Analysis". In this example, Y is a dichotomous variable indicating whether a prisoner found guilty of murder was sentenced to capital punishment (execution), X is the race (white or black) of the prisoner and Z is the race of the victim. Overall, a lower proportion of blacks than whites are sentenced to be executed but, controlling for the race of the victim, the association is dramatically reversed. Possibly, the judicial process goes easier on blacks because their victims tend to be black. In this example of Simpson's Paradox, X, Y and Z are all dichotomous variables and Z can be thought of as a confounding factor. Although the phenomenon is, in principle, the same whether X, Y or Z are dichotomous or numerical (continuous or discrete) variables, our ability to visualize the phenomenon and to transfer the concept to other situations seems to depend crucially on the nature of the variables. Since each of X, Y and Z can be dichotomous or numerical, and the role of Z can be that of a confounding factor, a mediating factor or some mixture of both, there are 2 x 2 x 2 x 3 = 24 possible combinations of potential examples illustrating Simpson's Paradox. Find two different ones. They may be just plausible examples or, better, real examples or, best, real examples with real data. Discuss each example and post at least one graph or hand-drawn sketch for each to help visualize the example (use something like Microsoft Paint, save as a .png file and upload to the wiki). Prepare a 5-minute presentation.
 2. Graphics to visualize data
 Explore the graphical possibilities of one of the following packages in R:
 I will assign which package initially goes to what group to make sure they are all covered. You are free to swap between groups, however.
 Prepare some files on the wiki that would be useful to help someone else understand how to use and exploit the package and prepare a 10-minute presentation.
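The reversal described in the Simpson's Paradox item of Assignment 1 comes down to simple arithmetic: the overall rate for each level of X is a weighted average of its within-stratum rates, and the weights P(Z | X) differ between the levels of X. The counts below are hypothetical, invented only to show the mechanism in the all-dichotomous case. X has levels A and B, Z has strata 1 and 2, and Y is a "success". In stratum 1, A has 10 successes out of 100 and B has 1 out of 20. In stratum 2, A has 12 out of 20 and B has 50 out of 100. Each level of X therefore has 120 cases in total.

```latex
\begin{aligned}
P(Y \mid A) &= \sum_{z} P(Y \mid A, z)\,P(z \mid A)
             = 0.10 \cdot \tfrac{100}{120} + 0.60 \cdot \tfrac{20}{120}
             = \tfrac{22}{120} \approx 0.183 \\
P(Y \mid B) &= \sum_{z} P(Y \mid B, z)\,P(z \mid B)
             = 0.05 \cdot \tfrac{20}{120} + 0.50 \cdot \tfrac{100}{120}
             = \tfrac{51}{120} = 0.425
\end{aligned}
```

Within each stratum A has the higher rate (0.10 > 0.05 and 0.60 > 0.50), yet overall A has the lower rate. This happens because 100 of A's 120 cases fall in the low-rate stratum, while 100 of B's 120 cases fall in the high-rate stratum.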
Links
Week 2: January 12, 2011
 Topics
 Interpreting Relationships Between Variables (review of Week 1)
 Visualizing Regression I: Simple Regression
 Visualizing Regression II: Multiple Regression (R script, slides)
Assignment 2 (team)
Deadline for files on wiki: noon, January 26, 2011; Presentations: 7 pm on January 26, 2011 (10 minutes maximum for both questions)
 1: Pick one of the articles published in the Pulse column in the Toronto Star. Discuss whether the article suggests a causal relationship between two variables. If so, which? Are the data observational or experimental? Can you think of alternative explanations to causality? Confounding factors? Or explanations consistent with causality? Mediating factors? Have any confounding factors been accounted for in the analysis? Have any mediating factors been controlled for in a way that vitiates a causal interpretation of the relationship? What is your personal assessment of the evidence for causality in the study that is the subject of the article?
 Put the title of the article you have chosen on your team page as soon as you choose it so other teams can avoid choosing the same article.
 Post some notes on your discussion of the article in a subpage of your team page.
 Prepare a presentation of at most 5 minutes, preferably using materials available on the web linked through the subpage of your team page.
 2: Prepare a discussion on 5 of the questions in /Paradoxes and Fallacies - I. Do the questions whose number is congruent to T (mod 3), where T is the number of your team: 1: Diaconis, 2: Gray, 3: Rubin.
 You can work on your discussions in a subpage of your team page and then post them to /Paradoxes and Fallacies - I once you are ready. When you edit /Paradoxes and Fallacies - I, edit only the question you are working on to avoid editing collisions if two people are editing the page at the same time.
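The team rule in item 2 of Assignment 2 ("questions whose number is equal to T (mod 3)") can be spelled out as sets of question numbers. This sketch assumes that the questions on the Paradoxes and Fallacies page are numbered consecutively from 1. For Rubin (T = 3), the rule means multiples of 3.

```latex
\begin{aligned}
\mathcal{Q}_T &= \{\, q \ge 1 : q \equiv T \pmod{3} \,\}, \qquad T \in \{1, 2, 3\} \\
\mathcal{Q}_1 &= \{1, 4, 7, 10, 13, \dots\} \quad \text{(Diaconis)} \\
\mathcal{Q}_2 &= \{2, 5, 8, 11, 14, \dots\} \quad \text{(Gray)} \\
\mathcal{Q}_3 &= \{3, 6, 9, 12, 15, \dots\} \quad \text{(Rubin)}
\end{aligned}
```

The three sets do not overlap and together cover every question, so no two teams work on the same question.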
Week 3: January 19, 2011
 Topics
 Presentations
 Continuation of week 2: visualizing multiple regression
 See links for week 3
Links
Organizations
 Statistical Society of Canada:
 Statistics Canada
 United Nations Statistical Commission http://unstats.un.org/unsd/default.htm
Consulting
 D.R. Cox on statistical consulting: http://www.ssc.ca/resources/consultants/cox_e.html
 ASA Consulting Section: http://www.amstat.org/sections/cnsl/index.html
 UBC web page for its consulting course http://www.stat.ubc.ca/Courses/Details/course.php?course=65
 Statistical Consulting at Acadia http://ace.acadiau.ca/math/m4233/m4233.htm
Ethics in Statistics and Consulting
 http://www.westga.edu/~bquest/2001/consultant.htm (Gurpreet)
 Andrew Gelman on The ethics of consulting for the tobacco industry: http://www.stat.columbia.edu/~cook/movabletype/archives/2005/10/the_ethics_of_c.html (Crystal)
 How the "Urine Toxic Metals" Test Is Used to Defraud Patients: http://www.quackwatch.org/01QuackeryRelatedTopics/Tests/urine_toxic.html (Bin Sun)
 Sonya K. Sterba (2006) "Misconduct in the Analysis and Reporting of Data: Bridging Methodological and Ethical Agendas for Change", Ethics & Behavior, Volume 16, Issue 4, 2006, pages 305-318
TED Talks on Statistics
 Peter Donnelly on genomes: http://www.youtube.com/watch?v=kLmzxmRcUTo&mode=related&search=
 Hans Rosling (2006) Debunking third-world myths with the best stats you've ever seen http://www.ted.com/talks/view/id/92
 Hans Rosling (2007) on New insights on poverty and life around the world: http://www.ted.com/index.php/talks/view/id/140
Other
 Rod Little on English style in scientific papers: http://sitemaker.umich.edu/rlittle/files/styletips.pdf
 Writing Reports: Seven Basic Principles
 Audio and slides of Workshop on Current Issues in the Analysis of Incomplete Longitudinal Data (October 13-15, 2005) at the Fields Institute: http://www.fields.utoronto.ca/audio/#CMM
 Steve's Attempt to Teach Statistics: a very interesting site.
 Hugh Chipman's introduction to R http://ace.acadiau.ca/math/scc/workshops_2005/Rclass.html
 Constructionism and Reductionism: Two Approaches to Problem-Solving and Their Implications for Reform of Statistics and Mathematics Curricula http://www.amstat.org/publications/jse/secure/v7n2/lazaridis.cfm
 A Medical Mystery Unfolds in Minnesota, New York Times, Feb. 5, 2008.
 An article on the Grange inquiry: "Toronto Infant Death Stirs Concern" by Douglas Martin, New York Times, April 8, 1984
 Gerard E. Dallal, "The Little Handbook of Statistical Practice"
 Links to Statistics Notes in the British Medical Journal
 Andrew Gelman's demonstration of multilevel modeling in an applied setting http://www.youtube.com/watch?v=5JYiJwDob1w
 H. Dean Johnson and Dennis A. Warner (2004) "Factors Relating to the Degree to Which Statistical Consulting Clients Deem Their Consulting Experience to Be a Success", The American Statistician, Vol. 58, No. 4 (Nov., 2004), pp. 280-289