MATH 6627 2010-11 Practicum in Statistical Consulting



Current year: MATH 6627 2011-12 Practicum in Statistical Consulting

Practicum in Statistical Consulting

Quick links 

Files for MATH 6627 (slides, R scripts and notes, by week)
Week 1: Basic Concepts ... with notes; Graphics in R.R
Week 2: Interpreting Relationships (cont. of week 1); Visualizing Simple Regression; Visualizing Multiple Regression; Visualizing Multiple Regression.R
Week 3: Visualizing Multiple Regression (with notes added on week 3); Class photo; Screen capture for week 3; Visualizing Multiple Regression.R (with possible modifications made on week 3)
Week 4: Visualizing Multiple Regression (with notes added on week 4)
Week 5: Visualizing Multiple Regression (with notes added on week 5); Paradoxes and Fallacies; Hierarchical Models (with notes added on week 5); Lab 1; Lab 1 R script (Mixed Models with R)
Weeks 6-8: Hierarchical Models (with notes added after week 5); Thoughts on first meeting a client; Notes on confidence ellipses and Scheffe intervals (see Jessica's note); Longitudinal Data Analysis with notes in week 7; Lab 2 R script (Longitudinal Models with R); Lab 3 R script (More on Longitudinal Models and Other Topics); Team Assignment due March 15
Weeks 9-11: Rough notes on Exploring a Model; Exploring a Model.R
Weeks 12-13: Consulting question; Non-linear Models Part I; Non-linear Models Part I - annotated; Non-linear Models Part II; Non-linear Models Part II - annotated; Practical overview of GLMMs

Class list
Number | Given name | Family name | Wiki user ID and course page | Assignment team
1 | Yurong (Crystal) | Cao | yrcao | Diaconis
2 | Jia Yi (Jessica) | Li | jyjli | Rubin
3 | Tianyu (Andy) | Li | andytli | Gray
4 | Constance | Mara | cmara | Diaconis
5 | Luis | Palma | luispal | Rubin
6 | Gurpreet (Preety) | Saini | gurksn | Gray
7 | Carrie | Smith | smithce | Gray
8 | Bin | Sun | bsun | Diaconis
9 | Laura | Warren | lawarren | Rubin
10 | Yufeng (Sky) | Lin | skylin44 | Auditor
11 | Heather | Krause | hkrause | Auditor
12 | Hang Ling (Annie) | Wang | hangling | Diaconis


General Information



The class will meet every week on Wednesdays from 7:00 pm to 10:00 pm in ACE (Accolade East) Room 010.


As undergraduates, we learn statistics through a sequence of courses, each focusing on some part of statistical theory. When we solve problems in these courses, the set of tools we are expected to use is fairly obvious. When you have to solve real-world statistical problems, there are rarely clear clues about which theories or methods to use.

In fact, many problems are best handled with eclectic solutions borrowing from many areas of knowledge. You need to draw not only on your statistical knowledge but also on all your accumulated knowledge and experience in life: your understanding of the subject matter of the problem, your creativity with mathematical models, your ability to visualize and communicate, your interpersonal skills in understanding your clients' possible anxieties with statistics, your insight in working with your own anxieties, etc.

The goal of this course is to help you develop the skills and confidence to solve real-world problems. You will learn about the key role of many statistical concepts that are rarely seen in detail in standard courses. You will also learn the vital role of visualization and graphics, communication (listening even more than talking) and presentation skills.

The course will help you develop skills in a number of areas:

  1. programming and data management skills in R: Although the emphasis in this course is entirely on R, many jobs expect a strong knowledge of SAS -- take every opportunity you can to also learn SAS. Consider, if you are a beginner, the courses offered through the Statistical Consulting Service. If you work with clients who use another package, e.g. SPSS or Stata, you might have to learn enough about them to show your clients how to perform their analyses using their own packages. A recent feature in SPSS and SAS allows R to be called from these packages. If you have a solution that is too advanced for SPSS or SAS, you can prepare code in R and provide a client with the ability to perform the analysis from the package that is familiar to them.
  2. graphics to visualize data and models
  3. how to work as a statistical consultant/collaborator in the analysis of scientific problems
  4. developing presentation skills
  5. developing an understanding of the role of statistics as a discipline and as a profession in science and business
  6. acquiring basic concepts and techniques related to the analysis of hierarchical and longitudinal data -- a large proportion of the problems our students deal with in consulting involve problems that can be approached through these techniques
  7. understanding ethical issues related to statistical practice


There is no single textbook this year. Consult and add to our list of useful links and references: Links and References

Course Work

There are 4 components:

  1. Individual student pages (or 'blogs') and contributions to the course wiki:
    Use your course page, which you create as a subpage of the Student pages for the course, to post the following after each class and before the next one:
    1. Sample exam questions: A sample exam question and answer on the material of the previous class.
    2. A posting with links and comments on
      • Statistics in the News
      • Statistical paradoxes and fallacies
      • (optional) Reflections on statistical consultations that you arrange to attend in the SCS. The reflections must be constructed to avoid any violation of privacy of either the client or the consultant whose session you attended.
    3. Questions and comments on group work and class lectures
    4. In addition, throughout the course, contribute questions, answers, comments, reflections, etc. to the various discussion pages for the course, e.g. Statistics: Questions and Answers. You can also create new discussion pages.
    See more details.
  2. Group assignments and short presentations
    Four or five group assignments on material covered in the course. Some assignments will also involve short group presentations to the class. These presentations will be timed (typically 15 minutes) and getting the interesting aspects of your message across in a limited time is challenging and requires good preparation and coordination.
    The groups for assignments are created using a quasi-random algorithm. Their composition will be sent to you shortly after the first class.
  3. Group consulting project
    You will work on a major consulting project in which you will collaborate with a real client to produce a deep and probing consulting report and analysis. The project is very likely to involve multilevel models. Students who are interested may opt to work on one of two case studies for possible presentation at conferences in the summer of 2011. The case study team may include students who are not currently enrolled in MATH 6627. Further details will be available soon. The consulting project groups are not necessarily identical to the assignment groups.
  4. A final in-class 2-hour exam on the material of the course. If you produce good sample questions, you might find a question very close to yours on the exam.

The weight of components is 30% for individual pages and contributions to the wiki, 30% for group assignments, 30% for the consulting project and 10% for the final exam. Note that your contributions anywhere on the wiki are as easy to identify as those on your individual course page, and they receive credit similar to contributions on your own course page, as do comments you add to other students' course pages.

Course organization

  • In the first few classes we will look at general statistical questions that are important in consulting. The following five or six weeks will be devoted to multilevel models and the final portion of the course will be devoted to discussions and presentations of your projects.
  • You will have the opportunity to organize and give short presentations on assignments starting from the third week of classes.
  • Starting from the first week, you should work on your course pages, generating questions, links, comments, etc., and you should start working with your Assignment Team on the first assignment.
  • The course assumes a working knowledge of R including trellis (lattice) graphics. We can schedule a few special tutorials for anyone who feels the need.
  • The final exam will give you a chance to show how you have reflected on the material of the course. It will consist mainly of 'essay' questions, possibly drawn from your own sample questions if they turn out to be good.


Week 1: January 5, 2011

Course organization
Participation in SCS seminars
You are welcome to attend SCS (Statistical Consulting Service) weekly meetings, which alternate between 'staff meetings' and seminars on a statistical topic of interest to statistical consultants (each held bi-weekly). This year we are reading and discussing the book Lance, C.E., & Vandenberg, R.J. (Eds.) (2009). Statistical and methodological myths and urban legends: Doctrine, verity and fable in the organizational and social sciences. New York, NY: Routledge. Meetings take place every Friday at 1 pm in TEL 5082. Please send an e-mail message to Georges Monette to have your name added to the SCS mailing list. Note that SCS also offers short courses, some of which might be of interest to you.
Consulting, communication, writing reports 
Statistical consulting environment
Writing reports: Secret of good writing: write so your reader understands you!
Notes on writing reports
Seven basic principles
Not all consulting activities require a formal report. Often a phone call, a verbal report in a face-to-face meeting, a letter or a memo is the most efficient way of communicating with a client.
Interpersonal aspects of statistical consulting: Janice Derr, Statistical Consulting Video
Contributions by Doug Zahn
The role of statistics in society -- understanding evidence 
One of the greatest challenges in understanding evidence is bridging the gap between observational data and causal inference, i.e. understanding the links between statistical significance and statistical meaning.
Statistics in the news: link to come
Smoking: Observational vs. Experimental data: link to come
The Fundamental Contingency Table of Statistics
Type of inference | Experimental data | Observational data
Causal | Where Fisher wants to be | Where we often are
Predictive | Problematic but very rare | Good for 'prediction', not 'causal inference' (the topic of Frank Harrell's Regression Modeling Strategies)
Finding meaning in observational data -- examples
Hans Rosling: Myths about the developing world
Al Gore: An Inconvenient Truth
Peter Donnelly: How juries get fooled by statistics
Statistician Peter Donnelly explores the common mistakes humans make in interpreting statistics, and the devastating impact these errors can have on the outcome of criminal trials.
Piet Groeneboom Lucia de Berk and the amateur statisticians
Andrey Feuerverger: The Lost Tomb of Jesus
A working statistician should be proficient with at least SAS and R. In addition, you may need to know the package(s) typically used by your clients, e.g. SPSS. This course uses R because just about everything is possible (not necessarily easy) in R. Increasingly, new methodologies are first prototyped as R packages. As a statistician, knowing R may give you your best edge even if in your initial contacts with clients or in your first jobs it doesn't necessarily seem so relevant.
Getting started with R
After installing the current version of R, you should install the packages we are likely to use a lot:
      > install.packages("car")  # do this first to get the current version
      > install.packages("spida", repos = "")
      > install.packages("p3d", repos = "")
It is a good idea to use separate project directories for different projects. See Using R with project directories under Windows.
To begin learning R, work through Maindonald (2008) Using R for Data Analysis and Graphics
Another excellent tutorial is Christopher Green: R Primer
When you're ready to really plunge into R, work your way through the manual that comes with R. From the R window click on Help|Manuals|An Introduction to R.
Explore graphics in R with a sample script.
Much of the work on the course will require using a wiki. Editing material in a wiki is very intuitive once you get the hang of it. If you have problems or questions, you can post them on Miscellaneous Questions and Answers.
Using a wiki for group assignments: see Editing hints for course assignments

Assignment 0 (individual) and things to do

1. Personal wiki page 
Deadline: noon, Friday, January 7, 2011. Set up your personal wiki page by adding a link to /Students, and fill in some information about yourself, e.g. where you did your previous studies, your current program, your academic interests, your goals, the software you know how to use, etc.
2. R 
  • Install R on your computer(s), including your laptop if you have one. See R: Getting started
  • See the discussion above on installing additional packages and getting started.
3. Weekly 'blog' 
Deadline high noon on Wednesday, January 12, 2011. Enter the weekly material in your personal wiki page.

Assignment 1 (team)

Deadline for files on wiki: noon, January 19, 2011; Presentations: 7 pm on January 19, 2011

1. Simpson's Paradox 
Simpson's Paradox describes a situation where the direction of the association between two variables, X and Y, changes when conditioning on a third variable, Z. A troubling classical example, based on data obtained from the State of Florida, is discussed in Agresti (1990) "Categorical Data Analysis". In this example, Y is a dichotomous variable indicating whether a prisoner found guilty of murder was sentenced to capital punishment (execution), X is the race (white or black) of the prisoner and Z is the race of the victim. Overall, a lower proportion of blacks than whites are sentenced to be executed but, controlling for the race of the victim, the association is dramatically reversed. Possibly, the judicial process goes easier on blacks because their victims tend to be black. In this example of Simpson's Paradox, X, Y and Z are all dichotomous variables and Z can be thought of as a confounding factor. Although, in principle, the phenomenon is the same whether X, Y or Z are dichotomous or numerical (continuous or discrete) variables, our ability to visualize the phenomenon and to transfer the concept to other situations seems to depend crucially on the nature of the variables. Since each of X, Y and Z can be either dichotomous or numerical, and the role of Z can be that of a confounding factor, a mediating factor or some mixture of both, there are 2 x 2 x 2 x 3 = 24 possible combinations of potential examples illustrating Simpson's Paradox. Find two different ones. They may be just plausible examples or, better, real examples or, best, real examples with real data. Discuss each example and post at least one graph or hand-drawn sketch (use something like Microsoft Paint, save as a .png file and upload to the wiki) for each to help visualize the example. Prepare a 5-minute presentation.
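To see the arithmetic behind a reversal when X, Y and Z are all dichotomous, here is a minimal R sketch. The counts are made up purely for illustration (they are not the Florida data): within each level of Z, group B has the higher proportion of 'yes', yet after collapsing over Z, group A has the higher proportion. The reversal happens because most of A's cases fall in stratum z1, where 'yes' is common, while most of B's cases fall in stratum z2, where 'yes' is rare.

```r
# Hypothetical counts (illustration only, not real data):
# 'yes' = number with Y = yes, 'n' = total, for each combination of X and Z
counts <- data.frame(
  Z   = rep(c("z1", "z2"), each = 2),
  X   = rep(c("A", "B"), times = 2),
  yes = c(80, 18, 2, 20),
  n   = c(100, 20, 20, 100)
)

# Conditional proportions P(Y = yes | X, Z): B is higher within each stratum
counts$prop <- counts$yes / counts$n
print(counts)

# Marginal proportions P(Y = yes | X), collapsing over Z: A is now higher
marg <- aggregate(cbind(yes, n) ~ X, data = counts, FUN = sum)
marg$prop <- marg$yes / marg$n
print(marg)
```

Within strata the proportions are 0.8 (A) vs 0.9 (B) and 0.1 (A) vs 0.2 (B); collapsed over Z they become 82/120 (about 0.68) for A vs 38/120 (about 0.32) for B.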
2. Graphics to visualize data 
Explore the graphical possibilities of one of the following packages in R:
  • Rubin: lattice (be sure to include the use of panels and groups)
  • Diaconis: googleVis (the data set should be longitudinal, i.e. include a time variable)
  • Gray: rgl and p3d (include the use of the 'groups' parameter to produce trajectories)
I will assign which package initially goes to what group to make sure they are all covered. You are free to swap between groups, however.
Prepare some files on the wiki that would be useful to help someone else understand how to use and exploit the package and prepare a 10-minute presentation.
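For the lattice option, a possible starting point is sketched below. It uses simulated, hypothetical data (not course data) and shows the two features the assignment asks about: the `| site` term in the formula conditions on a variable and produces one panel per site, while `groups = treatment` superimposes the groups within each panel. This is only one way to combine them, not a required approach.

```r
library(lattice)  # recommended package, installed along with R

# Hypothetical simulated data: a response y measured at times 1 to 5
# on 6 subjects at each of 2 sites; subjects 4 to 6 are "Treated"
set.seed(2011)
d <- expand.grid(time = 1:5, subject = 1:6, site = c("Site 1", "Site 2"))
d$treatment <- factor(ifelse(d$subject <= 3, "Control", "Treated"))
d$y <- d$time * ifelse(d$treatment == "Treated", 1.5, 1) + rnorm(nrow(d))

# '| site' gives one panel per site;
# 'groups = treatment' superimposes the two treatment groups in each panel;
# type = c("p", "r") draws the points plus a least-squares line for each group
p <- xyplot(y ~ time | site, data = d, groups = treatment,
            type = c("p", "r"), auto.key = list(columns = 2),
            xlab = "Time", ylab = "Response")

# In a script (as opposed to the console), a trellis object
# must be printed explicitly to be drawn
print(p)
```

Replacing `groups = treatment` with a subject identifier and `type = "b"` would instead draw one trajectory per subject, which is closer to the longitudinal displays used later in the course.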


Week 2: January 12, 2011


Assignment 2 (team)

Deadline for files on wiki: noon, January 26, 2011; Presentations (10 minutes maximum for both questions combined): 7 pm on January 26, 2011

1: Pick one of the articles published in the Pulse column in the Toronto Star. Discuss whether the article suggests a causal relationship between two variables. If so, which? Are the data observational or experimental? Can you think of alternative explanations to causality? Confounding factors? Or explanations consistent with causality? Mediating factors? Have any confounding factors been accounted for in the analysis? Have any mediating factors been controlled for in a way that vitiates a causal interpretation of the relationship? What is your personal assessment of the evidence for causality in the study that is the subject of the article?
Put the title of the article you have chosen on your team page as soon as you choose it so other teams can avoid choosing the same article.
Post some notes on your discussion of the article in a subpage of your team page.
Prepare a maximum 5-minute presentation preferably using materials available on the web linked through the subpage of your team page.
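2: Prepare a discussion of 5 of the questions in /Paradoxes and Fallacies - I. Do the questions whose number is congruent to T (mod 3), where T is your team's position in the ordering 1: Diaconis, 2: Gray, 3: Rubin. For example, Diaconis (T = 1) takes questions 1, 4, 7, ..., and Rubin (T = 3) takes the multiples of 3: questions 3, 6, 9, ....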
You can work on your discussions in a subpage of your team page and then post them to /Paradoxes and Fallacies - I once you are ready. When you edit /Paradoxes and Fallacies - I, just edit the question you are working on in order to avoid editing collisions if the page is being edited by two people at the same time.

Week 3: January 19, 2011

  • Presentations
  • Continuation of week 2: visualizing multiple regression
  • See links for week 3





Ethics in Statistics and Consulting

TED Talks on Statistics


New York Times, April 8, 1984
