MATH 6627 2008-09 Practicum in Statistical Consulting
Practicum in Statistical Consulting
 Quick links
 Public wiki: http://wiki.math.yorku.ca/
 Private wiki: http://statswiki.math.yorku.ca/
 Getting started with R: http://wiki.math.yorku.ca/index.php/R:_Getting_started
 Statistical Consulting Service: http://www.yorku.ca/isr/scs/
 Link to a page of useful links for Gelman & Hill
 MATH 6627 2008-09 Where is the book?
 Reading Course, Summer 2010 (Statistical Visualization)
 NEWS
 NEW Starting Wednesday, March 18, 2009, we will meet in Bethune College 202 from 4 pm to 7 pm.
General Information
Instructor
 Georges Monette, Ph.D., P.Stat. (http://www.ssc.ca/accreditation/index_e.html)
 N626 Ross
 mailto:georges@yorku.ca
 http://www.math.yorku.ca/~georges
 Office hours: Thursdays 3 pm to 5 pm and other times by appointment
Meetings
The class will meet every second week on Wednesdays from 7:00 pm to 10:00 pm in Vari Hall 1016. Consult the schedule below for exact dates.
Goals
As undergraduates we learn statistics through a sequence of courses each focusing on some part of statistical theory. When we solve problems in these courses the tools we are expected to use are obvious. When you have to solve real-world statistical problems, it is rare that there are clear clues about the correct theory or method that needs to be used.
In fact, many problems are best handled with eclectic solutions borrowing from many statistical fields. The goal of this course is to help you develop the skills and confidence to solve real-world problems. You will learn about the key role of many statistical concepts that are rarely seen in standard courses. You will also learn the vital role of visualization and graphics, communication (listening even more than talking) and presentation skills.
The course will help you develop skills in a number of areas:
 programming and data management skills in R: Although the emphasis in this course is entirely on R, many jobs expect a strong knowledge of SAS, so take every opportunity you can to learn SAS as well. If you are a beginner, consider the courses offered through the Statistical Consulting Service.
 graphics to visualize data and models
 how to work as a statistical consultant/collaborator in the analysis of scientific problems
 developing presentation skills
 developing an understanding of the role of statistics as a discipline and as a profession in science and business
 understanding ethical issues related to statistical practice
Text and references
 Text: Andrew Gelman and Jennifer Hill (2007) Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.
 References (on reserve at Steacie Science Library):
 Javier Cabrera and Andrew McDougall (2001) Statistical Consulting, Springer-Verlag, N.Y. Steacie Reserves HA 29 C227 2002
 Janice Derr (2000) Statistical consulting: a guide to effective communication, Duxbury. Steacie Reserves HA 29 D386 2000 Book and CD-ROM
 Other references: [1] (we will build up the list during the year).
Course Work
 In the first term the work for the course consists primarily of assignments done individually and posted on the 'private wiki' with help from your group.
 Since almost all the interesting consulting problems I have seen in recent years have required an understanding of multilevel models, which are a natural extension of traditional linear models, we will develop ideas and concepts for the application of linear and multilevel models to real research problems by working through the course text. The text uses many real data sets and presents methods of analysis in R.
 In the second term we continue working through the text. In addition, you will work on a major consultation project in which you will collaborate with a real client to produce a deep and probing consulting report. The project is very likely to involve multilevel models. Students who are interested may opt to work on a Statistical Society of Canada (SSC) case study for presentation at the SSC meeting in the spring of 2009. The Case Study team may include students who are not currently enrolled in MATH 6627.
 You will also attend some real statistical consultations and prepare brief reports which will count towards your assignment grade.
 Another important part of the course work is your contribution to the 'public wiki'. In particular we will develop two types of information on the wiki
 how to's in R: these are brief articles describing how to do something simple in R, either a graph, an analysis or a type of data manipulation.
 Paradoxes and fallacies in statistics: As your knowledge of statistics becomes deeper you abandon many simple suppositions and replace them with more sophisticated ones. An important part of communication between statisticians and clients (and, for that matter, between statisticians and the public or between statisticians and students) involves understanding simple, often fallacious, suppositions and how they can lead to a deeper understanding. We will develop wiki pages to discuss important paradoxes and fallacies.
Grading
 Assignments (40%)
 At each class meeting you will be given individual assignments to be completed and posted on the private wiki before the start of the following class (typically two weeks later). For each assignment, you will be assigned to a group that is expected to help its members with any questions or problems completing the assignment. Group members may help each other by editing each other's work. Your work will be graded and comments will be sent to you by the instructor. You can then improve your work, which will be marked again two weeks later. Your grade for each assignment will be the sum of your grade at the original due date, the grade on the corrected work two weeks later, and the average of your group's grades at the original due date. The purpose is to encourage group cooperation so each member produces strong work at the first due date. Grades will be based not only on the correctness of the solution but also on the effectiveness and creativity of appropriate graphical presentations. You will receive 8 or 9 out of 10 for work that is correct and complete. To receive 10 out of 10, you must also show extra effort to explain the solution or find an appropriate graphical representation of the solution.
 Consulting Project (40%)
 The grade will be based on the appropriateness and quality of the statistical analyses, the quality of exposition, and the quality of presentation.
 Contributions to the public wiki (10%)
 Class participation (10%)
 This is based on the preparation of questions on the assigned readings (to be posted on the wiki), attendance, preparation for class and participation in class discussions.
Class list and teams
Number  Family name  Given name  Week 1  Status  

1  Boulsaien  Khaled  bouls801@yorku.ca  A  
2  Cao  Jianzhe  adamcao@yorku.ca  B  
3  Chane  Parminder  pchane@yorku.ca  C  
4  Gao  Isabel  isabelg@mathstat.yorku.ca  D  
5  Khan  Sabria  sabria@yorku.ca  A  
6  Leeza  Nusrat  nsleeza@yorku.ca  B  
7  Liao  Yang  liaoyang@yorku.ca  C  
X  Meschian  Mehran  mmeschia@yorku.ca  D  dropped 
9  Nabipoor Sanjebad  Majid  masaba@yorku.ca  A  
10  Palma  Luis  luispal@yorku.ca  B  inactive 
11  Shakya  Sulin  sshakya@yorku.ca  C  
12  Shi  Xiaoping  xpshi@yorku.ca  D  
13  Xu  Hong  hongxu@yorku.ca  A  
14  Pope  Chris  u843603@mathstat.yorku.ca  C  
8  Faroque  Shahela  TBD  D  dropped 
Schedule
Week 1: September 10, 2008
 Topics

 Course organization
 Participation in SCS seminars
 You are welcome to attend SCS (Statistical Consulting Service) weekly meetings which consist of biweekly 'staff meetings' and biweekly seminars on a statistical topic of interest to statistical consultants. The exact topic for this year will be determined in two weeks. Meetings take place every Friday at 2:30 in TEL 5082. Please send an email message to Georges Monette to have your name added to the SCS mailing list. Note that SCS also offers short courses some of which might be of interest to you.
 Consulting, communication, writing reports

 Statistical consulting environment
 Writing reports: Secret of good writing: write so your reader understands you!
 Notes on writing reports
 Seven basic principles
 Not all consulting activities require a formal report. Often a phone call, a verbal report in a face-to-face meeting, a letter or a memo is the most efficient way of communicating with a client.
 Communication:
 Interpersonal aspects of statistical consulting: Janice Derr, Statistical Consulting Video
 Contributions by Doug Zahn
 The role of statistics in society: understanding evidence

 One of the greatest challenges in understanding evidence is bridging the gap between observational data and causal inference, i.e. understanding the links between statistical significance and statistical meaning.
 Statistics in the news: Lies or Statistics
 Smoking: Observational vs. Experimental data: rough notes
Types of Inference  Experimental data  Observational data  

Causal  Where Fisher would like us to be  Where we often are  
Predictive  Very rare but problematic  Good for 'prediction' not 'causal inference': this is the topic of Frank Harrell's 'Regression Modeling Strategies'  
 Finding meaning in observational data: examples
 Hans Rosling: Myths about the developing world
 Al Gore: An Inconvenient Truth
 Peter Donnelly: How juries get fooled by statistics
 Statistician Peter Donnelly explores the common mistakes humans make in interpreting statistics, and the devastating impact these errors can have on the outcome of criminal trials.
 Piet Groeneboom: Lucia de Berk and the amateur statisticians
 Andrey Feuerverger: The Lost Tomb of Jesus
 Software

 A working statistician should be proficient with at least SAS and R. This course uses R. A good consultant should also be familiar with packages that are likely to be used by clients, e.g. SPSS.
 Getting started with R
 After installing R, you should install the packages designed for the textbook:
> install.packages("arm")
> install.packages("BRugs")
 (From http://www.stat.columbia.edu/~gelman/bugsR/) Set up R in 'single window mode': Click on Edit, then GUI Preferences, then at the top click SDI. Add a couple of zeroes to the "buffer" and "lines" options near the middle of the screen. Then save the preferences.
 Whenever you start R, issue the command:
> library(arm)
 to use the software with the text and issue the command:
> source("http://www.math.yorku.ca/~georges/R/fun.R")
 to use software written for this course.
 It is a good idea to use separate project directories for different projects. See Using R with project directories under Windows.
 To begin learning R work through Maindonald (2008) Using R for Data Analysis and Graphics
 Another excellent tutorial is Christopher Green: R Primer
 When you're ready to really plunge into R, work your way through the manual that comes with R. From the R window click on Help > Manuals > An Introduction to R.
 Wikis

 Public wiki
 wiki.math.yorku.ca :
 Open for reading to the world
 Need an account for editing: I will create accounts for all members of the class so you can make contributions to the information on the wiki
 Private wiki
 statswiki.math.yorku.ca :
 Need a userid and password for access
 Once you are in the wiki, you can create your own userid and password. Please use the same userid you have for mail @yorku.ca (e.g. if your York email address is 'maryjones@yorku.ca' then use the userid 'maryjones'); the password can be anything you choose. The page in which you create your account says that your real name and email address are optional, but you will need to fill these in to get properly graded for your work. This will avoid name 'collisions' on the private wiki.
 The private wiki will be used for course assignments, course materials, etc.
 Using a wiki for group assignments
Gelman and Hill Chapters 1 and 2
 Web page for the book http://www.stat.columbia.edu/~gelman/arm/
 Data directory: http://www.stat.columbia.edu/~gelman/arm/examples/
 Downloading data and R scripts:
 Start R
 Open a script window: File > New script
 Use a web browser to open a data file
 Cut and paste the data file into the R script window and save it with a suitable name: e.g. police.dat
 Open a new script window for commands to read in the data file as an R 'data.frame'
 Count the number of non-data lines to skip at the top, then use the command:
 > dd <- read.table( 'police.dat', header = TRUE, skip = 6)
 Submit the command with Ctrl-R
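 The reading step can be tried without downloading anything. The sketch below is only an illustration: it writes a small made-up file (the contents and column names are hypothetical, not the real police.dat) with two non-data lines at the top, reads it with read.table, and then checks the result.

```r
# Hypothetical stand-in for a downloaded file such as police.dat:
# two non-data lines, then a header row and the data.
path <- tempfile(fileext = ".dat")
writeLines(c(
  "Example data file (made up for illustration)",
  "Source: none",
  "precinct stops pop",
  "1 75 1720",
  "2 36 1368",
  "3 74 23854"
), path)

# skip = 2 drops the two non-data lines; header = TRUE reads the column names
dd <- read.table(path, header = TRUE, skip = 2)

str(dd)    # check column names and types
head(dd)   # look at the first rows
```

 With a real file, the value of skip has to be counted from the file itself, as described in the steps above.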
Assignment 1 and things to do
Deadline: 5 pm, Wednesday, September 24.
 1. Private wiki
 Log in to the 'private wiki' http://statswiki.math.yorku.ca using the password sent to you by email (you are free to change this password). Go to your user page by clicking on your userid at the top of the page and write a few details about yourself, e.g. where you did your previous studies, your academic interests, the software you know how to use, etc. Remember that this material is not accessible to the public but can be viewed by anyone who has access to the statswiki.
 2. R
 Install R on your computer(s), including your laptop if you have one. See R: Getting started
 Install the software that goes with our textbook with the R command install.packages("arm")
 Work through the first two chapters of Maindonald (2008) Using R for Data Analysis and Graphics
 3. Questions on readings for next class
 Read Chapters 1 to 3 of Gelman & Hill and formulate at least one question on Chapters 1 or 2 and one question on Chapter 3. Add them to the questions at MATH 6627 2008 Questions
 4. Statistics in the News
 Find a current or recent topic in the news that involves, explicitly or implicitly, an interesting statistical issue. Prepare an analysis of the topic together with a review of scientific evidence. Are there gaps between the science and the public presentation of the topic?
 5. Class photo
[Deferred to the next class: I forgot to take a photo!] See the class photo at MATH 6627 2008-09 Class Photo and enter your name for the caption.
Week 1.5: September 17, 2008
 Topic

 This is an optional tutorial on the use of R or the wiki for those who have had little or no experience with either. Be sure to have downloaded R and started covering some of the material in Maindonald (2008) Using R for Data Analysis and Graphics or another tutorial in [2] before the tutorial. If you have a laptop, install R on it and bring it to the class.
 In this tutorial we will work through:
 the sample session in Venables and Ripley (2002) [3] and
 the tutorial by John Fox prepared for a short course at UCLA: http://socserv.mcmaster.ca/jfox/Courses/UCLA/index.html
 To continue learning R:
 Work through http://cran.r-project.org/doc/manuals/R-intro.html, also available as a pdf file through the help menu on the R console.
 Highly recommended for learning R systematically: work through the online textbook by J. H. Maindonald at http://wiki.math.yorku.ca/index.php/R:_Getting_started#Exploring_much_more_deeply
Week 2: September 24, 2008
Examples of multilevel data
 A First Look at Multilevel and Longitudinal Models (userid:fisher password:cohen)
 Longitudinal Data Analysis with Mixed Models: A Graphical Overview
Fitting and looking at models with R
Assignment 2 and things to do
Deadline: 5 pm, Wednesday, October 22.
 1. R
 Work through chapters 3 and 4 of Maindonald (2008) Using R for Data Analysis and Graphics
 2. Questions on readings for next class
 Read Chapters 4 to 6 of Gelman & Hill and formulate at least one question on each chapter. Add them to the questions at MATH 6627 2008 Questions
 3. Class photo
See the class photo at MATH 6627 2008-09 Class Photo and enter your name for the caption.
 4. Do your part of the assignment for Week 2. Wherever you can, produce plots showing your fitted models even if not required by the question in the book. When two students work on the same question, work independently. You may look at each other's work but you should do your work with your own group.
Week 3
R script
Visualizing Regression
See Visualizing Regression [4] pp. 1-84 for
 Regression to the mean
 The regression paradox and the regression fallacy
 The geometry and interpretation of the data (or concentration) ellipse: the regression line and the data ellipse
 Visualizing correlation and the confidence interval for the slope using the data ellipse
Notes on Chapter 3: Linear Regression: the basics
 3.2 Multiple predictors
 Interpretation of coefficient β_{i}:
 "expected change in Y when you change X_{i} keeping other X's constant".
 Not always directly meaningful: e.g. a quadratic model:
 E(Y | X) = β_{0} + β_{1}X + β_{2}X^{2}
 The change in E(Y | X) for a change in X depends on X: the rate of change is β_{1} + 2β_{2}X. Note that β_{1} is the expected change in Y per unit change in X only when X = 0. Similar considerations hold for models with interactions, etc.
 Counterfactuals (causal) versus predictive interpretation of β_{i}
 When is each interpretation correct?
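 For the quadratic example in 3.2, the point that the change in E(Y | X) depends on X can be checked numerically. This is a minimal sketch on simulated data; the coefficients and the helper slope_at are made up for the illustration.

```r
set.seed(1)                      # simulated data, for illustration only
x <- runif(200, 0, 10)
y <- 2 + 1.5 * x - 0.1 * x^2 + rnorm(200, sd = 0.5)

fit <- lm(y ~ x + I(x^2))
b <- coef(fit)                   # b[1] = intercept, b[2] = beta1, b[3] = beta2

# Estimated rate of change of E(Y | X) at a given x: beta1 + 2 * beta2 * x
slope_at <- function(x0) unname(b[2] + 2 * b[3] * x0)

slope_at(0)    # equals the coefficient on x
slope_at(5)
slope_at(10)
```

 The coefficient on x describes the slope only at x = 0, which may lie outside the range of interest.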
 3.3 Interactions
 See R script for example
 3.4 Statistical inference
 Where does SE(β̂) come from?
 With simple regression it's easy:
 SE(β̂_{1}) = s_{e} / sqrt( Σ (X_{i} - X̄)^{2} )
 which is easier to interpret when written as:
 SE(β̂_{1}) = (1 / sqrt(n)) × (s_{e} / σ_{X})
 where σ_{X} is the 'population' standard deviation of X, i.e. the standard deviation using n as a divisor. Compare with the standard error of a mean, s_{X} / sqrt(n). So the information on β_{1} is proportional to sqrt(n) and to σ_{X}, and inversely proportional to s_{e}.
 For multiple regression the common formula is:
 SE(β̂_{k}) = s_{e} sqrt( [(X'X)^{-1}]_{kk} ), where ...
 Much more informative, however, is the formula:
 SE(β̂_{k}) = (1 / sqrt(n)) × (s_{e} / σ_{X_{k} | others})
 where σ_{X_{k} | others} is the standard deviation (again using n as a divisor) of the residual of X_{k} after regression on all the other regressors.
 The importance of this formula is that it suggests how you might try to improve the estimate of β_{k}: you can increase n, decrease the error of regression, or increase the variability in X_{k} keeping the other X's constant.
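 The residual-based formula, SE of a coefficient = s_e / (sqrt(n) × SD of the residual of that regressor on the others, with n as divisor), can be compared with what lm reports. This is a sketch on simulated data with made-up variable names.

```r
set.seed(2)                                    # simulated data, for illustration only
n  <- 100
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n)
y  <- 1 + 2 * x1 - x2 + rnorm(n)

fit <- lm(y ~ x1 + x2)
s_e <- summary(fit)$sigma                      # residual standard error

# Residual of x1 after regression on the other regressor
r1 <- resid(lm(x1 ~ x2))
sigma_r1 <- sqrt(mean(r1^2))                   # SD with n as divisor (residuals have mean 0)

se_formula <- s_e / (sqrt(n) * sigma_r1)
se_lm      <- summary(fit)$coefficients["x1", "Std. Error"]

c(formula = se_formula, lm = se_lm)            # the two values agree
```

 Making x2 more strongly related to x1 shrinks sigma_r1 and inflates the standard error, which is the collinearity effect the formula makes visible.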
 3.5 Graphical display of fitted model
 See R script for alternative approach
 3.6 Assumptions and diagnostics
 If the assumptions are true, the residuals look approximately like a random sample from a normal distribution and should not show patterns when plotted in various ways. Common diagnostics: study and plot the residuals. GH mention only the traditional diagnostics of plotting residuals against fitted values and Xs. In addition, there are other plots that have served me very well.
 3.7 Prediction and validation
 Broad and important topic. Statisticians often pay too little attention to validation.
Notes on Chapter 4: Before and after fitting the model
 4.1 Linear transformations
 Standardizing:
 Using z-scores
 In passing: simple regression using z-scores:
 ẑ_{Y} = r z_{X}
 where r is the correlation.
 Using reasonable centre and scale
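 The remark about simple regression on z-scores (the fitted slope is the correlation r and the intercept is zero) is easy to check. A minimal sketch on simulated data:

```r
set.seed(3)                  # simulated data, for illustration only
x <- rnorm(50, mean = 10, sd = 2)
y <- 3 + 0.8 * x + rnorm(50)

zx <- as.numeric(scale(x))   # z-scores: (x - mean) / sd
zy <- as.numeric(scale(y))

fit_z <- lm(zy ~ zx)
coef(fit_z)                  # intercept is 0 up to rounding; slope equals r
cor(x, y)
```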
 4.2 Centering and standardizing (especially for models with interactions)
 If
 E(Y) = β_{0} + β_{1}X_{1} + β_{2}X_{2} + β_{3}X_{1}X_{2}
 then
 E(Y) = β_{0} + (β_{1} + β_{3}X_{2})X_{1} + β_{2}X_{2}
 So β_{1} is the 'effect' of X_{1} when X_{2} = 0. If we recentre X_{2} we change the meaning of β_{1}, and vice versa. The information on β_{1} is 'maximized' when X_{2} is centered so that X̄_{2} = 0. But this is no reason to centre X_{2} at X̄_{2} since recentering also changes the meaning of β_{1}.
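 The effect of recentering on the coefficient of X_{1} in an interaction model can be seen directly. The sketch below uses simulated data (all names and values are made up): it fits the model with X_{2} as is and with X_{2} centred at its mean, then compares the coefficient on x1 and its standard error.

```r
set.seed(4)                          # simulated data, for illustration only
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n, mean = 5)
y  <- 1 + 2 * x1 + 0.5 * x2 + 1.5 * x1 * x2 + rnorm(n)

fit_raw <- lm(y ~ x1 * x2)
x2c <- x2 - mean(x2)                 # recentre x2 at its mean
fit_ctr <- lm(y ~ x1 * x2c)

b <- coef(fit_raw)
# Coefficient on x1: effect of x1 when x2 = 0 (raw) versus when x2 = mean(x2) (centred)
coef(fit_raw)["x1"]
coef(fit_ctr)["x1"]
# The centred coefficient equals beta1 + beta3 * mean(x2) from the raw fit
unname(b["x1"] + b["x1:x2"] * mean(x2))

# Standard errors of the two x1 coefficients
se_raw <- summary(fit_raw)$coefficients["x1", "Std. Error"]
se_ctr <- summary(fit_ctr)$coefficients["x1", "Std. Error"]
c(raw = se_raw, centred = se_ctr)
```

 The two fits are the same model; only the question answered by the x1 coefficient changes, and with it the precision.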
 4.3 Correlation and regression to the mean
 Four lines: the principal axis (principal component line), the regression of Y on X, the SD line and the regression of X on Y. In z-scores:
 Y on X: ẑ_{Y} = r z_{X}
 X on Y: ẑ_{X} = r z_{Y}
 SD line: z_{Y} = sign(r) z_{X}
 If s_{y} = s_{x} then the SD line and the principal axis are identical. Otherwise it's more complicated.
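 The slopes of the four lines, all drawn in the (X, Y) plane in the original units, can be computed as follows. This is a sketch on simulated data; taking the principal axis from the leading eigenvector of the covariance matrix is the usual definition and is assumed here.

```r
set.seed(5)                            # simulated data, for illustration only
x <- rnorm(100, mean = 60, sd = 10)
y <- 20 + 0.6 * x + rnorm(100, sd = 8)

r  <- cor(x, y)
sx <- sd(x); sy <- sd(y)

slopes <- c(
  y_on_x  = r * sy / sx,               # regression of Y on X
  x_on_y  = (1 / r) * sy / sx,         # regression of X on Y, drawn in the (X, Y) plane
  sd_line = sign(r) * sy / sx          # SD line
)

# Principal axis: direction of the leading eigenvector of the covariance matrix
v <- eigen(cov(cbind(x, y)))$vectors[, 1]
slopes["principal_axis"] <- v[2] / v[1]

slopes   # all four lines pass through (mean(x), mean(y))
```

 With positive correlation the Y-on-X line is the flattest and the X-on-Y line the steepest, with the SD line and the principal axis in between.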
 Exercise:
 In a course in which the final grade is the average of the mark on a midterm and on a final exam (both are graded out of 100), a professor would like to impute the midterm grade of a student who missed the midterm for a legitimate reason. What's the best way? Just use the final grade? Use the predicted midterm grade after doing a regression of the midterm on the final? Impute a z-score for the midterm using the z-score on the final? Use reverse regression by regressing the final on the midterm and imputing the value for the midterm that would predict the student's grade on the final? Use principal axis regression? What are the consequences of using these various methods and which one do you think is best? Examine, at least briefly, the meaning of 'best' in this context.
 4.4 Log transformations
 Interpreting β's.
 4.5 Other transformations
 4.6 Building models for prediction (in contrast with causal inference)
 i.e. models that fit well whether parameters have a causal interpretation or not.
 GH omit one important consideration: the number of 'degrees of freedom' should not be too large relative to n. Harrell (2001) discusses this in detail. See sample size and validity.
Some notes on Chapter 5: Logistic regression
Download data from http://www.stat.columbia.edu/~gelman/arm/examples/nes/
To install the data download all three files and run 'nes_chap4.R'
This will create a data set named 'data' with data from elections from 1972 to 2000.
Use:
d92 <- data[data$year == 1992,]
to get data on the Bush/Clinton race of 1992.
Assignment 3 and things to do
NEW Deadline: 5 pm, Wednesday, November 12.
 1. Questions on readings for next class
 Read Chapter 7 of Gelman & Hill and formulate at least one question. Add it to the questions at MATH 6627 2008 Questions
 2. Individual assignments
The following assignment should be done individually. Email your work to me by the deadline. You can send a text file, a Word file or a pdf file. If you wish to use some other format, please let me know so I can make sure that I will be able to read it.
 Look at the data set http://www.math.yorku.ca/~georges/Data/coffee.csv. It has three relevant variables: 'Heart', a measure of heart condition (the higher the less healthy); 'Coffee', a measure of coffee consumption; and 'Stress', a measure of occupational stress. How could you use this data to address the question whether coffee consumption is harmful to the heart? Discuss assumptions needed to get anywhere with the data and discuss the nature of various assumptions that might lead to different interpretations, if relevant.
 Look at the data set http://www.math.yorku.ca/~georges/Data/hwX.csv where X is the remainder when you divide your 'class number' (the number from 1 to 20 on the class list on the web) by 4. Thus X will be 0, 1, 2, or 3 (e.g. if your number is 7 then X is 3 and you would use the data set http://www.math.yorku.ca/~georges/Data/hw3.csv). The data set contains data on three variables: Health (the higher the better), Height and Weight. All are in standardized units. What would this data set have to say about the relationship between Weight and Health? Discuss assumptions needed to get anywhere with the data and discuss the nature of various assumptions that might lead to different interpretations, if relevant.
 Do the exercise above (in the notes on Section 4.3) on imputing a midterm grade.
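 For the second data set, the file to use follows from the class number by a remainder calculation. A small sketch using the example class number 7 from the assignment text (the variable names are made up):

```r
class_number <- 7                      # example value from the assignment text
X <- class_number %% 4                 # remainder on division by 4
url <- paste0("http://www.math.yorku.ca/~georges/Data/hw", X, ".csv")
url
# hw <- read.csv(url)                  # uncomment to download; requires the file to be available
```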
 3. Review
 Review your textbooks on multiple regression. What is a confidence ellipse? What is its connection with hypothesis testing? What is a Scheffé confidence interval? What is a Bonferroni confidence interval?
Week 3.5
Here are the 'blackboard' notes.
Week 4
March 4, 2009
News
Assignment 3 was officially due after the start of the strike which means that it wasn't due until now. We will discuss when it ought to be completed.
Plans
The major activity in the course is the analysis of a real data problem. All the data problems I have involve hierarchical or longitudinal data so our priority is to learn enough about the analysis of this kind of data so you can get started on projects by April 1. I propose to meet every week for the next three weeks and then we will reassess our progress.
Visualizing Simple Regression
 Visualizing Simple Regression (file: Visualizing_Regression/Visualizing_Regression_IUnivariate.pdf)
 MATH 6627 2008 R script: Visualizing Simple Regression
Visualizing Multiple Regression
 Visualizing Multiple Regression (file: Visualizing_Regression/Visualizing_Regression_IIBivariatev2.pdf) (we reached p. 73 on Day 1)
 MATH 6627 2008 R script: Visualizing Multiple Regression
For next week
 1. Readings for next week
 Read Chapters 9 and 10 of Gelman & Hill (skip 8 unless you wish to read it on your own). These two chapters are on causal inference with observational data. They are challenging but very important for professional statisticians to understand. Formulate at least one question. Add it to the questions at MATH 6627 2008 Questions
 2. Formulate questions on the material we have seen in class this week: MATH 6627 2008 Questions
 3. Finish outstanding assignments.
 4. Something to think about
 Look at the data set http://www.math.yorku.ca/~georges/Data/hs.csv. This data consists of math achievement scores and 'ses' (socioeconomic status) of 1977 students in 40 U.S. schools, 21 of which are Catholic and 19 public. A goal in analyzing this data is to describe the relationship between math achievement and ses, and to examine whether the relationship is similar in different school sectors and among boys and girls. Explore the data and think about how one could address these questions. A few specific questions to think about:
 Is a low ses child better off in a high ses school or in a lower ses school? If there is a difference, are we confident that it is the school that makes the difference?
 Is there any evidence that students in boys or girls schools do better than students in coed schools?
 How do public schools compare with Catholic schools?
A brief description of some variables:
 school: a numeric id for each school
 mathach: a math achievement score
 ses: socioeconomic status (education and income of parents)
 Size: size of school
 PRACAD: priority given to academics in a school
 DISCLIM: disciplinary climate
 Minority: hispanic or black
 HIMINTY: high proportion of minorities in school
> hs <- read.csv("http://www.math.yorku.ca/~georges/Data/hs.csv")
> source("http://www.math.yorku.ca/~georges/R/fun.R")
> dim(hs)
[1] 1977   13
> library( car )
> some( hs )
        X school mathach   ses sector female    Sex Minority Size   Sector PRACAD DISCLIM
176  1003   2458   9.142 0.242      1      1 Female      Yes  545 Catholic   0.89   1.484
500  1777   3013  18.846 0.032      0      1 Female       No  760   Public   0.56   0.213
680  3022   4292  16.442 0.048      1      0   Male      Yes 1328 Catholic   0.76   0.674
880  3705   5619  21.451 0.412      1      1 Female       No 1118 Catholic   0.77   1.286
1023 3909   5720   8.259 0.238      1      1 Female       No  381 Catholic   0.65   0.352
1064 3950   5720  18.241 1.132      1      0   Male       No  381 Catholic   0.65   0.352
1160 4210   6074  12.553 0.042      1      1 Female       No 2051 Catholic   0.32   1.018
1178 4228   6074  18.875 0.508      1      1 Female       No 2051 Catholic   0.32   1.018
1818 6302   8707  22.102 0.792      0      0   Male       No 1133   Public   0.48   1.542
1942 7150   9586  10.626 1.132      1      1 Female       No  262 Catholic   1.00   2.416
     HIMINTY
176        1
500        0
680        1
880        0
1023       0
1064       0
1160       0
1178       0
1818       0
1942       0
> tab(size = table(hs$school))
size
   29    32    34    35    36    37    38    41    42    44    45    48    49    51    52
    1     2     1     1     1     2     1     1     1     1     2     2     1     1     2
   53    54    55    56    57    58    59    60    63    64    65    66 Total
    5     1     1     2     3     2     1     1     1     1     1     1    40
> tab(~ Sex + school, hs)
        school
Sex      1317 1906 2208 2458 2626 2629 2639 2658 2771 3013 3610 3992 4292 4511 4530 4868
  Female   48   27   35   57   18    0   24   27   28   19   29   21    0   58   63   11
  Male      0   26   25    0   20   57   18   18   27   34   35   32   65    0    0   23
  Total    48   53   60   57   38   57   42   45   55   53   64   53   65   58   63   34
        school
Sex      5619 5640 5650 5720 5761 5762 6074 6484 6897 7172 7232 7342 7345 7688 7697 7890
  Female   30   24   32   24   52   21   56   20   29   22   30    0   29    0   11   24
  Male     36   33   13   29    0   16    0   15   20   22   22   58   27   54   21   27
  Total    66   57   45   53   52   37   56   35   49   44   52   58   56   54   32   51
        school
Sex      7919 8531 8627 8707 8854 8874 9550 9586 Total
  Female   16   23   24   26   17   21   19   59  1074
  Male     21   18   29   22   15   15   10    0   903
  Total    37   41   53   48   32   36   29   59  1977
> tab( ~Sector, up(hs, ~school))
Sector
Catholic   Public    Total
      21       19       40
Week 5
March 11, 2009
Links to course materials
For next week
 1. Readings for next week
 Read Chapters 11 and 12 of Gelman & Hill. Formulate at least one question. Add it to the questions at MATH 6627 2008 Questions
 2. Start working on the following individual assignment due April 1:
 Using the full high school data set at http://www.math.yorku.ca/~georges/Data/hsfull.csv address the following questions:
 1) Describe the relationship between math achievement and SES. How does it seem to vary between school sectors, between girls and boys?
 2a) In what kind of school does a 'poor' girl (ses = -1) seem to be better off? Would she be better off in a school with relatively low mean SES or a school with relatively high mean SES, a Catholic or a public school, a girls' school or a mixed school?
 2b) Do question 2a with 'poor' replaced with 'rich' (ses = 1).
 2c) Do question 2a with 'girl' replaced with 'boy'.
 2d) Do question 2b with 'girl' replaced with 'boy'.
 Compare the 'effect' of SES among boys in each combination of contexts: public, Catholic, poor school, rich school, girls, boys or mixed schools.
 Compare the 'effect' of SES among girls in each combination of contexts: public, Catholic, poor school, rich school, girls, boys or mixed schools.
Week 6
March 18, 2009
 Course materials for this week
Hierarchical Models Part I
 Hierarchical Models Part I version 2 (reasonably clean) (file: Hierarchical_Models_I/Hierarchical_Models_I_v2.pdf)
 R scripts for Hierarchical Models Part I (file: Hierarchical_Models_I/PartI.R)
Hierarchical Models Part I (in progress)
 Hierarchical Models Part I version 3 (still a mess) (file: Hierarchical_Models_I/Hierarchical_Models_I_v3_CURRENT_DRAFT.pdf)
 R scripts for Hierarchical Models Part I(b) (in progress) (file: Hierarchical_Models_I/PartIb.R)
Data
For next week
 1. No new readings. Consolidate previous readings.
 2. Last week's assignment deadline is extended by 1 week to April 1.
Weeks 7 & 8
March 25 and April 1, 2009
 Course materials
Hierarchical Models Part II
 Longitudinal Data Analysis with R (file: Hierarchical_Models_II/WorkshopLongitudinal_with_R2009_03_25.pdf)
 Nonlinear mixed models and generalized linear mixed models (file: Hierarchical_Models_II/TalkOnComasAndMigraines.pdf)
 R script for a sample analysis (file: Hierarchical_Models_II/Sample_Analysis.R)
 A few comments (file: Hierarchical_Models_II/Longitudinal_Data_Analysis_with_Mixed_Models_using_R_Concluding_Thoughts.pdf)
Splines
For next week
 The assignment that was due April 1 is now due April 8. Preferably mail me a pdf or Word file.
Week 9
Here's the script we wrote in class: MATH6627 Sample analysis 2009 04 22
Links
Organizations
 Statistical Society of Canada:
 Statistics Canada
 United Nations Statistical Commission http://unstats.un.org/unsd/default.htm
Consulting
 D.R. Cox on statistical consulting: http://www.ssc.ca/resources/consultants/cox_e.html
 ASA Consulting Section: http://www.amstat.org/sections/cnsl/index.html
 UBC web page for its consulting course http://www.stat.ubc.ca/Courses/Details/course.php?course=65
 Statistical Consulting at Acadia http://ace.acadiau.ca/math/m4233/m4233.htm
TED Talks on Statistics
 Peter Donnelly on genomes: http://www.youtube.com/watch?v=kLmzxmRcUTo&mode=related&search=
 Hans Rosling (2006) Debunking third-world myths with the best stats you've ever seen http://www.ted.com/talks/view/id/92
 Hans Rosling (2007) on New insights on poverty and life around the world: http://www.ted.com/index.php/talks/view/id/140
Other
 Rod Little on English style in scientific papers: http://sitemaker.umich.edu/rlittle/files/styletips.pdf
 Writing Reports: Seven Basic Principles
 Audio and slides of Workshop on Current Issues in the Analysis of Incomplete Longitudinal Data (October 13-15, 2005) at the Fields Institute: http://www.fields.utoronto.ca/audio/#CMM
 Steve's Attempt to Teach Statistics A very interesting site.
 Hugh Chipman's introduction to R http://ace.acadiau.ca/math/scc/workshops_2005/Rclass.html
 Constructionism and Reductionism: Two Approaches to Problem-Solving and Their Implications for Reform of Statistics and Mathematics Curricula http://www.amstat.org/publications/jse/secure/v7n2/lazaridis.cfm
 A Medical Mystery Unfolds in Minnesota, New York Times, Feb. 5, 2008.
 An article on the Grange inquiry: TORONTO INFANT DEATH STIRS CONCERN, by Douglas Martin, New York Times, April 8, 1984
 Gerard E. Dallal, "The Little Handbook of Statistical Practice"
 Links to Statistics Notes in the British Medical Journal
 Andrew Gelman's demonstration of multilevel modeling in an applied setting http://www.youtube.com/watch?v=5JYiJwDob1w
 H. Dean Johnson and Dennis A. Warner (2004) Factors Relating to the Degree to Which Statistical Consulting Clients Deem Their Consulting Experience to be a Success, The American Statistician, Vol. 58, No. 4 (Nov., 2004), pp. 280-289