http://scs.math.yorku.ca/index.php?title=Special:Contributions/Andytli&feed=atom&limit=50&target=Andytli&year=&month=Wiki1 - User contributions [en]2020-07-12T16:56:01ZFrom Wiki1MediaWiki 1.16.1http://scs.math.yorku.ca/index.php/File:2.PNGFile:2.PNG2011-04-04T22:19:40Z<p>Andytli: </p>
<hr />
<div></div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Andy_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Andy Li2011-04-04T22:19:24Z<p>Andytli: </p>
<hr />
<div>====Welcome to my wiki page for the Statistical Consulting Practicum====<br />
<br />
==About Me==<br />
My name is Andy Li. I began my undergraduate studies in Statistics at York in 2008. After two years of study, I am now a master's student in Applied Statistics. I also hold a bachelor's degree in Chemical Engineering, and I worked as a B2B salesman for many years. I have used SAS, R, MATLAB, Minitab and Maple in my classes. I am very happy to join our 6627 team to learn more about Statistics. I hope we can build not only our knowledge but also our friendship in the coming days. <br />
<br />
==Sample Exam Questions==<br />
=== Week 1 ===<br />
(In class) There are 11 students in the 6627 class, 5 of whom are male. There is a statistics meeting on campus. All the students want to attend as visitors, but only 6 seats are left. So the professor needs to choose 6 students at random and wants exactly two of them to be male, to help set up the tables. How many ways can he select the students?<br />
<br />
Answer: 5C2*6C4=150<br />
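<br />
The count can be checked in R. This is a minimal sketch that assumes "two of them to be male" means exactly two; labelling the men as students 1 to 5 is an arbitrary choice made for the brute-force check.<br />

```r
# Exactly 2 of the 5 men and 4 of the 6 women fill the 6 seats.
n_male <- 5
n_female <- 11 - n_male
ways <- choose(n_male, 2) * choose(n_female, 4)
ways  # 150

# Brute-force check: enumerate every group of 6 out of 11 students
# (students 1..5 are taken to be the men) and count the groups with 2 men.
groups <- combn(11, 6)
ways_brute <- sum(apply(groups, 2, function(s) sum(s <= n_male) == 2))
ways_brute  # 150
```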
<br />
=== Week 2 ===<br />
[[File:d34.jpg]]<br />
<br />
(In class) We have the cloud of our data, the data ellipse and the centre point. We also know that about 40% of the data fall inside the inner ellipse and about 86% inside the outer ellipse. What is the radius of the outer ellipse? What are Sx, Sy, Sy.x and the 95% confidence interval?<br />
<br />
Answer: Please see the notes for the 2nd lecture. <br />
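<br />
As a rough check on the percentages, here is a sketch that assumes the data are approximately bivariate normal. Under that assumption, a data ellipse of radius r covers a proportion 1 - exp(-r^2/2) of the data, which is the chi-square distribution function with 2 degrees of freedom evaluated at r^2. The lecture notes remain the reference for the intended answer.<br />

```r
# Coverage of a data ellipse of a given radius under bivariate normality.
radius <- c(1, 2)
coverage <- pchisq(radius^2, df = 2)
round(coverage, 3)  # 0.393 0.865
```

Under this assumption, the quoted 40% and 86% are consistent with radii of about 1 and 2.<br />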
<br />
=== Week 3 ===<br />
Q: Explain the multiple regression coefficients in three dimensions.<br />
<br />
A: [[File:g1.png]] Beta0 defines the Y-intercept of the plane. Beta1 defines the slope of the plane along the x1 axis, and beta2 defines the slope of the plane along the x2 axis. The error term ε adds noise around the plane, so the observed points scatter above and below it instead of lying exactly on it. <br />
<br />
[[File:g2.png]]<br />
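<br />
The roles of the three coefficients can be illustrated with simulated data. This is only a sketch: the values 2, 3 and -1 and the noise standard deviation 0.5 are made up for illustration.<br />

```r
set.seed(1)
x1 <- runif(50)
x2 <- runif(50)

# Points lying exactly on the plane y = 2 + 3*x1 - 1*x2 (no error term).
y_plane <- 2 + 3 * x1 - 1 * x2
fit_plane <- lm(y_plane ~ x1 + x2)
coef(fit_plane)  # recovers 2, 3 and -1 up to rounding error

# Adding the error term scatters the points around the plane,
# so the estimates are close to, but not exactly, the true values.
y_noisy <- y_plane + rnorm(50, sd = 0.5)
fit_noisy <- lm(y_noisy ~ x1 + x2)
coef(fit_noisy)
```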
=== Week 4 ===<br />
Q: Which statements describe correlation and which describe interaction?<br />
<br />
a) People with more years in jail tend to have fewer years of education.<br />
<br />
b) The more money I save, the more financially secure I feel. <br />
<br />
c) To give a good taste, she squeezed a lemon into her black tea and stirred the tea.<br />
<br />
d) Helping others in the Stats lab is one of our TA jobs. Another is grading. <br />
<br />
e) Steel has the ability to become harder and stronger through adding carbon to it and quenching (rapidly cooling).<br />
<br />
f) The more years of education I complete, the higher my earning potential.<br />
<br />
g) Professors drink too much coffee because their work is stressful.<br />
<br />
A: Correlation: a b f g ; interaction: c e<br />
<br />
=== Week 5 ===<br />
Q: We have 5 points at (1,1), (1,2), (2,1), (2,2) and (2,3). What does the linear regression line look like? <br />
<br />
A: Go to [http://illuminations.nctm.org/LessonDetail.aspx?ID=L454#nextone this web page] and try it. Add more points to get an idea of the rule.<br />
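<br />
The line for these five points can also be computed directly in R. This is a small sketch using ordinary least squares of y on x.<br />

```r
x <- c(1, 1, 2, 2, 2)
y <- c(1, 2, 1, 2, 3)

fit <- lm(y ~ x)
coef(fit)  # intercept 1, slope 0.5

# With only two distinct x values, the fitted line passes through
# the mean of y at each x: 1.5 at x = 1 and 2 at x = 2.
group_means <- tapply(y, x, mean)
group_means
```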
<br />
=== Week 6 ===<br />
Q: Fill in the blanks: <br />
MANOVA is the abbreviation of ______(1)______. Where sums of squares appear in univariate analysis of variance, in MANOVA certain ________(2)________ appear, and the sums of squares appear as their ____(3)_____ entries.<br />
<br />
A: (1) Multivariate analysis of variance (2) positive-definite matrices (3) diagonal <br />
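<br />
A small numerical sketch of answer (3), using a made-up matrix of two response variables: the diagonal of the sums-of-squares-and-cross-products (SSP) matrix holds the univariate sums of squares. Only the total SSP matrix is shown here; in MANOVA the same idea applies to the hypothesis and error SSP matrices.<br />

```r
# Hypothetical data: 4 observations on 2 response variables.
Y <- cbind(y1 = c(2, 4, 6, 8), y2 = c(1, 3, 2, 6))

# Centre each column, then form the SSP matrix.
Yc <- scale(Y, center = TRUE, scale = FALSE)
SSP <- crossprod(Yc)
SSP

diag(SSP)       # 20 14, the univariate sums of squares
colSums(Yc^2)   # the same values computed column by column

# All eigenvalues are positive, so this SSP matrix is positive definite.
eigen(SSP)$values
```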
<br />
=== Week 7 ===<br />
Q: What are the assumptions of MANOVA?<br />
<br />
A: <br />
1. Independent Random Sampling: MANOVA assumes that the observations are independent of one another and that there is no pattern in how the sample is selected; the sample is completely random.<br />
<br />
2. Level and Measurement of the Variables: MANOVA assumes that the independent variables are categorical and the dependent variables are continuous or scale variables.<br />
<br />
3. Linearity of dependent variables: The dependent variables may be correlated with each other or independent of each other. Studies suggest that moderately correlated dependent variables are preferred; if the dependent variables are independent of each other, we sacrifice degrees of freedom, which decreases the power of the analysis.<br />
<br />
4. Multivariate Normality: Multivariate normality is present in the data. <br />
<br />
5. Multivariate Homogeneity of Variance: The variance-covariance matrices of the dependent variables are equal across groups.<br />
<br />
=== Week 8 ===<br />
Q: Given a set of complex data in a mixed model, please suggest a way to select all the observations from a sample of clusters.<br />
<br />
A: There are many ways to do this. In the example, the professor gave the following one:<br />
We first create the school summary file and take a sample of schools from that file. We then merge the sample file with the long file. Merge matches on variables that have the same name. By default, it keeps only records that match in both files, so it produces the result we want.<br />
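<br />
The procedure above can be sketched in R as follows. The data frame <code>longfile</code>, its columns <code>school</code> and <code>score</code>, and the sample size of 2 are hypothetical stand-ins, not the data used in class.<br />

```r
set.seed(6627)

# Hypothetical long file: 4 schools (clusters) with 3 observations each.
longfile <- data.frame(
  school = rep(c("A", "B", "C", "D"), each = 3),
  score  = 1:12
)

# School summary file: one row per school.
schools <- data.frame(school = unique(longfile$school))

# Take a sample of schools from the summary file.
sampled <- schools[sample(nrow(schools), 2), , drop = FALSE]

# merge() matches on the shared column name "school" and, by default,
# keeps only rows that match in both files.
result <- merge(sampled, longfile)
result
```

The result contains every observation from the sampled schools and nothing from the others.<br />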
<br />
=== Week 9 ===<br />
Q: True or False for the statements below:<br />
<br />
a)Autocorrelation could be negative.<br />
<br />
b)Strong positive autocorrelation can be a symptom of lack of fit.<br />
<br />
c)Occasional large measurement errors will contribute positively to the estimate of autocorrelation.<br />
<br />
d)Autocorrelation of a random process describes the correlation between values of the process at different points in time.<br />
<br />
A: T T F T<br />
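<br />
Statement (a) can be illustrated with a made-up series that alternates in sign, so that each value tends to be followed by one on the opposite side of the mean. This is a sketch using the sample autocorrelation from <code>acf()</code>.<br />

```r
# Alternating series: 1, -1, 1, -1, ...
x <- rep(c(1, -1), times = 5)

# Element 1 of $acf is lag 0, element 2 is lag 1.
r1 <- acf(x, lag.max = 1, plot = FALSE)$acf[2]
r1  # -0.9
```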
<br />
=== Week 10 ===<br />
Q: Why do we say HLM is superior to OLS?<br />
<br />
A: Because HLM theoretically produces appropriate error terms that control for potential dependency due to nesting effects, while OLS does not.<br />
<br />
An additional argument favoring HLM is that it is a generalization of OLS that better handles continuous variables reflecting random-effects designs; therefore, HLM produces more accurate error terms and Type I error rates.<br />
<br />
A good part of the cited advantages of HLM relates to situations in which the intraclass correlation, which is the between-group variance divided by the total variance, is non-zero. If the correlation is zero, there seems to be less advantage to using HLM because there is no intraclass correlation.<br />
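<br />
The intraclass correlation described above can be written as a one-line function. This is a sketch: the function name <code>icc</code> and the variance values are made up for illustration.<br />

```r
# Intraclass correlation: between-group variance over total variance.
icc <- function(between_var, within_var) {
  between_var / (between_var + within_var)
}

icc(4, 12)  # 0.25: a quarter of the variance lies between groups
icc(0, 12)  # 0: no clustering, so HLM offers little over OLS
```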
<br />
<br />
=== Week 11 ===<br />
Q: Please identify the parameters of the asymptotic model in the figure below.<br />
[[File:1.PNG]]<br />
<br />
A: [[File:2.PNG]]<br />
<br />
==Statistics in the Media==<br />
=== Week 1 ===<br />
[http://endoftheamericandream.com/archives/50-statistics-about-the-u-s-economy-that-are-almost-too-crazy-to-believe 50 Statistics About The U.S. Economy That Are Almost Too Crazy To Believe]<br />
<br />
Can you see that 2010 is a crisis year? It's NOT the time to change! <br />
[http://moneywatch.bnet.com/economic-news/blog/financial-decoder/18-scary-us-debt-facts/2824/ 18 Scary US Debt Facts] <br />
<br />
Without statistics, the people across the border might be toasting the big success of this year right about now.<br />
<br />
=== Week 2 ===<br />
[http://timesofindia.indiatimes.com/city/varanasi/4-day-conference-begins-at-BHU/articleshow/7224523.cms 4-day conference begins at BHU] Banaras Hindu University (BHU), Hindi: काशी हिन्दू विश्वविद्यालय, is a premier central university and a world class educational institution located in Varanasi, India. It is regarded as the largest residential university in Asia.<br />
<br />
=== Week 3 ===<br />
[http://www.newswire.ca/en/releases/archive/January2011/19/c3506.html Greater Toronto & Hamilton Area Transit Choices: Report finds much higher transportation emissions from road vs. rail]<br />
<br />
This is current news on a very old topic. It comes from two of Canada's leading think tanks, Sustainable Prosperity and the Pembina Institute, and states what is already well-known common sense. The report is "Putting Transportation on Track", and it contains nothing new.<br />
<br />
=== Week 4 ===<br />
[http://2010.census.gov/2010census/#/panel-1 2010 US Census results are now available]<br />
<br />
As of Jan. 18, 2011, the 2010 US Census data is accessible online. It is easy to reach and will be appreciated by many fact finders. The other highlight is the survey itself: it asks just 10 questions and takes about 10 minutes to complete.<br />
<br />
=== Week 5 ===<br />
So now, everyone knows Statistics is not useless and Statisticians are a group of supermen. [http://www.thestar.com/news/article/933200--toronto-man-cracked-the-code-to-scratch-lottery-tickets?bn=1 Toronto man cracked the code to scratch-lottery tickets] <br />
<br />
=== Week 6 ===<br />
[http://www.thestarphoenix.com/technology/City+growing+quickly/4222880/story.html City growing quickly - Saskatoon outpaces all metro areas in Canada] You could look for a job there, and they have a well-known immigration program for international students. <br />
<br />
=== Week 7 ===<br />
Crime stats in Canada: [http://fullcomment.nationalpost.com/2011/02/28/brian-lee-crowley-crime-stats-change-based-on-who-counts-them/ Brian Lee Crowley: Crime stats change based on who counts them]<br />
<br />
<br />
=== Week 8 ===<br />
Let us pray for Japan. No nuclear catastrophe again. I visited there in 2000. It is a really beautiful country. I hope I can do something for the victims. I don't know exactly what this news item is about, but I hope it is helpful. [http://www.ontrackdatarecovery.co.uk/data-recovery-news/articles/data-management-system-aids-japan-relief-effort540.aspx Data management system aids Japan relief effort] <br />
<br />
=== Week 9 ===<br />
Really good articles about the effects of the Japan crisis: [http://lakshmi-capital.com/2011/03/japanese-crisis-analysis-part-1-currencies-and-the-return-of-panic/ Japanese Crisis Analysis Part 1: Currencies and the Return of Panic] and [http://www.benzinga.com/11/03/936488/japanese-crisis-analysis-part-2-us-stocks-and-the-return-of-panic/ Japanese Crisis Analysis Part 2: US Stocks and the Return of Panic]<br />
<br />
=== Week 10 ===<br />
Simpson's Paradox in sports: [http://sportsillustrated.cnn.com/2011/writers/scorecasting/03/24/simpson-paradox/ Explaining the Simpson Paradox]<br />
<br />
=== Week 11 ===<br />
“Statistics are like a bikini – what is revealed is interesting but what is hidden is crucial.” What??? Look into this: [http://searchengineland.com/how-to-make-your-statistics-skimpy-yet-meaningful-68740 How To Make Your Statistics Skimpy Yet Meaningful]<br />
<br />
==Questions and Comments on Groupwork and Class Lectures==<br />
=== Week 1 ===<br />
For the example about smoking and life expectancy that we discussed in class, I think there are two more factors we should take into consideration. One is the age at which people start smoking. I know kids smoke in some poor countries because they don't know it is harmful. The other is that different types of cigarette have different effects. Whatever type of cigarette you smoke, it will always be ‘injurious to health’. But cigarettes in rich countries have relatively less addictive nicotine and tar, are equipped with long filters, and are not allowed to contain harmful flavor additives. I know this because I used to be a heavy smoker and have smoked more than 100 brands of cigars from different countries. But yes, I am still alive. <br />
<br />
=== Week 2 ===<br />
Some classmates couldn't install the spida and p3d packages. Below is the procedure you should follow:<br />
<br />
1. uninstall your current R.<br />
<br />
2. go to [http://cran.r-project.org/bin/windows/base/ R download] to download a new R.<br />
<br />
3. do a short installation of R. <br />
<br />
4. run <br />
<br />
> install.packages( c("car","rgl"))<br />
<br />
> install.packages( c("spida","p3d"), repos = "http://R-Forge.R-project.org")<br />
<br />
=== Week 3 ===<br />
It was really amazing to see the data moving with GoogleVis and p3d in the last lecture. They will be very useful in consulting or presentations. I think these methods are limited when it comes to analyzing complex models, but we can choose 3 variables at a time from a multiple regression model and draw a 3D plot to see the trend.<br />
<br />
=== Week 4 ===<br />
Carrie has a question about a symbol: In the slides for Visualizing Multiple Regression, in the section on the data ellipse for predictors (page 30), the general predictor has a notation I'm not familiar with and missed in lecture. What does the plus sign in a circle mean?<br />
<br />
Answer: In logic, the plus sign in a circle denotes "exclusive or" (Wiki link [http://en.wikipedia.org/wiki/Exclusive_or here]). In the notes, however, it is specially defined in the summary on page 88.<br />
<br />
=== Week 5 ===<br />
Here are some slides on outlier detection. I hope they are helpful. [http://webdocs.cs.ualberta.ca/~zaiane/courses/cau/slides/cau-Lecture7.pdf slides]<br />
<br />
=== Week 6 ===<br />
What does this statement on page 46 of the Hierarchical_Models slides mean: "The coefficient for "ses" (2.2999) is NOT "the estimated effect of ses" – it is the estimated "effect" of ses"? <br />
<br />
=== Week 7 ===<br />
What does this statement on page 61 of the Hierarchical_Models slides mean: "The smaller ellipse has 95% shadows."?<br />
<br />
* I think it means that the shadows of the smaller ellipse give the regular 95% confidence intervals for each of the betas - [[../Constance Mara|Constance]]<br />
<br />
=== Week 8 ===<br />
What does the "-1" mean in the code "model.matrix( ~ Sex + Minority -1, dd)"? Thanks in advance.<br />
<br />
* -1 is used to remove the intercept term: when fitting a linear model, y ~ x - 1 specifies a line through the origin. In the above script, the model depends on the terms Sex and Minority, and the intercept term is removed. - [[../Crystal Cao|Crystal]]<br />
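<br />
The effect of "-1" can be seen by comparing the columns of the two model matrices. This is a sketch with a small made-up data frame; only the variable names Sex and Minority come from the question.<br />

```r
# Hypothetical data frame with two two-level factors.
dd <- data.frame(
  Sex      = factor(c("F", "M", "F", "M")),
  Minority = factor(c("No", "No", "Yes", "Yes"))
)

with_int    <- model.matrix(~ Sex + Minority, dd)
without_int <- model.matrix(~ Sex + Minority - 1, dd)

colnames(with_int)     # "(Intercept)" "SexM" "MinorityYes"
colnames(without_int)  # "SexF" "SexM" "MinorityYes"
```

Without the intercept, the first factor gets one indicator column per level, while the second factor is still coded relative to its reference level.<br />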
<br />
=== Week 9 ===<br />
What is the difference between panel data, longitudinal data, time series data and cross-sectional data?<br />
<br />
=== Week 10 ===<br />
What is the difference between dropout, missing data and censored data? What is informative dropout?<br />
<br />
=== Week 11 ===<br />
This is the last post and it's time to say thanks to everyone in the class. Let's keep in touch. Good luck.</div>Andytlihttp://scs.math.yorku.ca/index.php/File:1.PNGFile:1.PNG2011-04-04T22:17:34Z<p>Andytli: </p>
<hr />
<div></div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Andy_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Andy Li2011-04-04T22:15:41Z<p>Andytli: </p>
<hr />
<div></div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Andy_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Andy Li2011-03-29T02:07:44Z<p>Andytli: </p>
<hr />
<div>====Welcome to my wiki page for the Statistical Consulting Practicum====<br />
<br />
==About Me==<br />
My name is Andy Li. From 2008, I started undergraduate in Statistics in York. After two years study, I am now a master student in Applied Statistics. I have another bachelor degree in Chemical Engineering. I have worked as a B2B salesman for many years. I used SAS, R, matlab, minitab and Maple in the classes. I am very happy to join our 6627 team to know more about Statistics. Hope we can build up not only our knowledge but also our friendship in these following days. <br />
<br />
==Sample Exam Questions==<br />
=== Week 1 ===<br />
(In class) There are 11 students in the 6627 class, 5 of which are male. There is a statistics meeting in the campus. All the student want to attend as visitors, but the problem is there are only 6 seats left. So, professor needs to choose 6 students randomly and wants two of them to be male, to help setting up the tables. How many ways can he select students for his experiment?<br />
<br />
Answer: 5C2*6C4=150<br />
<br />
=== Week 2 ===<br />
[[File:d34.jpg]]<br />
<br />
(In class)We have the cloud of our data, data ellipse and the centre point. We also know about 40% data falling into the inside ellipse and 86% data in the outside ellipse. What is the radius of the outside ellipse? What is Sx,Sy,Sy.x and 95% confidence interval?<br />
<br />
Answer: Please see the lecture notes of 2nd lecture. <br />
<br />
=== Week 3 ===<br />
Q: Explain the multiple regression Coefficients in three dimensions.<br />
<br />
A: [[File:g1.png]] Beta0 defines the Y-intercept of the plane. Beta1 defines the slope of the plane along the x1 axis, and beta2 defines the slope of the plane along the x2 axis. The error term ε adds noise to the plane so that it is bumpy instead of being perfectly smooth. <br />
<br />
[[File:g2.png]]<br />
=== Week 4 ===<br />
Q: Which is talking about correlation and what is about interaction?<br />
<br />
a) people with more years in jail tend to have fewer years of education.<br />
<br />
b) The more money I save, the more financially secure I feel. <br />
<br />
c) To give a good taste, she squeezed a lemon into her black tea and stirred the tea.<br />
<br />
d) Helping others in Stats lab is one of our TA job. Another type is grader. <br />
<br />
e) Steel has the ability to become harder and stronger through adding carbon to it and quenching (rapidly cooling).<br />
<br />
f) The more years of education I complete, the higher my earning potential.<br />
<br />
g) Professors drink too many coffee, because their work is stressful.<br />
<br />
A: Correlation: a b f g ; interaction: c e<br />
<br />
=== Week 5 ===<br />
Q: We have 5 points at (1,1), (1,2), (2,1), (2,2) and (2,3). What does the linear regression line look like?<br />
<br />
A: Go to [http://illuminations.nctm.org/LessonDetail.aspx?ID=L454#nextone this web page] and try it. Add more points to get a feel for how the line responds.<br />
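<br />
The same five points can also be fitted directly in R. A minimal sketch with base R's lm(): because x takes only two values, the fitted line passes through the mean of y at x = 1 (1.5) and the mean of y at x = 2 (2).<br />

```r
x <- c(1, 1, 2, 2, 2)
y <- c(1, 2, 1, 2, 3)
fit <- lm(y ~ x)
print(coef(fit))   # intercept 1, slope 0.5, i.e. y = 1 + 0.5 * x
```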
<br />
=== Week 6 ===<br />
Q: Fill in the blanks:<br />
MANOVA is the abbreviation of ______(1)______. Where sums of squares appear in univariate analysis of variance, in MANOVA certain ________(2)________ appear, with the sums of squares as their ____(3)_____ entries.<br />
<br />
A: (1) Multivariate analysis of variance (2) positive-definite matrices (3) diagonal<br />
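<br />
To make (2) and (3) concrete, here is a small sketch with made-up data. It builds only the total sums-of-squares-and-cross-products (SSCP) matrix, not the hypothesis and error matrices of a full MANOVA, and shows that its diagonal holds the ordinary sums of squares.<br />

```r
# Two made-up response variables measured on four units
Y  <- cbind(y1 = c(1, 2, 4, 7), y2 = c(2, 1, 5, 4))
Yc <- scale(Y, center = TRUE, scale = FALSE)   # centre each column
sscp <- crossprod(Yc)                          # t(Yc) %*% Yc
print(sscp)            # diagonal: 21 and 10; off-diagonal: 10
print(colSums(Yc^2))   # the univariate sums of squares: 21 and 10
```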
<br />
=== Week 7 ===<br />
Q: What are the assumptions of MANOVA?<br />
<br />
A: <br />
1. Independent random sampling: MANOVA assumes that the observations are independent of one another and that the sample is completely random, with no pattern in how it was selected.<br />
<br />
2. Level of measurement of the variables: MANOVA assumes that the independent variables are categorical and the dependent variables are continuous or scale variables.<br />
<br />
3. Linearity of the dependent variables: The dependent variables may be correlated with each other or independent of each other. Moderately correlated dependent variables are preferred; if the dependent variables are independent of each other, we sacrifice degrees of freedom and the power of the analysis decreases.<br />
<br />
4. Multivariate normality: The dependent variables are assumed to follow a multivariate normal distribution within each group.<br />
<br />
5. Multivariate homogeneity of variance: The variances and covariances of the dependent variables are equal across groups.<br />
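<br />
Assumption 2 is visible in how a MANOVA is specified in R. The sketch below uses the built-in iris data purely as a stand-in (it is not a course data set): two continuous responses are bound together on the left-hand side, and a categorical factor sits on the right.<br />

```r
# Continuous dependent variables on the left, categorical factor on the right
fit <- manova(cbind(Sepal.Length, Petal.Length) ~ Species, data = iris)
print(summary(fit, test = "Pillai"))
```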
<br />
=== Week 8 ===<br />
Q: Given clustered data for a mixed model, suggest a way to select all the observations from a sample of clusters.<br />
<br />
A: There are many ways to do this. In the example, the professor gave the following one:<br />
We first create the school summary file and take a sample of schools from that file. We then merge the sample file with the long file. Merge matches on the variables that have the same name in both files. By default, it keeps only the records that match in both files, so it produces the result we want.<br />
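<br />
The steps above can be sketched in base R as follows. The data frame and column names (long, schools, school, score) are hypothetical stand-ins, not the files used in class.<br />

```r
set.seed(1)
# Hypothetical long file: 3 pupils in each of 6 schools
long <- data.frame(school = rep(1:6, each = 3), score = rnorm(18))
# School summary file: one row per school
schools <- data.frame(school = unique(long$school))
# Take a random sample of 2 schools from the summary file
samp <- schools[sample(nrow(schools), 2), , drop = FALSE]
# merge() matches on the shared column "school" and, by default,
# keeps only rows that have a match in both data frames
sub <- merge(samp, long)
print(sub)
```

Every pupil from the two sampled schools is kept, and pupils from the other schools are dropped.<br />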
<br />
=== Week 9 ===<br />
Q: True or false?<br />
<br />
a) Autocorrelation can be negative.<br />
<br />
b) Strong positive autocorrelation can be a symptom of lack of fit.<br />
<br />
c) Occasional large measurement errors will contribute positively to the estimate of autocorrelation.<br />
<br />
d) Autocorrelation of a random process describes the correlation between values of the process at different points in time.<br />
<br />
A: a) T, b) T, c) F, d) T<br />
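<br />
Statement a) is easy to check with a made-up series. In this sketch the values alternate in sign, so neighbouring values are negatively related and the lag-1 sample autocorrelation from base R's acf() is negative.<br />

```r
# A series that flips sign at every step
x <- rep(c(1, -1), times = 10)
# Element 1 of $acf is lag 0, element 2 is lag 1
r1 <- acf(x, plot = FALSE)$acf[2]
print(r1)   # a negative value
```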
<br />
=== Week 10 ===<br />
Q: Why do we say that HLM is superior to OLS?<br />
<br />
A: Because HLM theoretically produces appropriate error terms that control for potential dependency due to nesting effects, while OLS does not.<br />
<br />
An additional argument in favour of HLM is that it is a generalization of OLS that better handles continuous variables in random-effects designs, and therefore produces more accurate error terms and Type I error rates.<br />
<br />
Much of the cited advantage of HLM depends on the intraclass correlation, which is the between-group variance divided by the total variance. If this correlation is zero, there is little advantage in using HLM, because observations in the same group are no more alike than observations from different groups.<br />
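<br />
Written as a formula, the intraclass correlation is the share of the total variance that lies between groups. The symbols below are notation chosen here for illustration and are not taken from the lecture notes.<br />

```latex
\rho = \frac{\sigma^2_{\text{between}}}{\sigma^2_{\text{between}} + \sigma^2_{\text{within}}}
```

When the between-group variance is 0, ρ = 0 and the grouping carries no information; as the between-group variance grows relative to the within-group variance, ρ approaches 1.<br />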
<br />
==Statistics in the Media==<br />
=== Week 1 ===<br />
[http://endoftheamericandream.com/archives/50-statistics-about-the-u-s-economy-that-are-almost-too-crazy-to-believe 50 Statistics About The U.S. Economy That Are Almost Too Crazy To Believe]<br />
<br />
Can you see that 2010 was a crisis year? It's NOT the time to change!<br />
[http://moneywatch.bnet.com/economic-news/blog/financial-decoder/18-scary-us-debt-facts/2824/ 18 Scary US Debt Facts] <br />
<br />
If there were no statistics, the people across the border might be toasting the great success of the year right about now.<br />
<br />
=== Week 2 ===<br />
[http://timesofindia.indiatimes.com/city/varanasi/4-day-conference-begins-at-BHU/articleshow/7224523.cms 4-day conference begins at BHU] Banaras Hindu University (BHU; Hindi: काशी हिन्दू विश्वविद्यालय) is a premier central university and a world-class educational institution located in Varanasi, India. It is regarded as the largest residential university in Asia.<br />
<br />
=== Week 3 ===<br />
[http://www.newswire.ca/en/releases/archive/January2011/19/c3506.html Greater Toronto & Hamilton Area Transit Choices: Report finds much higher transportation emissions from road vs. rail]<br />
<br />
This is current news on a very old topic. The report, "Putting Transportation on Track", comes from two of Canada's leading think tanks, Sustainable Prosperity and the Pembina Institute, and restates what is already common sense, with nothing new.<br />
<br />
=== Week 4 ===<br />
[http://2010.census.gov/2010census/#/panel-1 2010 US Census results are now available]<br />
<br />
As of Jan. 18, 2011, the 2010 US Census data are accessible online. They are easy to reach and will be appreciated by many fact finders. The other highlight is the survey itself: it asks just 10 questions and takes about 10 minutes to complete.<br />
<br />
=== Week 5 ===<br />
So now everyone knows that statistics is not useless and that statisticians are a group of supermen. [http://www.thestar.com/news/article/933200--toronto-man-cracked-the-code-to-scratch-lottery-tickets?bn=1 Toronto man cracked the code to scratch-lottery tickets]<br />
<br />
=== Week 6 ===<br />
[http://www.thestarphoenix.com/technology/City+growing+quickly/4222880/story.html City growing quickly - Saskatoon outpaces all metro areas in Canada] You could look for a job there, and they have a well-known immigration program for international students.<br />
<br />
=== Week 7 ===<br />
Crime stats in Canada: [http://fullcomment.nationalpost.com/2011/02/28/brian-lee-crowley-crime-stats-change-based-on-who-counts-them/ Brian Lee Crowley: Crime stats change based on who counts them]<br />
<br />
<br />
=== Week 8 ===<br />
Let us pray for Japan. No nuclear catastrophe again, please. I visited in 2000, and it is a really beautiful country. I hope I can do something for the victims. I am not sure what this news item is really about, but I hope it helps. [http://www.ontrackdatarecovery.co.uk/data-recovery-news/articles/data-management-system-aids-japan-relief-effort540.aspx Data management system aids Japan relief effort]<br />
<br />
=== Week 9 ===<br />
Really good articles about the effects of the Japan crisis: [http://lakshmi-capital.com/2011/03/japanese-crisis-analysis-part-1-currencies-and-the-return-of-panic/ Japanese Crisis Analysis Part 1: Currencies and the Return of Panic] and [http://www.benzinga.com/11/03/936488/japanese-crisis-analysis-part-2-us-stocks-and-the-return-of-panic/ Japanese Crisis Analysis Part 2: US Stocks and the Return of Panic]<br />
<br />
=== Week 10 ===<br />
Simpson's Paradox in sports: [http://sportsillustrated.cnn.com/2011/writers/scorecasting/03/24/simpson-paradox/ Explaining the Simpson Paradox]<br />
<br />
==Questions and Comments on Groupwork and Class Lectures==<br />
=== Week 1 ===<br />
For the example about smoking and life expectancy that we discussed in class, I think there are two more factors we should take into consideration. One is the age at which people start smoking. I know that children smoke in some poor countries because they don't know it is harmful. The other is that different types of cigarette have different effects. Whatever type of cigarette you smoke, it will always be ‘injurious to health’. But cigarettes in rich countries have relatively less addictive nicotine and tar, are equipped with long filters, and are not allowed to contain harmful flavour additives. I know this because I used to be a heavy smoker and have smoked more than 100 brands of cigars from different countries. But yes, I am still alive.<br />
<br />
=== Week 2 ===<br />
Some classmates couldn't install the spida and p3d packages. Here is a procedure to follow:<br />
<br />
1. Uninstall your current version of R.<br />
<br />
2. Go to [http://cran.r-project.org/bin/windows/base/ R download] to download a new version of R.<br />
<br />
3. Do a short installation of R.<br />
<br />
4. Run the following commands in R:<br />
<br />
> install.packages( c("car","rgl"))<br />
<br />
> install.packages( c("spida","p3d"), repos = "http://R-Forge.R-project.org")<br />
<br />
=== Week 3 ===<br />
It was really amazing to see the data move using GoogleVis and p3d in the last lecture. This will be very useful in consulting and presentations. I think these methods are limited when it comes to analyzing complex models, but we can choose three variables at a time from a multiple regression model and draw a 3D plot to see the trend.<br />
<br />
=== Week 4 ===<br />
Carrie has a question about a symbol: "In the slides for Visualizing Multiple Regression, in the section on the data ellipse for predictors (page 30), the general predictor has a notation I'm not familiar with and missed in lecture. What does the plus sign in a circle mean?"<br />
<br />
Answer: In logic, the circled plus sign denotes "exclusive or" (Wiki link [http://en.wikipedia.org/wiki/Exclusive_or here]). In the notes, however, it is specially defined in the summary on page 88.<br />
<br />
=== Week 5 ===<br />
Here is a set of slides on outlier detection. I hope it is helpful: [http://webdocs.cs.ualberta.ca/~zaiane/courses/cau/slides/cau-Lecture7.pdf slides]<br />
<br />
=== Week 6 ===<br />
What does this mean: "The coefficient for "ses" (2.2999) is NOT "the estimated effect of ses" – it is the estimated "effect" of ses" (page 46 of the Hierarchical_Models slides)?<br />
<br />
=== Week 7 ===<br />
What does this mean: "The smaller ellipse has 95% shadows." (page 61 of the Hierarchical_Models slides)?<br />
<br />
* I think it means that the shadows of the smaller ellipse give the regular 95% confidence intervals for each of the betas - [[../Constance Mara|Constance]]<br />
<br />
=== Week 8 ===<br />
What does the "-1" mean in the code "model.matrix( ~ Sex + Minority -1, dd)"? Thanks in advance.<br />
<br />
* The -1 removes the intercept term: when fitting a linear model, y ~ x - 1 specifies a line through the origin. In the code above, the model matrix is built from the terms Sex and Minority, with the intercept column removed. - [[../Crystal Cao|Crystal]]<br />
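<br />
The effect of the -1 is easiest to see by comparing the column names of the two model matrices. In this sketch dd is a small made-up data frame, not the course data.<br />

```r
# Hypothetical stand-in for the data frame dd
dd <- data.frame(Sex      = factor(c("F", "M", "F", "M")),
                 Minority = factor(c("No", "No", "Yes", "Yes")))
X1 <- model.matrix(~ Sex + Minority, dd)       # with intercept
X0 <- model.matrix(~ Sex + Minority - 1, dd)   # intercept removed
print(colnames(X1))   # "(Intercept)" "SexM" "MinorityYes"
print(colnames(X0))   # "SexF" "SexM" "MinorityYes"
```

Without the intercept, the first factor (Sex) gets one indicator column per level, while Minority keeps its usual contrast coding.<br />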
<br />
=== Week 9 ===<br />
What is the difference between panel data, longitudinal data, time series data, and cross-sectional data?<br />
<br />
=== Week 10 ===<br />
What is the difference between dropout, missing data, and censored data? What is informative dropout?</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Andy_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Andy Li2011-03-22T03:14:33Z<p>Andytli: </p>
<hr />
<div>====Welcome to my wiki page for the Statistical Consulting Practicum====<br />
</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Andy_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Andy Li2011-03-22T03:13:59Z<p>Andytli: </p>
<hr />
<div>====Welcome to my wiki page for the Statistical Consulting Practicum====<br />
<br />
==About Me==<br />
My name is Andy Li. From 2008, I started undergraduate in Statistics in York. After two years study, I am now a master student in Applied Statistics. I have another bachelor degree in Chemical Engineering. I have worked as a B2B salesman for many years. I used SAS, R, matlab, minitab and Maple in the classes. I am very happy to join our 6627 team to know more about Statistics. Hope we can build up not only our knowledge but also our friendship in these following days. <br />
<br />
==Sample Exam Questions==<br />
=== Week 1 ===<br />
(In class) There are 11 students in the 6627 class, 5 of which are male. There is a statistics meeting in the campus. All the student want to attend as visitors, but the problem is there are only 6 seats left. So, professor needs to choose 6 students randomly and wants two of them to be male, to help setting up the tables. How many ways can he select students for his experiment?<br />
<br />
Answer: 5C2*6C4=150<br />
<br />
=== Week 2 ===<br />
[[File:d34.jpg]]<br />
<br />
(In class)We have the cloud of our data, data ellipse and the centre point. We also know about 40% data falling into the inside ellipse and 86% data in the outside ellipse. What is the radius of the outside ellipse? What is Sx,Sy,Sy.x and 95% confidence interval?<br />
<br />
Answer: Please see the lecture notes of 2nd lecture. <br />
<br />
=== Week 3 ===<br />
Q: Explain the multiple regression Coefficients in three dimensions.<br />
<br />
A: [[File:g1.png]] Beta0 defines the Y-intercept of the plane. Beta1 defines the slope of the plane along the x1 axis, and beta2 defines the slope of the plane along the x2 axis. The error term ε adds noise to the plane so that it is bumpy instead of being perfectly smooth. <br />
<br />
[[File:g2.png]]<br />
=== Week 4 ===<br />
Q: Which is talking about correlation and what is about interaction?<br />
<br />
a) people with more years in jail tend to have fewer years of education.<br />
<br />
b) The more money I save, the more financially secure I feel. <br />
<br />
c) To give a good taste, she squeezed a lemon into her black tea and stirred the tea.<br />
<br />
d) Helping others in Stats lab is one of our TA job. Another type is grader. <br />
<br />
e) Steel has the ability to become harder and stronger through adding carbon to it and quenching (rapidly cooling).<br />
<br />
f) The more years of education I complete, the higher my earning potential.<br />
<br />
g) Professors drink too many coffee, because their work is stressful.<br />
<br />
A: Correlation: a b f g ; interaction: c e<br />
<br />
=== Week 5 ===<br />
Q: We have 5 point at (1,1)(1,2)(2,1)(2,2)and(2,3),what does the linear regression line look like? <br />
<br />
A: Go to [http://illuminations.nctm.org/LessonDetail.aspx?ID=L454#nextone this web page] and try it. Add more point to get the idea of the rule.<br />
<br />
=== Week 6 ===<br />
Q: Filling the blankets: <br />
MANOVA is the abbreviation of ______(1)______. where sums of squares appear in univariate analysis of variance, in MANOVA certain ________(2)________appear, but the sums of squares appear at ____(3)_____ entries.<br />
<br />
A: (1)Multivariate analysis of variance (2)positive-definite matrices (3)diagonal <br />
<br />
=== Week 7 ===<br />
Q: What are the assumptions of MANOVA?<br />
<br />
A: <br />
1. Independent Random Sampling: MANOVA assumes that the observations are independent of one another, there is not any pattern for the selection of the sample, the sample is completely random.<br />
<br />
2. Level and Measurement of the Variables: MANOVA assumes that the independent variables are categorical and the dependent variables are continuous or scale variables.<br />
<br />
3. Linearity of dependent variable: The dependent variables can be correlated to each other, or may be independent of each other. Study shows that a moderately correlated dependent variable is preferred, if the dependent variables are independent of each other, then we have to sacrifice the degrees of freedom and it will decrease the power of the analysis.<br />
<br />
4. Multivariate Normality: Multivariate normality is present in the data. <br />
<br />
5. Multivariate Homogeneity of Variance: Variance between groups is equal.<br />
<br />
=== Week 8 ===<br />
Q: Given a set of complex data in a mixed model, please suggest a way to select all the observations from a sample of clusters.<br />
<br />
A: Actually, there are many ways to do. In the example, Prof gave one as following:<br />
We first create the school summary file and take a sample of school from that file. We then merge the sample file with the longfile. Merge will match with variables that have the same name. By default, it only uses records that match in both files, so it produces the result we want.<br />
<br />
=== Week 9 ===<br />
Q: True or False for the statement below:<br />
<br />
a)Autocorrelation could be negative.<br />
<br />
b)Strong positive autocorrelation can be a symptom of lack of fit.<br />
<br />
c)Occasional large measurement errors will contribute positively to the estimate of autocorrelation.<br />
<br />
d)Autocorrelation of a random process describes the correlation between values of the process at different points in time.<br />
<br />
A: T T F T<br />
<br />
==Statistics in the Media==<br />
=== Week 1 ===<br />
[http://endoftheamericandream.com/archives/50-statistics-about-the-u-s-economy-that-are-almost-too-crazy-to-believe 50 Statistics About The U.S. Economy That Are Almost Too Crazy To Believe]<br />
<br />
Can you see 2010 is crisis year? It's NOT the time to change! <br />
[http://moneywatch.bnet.com/economic-news/blog/financial-decoder/18-scary-us-debt-facts/2824/ 18 Scary US Debt Facts] <br />
<br />
If there is no statistics, the people over the border may toast to celebrate the big success of this year this time.<br />
<br />
=== Week 2 ===<br />
[http://timesofindia.indiatimes.com/city/varanasi/4-day-conference-begins-at-BHU/articleshow/7224523.cms 4-day conference begins at BHU] Banaras Hindu University (BHU), Hindi: काशी हिन्दू विश्वविद्यालय, is a premier central university and a world class educational institution located in Varanasi, India. It is regarded as the largest residential university in Asia.<br />
<br />
=== Week 3 ===<br />
[http://www.newswire.ca/en/releases/archive/January2011/19/c3506.html Greater Toronto & Hamilton Area Transit Choices: Report finds much higher transportation emissions from road vs. rail]<br />
<br />
This is a current news with a very old topic. This is provided by two of Canada's leading think tanks: Sustainable Prosperity and Pembina Institute, talking about a well known common sense. This is "Putting Transportation on Track" , with nothing new.<br />
<br />
=== Week 4 ===<br />
[http://2010.census.gov/2010census/#/panel-1 2010 US Census results are now available]<br />
<br />
In Jan. 18, 2011, 2010 US Census data is accessible online. It is easy to reach and will be appreciated by many fact finders. The other thing in the high light is the survey itself. It only asks just 10 questions and takes about 10 minutes to complete.<br />
<br />
=== Week 5 ===<br />
So now, everyone knows Statistics is not useless and Statisticians are a group of supermen. [http://www.thestar.com/news/article/933200--toronto-man-cracked-the-code-to-scratch-lottery-tickets?bn=1 Toronto man cracked the code to scratch-lottery tickets] <br />
<br />
=== Week 6 ===<br />
[http://www.thestarphoenix.com/technology/City+growing+quickly/4222880/story.html City growing quickly - Saskatoon outpaces all metro areas in Canada] You can choose to find a job there and they have a well known immigration program for international students. <br />
<br />
=== Week 7 ===<br />
Crime stats in Canada: [http://fullcomment.nationalpost.com/2011/02/28/brian-lee-crowley-crime-stats-change-based-on-who-counts-them/ Brian Lee Crowley: Crime stats change based on who counts them]<br />
<br />
<br />
=== Week 8 ===<br />
Let pray for Japan. No nuclear catastrophe again. I visited there in 2000. That was a really beautiful country. I hope I can do some for the victims. I don't know what this is about in the news, but wish it helpful. [http://www.ontrackdatarecovery.co.uk/data-recovery-news/articles/data-management-system-aids-japan-relief-effort540.aspx Data management system aids Japan relief effort] <br />
<br />
=== Week 9 ===<br />
Really good articles about the effects of the Japan crisis: [http://lakshmi-capital.com/2011/03/japanese-crisis-analysis-part-1-currencies-and-the-return-of-panic/ Japanese Crisis Analysis Part 1: Currencies and the Return of Panic] and [http://www.benzinga.com/11/03/936488/japanese-crisis-analysis-part-2-us-stocks-and-the-return-of-panic/ Japanese Crisis Analysis Part 2: US Stocks and the Return of Panic]<br />
<br />
==Questions and Comments on Groupwork and Class Lectures==<br />
=== Week 1 ===<br />
For the example about smoking and life expectancy that we discussed in class, I think there are two more factors we should take into consideration. One is the age at which people start smoking: I know that children smoke in some poor countries because they don't know it is harmful. The other is that different types of cigarette have different effects. Whatever type of cigarette you smoke, it will always be ‘injurious to health’, but cigarettes in rich countries contain relatively less addictive nicotine and tar, are fitted with long filters, and are not allowed to contain harmful flavor additives. I know this because I used to be a heavy smoker and have smoked more than 100 brands of cigars from different countries. But yes, I am still alive. <br />
<br />
=== Week 2 ===<br />
Some classmates couldn't install the spida and p3d packages. Below is the procedure you should follow:<br />
<br />
1. Uninstall your current R.<br />
<br />
2. Go to [http://cran.r-project.org/bin/windows/base/ R download] to download a new version of R.<br />
<br />
3. Do a short installation of R. <br />
<br />
4. Run <br />
<br />
> install.packages( c("car","rgl"))<br />
<br />
> install.packages( c("spida","p3d"), repos = "http://R-Forge.R-project.org")<br />
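<br />
The install commands above could also be wrapped in a small helper that installs only what is missing. This is a minimal sketch, not part of the original instructions: the helper names missing_pkgs and install_if_missing are hypothetical, and the R-Forge URL is simply the one quoted above, which may no longer host these packages.<br />

```r
# Return the subset of 'pkgs' that is not currently installed.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Install only the missing packages; extra arguments (e.g. repos) are
# passed on to install.packages(). Returns the names it tried to install.
install_if_missing <- function(pkgs, ...) {
  need <- missing_pkgs(pkgs)
  if (length(need) > 0) install.packages(need, ...)
  invisible(need)
}

# Example calls (commented out because they need a network connection):
# install_if_missing(c("car", "rgl"))
# install_if_missing(c("spida", "p3d"), repos = "http://R-Forge.R-project.org")
```

With a helper like this, re-running the script on a machine that already has the packages does nothing.<br />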
<br />
=== Week 3 ===<br />
It was really amazing to see the data move using GoogleVis and p3d in the last lecture. This will be very useful in consulting or presentations. I think these methods are limited when it comes to analyzing complex models, but we can choose 3 variables at a time from a model with many variables and draw a 3D plot to see the trend.<br />
<br />
=== Week 4 ===<br />
Carrie has a question about a symbol: In the slides for Visualizing Multiple Regression, in the section on the data ellipse for predictors (page 30), the general predictor uses a notation I'm not familiar with and missed in lecture. What does the plus sign in a circle mean?<br />
<br />
Answer: In logic, the plus sign in a circle denotes "exclusive or"; see the Wikipedia article [http://en.wikipedia.org/wiki/Exclusive_or here]. In the notes, however, it is given its own definition in the summary on page 88.<br />
<br />
=== Week 5 ===<br />
Here is a set of slides on outlier detection. I hope it is helpful: [http://webdocs.cs.ualberta.ca/~zaiane/courses/cau/slides/cau-Lecture7.pdf slides]<br />
<br />
=== Week 6 ===<br />
What does this mean: "The coefficient for "ses" (2.2999) is NOT "the estimated effect of ses" – it is the estimated "effect" of ses" (on page 46 of the Hierarchical_Models slides)? <br />
<br />
=== Week 7 ===<br />
What does this mean: "The smaller ellipse has 95% shadows." (on page 61 of the Hierarchical_Models slides)?<br />
<br />
* I think it means that the shadows of the smaller ellipse give the regular 95% confidence intervals for each of the betas - [[../Constance Mara|Constance]]<br />
<br />
=== Week 8 ===<br />
What does the "-1" mean in the code "model.matrix( ~ Sex + Minority -1, dd)"? Thanks in advance.<br />
<br />
* -1 is used to remove the intercept term: when fitting a linear model, y ~ x - 1 specifies a line through the origin. In the script above, the model depends on the terms Sex and Minority, and the intercept term is removed. - [[../Crystal Cao|Crystal]]<br />
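<br />
To see the effect of "-1" concretely, the design matrix can be built both ways on a tiny made-up data frame. This is only a sketch: the toy data frame below is hypothetical and merely stands in for the dd used in the course script, which is not reproduced here.<br />

```r
# Hypothetical toy data; the 'dd' in the course script is a different data frame
dd <- data.frame(
  Sex      = factor(c("F", "M", "F", "M")),
  Minority = factor(c("No", "No", "Yes", "Yes"))
)

X_with    <- model.matrix(~ Sex + Minority, dd)      # intercept column kept
X_without <- model.matrix(~ Sex + Minority - 1, dd)  # intercept column dropped

colnames(X_with)     # "(Intercept)" "SexM" "MinorityYes"
colnames(X_without)  # "SexF" "SexM" "MinorityYes"
```

Without the intercept, the first factor gets one indicator column per level (SexF and SexM) instead of an intercept plus a single contrast column.<br />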
<br />
=== Week 9 ===<br />
What is the difference between panel data, longitudinal data, time series data and cross-sectional data?</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Andy_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Andy Li2011-03-22T03:11:10Z<p>Andytli: </p>
<hr />
<div>====Welcome to my wiki page for the Statistical Consulting Practicum====<br />
<br />
==About Me==<br />
My name is Andy Li. I started my undergraduate studies in Statistics at York in 2008. After two years of study, I am now a master's student in Applied Statistics. I also have a bachelor's degree in Chemical Engineering, and I worked as a B2B salesman for many years. I have used SAS, R, MATLAB, Minitab and Maple in my classes. I am very happy to join our 6627 team to learn more about Statistics. I hope we can build not only our knowledge but also our friendship in the coming days. <br />
<br />
==Sample Exam Questions==<br />
=== Week 1 ===<br />
(In class) There are 11 students in the 6627 class, 5 of whom are male. There is a statistics meeting on campus. All the students want to attend as visitors, but there are only 6 seats left. So the professor needs to choose 6 students at random and wants exactly two of them to be male, to help set up the tables. In how many ways can he select the students?<br />
<br />
Answer: 5C2*6C4=150<br />
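<br />
The count can be checked in R with choose(). This is just a quick sketch of the arithmetic, reading the question as "exactly 2 of the 5 male students and 4 of the 6 other students".<br />

```r
# Ways to pick exactly 2 of the 5 male students and 4 of the 6 others
n_ways <- choose(5, 2) * choose(6, 4)
n_ways  # 10 * 15 = 150
```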
<br />
=== Week 2 ===<br />
[[File:d34.jpg]]<br />
<br />
(In class) We have the cloud of our data, the data ellipses and the centre point. We also know that about 40% of the data fall inside the inner ellipse and about 86% inside the outer ellipse. What is the radius of the outer ellipse? What are Sx, Sy, Sy.x and the 95% confidence interval?<br />
<br />
Answer: Please see the notes for the 2nd lecture. <br />
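<br />
One way to reason about the radius, offered as a sketch and not as the answer from the lecture notes: if the data are assumed to be bivariate normal, the squared Mahalanobis distance from the centre follows a chi-square distribution with 2 degrees of freedom, so the coverage of a data ellipse of radius r can be computed directly. The function names coverage and radius are hypothetical.<br />

```r
# Coverage of a data ellipse of radius r, assuming bivariate normal data:
# the squared Mahalanobis distance is chi-square with 2 degrees of freedom.
coverage <- function(r) pchisq(r^2, df = 2)
coverage(1)  # about 0.39
coverage(2)  # about 0.86

# Radius needed to reach a given coverage p
radius <- function(p) sqrt(qchisq(p, df = 2))
```

Under that assumption, the percentages in the question match ellipses of radius 1 and radius 2.<br />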
<br />
=== Week 3 ===<br />
Q: Explain the multiple regression coefficients in three dimensions.<br />
<br />
A: [[File:g1.png]] Beta0 defines the Y-intercept of the plane. Beta1 defines the slope of the plane along the x1 axis, and beta2 defines the slope of the plane along the x2 axis. The error term ε adds noise, so that the observed points scatter around the plane instead of lying exactly on it. <br />
<br />
[[File:g2.png]]<br />
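A small simulation can make the roles of the three coefficients visible. This is only a sketch with made-up values: the true coefficients (1, 2, -3), the sample size and the noise level are all assumptions chosen for illustration.<br />

```r
set.seed(1)
n  <- 200
x1 <- runif(n)
x2 <- runif(n)

# Assumed true plane: intercept 1, slope 2 along x1, slope -3 along x2,
# plus a small amount of noise from the error term
y <- 1 + 2 * x1 - 3 * x2 + rnorm(n, sd = 0.1)

fit <- lm(y ~ x1 + x2)
coef(fit)  # estimates of beta0, beta1 and beta2
```

The fitted coefficients should land close to the assumed intercept and slopes, and get closer as the noise shrinks or the sample grows.<br />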
=== Week 4 ===<br />
Q: Which of the following statements describe correlation, and which describe interaction?<br />
<br />
a) People with more years in jail tend to have fewer years of education.<br />
<br />
b) The more money I save, the more financially secure I feel. <br />
<br />
c) To give a good taste, she squeezed a lemon into her black tea and stirred the tea.<br />
<br />
d) Helping others in the Stats lab is one of our TA jobs. Another type is grader. <br />
<br />
e) Steel has the ability to become harder and stronger through adding carbon to it and quenching (rapidly cooling).<br />
<br />
f) The more years of education I complete, the higher my earning potential.<br />
<br />
g) Professors drink too much coffee because their work is stressful.<br />
<br />
A: Correlation: a b f g ; interaction: c e<br />
<br />
=== Week 5 ===<br />
Q: We have 5 points at (1,1), (1,2), (2,1), (2,2) and (2,3). What does the linear regression line look like? <br />
<br />
A: Go to [http://illuminations.nctm.org/LessonDetail.aspx?ID=L454#nextone this web page] and try it. Add more points to get an idea of the rule.<br />
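<br />
The same five points can also be fitted in R. This is a minimal sketch using lm(); the variable names are arbitrary.<br />

```r
# The five points from the question
x <- c(1, 1, 2, 2, 2)
y <- c(1, 2, 1, 2, 3)

fit <- lm(y ~ x)
coef(fit)  # intercept 1, slope 0.5
```

Because x takes only two values, the least-squares line passes through the two group means, (1, 1.5) and (2, 2).<br />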
<br />
=== Week 6 ===<br />
Q: Fill in the blanks: <br />
MANOVA is the abbreviation of ______(1)______. Where sums of squares appear in univariate analysis of variance, in MANOVA certain ________(2)________ appear, and the sums of squares appear as their ____(3)_____ entries.<br />
<br />
A: (1) Multivariate analysis of variance (2) positive-definite matrices (3) diagonal <br />
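<br />
The relationship in this answer can be checked numerically. The sketch below uses the built-in iris data purely as an assumed example (it is not from the course): the residual sums-of-squares-and-cross-products matrix from a multivariate fit has the univariate residual sums of squares on its diagonal.<br />

```r
# Multivariate linear model with two responses and one grouping factor
fit <- lm(cbind(Sepal.Length, Sepal.Width) ~ Species, data = iris)

# 2 x 2 residual sums-of-squares-and-cross-products (SSCP) matrix
E <- crossprod(residuals(fit))

# Univariate residual sums of squares for each response
ss_length <- deviance(lm(Sepal.Length ~ Species, data = iris))
ss_width  <- deviance(lm(Sepal.Width ~ Species, data = iris))

diag(E)                 # matches c(ss_length, ss_width)
eigen(E)$values         # all positive, so E is positive definite here
```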
<br />
=== Week 7 ===<br />
Q: What are the assumptions of MANOVA?<br />
<br />
A: <br />
1. Independent Random Sampling: MANOVA assumes that the observations are independent of one another; there is no pattern in the selection of the sample, and the sample is completely random.<br />
<br />
2. Level and Measurement of the Variables: MANOVA assumes that the independent variables are categorical and the dependent variables are continuous or scale variables.<br />
<br />
3. Linearity of dependent variables: The dependent variables may be correlated with each other or independent of each other. Studies show that moderately correlated dependent variables are preferred; if the dependent variables are independent of each other, we sacrifice degrees of freedom, which decreases the power of the analysis.<br />
<br />
4. Multivariate Normality: The data are assumed to follow a multivariate normal distribution. <br />
<br />
5. Multivariate Homogeneity of Variance: The variances and covariances of the dependent variables are assumed to be equal across groups.<br />
<br />
=== Week 8 ===<br />
Q: Given a set of complex data in a mixed model, please suggest a way to select all the observations from a sample of clusters.<br />
<br />
A: There are many ways to do this. In the example, the professor gave the following one:<br />
We first create the school summary file and take a sample of schools from that file. We then merge the sample file with the long file. Merge matches on variables that have the same name. By default, it keeps only records that match in both files, so it produces the result we want.<br />
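<br />
The procedure described above can be sketched in R as follows. The data are simulated and every name here (long, schools, sampled, score, mean_score) is hypothetical; the professor's actual files and variables are not reproduced.<br />

```r
set.seed(42)

# Hypothetical long file: 5 schools with 4 students each
long <- data.frame(
  school = rep(1:5, each = 4),
  score  = rnorm(20)
)

# School summary file: one row per school
schools <- aggregate(score ~ school, data = long, FUN = mean)
names(schools)[2] <- "mean_score"

# Take a sample of 2 schools (clusters) from the summary file
sampled <- schools[sample(nrow(schools), 2), , drop = FALSE]

# merge() matches on the shared column name "school" and, by default,
# keeps only rows that match in both data frames
sub <- merge(sampled, long)
nrow(sub)  # 8: all 4 students from each of the 2 sampled schools
```

Renaming the summary column matters in this sketch: if both data frames had a column called score, merge() would try to match on it as well.<br />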
<br />
=== Week 9 ===<br />
Q: True or false for each statement below:<br />
a) Autocorrelation could be negative.<br />
b) Strong positive autocorrelation can be a symptom of lack of fit.<br />
c) Occasional large measurement errors will contribute positively to the estimate of autocorrelation.<br />
d) Autocorrelation of a random process describes the correlation between values of the process at different points in time.<br />
<br />
A: T T F T<br />
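<br />
Statement (a) can be illustrated with a series that flips sign at every step. This is only a sketch with a made-up series; it relies on acf() from the stats package that ships with R.<br />

```r
# A series that alternates between 1 and -1
x <- rep(c(1, -1), times = 50)

# Sample autocorrelation at lag 1 (element 1 of $acf is lag 0)
r1 <- acf(x, lag.max = 1, plot = FALSE)$acf[2]
r1  # -0.99: neighbouring values are almost perfectly negatively correlated
```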
<br />
==Statistics in the Media==<br />
=== Week 1 ===<br />
[http://endoftheamericandream.com/archives/50-statistics-about-the-u-s-economy-that-are-almost-too-crazy-to-believe 50 Statistics About The U.S. Economy That Are Almost Too Crazy To Believe]<br />
<br />
Can you see that 2010 is a crisis year? It's NOT the time to change! <br />
[http://moneywatch.bnet.com/economic-news/blog/financial-decoder/18-scary-us-debt-facts/2824/ 18 Scary US Debt Facts] <br />
<br />
If there were no statistics, the people across the border might be toasting the big success of this year right now.<br />
<br />
=== Week 2 ===<br />
[http://timesofindia.indiatimes.com/city/varanasi/4-day-conference-begins-at-BHU/articleshow/7224523.cms 4-day conference begins at BHU] Banaras Hindu University (BHU), Hindi: काशी हिन्दू विश्वविद्यालय, is a premier central university and a world class educational institution located in Varanasi, India. It is regarded as the largest residential university in Asia.<br />
<br />
=== Week 3 ===<br />
[http://www.newswire.ca/en/releases/archive/January2011/19/c3506.html Greater Toronto & Hamilton Area Transit Choices: Report finds much higher transportation emissions from road vs. rail]<br />
<br />
This is a current news item on a very old topic. The report, "Putting Transportation on Track", comes from two of Canada's leading think tanks, Sustainable Prosperity and the Pembina Institute, and restates well-known common sense with nothing new.<br />
<br />
=== Week 4 ===<br />
[http://2010.census.gov/2010census/#/panel-1 2010 US Census results are now available]<br />
<br />
As of Jan. 18, 2011, the 2010 US Census data is accessible online. It is easy to reach and will be appreciated by many fact finders. The other highlight is the survey itself: it asks just 10 questions and takes about 10 minutes to complete.<br />
<br />
=== Week 5 ===<br />
So now everyone knows that Statistics is not useless and that statisticians are a group of supermen. [http://www.thestar.com/news/article/933200--toronto-man-cracked-the-code-to-scratch-lottery-tickets?bn=1 Toronto man cracked the code to scratch-lottery tickets] <br />
<br />
=== Week 6 ===<br />
[http://www.thestarphoenix.com/technology/City+growing+quickly/4222880/story.html City growing quickly - Saskatoon outpaces all metro areas in Canada] You could look for a job there, and they have a well-known immigration program for international students. <br />
<br />
=== Week 7 ===<br />
Crime stats in Canada: [http://fullcomment.nationalpost.com/2011/02/28/brian-lee-crowley-crime-stats-change-based-on-who-counts-them/ Brian Lee Crowley: Crime stats change based on who counts them]<br />
<br />
<br />
=== Week 8 ===<br />
Let us pray for Japan. No nuclear catastrophe again. I visited in 2000, and it is a really beautiful country. I hope I can do something for the victims. I don't fully understand what this news item is about, but I hope it is helpful. [http://www.ontrackdatarecovery.co.uk/data-recovery-news/articles/data-management-system-aids-japan-relief-effort540.aspx Data management system aids Japan relief effort] <br />
<br />
=== Week 9 ===<br />
Really good articles about the effects of the Japan crisis: [http://lakshmi-capital.com/2011/03/japanese-crisis-analysis-part-1-currencies-and-the-return-of-panic/ Japanese Crisis Analysis Part 1: Currencies and the Return of Panic] and [http://www.benzinga.com/11/03/936488/japanese-crisis-analysis-part-2-us-stocks-and-the-return-of-panic/ Japanese Crisis Analysis Part 2: US Stocks and the Return of Panic]<br />
<br />
==Questions and Comments on Groupwork and Class Lectures==<br />
=== Week 1 ===<br />
For the example about smoking and life expectancy that we discussed in class, I think there are two more factors we should take into consideration. One is the age at which people start smoking: I know that children smoke in some poor countries because they don't know it is harmful. The other is that different types of cigarette have different effects. Whatever type of cigarette you smoke, it will always be ‘injurious to health’, but cigarettes in rich countries contain relatively less addictive nicotine and tar, are fitted with long filters, and are not allowed to contain harmful flavor additives. I know this because I used to be a heavy smoker and have smoked more than 100 brands of cigars from different countries. But yes, I am still alive. <br />
<br />
=== Week 2 ===<br />
Some classmates couldn't install the spida and p3d packages. Below is the procedure you should follow:<br />
<br />
1. Uninstall your current R.<br />
<br />
2. Go to [http://cran.r-project.org/bin/windows/base/ R download] to download a new version of R.<br />
<br />
3. Do a short installation of R. <br />
<br />
4. Run <br />
<br />
> install.packages( c("car","rgl"))<br />
<br />
> install.packages( c("spida","p3d"), repos = "http://R-Forge.R-project.org")<br />
<br />
=== Week 3 ===<br />
It was really amazing to see the data move using GoogleVis and p3d in the last lecture. This will be very useful in consulting or presentations. I think these methods are limited when it comes to analyzing complex models, but we can choose 3 variables at a time from a model with many variables and draw a 3D plot to see the trend.<br />
<br />
=== Week 4 ===<br />
Carrie has a question about a symbol: In the slides for Visualizing Multiple Regression, in the section on the data ellipse for predictors (page 30), the general predictor uses a notation I'm not familiar with and missed in lecture. What does the plus sign in a circle mean?<br />
<br />
Answer: In logic, the plus sign in a circle denotes "exclusive or"; see the Wikipedia article [http://en.wikipedia.org/wiki/Exclusive_or here]. In the notes, however, it is given its own definition in the summary on page 88.<br />
<br />
=== Week 5 ===<br />
Here is a set of slides on outlier detection. I hope it is helpful: [http://webdocs.cs.ualberta.ca/~zaiane/courses/cau/slides/cau-Lecture7.pdf slides]<br />
<br />
=== Week 6 ===<br />
What does this mean: "The coefficient for "ses" (2.2999) is NOT "the estimated effect of ses" – it is the estimated "effect" of ses" (on page 46 of the Hierarchical_Models slides)? <br />
<br />
=== Week 7 ===<br />
What does this mean: "The smaller ellipse has 95% shadows." (on page 61 of the Hierarchical_Models slides)?<br />
<br />
* I think it means that the shadows of the smaller ellipse give the regular 95% confidence intervals for each of the betas - [[../Constance Mara|Constance]]<br />
<br />
=== Week 8 ===<br />
What does the "-1" mean in the code "model.matrix( ~ Sex + Minority -1, dd)"? Thanks in advance.<br />
<br />
* -1 is used to remove the intercept term: when fitting a linear model, y ~ x - 1 specifies a line through the origin. In the script above, the model depends on the terms Sex and Minority, and the intercept term is removed. - [[../Crystal Cao|Crystal]]<br />
<br />
=== Week 9 ===<br />
What is the difference between panel data, longitudinal data, time series data and cross-sectional data?</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Andy_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Andy Li2011-03-21T15:42:29Z<p>Andytli: </p>
<hr />
<div>====Welcome to my wiki page for the Statistical Consulting Practicum====<br />
<br />
==About Me==<br />
My name is Andy Li. I started my undergraduate studies in Statistics at York in 2008. After two years of study, I am now a master's student in Applied Statistics. I also have a bachelor's degree in Chemical Engineering, and I worked as a B2B salesman for many years. I have used SAS, R, MATLAB, Minitab and Maple in my classes. I am very happy to join our 6627 team to learn more about Statistics. I hope we can build not only our knowledge but also our friendship in the coming days. <br />
<br />
==Sample Exam Questions==<br />
=== Week 1 ===<br />
(In class) There are 11 students in the 6627 class, 5 of whom are male. There is a statistics meeting on campus. All the students want to attend as visitors, but there are only 6 seats left. So the professor needs to choose 6 students at random and wants exactly two of them to be male, to help set up the tables. In how many ways can he select the students?<br />
<br />
Answer: 5C2*6C4=150<br />
<br />
=== Week 2 ===<br />
[[File:d34.jpg]]<br />
<br />
(In class) We have the cloud of our data, the data ellipses and the centre point. We also know that about 40% of the data fall inside the inner ellipse and about 86% inside the outer ellipse. What is the radius of the outer ellipse? What are Sx, Sy, Sy.x and the 95% confidence interval?<br />
<br />
Answer: Please see the notes for the 2nd lecture. <br />
<br />
=== Week 3 ===<br />
Q: Explain the multiple regression coefficients in three dimensions.<br />
<br />
A: [[File:g1.png]] Beta0 defines the Y-intercept of the plane. Beta1 defines the slope of the plane along the x1 axis, and beta2 defines the slope of the plane along the x2 axis. The error term ε adds noise, so that the observed points scatter around the plane instead of lying exactly on it. <br />
<br />
[[File:g2.png]]<br />
=== Week 4 ===<br />
Q: Which of the following statements describe correlation, and which describe interaction?<br />
<br />
a) People with more years in jail tend to have fewer years of education.<br />
<br />
b) The more money I save, the more financially secure I feel. <br />
<br />
c) To give a good taste, she squeezed a lemon into her black tea and stirred the tea.<br />
<br />
d) Helping others in the Stats lab is one of our TA jobs. Another type is grader. <br />
<br />
e) Steel has the ability to become harder and stronger through adding carbon to it and quenching (rapidly cooling).<br />
<br />
f) The more years of education I complete, the higher my earning potential.<br />
<br />
g) Professors drink too much coffee because their work is stressful.<br />
<br />
A: Correlation: a b f g ; interaction: c e<br />
<br />
=== Week 5 ===<br />
Q: We have 5 points at (1,1), (1,2), (2,1), (2,2) and (2,3). What does the linear regression line look like? <br />
<br />
A: Go to [http://illuminations.nctm.org/LessonDetail.aspx?ID=L454#nextone this web page] and try it. Add more points to get an idea of the rule.<br />
<br />
=== Week 6 ===<br />
Q: Fill in the blanks: <br />
MANOVA is the abbreviation of ______(1)______. Where sums of squares appear in univariate analysis of variance, in MANOVA certain ________(2)________ appear, and the sums of squares appear as their ____(3)_____ entries.<br />
<br />
A: (1) Multivariate analysis of variance (2) positive-definite matrices (3) diagonal <br />
<br />
=== Week 7 ===<br />
Q: What are the assumptions of MANOVA?<br />
<br />
A: <br />
1. Independent Random Sampling: MANOVA assumes that the observations are independent of one another; there is no pattern in the selection of the sample, and the sample is completely random.<br />
<br />
2. Level and Measurement of the Variables: MANOVA assumes that the independent variables are categorical and the dependent variables are continuous or scale variables.<br />
<br />
3. Linearity of dependent variables: The dependent variables may be correlated with each other or independent of each other. Studies show that moderately correlated dependent variables are preferred; if the dependent variables are independent of each other, we sacrifice degrees of freedom, which decreases the power of the analysis.<br />
<br />
4. Multivariate Normality: The data are assumed to follow a multivariate normal distribution. <br />
<br />
5. Multivariate Homogeneity of Variance: The variances and covariances of the dependent variables are assumed to be equal across groups.<br />
<br />
=== Week 8 ===<br />
Q: Given a set of complex data in a mixed model, please suggest a way to select all the observations from a sample of clusters.<br />
A: There are many ways to do this. In the example, the professor gave the following one:<br />
We first create the school summary file and take a sample of schools from that file. We then merge the sample file with the long file. Merge matches on variables that have the same name. By default, it keeps only records that match in both files, so it produces the result we want.<br />
<br />
=== Week 9 ===<br />
<br />
==Statistics in the Media==<br />
=== Week 1 ===<br />
[http://endoftheamericandream.com/archives/50-statistics-about-the-u-s-economy-that-are-almost-too-crazy-to-believe 50 Statistics About The U.S. Economy That Are Almost Too Crazy To Believe]<br />
<br />
Can you see that 2010 is a crisis year? It's NOT the time to change! <br />
[http://moneywatch.bnet.com/economic-news/blog/financial-decoder/18-scary-us-debt-facts/2824/ 18 Scary US Debt Facts] <br />
<br />
If there were no statistics, the people across the border might be toasting the big success of this year right now.<br />
<br />
=== Week 2 ===<br />
[http://timesofindia.indiatimes.com/city/varanasi/4-day-conference-begins-at-BHU/articleshow/7224523.cms 4-day conference begins at BHU] Banaras Hindu University (BHU), Hindi: काशी हिन्दू विश्वविद्यालय, is a premier central university and a world class educational institution located in Varanasi, India. It is regarded as the largest residential university in Asia.<br />
<br />
=== Week 3 ===<br />
[http://www.newswire.ca/en/releases/archive/January2011/19/c3506.html Greater Toronto & Hamilton Area Transit Choices: Report finds much higher transportation emissions from road vs. rail]<br />
<br />
This is a current news item on a very old topic. The report, "Putting Transportation on Track", comes from two of Canada's leading think tanks, Sustainable Prosperity and the Pembina Institute, and restates well-known common sense with nothing new.<br />
<br />
=== Week 4 ===<br />
[http://2010.census.gov/2010census/#/panel-1 2010 US Census results are now available]<br />
<br />
As of Jan. 18, 2011, the 2010 US Census data is accessible online. It is easy to reach and will be appreciated by many fact finders. The other highlight is the survey itself: it asks just 10 questions and takes about 10 minutes to complete.<br />
<br />
=== Week 5 ===<br />
So now everyone knows that Statistics is not useless and that statisticians are a group of supermen. [http://www.thestar.com/news/article/933200--toronto-man-cracked-the-code-to-scratch-lottery-tickets?bn=1 Toronto man cracked the code to scratch-lottery tickets] <br />
<br />
=== Week 6 ===<br />
[http://www.thestarphoenix.com/technology/City+growing+quickly/4222880/story.html City growing quickly - Saskatoon outpaces all metro areas in Canada] You could look for a job there, and they have a well-known immigration program for international students. <br />
<br />
=== Week 7 ===<br />
Crime stats in Canada: [http://fullcomment.nationalpost.com/2011/02/28/brian-lee-crowley-crime-stats-change-based-on-who-counts-them/ Brian Lee Crowley: Crime stats change based on who counts them]<br />
<br />
<br />
=== Week 8 ===<br />
Let us pray for Japan. No nuclear catastrophe again. I visited in 2000, and it is a really beautiful country. I hope I can do something for the victims. I don't fully understand what this news item is about, but I hope it is helpful. [http://www.ontrackdatarecovery.co.uk/data-recovery-news/articles/data-management-system-aids-japan-relief-effort540.aspx Data management system aids Japan relief effort] <br />
<br />
=== Week 9 ===<br />
Really good articles about the effects of the Japan crisis: [http://lakshmi-capital.com/2011/03/japanese-crisis-analysis-part-1-currencies-and-the-return-of-panic/ Japanese Crisis Analysis Part 1: Currencies and the Return of Panic] and [http://www.benzinga.com/11/03/936488/japanese-crisis-analysis-part-2-us-stocks-and-the-return-of-panic/ Japanese Crisis Analysis Part 2: US Stocks and the Return of Panic]<br />
<br />
==Questions and Comments on Groupwork and Class Lectures==<br />
=== Week 1 ===<br />
For the example about smoking and life expectancy that we discussed in class, I think there are two more factors we should take into consideration. One is the age at which people start smoking: I know that children smoke in some poor countries because they don't know it is harmful. The other is that different types of cigarette have different effects. Whatever type of cigarette you smoke, it will always be ‘injurious to health’, but cigarettes in rich countries contain relatively less addictive nicotine and tar, are fitted with long filters, and are not allowed to contain harmful flavor additives. I know this because I used to be a heavy smoker and have smoked more than 100 brands of cigars from different countries. But yes, I am still alive. <br />
<br />
=== Week 2 ===<br />
Some classmates couldn't install the spida and p3d packages. Below is the procedure you should follow:<br />
<br />
1. Uninstall your current R.<br />
<br />
2. Go to [http://cran.r-project.org/bin/windows/base/ R download] to download a new version of R.<br />
<br />
3. Do a short installation of R. <br />
<br />
4. Run <br />
<br />
> install.packages( c("car","rgl"))<br />
<br />
> install.packages( c("spida","p3d"), repos = "http://R-Forge.R-project.org")<br />
<br />
=== Week 3 ===<br />
It was really amazing to see the data move using GoogleVis and p3d in the last lecture. This will be very useful in consulting or presentations. I think these methods are limited when it comes to analyzing complex models, but we can choose 3 variables at a time from a model with many variables and draw a 3D plot to see the trend.<br />
<br />
=== Week 4 ===<br />
Carrie has a question about a symbol: In the slides for Visualizing Multiple Regression, in the section on the data ellipse for predictors (page 30), the general predictor uses a notation I'm not familiar with and missed in lecture. What does the plus sign in a circle mean?<br />
<br />
Answer: In logic, the plus sign in a circle denotes "exclusive or"; see the Wikipedia article [http://en.wikipedia.org/wiki/Exclusive_or here]. In the notes, however, it is given its own definition in the summary on page 88.<br />
<br />
=== Week 5 ===<br />
Here is a set of slides on outlier detection. I hope it is helpful: [http://webdocs.cs.ualberta.ca/~zaiane/courses/cau/slides/cau-Lecture7.pdf slides]<br />
<br />
=== Week 6 ===<br />
What does this mean: "The coefficient for "ses" (2.2999) is NOT "the estimated effect of ses" – it is the estimated "effect" of ses" (on page 46 of the Hierarchical_Models slides)? <br />
<br />
=== Week 7 ===<br />
What does this mean: "The smaller ellipse has 95% shadows." (on page 61 of the Hierarchical_Models slides)?<br />
<br />
* I think it means that the shadows of the smaller ellipse give the regular 95% confidence intervals for each of the betas - [[../Constance Mara|Constance]]<br />
<br />
=== Week 8 ===<br />
What does the "-1" mean in the code "model.matrix( ~ Sex + Minority -1, dd)"? Thanks in advance.<br />
<br />
* -1 is used to remove the intercept term: when fitting a linear model, y ~ x - 1 specifies a line through the origin. In the script above, the model depends on the terms Sex and Minority, and the intercept term is removed. - [[../Crystal Cao|Crystal]]<br />
<br />
=== Week 9 ===</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Andy_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Andy Li2011-03-21T15:40:41Z<p>Andytli: </p>
<hr />
<div>====Welcome to my wiki page for the Statistical Consulting Practicum====<br />
<br />
==About Me==<br />
My name is Andy Li. I started my undergraduate studies in Statistics at York in 2008. After two years of study, I am now a master's student in Applied Statistics. I also have a bachelor's degree in Chemical Engineering, and I worked as a B2B salesman for many years. I have used SAS, R, MATLAB, Minitab and Maple in my classes. I am very happy to join our 6627 team to learn more about Statistics. I hope we can build not only our knowledge but also our friendship in the coming days. <br />
<br />
==Sample Exam Questions==<br />
=== Week 1 ===<br />
(In class) There are 11 students in the 6627 class, 5 of whom are male. There is a statistics meeting on campus. All the students want to attend as visitors, but there are only 6 seats left. So the professor needs to choose 6 students at random and wants exactly two of them to be male, to help set up the tables. In how many ways can he select the students?<br />
<br />
Answer: 5C2*6C4=150<br />
<br />
=== Week 2 ===<br />
[[File:d34.jpg]]<br />
<br />
(In class) We have the cloud of our data, the data ellipses and the centre point. We also know that about 40% of the data fall inside the inner ellipse and about 86% inside the outer ellipse. What is the radius of the outer ellipse? What are Sx, Sy, Sy.x and the 95% confidence interval?<br />
<br />
Answer: Please see the notes for the 2nd lecture. <br />
<br />
=== Week 3 ===<br />
Q: Explain the multiple regression coefficients in three dimensions.<br />
<br />
A: [[File:g1.png]] Beta0 defines the Y-intercept of the plane. Beta1 defines the slope of the plane along the x1 axis, and beta2 defines the slope of the plane along the x2 axis. The error term ε adds noise, so that the observed points scatter around the plane instead of lying exactly on it. <br />
<br />
[[File:g2.png]]<br />
=== Week 4 ===<br />
Q: Which of the following statements describe correlation, and which describe interaction?<br />
<br />
a) People with more years in jail tend to have fewer years of education.<br />
<br />
b) The more money I save, the more financially secure I feel. <br />
<br />
c) To give a good taste, she squeezed a lemon into her black tea and stirred the tea.<br />
<br />
d) Helping others in the Stats lab is one of our TA jobs. Another type is grader. <br />
<br />
e) Steel has the ability to become harder and stronger through adding carbon to it and quenching (rapidly cooling).<br />
<br />
f) The more years of education I complete, the higher my earning potential.<br />
<br />
g) Professors drink too much coffee because their work is stressful.<br />
<br />
A: Correlation: a b f g ; interaction: c e<br />
<br />
=== Week 5 ===<br />
Q: We have 5 points at (1,1), (1,2), (2,1), (2,2) and (2,3). What does the linear regression line look like? <br />
<br />
A: Go to [http://illuminations.nctm.org/LessonDetail.aspx?ID=L454#nextone this web page] and try it. Add more points to get an idea of the rule.<br />
<br />
=== Week 6 ===<br />
Q: Fill in the blanks: <br />
MANOVA is the abbreviation of ______(1)______. Where sums of squares appear in univariate analysis of variance, in MANOVA certain ________(2)________ appear, and the sums of squares appear as their ____(3)_____ entries.<br />
<br />
A: (1) Multivariate analysis of variance (2) positive-definite matrices (3) diagonal <br />
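
The answer to (3) can be checked numerically. The sketch below uses the built-in iris data purely as a convenient example (it is not from the course). It builds the error sums-of-squares-and-cross-products matrix by hand and compares its diagonal with the univariate error sums of squares.

```r
# iris ships with base R; two responses, one grouping factor
Y    <- cbind(iris$Sepal.Length, iris$Petal.Length)
mfit <- lm(Y ~ Species, data = iris)   # multivariate linear model
E    <- crossprod(residuals(mfit))     # 2 x 2 error SSCP matrix
print(E)

# Univariate ANOVA error sums of squares for the same two responses
ss1 <- deviance(lm(Sepal.Length ~ Species, data = iris))
ss2 <- deviance(lm(Petal.Length ~ Species, data = iris))
print(c(ss1, ss2))
```

The diagonal of E matches the two univariate error sums of squares, and the off-diagonal entries carry the cross-products that univariate ANOVA ignores.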
<br />
=== Week 7 ===<br />
Q: What are the assumptions of MANOVA?<br />
<br />
A: <br />
1. Independent Random Sampling: MANOVA assumes that the observations are independent of one another. There is no pattern in the selection of the sample; the sample is completely random.<br />
<br />
2. Level and Measurement of the Variables: MANOVA assumes that the independent variables are categorical and the dependent variables are continuous or scale variables.<br />
<br />
3. Linearity of dependent variable: The dependent variables can be correlated with each other, or may be independent of each other. Studies show that moderately correlated dependent variables are preferred. If the dependent variables are independent of each other, then we have to sacrifice degrees of freedom, which decreases the power of the analysis.<br />
<br />
4. Multivariate Normality: The dependent variables are assumed to follow a multivariate normal distribution within each group. <br />
<br />
5. Multivariate Homogeneity of Variance: The variances and covariances of the dependent variables are equal across groups.<br />
<br />
=== Week 8 ===<br />
Q: Given a set of complex data in a mixed model, please suggest a way to select all the observations from a sample of clusters.<br />
A: Actually, there are many ways to do this. In the example, the Prof gave one as follows:<br />
We first create the school summary file and take a sample of schools from that file. We then merge the sample file with the longfile. Merge matches on variables that have the same name. By default, it only uses records that match in both files, so it produces the result we want.<br />
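
The steps above can be sketched in base R as follows. The data frame, column names and sample size are hypothetical stand-ins, not the files used in class.

```r
set.seed(2)                                   # arbitrary seed
# Hypothetical long file: several student records per school
longfile <- data.frame(
  school = rep(1:10, each = 4),
  score  = round(rnorm(40, mean = 50, sd = 10))
)

# School summary file: one row per school
schools <- data.frame(school = unique(longfile$school))

# Take a sample of 3 schools from the summary file
samp <- schools[sample(nrow(schools), 3), , drop = FALSE]

# merge() matches on the shared column name 'school' and, by default,
# keeps only records that appear in both files
sub <- merge(samp, longfile)
print(table(sub$school))
```

Every record from each sampled school is kept, and records from unsampled schools are dropped, which is the cluster sample we wanted.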
<br />
=== Week 9 ===<br />
<br />
==Statistics in the Media==<br />
=== Week 1 ===<br />
[http://endoftheamericandream.com/archives/50-statistics-about-the-u-s-economy-that-are-almost-too-crazy-to-believe 50 Statistics About The U.S. Economy That Are Almost Too Crazy To Believe]<br />
<br />
Can you see that 2010 is a crisis year? It's NOT the time to change! <br />
[http://moneywatch.bnet.com/economic-news/blog/financial-decoder/18-scary-us-debt-facts/2824/ 18 Scary US Debt Facts] <br />
<br />
Without statistics, the people across the border might be toasting this year's big success right now.<br />
<br />
=== Week 2 ===<br />
[http://timesofindia.indiatimes.com/city/varanasi/4-day-conference-begins-at-BHU/articleshow/7224523.cms 4-day conference begins at BHU] Banaras Hindu University (BHU), Hindi: काशी हिन्दू विश्वविद्यालय, is a premier central university and a world class educational institution located in Varanasi, India. It is regarded as the largest residential university in Asia.<br />
<br />
=== Week 3 ===<br />
[http://www.newswire.ca/en/releases/archive/January2011/19/c3506.html Greater Toronto & Hamilton Area Transit Choices: Report finds much higher transportation emissions from road vs. rail]<br />
<br />
This is current news on a very old topic. The report comes from two of Canada's leading think tanks, Sustainable Prosperity and the Pembina Institute, and states what is already common sense. It is "Putting Transportation on Track", with nothing new.<br />
<br />
=== Week 4 ===<br />
[http://2010.census.gov/2010census/#/panel-1 2010 US Census results are now available]<br />
<br />
As of Jan. 18, 2011, the 2010 US Census data are accessible online. They are easy to reach and will be appreciated by many fact finders. The other highlight is the survey itself: it asks just 10 questions and takes about 10 minutes to complete.<br />
<br />
=== Week 5 ===<br />
So now, everyone knows Statistics is not useless and Statisticians are a group of supermen. [http://www.thestar.com/news/article/933200--toronto-man-cracked-the-code-to-scratch-lottery-tickets?bn=1 Toronto man cracked the code to scratch-lottery tickets] <br />
<br />
=== Week 6 ===<br />
[http://www.thestarphoenix.com/technology/City+growing+quickly/4222880/story.html City growing quickly - Saskatoon outpaces all metro areas in Canada] You can look for a job there, and they have a well-known immigration program for international students. <br />
<br />
=== Week 7 ===<br />
Crime stats in Canada: [http://fullcomment.nationalpost.com/2011/02/28/brian-lee-crowley-crime-stats-change-based-on-who-counts-them/ Brian Lee Crowley: Crime stats change based on who counts them]<br />
<br />
<br />
=== Week 8 ===<br />
Let us pray for Japan. No nuclear catastrophe again. I visited there in 2000, and it was a really beautiful country. I hope I can do something for the victims. I am not sure exactly what this news item is about, but I hope it is helpful. [http://www.ontrackdatarecovery.co.uk/data-recovery-news/articles/data-management-system-aids-japan-relief-effort540.aspx Data management system aids Japan relief effort] <br />
<br />
=== Week 9 ===<br />
Really good articles about the effects of the Japan crisis: [http://lakshmi-capital.com/2011/03/japanese-crisis-analysis-part-1-currencies-and-the-return-of-panic/ Japanese Crisis Analysis Part 1: Currencies and the Return of Panic] and [http://www.benzinga.com/11/03/936488/japanese-crisis-analysis-part-2-us-stocks-and-the-return-of-panic Japanese Crisis Analysis Part 2: US Stocks and the Return of Panic]<br />
<br />
==Questions and Comments on Groupwork and Class Lectures==<br />
=== Week 1 ===<br />
For the example about smoking and life expectancy that we discussed in class, I think there are two more factors we should take into consideration. One is the age at which people start smoking. I know that kids smoke in some poor countries because they don't know it is harmful. The other is that different types of cigarette have different effects. Whatever type of cigarette you smoke, it will always be ‘injurious to health’. But the cigarettes in rich countries have relatively less addictive nicotine and tar, are equipped with long filters, and are not allowed to contain harmful flavor additives. I know this because I used to be a heavy smoker and have smoked more than 100 brands of cigars from different countries. But yes, I am still alive. <br />
<br />
=== Week 2 ===<br />
Some classmates couldn't install the spida and p3d packages. Below is the procedure you should follow:<br />
<br />
1. Uninstall your current R.<br />
<br />
2. Go to [http://cran.r-project.org/bin/windows/base/ R download] to download a new version of R.<br />
<br />
3. Do a short installation of R. <br />
<br />
4. Run <br />
<br />
> install.packages( c("car","rgl"))<br />
<br />
> install.packages( c("spida","p3d"), repos = "http://R-Forge.R-project.org")<br />
<br />
=== Week 3 ===<br />
It was really amazing to see the data moving using GoogleVis and p3d in the last lecture. These tools will be very useful in consulting or presentations. I think these methods are limited when it comes to analyzing complex models, but we can choose 3 variables at a time from a multiple regression model and draw a 3D plot to see the trend.<br />
<br />
=== Week 4 ===<br />
Carrie has a question about a symbol: In the slides for Visualizing Multiple Regression, in the section on the data ellipse for predictors (page 30), the general predictor has a notation I'm not familiar with and missed in lecture. What does the plus sign in a circle mean?<br />
<br />
Answer: In logic, the plus sign in a circle is called "exclusive or". The Wiki link is [http://en.wikipedia.org/wiki/Exclusive_or here]. But in the notes, it is specially defined in the summary, page 88.<br />
<br />
=== Week 5 ===<br />
Here is a set of slides on outlier detection. I hope it is helpful. [http://webdocs.cs.ualberta.ca/~zaiane/courses/cau/slides/cau-Lecture7.pdf slides]<br />
<br />
=== Week 6 ===<br />
What does this mean: "The coefficient for "ses" (2.2999) is NOT "the estimated effect of ses" – it is the estimated "effect" of ses" (on page 46 of the Hierarchical_Models slides)? <br />
<br />
=== Week 7 ===<br />
What does this mean: "The smaller ellipse has 95% shadows." (on page 61 of the Hierarchical_Models slides)?<br />
<br />
* I think it means that the shadows of the smaller ellipse give the regular 95% confidence intervals for each of the betas - [[../Constance Mara|Constance]]<br />
<br />
=== Week 8 ===<br />
What does the "-1" mean in the code "model.matrix( ~ Sex + Minority -1, dd)"? Thanks in advance.<br />
<br />
* -1 is used to remove the intercept term: when fitting a linear model, y ~ x - 1 specifies a line through the origin. In the above script, the model matrix is built from the terms Sex and Minority, with the intercept term removed. - [[../Crystal Cao|Crystal]]<br />
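
The effect of "-1" is easy to see by comparing the column names of the two model matrices. The small data frame below is a hypothetical stand-in for dd, with made-up factor levels.

```r
# Hypothetical stand-in for the course data frame 'dd'
dd <- data.frame(
  Sex      = factor(c("F", "M", "F", "M")),
  Minority = factor(c("No", "No", "Yes", "Yes"))
)

with_int    <- model.matrix(~ Sex + Minority, dd)
without_int <- model.matrix(~ Sex + Minority - 1, dd)

print(colnames(with_int))      # includes an "(Intercept)" column
print(colnames(without_int))   # no intercept; every level of Sex gets a column
```

Without the intercept, the first factor in the formula is coded with one indicator column per level, instead of leaving one level as the reference.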
<br />
=== Week 9 ===</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Assignment_Teams/GrayMATH 6627 2010-11 Practicum in Statistical Consulting/Assignment Teams/Gray2011-03-15T14:28:46Z<p>Andytli: </p>
<hr />
<div>* http://en.wikipedia.org/wiki/Mary_W._Gray<br />
<br />
==Assignment 1== <br />
===1. Simpson's Paradox=== <br />
<br />
Example: My friend and I play a basketball game and each take 20 shots. Who is the better shooter?<br />
<br />
[[File:h65.PNG|150px]]<br />
<br />
But, who is the better shooter if you control for the distance of the shot? Who would you rather have on your team?<br />
<br />
[[File:h32.PNG|150px]]<br />
This is a question of Simpson's Paradox. <br />
<br />
[[File:h45.PNG|150px]] <br />
<br />
We can see from this figure that the relationship changes from negative to positive when we take distance into consideration. The black line links the overall proportions of shots the two of us made. The red line links our performance on far shots, and the blue line our performance on close shots.<br />
<br />
Simpson’s paradox arises from one simple mathematical truth. Given eight real numbers a, b, c, d, A, B, C, D with the following properties: [[File:12.png]], it is not necessarily true that [[File:122.png]]. In fact, it may be true that [[File:13.png]].<br />
<br />
This is a simple mathematical fact, yet it has significant ramifications in Bayesian analysis, medical research, science and engineering studies, and societal statistical analysis. It is of concern for any statistical activity involving the calculation and analysis of ratios of two measurements.<br />
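
A small numeric sketch of the basketball example shows how the reversal can happen. The shot counts are hypothetical and are not the numbers shown in the figures above.

```r
# Hypothetical shot counts: each player takes 20 shots in total
shots <- data.frame(
  player   = c("me", "me", "friend", "friend"),
  distance = c("close", "far", "close", "far"),
  made     = c(2, 6, 14, 0),
  taken    = c(2, 18, 18, 2)
)
shots$rate <- shots$made / shots$taken
print(shots)                       # success rate within each distance

# Pool over distance: total made and total taken per player
overall <- aggregate(cbind(made, taken) ~ player, data = shots, FUN = sum)
overall$rate <- overall$made / overall$taken
print(overall)                     # success rate ignoring distance
```

In this made-up data "me" has the higher success rate at both distances, yet "friend" has the higher overall rate, because most of my shots were taken from far away.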
<br />
Example 2 (real income tax example)<br />
<br />
[[File:simpson1.pdf]]<br />
<br />
===2. Graphics to visualize data === <br />
Gray: rgl and p3d (include the use of the 'groups' parameter to produce trajectories)<br />
<br />
===Introduction===<br />
<br />
[[File:Grey_rgl_p3d_1.pdf]]<br />
[[File:Grey_rgl_p3d_2.pdf]]<br />
<br />
''rgl'' is a library of functions that offers 3D real-time visualization functionality to the R programming environment (Adler & Murdoch, 2010), providing an OpenGL implementation for R.<br />
<br />
''p3d'' is a library of functions which employs functions from ''rgl'' to help visualize statistical models expressed as a function of 2 independent variables with the possible addition of a categorical variable (Monette, 2009).<br />
<br />
===Package ''rgl''===<br />
<br />
With ''rgl'' we create a ‘device’, which is simply a window. Within it a ‘world’ is created, in which we can create 3-dimensional shapes and through which we can navigate.<br />
<br />
[[File:Grey_World.png|150px]]<br />
<br />
Functions within the rgl package can be divided into 6 categories:<br />
(1) Device management functions (open and close devices, control active device) <br />
(2) Scene management functions (option to remove certain or all objects from the scene)<br />
(3) Export functions (creating image files)<br />
(4) Shape functions - essential plotting primitives (points, lines, triangles, quads) as well as higher level functions (text, spheres, surfaces).<br />
<br />
[[File:Grey-Shapes.png|150px]]<br />
<br />
(5) Environment functions - modify the viewpoint, background and bounding box, adding light sources<br />
(6) Appearance function rgl.material(…).<br />
<br />
[[File:Grey_AppearanceOptions.png|150px]]<br />
<br />
Using shapes and surfaces within an ''rgl'' device, statistical data can be represented in 3 dimensions. Some advanced examples are available as demos or provided on the [http://rgl.neoscientists.org/docs.shtml rgl website].<br />
<br />
[[File:Gray_Rgl_3d_histogram.png|150px]]<br />
[[File:Grey_rgl_example_imulated_animal_abundance.png|150px]]<br />
<br />
A few of the functions from ''rgl'' are useful for manipulating 3D models created using ''p3d''. However, ''p3d'' contains many functions that build on ''rgl'' and tailor them to statistical methods, so all but a few ''rgl'' functions are unnecessary for our purposes, unless you would like to contribute functionality to ''p3d''!<br />
<br />
<br />
<br />
----<br />
<br />
===Package p3d===<br />
<br />
In this section I will focus on example code you may use to familiarize yourself with the capabilities of this package. You will require the [[File:Gray_p3d_ex_Tuition.txt]] and [[File:Gray_p3d_ex_USIndicesIndustrialProd.txt]] data sets. Note that a few of the commands employed in the sample code are from ''rgl'', but these will likely be superseded by ''p3d'' functions as the package matures.<br />
<br />
Initialization code:<br />
----<br />
library( lattice )<br><br />
library( nlme )<br><br />
library( car )<br><br />
library( spida )<br><br />
library( rgl )<br><br />
library( p3d )<br><br />
tuit = read.table('tuition.Rdata',header=TRUE)<br><br />
head(tuit)<br><br />
prod = read.table('USIndicesIndustrialProd.Rdata',header=TRUE)<br><br />
head(prod)<br><br />
----<br />
<br />
<br />
For the tuition data we will begin by plotting the annual cost of tuition from a sample of American universities against the rates of faculty compensation and the proportion of students who graduate.<br />
<br />
Using the mouse buttons you can change the field of view and zoom in and out. ''Plot3d'' creates the 3D plot as shown on the right.<br />
<br />
We can remove elements from the device using the function ''Pop3d()''. This function removes elements starting with the most recently added item. Multiple items can be removed by adding a numeric argument, e.g. ''Pop3d(4)''.<br><br />
[[File:Gray_rgl_navigation.png|150px]]<br />
[[File:Gray_p3d_ex1.png|150px]]<br />
----<br />
Init3d(cex = .8)<br><br />
Plot3d(tuition ~ fac_comp + graduat, col = c("blue"), data = tuit)<br><br />
----<br />
<br />
<br />
Next we will subdivide the data by category, in this case whether the school is private (red) or public (blue) (variable name public.private).<br><br />
[[File:Gray_p3d_ex2.png|150px]]<br />
----<br />
Init3d(cex = .8)<br><br />
Plot3d( tuition ~ fac_comp + graduat|public.private, col = c("blue", "red"), data = tuit)<br><br />
----<br />
<br />
<br />
Next we will add regression planes for private(red) and public(blue) schools using the lm() function to determine the fit, and Fit3d() to insert the plane in the graph. Axes and labels are added using Axes3d() and title3d().<br><br />
[[File:Gray_p3d_ex3.png|150px]]<br />
----<br />
fitpub = lm(tuition ~ fac_comp + graduat,subset=(public.private==0),data = tuit)<br><br />
Fit3d( fitpub, col = c("blue"))<br><br />
<br><br />
fitpri = lm(tuition ~ fac_comp + graduat,subset=(public.private==1),data = tuit)<br><br />
Fit3d( fitpri, col = c("red"))<br><br />
<br><br />
Axes3d()<br><br />
title3d(main='Tuition predicted by grad rates and faculty salary -private (red) and public(blue) institutions')<br><br />
----<br />
<br />
<br />
Data ellipses, added with ''Ell3d()'', are useful for seeing the centre, spread and correlation of our data.<br><br />
[[File:Gray_p3d_ex4.png|150px]]<br />
----<br />
Ell3d()<br />
----<br />
<br />
<br />
We can change the viewpoint of our graph using the function ''view3d(theta,phi,fov,zoom)'', which takes polar coordinates. Note that ''view3d(0,0,0)'' will rotate the image to face the x-z plane (y into the screen) and ''view3d(270,0,0)'' will rotate the image to face the y-z plane (x into the screen). The function ''snap()'' will capture a still image of the current view. Note that to use ''movie3d()'' you must have ImageMagick installed to automatically convert the PNG files to a GIF; otherwise you must use external software.<br />
<br />
[[File:Gray_p3d_ex5.png|150px]]<br />
[[File:GrayMovie.gif]]<br />
----<br />
view3d(0,0,0)<br><br />
snap()<br><br />
<br />
spin(theta = 0, phi = 0)<br><br />
<br />
spins(inc.theta = 1/4, inc.phi = 0, theta = NULL, phi = NULL)<br><br />
<br />
movie3d( spin3d(axis=c(0,1,0), rpm=20), duration=2, dir='movie' )<br><br />
----<br />
<br />
<br />
Here is an additional example, using data on the US indices of industrial production, plotting mining production (MIN) over months and years. Adding the argument ‘groups=YR’ to ''Plot3d'' connects the months in a given year to produce trajectories.<br />
[[File:Gray_p3d_ex6.png|150px]]<br />
----<br />
open3d(windowRect=c(100,100,800,800),cex = .8) <br><br />
prod = read.table('USIndicesIndustrialProd.Rdata',header=TRUE)<br><br />
head(prod)<br><br />
Plot3d(MIN ~ YR+MONTH,data=prod,groups=YR)<br><br />
Axes3d()<br><br />
title3d(main='Industrial Production Mining (1947-1993)')<br><br />
view3d(215,0,45)<br><br />
----<br />
<br />
<br />
==Assignment 2==<br />
<br />
Statistics in the News: "Spousal support a royal pain?" [http://journals.lww.com/clinicalpain/Abstract/2003/07000/Spousal_Responses_Are_Differentially_Associated.4.aspx Journal Abstract]<br />
<br />
Overly supportive spouses are not necessarily doing their partners a favour.<br />
<br />
''They could be prolonging the recovery of their injured spouses.''<br />
<br />
Men with highly attentive spouses reported higher levels of pain and more disability but did well on physical functions tests.<br />
<br />
Women with highly attentive spouses didn't report feeling more pain or being more disabled. However, they performed more poorly on physical function tests than did women with less attentive spouses.<br />
<br />
[[MEDIA: Gray_RoyalPain.pdf]]<br />
<br />
====1====<br />
;Question: Does the article suggest a causal relationship between two variables? If so, which? Are the data observational or experimental?<br />
<br />
;Discussion: <br />
Yes, the article did suggest a causal relationship. One variable is the spousal solicitousness (attentiveness & support). Another is the degree of reported pain and disability. A third is actual physical function. Patient gender was also taken into consideration.<br />
<br />
According to the report, men with chronic pain report more perceived pain and disability when they receive higher levels of spousal attention, and it is implied this is controlling for actual physical ability. For women there was no difference in self-reports of pain and disability with level of spousal support, but women who received more attention from their husbands had poorer physical function than those who did not.<br />
<br />
The data are observational, because there is no manipulation or randomization in collecting the data. <br />
<br />
<br />
====2====<br />
;Question: Can you think of alternative explanations to causality? Confounding factors? Or explanations consistent with causality? Mediating factors? <br />
<br />
;Discussion: <br />
Andy: Yes, maybe a high degree of pain causes solicitousness, instead of solicitousness causing more pain. I don't think there exists a confounding factor. If there is one, I believe it is LOVE, because a man's LOVE wins him a highly attentive spouse, but it doesn't work as a narcotic for him the way it does for a woman. He may then ask for more LOVE by reporting that he hurts. That is a sweet answer to any woman, but tooth-achingly sweet to any healthy man.<br />
<br />
LOVE could also be the mediating factor. With high solicitousness, a man's LOVE starts to kick in. He is more likely to move his body, which may cause more pain, and then his physical function may recover faster. A woman, on the other hand, likes to find pieces of LOVE in solicitousness. Her heart is numbed by that LOVE, and so she reports that she is better as an affectionate payback. <br />
<br />
====3====<br />
;Question: Have any confounding factors been accounted for in the analysis? <br />
<br />
;Discussion: <br />
<br />
Emotional (men) or Physical (women) troubles outside the primary cause of pain. Men who are emotionally unstable may attract and encourage attentiveness in their wives, and also report a disproportionately high level of disability relative to their level of function. Women who are physically weak prior to injury may attract husbands who are highly supportive, but also be more susceptible to injuries that result in chronic pain. <br />
<br />
====4====<br />
;Question: Have any mediating factors been controlled for in a way that vitiates a causal interpretation of the relationship? <br />
<br />
;Discussion: <br />
<br />
This does not appear to be the case.<br />
<br />
====5====<br />
;Question: What is your personal assessment of the evidence for causality in the study that is the subject of the article? <br />
<br />
;Discussion: <br />
<br />
There is a good case to be made for both forward and backward causality in this study. In addition, it would be nice to be able to control for emotional and physical stability prior to chronic injury.<br />
Are frequencies distributed as expected? Are the proportions of highly attentive spouses equal across groups?<br />
ASSESSMENT: Judgement withheld pending further evidence<br />
<br />
<br />
=== Paradoxes and Fallacies ===<br />
==== 2. ==== <br />
;Question:You are studying observational data on the relationship between Health and Coffee (measured in grams of caffeine consumed per day). Suppose you want to control for a possible confounding factor 'Stress'. In this kind of study it is more important to make sure that you measure coffee consumption accurately than it is to make sure that you measure 'stress' accurately. <br />
<br />
;Discussion: <br />
<br />
True. It is more important to accurately measure Coffee Consumption if this is the variable of primary interest in the study. <br />
<br />
Consider this from the analysis of variance framework. When measurement error is introduced, the variability in the outcome accounted for by that factor decreases, and the error sum of squares increases. If we decrease accuracy in measuring Stress, the error term will increase accordingly, which will somewhat decrease the power to detect the effect of Coffee on Health but, provided Coffee and Stress are only weakly correlated, will not change the coefficient estimates by much. However, if we increase measurement error in Coffee Consumption, the proportion of variability explained by Coffee will decrease AND the error term will increase. We lose power on two fronts!<br />
<br />
Here is a short r-script that demonstrates this idea: [[file: Gray_Q2MeasError.r]]<br />
<br />
==== 5.==== <br />
;Question: In a multiple regression of Y on three predictors, X1, X2 and X3, if the coefficients of both X2 and X3, are not significant, it is safe to drop these two variable and perform a regression on X1 alone. <br />
<br />
;Discussion:<br />
No. Two predictors can each be non-significant yet jointly significant, for example when they are strongly correlated with each other. One way to proceed is to start with all three variables in the model and drop the least "significant" one at a time until only "significant" variables remain. We can also perform a stepwise selection, dropping variables that are no longer "significant" after new variables are introduced. <br />
<br />
<br />
==== 8.====<br />
;Question: In a multiple regression, if you drop a predictor whose effect is not significant, the p-values of the other predictors should not change very much. <br />
<br />
;Discussion:<br />
No. After dropping the predictor, the model changes. Even if the dropped predictor's effect is not significant, the p-values of the other predictors can change. We can see this from [http://www.utdallas.edu/~wiorkow%20/documents/Mod1Lect4Regnew.doc this example @ page 26-28] <br />
<br />
==== 11.==== <br />
;Question: In a model to assess the effect of a number of treatments on some outcome, we can estimate the difference between the best treatment and the worse treatment by using the difference in the mean outcomes. <br />
<br />
;Discussion: <br />
No. Size of tumors does provide some information about two treatments, one of which is the best and the other the worst. But it is not a good idea to estimate the difference between the two treatments by using the difference in tumor size. Such an estimate is biased, because other factors also affect the outcome, such as time with the disease. For example, our target is New York when we drive out of Toronto, but some day we find that everyone around us is speaking Spanish. But the answer may be yes too, as Barack Obama's slogan showed: "Yes we can". <br />
<br />
==== 14.==== <br />
;Question: If two variables have a strong interaction, this implies a strong correlation. <br />
<br />
;Discussion:<br />
<br />
False. An interaction exists if the effect of one independent variable (IV) on the dependent variable varies over the levels of another independent variable. This tells us nothing about the relationship between the IVs.<br />
An interaction between two variables can occur whether or not the variables are correlated with one another.<br />
<br />
{| class="wikitable"<br />
|-<br />
! <br />
! No Interaction<br />
! Interaction<br />
|-<br />
| '''No Correlation'''<br />
| [[File:Gray_Q14_nCnI.gif]] [[File:X1X2_Uncorr_NoInt.png|200px]] <br />
| [[File:Gray_Q14_nCI.gif]] [[File:X1X2_Uncorr_Int.png|200px]]<br />
|-<br />
| '''Correlation'''<br />
| [[File:Gray_Q14_CnI.gif]] [[File:X1X2_Corr_NoInt.png|200px]] <br />
| [[File:Gray_Q14_CI.gif]] [[File:X1X2_Corr_Int.png|200px]]<br />
|}<br />
<br />
[[File: Gray_Q14.R]]<br />
<br />
Here is an additional example using binary variables:<br />
[http://cnx.org/content/m31440/latest/ Illustration of the difference between correlation and interaction amongst independent variables]<br />
<br />
== Assignment 3==<br />
<br />
1) As we saw, 'Sector' appears to be an important predictor. Consider models using ses and Sector. Aim to estimate the between Sector<br />
gap as a function of ses if there is an interaction between Sector and ses. Check for and provide for a possible contextual effect of <br />
ses. Plot expected math achievement in each sector. Plot the gap with SEs. Consider the possibility that the apparently flatter effect<br />
of ses in Catholic school could be due to a non-linear effect of ses. How would you test whether this is a reasonable alternative explanation?<br />
<br />
2) Take the example further by incorporating Sex. Consider the 'contextual effect' of Sex which is school sex composition. Note that<br />
there are three types of schools: Girls, Boys and Coed schools. If you consider an interaction between Sector and school gender composition,<br />
you will see that the Public Sector only has Coed schools. What is the consequence of this fact for modelling sex composition and Sector effects?<br />
<br />
3) Does it appear that boys are better off in a boy's school and girls in a girl's school or are they better off in coed schools? How would<br />
you qualify your findings so parents don't misinterpret them in making decisions for their children?<br />
<br />
4) Is a low ses child better off in a high ses school or are they better off in a school of a similar ses? How about a high ses child in a<br />
low ses school? How would you qualify your findings so parents don't misinterpret them in making decisions for their children?<br />
<br />
5) Is a minority status child better off in a school with a higher proportion of minority status children or are they better off in a school<br />
with a low proportion? How would you qualify your findings so parents don't misinterpret them in making decisions for their children?</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Assignment_Teams/GrayMATH 6627 2010-11 Practicum in Statistical Consulting/Assignment Teams/Gray2011-03-15T14:26:49Z<p>Andytli: </p>
<hr />
<div>* http://en.wikipedia.org/wiki/Mary_W._Gray<br />
<br />
==Assignment 1== <br />
===1. Simpson's Paradox=== <br />
<br />
Example: My friend and I play a basketball game and each shoot 20 shots. Who is the better shooter?<br />
<br />
[[File:h65.PNG|150px]]<br />
<br />
But, who is the better shooter if you control for the distance of the shot? Who would you rather have on your team?<br />
<br />
[[File:h32.PNG|150px]]<br />
This is an instance of Simpson's Paradox. <br />
<br />
[[File:h45.PNG|150px]] <br />
<br />
We can see from this figure that the relationship changes from negative to positive when we take distance into consideration. The black line links the overall probabilities of making a shot for the two of us. The red line links our performance on far shots, and the blue line our performance on close shots.<br />
<br />
Simpson’s paradox arises from one simple mathematical truth. Given eight real numbers a, b, c, d, A, B, C, D with the following properties: [[File:12.png]], it is not necessarily true that [[File:122.png]]. In fact, it may be true that: [[File:13.png]].<br />
<br />
This is an obvious math reality, yet it has significant ramifications in Bayesian analysis, medical research, science and engineering studies, and societal statistical analysis. It is of concern for any statistical activity involving the calculation and analysis of ratios of two measurements.<br />
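<br />
The reversal can be reproduced with a few lines of R. This is a minimal sketch: the shot counts below are hypothetical and are NOT the numbers shown in the figures above; they are chosen only so that player B has the better success rate at each distance while player A has the better rate overall.<br />

```r
# Hypothetical shot counts (not the data from the figures): 20 shots each.
shots <- data.frame(
  player   = c("A", "A", "B", "B"),
  distance = c("close", "far", "close", "far"),
  made     = c(12, 1, 4, 5),
  attempts = c(16, 4, 4, 16)
)

# Success rate within each distance (the "controlled" comparison)
shots$rate <- shots$made / shots$attempts

# Overall success rate per player, ignoring distance
totals <- aggregate(cbind(made, attempts) ~ player, data = shots, FUN = sum)
totals$rate <- totals$made / totals$attempts

print(shots)   # B is ahead at both distances
print(totals)  # A is ahead overall
```

The reversal happens because A takes most shots from close range and B takes most shots from far away, so the overall rates mix very different proportions of easy and hard shots.<br />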
<br />
Example 2 (Real Income tax example)<br />
<br />
[[File:simpson1.pdf]]<br />
<br />
===2. Graphics to visualize data === <br />
Gray: rgl and p3d (include the use of the 'groups' parameter to produce trajectories)<br />
<br />
===Introduction===<br />
<br />
[[File:Grey_rgl_p3d_1.pdf]]<br />
[[File:Grey_rgl_p3d_2.pdf]]<br />
<br />
''rgl'' is a library of functions that offers 3D real-time visualization functionality to the R programming environment (Adler & Murdoch, 2010), providing an OpenGL implementation for R.<br />
<br />
''p3d'' is a library of functions which employs functions from RGL to help visualize statistical models expressed as a function of 2 independent variables with the possible addition of a categorical variable (Monette, 2009).<br />
<br />
===Package ''rgl''===<br />
<br />
With ''rgl'' we create a ‘device’, which is simply a window, within which a ‘world’ is created where we can build 3-dimensional shapes and through which we can navigate.<br />
<br />
[[File:Grey_World.png|150px]]<br />
<br />
Functions within the rgl package can be divided into 6 categories:<br />
(1) Device management functions (open and close devices, control active device) <br />
(2) Scene management functions (option to remove certain or all objects from the scene)<br />
(3) Export functions (creating image files)<br />
(4) Shape functions - essential plotting tools primitives (points, lines, triangles, quads) as well as higher level functions (text, spheres, surfaces).<br />
<br />
[[File:Grey-Shapes.png|150px]]<br />
<br />
(5) Environment functions - modify the viewpoint, background and bounding box, adding light sources<br />
(6) Appearance function rgl.material(…).<br />
<br />
[[File:Grey_AppearanceOptions.png|150px]]<br />
<br />
Using shapes and surfaces within an ''rgl'' device, statistical data can be represented in 3 dimensions. Some advanced examples are available as demos or provided on the [http://rgl.neoscientists.org/docs.shtml rgl website].<br />
<br />
[[File:Gray_Rgl_3d_histogram.png|150px]]<br />
[[File:Grey_rgl_example_imulated_animal_abundance.png|150px]]<br />
<br />
A few of the functions from ''rgl'' are useful for manipulating 3D models created using ''p3d'', since ''p3d'' contains many functions that inherit from ''rgl'' but tailor them to statistical methods. Thus all but a few are unnecessary for our purposes unless you would like to contribute functionality to ''p3d''!<br />
<br />
<br />
<br />
----<br />
<br />
===Package p3d===<br />
<br />
In this section I will focus on example code you may use to familiarize yourself with the capabilities of this package. You will require the [[File:Gray_p3d_ex_Tuition.txt]] and [[File:Gray_p3d_ex_USIndicesIndustrialProd.txt]] data sets. Note that a few of the commands employed in sample code are from rgl, but these will likely be superseded by ''p3d'' functions as the package matures.<br />
<br />
Initialization code:<br />
----<br />
library( lattice )<br><br />
library( nlme )<br><br />
library( car )<br><br />
library( spida )<br><br />
library( rgl )<br><br />
library( p3d )<br><br />
tuit = read.table('tuition.Rdata',header=TRUE)<br><br />
head(tuit)<br><br />
prod = read.table('USIndicesIndustrialProd.Rdata',header=TRUE)<br><br />
head(prod)<br><br />
----<br />
<br />
<br />
For the tuition data we will begin by plotting the annual cost of tuition from a sample of American Universities against the rates of faculty compensation and proportion of students who graduate.<br />
<br />
Using mouse keys you can change the field of view and zoom in and out. ''Plot3d'' creates the 3D plot as shown on the right.<br />
<br />
We can remove elements from the device using the function ''Pop3d()''. This function removes elements starting with the most recently added item. Multiple items can be removed by adding a numeric argument, e.g. ''Pop3d(4)''<br><br />
[[File:Gray_rgl_navigation.png|150px]]<br />
[[File:Gray_p3d_ex1.png|150px]]<br />
----<br />
Init3d(cex = .8)<br><br />
Plot3d(tuition ~ fac_comp + graduat, col = c("blue"), data = tuit)<br><br />
----<br />
<br />
<br />
Next we will subdivide the data by category, in this case whether the school is private (red) or public (blue) (variable name public.private).<br><br />
[[File:Gray_p3d_ex2.png|150px]]<br />
----<br />
Init3d(cex = .8)<br><br />
Plot3d( tuition ~ fac_comp + graduat|public.private, col = c("blue", "red"), data = tuit)<br><br />
----<br />
<br />
<br />
Next we will add regression planes for private(red) and public(blue) schools using the lm() function to determine the fit, and Fit3d() to insert the plane in the graph. Axes and labels are added using Axes3d() and title3d().<br><br />
[[File:Gray_p3d_ex3.png|150px]]<br />
----<br />
fitpub = lm(tuition ~ fac_comp + graduat,subset=(public.private==0),data = tuit)<br><br />
Fit3d( fitpub, col = c("blue"))<br><br />
<br><br />
fitpri = lm(tuition ~ fac_comp + graduat,subset=(public.private==1),data = tuit)<br><br />
Fit3d( fitpri, col = c("red"))<br><br />
<br><br />
Axes3d()<br><br />
title3d(main='Tuition predicted by grad rates and faculty salary -private (red) and public(blue) institutions')<br><br />
----<br />
<br />
<br />
Data ellipses are useful for understanding our data.<br><br />
[[File:Gray_p3d_ex4.png|150px]]<br />
----<br />
Ell3d()<br><br />
----<br />
<br />
<br />
We can change the viewpoint of our graph using the function ''view3d(theta,phi,fov,zoom)'', which takes polar coordinates. Note that ''view3d(0,0,0)'' will rotate the image to face the x-z plane (y into the screen) and ''view3d(270,0,0)'' will rotate the image to face the y-z plane (x into the screen). Function ''snap()'' will capture a still image of the current view. Note that to use ''movie3d()'' you must have ImageMagick installed to automatically convert PNGs to GIF; otherwise you must use external software.<br />
<br />
[[File:Gray_p3d_ex5.png|150px]]<br />
[[File:GrayMovie.gif]]<br />
----<br />
view3d(0,0,0)<br><br />
snap()<br><br />
<br />
spin(theta = 0, phi = 0)<br><br />
<br />
spins(inc.theta = 1/4, inc.phi = 0, theta = NULL, phi = NULL)<br><br />
<br />
movie3d( spin3d(axis=c(0,1,0), rpm=20), duration=2, dir='movie' )<br><br />
----<br />
<br />
<br />
Here is an additional example, using data on the US indices of industrial products, plotting Mining production (MIN) over months and years. Adding the argument ‘groups=YR’ to ''Plot3d'' connects the months in a given year to produce trajectories.<br />
[[File:Gray_p3d_ex6.png|150px]]<br />
----<br />
open3d(windowRect=c(100,100,800,800),cex = .8) <br><br />
prod = read.table('USIndicesIndustrialProd.Rdata',header=TRUE)<br><br />
head(prod)<br><br />
Plot3d(MIN ~ YR+MONTH,data=prod,groups=YR)<br><br />
Axes3d()<br><br />
title3d(main='Industrial Production Mining (1947-1993)')<br><br />
view3d(215,0,45)<br><br />
----<br />
<br />
<br />
==Assignment 2==<br />
<br />
Statistics in the News: "Spousal support a royal pain?" [http://journals.lww.com/clinicalpain/Abstract/2003/07000/Spousal_Responses_Are_Differentially_Associated.4.aspx Journal Abstract]<br />
<br />
Overly supportive spouses are not necessarily doing their partners a favour.<br />
<br />
''They could be prolonging the recovery of their injured spouses.''<br />
<br />
Men with highly attentive spouses reported higher levels of pain and more disability but did well on physical function tests.<br />
<br />
Women with highly attentive spouses didn't report feeling more pain or being more disabled. However, they performed more poorly on physical function tests than did women with less attentive spouses.<br />
<br />
[[MEDIA: Gray_RoyalPain.pdf]]<br />
<br />
====1====<br />
;Question: Does the article suggest a causal relationship between two variables? If so, which? Are the data observational or experimental?<br />
<br />
;Discussion: <br />
Yes, the article did suggest a causal relationship. One variable is the spousal solicitousness (attentiveness & support). Another is the degree of reported pain and disability. A third is actual physical function. Patient gender was also taken into consideration.<br />
<br />
According to the report, men with chronic pain report more perceived pain and disability when they receive higher levels of spousal attention, and it is implied this is controlling for actual physical ability. For women there was no difference in self-reports of pain and disability with level of spousal support, but women who received more attention from their husbands had poorer physical function than those who did not.<br />
<br />
The data are observational, because there is no manipulation or randomization in collecting the data. <br />
<br />
<br />
====2====<br />
;Question: Can you think of alternative explanations to causality? Confounding factors? Or explanations consistent with causality? Mediating factors? <br />
<br />
;Discussion: <br />
Andy: Yes. Perhaps a high degree of pain causes solicitousness, rather than solicitousness causing more pain. I don't think there exists a confounding factor. If there is one, I believe it is LOVE, because a man's LOVE wins a highly attentive spouse and doesn't work as a narcotic the way it does for a woman. He may then ask for more LOVE by reporting hurt. That is a sweet answer to any woman, but a tooth-aching one to any healthy man.<br />
<br />
LOVE could also be the mediating factor. With high solicitousness, a man's LOVE starts to fall in. He may be more likely to move his body, which may cause more hurt, and then his physical function may recover faster. A woman, however, likes finding pieces of LOVE through solicitousness. Her heart is numbed by that LOVE, and she will then report that she is better as an affectionate payback. <br />
<br />
====3====<br />
;Question: Have any confounding factors been accounted for in the analysis? <br />
<br />
;Discussion: <br />
<br />
Emotional (men) or Physical (women) troubles outside the primary cause of pain. Men who are emotionally unstable may attract and encourage attentiveness in their wives, and also report a disproportionately high level of disability relative to their level of function. Women who are physically weak prior to injury may attract husbands who are highly supportive, but also be more susceptible to injuries that result in chronic pain. <br />
<br />
====4====<br />
;Question: Have any mediating factors been controlled for in a way that vitiates a causal interpretation of the relationship? <br />
<br />
;Discussion: <br />
<br />
This does not appear to be the case.<br />
<br />
====5====<br />
;Question: What is your personal assessment of the evidence for causality in the study that is the subject of the article? <br />
<br />
;Discussion: <br />
<br />
There is a good case to be made for both forward and backward causality in this study. In addition, it would be nice to be able to control for emotional and physical stability prior to chronic injury.<br />
Are frequencies distributed as expected? Are the proportions of highly attentive spouses equal across groups?<br />
ASSESSMENT: Judgement withheld pending further evidence<br />
<br />
<br />
=== Paradoxes and Fallacies ===<br />
==== 2. ==== <br />
;Question:You are studying observational data on the relationship between Health and Coffee (measured in grams of caffeine consumed per day). Suppose you want to control for a possible confounding factor 'Stress'. In this kind of study it is more important to make sure that you measure coffee consumption accurately than it is to make sure that you measure 'stress' accurately. <br />
<br />
;Discussion: <br />
<br />
True. It is more important to accurately measure Coffee Consumption if this is the variable of primary interest in the study. <br />
<br />
Consider this from the analysis of variance framework. When measurement error is introduced, the variability in the outcome accounted for by that factor decreases, and the error sum of squares increases. If we decrease accuracy in measuring Stress, the error term will increase accordingly, which will somewhat decrease the power to detect the effect of Coffee on Health but, provided Coffee and Stress are only weakly correlated, will not change the coefficient estimates by much. However, if we increase measurement error in Coffee Consumption, the proportion of variability explained by Coffee will decrease AND the error term will increase. We lose power on two fronts!<br />
<br />
Here is a short r-script that demonstrates this idea: [[file: Gray_Q2MeasError.r]]<br />
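<br />
The argument above can also be checked by simulation. The following is a minimal sketch and not the linked script: the sample size, effect sizes and error variances are all made-up assumptions. In this particular setup Coffee and Stress are fairly strongly correlated, so it is worth comparing the fit with noisy Stress against the fit with the true values as well; with a weaker correlation the shift in the Coffee coefficient is smaller.<br />

```r
set.seed(6627)
n <- 5000

# Assumed data-generating model (all numbers are illustrative)
stress <- rnorm(n)
coffee <- 0.6 * stress + rnorm(n)                  # Coffee is correlated with the confounder
health <- -0.5 * coffee - 1.0 * stress + rnorm(n)  # true Coffee effect is -0.5

# Add measurement error to one variable at a time
coffee_noisy <- coffee + rnorm(n)
stress_noisy <- stress + rnorm(n)

# Coffee coefficient under each measurement scenario
b_true         <- unname(coef(lm(health ~ coffee + stress))["coffee"])
b_noisy_coffee <- unname(coef(lm(health ~ coffee_noisy + stress))["coffee_noisy"])
b_noisy_stress <- unname(coef(lm(health ~ coffee + stress_noisy))["coffee"])

round(c(true = b_true,
        noisy_coffee = b_noisy_coffee,
        noisy_stress = b_noisy_stress), 3)
```

Error in Coffee pulls its coefficient toward zero (attenuation). Error in Stress leaves part of the confounding uncontrolled, so the Coffee coefficient absorbs some of the Stress effect.<br />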
<br />
==== 5.==== <br />
;Question: In a multiple regression of Y on three predictors, X1, X2 and X3, if the coefficients of both X2 and X3, are not significant, it is safe to drop these two variable and perform a regression on X1 alone. <br />
<br />
;Discussion:<br />
No. Two predictors can each be non-significant yet jointly significant, for example when they are strongly correlated with each other. One way to proceed is to start with all three variables in the model and drop the least "significant" one at a time until only "significant" variables remain. We can also perform a stepwise selection, dropping variables that are no longer "significant" after new variables are introduced. <br />
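<br />
Why dropping X2 and X3 together is unsafe can be sketched with simulated data. This is an illustration under assumed values, not part of the original answer: x3 is generated as a near copy of x2, so their individual standard errors are inflated, while the F-test for dropping both at once compares the two nested models directly.<br />

```r
set.seed(5)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x2 + rnorm(n, sd = 0.05)   # x3 is almost a copy of x2 (strong collinearity)
y  <- x1 + x2 + x3 + rnorm(n)    # both x2 and x3 truly matter

full    <- lm(y ~ x1 + x2 + x3)
reduced <- lm(y ~ x1)

# Individual t-tests: look at the standard errors for x2 and x3
summary(full)$coefficients

# Joint F-test for dropping x2 and x3 together
joint <- anova(reduced, full)
joint
```

The same comparison also shows how the coefficient and p-value for x1 differ between the reduced and the full model.<br />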
<br />
<br />
==== 8.====<br />
;Question: In a multiple regression, if you drop a predictor whose effect is not significant, the p-values of the other predictors should not change very much. <br />
<br />
;Discussion:<br />
No. After dropping the predictor, the model changes. Even if the dropped predictor's effect is not significant, the p-values of the other predictors can change. We can see this from [http://www.utdallas.edu/~wiorkow%20/documents/Mod1Lect4Regnew.doc this example @ page 26-28] <br />
<br />
==== 11.==== <br />
;Question: In a model to assess the effect of a number of treatments on some outcome, we can estimate the difference between the best treatment and the worse treatment by using the difference in the mean outcomes. <br />
<br />
;Discussion: <br />
No. Size of tumors does provide some information about two treatments, one of which is the best and the other the worst. But it is not a good idea to estimate the difference between the two treatments by using the difference in tumor size. Such an estimate is biased, because other factors also affect the outcome, such as time with the disease. For example, our target is New York when we drive out of Toronto, but some day we find that everyone around us is speaking Spanish. But the answer may be yes too, as Barack Obama's slogan showed: "Yes we can". <br />
<br />
==== 14.==== <br />
;Question: If two variables have a strong interaction, this implies a strong correlation. <br />
<br />
;Discussion:<br />
<br />
False. An interaction exists if the effect of one independent variable (IV) on the dependent variable varies over the levels of another independent variable. This tells us nothing about the relationship between the IVs.<br />
An interaction between two variables can occur whether or not the variables are correlated with one another.<br />
<br />
{| class="wikitable"<br />
|-<br />
! <br />
! No Interaction<br />
! Interaction<br />
|-<br />
| '''No Correlation'''<br />
| [[File:Gray_Q14_nCnI.gif]] [[File:X1X2_Uncorr_NoInt.png|200px]] <br />
| [[File:Gray_Q14_nCI.gif]] [[File:X1X2_Uncorr_Int.png|200px]]<br />
|-<br />
| '''Correlation'''<br />
| [[File:Gray_Q14_CnI.gif]] [[File:X1X2_Corr_NoInt.png|200px]] <br />
| [[File:Gray_Q14_CI.gif]] [[File:X1X2_Corr_Int.png|200px]]<br />
|}<br />
<br />
[[File: Gray_Q14.R]]<br />
<br />
Here is an additional example using binary variables:<br />
[http://cnx.org/content/m31440/latest/ Illustration of the difference between correlation and interaction amongst independent variables]<br />
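<br />
The "No Correlation / Interaction" cell of the table can be reproduced with simulated data. This is a minimal sketch with assumed coefficients, separate from the linked Gray_Q14.R script: x1 and x2 are generated independently, yet the effect of x1 on y depends on x2.<br />

```r
set.seed(14)
n  <- 5000
x1 <- rnorm(n)
x2 <- rnorm(n)                               # generated independently of x1
y  <- 1 + x1 + x2 + 2 * x1 * x2 + rnorm(n)   # slope of x1 depends on x2

r_x1x2 <- cor(x1, x2)                        # sample correlation between the IVs
fit    <- lm(y ~ x1 * x2)
p_int  <- summary(fit)$coefficients["x1:x2", "Pr(>|t|)"]

r_x1x2   # close to zero
p_int    # p-value for the interaction term
```

Replacing the second line of the simulation with something like x2 <- 0.8 * x1 + rnorm(n) would give the "Correlation / Interaction" cell instead.<br />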
<br />
== Assignment 3==<br />
<br />
1) As we saw, 'Sector' appears to be an important predictor. Consider models using ses and Sector. Aim to estimate the between Sector gap as a function of ses if there is an interaction between Sector and ses. Check for and provide for a possible contextual effect of ses. Plot expected math achievement in each sector. Plot the gap with SEs. Consider the possibility that the apparently flatter effect of ses in Catholic school could be due to a non-linear effect of ses. How would you test whether this is a reasonable alternative explanation?<br />
<br />
2) Take the example further by incorporating Sex. Consider the 'contextual effect' of Sex which is school sex composition. Note that there are three types of schools: Girls, Boys and Coed schools. If you consider an interaction between Sector and school gender composition, you will see that the Public Sector only has Coed schools. What is the consequence of this fact for modelling sex composition and Sector effects?<br />
<br />
3) Does it appear that boys are better off in a boy's school and girls in a girl's school or are they better off in coed schools? How would you qualify your findings so parents don't misinterpret them in making decisions for their children?<br />
<br />
4) Is a low ses child better off in a high ses school or are they better off in a school of a similar ses? How about a high ses child in a low ses school? How would you qualify your findings so parents don't misinterpret them in making decisions for their children?<br />
<br />
5) Is a minority status child better off in a school with a higher proportion of minority status children or are they better off in a school with a low proportion? How would you qualify your findings so parents don't misinterpret them in making decisions for their children?</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Andy_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Andy Li2011-03-05T23:04:01Z<p>Andytli: </p>
<hr />
<div>====Welcome to my wiki page for the Statistical Consulting Practicum====<br />
<br />
==About Me==<br />
My name is Andy Li. I began undergraduate studies in Statistics at York in 2008. After two years of study, I am now a master's student in Applied Statistics. I also have a bachelor's degree in Chemical Engineering, and I worked as a B2B salesman for many years. I have used SAS, R, MATLAB, Minitab and Maple in my classes. I am very happy to join our 6627 team to learn more about Statistics. I hope we can build not only our knowledge but also our friendship in the coming days. <br />
<br />
==Sample Exam Questions==<br />
=== Week 1 ===<br />
(In class) There are 11 students in the 6627 class, 5 of whom are male. There is a statistics meeting on campus. All the students want to attend as visitors, but only 6 seats are left. So the professor needs to choose 6 students at random and wants exactly two of them to be male, to help set up the tables. In how many ways can he select the students?<br />
<br />
Answer: 5C2*6C4=150<br />
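<br />
The count can be verified in R with ''choose()''. This sketch reads the question as requiring exactly two males, which is the reading the answer above implies; the variable names are only illustrative.<br />

```r
ways_males   <- choose(5, 2)   # pick 2 of the 5 male students: 10
ways_females <- choose(6, 4)   # pick the other 4 from the 6 female students: 15
total_ways   <- ways_males * ways_females
total_ways                     # 150
```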
<br />
=== Week 2 ===<br />
[[File:d34.jpg]]<br />
<br />
(In class) We have the cloud of our data, the data ellipse and the centre point. We also know that about 40% of the data fall inside the inner ellipse and 86% inside the outer ellipse. What is the radius of the outer ellipse? What are Sx, Sy, Sy.x and the 95% confidence interval?<br />
<br />
Answer: Please see the lecture notes of 2nd lecture. <br />
<br />
=== Week 3 ===<br />
Q: Explain the multiple regression Coefficients in three dimensions.<br />
<br />
A: [[File:g1.png]] Beta0 defines the Y-intercept of the plane. Beta1 defines the slope of the plane along the x1 axis, and beta2 defines the slope of the plane along the x2 axis. The error term ε adds noise to the plane so that it is bumpy instead of being perfectly smooth. <br />
<br />
[[File:g2.png]]<br />
=== Week 4 ===<br />
Q: Which of the following describe correlation, and which describe interaction?<br />
<br />
a) People with more years in jail tend to have fewer years of education.<br />
<br />
b) The more money I save, the more financially secure I feel. <br />
<br />
c) To give a good taste, she squeezed a lemon into her black tea and stirred the tea.<br />
<br />
d) Helping others in the Stats lab is one of our TA jobs. Another is grading. <br />
<br />
e) Steel has the ability to become harder and stronger through adding carbon to it and quenching (rapidly cooling).<br />
<br />
f) The more years of education I complete, the higher my earning potential.<br />
<br />
g) Professors drink too much coffee, because their work is stressful.<br />
<br />
A: Correlation: a b f g ; interaction: c e<br />
<br />
=== Week 5 ===<br />
Q: We have 5 points at (1,1), (1,2), (2,1), (2,2) and (2,3). What does the linear regression line look like? <br />
<br />
A: Go to [http://illuminations.nctm.org/LessonDetail.aspx?ID=L454#nextone this web page] and try it. Add more points to get a feel for the rule.<br />
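<br />
The line can also be computed directly in R. This is a small sketch using the five points from the question; the data frame name ''pts'' is just a placeholder.<br />

```r
# The five points from the question
pts <- data.frame(x = c(1, 1, 2, 2, 2),
                  y = c(1, 2, 1, 2, 3))

fit <- lm(y ~ x, data = pts)
coef(fit)          # intercept 1, slope 0.5

plot(pts$x, pts$y, xlab = "x", ylab = "y")
abline(fit)        # the fitted line y = 1 + 0.5 x
```

With only two distinct x values, the least-squares line passes through the mean of y at each of them: 1.5 at x = 1 and 2 at x = 2.<br />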
<br />
=== Week 6 ===<br />
Q: Fill in the blanks: <br />
MANOVA is the abbreviation of ______(1)______. Where sums of squares appear in univariate analysis of variance, in MANOVA certain ________(2)________ appear, and the sums of squares appear as their ____(3)_____ entries.<br />
<br />
A: (1)Multivariate analysis of variance (2)positive-definite matrices (3)diagonal <br />
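<br />
The relationship between the matrices in (2) and the univariate sums of squares in (3) can be seen in R. This is a sketch using the built-in ''iris'' data, chosen here only for illustration and not part of the course material.<br />

```r
# MANOVA with two dependent variables and one categorical factor
fit <- manova(cbind(Sepal.Length, Petal.Length) ~ Species, data = iris)
summary(fit, test = "Pillai")

# Sums-of-squares-and-cross-products (SSCP) matrices for each term
sscp <- summary(fit)$SS
sscp$Species       # hypothesis SSCP matrix
sscp$Residuals     # error SSCP matrix

# Univariate ANOVA for the first dependent variable
uni <- anova(lm(Sepal.Length ~ Species, data = iris))
uni[["Sum Sq"]][1] # matches the first diagonal entry of sscp$Species
```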
<br />
=== Week 7 ===<br />
Q: What are the assumptions of MANOVA?<br />
<br />
A: <br />
1. Independent Random Sampling: MANOVA assumes that the observations are independent of one another and that the sample is completely random, with no pattern in how it was selected.<br />
<br />
2. Level and Measurement of the Variables: MANOVA assumes that the independent variables are categorical and the dependent variables are continuous or scale variables.<br />
<br />
3. Linearity of dependent variables: The dependent variables may be correlated with each other or independent of each other. Moderately correlated dependent variables are preferred; if the dependent variables are independent of each other, we sacrifice degrees of freedom, which decreases the power of the analysis.<br />
<br />
4. Multivariate Normality: Multivariate normality is present in the data. <br />
<br />
5. Multivariate Homogeneity of Variance: The variance-covariance matrices of the dependent variables are equal across groups.<br />
<br />
==Statistics in the Media==<br />
=== Week 1 ===<br />
[http://endoftheamericandream.com/archives/50-statistics-about-the-u-s-economy-that-are-almost-too-crazy-to-believe 50 Statistics About The U.S. Economy That Are Almost Too Crazy To Believe]<br />
<br />
Can you see 2010 is crisis year? It's NOT the time to change! <br />
[http://moneywatch.bnet.com/economic-news/blog/financial-decoder/18-scary-us-debt-facts/2824/ 18 Scary US Debt Facts] <br />
<br />
If there were no statistics, the people across the border might be raising a toast to this year's big success right now.<br />
<br />
=== Week 2 ===<br />
[http://timesofindia.indiatimes.com/city/varanasi/4-day-conference-begins-at-BHU/articleshow/7224523.cms 4-day conference begins at BHU] Banaras Hindu University (BHU), Hindi: काशी हिन्दू विश्वविद्यालय, is a premier central university and a world class educational institution located in Varanasi, India. It is regarded as the largest residential university in Asia.<br />
<br />
=== Week 3 ===<br />
[http://www.newswire.ca/en/releases/archive/January2011/19/c3506.html Greater Toronto & Hamilton Area Transit Choices: Report finds much higher transportation emissions from road vs. rail]<br />
<br />
This is a current news item on a very old topic. It comes from two of Canada's leading think tanks, Sustainable Prosperity and the Pembina Institute, and restates what is already common knowledge. The report is "Putting Transportation on Track", and it offers nothing new.<br />
<br />
=== Week 4 ===<br />
[http://2010.census.gov/2010census/#/panel-1 2010 US Census results are now available]<br />
<br />
As of Jan. 18, 2011, the 2010 US Census data is accessible online. It is easy to reach and will be appreciated by many fact finders. The other highlight is the survey itself: it asks just 10 questions and takes about 10 minutes to complete.<br />
<br />
=== Week 5 ===<br />
So now, everyone knows Statistics is not useless and Statisticians are a group of supermen. [http://www.thestar.com/news/article/933200--toronto-man-cracked-the-code-to-scratch-lottery-tickets?bn=1 Toronto man cracked the code to scratch-lottery tickets] <br />
<br />
=== Week 6 ===<br />
[http://www.thestarphoenix.com/technology/City+growing+quickly/4222880/story.html City growing quickly - Saskatoon outpaces all metro areas in Canada] You can choose to find a job there and they have a well known immigration program for international students. <br />
<br />
=== Week 7 ===<br />
Crime stats in Canada: [http://fullcomment.nationalpost.com/2011/02/28/brian-lee-crowley-crime-stats-change-based-on-who-counts-them/ Brian Lee Crowley: Crime stats change based on who counts them]<br />
<br />
==Questions and Comments on Groupwork and Class Lectures==<br />
=== Week 1 ===<br />
For the example about smoking and life expectancy that we discussed in class, I think there are two more factors we should take into consideration. One is the age at which people start smoking. I know that kids smoke in some poor countries because they don't know it is harmful. The other is that different types of cigarette have different effects. Whatever type of cigarette you smoke, it will always be ‘injurious to health’, but cigarettes in rich countries contain relatively less addictive nicotine and tar, are equipped with long filters, and are not allowed to contain harmful flavor additives. I know this because I used to be a heavy smoker and have smoked more than 100 brands of cigars from different countries. But yes, I am still alive. <br />
<br />
=== Week 2 ===<br />
Some classmates couldn't install the spida and p3d packages. Below is the procedure you should follow:<br />
<br />
1. uninstall your current R.<br />
<br />
2. go to [http://cran.r-project.org/bin/windows/base/ R download] to download a new R.<br />
<br />
3. do a short installation of R. <br />
<br />
4. run <br />
<br />
> install.packages( c("car","rgl"))<br />
<br />
> install.packages( c("spida","p3d"), repos = "http://R-Forge.R-project.org")<br />
<br />
=== Week 3 ===<br />
It was really amazing to see the data move using GoogleVis and p3d in the last lecture. This will be very useful in consulting and presentations. I think these methods are limited when it comes to analyzing complex models, but we could choose three variables at a time from a model with many variables and draw a 3D plot to see the trend.<br />
<br />
=== Week 4 ===<br />
Carrie has a question about the sign: In the slides for Visualizing Multiple Regression, in the section on the data ellipse for predictors (page 30), the general predictor has a notation I'm not familiar with and missed in lecture. What does the plus sign in a circle mean?<br />
<br />
Answer: In logic, the plus sign in a circle denotes "exclusive or"; the Wikipedia article is [http://en.wikipedia.org/wiki/Exclusive_or here]. In the notes, however, it is given its own definition in the summary on page 88.<br />
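<br />
For reference, the logical meaning mentioned in the answer can be sketched in Python, where the `^` operator applied to booleans gives exclusive or. This only illustrates the logic reading of the symbol; as the answer says, the course notes define it separately.<br />

```python
from itertools import product

# Truth table for exclusive or: the result is True when exactly one input is True.
table = {(a, b): a ^ b for a, b in product([False, True], repeat=2)}

for (a, b), result in table.items():
    print(a, b, result)
```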
<br />
=== Week 5 ===<br />
Here is a set of slides on outlier detection. I hope it is helpful. [http://webdocs.cs.ualberta.ca/~zaiane/courses/cau/slides/cau-Lecture7.pdf slides]<br />
<br />
=== Week 6 ===<br />
What does this mean: "The coefficient for "ses" (2.2999) is NOT "the estimated effect of ses" – it is the estimated "effect" of ses" (on page 46 of the Hierarchical_Models slides)? <br />
<br />
=== Week 7 ===<br />
What does that mean "The smaller ellipse has 95% shadows."(in page 61 of the slide of Hierarchical_Models)?</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Andy_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Andy Li2011-03-05T23:03:19Z<p>Andytli: </p>
<hr />
<div>====Welcome to my wiki page for the Statistical Consulting Practicum====<br />
<br />
==About Me==<br />
My name is Andy Li. I started my undergraduate studies in Statistics at York in 2008. After two years of study, I am now a master's student in Applied Statistics. I also have a bachelor's degree in Chemical Engineering, and I worked as a B2B salesman for many years. I have used SAS, R, MATLAB, Minitab and Maple in my classes. I am very happy to join our 6627 team to learn more about Statistics. I hope we can build up not only our knowledge but also our friendship in the coming days. <br />
<br />
==Sample Exam Questions==<br />
=== Week 1 ===<br />
(In class) There are 11 students in the 6627 class, 5 of whom are male. There is a statistics meeting on campus. All the students want to attend as visitors, but there are only 6 seats left. So the professor needs to choose 6 students at random and wants exactly two of them to be male, to help set up the tables. In how many ways can he select the students for his experiment?<br />
<br />
Answer: 5C2*6C4=150<br />
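<br />
The count can be checked directly. The following is a minimal Python sketch, used here only for illustration; the helper name `committees` is hypothetical, and it assumes that "two of them" means exactly two males, so the other four seats go to the six remaining students.<br />

```python
from math import comb

def committees(males: int, others: int, seats: int, males_wanted: int) -> int:
    """Number of ways to fill `seats` with exactly `males_wanted` males."""
    return comb(males, males_wanted) * comb(others, seats - males_wanted)

# 5C2 * 6C4 = 10 * 15
print(committees(males=5, others=6, seats=6, males_wanted=2))  # 150
```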
<br />
=== Week 2 ===<br />
[[File:d34.jpg]]<br />
<br />
(In class) We have the cloud of our data, the data ellipse and the centre point. We also know that about 40% of the data fall inside the inner ellipse and about 86% inside the outer ellipse. What is the radius of the outer ellipse? What are Sx, Sy, Sy.x and the 95% confidence interval?<br />
<br />
Answer: Please see the notes for the 2nd lecture. <br />
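<br />
As a hint for the radius question, the Python sketch below assumes the data are roughly bivariate normal. In that case a data ellipse of radius r is expected to contain a share 1 - exp(-r^2/2) of the points. This assumption and the function names are illustrative and are not taken from the lecture notes.<br />

```python
import math

def coverage(radius: float) -> float:
    """Expected share of bivariate normal data inside a data ellipse of this radius."""
    return 1.0 - math.exp(-radius ** 2 / 2.0)

def radius_for(share: float) -> float:
    """Radius of the data ellipse expected to contain the given share of the data."""
    return math.sqrt(-2.0 * math.log(1.0 - share))

for r in (1, 2):
    print(r, round(coverage(r), 3))  # radius 1 -> 0.393, radius 2 -> 0.865

print(round(radius_for(0.86), 2))    # close to 2
```

Under this assumption the 40% and 86% ellipses correspond to radii of about 1 and 2.<br />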
<br />
=== Week 3 ===<br />
Q: Explain the multiple regression coefficients in three dimensions.<br />
<br />
A: [[File:g1.png]] Beta0 defines the Y-intercept of the plane. Beta1 defines the slope of the plane along the x1 axis, and beta2 defines the slope of the plane along the x2 axis. The error term ε adds noise, so the observed points scatter around the plane instead of lying exactly on it. <br />
<br />
[[File:g2.png]]<br />
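<br />
Written as an equation, the plane described in the answer is the standard two-predictor regression model. The observation index i is added here for clarity and does not appear in the original answer.<br />

```latex
y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i
```

Holding x2 fixed, a one-unit increase in x1 changes the expected value of y by beta1, and likewise for beta2 with x1 held fixed.<br />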
=== Week 4 ===<br />
Q: Which of the following statements describe correlation, and which describe interaction?<br />
<br />
a) People with more years in jail tend to have fewer years of education.<br />
<br />
b) The more money I save, the more financially secure I feel. <br />
<br />
c) To give a good taste, she squeezed a lemon into her black tea and stirred the tea.<br />
<br />
d) Helping others in the Stats lab is one of our TA jobs. Another is grading. <br />
<br />
e) Steel has the ability to become harder and stronger through adding carbon to it and quenching (rapidly cooling).<br />
<br />
f) The more years of education I complete, the higher my earning potential.<br />
<br />
g) Professors drink too much coffee because their work is stressful.<br />
<br />
A: Correlation: a b f g ; interaction: c e<br />
<br />
=== Week 5 ===<br />
Q: We have 5 points at (1,1), (1,2), (2,1), (2,2) and (2,3). What does the linear regression line look like? <br />
<br />
A: Go to [http://illuminations.nctm.org/LessonDetail.aspx?ID=L454#nextone this web page] and try it. Add more points to get an idea of the rule.<br />
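<br />
The line can also be computed by hand. The sketch below is a plain-Python version of the usual least-squares formulas (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x); the function name is hypothetical and chosen only for this example.<br />

```python
def least_squares_line(points):
    """Return (intercept, slope) of the ordinary least-squares line through points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

points = [(1, 1), (1, 2), (2, 1), (2, 2), (2, 3)]
intercept, slope = least_squares_line(points)
print(round(intercept, 6), round(slope, 6))  # 1.0 0.5
```

Because there are only two distinct x values, the fitted line y = 1 + 0.5x passes through the mean of y at each of them: (1, 1.5) and (2, 2).<br />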
<br />
=== Week 6 ===<br />
Q: Fill in the blanks: <br />
MANOVA is the abbreviation of ______(1)______. Where sums of squares appear in univariate analysis of variance, in MANOVA certain ________(2)________ appear, and the sums of squares appear as the ____(3)_____ entries.<br />
<br />
A: (1)Multivariate analysis of variance (2)positive-definite matrices (3)diagonal <br />
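<br />
To illustrate blank (3), the sketch below builds the total sums-of-squares-and-cross-products (SSCP) matrix for a small made-up data set with two response variables and shows that its diagonal entries are the ordinary univariate sums of squares. The data and the function names are invented for this example.<br />

```python
def sscp(rows):
    """Total sums-of-squares-and-cross-products matrix about the column means."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[j] - means[j]) * (r[k] - means[k]) for r in rows)
             for k in range(p)]
            for j in range(p)]

def sum_of_squares(values):
    """Univariate sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

# Made-up observations of two response variables (y1, y2).
data = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
T = sscp(data)
print(T)                                      # [[5.0, 3.0], [3.0, 5.0]]
print(sum_of_squares([r[0] for r in data]))   # 5.0, same as T[0][0]
```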
<br />
=== Week 7 ===<br />
Q: What are the assumptions of MANOVA?<br />
<br />
A: <br />
1. Independent Random Sampling: MANOVA assumes that the observations are independent of one another and that the sample is completely random, with no pattern in how it was selected.<br />
2. Level and Measurement of the Variables: MANOVA assumes that the independent variables are categorical and the dependent variables are continuous (scale) variables.<br />
3. Linearity of the dependent variables: The dependent variables may be correlated with each other or independent of each other. Studies show that moderately correlated dependent variables are preferred. If the dependent variables are independent of each other, we sacrifice degrees of freedom, which decreases the power of the analysis.<br />
4. Multivariate Normality: The dependent variables follow a multivariate normal distribution within each group. <br />
5. Multivariate Homogeneity of Variance: The variance-covariance matrices of the dependent variables are equal across groups.<br />
<br />
==Statistics in the Media==<br />
=== Week 1 ===<br />
[http://endoftheamericandream.com/archives/50-statistics-about-the-u-s-economy-that-are-almost-too-crazy-to-believe 50 Statistics About The U.S. Economy That Are Almost Too Crazy To Believe]<br />
<br />
Can you see that 2010 is a crisis year? It's NOT the time to change! <br />
[http://moneywatch.bnet.com/economic-news/blog/financial-decoder/18-scary-us-debt-facts/2824/ 18 Scary US Debt Facts] <br />
<br />
Without statistics, the people across the border might be toasting the great success of this year right now.<br />
<br />
=== Week 2 ===<br />
[http://timesofindia.indiatimes.com/city/varanasi/4-day-conference-begins-at-BHU/articleshow/7224523.cms 4-day conference begins at BHU] Banaras Hindu University (BHU), Hindi: काशी हिन्दू विश्वविद्यालय, is a premier central university and a world class educational institution located in Varanasi, India. It is regarded as the largest residential university in Asia.<br />
<br />
=== Week 3 ===<br />
[http://www.newswire.ca/en/releases/archive/January2011/19/c3506.html Greater Toronto & Hamilton Area Transit Choices: Report finds much higher transportation emissions from road vs. rail]<br />
<br />
This is a current news item on a very old topic. It comes from two of Canada's leading think tanks, Sustainable Prosperity and the Pembina Institute, and restates what is already common knowledge. The report is "Putting Transportation on Track", and it offers nothing new.<br />
<br />
=== Week 4 ===<br />
[http://2010.census.gov/2010census/#/panel-1 2010 US Census results are now available]<br />
<br />
As of Jan. 18, 2011, the 2010 US Census data is accessible online. It is easy to reach and will be appreciated by many fact finders. The other highlight is the survey itself: it asks just 10 questions and takes about 10 minutes to complete.<br />
<br />
=== Week 5 ===<br />
So now, everyone knows Statistics is not useless and Statisticians are a group of supermen. [http://www.thestar.com/news/article/933200--toronto-man-cracked-the-code-to-scratch-lottery-tickets?bn=1 Toronto man cracked the code to scratch-lottery tickets] <br />
<br />
=== Week 6 ===<br />
[http://www.thestarphoenix.com/technology/City+growing+quickly/4222880/story.html City growing quickly - Saskatoon outpaces all metro areas in Canada] You can choose to find a job there and they have a well known immigration program for international students. <br />
<br />
=== Week 7 ===<br />
Crime stats in Canada: [http://fullcomment.nationalpost.com/2011/02/28/brian-lee-crowley-crime-stats-change-based-on-who-counts-them/ Brian Lee Crowley: Crime stats change based on who counts them]<br />
<br />
==Questions and Comments on Groupwork and Class Lectures==<br />
=== Week 1 ===<br />
For the example about smoking and life expectancy that we discussed in class, I think there are two more factors we should take into consideration. One is the age at which people start smoking. I know that kids smoke in some poor countries because they don't know it is harmful. The other is that different types of cigarette have different effects. Whatever type of cigarette you smoke, it will always be ‘injurious to health’, but cigarettes in rich countries contain relatively less addictive nicotine and tar, are equipped with long filters, and are not allowed to contain harmful flavor additives. I know this because I used to be a heavy smoker and have smoked more than 100 brands of cigars from different countries. But yes, I am still alive. <br />
<br />
=== Week 2 ===<br />
Some classmates couldn't install the spida and p3d packages. Below is the procedure you should follow:<br />
<br />
1. uninstall your current R.<br />
<br />
2. go to [http://cran.r-project.org/bin/windows/base/ R download] to download a new R.<br />
<br />
3. do a short installation of R. <br />
<br />
4. run <br />
<br />
> install.packages( c("car","rgl"))<br />
<br />
> install.packages( c("spida","p3d"), repos = "http://R-Forge.R-project.org")<br />
<br />
=== Week 3 ===<br />
It was really amazing to see the data move using GoogleVis and p3d in the last lecture. This will be very useful in consulting and presentations. I think these methods are limited when it comes to analyzing complex models, but we could choose three variables at a time from a model with many variables and draw a 3D plot to see the trend.<br />
<br />
=== Week 4 ===<br />
Carrie has a question about the sign: In the slides for Visualizing Multiple Regression, in the section on the data ellipse for predictors (page 30), the general predictor has a notation I'm not familiar with and missed in lecture. What does the plus sign in a circle mean?<br />
<br />
Answer: In logic, the plus sign in a circle denotes "exclusive or"; the Wikipedia article is [http://en.wikipedia.org/wiki/Exclusive_or here]. In the notes, however, it is given its own definition in the summary on page 88.<br />
<br />
=== Week 5 ===<br />
Here is a set of slides on outlier detection. I hope it is helpful. [http://webdocs.cs.ualberta.ca/~zaiane/courses/cau/slides/cau-Lecture7.pdf slides]<br />
<br />
=== Week 6 ===<br />
What does this mean: "The coefficient for "ses" (2.2999) is NOT "the estimated effect of ses" – it is the estimated "effect" of ses" (on page 46 of the Hierarchical_Models slides)? <br />
<br />
=== Week 7 ===<br />
What does that mean "The smaller ellipse has 95% shadows."(in page 61 of the slide of Hierarchical_Models)?</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Andy_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Andy Li2011-03-05T23:01:57Z<p>Andytli: </p>
<hr />
<div>====Welcome to my wiki page for the Statistical Consulting Practicum====<br />
<br />
==About Me==<br />
My name is Andy Li. I started my undergraduate studies in Statistics at York in 2008. After two years of study, I am now a master's student in Applied Statistics. I also have a bachelor's degree in Chemical Engineering, and I worked as a B2B salesman for many years. I have used SAS, R, MATLAB, Minitab and Maple in my classes. I am very happy to join our 6627 team to learn more about Statistics. I hope we can build up not only our knowledge but also our friendship in the coming days. <br />
<br />
==Sample Exam Questions==<br />
=== Week 1 ===<br />
(In class) There are 11 students in the 6627 class, 5 of whom are male. There is a statistics meeting on campus. All the students want to attend as visitors, but there are only 6 seats left. So the professor needs to choose 6 students at random and wants exactly two of them to be male, to help set up the tables. In how many ways can he select the students for his experiment?<br />
<br />
Answer: 5C2*6C4=150<br />
<br />
=== Week 2 ===<br />
[[File:d34.jpg]]<br />
<br />
(In class) We have the cloud of our data, the data ellipse and the centre point. We also know that about 40% of the data fall inside the inner ellipse and about 86% inside the outer ellipse. What is the radius of the outer ellipse? What are Sx, Sy, Sy.x and the 95% confidence interval?<br />
<br />
Answer: Please see the notes for the 2nd lecture. <br />
<br />
=== Week 3 ===<br />
Q: Explain the multiple regression coefficients in three dimensions.<br />
<br />
A: [[File:g1.png]] Beta0 defines the Y-intercept of the plane. Beta1 defines the slope of the plane along the x1 axis, and beta2 defines the slope of the plane along the x2 axis. The error term ε adds noise, so the observed points scatter around the plane instead of lying exactly on it. <br />
<br />
[[File:g2.png]]<br />
=== Week 4 ===<br />
Q: Which of the following statements describe correlation, and which describe interaction?<br />
<br />
a) People with more years in jail tend to have fewer years of education.<br />
<br />
b) The more money I save, the more financially secure I feel. <br />
<br />
c) To give a good taste, she squeezed a lemon into her black tea and stirred the tea.<br />
<br />
d) Helping others in the Stats lab is one of our TA jobs. Another is grading. <br />
<br />
e) Steel has the ability to become harder and stronger through adding carbon to it and quenching (rapidly cooling).<br />
<br />
f) The more years of education I complete, the higher my earning potential.<br />
<br />
g) Professors drink too much coffee because their work is stressful.<br />
<br />
A: Correlation: a b f g ; interaction: c e<br />
<br />
=== Week 5 ===<br />
Q: We have 5 points at (1,1), (1,2), (2,1), (2,2) and (2,3). What does the linear regression line look like? <br />
<br />
A: Go to [http://illuminations.nctm.org/LessonDetail.aspx?ID=L454#nextone this web page] and try it. Add more points to get an idea of the rule.<br />
<br />
=== Week 6 ===<br />
Q: Fill in the blanks: <br />
MANOVA is the abbreviation of ______(1)______. Where sums of squares appear in univariate analysis of variance, in MANOVA certain ________(2)________ appear, and the sums of squares appear as the ____(3)_____ entries.<br />
<br />
A: (1)Multivariate analysis of variance (2)positive-definite matrices (3)diagonal <br />
<br />
=== Week 7 ===<br />
Q: What are the assumptions of MANOVA?<br />
<br />
A: 1. Independent Random Sampling: MANOVA assumes that the observations are independent of one another and that the sample is completely random, with no pattern in how it was selected.<br />
2. Level and Measurement of the Variables: MANOVA assumes that the independent variables are categorical and the dependent variables are continuous (scale) variables.<br />
3. Linearity of the dependent variables: The dependent variables may be correlated with each other or independent of each other. Studies show that moderately correlated dependent variables are preferred. If the dependent variables are independent of each other, we sacrifice degrees of freedom, which decreases the power of the analysis.<br />
4. Multivariate Normality: The dependent variables follow a multivariate normal distribution within each group.<br />
5. Multivariate Homogeneity of Variance: The variance-covariance matrices of the dependent variables are equal across groups.<br />
<br />
==Statistics in the Media==<br />
=== Week 1 ===<br />
[http://endoftheamericandream.com/archives/50-statistics-about-the-u-s-economy-that-are-almost-too-crazy-to-believe 50 Statistics About The U.S. Economy That Are Almost Too Crazy To Believe]<br />
<br />
Can you see that 2010 is a crisis year? It's NOT the time to change! <br />
[http://moneywatch.bnet.com/economic-news/blog/financial-decoder/18-scary-us-debt-facts/2824/ 18 Scary US Debt Facts] <br />
<br />
Without statistics, the people across the border might be toasting the great success of this year right now.<br />
<br />
=== Week 2 ===<br />
[http://timesofindia.indiatimes.com/city/varanasi/4-day-conference-begins-at-BHU/articleshow/7224523.cms 4-day conference begins at BHU] Banaras Hindu University (BHU), Hindi: काशी हिन्दू विश्वविद्यालय, is a premier central university and a world class educational institution located in Varanasi, India. It is regarded as the largest residential university in Asia.<br />
<br />
=== Week 3 ===<br />
[http://www.newswire.ca/en/releases/archive/January2011/19/c3506.html Greater Toronto & Hamilton Area Transit Choices: Report finds much higher transportation emissions from road vs. rail]<br />
<br />
This is a current news item on a very old topic. It comes from two of Canada's leading think tanks, Sustainable Prosperity and the Pembina Institute, and restates what is already common knowledge. The report is "Putting Transportation on Track", and it offers nothing new.<br />
<br />
=== Week 4 ===<br />
[http://2010.census.gov/2010census/#/panel-1 2010 US Census results are now available]<br />
<br />
As of Jan. 18, 2011, the 2010 US Census data is accessible online. It is easy to reach and will be appreciated by many fact finders. The other highlight is the survey itself: it asks just 10 questions and takes about 10 minutes to complete.<br />
<br />
=== Week 5 ===<br />
So now, everyone knows Statistics is not useless and Statisticians are a group of supermen. [http://www.thestar.com/news/article/933200--toronto-man-cracked-the-code-to-scratch-lottery-tickets?bn=1 Toronto man cracked the code to scratch-lottery tickets] <br />
<br />
=== Week 6 ===<br />
[http://www.thestarphoenix.com/technology/City+growing+quickly/4222880/story.html City growing quickly - Saskatoon outpaces all metro areas in Canada] You can choose to find a job there and they have a well known immigration program for international students. <br />
<br />
=== Week 7 ===<br />
Crime stats in Canada: [http://fullcomment.nationalpost.com/2011/02/28/brian-lee-crowley-crime-stats-change-based-on-who-counts-them/ Brian Lee Crowley: Crime stats change based on who counts them]<br />
<br />
==Questions and Comments on Groupwork and Class Lectures==<br />
=== Week 1 ===<br />
For the example about smoking and life expectancy that we discussed in class, I think there are two more factors we should take into consideration. One is the age at which people start smoking. I know that kids smoke in some poor countries because they don't know it is harmful. The other is that different types of cigarette have different effects. Whatever type of cigarette you smoke, it will always be ‘injurious to health’, but cigarettes in rich countries contain relatively less addictive nicotine and tar, are equipped with long filters, and are not allowed to contain harmful flavor additives. I know this because I used to be a heavy smoker and have smoked more than 100 brands of cigars from different countries. But yes, I am still alive. <br />
<br />
=== Week 2 ===<br />
Some classmates couldn't install the spida and p3d packages. Below is the procedure you should follow:<br />
<br />
1. uninstall your current R.<br />
<br />
2. go to [http://cran.r-project.org/bin/windows/base/ R download] to download a new R.<br />
<br />
3. do a short installation of R. <br />
<br />
4. run <br />
<br />
> install.packages( c("car","rgl"))<br />
<br />
> install.packages( c("spida","p3d"), repos = "http://R-Forge.R-project.org")<br />
<br />
=== Week 3 ===<br />
It was really amazing to see the data move using GoogleVis and p3d in the last lecture. This will be very useful in consulting and presentations. I think these methods are limited when it comes to analyzing complex models, but we could choose three variables at a time from a model with many variables and draw a 3D plot to see the trend.<br />
<br />
=== Week 4 ===<br />
Carrie has a question about the sign: In the slides for Visualizing Multiple Regression, in the section on the data ellipse for predictors (page 30), the general predictor has a notation I'm not familiar with and missed in lecture. What does the plus sign in a circle mean?<br />
<br />
Answer: In logic, the plus sign in a circle denotes "exclusive or"; the Wikipedia article is [http://en.wikipedia.org/wiki/Exclusive_or here]. In the notes, however, it is given its own definition in the summary on page 88.<br />
<br />
=== Week 5 ===<br />
Here is a set of slides on outlier detection. I hope it is helpful. [http://webdocs.cs.ualberta.ca/~zaiane/courses/cau/slides/cau-Lecture7.pdf slides]<br />
<br />
=== Week 6 ===<br />
What does this mean: "The coefficient for "ses" (2.2999) is NOT "the estimated effect of ses" – it is the estimated "effect" of ses" (on page 46 of the Hierarchical_Models slides)? <br />
<br />
=== Week 7 ===<br />
What does that mean "The smaller ellipse has 95% shadows."(in page 61 of the slide of Hierarchical_Models)?</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Andy_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Andy Li2011-02-16T00:50:37Z<p>Andytli: </p>
<hr />
<div>====Welcome to my wiki page for the Statistical Consulting Practicum====<br />
<br />
==About Me==<br />
My name is Andy Li. I started my undergraduate studies in Statistics at York in 2008. After two years of study, I am now a master's student in Applied Statistics. I also have a bachelor's degree in Chemical Engineering, and I worked as a B2B salesman for many years. I have used SAS, R, MATLAB, Minitab and Maple in my classes. I am very happy to join our 6627 team to learn more about Statistics. I hope we can build up not only our knowledge but also our friendship in the coming days. <br />
<br />
==Sample Exam Questions==<br />
=== Week 1 ===<br />
(In class) There are 11 students in the 6627 class, 5 of whom are male. There is a statistics meeting on campus. All the students want to attend as visitors, but there are only 6 seats left. So the professor needs to choose 6 students at random and wants exactly two of them to be male, to help set up the tables. In how many ways can he select the students for his experiment?<br />
<br />
Answer: 5C2*6C4=150<br />
<br />
=== Week 2 ===<br />
[[File:d34.jpg]]<br />
<br />
(In class) We have the cloud of our data, the data ellipse and the centre point. We also know that about 40% of the data fall inside the inner ellipse and about 86% inside the outer ellipse. What is the radius of the outer ellipse? What are Sx, Sy, Sy.x and the 95% confidence interval?<br />
<br />
Answer: Please see the notes for the 2nd lecture. <br />
<br />
=== Week 3 ===<br />
Q: Explain the multiple regression coefficients in three dimensions.<br />
<br />
A: [[File:g1.png]] Beta0 defines the Y-intercept of the plane. Beta1 defines the slope of the plane along the x1 axis, and beta2 defines the slope of the plane along the x2 axis. The error term ε adds noise, so the observed points scatter around the plane instead of lying exactly on it. <br />
<br />
[[File:g2.png]]<br />
<br />
=== Week 4 ===<br />
Q: Which of the following statements describe correlation, and which describe interaction?<br />
<br />
a) People with more years in jail tend to have fewer years of education.<br />
<br />
b) The more money I save, the more financially secure I feel. <br />
<br />
c) To give a good taste, she squeezed a lemon into her black tea and stirred the tea.<br />
<br />
d) Helping others in the Stats lab is one of our TA jobs. Another is grading. <br />
<br />
e) Steel has the ability to become harder and stronger through adding carbon to it and quenching (rapidly cooling).<br />
<br />
f) The more years of education I complete, the higher my earning potential.<br />
<br />
g) Professors drink too much coffee because their work is stressful.<br />
<br />
A: Correlation: a b f g ; interaction: c e<br />
<br />
=== Week 5 ===<br />
Q: We have 5 points at (1,1), (1,2), (2,1), (2,2) and (2,3). What does the linear regression line look like? <br />
<br />
A: Go to [http://illuminations.nctm.org/LessonDetail.aspx?ID=L454#nextone this web page] and try it. Add more points to get an idea of the rule.<br />
<br />
<br />
==Statistics in the Media==<br />
=== Week 1 ===<br />
[http://endoftheamericandream.com/archives/50-statistics-about-the-u-s-economy-that-are-almost-too-crazy-to-believe 50 Statistics About The U.S. Economy That Are Almost Too Crazy To Believe]<br />
<br />
Can you see that 2010 is a crisis year? It's NOT the time to change! <br />
[http://moneywatch.bnet.com/economic-news/blog/financial-decoder/18-scary-us-debt-facts/2824/ 18 Scary US Debt Facts] <br />
<br />
Without statistics, the people across the border might be toasting the great success of this year right now.<br />
<br />
=== Week 2 ===<br />
[http://timesofindia.indiatimes.com/city/varanasi/4-day-conference-begins-at-BHU/articleshow/7224523.cms 4-day conference begins at BHU] Banaras Hindu University (BHU), Hindi: काशी हिन्दू विश्वविद्यालय, is a premier central university and a world class educational institution located in Varanasi, India. It is regarded as the largest residential university in Asia.<br />
<br />
=== Week 3 ===<br />
[http://www.newswire.ca/en/releases/archive/January2011/19/c3506.html Greater Toronto & Hamilton Area Transit Choices: Report finds much higher transportation emissions from road vs. rail]<br />
<br />
This is a current news item on a very old topic. It comes from two of Canada's leading think tanks, Sustainable Prosperity and the Pembina Institute, and restates what is already common knowledge. The report is "Putting Transportation on Track", and it offers nothing new.<br />
<br />
=== Week 4 ===<br />
[http://2010.census.gov/2010census/#/panel-1 2010 US Census results are now available]<br />
<br />
As of Jan. 18, 2011, the 2010 US Census data is accessible online. It is easy to reach and will be appreciated by many fact finders. The other highlight is the survey itself: it asks just 10 questions and takes about 10 minutes to complete.<br />
<br />
=== Week 5 ===<br />
So now, everyone knows Statistics is not useless and Statisticians are a group of supermen. [http://www.thestar.com/news/article/933200--toronto-man-cracked-the-code-to-scratch-lottery-tickets?bn=1 Toronto man cracked the code to scratch-lottery tickets] <br />
<br />
=== Week 6 ===<br />
[http://www.thestarphoenix.com/technology/City+growing+quickly/4222880/story.html City growing quickly - Saskatoon outpaces all metro areas in Canada] You can choose to find a job there and they have a well known immigration program for international students. <br />
<br />
==Questions and Comments on Groupwork and Class Lectures==<br />
=== Week 1 ===<br />
For the example about smoking and life expectancy that we discussed in class, I think there are two more factors we should take into consideration. One is the age at which people start smoking. I know that kids smoke in some poor countries because they don't know it is harmful. The other is that different types of cigarette have different effects. Whatever type of cigarette you smoke, it will always be ‘injurious to health’, but cigarettes in rich countries contain relatively less addictive nicotine and tar, are equipped with long filters, and are not allowed to contain harmful flavor additives. I know this because I used to be a heavy smoker and have smoked more than 100 brands of cigars from different countries. But yes, I am still alive. <br />
<br />
=== Week 2 ===<br />
Some classmates couldn't install the spida and p3d packages. Below is the procedure you should follow:<br />
<br />
1. uninstall your current R.<br />
<br />
2. go to [http://cran.r-project.org/bin/windows/base/ R download] to download a new R.<br />
<br />
3. do a short installation of R. <br />
<br />
4. run <br />
<br />
> install.packages( c("car","rgl"))<br />
<br />
> install.packages( c("spida","p3d"), repos = "http://R-Forge.R-project.org")<br />
<br />
=== Week 3 ===<br />
It was really amazing to see the data move using GoogleVis and p3d in the last lecture. This will be very useful in consulting and presentations. I think these methods are limited when it comes to analyzing complex models, but we could choose three variables at a time from a model with many variables and draw a 3D plot to see the trend.<br />
<br />
=== Week 4 ===<br />
Carrie has a question about the sign: In the slides for Visualizing Multiple Regression, in the section on the data ellipse for predictors (page 30), the general predictor has a notation I'm not familiar with and missed in lecture. What does the plus sign in a circle mean?<br />
<br />
Answer: In logic, the plus sign in a circle denotes "exclusive or"; the Wikipedia article is [http://en.wikipedia.org/wiki/Exclusive_or here]. In the notes, however, it is given its own definition in the summary on page 88.<br />
<br />
=== Week 5 ===<br />
Here are some slides on outlier detection. I hope they are helpful. [http://webdocs.cs.ualberta.ca/~zaiane/courses/cau/slides/cau-Lecture7.pdf slides]</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Andy_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Andy Li2011-02-16T00:42:04Z<p>Andytli: </p>
<hr />
<div>====Welcome to my wiki page for the Statistical Consulting Practicum====<br />
<br />
==About Me==<br />
My name is Andy Li. I started my undergraduate studies in Statistics at York in 2008. After two years of study, I am now a master's student in Applied Statistics. I also have a bachelor's degree in Chemical Engineering, and I worked as a B2B salesman for many years. I have used SAS, R, MATLAB, Minitab and Maple in my classes. I am very happy to join our 6627 team to learn more about Statistics. I hope we can build not only our knowledge but also our friendship in the coming days.<br />
<br />
==Sample Exam Questions==<br />
=== Week 1 ===<br />
(In class) There are 11 students in the 6627 class, 5 of whom are male. There is a statistics meeting on campus. All the students want to attend as visitors, but only 6 seats are left. So the professor needs to choose 6 students at random and wants exactly two of them to be male, to help set up the tables. In how many ways can he select the students?<br />
<br />
Answer: 5C2*6C4=150<br />
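<br />
The count can be checked in R. The sketch below computes the product with choose() and then confirms it by brute force, listing every group of 6 out of 11 students and keeping those with exactly 2 men. Labelling the men as students 1 to 5 is just a convention for the illustration.<br />

```r
n_male   <- 5
n_female <- 11 - n_male   # 6 women

# Choose 2 of the 5 men and 4 of the 6 women.
ways <- choose(n_male, 2) * choose(n_female, 4)
print(ways)   # 150

# Brute-force check: students 1..5 are the men, 6..11 the women.
groups <- combn(11, 6)    # each column is one possible group of 6
brute  <- sum(apply(groups, 2, function(g) sum(g <= n_male) == 2))
print(brute)  # 150
```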
<br />
=== Week 2 ===<br />
[[File:d34.jpg]]<br />
<br />
(In class) We have the cloud of our data, the data ellipse and the centre point. We also know that about 40% of the data fall inside the inner ellipse and about 86% fall inside the outer ellipse. What is the radius of the outer ellipse? What are Sx, Sy, Sy.x and the 95% confidence interval?<br />
<br />
Answer: Please see the notes for the 2nd lecture.<br />
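<br />
A hedged note on the radius, with the lecture notes remaining the authority: assuming the data are roughly bivariate normal and the radius is measured in standard-deviation units, the share of points inside a data ellipse of radius r is 1 - exp(-r^2/2), which is the chi-square distribution function with 2 degrees of freedom evaluated at r^2. Under that assumption the two percentages in the question match radii 1 and 2.<br />

```r
# Share of a bivariate normal cloud inside data ellipses of radius 1 and 2.
# Assumption: radius is in standard-deviation (Mahalanobis) units.
radius   <- c(1, 2)
coverage <- pchisq(radius^2, df = 2)
print(round(coverage, 3))   # 0.393 0.865

# The same numbers from the closed form 1 - exp(-r^2 / 2).
closed_form <- 1 - exp(-radius^2 / 2)
print(round(closed_form, 3))
```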
<br />
=== Week 3 ===<br />
Q: Explain the multiple regression coefficients in three dimensions.<br />
<br />
A: [[File:g1.png]] Beta0 is the Y-intercept of the plane. Beta1 is the slope of the plane along the x1 axis, and beta2 is the slope of the plane along the x2 axis. The error term ε scatters the observations around the plane, so the data cloud is bumpy instead of lying perfectly on the plane.<br />
<br />
[[File:g2.png]]<br />
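<br />
To make the roles of the coefficients concrete, here is a small simulation sketch. The true values (intercept 2, slopes 1.5 and -0.8), the sample size and the noise level are all made up for illustration; lm() then recovers a plane close to the one that generated the data.<br />

```r
set.seed(6627)
n  <- 200
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 10)

# Hypothetical true plane: beta0 = 2, beta1 = 1.5, beta2 = -0.8.
beta <- c(2, 1.5, -0.8)
eps  <- rnorm(n, sd = 0.5)   # scatter of the points around the plane
y    <- beta[1] + beta[2] * x1 + beta[3] * x2 + eps

# Fit the plane; the estimates should be close to the true values.
fit <- lm(y ~ x1 + x2)
print(coef(fit))
```

Increasing the sd of eps makes the cloud thicker around the plane but does not change what the coefficients mean.<br />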
<br />
=== Week 4 ===<br />
Q: Which statements describe correlation, and which describe interaction?<br />
<br />
a) People with more years in jail tend to have fewer years of education.<br />
<br />
b) The more money I save, the more financially secure I feel.<br />
<br />
c) To improve the taste, she squeezed a lemon into her black tea and stirred it.<br />
<br />
d) Helping others in the Stats lab is one of our TA jobs. Another type is grader.<br />
<br />
e) Steel becomes harder and stronger when carbon is added to it and it is quenched (rapidly cooled).<br />
<br />
f) The more years of education I complete, the higher my earning potential.<br />
<br />
g) Professors drink too much coffee because their work is stressful.<br />
<br />
A: Correlation: a, b, f, g; interaction: c, e<br />
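<br />
The difference can also be seen in a small simulation. This is only a sketch with invented numbers: the first part mimics statement f) with two variables that move together, and the second mimics statement e) with two treatments whose combined effect is larger than the sum of their separate effects.<br />

```r
set.seed(1)
n <- 500

# Correlation: education and earnings tend to rise together.
education <- rnorm(n, mean = 14, sd = 2)
earnings  <- 10 + 3 * education + rnorm(n, sd = 4)
r <- cor(education, earnings)
print(r)

# Interaction: the effect of carbon depends on whether the steel is quenched.
carbon   <- rbinom(n, 1, 0.5)
quench   <- rbinom(n, 1, 0.5)
hardness <- 50 + 2 * carbon + 1 * quench + 10 * carbon * quench + rnorm(n)

# carbon * quench expands to carbon + quench + carbon:quench.
fit <- lm(hardness ~ carbon * quench)
print(coef(fit))
```

The carbon:quench coefficient is the interaction: it measures the extra hardness gained only when both treatments are applied together.<br />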
<br />
=== Week 5 ===<br />
Q: We have 5 points at (1,1), (1,2), (2,1), (2,2) and (2,3). What does the linear regression line look like?<br />
<br />
A: Go to [http://illuminations.nctm.org/LessonDetail.aspx?ID=L454#nextone this web page] and try it. Add more points to get an idea of the rule.<br />
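<br />
The line can also be found directly in R with lm(). For the five points in the question, least squares gives intercept 1 and slope 0.5, so the line passes through the mean of y at each x value: (1, 1.5) and (2, 2).<br />

```r
# The five points from the question.
x <- c(1, 1, 2, 2, 2)
y <- c(1, 2, 1, 2, 3)

fit <- lm(y ~ x)
print(coef(fit))   # intercept 1, slope 0.5

# Mean of y at each distinct x; the fitted line goes through both.
group_means <- tapply(y, x, mean)
print(group_means)
```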
<br />
<br />
==Statistics in the Media==<br />
=== Week 1 ===<br />
[http://endoftheamericandream.com/archives/50-statistics-about-the-u-s-economy-that-are-almost-too-crazy-to-believe 50 Statistics About The U.S. Economy That Are Almost Too Crazy To Believe]<br />
<br />
Can you see that 2010 is a crisis year? It's NOT the time to change!<br />
[http://moneywatch.bnet.com/economic-news/blog/financial-decoder/18-scary-us-debt-facts/2824/ 18 Scary US Debt Facts] <br />
<br />
Without statistics, the people across the border might be toasting the great success of this year right now.<br />
<br />
=== Week 2 ===<br />
[http://timesofindia.indiatimes.com/city/varanasi/4-day-conference-begins-at-BHU/articleshow/7224523.cms 4-day conference begins at BHU] Banaras Hindu University (BHU), Hindi: काशी हिन्दू विश्वविद्यालय, is a premier central university and a world class educational institution located in Varanasi, India. It is regarded as the largest residential university in Asia.<br />
<br />
=== Week 3 ===<br />
[http://www.newswire.ca/en/releases/archive/January2011/19/c3506.html Greater Toronto & Hamilton Area Transit Choices: Report finds much higher transportation emissions from road vs. rail]<br />
<br />
This is current news on a very old topic. The report comes from two of Canada's leading think tanks, Sustainable Prosperity and the Pembina Institute, and restates what is already common knowledge. It is titled "Putting Transportation on Track" and offers nothing new.<br />
<br />
=== Week 4 ===<br />
[http://2010.census.gov/2010census/#/panel-1 2010 US Census results are now available]<br />
<br />
As of Jan. 18, 2011, the 2010 US Census data is accessible online. It is easy to reach and will be appreciated by many fact finders. The other highlight is the survey itself: it asks just 10 questions and takes about 10 minutes to complete.<br />
<br />
=== Week 5 ===<br />
So now, everyone knows Statistics is not useless and Statisticians are a group of supermen. [http://www.thestar.com/news/article/933200--toronto-man-cracked-the-code-to-scratch-lottery-tickets?bn=1 Toronto man cracked the code to scratch-lottery tickets] <br />
<br />
=== Week 6 ===<br />
[http://www.thestarphoenix.com/technology/City+growing+quickly/4222880/story.html City growing quickly - Saskatoon outpaces all metro areas in Canada] You can choose to find a job there and they have a well known immigration program for international students. <br />
<br />
==Questions and Comments on Groupwork and Class Lectures==<br />
=== Week 1 ===<br />
For the example about smoking and life expectancy that we discussed in class, I think there are two more factors we should take into consideration. One is the age at which people start smoking. I know that kids smoke in some poor countries because they don't know it is harmful. The other is that different types of cigarette have different effects. Whatever type of cigarette you smoke, it will always be ‘injurious to health’. But cigarettes in rich countries have relatively less addictive nicotine and tar, are equipped with long filters, and are not allowed to contain harmful flavour additives. I know this because I used to be a heavy smoker and have smoked more than 100 brands of cigars from different countries. But yes, I am still alive.<br />
<br />
=== Week 2 ===<br />
Some classmates couldn't install the spida and p3d packages. Here is the procedure to follow:<br />
<br />
1. Uninstall your current R.<br />
<br />
2. Go to [http://cran.r-project.org/bin/windows/base/ R download] to download a new version of R.<br />
<br />
3. Do a short installation of R.<br />
<br />
4. Run<br />
<br />
install.packages(c("car", "rgl"))<br />
<br />
install.packages(c("spida", "p3d"), repos = "http://R-Forge.R-project.org")<br />
<br />
=== Week 3 ===<br />
It was really amazing to see the data moving with GoogleVis and p3d in the last lecture. This will be very useful in consulting and presentations. I think these methods are limited when it comes to analyzing complex models, but we can choose 3 variables at a time from a multiple regression model and draw a 3D plot to see the trend.<br />
<br />
=== Week 4 ===<br />
Carrie has a question about a symbol: In the slides for Visualizing Multiple Regression, in the section on the data ellipse for predictors (page 30), the general predictor has a notation I'm not familiar with and missed in lecture. What does the plus sign in a circle mean?<br />
<br />
Answer: In logic, the plus sign in a circle denotes "exclusive or". The Wikipedia article is [http://en.wikipedia.org/wiki/Exclusive_or here]. In the notes, however, it has a special definition, given in the summary on page 88.<br />
<br />
=== Week 5 ===</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Andy_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Andy Li2011-02-16T00:41:30Z<p>Andytli: </p>
<hr />
<div>====Welcome to my wiki page for the Statistical Consulting Practicum====<br />
<br />
==About Me==<br />
My name is Andy Li. I started my undergraduate studies in Statistics at York in 2008. After two years of study, I am now a master's student in Applied Statistics. I also have a bachelor's degree in Chemical Engineering, and I worked as a B2B salesman for many years. I have used SAS, R, MATLAB, Minitab and Maple in my classes. I am very happy to join our 6627 team to learn more about Statistics. I hope we can build not only our knowledge but also our friendship in the coming days.<br />
<br />
==Sample Exam Questions==<br />
=== Week 1 ===<br />
(In class) There are 11 students in the 6627 class, 5 of whom are male. There is a statistics meeting on campus. All the students want to attend as visitors, but only 6 seats are left. So the professor needs to choose 6 students at random and wants exactly two of them to be male, to help set up the tables. In how many ways can he select the students?<br />
<br />
Answer: 5C2*6C4=150<br />
<br />
=== Week 2 ===<br />
[[File:d34.jpg]]<br />
<br />
(In class) We have the cloud of our data, the data ellipse and the centre point. We also know that about 40% of the data fall inside the inner ellipse and about 86% fall inside the outer ellipse. What is the radius of the outer ellipse? What are Sx, Sy, Sy.x and the 95% confidence interval?<br />
<br />
Answer: Please see the notes for the 2nd lecture.<br />
<br />
=== Week 3 ===<br />
Q: Explain the multiple regression coefficients in three dimensions.<br />
<br />
A: [[File:g1.png]] Beta0 is the Y-intercept of the plane. Beta1 is the slope of the plane along the x1 axis, and beta2 is the slope of the plane along the x2 axis. The error term ε scatters the observations around the plane, so the data cloud is bumpy instead of lying perfectly on the plane.<br />
<br />
[[File:g2.png]]<br />
<br />
=== Week 4 ===<br />
Q: Which statements describe correlation, and which describe interaction?<br />
<br />
a) People with more years in jail tend to have fewer years of education.<br />
<br />
b) The more money I save, the more financially secure I feel.<br />
<br />
c) To improve the taste, she squeezed a lemon into her black tea and stirred it.<br />
<br />
d) Helping others in the Stats lab is one of our TA jobs. Another type is grader.<br />
<br />
e) Steel becomes harder and stronger when carbon is added to it and it is quenched (rapidly cooled).<br />
<br />
f) The more years of education I complete, the higher my earning potential.<br />
<br />
g) Professors drink too much coffee because their work is stressful.<br />
<br />
A: Correlation: a, b, f, g; interaction: c, e<br />
<br />
=== Week 5 ===<br />
Q: We have 5 points at (1,1), (1,2), (2,1), (2,2) and (2,3). What does the linear regression line look like?<br />
A: Go to [http://illuminations.nctm.org/LessonDetail.aspx?ID=L454#nextone this web page] and try it. Add more points to get an idea of the rule.<br />
<br />
<br />
==Statistics in the Media==<br />
=== Week 1 ===<br />
[http://endoftheamericandream.com/archives/50-statistics-about-the-u-s-economy-that-are-almost-too-crazy-to-believe 50 Statistics About The U.S. Economy That Are Almost Too Crazy To Believe]<br />
<br />
Can you see that 2010 is a crisis year? It's NOT the time to change!<br />
[http://moneywatch.bnet.com/economic-news/blog/financial-decoder/18-scary-us-debt-facts/2824/ 18 Scary US Debt Facts] <br />
<br />
Without statistics, the people across the border might be toasting the great success of this year right now.<br />
<br />
=== Week 2 ===<br />
[http://timesofindia.indiatimes.com/city/varanasi/4-day-conference-begins-at-BHU/articleshow/7224523.cms 4-day conference begins at BHU] Banaras Hindu University (BHU), Hindi: काशी हिन्दू विश्वविद्यालय, is a premier central university and a world class educational institution located in Varanasi, India. It is regarded as the largest residential university in Asia.<br />
<br />
=== Week 3 ===<br />
[http://www.newswire.ca/en/releases/archive/January2011/19/c3506.html Greater Toronto & Hamilton Area Transit Choices: Report finds much higher transportation emissions from road vs. rail]<br />
<br />
This is current news on a very old topic. The report comes from two of Canada's leading think tanks, Sustainable Prosperity and the Pembina Institute, and restates what is already common knowledge. It is titled "Putting Transportation on Track" and offers nothing new.<br />
<br />
=== Week 4 ===<br />
[http://2010.census.gov/2010census/#/panel-1 2010 US Census results are now available]<br />
<br />
As of Jan. 18, 2011, the 2010 US Census data is accessible online. It is easy to reach and will be appreciated by many fact finders. The other highlight is the survey itself: it asks just 10 questions and takes about 10 minutes to complete.<br />
<br />
=== Week 5 ===<br />
So now, everyone knows Statistics is not useless and Statisticians are a group of supermen. [http://www.thestar.com/news/article/933200--toronto-man-cracked-the-code-to-scratch-lottery-tickets?bn=1 Toronto man cracked the code to scratch-lottery tickets] <br />
<br />
=== Week 6 ===<br />
[http://www.thestarphoenix.com/technology/City+growing+quickly/4222880/story.html City growing quickly - Saskatoon outpaces all metro areas in Canada] You can choose to find a job there and they have a well known immigration program for international students. <br />
<br />
==Questions and Comments on Groupwork and Class Lectures==<br />
=== Week 1 ===<br />
For the example about smoking and life expectancy that we discussed in class, I think there are two more factors we should take into consideration. One is the age at which people start smoking. I know that kids smoke in some poor countries because they don't know it is harmful. The other is that different types of cigarette have different effects. Whatever type of cigarette you smoke, it will always be ‘injurious to health’. But cigarettes in rich countries have relatively less addictive nicotine and tar, are equipped with long filters, and are not allowed to contain harmful flavour additives. I know this because I used to be a heavy smoker and have smoked more than 100 brands of cigars from different countries. But yes, I am still alive.<br />
<br />
=== Week 2 ===<br />
Some classmates couldn't install the spida and p3d packages. Here is the procedure to follow:<br />
<br />
1. Uninstall your current R.<br />
<br />
2. Go to [http://cran.r-project.org/bin/windows/base/ R download] to download a new version of R.<br />
<br />
3. Do a short installation of R.<br />
<br />
4. Run<br />
<br />
install.packages(c("car", "rgl"))<br />
<br />
install.packages(c("spida", "p3d"), repos = "http://R-Forge.R-project.org")<br />
<br />
=== Week 3 ===<br />
It was really amazing to see the data moving with GoogleVis and p3d in the last lecture. This will be very useful in consulting and presentations. I think these methods are limited when it comes to analyzing complex models, but we can choose 3 variables at a time from a multiple regression model and draw a 3D plot to see the trend.<br />
<br />
=== Week 4 ===<br />
Carrie has a question about a symbol: In the slides for Visualizing Multiple Regression, in the section on the data ellipse for predictors (page 30), the general predictor has a notation I'm not familiar with and missed in lecture. What does the plus sign in a circle mean?<br />
<br />
Answer: In logic, the plus sign in a circle denotes "exclusive or". The Wikipedia article is [http://en.wikipedia.org/wiki/Exclusive_or here]. In the notes, however, it has a special definition, given in the summary on page 88.<br />
<br />
=== Week 5 ===</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Andy_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Andy Li2011-02-16T00:40:13Z<p>Andytli: </p>
<hr />
<div>====Welcome to my wiki page for the Statistical Consulting Practicum====<br />
<br />
==About Me==<br />
My name is Andy Li. I started my undergraduate studies in Statistics at York in 2008. After two years of study, I am now a master's student in Applied Statistics. I also have a bachelor's degree in Chemical Engineering, and I worked as a B2B salesman for many years. I have used SAS, R, MATLAB, Minitab and Maple in my classes. I am very happy to join our 6627 team to learn more about Statistics. I hope we can build not only our knowledge but also our friendship in the coming days.<br />
<br />
==Sample Exam Questions==<br />
=== Week 1 ===<br />
(In class) There are 11 students in the 6627 class, 5 of whom are male. There is a statistics meeting on campus. All the students want to attend as visitors, but only 6 seats are left. So the professor needs to choose 6 students at random and wants exactly two of them to be male, to help set up the tables. In how many ways can he select the students?<br />
<br />
Answer: 5C2*6C4=150<br />
<br />
=== Week 2 ===<br />
[[File:d34.jpg]]<br />
<br />
(In class) We have the cloud of our data, the data ellipse and the centre point. We also know that about 40% of the data fall inside the inner ellipse and about 86% fall inside the outer ellipse. What is the radius of the outer ellipse? What are Sx, Sy, Sy.x and the 95% confidence interval?<br />
<br />
Answer: Please see the notes for the 2nd lecture.<br />
<br />
=== Week 3 ===<br />
Q: Explain the multiple regression coefficients in three dimensions.<br />
<br />
A: [[File:g1.png]] Beta0 is the Y-intercept of the plane. Beta1 is the slope of the plane along the x1 axis, and beta2 is the slope of the plane along the x2 axis. The error term ε scatters the observations around the plane, so the data cloud is bumpy instead of lying perfectly on the plane.<br />
<br />
[[File:g2.png]]<br />
<br />
=== Week 4 ===<br />
Q: Which statements describe correlation, and which describe interaction?<br />
<br />
a) People with more years in jail tend to have fewer years of education.<br />
<br />
b) The more money I save, the more financially secure I feel.<br />
<br />
c) To improve the taste, she squeezed a lemon into her black tea and stirred it.<br />
<br />
d) Helping others in the Stats lab is one of our TA jobs. Another type is grader.<br />
<br />
e) Steel becomes harder and stronger when carbon is added to it and it is quenched (rapidly cooled).<br />
<br />
f) The more years of education I complete, the higher my earning potential.<br />
<br />
g) Professors drink too much coffee because their work is stressful.<br />
<br />
A: Correlation: a, b, f, g; interaction: c, e<br />
<br />
=== Week 5 ===<br />
Q: We have 5 points at (1,1), (1,2), (2,1), (2,2) and (2,3). What does the linear regression line look like?<br />
A: Go to [http://illuminations.nctm.org/LessonDetail.aspx?ID=L454#nextone this web page] and try it.<br />
<br />
<br />
==Statistics in the Media==<br />
=== Week 1 ===<br />
[http://endoftheamericandream.com/archives/50-statistics-about-the-u-s-economy-that-are-almost-too-crazy-to-believe 50 Statistics About The U.S. Economy That Are Almost Too Crazy To Believe]<br />
<br />
Can you see that 2010 is a crisis year? It's NOT the time to change!<br />
[http://moneywatch.bnet.com/economic-news/blog/financial-decoder/18-scary-us-debt-facts/2824/ 18 Scary US Debt Facts] <br />
<br />
Without statistics, the people across the border might be toasting the great success of this year right now.<br />
<br />
=== Week 2 ===<br />
[http://timesofindia.indiatimes.com/city/varanasi/4-day-conference-begins-at-BHU/articleshow/7224523.cms 4-day conference begins at BHU] Banaras Hindu University (BHU), Hindi: काशी हिन्दू विश्वविद्यालय, is a premier central university and a world class educational institution located in Varanasi, India. It is regarded as the largest residential university in Asia.<br />
<br />
=== Week 3 ===<br />
[http://www.newswire.ca/en/releases/archive/January2011/19/c3506.html Greater Toronto & Hamilton Area Transit Choices: Report finds much higher transportation emissions from road vs. rail]<br />
<br />
This is current news on a very old topic. The report comes from two of Canada's leading think tanks, Sustainable Prosperity and the Pembina Institute, and restates what is already common knowledge. It is titled "Putting Transportation on Track" and offers nothing new.<br />
<br />
=== Week 4 ===<br />
[http://2010.census.gov/2010census/#/panel-1 2010 US Census results are now available]<br />
<br />
As of Jan. 18, 2011, the 2010 US Census data is accessible online. It is easy to reach and will be appreciated by many fact finders. The other highlight is the survey itself: it asks just 10 questions and takes about 10 minutes to complete.<br />
<br />
=== Week 5 ===<br />
So now, everyone knows Statistics is not useless and Statisticians are a group of supermen. [http://www.thestar.com/news/article/933200--toronto-man-cracked-the-code-to-scratch-lottery-tickets?bn=1 Toronto man cracked the code to scratch-lottery tickets] <br />
<br />
=== Week 6 ===<br />
[http://www.thestarphoenix.com/technology/City+growing+quickly/4222880/story.html City growing quickly - Saskatoon outpaces all metro areas in Canada] You can choose to find a job there and they have a well known immigration program for international students. <br />
<br />
==Questions and Comments on Groupwork and Class Lectures==<br />
=== Week 1 ===<br />
For the example about smoking and life expectancy that we discussed in class, I think there are two more factors we should take into consideration. One is the age at which people start smoking. I know that kids smoke in some poor countries because they don't know it is harmful. The other is that different types of cigarette have different effects. Whatever type of cigarette you smoke, it will always be ‘injurious to health’. But cigarettes in rich countries have relatively less addictive nicotine and tar, are equipped with long filters, and are not allowed to contain harmful flavour additives. I know this because I used to be a heavy smoker and have smoked more than 100 brands of cigars from different countries. But yes, I am still alive.<br />
<br />
=== Week 2 ===<br />
Some classmates couldn't install the spida and p3d packages. Here is the procedure to follow:<br />
<br />
1. Uninstall your current R.<br />
<br />
2. Go to [http://cran.r-project.org/bin/windows/base/ R download] to download a new version of R.<br />
<br />
3. Do a short installation of R.<br />
<br />
4. Run<br />
<br />
install.packages(c("car", "rgl"))<br />
<br />
install.packages(c("spida", "p3d"), repos = "http://R-Forge.R-project.org")<br />
<br />
=== Week 3 ===<br />
It was really amazing to see the data moving with GoogleVis and p3d in the last lecture. This will be very useful in consulting and presentations. I think these methods are limited when it comes to analyzing complex models, but we can choose 3 variables at a time from a multiple regression model and draw a 3D plot to see the trend.<br />
<br />
=== Week 4 ===<br />
Carrie has a question about a symbol: In the slides for Visualizing Multiple Regression, in the section on the data ellipse for predictors (page 30), the general predictor has a notation I'm not familiar with and missed in lecture. What does the plus sign in a circle mean?<br />
<br />
Answer: In logic, the plus sign in a circle denotes "exclusive or". The Wikipedia article is [http://en.wikipedia.org/wiki/Exclusive_or here]. In the notes, however, it has a special definition, given in the summary on page 88.<br />
<br />
=== Week 5 ===</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Andy_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Andy Li2011-02-13T02:21:49Z<p>Andytli: </p>
<hr />
<div>====Welcome to my wiki page for the Statistical Consulting Practicum====<br />
<br />
==About Me==<br />
My name is Andy Li. I started my undergraduate studies in Statistics at York in 2008. After two years of study, I am now a master's student in Applied Statistics. I also have a bachelor's degree in Chemical Engineering, and I worked as a B2B salesman for many years. I have used SAS, R, MATLAB, Minitab and Maple in my classes. I am very happy to join our 6627 team to learn more about Statistics. I hope we can build not only our knowledge but also our friendship in the coming days.<br />
<br />
==Sample Exam Questions==<br />
=== Week 1 ===<br />
(In class) There are 11 students in the 6627 class, 5 of whom are male. There is a statistics meeting on campus. All the students want to attend as visitors, but only 6 seats are left. So the professor needs to choose 6 students at random and wants exactly two of them to be male, to help set up the tables. In how many ways can he select the students?<br />
<br />
Answer: 5C2*6C4=150<br />
<br />
=== Week 2 ===<br />
[[File:d34.jpg]]<br />
<br />
(In class) We have the cloud of our data, the data ellipse and the centre point. We also know that about 40% of the data fall inside the inner ellipse and about 86% fall inside the outer ellipse. What is the radius of the outer ellipse? What are Sx, Sy, Sy.x and the 95% confidence interval?<br />
<br />
Answer: Please see the notes for the 2nd lecture.<br />
<br />
=== Week 3 ===<br />
Q: Explain the multiple regression coefficients in three dimensions.<br />
<br />
A: [[File:g1.png]] Beta0 is the Y-intercept of the plane. Beta1 is the slope of the plane along the x1 axis, and beta2 is the slope of the plane along the x2 axis. The error term ε scatters the observations around the plane, so the data cloud is bumpy instead of lying perfectly on the plane.<br />
<br />
[[File:g2.png]]<br />
<br />
=== Week 4 ===<br />
Q: Which statements describe correlation, and which describe interaction?<br />
<br />
a) People with more years in jail tend to have fewer years of education.<br />
<br />
b) The more money I save, the more financially secure I feel.<br />
<br />
c) To improve the taste, she squeezed a lemon into her black tea and stirred it.<br />
<br />
d) Helping others in the Stats lab is one of our TA jobs. Another type is grader.<br />
<br />
e) Steel becomes harder and stronger when carbon is added to it and it is quenched (rapidly cooled).<br />
<br />
f) The more years of education I complete, the higher my earning potential.<br />
<br />
g) Professors drink too much coffee because their work is stressful.<br />
<br />
A: Correlation: a, b, f, g; interaction: c, e<br />
<br />
=== Week 5 ===<br />
<br />
==Statistics in the Media==<br />
=== Week 1 ===<br />
[http://endoftheamericandream.com/archives/50-statistics-about-the-u-s-economy-that-are-almost-too-crazy-to-believe 50 Statistics About The U.S. Economy That Are Almost Too Crazy To Believe]<br />
<br />
Can you see that 2010 is a crisis year? It's NOT the time to change!<br />
[http://moneywatch.bnet.com/economic-news/blog/financial-decoder/18-scary-us-debt-facts/2824/ 18 Scary US Debt Facts] <br />
<br />
Without statistics, the people across the border might be toasting the great success of this year right now.<br />
<br />
=== Week 2 ===<br />
[http://timesofindia.indiatimes.com/city/varanasi/4-day-conference-begins-at-BHU/articleshow/7224523.cms 4-day conference begins at BHU] Banaras Hindu University (BHU), Hindi: काशी हिन्दू विश्वविद्यालय, is a premier central university and a world class educational institution located in Varanasi, India. It is regarded as the largest residential university in Asia.<br />
<br />
=== Week 3 ===<br />
[http://www.newswire.ca/en/releases/archive/January2011/19/c3506.html Greater Toronto & Hamilton Area Transit Choices: Report finds much higher transportation emissions from road vs. rail]<br />
<br />
This is current news on a very old topic. The report comes from two of Canada's leading think tanks, Sustainable Prosperity and the Pembina Institute, and restates what is already common knowledge. It is titled "Putting Transportation on Track" and offers nothing new.<br />
<br />
=== Week 4 ===<br />
[http://2010.census.gov/2010census/#/panel-1 2010 US Census results are now available]<br />
<br />
As of Jan. 18, 2011, the 2010 US Census data is accessible online. It is easy to reach and will be appreciated by many fact finders. The other highlight is the survey itself: it asks just 10 questions and takes about 10 minutes to complete.<br />
<br />
=== Week 5 ===<br />
So now, everyone knows Statistics is not useless and Statisticians are a group of supermen. [http://www.thestar.com/news/article/933200--toronto-man-cracked-the-code-to-scratch-lottery-tickets?bn=1 Toronto man cracked the code to scratch-lottery tickets] <br />
<br />
=== Week 6 ===<br />
[http://www.thestarphoenix.com/technology/City+growing+quickly/4222880/story.html City growing quickly - Saskatoon outpaces all metro areas in Canada] You can choose to find a job there and they have a well known immigration program for international students. <br />
<br />
==Questions and Comments on Groupwork and Class Lectures==<br />
=== Week 1 ===<br />
For the example about smoking and life expectancy that we discussed in class, I think there are two more factors we should take into consideration. One is the age at which people start smoking. I know that kids smoke in some poor countries because they don't know it is harmful. The other is that different types of cigarette have different effects. Whatever type of cigarette you smoke, it will always be ‘injurious to health’. But cigarettes in rich countries have relatively less addictive nicotine and tar, are equipped with long filters, and are not allowed to contain harmful flavour additives. I know this because I used to be a heavy smoker and have smoked more than 100 brands of cigars from different countries. But yes, I am still alive.<br />
<br />
=== Week 2 ===<br />
Some classmates couldn't install the spida and p3d packages. Here is the procedure to follow:<br />
<br />
1. Uninstall your current R.<br />
<br />
2. Go to [http://cran.r-project.org/bin/windows/base/ R download] to download a new version of R.<br />
<br />
3. Do a short installation of R.<br />
<br />
4. Run<br />
<br />
install.packages(c("car", "rgl"))<br />
<br />
install.packages(c("spida", "p3d"), repos = "http://R-Forge.R-project.org")<br />
<br />
=== Week 3 ===<br />
It was really amazing to see the data moving with GoogleVis and p3d in the last lecture. This will be very useful in consulting and presentations. I think these methods are limited when it comes to analyzing complex models, but we can choose 3 variables at a time from a multiple regression model and draw a 3D plot to see the trend.<br />
<br />
=== Week 4 ===<br />
Carrie has a question about a symbol: In the slides for Visualizing Multiple Regression, in the section on the data ellipse for predictors (page 30), the general predictor has a notation I'm not familiar with and missed in lecture. What does the plus sign in a circle mean?<br />
<br />
Answer: In logic, the plus sign in a circle denotes "exclusive or". The Wikipedia article is [http://en.wikipedia.org/wiki/Exclusive_or here]. In the notes, however, it has a special definition, given in the summary on page 88.<br />
<br />
=== Week 5 ===</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Andy_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Andy Li2011-02-05T18:08:06Z<p>Andytli: </p>
<hr />
<div>====Welcome to my wiki page for the Statistical Consulting Practicum====<br />
<br />
==About Me==<br />
My name is Andy Li. I began undergraduate study in Statistics at York in 2008 and, after two years, I am now a master's student in Applied Statistics. I also have a bachelor's degree in Chemical Engineering and worked as a B2B salesman for many years. I have used SAS, R, MATLAB, Minitab and Maple in my classes. I am very happy to join our 6627 team to learn more about Statistics. I hope we can build not only our knowledge but also our friendship in the coming days. <br />
<br />
==Sample Exam Questions==<br />
=== Week 1 ===<br />
(In class) There are 11 students in the 6627 class, 5 of whom are male. There is a statistics meeting on campus. All the students want to attend as visitors, but only 6 seats are left. So the professor needs to choose 6 students at random and wants exactly two of them to be male, to help set up the tables. In how many ways can he select the students?<br />
<br />
Answer: 5C2*6C4=150<br />
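<br />
The count can be checked in R. Choosing exactly 2 of the 5 male students and 4 of the 6 remaining students gives 5C2 × 6C4. This is a minimal sketch using base R's choose(); the variable names are only illustrative.<br />

```r
# Exactly 2 of the 5 male students and 4 of the other 6 students
n_male  <- 5
n_other <- 11 - n_male               # 6 students who are not male

ways_male  <- choose(n_male, 2)      # 5C2 = 10
ways_other <- choose(n_other, 4)     # 6C4 = 15

total_ways <- ways_male * ways_other
print(total_ways)                    # 150
```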
<br />
=== Week 2 ===<br />
[[File:d34.jpg]]<br />
<br />
(In class) We have the data cloud, the data ellipses and the centre point. We also know that about 40% of the data fall inside the inner ellipse and 86% inside the outer ellipse. What is the radius of the outer ellipse? What are Sx, Sy, Sy.x and the 95% confidence interval?<br />
<br />
Answer: Please see the notes for the 2nd lecture. <br />
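<br />
As a hedged hint rather than the official answer from the lecture notes: if the data are assumed to be roughly bivariate normal, a data ellipse of radius r contains a proportion 1 - exp(-r^2/2) of the points, which is the chi-square distribution function with 2 degrees of freedom evaluated at r^2. The sketch below checks which radii give about 40% and 86%; the helper names coverage and radius_for are made up for this example.<br />

```r
# Coverage of a data ellipse of radius r under a bivariate normal assumption:
# P(chi-square with 2 df <= r^2) = 1 - exp(-r^2 / 2)
coverage <- function(r) pchisq(r^2, df = 2)

coverage(1)   # about 0.39: the smaller ellipse
coverage(2)   # about 0.86: the larger ellipse

# Radius needed for a given coverage, for example 95%
radius_for <- function(p) sqrt(qchisq(p, df = 2))
radius_for(0.95)
```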
<br />
=== Week 3 ===<br />
Q: Explain the multiple regression coefficients in three dimensions.<br />
<br />
A: [[File:g1.png]] Beta0 defines the Y-intercept of the plane. Beta1 defines the slope of the plane along the x1 axis (holding x2 fixed), and Beta2 defines the slope of the plane along the x2 axis (holding x1 fixed). The error term ε adds noise, so the observed points scatter around the plane instead of lying exactly on it. <br />
<br />
[[File:g2.png]]<br />
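<br />
To make the roles of the coefficients concrete, here is a small simulated sketch in R. The data and the true values 1, 2 and -3 are made up for illustration and are not taken from the lecture. lm() estimates the intercept and the two slopes of the plane.<br />

```r
set.seed(6627)                 # arbitrary seed, for reproducibility
n   <- 500
x1  <- runif(n, 0, 10)
x2  <- runif(n, 0, 10)
eps <- rnorm(n, sd = 1)        # error term: vertical scatter about the plane

# True plane: intercept 1, slope 2 along x1, slope -3 along x2
y <- 1 + 2 * x1 - 3 * x2 + eps

fit <- lm(y ~ x1 + x2)
coef(fit)                      # estimates of Beta0, Beta1, Beta2
```

The estimates should land close to the simulated values, and they get closer as n grows or the noise shrinks.<br />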
<br />
=== Week 4 ===<br />
Q: Which of the following statements describe correlation, and which describe interaction?<br />
<br />
a) People with more years in jail tend to have fewer years of education.<br />
<br />
b) The more money I save, the more financially secure I feel. <br />
<br />
c) To give it a good taste, she squeezed a lemon into her black tea and stirred the tea.<br />
<br />
d) Helping others in the Stats lab is one type of TA job. Another type is grading. <br />
<br />
e) Steel can be made harder and stronger by adding carbon to it and quenching it (cooling it rapidly).<br />
<br />
f) The more years of education I complete, the higher my earning potential.<br />
<br />
g) Professors drink too much coffee because their work is stressful.<br />
<br />
A: Correlation: a, b, f, g; interaction: c, e.<br />
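<br />
The distinction can be sketched with simulated data in R. All variable names and effect sizes here are invented for illustration. Correlation describes how two variables move together; interaction means the effect of one predictor on the outcome depends on the level of another, as with carbon and quenching in (e).<br />

```r
set.seed(1)                                # arbitrary seed
n <- 400

# Correlation: education and earnings tend to rise together, as in (f)
education <- rnorm(n, mean = 14, sd = 2)
earnings  <- 10 + 3 * education + rnorm(n, sd = 4)
cor(education, earnings)                   # positive

# Interaction: carbon hardens the steel much more when it is also quenched
carbon   <- runif(n, 0, 1)
quenched <- rbinom(n, 1, 0.5)              # 0 = slow cooling, 1 = quenched
hardness <- 20 + 2 * carbon + 1 * quenched + 15 * carbon * quenched +
  rnorm(n, sd = 2)

# carbon * quenched expands to both main effects plus carbon:quenched
fit <- lm(hardness ~ carbon * quenched)
coef(fit)["carbon:quenched"]               # positive: the carbon slope differs by group
```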
<br />
=== Week 5 ===<br />
<br />
==Statistics in the Media==<br />
=== Week 1 ===<br />
[http://endoftheamericandream.com/archives/50-statistics-about-the-u-s-economy-that-are-almost-too-crazy-to-believe 50 Statistics About The U.S. Economy That Are Almost Too Crazy To Believe]<br />
<br />
Can you see that 2010 was a crisis year? It's NOT the time to change! <br />
[http://moneywatch.bnet.com/economic-news/blog/financial-decoder/18-scary-us-debt-facts/2824/ 18 Scary US Debt Facts] <br />
<br />
Without statistics, the people across the border might be toasting the great success of this year right about now.<br />
<br />
=== Week 2 ===<br />
[http://timesofindia.indiatimes.com/city/varanasi/4-day-conference-begins-at-BHU/articleshow/7224523.cms 4-day conference begins at BHU] Banaras Hindu University (BHU), Hindi: काशी हिन्दू विश्वविद्यालय, is a premier central university and a world class educational institution located in Varanasi, India. It is regarded as the largest residential university in Asia.<br />
<br />
=== Week 3 ===<br />
[http://www.newswire.ca/en/releases/archive/January2011/19/c3506.html Greater Toronto & Hamilton Area Transit Choices: Report finds much higher transportation emissions from road vs. rail]<br />
<br />
This is current news on a very old topic. The report, "Putting Transportation on Track", comes from two of Canada's leading think tanks, Sustainable Prosperity and the Pembina Institute, and it states what is already common knowledge, with nothing new.<br />
<br />
=== Week 4 ===<br />
[http://2010.census.gov/2010census/#/panel-1 2010 US Census results are now available]<br />
<br />
As of Jan. 18, 2011, the 2010 US Census data are accessible online. They are easy to reach and will be appreciated by many fact finders. The other highlight is the survey itself: it asks just 10 questions and takes about 10 minutes to complete.<br />
<br />
=== Week 5 ===<br />
[http://www.thestarphoenix.com/technology/City+growing+quickly/4222880/story.html City growing quickly - Saskatoon outpaces all metro areas in Canada] You can choose to find a job there and they have a well known immigration program for international students. <br />
<br />
==Questions and Comments on Groupwork and Class Lectures==<br />
=== Week 1 ===<br />
For the smoking and life expectancy example we discussed in class, I think there are two more factors we should take into consideration. One is the age at which people start smoking. I know that children smoke in some poor countries because they don't know it is harmful. The other is that different types of cigarette have different effects. Whatever type of cigarette you smoke, it will always be ‘injurious to health’, but cigarettes in rich countries contain relatively less addictive nicotine and tar, are fitted with long filters, and are not allowed to contain harmful flavour additives. I know this because I used to be a heavy smoker and have smoked more than 100 brands of cigars from different countries. But yes, I am still alive. <br />
<br />
=== Week 2 ===<br />
Some classmates couldn't install the spida and p3d packages. Here is the procedure to follow:<br />
<br />
1. Uninstall your current version of R.<br />
<br />
2. Go to [http://cran.r-project.org/bin/windows/base/ R download] and download a new version of R.<br />
<br />
3. Do a short installation of R. <br />
<br />
4. Run the following commands in R: <br />
<br />
> install.packages( c("car","rgl"))<br />
<br />
> install.packages( c("spida","p3d"), repos = "http://R-Forge.R-project.org")<br />
<br />
=== Week 3 ===<br />
It was really amazing to see the data move using GoogleVis and p3d in the last lecture. These tools will be very useful in consulting and presentations. I think these methods are limited when it comes to analyzing complex models, but we can choose 3 variables at a time from a multiple regression model and draw a 3D plot to see the trend.<br />
<br />
=== Week 4 ===<br />
Carrie has a question about a symbol: In the slides for Visualizing Multiple Regression, in the section on the data ellipse for predictors (page 30), the general predictor uses a notation I'm not familiar with and missed in lecture. What does the plus sign in a circle mean?<br />
<br />
Answer: In logical operations, the circled plus sign denotes "exclusive or" (see [http://en.wikipedia.org/wiki/Exclusive_or here]). In the notes, however, it is given a special definition in the summary on page 88.<br />
<br />
=== Week 5 ===</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Andy_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Andy Li2011-02-02T16:11:14Z<p>Andytli: </p>
<hr />
<div>====Welcome to my wiki page for the Statistical Consulting Practicum====<br />
<br />
==About Me==<br />
My name is Andy Li. I began undergraduate study in Statistics at York in 2008 and, after two years, I am now a master's student in Applied Statistics. I also have a bachelor's degree in Chemical Engineering and worked as a B2B salesman for many years. I have used SAS, R, MATLAB, Minitab and Maple in my classes. I am very happy to join our 6627 team to learn more about Statistics. I hope we can build not only our knowledge but also our friendship in the coming days. <br />
<br />
==Sample Exam Questions==<br />
=== Week 1 ===<br />
(In class) There are 11 students in the 6627 class, 5 of whom are male. There is a statistics meeting on campus. All the students want to attend as visitors, but only 6 seats are left. So the professor needs to choose 6 students at random and wants exactly two of them to be male, to help set up the tables. In how many ways can he select the students?<br />
<br />
Answer: 5C2*6C4=150<br />
<br />
=== Week 2 ===<br />
[[File:d34.jpg]]<br />
<br />
(In class) We have the data cloud, the data ellipses and the centre point. We also know that about 40% of the data fall inside the inner ellipse and 86% inside the outer ellipse. What is the radius of the outer ellipse? What are Sx, Sy, Sy.x and the 95% confidence interval?<br />
<br />
Answer: Please see the notes for the 2nd lecture. <br />
<br />
=== Week 3 ===<br />
Q: Explain the multiple regression coefficients in three dimensions.<br />
<br />
A: [[File:g1.png]] Beta0 defines the Y-intercept of the plane. Beta1 defines the slope of the plane along the x1 axis (holding x2 fixed), and Beta2 defines the slope of the plane along the x2 axis (holding x1 fixed). The error term ε adds noise, so the observed points scatter around the plane instead of lying exactly on it. <br />
<br />
[[File:g2.png]]<br />
<br />
=== Week 4 ===<br />
Q: Which of the following statements describe correlation, and which describe interaction?<br />
<br />
a) People with more years in jail tend to have fewer years of education.<br />
<br />
b) The more money I save, the more financially secure I feel. <br />
<br />
c) To give it a good taste, she squeezed a lemon into her black tea and stirred the tea.<br />
<br />
d) Helping others in the Stats lab is one type of TA job. Another type is grading. <br />
<br />
e) Steel can be made harder and stronger by adding carbon to it and quenching it (cooling it rapidly).<br />
<br />
f) The more years of education I complete, the higher my earning potential.<br />
<br />
g) Professors drink too much coffee because their work is stressful.<br />
<br />
A: Correlation: a, b, f, g; interaction: c, e.<br />
<br />
==Statistics in the Media==<br />
=== Week 1 ===<br />
[http://endoftheamericandream.com/archives/50-statistics-about-the-u-s-economy-that-are-almost-too-crazy-to-believe 50 Statistics About The U.S. Economy That Are Almost Too Crazy To Believe]<br />
<br />
Can you see that 2010 was a crisis year? It's NOT the time to change! <br />
[http://moneywatch.bnet.com/economic-news/blog/financial-decoder/18-scary-us-debt-facts/2824/ 18 Scary US Debt Facts] <br />
<br />
Without statistics, the people across the border might be toasting the great success of this year right about now.<br />
<br />
=== Week 2 ===<br />
[http://timesofindia.indiatimes.com/city/varanasi/4-day-conference-begins-at-BHU/articleshow/7224523.cms 4-day conference begins at BHU] Banaras Hindu University (BHU), Hindi: काशी हिन्दू विश्वविद्यालय, is a premier central university and a world class educational institution located in Varanasi, India. It is regarded as the largest residential university in Asia.<br />
<br />
=== Week 3 ===<br />
[http://www.newswire.ca/en/releases/archive/January2011/19/c3506.html Greater Toronto & Hamilton Area Transit Choices: Report finds much higher transportation emissions from road vs. rail]<br />
<br />
This is current news on a very old topic. The report, "Putting Transportation on Track", comes from two of Canada's leading think tanks, Sustainable Prosperity and the Pembina Institute, and it states what is already common knowledge, with nothing new.<br />
<br />
=== Week 4 ===<br />
[http://2010.census.gov/2010census/#/panel-1 2010 US Census results are now available]<br />
<br />
As of Jan. 18, 2011, the 2010 US Census data are accessible online. They are easy to reach and will be appreciated by many fact finders. The other highlight is the survey itself: it asks just 10 questions and takes about 10 minutes to complete.<br />
<br />
<br />
==Questions and Comments on Groupwork and Class Lectures==<br />
=== Week 1 ===<br />
For the smoking and life expectancy example we discussed in class, I think there are two more factors we should take into consideration. One is the age at which people start smoking. I know that children smoke in some poor countries because they don't know it is harmful. The other is that different types of cigarette have different effects. Whatever type of cigarette you smoke, it will always be ‘injurious to health’, but cigarettes in rich countries contain relatively less addictive nicotine and tar, are fitted with long filters, and are not allowed to contain harmful flavour additives. I know this because I used to be a heavy smoker and have smoked more than 100 brands of cigars from different countries. But yes, I am still alive. <br />
<br />
=== Week 2 ===<br />
Some classmates couldn't install the spida and p3d packages. Here is the procedure to follow:<br />
<br />
1. Uninstall your current version of R.<br />
<br />
2. Go to [http://cran.r-project.org/bin/windows/base/ R download] and download a new version of R.<br />
<br />
3. Do a short installation of R. <br />
<br />
4. Run the following commands in R: <br />
<br />
> install.packages( c("car","rgl"))<br />
<br />
> install.packages( c("spida","p3d"), repos = "http://R-Forge.R-project.org")<br />
<br />
=== Week 3 ===<br />
It was really amazing to see the data move using GoogleVis and p3d in the last lecture. These tools will be very useful in consulting and presentations. I think these methods are limited when it comes to analyzing complex models, but we can choose 3 variables at a time from a multiple regression model and draw a 3D plot to see the trend.<br />
<br />
=== Week 4 ===<br />
Carrie has a question about a symbol: In the slides for Visualizing Multiple Regression, in the section on the data ellipse for predictors (page 30), the general predictor uses a notation I'm not familiar with and missed in lecture. What does the plus sign in a circle mean?<br />
<br />
Answer: The plus sign in logical operation is called "exclusive or". Wiki link is [http://en.wikipedia.org/wiki/Exclusive_or here]. But in the notes, it is specially defined in summary, Page 88.</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Andy_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Andy Li2011-02-02T16:03:05Z<p>Andytli: </p>
<hr />
<div>====Welcome to my wiki page for the Statistical Consulting Practicum====<br />
<br />
==About Me==<br />
My name is Andy Li. I began undergraduate study in Statistics at York in 2008 and, after two years, I am now a master's student in Applied Statistics. I also have a bachelor's degree in Chemical Engineering and worked as a B2B salesman for many years. I have used SAS, R, MATLAB, Minitab and Maple in my classes. I am very happy to join our 6627 team to learn more about Statistics. I hope we can build not only our knowledge but also our friendship in the coming days. <br />
<br />
==Sample Exam Questions==<br />
=== Week 1 ===<br />
(In class) There are 11 students in the 6627 class, 5 of whom are male. There is a statistics meeting on campus. All the students want to attend as visitors, but only 6 seats are left. So the professor needs to choose 6 students at random and wants exactly two of them to be male, to help set up the tables. In how many ways can he select the students?<br />
<br />
Answer: 5C2*6C4=150<br />
<br />
=== Week 2 ===<br />
[[File:d34.jpg]]<br />
<br />
(In class) We have the data cloud, the data ellipses and the centre point. We also know that about 40% of the data fall inside the inner ellipse and 86% inside the outer ellipse. What is the radius of the outer ellipse? What are Sx, Sy, Sy.x and the 95% confidence interval?<br />
<br />
Answer: Please see the notes for the 2nd lecture. <br />
<br />
=== Week 3 ===<br />
Q: Explain the multiple regression coefficients in three dimensions.<br />
<br />
A: [[File:g1.png]] Beta0 defines the Y-intercept of the plane. Beta1 defines the slope of the plane along the x1 axis (holding x2 fixed), and Beta2 defines the slope of the plane along the x2 axis (holding x1 fixed). The error term ε adds noise, so the observed points scatter around the plane instead of lying exactly on it. <br />
<br />
[[File:g2.png]]<br />
<br />
=== Week 4 ===<br />
Q: Which of the following statements describe correlation, and which describe interaction?<br />
<br />
a) People with more years in jail tend to have fewer years of education.<br />
<br />
b) The more money I save, the more financially secure I feel. <br />
<br />
c) To give it a good taste, she squeezed a lemon into her black tea and stirred the tea.<br />
<br />
d) Helping others in the Stats lab is one type of TA job. Another type is grading. <br />
<br />
e) Steel can be made harder and stronger by adding carbon to it and quenching it (cooling it rapidly).<br />
<br />
f) The more years of education I complete, the higher my earning potential.<br />
<br />
g) Professors drink too much coffee because their work is stressful.<br />
<br />
<br />
<br />
==Statistics in the Media==<br />
=== Week 1 ===<br />
[http://endoftheamericandream.com/archives/50-statistics-about-the-u-s-economy-that-are-almost-too-crazy-to-believe 50 Statistics About The U.S. Economy That Are Almost Too Crazy To Believe]<br />
<br />
Can you see that 2010 was a crisis year? It's NOT the time to change! <br />
[http://moneywatch.bnet.com/economic-news/blog/financial-decoder/18-scary-us-debt-facts/2824/ 18 Scary US Debt Facts] <br />
<br />
Without statistics, the people across the border might be toasting the great success of this year right about now.<br />
<br />
=== Week 2 ===<br />
[http://timesofindia.indiatimes.com/city/varanasi/4-day-conference-begins-at-BHU/articleshow/7224523.cms 4-day conference begins at BHU] Banaras Hindu University (BHU), Hindi: काशी हिन्दू विश्वविद्यालय, is a premier central university and a world class educational institution located in Varanasi, India. It is regarded as the largest residential university in Asia.<br />
<br />
=== Week 3 ===<br />
[http://www.newswire.ca/en/releases/archive/January2011/19/c3506.html Greater Toronto & Hamilton Area Transit Choices: Report finds much higher transportation emissions from road vs. rail]<br />
<br />
This is current news on a very old topic. The report, "Putting Transportation on Track", comes from two of Canada's leading think tanks, Sustainable Prosperity and the Pembina Institute, and it states what is already common knowledge, with nothing new.<br />
<br />
=== Week 4 ===<br />
[http://2010.census.gov/2010census/#/panel-1 2010 US Census results are now available]<br />
<br />
As of Jan. 18, 2011, the 2010 US Census data are accessible online. They are easy to reach and will be appreciated by many fact finders. The other highlight is the survey itself: it asks just 10 questions and takes about 10 minutes to complete.<br />
<br />
<br />
==Questions and Comments on Groupwork and Class Lectures==<br />
=== Week 1 ===<br />
For the smoking and life expectancy example we discussed in class, I think there are two more factors we should take into consideration. One is the age at which people start smoking. I know that children smoke in some poor countries because they don't know it is harmful. The other is that different types of cigarette have different effects. Whatever type of cigarette you smoke, it will always be ‘injurious to health’, but cigarettes in rich countries contain relatively less addictive nicotine and tar, are fitted with long filters, and are not allowed to contain harmful flavour additives. I know this because I used to be a heavy smoker and have smoked more than 100 brands of cigars from different countries. But yes, I am still alive. <br />
<br />
=== Week 2 ===<br />
Some classmates couldn't install the spida and p3d packages. Here is the procedure to follow:<br />
<br />
1. Uninstall your current version of R.<br />
<br />
2. Go to [http://cran.r-project.org/bin/windows/base/ R download] and download a new version of R.<br />
<br />
3. Do a short installation of R. <br />
<br />
4. Run the following commands in R: <br />
<br />
> install.packages( c("car","rgl"))<br />
<br />
> install.packages( c("spida","p3d"), repos = "http://R-Forge.R-project.org")<br />
<br />
=== Week 3 ===<br />
It was really amazing to see the data move using GoogleVis and p3d in the last lecture. These tools will be very useful in consulting and presentations. I think these methods are limited when it comes to analyzing complex models, but we can choose 3 variables at a time from a multiple regression model and draw a 3D plot to see the trend.<br />
<br />
=== Week 4 ===<br />
Carrie has a question about a symbol: In the slides for Visualizing Multiple Regression, in the section on the data ellipse for predictors (page 30), the general predictor uses a notation I'm not familiar with and missed in lecture. What does the plus sign in a circle mean?<br />
<br />
Answer: The plus sign in logical operation is called "exclusive or". Wiki link is [http://en.wikipedia.org/wiki/Exclusive_or here]. But in the notes, it is specially defined in summary, Page 88.</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Andy_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Andy Li2011-02-02T16:00:09Z<p>Andytli: </p>
<hr />
<div>====Welcome to my wiki page for the Statistical Consulting Practicum====<br />
<br />
==About Me==<br />
My name is Andy Li. I began undergraduate study in Statistics at York in 2008 and, after two years, I am now a master's student in Applied Statistics. I also have a bachelor's degree in Chemical Engineering and worked as a B2B salesman for many years. I have used SAS, R, MATLAB, Minitab and Maple in my classes. I am very happy to join our 6627 team to learn more about Statistics. I hope we can build not only our knowledge but also our friendship in the coming days. <br />
<br />
==Sample Exam Questions==<br />
=== Week 1 ===<br />
(In class) There are 11 students in the 6627 class, 5 of whom are male. There is a statistics meeting on campus. All the students want to attend as visitors, but only 6 seats are left. So the professor needs to choose 6 students at random and wants exactly two of them to be male, to help set up the tables. In how many ways can he select the students?<br />
<br />
Answer: 5C2*6C4=150<br />
<br />
=== Week 2 ===<br />
[[File:d34.jpg]]<br />
<br />
(In class) We have the data cloud, the data ellipses and the centre point. We also know that about 40% of the data fall inside the inner ellipse and 86% inside the outer ellipse. What is the radius of the outer ellipse? What are Sx, Sy, Sy.x and the 95% confidence interval?<br />
<br />
Answer: Please see the notes for the 2nd lecture. <br />
<br />
=== Week 3 ===<br />
Q: Explain the multiple regression coefficients in three dimensions.<br />
<br />
A: [[File:g1.png]] Beta0 defines the Y-intercept of the plane. Beta1 defines the slope of the plane along the x1 axis (holding x2 fixed), and Beta2 defines the slope of the plane along the x2 axis (holding x1 fixed). The error term ε adds noise, so the observed points scatter around the plane instead of lying exactly on it. <br />
<br />
[[File:g2.png]]<br />
<br />
=== Week 4 ===<br />
Q: Which of the following statements describe correlation, and which describe interaction?<br />
a) People with more years in jail tend to have fewer years of education.<br />
b) The more money I save, the more financially secure I feel. <br />
c) To give it a good taste, she squeezed a lemon into her black tea and stirred the tea.<br />
d) Helping others in the Stats lab is one type of TA job. Another type is grading. <br />
e) Steel can be made harder and stronger by adding carbon to it and quenching it (cooling it rapidly).<br />
f) The more years of education I complete, the higher my earning potential.<br />
g) Professors drink too much coffee because their work is stressful.<br />
<br />
<br />
==Statistics in the Media==<br />
=== Week 1 ===<br />
[http://endoftheamericandream.com/archives/50-statistics-about-the-u-s-economy-that-are-almost-too-crazy-to-believe 50 Statistics About The U.S. Economy That Are Almost Too Crazy To Believe]<br />
<br />
Can you see that 2010 was a crisis year? It's NOT the time to change! <br />
[http://moneywatch.bnet.com/economic-news/blog/financial-decoder/18-scary-us-debt-facts/2824/ 18 Scary US Debt Facts] <br />
<br />
Without statistics, the people across the border might be toasting the great success of this year right about now.<br />
<br />
=== Week 2 ===<br />
[http://timesofindia.indiatimes.com/city/varanasi/4-day-conference-begins-at-BHU/articleshow/7224523.cms 4-day conference begins at BHU] Banaras Hindu University (BHU), Hindi: काशी हिन्दू विश्वविद्यालय, is a premier central university and a world class educational institution located in Varanasi, India. It is regarded as the largest residential university in Asia.<br />
<br />
=== Week 3 ===<br />
[http://www.newswire.ca/en/releases/archive/January2011/19/c3506.html Greater Toronto & Hamilton Area Transit Choices: Report finds much higher transportation emissions from road vs. rail]<br />
<br />
This is current news on a very old topic. The report, "Putting Transportation on Track", comes from two of Canada's leading think tanks, Sustainable Prosperity and the Pembina Institute, and it states what is already common knowledge, with nothing new.<br />
<br />
=== Week 4 ===<br />
[http://2010.census.gov/2010census/#/panel-1 2010 US Census results are now available]<br />
<br />
As of Jan. 18, 2011, the 2010 US Census data are accessible online. They are easy to reach and will be appreciated by many fact finders. The other highlight is the survey itself: it asks just 10 questions and takes about 10 minutes to complete.<br />
<br />
<br />
==Questions and Comments on Groupwork and Class Lectures==<br />
=== Week 1 ===<br />
For the smoking and life expectancy example we discussed in class, I think there are two more factors we should take into consideration. One is the age at which people start smoking. I know that children smoke in some poor countries because they don't know it is harmful. The other is that different types of cigarette have different effects. Whatever type of cigarette you smoke, it will always be ‘injurious to health’, but cigarettes in rich countries contain relatively less addictive nicotine and tar, are fitted with long filters, and are not allowed to contain harmful flavour additives. I know this because I used to be a heavy smoker and have smoked more than 100 brands of cigars from different countries. But yes, I am still alive. <br />
<br />
=== Week 2 ===<br />
Some classmates couldn't install the spida and p3d packages. Here is the procedure to follow:<br />
<br />
1. Uninstall your current version of R.<br />
<br />
2. Go to [http://cran.r-project.org/bin/windows/base/ R download] and download a new version of R.<br />
<br />
3. Do a short installation of R. <br />
<br />
4. Run the following commands in R: <br />
<br />
> install.packages( c("car","rgl"))<br />
<br />
> install.packages( c("spida","p3d"), repos = "http://R-Forge.R-project.org")<br />
<br />
=== Week 3 ===<br />
It was really amazing to see the data move using GoogleVis and p3d in the last lecture. These tools will be very useful in consulting and presentations. I think these methods are limited when it comes to analyzing complex models, but we can choose 3 variables at a time from a multiple regression model and draw a 3D plot to see the trend.<br />
<br />
=== Week 4 ===<br />
Carrie has a question about a symbol: In the slides for Visualizing Multiple Regression, in the section on the data ellipse for predictors (page 30), the general predictor uses a notation I'm not familiar with and missed in lecture. What does the plus sign in a circle mean?<br />
<br />
Answer: The plus sign in logical operation is called "exclusive or". Wiki link is [http://en.wikipedia.org/wiki/Exclusive_or here]. But in the notes, it is specially defined in summary, Page 88.</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Carrie_SmithMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Carrie Smith2011-02-02T15:57:46Z<p>Andytli: </p>
<hr />
<div>==About Me==<br />
This year I began working towards my PhD in Psychology in the Quantitative Methods Area. I completed my Masters degree at York studying depth perception (within the Centre for Vision Research) and my undergraduate degree at the University of Toronto in Engineering Science, Aerospace option. As you can see I have quite a varied (and some might say strange) educational background. I have been attending SCS meetings since September and would like to consult in the coming academic year.<br />
<br />
I have experience working with R, Matlab, SPSS and SAS.<br />
<br />
I am also a proud member of [[MATH 6627 2010-11 Practicum in Statistical Consulting/Assignment_Teams/Gray|Team Gray]].<br />
<br />
==Sample Exam Questions==<br />
=== Week 1 ===<br />
The cigarette data discussed in class indicated a positive relationship between smoking and life expectancy. Ample evidence exists that smoking causes life expectancy to decrease, so there is likely more going on. Beyond causality, what are some important alternative possibilities that may complicate interpretation of the results of regression from observational data?<br />
(1) There may exist mediating variables<br />
(2) There may exist confounding factors<br />
(3) The sample employed might not be a good representation of the population, either due to the choice of participants or because the sample is small<br />
(4) A linear regression may not be capturing the true pattern<br />
(5) The dependent variable may in fact be causing the independent variable<br />
(6) Selection bias<br />
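<br />
Point (2) can be illustrated with a small simulation in R. This is a sketch with invented numbers, not the class data: a hypothetical confounder, wealth, raises both smoking and life expectancy, so the unadjusted slope for smoking comes out positive even though its simulated direct effect is negative.<br />

```r
set.seed(2011)                                 # arbitrary seed
n <- 1000

wealth  <- rnorm(n)                            # hypothetical confounder
smoking <- wealth + rnorm(n)                   # wealthier units smoke more
life    <- 70 + 2 * wealth - 0.5 * smoking + rnorm(n)  # direct effect of smoking is -0.5

slope_unadjusted <- coef(lm(life ~ smoking))[["smoking"]]
slope_adjusted   <- coef(lm(life ~ smoking + wealth))[["smoking"]]

slope_unadjusted   # positive: confounded by wealth
slope_adjusted     # negative: close to the simulated -0.5
```

Adding the confounder to the model flips the sign, which is the pattern to look for in the class example once a suitable country-level variable is available.<br />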
<br />
=== Week 2 ===<br />
How, by sketching a few lines on this graph, may we satisfy ourselves that the correlation between education and prestige is significant? Explain why. (P.S. Thanks for lending me your image, [[../Andy Li|Andy]]!)<br>
[[File:D34.jpg|200px]]<br />
<br />
=== Week 3 ===<br />
A researcher studying a schizophrenia medication in a clinical population discovers that the dosage is positively correlated with strength of symptoms. She is about to begin a recall because the drug appears to be making patients worse, when it occurs to her that perhaps there is another variable in play which restores the good name of her drug. What might that variable be? How could this variable have this effect (sketch!) and would you describe it as a 'confounding' or 'mediating' variable?<br />
<br />
=== Week 4 ===<br />
<br />
<br />
==Statistics in the Media==<br />
=== Week 1 ===<br />
Researchers from Newcastle University conducted a study in which signs with pictures of staring eyes were posted in a busy little cafeteria to see whether the images would encourage patrons to act responsibly and tidy up after themselves. I came across this study via an article titled [http://www.wired.com/wiredscience/2010/12/eyes-good-behavior/#more-44615 "Fake Watchful Eyes Discourage Naughty Behavior"] on Wired Magazine's website. <br />
<br />
The Wired article reports that, "The number of people who paid attention to the sign, and cleaned up after their meal, doubled when confronted with a pair of gazing peepers." Doubled, huh? Would that be 1 in 100 people to 2? 40% to nearly 80%? So I went to the source, an article by [http://www.staff.ncl.ac.uk/daniel.nettle/ernestjonesnettlebateson.pdf Ernest-Jones, Nettle & Bateson (2010)]. As it turns out the sample size was reasonable, and I think effect size in terms of proportions is quite respectable (see figure below), but Wired did make a mistake in reading the graph. The proportion of individuals who ''left litter'' decreased from .4 to .2, thus the proportion of individuals who cleaned up after themselves increased from .6 to .8. In other words, under the watchful eye of the creepy posters cafeteria patrons increased their pro-social behaviour by a factor of 1.3.<br />
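<br />
The arithmetic behind this reading of the graph can be written out in R, using the proportions quoted above (.4 and .2 for leaving litter). The variable names are only illustrative.<br />

```r
# Proportion of patrons who left litter, as read from the figure
litter_control <- 0.4
litter_eyes    <- 0.2

# The proportion who cleaned up is the complement
clean_control <- 1 - litter_control   # 0.6
clean_eyes    <- 1 - litter_eyes      # 0.8

litter_ratio <- litter_eyes / litter_control   # 0.5: littering halved
clean_ratio  <- clean_eyes / clean_control     # about 1.33: cleaning up rose by a third

litter_ratio
clean_ratio
```

So "doubled" only makes sense if it is read as litter being halved; the share of people who cleaned up grew by about a third.<br />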
<br />
Though this error is hardly earth-shattering, it is an example of how easily results can be misinterpreted and misrepresented in public forums.<br />
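<br />
To make the arithmetic explicit, here is a small R sketch. It assumes the approximate proportions quoted above (.4 and .2 for leaving litter); the variable names are only illustrative.<br />

```r
# Proportions read from the figure in Ernest-Jones, Nettle & Bateson (2010),
# as quoted in the text above (approximate values, not the raw data).
litter_before <- 0.4
litter_after  <- 0.2

# The proportion who cleaned up is the complement of the proportion who littered.
clean_before <- 1 - litter_before
clean_after  <- 1 - litter_after

litter_ratio <- litter_after / litter_before   # littering was halved
clean_ratio  <- clean_after / clean_before     # tidying rose by about a third

print(c(litter_ratio = litter_ratio, clean_ratio = round(clean_ratio, 2)))
```

So "doubled" only makes sense if read as "littering halved"; the proportion of patrons tidying up rose by a factor of about 1.3.<br />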
<br />
[[File:Smithce.Eyes-On-Coop-Beh Ernest-Jones 2010.png|200px]]<br />
<br />
=== Week 2 ===<br />
A blog written by a well respected author on Business Insider reports on a study on Twitter usage. The business news blogger concluded that Twitter is not a viable marketing tool, because half of Twitter users never read anyone else's Tweets ([http://www.businessinsider.com/twitter-usage-2010-12 "THE TRUTH ABOUT TWITTER: Half Of Twitter Users Never Listen To A Word Anyone Else Says"]).<br />
<br />
However, according to commentary on other sites (such as [http://www.1goodreason.com/blog/blog/2011/01/06/lies-damned-lies-and-statistics/ "Lies Damned Lies and Statistics"]), it would seem that the researchers who conducted the study and the Business Insider blogger who reported on it failed to account for a very important detail: it is estimated that 40%-60% of Twitter accounts are abandoned. So, of course many people never read anybody's Tweets, because they don't use Twitter!!<br />
<br />
=== Week 3 ===<br />
<br />
[http://freakonomics.blogs.nytimes.com/2011/01/21/zyzmors-revenge/ Zyzmor’s Revenge?]<br />
This short article summarizes some cute findings regarding relationships between the alphabetical position of surnames and various life outcomes. One study showed that researchers with alphabetically early surnames are more likely to gain tenure at a top university, become a fellow in the Econometric Society (it was a study by economists), and even win the Nobel Prize. This effect is explained by the fact that in many areas authors are listed alphabetically by last name, thus authors with early surnames are likely to have more citations, since many people (incorrectly) cite papers as Smith et al, 2000. <br />
<br />
On the flip-side, people with late surnames also had to wait longer in lines at school, and as a result are slower (and presumably more thoughtful) at making buying decisions because they weren't rushed like the kids with early last names. I think causality there is pretty thin, and I wonder just how tiny this effect size is!!<br />
<br />
=== Week 4 ===<br />
<br />
CBC reports on a study finding that video game play is associated with anxiety and depression in youth ([http://www.theeca.com/newsletters/GP/JF/GCLSLFK.pdf Kids' excess video gaming tied to anxiety]). Researchers used latent growth mixture modeling, and concluded that pathological video game play causes depression and anxiety. This counters the conventional assumption that youth who are depressed and/or anxious 'retreat' into game play to avoid their feelings. I look forward to learning more about this methodology in the future! The study was published in the January 2011 issue of Pediatrics ([http://www.theeca.com/newsletters/GP/JF/GCLSLFK.pdf Pathological Video Game Use Among Youths: A Two-Year Longitudinal Study]).<br />
<br />
<br />
<br />
==Questions and Comments on Groupwork and Class Lectures==<br />
=== Week 1 ===<br />
This is an oft-referred-to problem in Psychology research, but still an interesting one. A large portion of the studies coming out of university Psychology departments exclusively use undergraduate students as participants. Often, these are students who are required to participate to receive credit for their courses. How much of what we think we 'know' about psychology might not actually apply to the general population??<br />
<br />
''Comment/Answer'': - A whole lot!! Let's also consider the fact that these psychology students are primarily North American and Caucasian! That makes the results of the vast majority of psychological studies even less generalizable! - [[../Constance Mara|Constance]]<br />
<br />
<br />
=== Week 2 ===<br />
As a few other members of the course have already commented, it is becoming increasingly clear from the course that consulting is quite a bit more involved than it might at first seem. Taking the time to step away from the problem and assess the basic elements of the analysis at hand couldn't be more important. I know I have rushed head first into experiments and/or analysis before really sitting down to get a clear idea of my objectives, and wasted a lot of time in the process!!<br />
<br />
<br />
=== Week 3 ===<br />
I would like to do something in R similar to what Team Rubin showed with the lattice package, but also want the fit summaries for the sub-plots.<br />
<br />
This code splits the continuous X2 variable into 3 'shingles' and plots:<br />
X2group <- equal.count(data$X2,number=3,overlap=0)<br />
xyplot(Y ~ X1 | X2group, data=data)<br />
<br />
But this doesn't work:<br />
fit = lm(Y ~ X1 | X2group, data=data)<br />
summary( fit )<br />
<br />
I would also love to have the data presented in one Y vs X1 plot, with data corresponding to the 3 levels (the categorization of a continuous var) of X2 in different colours.<br />
<br />
Any help?<br />
<br />
Hi Carrie,<br />
<br />
You can try fit<-lm(Y~X1 | X2group, data=data) instead of fit = lm(Y ~ X1 | X2group, data=data) and then try the summary(fit) statement again. Hopefully that works!<br />
<br />
--[[User:Lawarren|Lawarren]] 23:56, 25 January 2011 (EST)<br />
<br />
Thanks for the suggestion, but unfortunately it didn't fix the problem. It doesn't throw an error, but gives 'NA' as the Estimate, Std. Error, t-value and p-value! I had to write some ugly unwieldy code to get what I wanted, and I'm sure there must be a better way...<br />
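<br />
One possible approach, offered as a sketch rather than a tested answer for the original data: lm() has no conditioning operator, so the `|` that lattice understands is not read as "fit within each group". Cutting X2 into an ordinary factor and fitting one model per group gives a summary for each panel. The data below are simulated stand-ins, and the names dat, fits and plot_by_group are made up for illustration.<br />

```r
set.seed(1)  # simulated stand-in for the real data (hypothetical values)
dat <- data.frame(X1 = rnorm(90), X2 = runif(90))
dat$Y <- 1 + 2 * dat$X1 + 3 * dat$X2 + rnorm(90, sd = 0.5)

# Cut X2 into 3 equal-count, non-overlapping groups. The result is an
# ordinary factor, which can be used to split the data or as a model term.
breaks <- quantile(dat$X2, probs = c(0, 1/3, 2/3, 1))
dat$X2group <- cut(dat$X2, breaks = breaks, include.lowest = TRUE)

# One lm() per group: split the data frame, then fit within each piece.
fits <- lapply(split(dat, dat$X2group), function(d) lm(Y ~ X1, data = d))
fit_summaries <- lapply(fits, summary)

# Single Y-vs-X1 plot, one colour per X2 group.
plot_by_group <- function(d) {
  plot(Y ~ X1, data = d, col = as.integer(d$X2group), pch = 16)
  legend("topleft", legend = levels(d$X2group),
         col = seq_along(levels(d$X2group)), pch = 16, title = "X2 group")
}
if (interactive()) plot_by_group(dat)
```

Typing fit_summaries at the prompt prints the three summaries, and sapply(fits, coef) collects the intercepts and slopes in one table. Two alternatives that may be worth trying: a single model with interactions, lm(Y ~ X1 * X2group, data = dat), or, staying in lattice, xyplot(Y ~ X1, groups = X2group, data = dat, auto.key = TRUE) for the coloured single-panel plot.<br />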
<br />
<br />
=== Week 4 ===<br />
<br />
In the slides for Visualizing Multiple Regression, the section on the data ellipse for predictors (page 30) uses a notation for the general predictor that I'm not familiar with and missed in lecture. What does the plus sign in a circle mean? Thanks!<br />
<br />
Andy: The plus sign in logical operation is called "exclusive or". Wiki link is [http://en.wikipedia.org/wiki/Exclusive_or here]. But in the notes, it is specially defined in summary, Page 88.</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Carrie_SmithMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Carrie Smith2011-02-02T15:43:50Z<p>Andytli: </p>
<hr />
<div>==About Me==<br />
This year I began working towards my PhD in Psychology in the Quantitative Methods Area. I completed my Masters degree at York studying depth perception (within the Centre for Vision Research) and my undergraduate degree at the University of Toronto in Engineering Science, Aerospace option. As you can see I have quite a varied (and some might say strange) educational background. I have been attending SCS meetings since September and would like to consult in the coming academic year.<br />
<br />
I have experience working with R, Matlab, SPSS and SAS.<br />
<br />
And a proud member of [[MATH 6627 2010-11 Practicum in Statistical Consulting/Assignment_Teams/Gray|Team Gray]]<br />
<br />
==Sample Exam Questions==<br />
=== Week 1 ===<br />
The cigarette data discussed in class indicated a positive relationship between smoking and life expectancy. Ample evidence exists that smoking causes life expectancy to decrease, so there is likely more going on. Beyond causality, what are some important alternative possibilities that may complicate interpretation of the results of regression from observational data?<br />
(1) There may exist mediating variables<br />
(2) There may exist confounding factors<br />
(3) The sample employed might not be a good representation of the population, either due to the choice of participants or because the sample is small<br />
(4) A linear regression may not be capturing the true pattern<br />
(5) The dependent variable may in fact be causing the independent variable<br />
(6) Selection bias<br />
<br />
=== Week 2 ===<br />
How, by sketching a few lines on this graph, may we satisfy ourselves that the correlation between education and prestige is significant? Explain why. (ps. thanks for lending me your image [[../Andy Li|Andy]]!)<br><br />
[[File:D34.jpg|200px]]<br />
<br />
=== Week 3 ===<br />
A researcher studying a schizophrenia medication in a clinical population discovers that the dosage is positively correlated with strength of symptoms. She is about to begin a recall because the drug appears to be making patients worse, when it occurs to her that perhaps there is another variable in play which restores the good name of her drug. What might that variable be? How could this variable have this effect (sketch!) and would you describe it as a 'confounding' or 'mediating' variable?<br />
<br />
=== Week 4 ===<br />
<br />
<br />
==Statistics in the Media==<br />
=== Week 1 ===<br />
Researchers from Newcastle University conducted a study in which signs with pictures of staring eyes were posted in a busy little cafeteria to see whether the images would encourage patrons to act responsibly and tidy up after themselves. I came across this study via an article titled [http://www.wired.com/wiredscience/2010/12/eyes-good-behavior/#more-44615 "Fake Watchful Eyes Discourage Naughty Behavior"] on Wired Magazine's website. <br />
<br />
The Wired article reports that, "The number of people who paid attention to the sign, and cleaned up after their meal, doubled when confronted with a pair of gazing peepers." Doubled, huh? Would that be 1 in 100 people to 2? 40% to nearly 80%? So I went to the source, an article by [http://www.staff.ncl.ac.uk/daniel.nettle/ernestjonesnettlebateson.pdf Ernest-Jones, Nettle & Bateson (2010)]. As it turns out the sample size was reasonable, and I think effect size in terms of proportions is quite respectable (see figure below), but Wired did make a mistake in reading the graph. The proportion of individuals who ''left litter'' decreased from .4 to .2, thus the proportion of individuals who cleaned up after themselves increased from .6 to .8. In other words, under the watchful eye of the creepy posters cafeteria patrons increased their pro-social behaviour by a factor of 1.3.<br />
<br />
Though this error is hardly earth-shattering, it is an example of how easily results can be misinterpreted and misrepresented in public forums.<br />
<br />
[[File:Smithce.Eyes-On-Coop-Beh Ernest-Jones 2010.png|200px]]<br />
<br />
=== Week 2 ===<br />
A blog written by a well respected author on Business Insider reports on a study on Twitter usage. The business news blogger concluded that Twitter is not a viable marketing tool, because half of Twitter users never read anyone else's Tweets ([http://www.businessinsider.com/twitter-usage-2010-12 "THE TRUTH ABOUT TWITTER: Half Of Twitter Users Never Listen To A Word Anyone Else Says"]).<br />
<br />
However, according to commentary on other sites (such as [http://www.1goodreason.com/blog/blog/2011/01/06/lies-damned-lies-and-statistics/ "Lies Damned Lies and Statistics"]), it would seem that the researchers who conducted the study and the Business Insider blogger who reported on it failed to account for a very important detail: it is estimated that 40%-60% of Twitter accounts are abandoned. So, of course many people never read anybody's Tweets, because they don't use Twitter!!<br />
<br />
=== Week 3 ===<br />
<br />
[http://freakonomics.blogs.nytimes.com/2011/01/21/zyzmors-revenge/ Zyzmor’s Revenge?]<br />
This short article summarizes some cute findings regarding relationships between the alphabetical position of surnames and various life outcomes. One study showed that researchers with alphabetically early surnames are more likely to gain tenure at a top university, become a fellow in the Econometric Society (it was a study by economists), and even win the Nobel Prize. This effect is explained by the fact that in many areas authors are listed alphabetically by last name, thus authors with early surnames are likely to have more citations, since many people (incorrectly) cite papers as Smith et al, 2000. <br />
<br />
On the flip-side, people with late surnames also had to wait longer in lines at school, and as a result are slower (and presumably more thoughtful) at making buying decisions because they weren't rushed like the kids with early last names. I think causality there is pretty thin, and I wonder just how tiny this effect size is!!<br />
<br />
=== Week 4 ===<br />
<br />
CBC reports on a study finding that video game play is associated with anxiety and depression in youth ([http://www.theeca.com/newsletters/GP/JF/GCLSLFK.pdf Kids' excess video gaming tied to anxiety]). Researchers used latent growth mixture modeling, and concluded that pathological video game play causes depression and anxiety. This counters the conventional assumption that youth who are depressed and/or anxious 'retreat' into game play to avoid their feelings. I look forward to learning more about this methodology in the future! The study was published in the January 2011 issue of Pediatrics ([http://www.theeca.com/newsletters/GP/JF/GCLSLFK.pdf Pathological Video Game Use Among Youths: A Two-Year Longitudinal Study]).<br />
<br />
<br />
<br />
==Questions and Comments on Groupwork and Class Lectures==<br />
=== Week 1 ===<br />
This is an oft-referred-to problem in Psychology research, but still an interesting one. A large portion of the studies coming out of university Psychology departments exclusively use undergraduate students as participants. Often, these are students who are required to participate to receive credit for their courses. How much of what we think we 'know' about psychology might not actually apply to the general population??<br />
<br />
''Comment/Answer'': - A whole lot!! Let's also consider the fact that these psychology students are primarily North American and Caucasian! That makes the results of the vast majority of psychological studies even less generalizable! - [[../Constance Mara|Constance]]<br />
<br />
<br />
=== Week 2 ===<br />
As a few other members of the course have already commented, it is becoming increasingly clear from the course that consulting is quite a bit more involved than it might at first seem. Taking the time to step away from the problem and assess the basic elements of the analysis at hand couldn't be more important. I know I have rushed head first into experiments and/or analysis before really sitting down to get a clear idea of my objectives, and wasted a lot of time in the process!!<br />
<br />
<br />
=== Week 3 ===<br />
I would like to do something in R similar to what Team Rubin showed with the lattice package, but also want the fit summaries for the sub-plots.<br />
<br />
This code splits the continuous X2 variable into 3 'shingles' and plots:<br />
X2group <- equal.count(data$X2,number=3,overlap=0)<br />
xyplot(Y ~ X1 | X2group, data=data)<br />
<br />
But this doesn't work:<br />
fit = lm(Y ~ X1 | X2group, data=data)<br />
summary( fit )<br />
<br />
I would also love to have the data presented in one Y vs X1 plot, with data corresponding to the 3 levels (the categorization of a continuous var) of X2 in different colours.<br />
<br />
Any help?<br />
<br />
Hi Carrie,<br />
<br />
You can try fit<-lm(Y~X1 | X2group, data=data) instead of fit = lm(Y ~ X1 | X2group, data=data) and then try the summary(fit) statement again. Hopefully that works!<br />
<br />
--[[User:Lawarren|Lawarren]] 23:56, 25 January 2011 (EST)<br />
<br />
Thanks for the suggestion, but unfortunately it didn't fix the problem. It doesn't throw an error, but gives 'NA' as the Estimate, Std. Error, t-value and p-value! I had to write some ugly unwieldy code to get what I wanted, and I'm sure there must be a better way...<br />
<br />
<br />
=== Week 4 ===<br />
<br />
In the slides for Visualizing Multiple Regression, the section on the data ellipse for predictors (page 30) uses a notation for the general predictor that I'm not familiar with and missed in lecture. What does the plus sign in a circle mean? Thanks!<br />
<br />
Andy: I have the same question. I only know the circled plus sign from logic, where it is called "exclusive or" (see [http://en.wikipedia.org/wiki/Exclusive_or here]). In the notes, however, it seems to denote a kind of addition.</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Andy_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Andy Li2011-01-31T19:50:31Z<p>Andytli: </p>
<hr />
<div>====Welcome to my wiki page for the Statistical Consulting Practicum====<br />
<br />
==About Me==<br />
My name is Andy Li. From 2008, I started undergraduate in Statistics in York. After two years study, I am now a master student in Applied Statistics. I have another bachelor degree in Chemical Engineering. I have worked as a B2B salesman for many years. I used SAS, R, matlab, minitab and Maple in the classes. I am very happy to join our 6627 team to know more about Statistics. Hope we can build up not only our knowledge but also our friendship in these following days. <br />
<br />
==Sample Exam Questions==<br />
=== Week 1 ===<br />
(In class) There are 11 students in the 6627 class, 5 of whom are male. There is a statistics meeting on campus. All the students want to attend as visitors, but the problem is that there are only 6 seats left. So the professor needs to choose 6 students randomly and wants exactly two of them to be male, to help set up the tables. How many ways can he select students for his experiment?<br />
<br />
Answer: 5C2*6C4=150<br />
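<br />
The count can be checked in R. This is a minimal sketch that reads the question as "exactly two of the six are male", which is what the answer above implies.<br />

```r
# Exactly 2 of the 5 men and 4 of the 6 women fill the 6 seats.
n_male   <- 5
n_female <- 11 - n_male

ways <- choose(n_male, 2) * choose(n_female, 4)
print(ways)  # 10 * 15 = 150
```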
<br />
=== Week 2 ===<br />
[[File:d34.jpg]]<br />
<br />
(In class) We have the cloud of our data, the data ellipses and the centre point. We also know that about 40% of the data fall inside the inner ellipse and 86% inside the outer ellipse. What is the radius of the outer ellipse? What are Sx, Sy, Sy.x and the 95% confidence interval?<br />
<br />
Answer: Please see the lecture notes of 2nd lecture. <br />
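<br />
As a quick check on the two percentages, here is a short R sketch. It assumes the data are roughly bivariate normal, in which case the squared distance from the centre (in standard units) follows a chi-square distribution with 2 degrees of freedom. The lecture notes remain the reference for the full answer.<br />

```r
# Coverage of a data ellipse of "radius" r, ASSUMING bivariate normal data:
# the squared Mahalanobis distance then follows a chi-square with 2 df.
radius   <- c(1, 2)
coverage <- pchisq(radius^2, df = 2)   # equals 1 - exp(-radius^2 / 2)

print(round(coverage, 3))  # 0.393 0.865
```

Under that assumption, the inner ellipse (about 40%) has radius 1 and the outer ellipse (about 86%) has radius 2.<br />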
<br />
=== Week 3 ===<br />
Q: Explain the multiple regression Coefficients in three dimensions.<br />
<br />
A: [[File:g1.png]] Beta0 defines the Y-intercept of the plane. Beta1 defines the slope of the plane along the x1 axis, and beta2 defines the slope of the plane along the x2 axis. The error term ε adds noise to the plane so that it is bumpy instead of being perfectly smooth. <br />
<br />
[[File:g2.png]]<br />
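<br />
The interpretation of the coefficients can be illustrated with simulated data. In this sketch the true values of beta0, beta1 and beta2 are invented for the example; lm() recovers them approximately, and a one-unit step along x1 with x2 held fixed raises the fitted plane by the estimated beta1.<br />

```r
set.seed(42)  # simulated data; the coefficient values below are made up
n  <- 200
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 10)
beta0 <- 5; beta1 <- 2; beta2 <- -1
eps <- rnorm(n, sd = 1)                       # noise that makes the plane "bumpy"
y   <- beta0 + beta1 * x1 + beta2 * x2 + eps

fit <- lm(y ~ x1 + x2)
print(round(coef(fit), 2))

# Moving one unit along x1 (x2 held fixed) changes the fitted value by
# exactly the estimated slope for x1.
p0 <- predict(fit, data.frame(x1 = 3, x2 = 4))
p1 <- predict(fit, data.frame(x1 = 4, x2 = 4))
slope_x1 <- unname(p1 - p0)
```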
<br />
=== Week 4 ===<br />
Q: Which statements describe correlation and which describe interaction?<br />
a) People with more years in jail tend to have fewer years of education.<br />
b) The more money I save, the more financially secure I feel. <br />
c) To give a good taste, she squeezed a lemon into her black tea and stirred the tea.<br />
d) Helping others in the Stats lab is one type of TA job. Another type is grader.<br />
e) Steel can become harder and stronger through adding carbon to it and quenching (rapidly cooling).<br />
f) The more years of education I complete, the higher my earning potential.<br />
g) Professors drink too much coffee, because their work is stressful.<br />
<br />
<br />
Read more: http://www.answers.com/topic/positive-correlation-1#ixzz1Cdz9COBc<br />
<br />
Read more: http://wiki.answers.com/Q/What_is_an_example_of_negative_correlation#ixzz1CdvYoL95<br />
==Statistics in the Media==<br />
=== Week 1 ===<br />
[http://endoftheamericandream.com/archives/50-statistics-about-the-u-s-economy-that-are-almost-too-crazy-to-believe 50 Statistics About The U.S. Economy That Are Almost Too Crazy To Believe]<br />
<br />
Can you see 2010 is crisis year? It's NOT the time to change! <br />
[http://moneywatch.bnet.com/economic-news/blog/financial-decoder/18-scary-us-debt-facts/2824/ 18 Scary US Debt Facts] <br />
<br />
Without statistics, the people across the border might be toasting this year's big success right about now.<br />
<br />
=== Week 2 ===<br />
[http://timesofindia.indiatimes.com/city/varanasi/4-day-conference-begins-at-BHU/articleshow/7224523.cms 4-day conference begins at BHU] Banaras Hindu University (BHU), Hindi: काशी हिन्दू विश्वविद्यालय, is a premier central university and a world class educational institution located in Varanasi, India. It is regarded as the largest residential university in Asia.<br />
<br />
=== Week 3 ===<br />
[http://www.newswire.ca/en/releases/archive/January2011/19/c3506.html Greater Toronto & Hamilton Area Transit Choices: Report finds much higher transportation emissions from road vs. rail]<br />
<br />
This is current news on a very old topic. The report, "Putting Transportation on Track", comes from two of Canada's leading think tanks, Sustainable Prosperity and the Pembina Institute, and restates what is already common sense, with nothing new.<br />
<br />
=== Week 4 ===<br />
[http://2010.census.gov/2010census/#/panel-1 2010 US Census results are now available]<br />
<br />
As of Jan. 18, 2011, the 2010 US Census data is accessible online. It is easy to reach and will be appreciated by many fact finders. The other highlight is the survey itself: it asks just 10 questions and takes about 10 minutes to complete.<br />
<br />
<br />
==Questions and Comments on Groupwork and Class Lectures==<br />
=== Week 1 ===<br />
For the example about smoking and life expectancy we discussed in class, I think there are two more factors we should take into consideration. One is the age at which people start smoking. I know kids smoke in some poor countries, because they don't know it is harmful. The other is that different types of cigarette have different effects. Whatever type of cigarette you smoke, it will always be ‘injurious to health’, but cigarettes in rich countries have relatively less addictive nicotine and tar, are equipped with long filters, and are not allowed to contain harmful flavour additives. I know this because I used to be a heavy smoker and smoked more than 100 brands of cigars from different countries. But yes, I am still alive. <br />
<br />
=== Week 2 ===<br />
Some classmates couldn't install the spida and p3d packages. Below is the procedure you should follow:<br />
<br />
1. uninstall your current R.<br />
<br />
2. go to [http://cran.r-project.org/bin/windows/base/ R download] to download a new R.<br />
<br />
3. do a short installation of R. <br />
<br />
4. run <br />
<br />
> install.packages( c("car","rgl"))<br />
<br />
> install.packages( c("spida","p3d"), repos = "http://R-Forge.R-project.org")<br />
<br />
=== Week 3 ===<br />
It's really amazing to see the data moving by using GoogleVis and p3d in last lecture. It will be very useful in consulting or presentation. I think there is limitation to use these methods to analyze complex models. But we may choose 3 variables each time from a multiple model to draw a 3d plot to see the trend.</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Andy_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Andy Li2011-01-31T19:01:53Z<p>Andytli: </p>
<hr />
<div>====Welcome to my wiki page for the Statistical Consulting Practicum====<br />
<br />
==About Me==<br />
My name is Andy Li. From 2008, I started undergraduate in Statistics in York. After two years study, I am now a master student in Applied Statistics. I have another bachelor degree in Chemical Engineering. I have worked as a B2B salesman for many years. I used SAS, R, matlab, minitab and Maple in the classes. I am very happy to join our 6627 team to know more about Statistics. Hope we can build up not only our knowledge but also our friendship in these following days. <br />
<br />
==Sample Exam Questions==<br />
=== Week 1 ===<br />
(In class) There are 11 students in the 6627 class, 5 of whom are male. There is a statistics meeting on campus. All the students want to attend as visitors, but the problem is that there are only 6 seats left. So the professor needs to choose 6 students randomly and wants exactly two of them to be male, to help set up the tables. How many ways can he select students for his experiment?<br />
<br />
Answer: 5C2*6C4=150<br />
<br />
=== Week 2 ===<br />
[[File:d34.jpg]]<br />
<br />
(In class) We have the cloud of our data, the data ellipses and the centre point. We also know that about 40% of the data fall inside the inner ellipse and 86% inside the outer ellipse. What is the radius of the outer ellipse? What are Sx, Sy, Sy.x and the 95% confidence interval?<br />
<br />
Answer: Please see the lecture notes of 2nd lecture. <br />
<br />
=== Week 3 ===<br />
Q: Explain the multiple regression Coefficients in three dimensions.<br />
<br />
A: [[File:g1.png]] Beta0 defines the Y-intercept of the plane. Beta1 defines the slope of the plane along the x1 axis, and beta2 defines the slope of the plane along the x2 axis. The error term ε adds noise to the plane so that it is bumpy instead of being perfectly smooth. <br />
<br />
[[File:g2.png]]<br />
<br />
==Statistics in the Media==<br />
=== Week 1 ===<br />
[http://endoftheamericandream.com/archives/50-statistics-about-the-u-s-economy-that-are-almost-too-crazy-to-believe 50 Statistics About The U.S. Economy That Are Almost Too Crazy To Believe]<br />
<br />
Can you see 2010 is crisis year? It's NOT the time to change! <br />
[http://moneywatch.bnet.com/economic-news/blog/financial-decoder/18-scary-us-debt-facts/2824/ 18 Scary US Debt Facts] <br />
<br />
If there is no statistics, the people over the border may toast to celebrate the big success of this year this time.<br />
<br />
=== Week 2 ===<br />
[http://timesofindia.indiatimes.com/city/varanasi/4-day-conference-begins-at-BHU/articleshow/7224523.cms 4-day conference begins at BHU] Banaras Hindu University (BHU), Hindi: काशी हिन्दू विश्वविद्यालय, is a premier central university and a world class educational institution located in Varanasi, India. It is regarded as the largest residential university in Asia.<br />
<br />
=== Week 3 ===<br />
[http://www.newswire.ca/en/releases/archive/January2011/19/c3506.html Greater Toronto & Hamilton Area Transit Choices: Report finds much higher transportation emissions from road vs. rail]<br />
<br />
This is current news on a very old topic. The report, "Putting Transportation on Track", comes from two of Canada's leading think tanks, Sustainable Prosperity and the Pembina Institute, and restates what is already common sense, with nothing new.<br />
<br />
=== Week 4 ===<br />
[http://2010.census.gov/2010census/#/panel-1 2010 US Census results are now available]<br />
<br />
As of Jan. 18, 2011, the 2010 US Census data is accessible online. It is easy to reach and will be appreciated by many fact finders. The other highlight is the survey itself: it asks just 10 questions and takes about 10 minutes to complete.<br />
<br />
==Questions and Comments on Groupwork and Class Lectures==<br />
=== Week 1 ===<br />
For the example about smoking and life expectancy we discussed in class, I think there are two more factors we should take into consideration. One is the age at which people start smoking. I know kids smoke in some poor countries, because they don't know it is harmful. The other is that different types of cigarette have different effects. Whatever type of cigarette you smoke, it will always be ‘injurious to health’, but cigarettes in rich countries have relatively less addictive nicotine and tar, are equipped with long filters, and are not allowed to contain harmful flavour additives. I know this because I used to be a heavy smoker and smoked more than 100 brands of cigars from different countries. But yes, I am still alive. <br />
<br />
=== Week 2 ===<br />
Some classmates couldn't install the spida and p3d packages. Below is the procedure you should follow:<br />
<br />
1. uninstall your current R.<br />
<br />
2. go to [http://cran.r-project.org/bin/windows/base/ R download] to download a new R.<br />
<br />
3. do a short installation of R. <br />
<br />
4. run <br />
<br />
> install.packages( c("car","rgl"))<br />
<br />
> install.packages( c("spida","p3d"), repos = "http://R-Forge.R-project.org")<br />
<br />
=== Week 3 ===<br />
It's really amazing to see the data moving by using GoogleVis and p3d in last lecture. It will be very useful in consulting or presentation. I think there is limitation to use these methods to analyze complex models. But we may choose 3 variables each time from a multiple model to draw a 3d plot to see the trend.</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Andy_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Andy Li2011-01-31T18:56:47Z<p>Andytli: </p>
<hr />
<div>====Welcome to my wiki page for the Statistical Consulting Practicum====<br />
<br />
==About Me==<br />
My name is Andy Li. From 2008, I started undergraduate in Statistics in York. After two years study, I am now a master student in Applied Statistics. I have another bachelor degree in Chemical Engineering. I have worked as a B2B salesman for many years. I used SAS, R, matlab, minitab and Maple in the classes. I am very happy to join our 6627 team to know more about Statistics. Hope we can build up not only our knowledge but also our friendship in these following days. <br />
<br />
==Sample Exam Questions==<br />
=== Week 1 ===<br />
(In class) There are 11 students in the 6627 class, 5 of whom are male. There is a statistics meeting on campus. All the students want to attend as visitors, but the problem is that there are only 6 seats left. So the professor needs to choose 6 students randomly and wants exactly two of them to be male, to help set up the tables. How many ways can he select students for his experiment?<br />
<br />
Answer: 5C2*6C4=150<br />
<br />
=== Week 2 ===<br />
[[File:d34.jpg]]<br />
<br />
(In class) We have the cloud of our data, the data ellipses and the centre point. We also know that about 40% of the data fall inside the inner ellipse and 86% inside the outer ellipse. What is the radius of the outer ellipse? What are Sx, Sy, Sy.x and the 95% confidence interval?<br />
<br />
Answer: Please see the lecture notes of 2nd lecture. <br />
<br />
=== Week 3 ===<br />
Q: Explain the multiple regression Coefficients in three dimensions.<br />
<br />
A: [[File:g1.png]] Beta0 defines the Y-intercept of the plane. Beta1 defines the slope of the plane along the x1 axis, and beta2 defines the slope of the plane along the x2 axis. The error term ε adds noise to the plane so that it is bumpy instead of being perfectly smooth. <br />
<br />
[[File:g2.png]]<br />
<br />
==Statistics in the Media==<br />
=== Week 1 ===<br />
[http://endoftheamericandream.com/archives/50-statistics-about-the-u-s-economy-that-are-almost-too-crazy-to-believe 50 Statistics About The U.S. Economy That Are Almost Too Crazy To Believe]<br />
<br />
Can you see 2010 is crisis year? It's NOT the time to change! <br />
[http://moneywatch.bnet.com/economic-news/blog/financial-decoder/18-scary-us-debt-facts/2824/ 18 Scary US Debt Facts] <br />
<br />
If there is no statistics, the people over the border may toast to celebrate the big success of this year this time.<br />
<br />
=== Week 2 ===<br />
[http://timesofindia.indiatimes.com/city/varanasi/4-day-conference-begins-at-BHU/articleshow/7224523.cms 4-day conference begins at BHU] Banaras Hindu University (BHU), Hindi: काशी हिन्दू विश्वविद्यालय, is a premier central university and a world class educational institution located in Varanasi, India. It is regarded as the largest residential university in Asia.<br />
<br />
=== Week 3 ===<br />
[http://www.newswire.ca/en/releases/archive/January2011/19/c3506.html Greater Toronto & Hamilton Area Transit Choices: Report finds much higher transportation emissions from road vs. rail]<br />
<br />
This is current news on a very old topic. The report, "Putting Transportation on Track", comes from two of Canada's leading think tanks, Sustainable Prosperity and the Pembina Institute, and restates what is already common sense, with nothing new.<br />
<br />
=== Week 4 ===<br />
[http://2010.census.gov/2010census/#/panel-1 2010 US Census results are now available]<br />
<br />
As of Jan. 18, 2011, 2010 US Census data is accessible online. It is easy to reach and will be appreciated by FactFinders.<br />
<br />
==Questions and Comments on Groupwork and Class Lectures==<br />
=== Week 1 ===<br />
For the example about smoking and life expectancy that we discussed in class, I think there are two more factors we should take into consideration. One is the age at which people start smoking. I know that children smoke in some poor countries, because they don't know it is harmful. The other is that different types of cigarette have different effects. Whatever type of cigarette you smoke, it will always be ‘injurious to health’. But cigarettes in rich countries contain relatively less addictive nicotine and tar, are fitted with long filters, and are not allowed to contain harmful flavor additives. I know this because I used to be a heavy smoker and have smoked more than 100 brands of cigars from different countries. But yes, I am still alive. <br />
<br />
=== Week 2 ===<br />
Some classmates couldn't install the spida and p3d packages. Below is the procedure you should follow:<br />
<br />
1. uninstall your current R.<br />
<br />
2. go to [http://cran.r-project.org/bin/windows/base/ R download] to download a new R.<br />
<br />
3. do a short installation of R. <br />
<br />
4. run <br />
<br />
> install.packages( c("car","rgl"))<br />
<br />
> install.packages( c("spida","p3d"), repos = "http://R-Forge.R-project.org")<br />
<br />
=== Week 3 ===<br />
It was really amazing to see the data moving with GoogleVis and p3d in the last lecture. This will be very useful in consulting and presentations. I think these methods are limited when it comes to analyzing complex models, but we can choose three variables at a time from a multiple-variable model and draw a 3D plot to see the trend.</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Assignment_Teams/GrayMATH 6627 2010-11 Practicum in Statistical Consulting/Assignment Teams/Gray2011-01-26T15:03:03Z<p>Andytli: i</p>
<hr />
<div>* http://en.wikipedia.org/wiki/Mary_W._Gray<br />
<br />
==Assignment 1== <br />
===1. Simpson's Paradox=== <br />
<br />
Example: My friend and I play a basketball game and each shoot 20 shots. Who is the better shooter?<br />
<br />
[[File:h65.PNG|150px]]<br />
<br />
But, who is the better shooter if you control for the distance of the shot? Who would you rather have on your team?<br />
<br />
[[File:h32.PNG|150px]]<br />
This is a question about Simpson's Paradox. <br />
<br />
[[File:h45.PNG|150px]] <br />
<br />
We can see from this figure that the relationship changes from negative to positive once we take distance into consideration. The black line links the overall probabilities that the two of us made our shots. The red line links our performance from far away, and the blue line our performance from close in.<br />
<br />
Simpson’s paradox arises from one simple mathematical truth. Given eight real numbers: a, b, c, d, A, B, C, D with the following properties:[[File:12.png]], then it is not necessarily true that[[File:122.png]]. In fact, it may be true that:[[File:13.png]].<br />
<br />
This is a simple mathematical fact, yet it has significant ramifications in Bayesian analysis, medical research, science and engineering studies, and societal statistical analysis. It is of concern for any statistical activity involving the calculation and analysis of ratios of two measurements.<br />
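The reversal is easy to reproduce with a few counts. The shot numbers below are hypothetical (they are not the numbers in the figures above) and were chosen only so that each player takes 20 shots; the sketch compares success rates within each distance with the pooled rates.<br />

```r
# Hypothetical shot counts, chosen for illustration only.
shots <- data.frame(
  player   = c("A", "A", "B", "B"),
  distance = c("close", "far", "close", "far"),
  made     = c(12, 1, 5, 6),
  taken    = c(15, 5, 5, 15)
)

# Success rate within each distance: B is ahead both close and far.
shots$rate <- shots$made / shots$taken
shots

# Success rate pooled over distance: A is ahead overall.
totals  <- aggregate(cbind(made, taken) ~ player, data = shots, FUN = sum)
overall <- setNames(totals$made / totals$taken, as.character(totals$player))
overall
```

Player A comes out ahead overall only because most of A's shots were taken from close in, where both players score more easily.<br />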
<br />
Example 2 (real income tax example)<br />
<br />
[[File:simpson1.pdf]]<br />
<br />
===2. Graphics to visualize data === <br />
Gray: rgl and p3d (include the use of the 'groups' parameter to produce trajectories)<br />
<br />
===Introduction===<br />
<br />
[[File:Grey_rgl_p3d_1.pdf]]<br />
[[File:Grey_rgl_p3d_2.pdf]]<br />
<br />
''rgl'' is a library of functions that offers 3D real-time visualization functionality to the R programming environment (Adler & Murdoch, 2010), providing an OpenGL implementation for R.<br />
<br />
''p3d'' is a library of functions which employs functions from ''rgl'' to help visualize statistical models expressed as a function of 2 independent variables with the possible addition of a categorical variable (Monette, 2009).<br />
<br />
===Package ''rgl''===<br />
<br />
With ''rgl'' we create a ‘device’, which is simply a window containing a ‘world’ in which we can create 3-dimensional shapes and through which we can navigate.<br />
<br />
[[File:Grey_World.png|150px]]<br />
<br />
Functions within the rgl package can be divided into 6 categories:<br />
(1) Device management functions (open and close devices, control active device) <br />
(2) Scene management functions (option to remove certain or all objects from the scene)<br />
(3) Export functions (creating image files)<br />
(4) Shape functions - essential plotting primitives (points, lines, triangles, quads) as well as higher-level functions (text, spheres, surfaces).<br />
<br />
[[File:Grey-Shapes.png|150px]]<br />
<br />
(5) Environment functions - modify the viewpoint, background and bounding box, adding light sources<br />
(6) Appearance function rgl.material(…).<br />
<br />
[[File:Grey_AppearanceOptions.png|150px]]<br />
<br />
Using shapes and surfaces within an ''rgl'' device, statistical data can be represented in 3 dimensions. Some advanced examples are available as demos or provided on the [http://rgl.neoscientists.org/docs.shtml rgl website].<br />
<br />
[[File:Gray_Rgl_3d_histogram.png|150px]]<br />
[[File:Grey_rgl_example_imulated_animal_abundance.png|150px]]<br />
<br />
A few of the functions from ''rgl'' are useful for manipulating 3D models created using ''p3d'', since ''p3d'' contains many functions that inherit from ''rgl'' but tailor them to statistical methods. Thus all but a few ''rgl'' functions are unnecessary for our purposes unless you would like to contribute functionality to ''p3d''!<br />
<br />
<br />
<br />
----<br />
<br />
===Package p3d===<br />
<br />
In this section I will focus on example code you may use to familiarize yourself with the capabilities of this package. You will require the [[File:Gray_p3d_ex_Tuition.txt]] and [[File:Gray_p3d_ex_USIndicesIndustrialProd.txt]] data sets. Note that a few of the commands employed in the sample code are from ''rgl'', but these will likely be superseded by ''p3d'' functions as the package matures.<br />
<br />
Initialization code:<br />
----<br />
library( lattice )<br><br />
library( nlme )<br><br />
library( car )<br><br />
library( spida )<br><br />
library( rgl )<br><br />
library( p3d )<br><br />
tuit = read.table('tuition.Rdata',header=TRUE)<br><br />
head(tuit)<br><br />
prod = read.table('USIndicesIndustrialProd.Rdata',header=TRUE)<br><br />
head(prod)<br><br />
----<br />
<br />
<br />
For the tuition data we will begin by plotting the annual cost of tuition from a sample of American Universities against the rates of faculty compensation and proportion of students who graduate.<br />
<br />
Using the mouse buttons you can change the field of view and zoom in and out. ''Plot3d'' creates the 3D plot as shown on the right.<br />
<br />
We can remove elements from the device using the function ''Pop3d()''. This function removes elements starting with the most recently added item. Multiple items can be removed by adding a numeric argument, e.g. ''Pop3d(4)''.<br><br />
[[File:Gray_rgl_navigation.png|150px]]<br />
[[File:Gray_p3d_ex1.png|150px]]<br />
----<br />
Init3d(cex = .8)<br><br />
Plot3d(tuition ~ fac_comp + graduat, col = c("blue"), data = tuit)<br><br />
----<br />
<br />
<br />
Next we will subdivide the data by category, in this case whether the school is private (red) or public (blue) (variable name public.private).<br><br />
[[File:Gray_p3d_ex2.png|150px]]<br />
----<br />
Init3d(cex = .8)<br><br />
Plot3d( tuition ~ fac_comp + graduat|public.private, col = c("blue", "red"), data = tuit)<br><br />
----<br />
<br />
<br />
Next we will add regression planes for private(red) and public(blue) schools using the lm() function to determine the fit, and Fit3d() to insert the plane in the graph. Axes and labels are added using Axes3d() and title3d().<br><br />
[[File:Gray_p3d_ex3.png|150px]]<br />
----<br />
fitpub = lm(tuition ~ fac_comp + graduat,subset=(public.private==0),data = tuit)<br><br />
Fit3d( fitpub, col = c("blue"))<br><br />
<br><br />
fitpri = lm(tuition ~ fac_comp + graduat,subset=(public.private==1),data = tuit)<br><br />
Fit3d( fitpri, col = c("red"))<br><br />
<br><br />
Axes3d()<br><br />
title3d(main='Tuition predicted by grad rates and faculty salary -private (red) and public(blue) institutions')<br><br />
----<br />
<br />
<br />
Data ellipses are useful for understanding our data.<br><br />
[[File:Gray_p3d_ex4.png|150px]]<br />
----<br />
Ell3d()<br />
----<br />
<br />
<br />
We can change the viewpoint of our graph using the function ''view3d(theta,phi,fov,zoom)'', which takes polar coordinates. Note that ''view3d(0,0,0)'' will rotate the image to face the x-z plane (y into the screen) and ''view3d(270,0,0)'' will rotate the image to face the y-z plane (x into the screen). The function ''snap()'' will capture a still image of the current view. Note that to use ''movie3d()'' you must have ImageMagick installed to automatically convert the PNG frames to a GIF; otherwise you must use external software.<br />
<br />
[[File:Gray_p3d_ex5.png|150px]]<br />
[[File:GrayMovie.gif]]<br />
----<br />
view3d(0,0,0)<br><br />
snap()<br><br />
<br />
spin(theta = 0, phi = 0)<br><br />
<br />
spins(inc.theta = 1/4, inc.phi = 0, theta = NULL, phi = NULL)<br><br />
<br />
movie3d( spin3d(axis=c(0,1,0), rpm=20), duration=2, dir='movie' )<br><br />
----<br />
<br />
<br />
Here is an additional example, using data on the US indices of industrial production, plotting mining production (MIN) over months and years. Adding the argument ‘groups=YR’ to ''Plot3d'' connects the months in a given year to produce trajectories.<br />
[[File:Gray_p3d_ex6.png|150px]]<br />
----<br />
open3d(windowRect=c(100,100,800,800),cex = .8) <br><br />
prod = read.table('USIndicesIndustrialProd.Rdata',header=TRUE)<br><br />
head(prod)<br><br />
Plot3d(MIN ~ YR+MONTH,data=prod,groups=YR)<br><br />
Axes3d()<br><br />
title3d(main='Industrial Production Mining (1947-1993)')<br><br />
view3d(215,0,45)<br><br />
----<br />
<br />
<br />
==Assignment 2==<br />
<br />
Statistics in the News: "Spousal support a royal pain?" [http://journals.lww.com/clinicalpain/Abstract/2003/07000/Spousal_Responses_Are_Differentially_Associated.4.aspx Journal Abstract]<br />
<br />
Overly supportive spouses are not necessarily doing their partners a favour.<br />
<br />
''They could be prolonging the recovery of their injured spouses.''<br />
<br />
Men with highly attentive spouses reported higher levels of pain and more disability but did well on physical functions tests.<br />
<br />
Women with highly attentive spouses didn't report feeling more pain or being more disabled. However, they performed more poorly on physical function tests than did women with less attentive spouses.<br />
<br />
[[MEDIA: Gray_RoyalPain.pdf]]<br />
<br />
====1====<br />
;Question: Does the article suggest a causal relationship between two variables? If so, which? Are the data observational or experimental?<br />
<br />
;Discussion: <br />
Yes, the article did suggest a causal relationship. One variable is the spousal solicitousness (attentiveness & support). Another is the degree of reported pain and disability. A third is actual physical function. Patient gender was also taken into consideration.<br />
<br />
According to the report, men with chronic pain report more perceived pain and disability when they receive higher levels of spousal attention, and it is implied this is controlling for actual physical ability. For women there was no difference in self-reports of pain and disability with level of spousal support, but women who received more attention from their husbands had poorer physical function than those who did not.<br />
<br />
The data are observational, because there is no manipulation or randomization in collecting the data.<br />
<br />
<br />
====2====<br />
;Question: Can you think of alternative explanations to causality? Confounding factors? Or explanations consistent with causality? Mediating factors? <br />
<br />
;Discussion: <br />
Andy: Yes. Perhaps a high degree of pain causes solicitousness, rather than solicitousness causing more pain. I don't think a confounding factor exists. If there is one, I believe it is LOVE: a man's LOVE wins him a highly attentive spouse, but it does not work on him as a narcotic the way it does on a woman. He may then ask for more LOVE by reporting that he hurts. That is a sweet answer to any woman, but it sets any healthy man's teeth on edge.<br />
<br />
LOVE could also be the mediating factor. With high solicitousness, a man's LOVE starts to set in. He is then more likely to move his body, which may cause more pain, and his physical function may recover faster. A woman, on the other hand, likes to find pieces of LOVE in solicitousness. Her heart is numbed by that LOVE, and she then reports that she is better as an affectionate payback. <br />
<br />
====3====<br />
;Question: Have any confounding factors been accounted for in the analysis? <br />
<br />
;Discussion: <br />
<br />
Emotional (men) or Physical (women) troubles outside the primary cause of pain. Men who are emotionally unstable may attract and encourage attentiveness in their wives, and also report a disproportionately high level of disability relative to their level of function. Women who are physically weak prior to injury may attract husbands who are highly supportive, but also be more susceptible to injuries that result in chronic pain. <br />
<br />
====4====<br />
;Question: Have any mediating factors been controlled for in a way that vitiates a causal interpretation of the relationship? <br />
<br />
;Discussion: <br />
<br />
This does not appear to be the case.<br />
<br />
====5====<br />
;Question: What is your personal assessment of the evidence for causality in the study that is the subject of the article? <br />
<br />
;Discussion: <br />
<br />
There is a good case to be made for both forward and backward causality in this study. In addition, it would be nice to be able to control for emotional and physical stability prior to chronic injury.<br />
Are frequencies distributed as expected? Are the proportions of highly attentive spouses equal across groups?<br />
ASSESSMENT: Judgement withheld pending further evidence<br />
<br />
<br />
=== Paradoxes and Fallacies ===<br />
==== 2. ==== <br />
;Question:You are studying observational data on the relationship between Health and Coffee (measured in grams of caffeine consumed per day). Suppose you want to control for a possible confounding factor 'Stress'. In this kind of study it is more important to make sure that you measure coffee consumption accurately than it is to make sure that you measure 'stress' accurately. <br />
<br />
;Discussion: <br />
<br />
True. It is more important to accurately measure Coffee Consumption if this is the variable of primary interest in the study. <br />
<br />
Consider this from the analysis of variance framework. When measurement error is introduced, the variability in the outcome accounted for by that factor is decreased, and the error sum of squares increases. If we measure Stress less accurately, the error term will increase accordingly, which in turn somewhat decreases the power to detect the effect of Coffee on Health; the Coffee coefficient itself changes little as long as Stress is only weakly related to Coffee (if the two are strongly related, a noisy Stress no longer adjusts fully for the confounding, and the Coffee coefficient can shift as well). However, if we have increased measurement error in Coffee Consumption, then the proportion of variability explained by Coffee will decrease AND the error term will increase. We lose power on two fronts!<br />
<br />
Here is a short R script that demonstrates this idea: [[file: Gray_Q2MeasError.r]]<br />
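Independently of the linked script, the two kinds of measurement error can be compared in a few lines. This is a sketch under assumed values: the effect sizes, the strength of the Stress-Coffee relationship and the amount of noise are all made up for illustration, and Stress is deliberately made a fairly strong confounder.<br />

```r
# Simulated data; every number here is an assumption made for illustration.
set.seed(1)
n      <- 10000
stress <- rnorm(n)
coffee <- 0.7 * stress + rnorm(n)                  # Stress drives Coffee
health <- -1 * stress + 0.5 * coffee + rnorm(n)    # true Coffee effect is 0.5

coffee_obs <- coffee + rnorm(n)                    # Coffee measured with error
stress_obs <- stress + rnorm(n)                    # Stress measured with error

# Coffee coefficient with clean data, noisy Coffee, and noisy Stress
b_clean        <- unname(coef(lm(health ~ coffee + stress))["coffee"])
b_noisy_coffee <- unname(coef(lm(health ~ coffee_obs + stress))["coffee_obs"])
b_noisy_stress <- unname(coef(lm(health ~ coffee + stress_obs))["coffee"])

round(c(clean = b_clean,
        noisy_coffee = b_noisy_coffee,
        noisy_stress = b_noisy_stress), 2)
```

With a confounder this strongly tied to Coffee, both kinds of error pull the estimated Coffee coefficient away from 0.5: noise in Coffee attenuates it, and noise in Stress leaves part of the confounding unadjusted. Weakening the 0.7 link between Stress and Coffee makes the second effect shrink.<br />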
<br />
==== 5.==== <br />
;Question: In a multiple regression of Y on three predictors, X1, X2 and X3, if the coefficients of both X2 and X3 are not significant, it is safe to drop these two variables and perform a regression on X1 alone. <br />
<br />
;Discussion:<br />
No. The way to do this is to start with all 3 variables in the model and drop the least "significant" one at a time, until you are left with only "significant" variables.<br />
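One caution about judging X2 and X3 by their individual p-values: if the two are strongly correlated, each may look non-significant on its own while the pair is clearly significant together. A minimal sketch with simulated data (all values are assumptions chosen for illustration) compares the full model with the X1-only model using a joint F-test.<br />

```r
# Simulated data; X3 is built to be almost a copy of X2.
set.seed(2)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x2 + rnorm(n, sd = 0.05)
y  <- x1 + x2 + x3 + rnorm(n)

full    <- lm(y ~ x1 + x2 + x3)
reduced <- lm(y ~ x1)

summary(full)$coefficients     # individual t-tests; X2 and X3 have large standard errors
joint <- anova(reduced, full)  # F-test of dropping X2 and X3 together
joint
```

The joint test asks whether X2 and X3 can be dropped at the same time, which is the question the statement actually raises; the individual t-tests each ask only whether one variable is needed given the other.<br />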
<br />
==== 8.====<br />
;Question: In a multiple regression, if you drop a predictor whose effect is not significant, the p-values of the other predictors should not change very much. <br />
<br />
;Discussion:<br />
No. After the predictor is dropped, the model changes, and the p-values of the remaining predictors can change with it. We can see this from [http://www.utdallas.edu/~wiorkow%20/documents/Mod1Lect4Regnew.doc this example on pages 26-28] <br />
<br />
==== 11.==== <br />
;Question: In a model to assess the effect of a number of treatments on some outcome, we can estimate the difference between the best treatment and the worse treatment by using the difference in the mean outcomes. <br />
<br />
;Discussion: <br />
No. Take tumor size as the outcome: it does tell us something about the two treatments, one the best and the other the worst. But it is not a good idea to estimate the difference between the two treatments by the difference in mean tumor size. That estimate is biased, because other factors, such as how long the patient has had the disease, also affect the outcome. It is like setting out from Toronto for New York and one day finding that everyone around us is speaking Spanish. Still, the answer may also be yes, as Barack Obama's slogan put it: "Yes we can". <br />
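A further reason for caution, sketched here under assumed values: picking the best and the worst treatment after looking at the data tends to exaggerate the gap between them. In the simulation below every treatment has the same true mean, so the true best-versus-worst difference is 0, yet the observed difference in sample means is always positive.<br />

```r
# Simulated studies; the numbers of treatments, observations and
# replications are arbitrary choices for illustration.
set.seed(3)
k    <- 10     # treatments
n    <- 20     # observations per treatment
reps <- 2000   # simulated studies

gap <- replicate(reps, {
  # one column per treatment, all drawn from the same distribution
  means <- colMeans(matrix(rnorm(n * k), nrow = n, ncol = k))
  max(means) - min(means)   # observed "best" minus observed "worst"
})

mean(gap)      # average observed gap, although the true gap is 0
```

The more treatments are compared, the larger this selection effect becomes, so the raw difference between the observed best and worst means overstates the true difference.<br />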
<br />
==== 14.==== <br />
;Question: If two variables have a strong interaction, this implies a strong correlation. <br />
<br />
;Discussion:<br />
<br />
False. An interaction exists if the effect of one independent variable (IV) on the dependent variable varies over the levels of another independent variable. This tells us nothing about the relationship between the IVs themselves.<br />
An interaction between two variables can occur whether or not the variables are correlated with one another.<br />
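The "no correlation, interaction" case can be simulated directly. In this sketch (simulated data, coefficient values assumed for illustration) the two predictors are generated independently, yet the outcome depends on their product.<br />

```r
# Simulated data; x1 and x2 are independent by construction.
set.seed(4)
n  <- 2000
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + x1 + x2 + 2 * x1 * x2 + rnorm(n)   # effect of x1 depends on x2

r   <- cor(x1, x2)                            # correlation between the IVs
fit <- lm(y ~ x1 * x2)                        # main effects plus interaction
p_int <- summary(fit)$coefficients["x1:x2", "Pr(>|t|)"]

c(correlation = r, interaction_p = p_int)
```

The correlation between the predictors is close to zero while the interaction term is clearly detected, which is the top-right cell of the table.<br />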
<br />
{| class="wikitable"<br />
|-<br />
! <br />
! No Interaction<br />
! Interaction<br />
|-<br />
| '''No Correlation'''<br />
| [[File:Gray_Q14_nCnI.gif]] [[File:X1X2_Uncorr_NoInt.png|200px]] <br />
| [[File:Gray_Q14_nCI.gif]] [[File:X1X2_Uncorr_Int.png|200px]]<br />
|-<br />
| '''Correlation'''<br />
| [[File:Gray_Q14_CnI.gif]] [[File:X1X2_Corr_NoInt.png|200px]] <br />
| [[File:Gray_Q14_CI.gif]] [[File:X1X2_Corr_Int.png|200px]]<br />
|}<br />
<br />
[[File: Gray_Q14.R]]<br />
<br />
Here is an additional example using binary variables:<br />
[http://cnx.org/content/m31440/latest/ Illustration of the difference between correlation and interaction amongst independent variables]</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Assignment_Teams/GrayMATH 6627 2010-11 Practicum in Statistical Consulting/Assignment Teams/Gray2011-01-26T14:27:07Z<p>Andytli: </p>
<hr />
<div>* http://en.wikipedia.org/wiki/Mary_W._Gray<br />
<br />
==Assignment 1== <br />
===1. Simpson's Paradox=== <br />
<br />
Example: My friend and I play a basketball game and each shoot 20 shots. Who is the better shooter?<br />
<br />
[[File:h65.PNG|150px]]<br />
<br />
But, who is the better shooter if you control for the distance of the shot? Who would you rather have on your team?<br />
<br />
[[File:h32.PNG|150px]]<br />
This is a question about Simpson's Paradox. <br />
<br />
[[File:h45.PNG|150px]] <br />
<br />
We can see from this figure that the relationship changes from negative to positive once we take distance into consideration. The black line links the overall probabilities that the two of us made our shots. The red line links our performance from far away, and the blue line our performance from close in.<br />
<br />
Simpson’s paradox arises from one simple mathematical truth. Given eight real numbers: a, b, c, d, A, B, C, D with the following properties:[[File:12.png]], then it is not necessarily true that[[File:122.png]]. In fact, it may be true that:[[File:13.png]].<br />
<br />
This is a simple mathematical fact, yet it has significant ramifications in Bayesian analysis, medical research, science and engineering studies, and societal statistical analysis. It is of concern for any statistical activity involving the calculation and analysis of ratios of two measurements.<br />
<br />
Example 2 (real income tax example)<br />
<br />
[[File:simpson1.pdf]]<br />
<br />
===2. Graphics to visualize data === <br />
Gray: rgl and p3d (include the use of the 'groups' parameter to produce trajectories)<br />
<br />
===Introduction===<br />
<br />
[[File:Grey_rgl_p3d_1.pdf]]<br />
[[File:Grey_rgl_p3d_2.pdf]]<br />
<br />
''rgl'' is a library of functions that offers 3D real-time visualization functionality to the R programming environment (Adler & Murdoch, 2010), providing an OpenGL implementation for R.<br />
<br />
''p3d'' is a library of functions which employs functions from ''rgl'' to help visualize statistical models expressed as a function of 2 independent variables with the possible addition of a categorical variable (Monette, 2009).<br />
<br />
===Package ''rgl''===<br />
<br />
With ''rgl'' we create a ‘device’, which is simply a window containing a ‘world’ in which we can create 3-dimensional shapes and through which we can navigate.<br />
<br />
[[File:Grey_World.png|150px]]<br />
<br />
Functions within the rgl package can be divided into 6 categories:<br />
(1) Device management functions (open and close devices, control active device) <br />
(2) Scene management functions (option to remove certain or all objects from the scene)<br />
(3) Export functions (creating image files)<br />
(4) Shape functions - essential plotting primitives (points, lines, triangles, quads) as well as higher-level functions (text, spheres, surfaces).<br />
<br />
[[File:Grey-Shapes.png|150px]]<br />
<br />
(5) Environment functions - modify the viewpoint, background and bounding box, adding light sources<br />
(6) Appearance function rgl.material(…).<br />
<br />
[[File:Grey_AppearanceOptions.png|150px]]<br />
<br />
Using shapes and surfaces within an ''rgl'' device, statistical data can be represented in 3 dimensions. Some advanced examples are available as demos or provided on the [http://rgl.neoscientists.org/docs.shtml rgl website].<br />
<br />
[[File:Gray_Rgl_3d_histogram.png|150px]]<br />
[[File:Grey_rgl_example_imulated_animal_abundance.png|150px]]<br />
<br />
A few of the functions from ''rgl'' are useful for manipulating 3D models created using ''p3d'', since ''p3d'' contains many functions that inherit from ''rgl'' but tailor them to statistical methods. Thus all but a few ''rgl'' functions are unnecessary for our purposes unless you would like to contribute functionality to ''p3d''!<br />
<br />
<br />
<br />
----<br />
<br />
===Package p3d===<br />
<br />
In this section I will focus on example code you may use to familiarize yourself with the capabilities of this package. You will require the [[File:Gray_p3d_ex_Tuition.txt]] and [[File:Gray_p3d_ex_USIndicesIndustrialProd.txt]] data sets. Note that a few of the commands employed in the sample code are from ''rgl'', but these will likely be superseded by ''p3d'' functions as the package matures.<br />
<br />
Initialization code:<br />
----<br />
library( lattice )<br><br />
library( nlme )<br><br />
library( car )<br><br />
library( spida )<br><br />
library( rgl )<br><br />
library( p3d )<br><br />
tuit = read.table('tuition.Rdata',header=TRUE)<br><br />
head(tuit)<br><br />
prod = read.table('USIndicesIndustrialProd.Rdata',header=TRUE)<br><br />
head(prod)<br><br />
----<br />
<br />
<br />
For the tuition data we will begin by plotting the annual cost of tuition from a sample of American Universities against the rates of faculty compensation and proportion of students who graduate.<br />
<br />
Using the mouse buttons you can change the field of view and zoom in and out. ''Plot3d'' creates the 3D plot as shown on the right.<br />
<br />
We can remove elements from the device using the function ''Pop3d()''. This function removes elements starting with the most recently added item. Multiple items can be removed by adding a numeric argument, e.g. ''Pop3d(4)''.<br><br />
[[File:Gray_rgl_navigation.png|150px]]<br />
[[File:Gray_p3d_ex1.png|150px]]<br />
----<br />
Init3d(cex = .8)<br><br />
Plot3d(tuition ~ fac_comp + graduat, col = c("blue"), data = tuit)<br><br />
----<br />
<br />
<br />
Next we will subdivide the data by category, in this case whether the school is private (red) or public (blue) (variable name public.private).<br><br />
[[File:Gray_p3d_ex2.png|150px]]<br />
----<br />
Init3d(cex = .8)<br><br />
Plot3d( tuition ~ fac_comp + graduat|public.private, col = c("blue", "red"), data = tuit)<br><br />
----<br />
<br />
<br />
Next we will add regression planes for private(red) and public(blue) schools using the lm() function to determine the fit, and Fit3d() to insert the plane in the graph. Axes and labels are added using Axes3d() and title3d().<br><br />
[[File:Gray_p3d_ex3.png|150px]]<br />
----<br />
fitpub = lm(tuition ~ fac_comp + graduat,subset=(public.private==0),data = tuit)<br><br />
Fit3d( fitpub, col = c("blue"))<br><br />
<br><br />
fitpri = lm(tuition ~ fac_comp + graduat,subset=(public.private==1),data = tuit)<br><br />
Fit3d( fitpri, col = c("red"))<br><br />
<br><br />
Axes3d()<br><br />
title3d(main='Tuition predicted by grad rates and faculty salary -private (red) and public(blue) institutions')<br><br />
----<br />
<br />
<br />
Data ellipses are useful for understanding our data.<br><br />
[[File:Gray_p3d_ex4.png|150px]]<br />
----<br />
Ell3d()<br />
----<br />
<br />
<br />
We can change the viewpoint of our graph using the function ''view3d(theta,phi,fov,zoom)'', which takes polar coordinates. Note that ''view3d(0,0,0)'' will rotate the image to face the x-z plane (y into the screen) and ''view3d(270,0,0)'' will rotate the image to face the y-z plane (x into the screen). The function ''snap()'' will capture a still image of the current view. Note that to use ''movie3d()'' you must have ImageMagick installed to automatically convert the PNG frames to a GIF; otherwise you must use external software.<br />
<br />
[[File:Gray_p3d_ex5.png|150px]]<br />
[[File:GrayMovie.gif]]<br />
----<br />
view3d(0,0,0)<br><br />
snap()<br><br />
<br />
spin(theta = 0, phi = 0)<br><br />
<br />
spins(inc.theta = 1/4, inc.phi = 0, theta = NULL, phi = NULL)<br><br />
<br />
movie3d( spin3d(axis=c(0,1,0), rpm=20), duration=2, dir='movie' )<br><br />
----<br />
<br />
<br />
Here is an additional example, using data on the US indices of industrial production, plotting mining production (MIN) over months and years. Adding the argument ‘groups=YR’ to ''Plot3d'' connects the months in a given year to produce trajectories.<br />
[[File:Gray_p3d_ex6.png|150px]]<br />
----<br />
open3d(windowRect=c(100,100,800,800),cex = .8) <br><br />
prod = read.table('USIndicesIndustrialProd.Rdata',header=TRUE)<br><br />
head(prod)<br><br />
Plot3d(MIN ~ YR+MONTH,data=prod,groups=YR)<br><br />
Axes3d()<br><br />
title3d(main='Industrial Production Mining (1947-1993)')<br><br />
view3d(215,0,45)<br><br />
----<br />
<br />
<br />
==Assignment 2==<br />
<br />
Statistics in the News: "Spousal support a royal pain?" [http://journals.lww.com/clinicalpain/Abstract/2003/07000/Spousal_Responses_Are_Differentially_Associated.4.aspx Journal Abstract]<br />
<br />
Overly supportive spouses are not necessarily doing their partners a favour.<br />
<br />
''They could be prolonging the recovery of their injured spouses.''<br />
<br />
Men with highly attentive spouses reported higher levels of pain and more disability but did well on physical functions tests.<br />
<br />
Women with highly attentive spouses didn't report feeling more pain or being more disabled. However, they performed more poorly on physical function tests than did women with less attentive spouses.<br />
<br />
[[MEDIA: Gray_RoyalPain.pdf]]<br />
<br />
====1====<br />
;Question: Does the article suggest a causal relationship between two variables? If so, which? Are the data observational or experimental?<br />
<br />
;Discussion: <br />
Yes, the article did suggest a causal relationship. One variable is the spousal solicitousness (attentiveness & support). Another is the degree of reported pain and disability. A third is actual physical function. Patient gender was also taken into consideration.<br />
<br />
According to the report, men with chronic pain report more perceived pain and disability when they receive higher levels of spousal attention, and it is implied this is controlling for actual physical ability. For women there was no difference in self-reports of pain and disability with level of spousal support, but women who received more attention from their husbands had poorer physical function than those who did not.<br />
<br />
The data are observational, because there is no manipulation or randomization in collecting the data.<br />
<br />
<br />
====2====<br />
;Question: Can you think of alternative explanations to causality? Confounding factors? Or explanations consistent with causality? Mediating factors? <br />
<br />
;Discussion: <br />
Andy: Yes. Perhaps a high degree of pain causes solicitousness, rather than solicitousness causing more pain. I don't think a confounding factor exists. If there is one, I believe it is LOVE: a man's LOVE wins him a highly attentive spouse, but it does not work on him as a narcotic the way it does on a woman. He may then ask for more LOVE by reporting that he hurts. That is a sweet answer to any woman, but it sets any healthy man's teeth on edge.<br />
<br />
LOVE could also be the mediating factor. With high solicitousness, a man's LOVE starts to set in. He is then more likely to move his body, which may cause more pain, and his physical function may recover faster. A woman, on the other hand, likes to find pieces of LOVE in solicitousness. Her heart is numbed by that LOVE, and she then reports that she is better as an affectionate payback. <br />
<br />
====3====<br />
;Question: Have any confounding factors been accounted for in the analysis? <br />
<br />
;Discussion: <br />
<br />
Emotional (men) or Physical (women) troubles outside the primary cause of pain. Men who are emotionally unstable may attract and encourage attentiveness in their wives, and also report a disproportionately high level of disability relative to their level of function. Women who are physically weak prior to injury may attract husbands who are highly supportive, but also be more susceptible to injuries that result in chronic pain. <br />
<br />
====4====<br />
;Question: Have any mediating factors been controlled for in a way that vitiates a causal interpretation of the relationship? <br />
<br />
;Discussion: <br />
<br />
This does not appear to be the case.<br />
<br />
====5====<br />
;Question: What is your personal assessment of the evidence for causality in the study that is the subject of the article? <br />
<br />
;Discussion: <br />
<br />
There is a good case to be made for both forward and backward causality in this study. In addition, it would be nice to be able to control for emotional and physical stability prior to chronic injury.<br />
Are frequencies distributed as expected? Are the proportions of highly attentive spouses equal across groups?<br />
ASSESSMENT: Judgement withheld pending further evidence<br />
<br />
<br />
=== Paradoxes and Fallacies ===<br />
==== 2. ==== <br />
;Question:You are studying observational data on the relationship between Health and Coffee (measured in grams of caffeine consumed per day). Suppose you want to control for a possible confounding factor 'Stress'. In this kind of study it is more important to make sure that you measure coffee consumption accurately than it is to make sure that you measure 'stress' accurately. <br />
<br />
;Discussion: <br />
<br />
True. It is more important to accurately measure Coffee Consumption if this is the variable of primary interest in the study. <br />
<br />
Consider this from the analysis of variance framework. When measurement error is introduced, the variability in the outcome accounted for by that factor decreases and the error sum of squares increases. If we measure Stress less accurately, the error term increases accordingly, which somewhat reduces the power to detect the effect of Coffee on Health; the Coffee coefficient itself changes little as long as Stress and Coffee are only weakly correlated (if they are strongly correlated, a poorly measured Stress leaves residual confounding in the Coffee coefficient). However, if we increase the measurement error in Coffee Consumption, the proportion of variability explained by Coffee decreases AND the error term increases. We lose power on two fronts!<br />
<br />
Here is a short r-script that demonstrates this idea: [[file: Gray_Q2MeasError.r]]<br />
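<br />
The following is a minimal simulation sketch of the same idea, not the contents of the script linked above. All variable names, effect sizes and the assumed 0.5 correlation between Coffee and Stress are illustrative assumptions. It compares the Coffee coefficient when both predictors are measured accurately, when only Coffee is noisy, and when only Stress is noisy.<br />

```r
# Simulated data: Health depends on Coffee and Stress (true coefficients are 1).
set.seed(6627)
n      <- 2000
stress <- rnorm(n)
coffee <- 0.5 * stress + sqrt(1 - 0.5^2) * rnorm(n)  # cor(coffee, stress) is about 0.5
health <- coffee + stress + rnorm(n)

# Versions of each predictor measured with added error (error sd = 1).
coffee_noisy <- coffee + rnorm(n)
stress_noisy <- stress + rnorm(n)

# Coefficient of the coffee term (second coefficient) under each scenario.
b_accurate     <- unname(coef(lm(health ~ coffee + stress))[2])
b_noisy_coffee <- unname(coef(lm(health ~ coffee_noisy + stress))[2])
b_noisy_stress <- unname(coef(lm(health ~ coffee + stress_noisy))[2])

print(round(c(accurate     = b_accurate,
              noisy_coffee = b_noisy_coffee,
              noisy_stress = b_noisy_stress), 2))
```

Under these assumptions the Coffee coefficient should be attenuated toward zero when Coffee is noisy, and shifted away from its true value when only Stress is noisy, because part of the Stress effect is then attributed to Coffee. The size of both effects depends on the assumed error variances and on the correlation between the two predictors.<br />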
<br />
==== 5.==== <br />
;Question: In a multiple regression of Y on three predictors, X1, X2 and X3, if the coefficients of both X2 and X3 are not significant, it is safe to drop these two variables and perform a regression on X1 alone. <br />
<br />
;Discussion:<br />
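No. The usual approach is to start with all three variables in the model and drop the least "significant" one at a time, until only "significant" variables remain. Two coefficients that are individually non-significant can still be jointly significant, for example when X2 and X3 are highly correlated, so they should not be dropped together without a joint test.<br />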
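<br />
One reason dropping both variables at once is risky is that two predictors can each look non-significant while being jointly important, typically when they are highly correlated. The sketch below uses simulated data (all names and numbers are illustrative assumptions) and compares the individual t-tests with a joint F-test from ''anova()''.<br />

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x2 + rnorm(n, sd = 0.02)   # nearly collinear with x2
y  <- x1 + x2 + x3 + rnorm(n)    # both x2 and x3 truly matter

full    <- lm(y ~ x1 + x2 + x3)
reduced <- lm(y ~ x1)

# p-values of the individual t-tests for x2 and x3 in the full model
individual_p <- summary(full)$coefficients[c("x2", "x3"), "Pr(>|t|)"]

# p-value of the joint F-test that x2 and x3 can both be dropped
joint_p <- anova(reduced, full)[2, "Pr(>F)"]

print(individual_p)
print(joint_p)
```

With a design like this the individual p-values are usually large, because the data cannot separate X2 from X3, while the joint test rejects clearly.<br />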
<br />
==== 8.====<br />
;Question: In a multiple regression, if you drop a predictor whose effect is not significant, the p-values of the other predictors should not change very much. <br />
<br />
;Discussion:<br />
No. After dropping the predictor, the model changes, and the p-values of the remaining predictors can change with it. We can see this from [http://www.utdallas.edu/~wiorkow%20/documents/Mod1Lect4Regnew.doc this example, pages 26-28] <br />
<br />
==== 11.==== <br />
;Question: In a model to assess the effect of a number of treatments on some outcome, we can estimate the difference between the best treatment and the worst treatment by using the difference in the mean outcomes. <br />
<br />
;Discussion: <br />
<br />
<br />
==== 14.==== <br />
;Question: If two variables have a strong interaction, this implies a strong correlation. <br />
<br />
;Discussion:<br />
<br />
False. An interaction exists if the effect of one independent variable on the dependent variable varies over the levels of another independent variable. This tells us nothing about the relationship between the independent variables (IVs) themselves.<br />
An interaction between two variables can occur whether or not the variables are correlated with one another.<br />
<br />
{| class="wikitable"<br />
|-<br />
! <br />
! No Interaction<br />
! Interaction<br />
|-<br />
| '''No Correlation'''<br />
| [[File:Gray_Q14_nCnI.gif]] [[File:X1X2_Uncorr_NoInt.png|200px]] <br />
| [[File:Gray_Q14_nCI.gif]] [[File:X1X2_Uncorr_Int.png|200px]]<br />
|-<br />
| '''Correlation'''<br />
| [[File:Gray_Q14_CnI.gif]] [[File:X1X2_Corr_NoInt.png|200px]] <br />
| [[File:Gray_Q14_CI.gif]] [[File:X1X2_Corr_Int.png|200px]]<br />
|}<br />
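<br />
As a rough illustration of the "No Correlation / Interaction" cell, the sketch below simulates two independently generated predictors whose product drives the outcome. It is not the linked script; the variable names and numbers are illustrative assumptions.<br />

```r
set.seed(14)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)              # generated independently of x1
y  <- x1 * x2 + rnorm(n)    # the effect of x1 on y depends on x2

r_x1x2 <- cor(x1, x2)                      # sample correlation of the predictors
fit    <- lm(y ~ x1 * x2)                  # main effects plus interaction
p_int  <- summary(fit)$coefficients["x1:x2", "Pr(>|t|)"]

print(r_x1x2)
print(p_int)
```

The predictors are close to uncorrelated, yet the interaction term is strongly significant, so a strong interaction does not imply a strong correlation.<br />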
<br />
[[File: Gray_Q14.R]]<br />
<br />
Here is an additional example using binary variables:<br />
[http://cnx.org/content/m31440/latest/ Illustration of the difference between correlation and interaction amongst independent variables]</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Assignment_Teams/GrayMATH 6627 2010-11 Practicum in Statistical Consulting/Assignment Teams/Gray2011-01-26T14:13:36Z<p>Andytli: </p>
<hr />
<div>* http://en.wikipedia.org/wiki/Mary_W._Gray<br />
<br />
==Assignment 1== <br />
===1. Simpson's Paradox=== <br />
<br />
Example: My friend and I play a basketball game and each shoot 20 shots. Who is the better shooter?<br />
<br />
[[File:h65.PNG|150px]]<br />
<br />
But, who is the better shooter if you control for the distance of the shot? Who would you rather have on your team?<br />
<br />
[[File:h32.PNG|150px]]<br />
This is a question of Simpson's Paradox. <br />
<br />
[[File:h45.PNG|150px]] <br />
<br />
We can see from this figure that the relationship changes from negative to positive when we take the distance into consideration. The black line links the overall probabilities that each of us made a shot. The red line links our performance on far shots, and the blue line our performance on close shots.<br />
<br />
Simpson’s paradox arises from one simple mathematical truth. Given eight real numbers: a, b, c, d, A, B, C, D with the following properties:[[File:12.png]], then it is not necessarily true that[[File:122.png]]. In fact, it may be true that:[[File:13.png]].<br />
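<br />
A small numeric check, assuming the inequalities shown in the images are the standard ones (a/b < A/B and c/d < C/D, while (a+c)/(b+d) may still exceed (A+C)/(B+D)). The shot counts are made up for illustration and are not the data in the figures above.<br />

```r
# Made-up shot counts (made / attempted) for two players taking 20 shots each.
lower <- c(a = 1, b = 5, c = 12, d = 15)   # player 1: far a/b, close c/d
upper <- c(A = 4, B = 15, C = 5, D = 5)    # player 2: far A/B, close C/D

far_1   <- lower[["a"]] / lower[["b"]]     # 0.20
far_2   <- upper[["A"]] / upper[["B"]]     # about 0.27
close_1 <- lower[["c"]] / lower[["d"]]     # 0.80
close_2 <- upper[["C"]] / upper[["D"]]     # 1.00

# Pooled success rates over all 20 shots
total_1 <- (lower[["a"]] + lower[["c"]]) / (lower[["b"]] + lower[["d"]])  # 13/20
total_2 <- (upper[["A"]] + upper[["C"]]) / (upper[["B"]] + upper[["D"]])  # 9/20

print(c(far_1 < far_2, close_1 < close_2, total_1 > total_2))
```

Player 2 is better at each distance, but player 1 has the higher overall rate because most of player 1's shots are taken from close range.<br />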
<br />
This is an obvious math reality, yet it has significant ramifications in Bayesian analysis, medical research, science and engineering studies, and societal statistical analysis. It is of concern for any statistical activity involving the calculation and analysis of ratios of two measurements.<br />
<br />
Example 2 (Real Income tax example)<br />
<br />
[[File:simpson1.pdf]]<br />
<br />
===2. Graphics to visualize data === <br />
Gray: rgl and p3d (including the use of the 'groups' parameter to produce trajectories)<br />
<br />
===Introduction===<br />
<br />
[[File:Grey_rgl_p3d_1.pdf]]<br />
[[File:Grey_rgl_p3d_2.pdf]]<br />
<br />
''rgl'' is a library of functions that offers 3D real-time visualization functionality to the R programming environment (Adler & Murdoch, 2010), providing an OpenGL implementation for R.<br />
<br />
''p3d'' is a library of functions which employs functions from RGL to help visualize statistical models expressed as a function of 2 independent variables with the possible addition of a categorical variable (Monette, 2009).<br />
<br />
===Package ''rgl''===<br />
<br />
With ''rgl'' we create a ‘device’, which is simply a window, within which a ‘world’ is created where we can build 3-dimensional shapes and through which we can navigate.<br />
<br />
[[File:Grey_World.png|150px]]<br />
<br />
Functions within the rgl package can be divided into 6 categories:<br />
(1) Device management functions (open and close devices, control active device) <br />
(2) Scene management functions (option to remove certain or all objects from the scene)<br />
(3) Export functions (creating image files)<br />
(4) Shape functions - essential plotting primitives (points, lines, triangles, quads) as well as higher-level functions (text, spheres, surfaces).<br />
<br />
[[File:Grey-Shapes.png|150px]]<br />
<br />
(5) Environment functions - modify the viewpoint, background and bounding box, adding light sources<br />
(6) Appearance function rgl.material(…).<br />
<br />
[[File:Grey_AppearanceOptions.png|150px]]<br />
<br />
Using shapes and surfaces within an ''rgl'' device, statistical data can be represented in 3 dimensions. Some advanced examples are available as demos or provided on the [http://rgl.neoscientists.org/docs.shtml rgl website].<br />
<br />
[[File:Gray_Rgl_3d_histogram.png|150px]]<br />
[[File:Grey_rgl_example_imulated_animal_abundance.png|150px]]<br />
<br />
A few of the functions from ''rgl'' are useful for manipulating 3D models created using ''p3d'', since ''p3d'' contains many functions that inherit from ''rgl'' but tailor them to statistical methods. Thus all but a few are unnecessary for our purposes unless you would like to contribute functionality to ''p3d''!<br />
<br />
<br />
<br />
----<br />
<br />
===Package p3d===<br />
<br />
In this section I will focus on example code you may use to familiarize yourself with the capabilities of this package. You will require the [[File:Gray_p3d_ex_Tuition.txt]] and [[File:Gray_p3d_ex_USIndicesIndustrialProd.txt]] data sets. Note that a few of the commands employed in the sample code are from ''rgl'', but these will likely be superseded by ''p3d'' functions as the package matures.<br />
<br />
Initialization code:<br />
----<br />
library( lattice )<br><br />
library( nlme )<br><br />
library( car )<br><br />
library( spida )<br><br />
library( rgl )<br><br />
library( p3d )<br><br />
tuit = read.table('tuition.Rdata',header=TRUE)<br><br />
head(tuit)<br><br />
prod = read.table('USIndicesIndustrialProd.Rdata',header=TRUE)<br><br />
head(prod)<br><br />
----<br />
<br />
<br />
For the tuition data we will begin by plotting the annual cost of tuition from a sample of American Universities against the rates of faculty compensation and proportion of students who graduate.<br />
<br />
Using mouse keys you can change the field of view and zoom in and out. ''Plot3d'' creates the 3D plot as shown on the right.<br />
<br />
We can remove elements from the device using the function ''Pop3d()''. This function removes elements starting with the most recently added item. Multiple items can be removed by adding a numeric argument, e.g. ''Pop3d(4)''.<br><br />
[[File:Gray_rgl_navigation.png|150px]]<br />
[[File:Gray_p3d_ex1.png|150px]]<br />
----<br />
Init3d(cex = .8)<br><br />
Plot3d(tuition ~ fac_comp + graduat, col = c("blue"), data = tuit)<br><br />
----<br />
<br />
<br />
Next we will subdivide the data by category, in this case whether the school is private (red) or public (blue) (variable name public.private).<br><br />
[[File:Gray_p3d_ex2.png|150px]]<br />
----<br />
Init3d(cex = .8)<br><br />
Plot3d( tuition ~ fac_comp + graduat|public.private, col = c("blue", "red"), data = tuit)<br><br />
----<br />
<br />
<br />
Next we will add regression planes for private(red) and public(blue) schools using the lm() function to determine the fit, and Fit3d() to insert the plane in the graph. Axes and labels are added using Axes3d() and title3d().<br><br />
[[File:Gray_p3d_ex3.png|150px]]<br />
----<br />
fitpub = lm(tuition ~ fac_comp + graduat,subset=(public.private==0),data = tuit)<br><br />
Fit3d( fitpub, col = c("blue"))<br><br />
<br><br />
fitpri = lm(tuition ~ fac_comp + graduat,subset=(public.private==1),data = tuit)<br><br />
Fit3d( fitpri, col = c("red"))<br><br />
<br><br />
Axes3d()<br><br />
title3d(main='Tuition predicted by grad rates and faculty salary -private (red) and public(blue) institutions')<br><br />
----<br />
<br />
<br />
Data ellipses are useful for understanding our data.<br><br />
[[File:Gray_p3d_ex4.png|150px]]<br />
----<br />
Ell3d()<br />
----<br />
<br />
<br />
We can change the viewpoint of our graph using the function ''view3d(theta,phi,fov,zoom)'', which takes polar coordinates. Note that ''view3d(0,0,0)'' will rotate the image to face the x-z plane (y into the screen) and ''view3d(270,0,0)'' will rotate the image to face the y-z plane (x into the screen). The function ''snap()'' will capture a still image of the current view. Note that to use ''movie3d()'' you must have ImageMagick installed to automatically convert the PNG frames to a GIF; otherwise you must use external software.<br />
<br />
[[File:Gray_p3d_ex5.png|150px]]<br />
[[File:GrayMovie.gif]]<br />
----<br />
view3d(0,0,0)<br><br />
snap()<br><br />
<br />
spin(theta = 0, phi = 0)<br><br />
<br />
spins(inc.theta = 1/4, inc.phi = 0, theta = NULL, phi = NULL)<br><br />
<br />
movie3d( spin3d(axis=c(0,1,0), rpm=20), duration=2, dir='movie' )<br><br />
----<br />
<br />
<br />
Here is an additional example, using data on the US indices of industrial products, plotting Mining production (MIN) over months and years. Adding the argument ‘groups=YR’ to ''Plot3d'' connects the months in a given year to produce trajectories.<br />
[[File:Gray_p3d_ex6.png|150px]]<br />
----<br />
open3d(windowRect=c(100,100,800,800),cex = .8) <br><br />
prod = read.table('USIndicesIndustrialProd.Rdata',header=TRUE)<br><br />
head(prod)<br><br />
Plot3d(MIN ~ YR+MONTH,data=prod,groups=YR)<br><br />
Axes3d()<br><br />
title3d(main='Industrial Production Mining (1947-1993)')<br><br />
view3d(215,0,45)<br><br />
----<br />
<br />
<br />
==Assignment 2==<br />
<br />
Statistics in the News: "Spousal support a royal pain?" [http://journals.lww.com/clinicalpain/Abstract/2003/07000/Spousal_Responses_Are_Differentially_Associated.4.aspx Journal Abstract]<br />
<br />
Overly supportive spouses are not necessarily doing their partners a favour.<br />
<br />
''They could be prolonging the recovery of their injured spouses.''<br />
<br />
Men with highly attentive spouses reported higher levels of pain and more disability but did well on physical functions tests.<br />
<br />
Women with highly attentive spouses didn't report feeling more pain or being more disabled. However, they performed more poorly on physical function tests than did women with less attentive spouses.<br />
<br />
[[MEDIA: Gray_RoyalPain.pdf]]<br />
<br />
====1====<br />
;Question: Does the article suggest a causal relationship between two variables? If so, which? Are the data observational or experimental?<br />
<br />
;Discussion: <br />
Yes, the article did suggest a causal relationship. One variable is the spousal solicitousness (attentiveness & support). Another is the degree of reported pain and disability. A third is actual physical function. Patient gender was also taken into consideration.<br />
<br />
According to the report, men with chronic pain report more perceived pain and disability when they receive higher levels of spousal attention, and it is implied this is controlling for actual physical ability. For women there was no difference in self-reports of pain and disability with level of spousal support, but women who received more attention from their husbands had poorer physical function than those who did not.<br />
<br />
The data are observational, because there is no manipulation or randomization in collecting the data.<br />
<br />
<br />
====2====<br />
;Question: Can you think of alternative explanations to causality? Confounding factors? Or explanations consistent with causality? Mediating factors? <br />
<br />
;Discussion: <br />
Andy: Yes. Perhaps a high degree of pain causes solicitousness, rather than solicitousness causing more pain. I don't think there exists a confounding factor. If there is one, I believe it is LOVE: a man's LOVE wins him a highly attentive spouse, but it does not work as a narcotic for him the way it does for a woman. He may then ask for more LOVE by reporting that he hurts. That is a sweet answer to any woman, but one that any healthy man finds tooth-achingly sweet.<br />
<br />
LOVE could also be a mediating factor. With high solicitousness, a man starts to fall in LOVE. He is then more likely to move his body, which may cause more pain, but his physical function may recover faster. A woman, by contrast, likes finding pieces of LOVE through solicitousness. Her heart is numbed by that LOVE, and she then reports that she is better as an affectionate payback. <br />
<br />
====3====<br />
;Question: Have any confounding factors been accounted for in the analysis? <br />
<br />
;Discussion: <br />
<br />
Emotional (men) or Physical (women) troubles outside the primary cause of pain. Men who are emotionally unstable may attract and encourage attentiveness in their wives, and also report a disproportionately high level of disability relative to their level of function. Women who are physically weak prior to injury may attract husbands who are highly supportive, but also be more susceptible to injuries that result in chronic pain. <br />
<br />
====4====<br />
;Question: Have any mediating factors been controlled for in a way that vitiates a causal interpretation of the relationship? <br />
<br />
;Discussion: <br />
<br />
This does not appear to be the case.<br />
<br />
====5====<br />
;Question: What is your personal assessment of the evidence for causality in the study that is the subject of the article? <br />
<br />
;Discussion: <br />
<br />
There is a good case to be made for both forward and backward causality in this study. In addition, it would be nice to be able to control for emotional and physical stability prior to chronic injury.<br />
Are frequencies distributed as expected? Are the proportions of highly attentive spouses equal across groups?<br />
ASSESSMENT: Judgement withheld pending further evidence<br />
<br />
<br />
=== Paradoxes and Fallacies ===<br />
==== 2. ==== <br />
;Question:You are studying observational data on the relationship between Health and Coffee (measured in grams of caffeine consumed per day). Suppose you want to control for a possible confounding factor 'Stress'. In this kind of study it is more important to make sure that you measure coffee consumption accurately than it is to make sure that you measure 'stress' accurately. <br />
<br />
;Discussion: <br />
<br />
True. It is more important to accurately measure Coffee Consumption if this is the variable of primary interest in the study. <br />
<br />
Consider this from the analysis of variance framework. When measurement error is introduced, the variability in the outcome accounted for by that factor decreases and the error sum of squares increases. If we measure Stress less accurately, the error term increases accordingly, which somewhat reduces the power to detect the effect of Coffee on Health; the Coffee coefficient itself changes little as long as Stress and Coffee are only weakly correlated (if they are strongly correlated, a poorly measured Stress leaves residual confounding in the Coffee coefficient). However, if we increase the measurement error in Coffee Consumption, the proportion of variability explained by Coffee decreases AND the error term increases. We lose power on two fronts!<br />
<br />
Here is a short r-script that demonstrates this idea: [[file: Gray_Q2MeasError.r]]<br />
<br />
==== 5.==== <br />
;Question: In a multiple regression of Y on three predictors, X1, X2 and X3, if the coefficients of both X2 and X3 are not significant, it is safe to drop these two variables and perform a regression on X1 alone. <br />
<br />
;Discussion:<br />
<br />
<br />
==== 8.====<br />
;Question: In a multiple regression, if you drop a predictor whose effect is not significant, the p-values of the other predictors should not change very much. <br />
<br />
;Discussion:<br />
No. After dropping the predictor, the model changes, and the p-values of the remaining predictors can change with it. We can see this from [http://www.utdallas.edu/~wiorkow%20/documents/Mod1Lect4Regnew.doc this example, pages 26-28] <br />
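<br />
The linked example is not reproduced here. As a rough sketch with simulated data (names and numbers are illustrative assumptions), the code below fits a model with two correlated predictors, drops the one that has no effect of its own, and compares the standard error and p-value of the remaining predictor.<br />

```r
set.seed(2)
n  <- 100
x1 <- rnorm(n)
x2 <- 0.95 * x1 + sqrt(1 - 0.95^2) * rnorm(n)  # strongly correlated with x1
y  <- 0.3 * x1 + rnorm(n)                       # x2 has no effect of its own

full    <- summary(lm(y ~ x1 + x2))$coefficients
reduced <- summary(lm(y ~ x1))$coefficients

# Standard error and p-value for x1 with and without x2 in the model
se_full    <- full["x1", "Std. Error"]
se_reduced <- reduced["x1", "Std. Error"]
p_full     <- full["x1", "Pr(>|t|)"]
p_reduced  <- reduced["x1", "Pr(>|t|)"]

print(c(se_full = se_full, se_reduced = se_reduced))
print(c(p_full = p_full, p_reduced = p_reduced))
```

Because X1 and X2 share most of their variation, removing X2 shrinks the standard error of X1 considerably, so its p-value can move a long way even though the dropped predictor was not significant.<br />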
<br />
==== 11.==== <br />
;Question: In a model to assess the effect of a number of treatments on some outcome, we can estimate the difference between the best treatment and the worst treatment by using the difference in the mean outcomes. <br />
<br />
;Discussion: <br />
<br />
<br />
==== 14.==== <br />
;Question: If two variables have a strong interaction, this implies a strong correlation. <br />
<br />
;Discussion:<br />
<br />
False. An interaction exists if the effect of one independent variable on the dependent variable varies over the levels of another independent variable. This tells us nothing about the relationship between the independent variables (IVs) themselves.<br />
An interaction between two variables can occur whether or not the variables are correlated with one another.<br />
<br />
{| class="wikitable"<br />
|-<br />
! <br />
! No Interaction<br />
! Interaction<br />
|-<br />
| '''No Correlation'''<br />
| [[File:Gray_Q14_nCnI.gif]] [[File:X1X2_Uncorr_NoInt.png|200px]] <br />
| [[File:Gray_Q14_nCI.gif]] [[File:X1X2_Uncorr_Int.png|200px]]<br />
|-<br />
| '''Correlation'''<br />
| [[File:Gray_Q14_CnI.gif]] [[File:X1X2_Corr_NoInt.png|200px]] <br />
| [[File:Gray_Q14_CI.gif]] [[File:X1X2_Corr_Int.png|200px]]<br />
|}<br />
<br />
[[File: Gray_Q14.R]]<br />
<br />
Here is an additional example using binary variables:<br />
[http://cnx.org/content/m31440/latest/ Illustration of the difference between correlation and interaction amongst independent variables]</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Assignment_Teams/GrayMATH 6627 2010-11 Practicum in Statistical Consulting/Assignment Teams/Gray2011-01-26T14:11:52Z<p>Andytli: </p>
<hr />
<div>* http://en.wikipedia.org/wiki/Mary_W._Gray<br />
<br />
==Assignment 1== <br />
===1. Simpson's Paradox=== <br />
<br />
Example: My friend and I play a basketball game and each shoot 20 shots. Who is the better shooter?<br />
<br />
[[File:h65.PNG|150px]]<br />
<br />
But, who is the better shooter if you control for the distance of the shot? Who would you rather have on your team?<br />
<br />
[[File:h32.PNG|150px]]<br />
This is a question of Simpson's Paradox. <br />
<br />
[[File:h45.PNG|150px]] <br />
<br />
We can see from this figure that the relationship changes from negative to positive when we take the distance into consideration. The black line links the overall probabilities that each of us made a shot. The red line links our performance on far shots, and the blue line our performance on close shots.<br />
<br />
Simpson’s paradox arises from one simple mathematical truth. Given eight real numbers: a, b, c, d, A, B, C, D with the following properties:[[File:12.png]], then it is not necessarily true that[[File:122.png]]. In fact, it may be true that:[[File:13.png]].<br />
<br />
This is an obvious math reality, yet it has significant ramifications in Bayesian analysis, medical research, science and engineering studies, and societal statistical analysis. It is of concern for any statistical activity involving the calculation and analysis of ratios of two measurements.<br />
<br />
Example 2 (Real Income tax example)<br />
<br />
[[File:simpson1.pdf]]<br />
<br />
===2. Graphics to visualize data === <br />
Gray: rgl and p3d (including the use of the 'groups' parameter to produce trajectories)<br />
<br />
===Introduction===<br />
<br />
[[File:Grey_rgl_p3d_1.pdf]]<br />
[[File:Grey_rgl_p3d_2.pdf]]<br />
<br />
''rgl'' is a library of functions that offers 3D real-time visualization functionality to the R programming environment (Adler & Murdoch, 2010), providing an OpenGL implementation for R.<br />
<br />
''p3d'' is a library of functions which employs functions from RGL to help visualize statistical models expressed as a function of 2 independent variables with the possible addition of a categorical variable (Monette, 2009).<br />
<br />
===Package ''rgl''===<br />
<br />
With ''rgl'' we create a ‘device’, which is simply a window, within which a ‘world’ is created where we can build 3-dimensional shapes and through which we can navigate.<br />
<br />
[[File:Grey_World.png|150px]]<br />
<br />
Functions within the rgl package can be divided into 6 categories:<br />
(1) Device management functions (open and close devices, control active device) <br />
(2) Scene management functions (option to remove certain or all objects from the scene)<br />
(3) Export functions (creating image files)<br />
(4) Shape functions - essential plotting primitives (points, lines, triangles, quads) as well as higher-level functions (text, spheres, surfaces).<br />
<br />
[[File:Grey-Shapes.png|150px]]<br />
<br />
(5) Environment functions - modify the viewpoint, background and bounding box, adding light sources<br />
(6) Appearance function rgl.material(…).<br />
<br />
[[File:Grey_AppearanceOptions.png|150px]]<br />
<br />
Using shapes and surfaces within an ''rgl'' device, statistical data can be represented in 3 dimensions. Some advanced examples are available as demos or provided on the [http://rgl.neoscientists.org/docs.shtml rgl website].<br />
<br />
[[File:Gray_Rgl_3d_histogram.png|150px]]<br />
[[File:Grey_rgl_example_imulated_animal_abundance.png|150px]]<br />
<br />
A few of the functions from ''rgl'' are useful for manipulating 3D models created using ''p3d'', since ''p3d'' contains many functions that inherit from ''rgl'' but tailor them to statistical methods. Thus all but a few are unnecessary for our purposes unless you would like to contribute functionality to ''p3d''!<br />
<br />
<br />
<br />
----<br />
<br />
===Package p3d===<br />
<br />
In this section I will focus on example code you may use to familiarize yourself with the capabilities of this package. You will require the [[File:Gray_p3d_ex_Tuition.txt]] and [[File:Gray_p3d_ex_USIndicesIndustrialProd.txt]] data sets. Note that a few of the commands employed in the sample code are from ''rgl'', but these will likely be superseded by ''p3d'' functions as the package matures.<br />
<br />
Initialization code:<br />
----<br />
library( lattice )<br><br />
library( nlme )<br><br />
library( car )<br><br />
library( spida )<br><br />
library( rgl )<br><br />
library( p3d )<br><br />
tuit = read.table('tuition.Rdata',header=TRUE)<br><br />
head(tuit)<br><br />
prod = read.table('USIndicesIndustrialProd.Rdata',header=TRUE)<br><br />
head(prod)<br><br />
----<br />
<br />
<br />
For the tuition data we will begin by plotting the annual cost of tuition from a sample of American Universities against the rates of faculty compensation and proportion of students who graduate.<br />
<br />
Using mouse keys you can change the field of view and zoom in and out. ''Plot3d'' creates the 3D plot as shown on the right.<br />
<br />
We can remove elements from the device using the function ''Pop3d()''. This function removes elements starting with the most recently added item. Multiple items can be removed by adding a numeric argument, e.g. ''Pop3d(4)''.<br><br />
[[File:Gray_rgl_navigation.png|150px]]<br />
[[File:Gray_p3d_ex1.png|150px]]<br />
----<br />
Init3d(cex = .8)<br><br />
Plot3d(tuition ~ fac_comp + graduat, col = c("blue"), data = tuit)<br><br />
----<br />
<br />
<br />
Next we will subdivide the data by category, in this case whether the school is private (red) or public (blue) (variable name public.private).<br><br />
[[File:Gray_p3d_ex2.png|150px]]<br />
----<br />
Init3d(cex = .8)<br><br />
Plot3d( tuition ~ fac_comp + graduat|public.private, col = c("blue", "red"), data = tuit)<br><br />
----<br />
<br />
<br />
Next we will add regression planes for private(red) and public(blue) schools using the lm() function to determine the fit, and Fit3d() to insert the plane in the graph. Axes and labels are added using Axes3d() and title3d().<br><br />
[[File:Gray_p3d_ex3.png|150px]]<br />
----<br />
fitpub = lm(tuition ~ fac_comp + graduat,subset=(public.private==0),data = tuit)<br><br />
Fit3d( fitpub, col = c("blue"))<br><br />
<br><br />
fitpri = lm(tuition ~ fac_comp + graduat,subset=(public.private==1),data = tuit)<br><br />
Fit3d( fitpri, col = c("red"))<br><br />
<br><br />
Axes3d()<br><br />
title3d(main='Tuition predicted by grad rates and faculty salary -private (red) and public(blue) institutions')<br><br />
----<br />
<br />
<br />
Data ellipses are useful for understanding our data.<br><br />
[[File:Gray_p3d_ex4.png|150px]]<br />
----<br />
Ell3d()<br />
----<br />
<br />
<br />
We can change the viewpoint of our graph using the function ''view3d(theta,phi,fov,zoom)'', which takes polar coordinates. Note that ''view3d(0,0,0)'' will rotate the image to face the x-z plane (y into the screen) and ''view3d(270,0,0)'' will rotate the image to face the y-z plane (x into the screen). The function ''snap()'' will capture a still image of the current view. Note that to use ''movie3d()'' you must have ImageMagick installed to automatically convert the PNG frames to a GIF; otherwise you must use external software.<br />
<br />
[[File:Gray_p3d_ex5.png|150px]]<br />
[[File:GrayMovie.gif]]<br />
----<br />
view3d(0,0,0)<br><br />
snap()<br><br />
<br />
spin(theta = 0, phi = 0)<br><br />
<br />
spins(inc.theta = 1/4, inc.phi = 0, theta = NULL, phi = NULL)<br><br />
<br />
movie3d( spin3d(axis=c(0,1,0), rpm=20), duration=2, dir='movie' )<br><br />
----<br />
<br />
<br />
Here is an additional example, using data on the US indices of industrial products, plotting Mining production (MIN) over months and years. Adding the argument ‘groups=YR’ to ''Plot3d'' connects the months in a given year to produce trajectories.<br />
[[File:Gray_p3d_ex6.png|150px]]<br />
----<br />
open3d(windowRect=c(100,100,800,800),cex = .8) <br><br />
prod = read.table('USIndicesIndustrialProd.Rdata',header=TRUE)<br><br />
head(prod)<br><br />
Plot3d(MIN ~ YR+MONTH,data=prod,groups=YR)<br><br />
Axes3d()<br><br />
title3d(main='Industrial Production Mining (1947-1993)')<br><br />
view3d(215,0,45)<br><br />
----<br />
<br />
<br />
==Assignment 2==<br />
<br />
Statistics in the News: "Spousal support a royal pain?" [http://journals.lww.com/clinicalpain/Abstract/2003/07000/Spousal_Responses_Are_Differentially_Associated.4.aspx Journal Abstract]<br />
<br />
Overly supportive spouses are not necessarily doing their partners a favour.<br />
<br />
''They could be prolonging the recovery of their injured spouses.''<br />
<br />
Men with highly attentive spouses reported higher levels of pain and more disability but did well on physical functions tests.<br />
<br />
Women with highly attentive spouses didn't report feeling more pain or being more disabled. However, they performed more poorly on physical function tests than did women with less attentive spouses.<br />
<br />
[[MEDIA: Gray_RoyalPain.pdf]]<br />
<br />
====1====<br />
;Question: Does the article suggest a causal relationship between two variables? If so, which? Are the data observational or experimental?<br />
<br />
;Discussion: <br />
Yes, the article did suggest a causal relationship. One variable is the spousal solicitousness (attentiveness & support). Another is the degree of reported pain and disability. A third is actual physical function. Patient gender was also taken into consideration.<br />
<br />
According to the report, men with chronic pain report more perceived pain and disability when they receive higher levels of spousal attention, and it is implied this is controlling for actual physical ability. For women there was no difference in self-reports of pain and disability with level of spousal support, but women who received more attention from their husbands had poorer physical function than those who did not.<br />
<br />
The data are observational, because there is no manipulation or randomization in collecting the data.<br />
<br />
<br />
====2====<br />
;Question: Can you think of alternative explanations to causality? Confounding factors? Or explanations consistent with causality? Mediating factors? <br />
<br />
;Discussion: <br />
Andy: Yes. Perhaps a high degree of pain causes solicitousness, rather than solicitousness causing more pain. I don't think there exists a confounding factor. If there is one, I believe it is LOVE: a man's LOVE wins him a highly attentive spouse, but it does not work as a narcotic for him the way it does for a woman. He may then ask for more LOVE by reporting that he hurts. That is a sweet answer to any woman, but one that any healthy man finds tooth-achingly sweet.<br />
<br />
LOVE could also be a mediating factor. With high solicitousness, a man starts to fall in LOVE. He is then more likely to move his body, which may cause more pain, but his physical function may recover faster. A woman, by contrast, likes finding pieces of LOVE through solicitousness. Her heart is numbed by that LOVE, and she then reports that she is better as an affectionate payback. <br />
<br />
====3====<br />
;Question: Have any confounding factors been accounted for in the analysis? <br />
<br />
;Discussion: <br />
<br />
Emotional (men) or Physical (women) troubles outside the primary cause of pain. Men who are emotionally unstable may attract and encourage attentiveness in their wives, and also report a disproportionately high level of disability relative to their level of function. Women who are physically weak prior to injury may attract husbands who are highly supportive, but also be more susceptible to injuries that result in chronic pain. <br />
<br />
====4====<br />
;Question: Have any mediating factors been controlled for in a way that vitiates a causal interpretation of the relationship? <br />
<br />
;Discussion: <br />
<br />
This does not appear to be the case.<br />
<br />
====5====<br />
;Question: What is your personal assessment of the evidence for causality in the study that is the subject of the article? <br />
<br />
;Discussion: <br />
<br />
There is a good case to be made for both forward and backward causality in this study. In addition, it would be nice to be able to control for emotional and physical stability prior to chronic injury.<br />
Are frequencies distributed as expected? Are the proportions of highly attentive spouses equal across groups?<br />
ASSESSMENT: Judgement withheld pending further evidence<br />
<br />
<br />
=== Paradoxes and Fallacies ===<br />
==== 2. ==== <br />
;Question:You are studying observational data on the relationship between Health and Coffee (measured in grams of caffeine consumed per day). Suppose you want to control for a possible confounding factor 'Stress'. In this kind of study it is more important to make sure that you measure coffee consumption accurately than it is to make sure that you measure 'stress' accurately. <br />
<br />
;Discussion: <br />
<br />
True. It is more important to accurately measure Coffee Consumption if this is the variable of primary interest in the study. <br />
<br />
Consider this from the analysis of variance framework. When measurement error is introduced, the variability in the outcome accounted for by that factor decreases, and the error sum of squares increases. If we decrease accuracy in measuring Stress, the error term will accordingly increase, which will in turn somewhat decrease the power to detect the effect of Coffee on Health; the coefficient estimates will not be affected by much as long as Stress is only weakly related to Coffee, but if the two are strongly correlated the poorly measured Stress no longer adjusts fully for confounding and the Coffee coefficient becomes biased. However, if we have increased measurement error in Coffee Consumption, then the proportion of variability explained by Coffee will decrease AND the error term will increase. We lose power on two fronts!<br />
<br />
Here is a short R script that demonstrates this idea: [[file: Gray_Q2MeasError.r]]<br />
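<br />
As a complement to the attached script, here is a minimal sketch of one way to simulate the situation. The variable names, effect sizes and error variances are assumptions chosen for illustration and are not taken from Gray_Q2MeasError.r. Stress is made to influence both Coffee and Health, and measurement error is then added to one predictor at a time.<br />

```r
# Simulated data: stress is a confounder of the coffee -> health relationship.
# All effect sizes below are assumed values for illustration.
set.seed(6627)
n      <- 5000
stress <- rnorm(n)
coffee <- 0.7 * stress + rnorm(n)                  # stressed people drink more coffee
health <- -0.5 * coffee - 1.0 * stress + rnorm(n)  # true coffee effect is -0.5

# Versions of each predictor contaminated by independent measurement error.
coffee_err <- coffee + rnorm(n)
stress_err <- stress + rnorm(n)

# Coffee coefficient under the three measurement scenarios.
b_accurate   <- unname(coef(lm(health ~ coffee + stress))["coffee"])
b_coffee_err <- unname(coef(lm(health ~ coffee_err + stress))["coffee_err"])
b_stress_err <- unname(coef(lm(health ~ coffee + stress_err))["coffee"])

round(c(accurate     = b_accurate,
        coffee_error = b_coffee_err,
        stress_error = b_stress_err), 2)
```

Under these assumptions the large-sample value of the coffee coefficient is -0.5 with accurate measurements, about -0.25 when Coffee is measured with error (attenuation toward zero) and about -0.78 when Stress is measured with error (residual confounding), so the simulated estimates should fall close to those values.<br />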
<br />
==== 5.==== <br />
;Question: In a multiple regression of Y on three predictors, X1, X2 and X3, if the coefficients of both X2 and X3 are not significant, it is safe to drop these two variables and perform a regression on X1 alone. <br />
<br />
;Discussion:<br />
<br />
<br />
==== 8.====<br />
;Question: In a multiple regression, if you drop a predictor whose effect is not significant, the p-values of the other predictors should not change very much. <br />
<br />
;Discussion:<br />
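No. After dropping a predictor the model changes, so the estimates and p-values of the remaining predictors can change as well, particularly when the dropped predictor is correlated with them. We can see this from [http://www.utdallas.edu/~wiorkow%20/documents/Mod1Lect4Regnew.doc this example, starting on page 26]. <br />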
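<br />
A small simulation can make the point concrete. This is a sketch under assumed values (two nearly collinear predictors with the hypothetical names x1 and x2), not the example from the linked document: x2 has no effect of its own, yet dropping it changes the p-value of x1 considerably.<br />

```r
# Two nearly collinear predictors; only x1 truly affects y (assumed setup).
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # x2 is almost a copy of x1
y  <- x1 + rnorm(n)

full    <- summary(lm(y ~ x1 + x2))$coefficients
reduced <- summary(lm(y ~ x1))$coefficients

# p-value of x1 with and without x2 in the model
p_full    <- full["x1", "Pr(>|t|)"]
p_reduced <- reduced["x1", "Pr(>|t|)"]
c(full = p_full, reduced = p_reduced)
```

Because x2 absorbs most of the information in x1, the standard error of x1 is much larger in the full model, and its p-value shrinks sharply once x2 is dropped.<br />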
<br />
==== 11.==== <br />
;Question: In a model to assess the effect of a number of treatments on some outcome, we can estimate the difference between the best treatment and the worst treatment by using the difference in the mean outcomes. <br />
<br />
;Discussion: <br />
<br />
<br />
==== 14.==== <br />
;Question: If two variables have a strong interaction, this implies a strong correlation. <br />
<br />
;Discussion:<br />
<br />
False. An interaction exists if the effect of one independent variable (IV) on the dependent variable varies over the levels of another independent variable. This tells us nothing about the relationship between the IVs.<br />
An interaction between two variables can occur whether or not the variables are correlated with one another.<br />
<br />
{| class="wikitable"<br />
|-<br />
! <br />
! No Interaction<br />
! Interaction<br />
|-<br />
| '''No Correlation'''<br />
| [[File:Gray_Q14_nCnI.gif]] [[File:X1X2_Uncorr_NoInt.png|200px]] <br />
| [[File:Gray_Q14_nCI.gif]] [[File:X1X2_Uncorr_Int.png|200px]]<br />
|-<br />
| '''Correlation'''<br />
| [[File:Gray_Q14_CnI.gif]] [[File:X1X2_Corr_NoInt.png|200px]] <br />
| [[File:Gray_Q14_CI.gif]] [[File:X1X2_Corr_Int.png|200px]]<br />
|}<br />
<br />
[[File: Gray_Q14.R]]<br />
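<br />
The "No Correlation, Interaction" cell of the table can also be sketched numerically. The code below is an illustration with assumed coefficients and is not taken from Gray_Q14.R: the two predictors are generated independently, yet the fitted interaction term is large.<br />

```r
# Independent predictors with a strong interaction in the outcome (assumed values).
set.seed(14)
n  <- 2000
x1 <- rnorm(n)
x2 <- rnorm(n)                        # generated independently of x1
y  <- x1 + x2 + 2 * x1 * x2 + rnorm(n)

r_x1x2      <- cor(x1, x2)            # correlation between the two IVs
fit         <- lm(y ~ x1 * x2)
b_interact  <- unname(coef(fit)["x1:x2"])

round(c(correlation = r_x1x2, interaction = b_interact), 2)
```

The correlation between x1 and x2 should be close to zero while the interaction coefficient should be close to the assumed value of 2.<br />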
<br />
Here is an additional example using binary variables:<br />
[http://cnx.org/content/m31440/latest/ Illustration of the difference between correlation and interaction amongst independent variables]</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Assignment_Teams/GrayMATH 6627 2010-11 Practicum in Statistical Consulting/Assignment Teams/Gray2011-01-24T03:48:06Z<p>Andytli: </p>
<hr />
<div>* http://en.wikipedia.org/wiki/Mary_W._Gray<br />
<br />
==Assignment 1== <br />
===1. Simpson's Paradox=== <br />
<br />
Example: My friend and I play a basketball game and each shoot 20 shots. Who is the better shooter?<br />
<br />
[[File:h65.PNG]]<br />
<br />
But, who is the better shooter if you control for the distance of the shot? Who would you rather have on your team?<br />
<br />
[[File:h32.PNG]]<br />
This is a question of Simpson's Paradox. <br />
<br />
[[File:h45.PNG]] <br />
<br />
We can see from this figure that the relationship changes from negative to positive once distance is taken into consideration. The black line links our two overall shooting percentages. The red line links our percentages on far shots, and the blue line those on close shots.<br />
<br />
Simpson’s paradox arises from one simple mathematical truth. Given eight real numbers a, b, c, d, A, B, C, D with the following properties: [[File:12.png]], it is not necessarily true that [[File:122.png]]. In fact, it may be true that [[File:13.png]].<br />
<br />
This is an obvious math reality, yet it has significant ramifications in Bayesian analysis, medical research, science and engineering studies, and societal statistical analysis. It is of concern for any statistical activity involving the calculation and analysis of ratios of two measurements.<br />
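<br />
The reversal can be checked with a few lines of R. The shot counts below are hypothetical numbers made up for this sketch (they are not the counts shown in the images above); each player takes 20 shots, as in the basketball example.<br />

```r
# Hypothetical shot counts, invented for illustration.
shots <- data.frame(
  player   = c("A", "A", "B", "B"),
  distance = c("close", "far", "close", "far"),
  made     = c(4, 6, 14, 1),
  attempts = c(4, 16, 16, 4)
)

# Overall shooting percentage per player (ignoring distance)
overall <- with(shots, tapply(made, player, sum) / tapply(attempts, player, sum))

# Shooting percentage per player within each distance
by_dist <- with(shots, tapply(made / attempts, list(player, distance), sum))

overall
by_dist
```

With these counts player B has the higher overall percentage (15/20 against 10/20), yet player A has the higher percentage both close (4/4 against 14/16) and far (6/16 against 1/4), because A takes most shots from far away.<br />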
<br />
Example 2 (Real Income tax example)<br />
<br />
[[File:simpson1.pdf]]<br />
<br />
===2. Graphics to visualize data === <br />
Gray: rgl and p3d (include the use of the 'groups' parameter to produce trajectories)<br />
<br />
===Introduction===<br />
<br />
[[File:Grey_rgl_p3d_1.pdf]]<br />
[[File:Grey_rgl_p3d_2.pdf]]<br />
<br />
''rgl'' is a library of functions that offers 3D real-time visualization functionality to the R programming environment (Adler & Murdoch, 2010), providing an OpenGL implementation for R.<br />
<br />
''p3d'' is a library of functions which employs functions from RGL to help visualize statistical models expressed as a function of 2 independent variables with the possible addition of a categorical variable (Monette, 2009).<br />
<br />
===Package ''rgl''===<br />
<br />
With ''rgl'' we create a ‘device’ , which is simply a window, within which a ‘world’ is created where we can create 3 dimensional shapes and through which we can navigate.<br />
<br />
[[File:Grey_World.png|200px]]<br />
<br />
Functions within the rgl package can be divided into 6 categories:<br />
(1) Device management functions (open and close devices, control active device) <br />
(2) Scene management functions (option to remove certain or all objects from the scene)<br />
(3) Export functions (creating image files)<br />
(4) Shape functions - primitive plotting tools (points, lines, triangles, quads) as well as higher-level functions (text, spheres, surfaces).<br />
<br />
[[File:Grey-Shapes.png]]<br />
<br />
(5) Environment functions - modify the viewpoint, background and bounding box, adding light sources<br />
(6) Appearance function rgl.material(…).<br />
<br />
[[File:Grey_AppearanceOptions.png|300px]]<br />
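<br />
To give a feel for how these categories fit together, here is a minimal sketch that touches several of them in one session. The data are simulated for illustration only, and the sketch assumes the ''rgl'' package is installed and that a display is available for the device window.<br />

```r
library(rgl)

open3d()                                   # (1) device management: open a new device

# Simulated data, for illustration only.
x <- rnorm(50)
y <- rnorm(50)
z <- x + y + rnorm(50, sd = 0.5)

points3d(x, y, z, col = "blue", size = 5)  # (4) shape function: primitive points
spheres3d(0, 0, 0, radius = 0.2,           # (4) higher-level shape, with
          col = "red", alpha = 0.5)        # (6) appearance set via material arguments
axes3d()
title3d(xlab = "x", ylab = "y", zlab = "z")

bg3d("white")                              # (5) environment: background colour
view3d(theta = 30, phi = 20, zoom = 0.8)   # (5) environment: viewpoint

pop3d()                                    # (2) scene management: remove the most recently added shape
```

An export function (3), for example a call to snapshot3d() with a file name, could be added at the end to save the current view as an image file.<br />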
<br />
Using shapes and surfaces within an ''rgl'' device, statistical data can be represented in 3 dimensions. Some advanced examples are available as demos or provided on the [http://rgl.neoscientists.org/docs.shtml rgl website].<br />
<br />
[[File:Gray_Rgl_3d_histogram.png|200px]]<br />
[[File:Grey_rgl_example_imulated_animal_abundance.png|200px]]<br />
<br />
A few of the functions from ''rgl'' are useful for manipulating 3D models created using ''p3d'', since ''p3d'' contains many functions that inherit from ''rgl'' but tailor them to statistical methods. Thus all but a few ''rgl'' functions are unnecessary for our purposes unless you would like to contribute functionality to ''p3d''!<br />
<br />
<br />
<br />
----<br />
<br />
===Package p3d===<br />
<br />
In this section I will focus on example code you may use to familiarize yourself with the capabilities of this package. You will require the [[File:Gray_p3d_ex_Tuition.txt]] and [[File:Gray_p3d_ex_USIndicesIndustrialProd.txt]] data sets. Note that a few of the commands employed in the sample code are from ''rgl'', but these will likely be superseded by ''p3d'' functions as the package matures.<br />
<br />
Initialization code:<br />
----<br />
library( lattice )<br><br />
library( nlme )<br><br />
library( car )<br><br />
library( spida )<br><br />
library( rgl )<br><br />
library( p3d )<br><br />
tuit = read.table('tuition.Rdata',header=TRUE)<br><br />
head(tuit)<br><br />
prod = read.table('USIndicesIndustrialProd.Rdata',header=TRUE)<br><br />
head(prod)<br><br />
----<br />
<br />
<br />
For the tuition data we will begin by plotting the annual cost of tuition from a sample of American Universities against the rates of faculty compensation and proportion of students who graduate.<br />
<br />
Using the mouse buttons you can change the field of view and zoom in and out. ''Plot3d'' creates the 3D plot as shown on the right.<br />
<br />
We can remove elements from the device using the function ''Pop3d()''. This function removes elements starting with the most recently added item. Multiple items can be removed by adding a numeric argument, e.g. ''Pop3d(4)''.<br><br />
[[File:Gray_rgl_navigation.png|200px]]<br />
[[File:Gray_p3d_ex1.png|200px]]<br />
----<br />
Init3d(cex = .8)<br><br />
Plot3d(tuition ~ fac_comp + graduat, col = c("blue"), data = tuit)<br><br />
----<br />
<br />
<br />
Next we will subdivide the data by category, in this case whether the school is private (red) or public (blue) (variable name public.private).<br><br />
[[File:Gray_p3d_ex2.png|200px]]<br />
----<br />
Init3d(cex = .8)<br><br />
Plot3d( tuition ~ fac_comp + graduat|public.private, col = c("blue", "red"), data = tuit)<br><br />
----<br />
<br />
<br />
Next we will add regression planes for private(red) and public(blue) schools using the lm() function to determine the fit, and Fit3d() to insert the plane in the graph. Axes and labels are added using Axes3d() and title3d().<br><br />
[[File:Gray_p3d_ex3.png]]<br />
----<br />
fitpub = lm(tuition ~ fac_comp + graduat,subset=(public.private==0),data = tuit)<br><br />
Fit3d( fitpub, col = c("blue"))<br><br />
<br><br />
fitpri = lm(tuition ~ fac_comp + graduat,subset=(public.private==1),data = tuit)<br><br />
Fit3d( fitpri, col = c("red"))<br><br />
<br><br />
Axes3d()<br><br />
title3d(main='Tuition predicted by grad rates and faculty salary -private (red) and public(blue) institutions')<br><br />
----<br />
<br />
<br />
Data ellipses are useful for understanding our data.<br><br />
[[File:Gray_p3d_ex4.png]]<br />
----<br />
Ell3d()<br />
----<br />
<br />
<br />
We can change the viewpoint of our graph using the function ''view3d(theta,phi,fov,zoom)'', which takes polar coordinates. Note that ''view3d(0,0,0)'' will rotate the image to face the x-z plane (y into the screen) and ''view3d(270,0,0)'' will rotate the image to face the y-z plane (x into the screen). The function ''snap()'' will capture a still image of the current view. Note that to use ''movie3d()'' you must have ImageMagick installed to automatically convert the PNG frames to a GIF; otherwise you must use external software.<br />
<br />
[[File:Gray_p3d_ex5.png]]<br />
[[File:GrayMovie.gif]]<br />
----<br />
view3d(0,0,0)<br><br />
snap()<br><br />
<br />
spin(theta = 0, phi = 0)<br><br />
<br />
spins(inc.theta = 1/4, inc.phi = 0, theta = NULL, phi = NULL)<br><br />
<br />
movie3d( spin3d(axis=c(0,1,0), rpm=20), duration=2, dir='movie' )<br><br />
----<br />
<br />
<br />
Here is an additional example, using data on the US indices of industrial products, plotting Mining production (MIN) over months and years. Adding the argument ‘groups=YR’ to ''Plot3d'' connects the months in a given year to produce trajectories.<br />
[[File:Gray_p3d_ex6.png]]<br />
----<br />
open3d(windowRect=c(100,100,800,800),cex = .8) <br><br />
prod = read.table('USIndicesIndustrialProd.Rdata',header=TRUE)<br><br />
head(prod)<br><br />
Plot3d(MIN ~ YR+MONTH,data=prod,groups=YR)<br><br />
Axes3d()<br><br />
title3d(main='Industrial Production Mining (1947-1993)')<br><br />
view3d(215,0,45)<br><br />
----<br />
<br />
<br />
==Assignment 2==<br />
<br />
Statistics in the News: "Spousal support a royal pain?"<br />
<br />
====1====<br />
;Question: Does the article suggest a causal relationship between two variables? If so, which? Are the data observational or experimental?<br />
<br />
;Discussion: <br />
Andy: Yes, it did. One variable is spousal solicitousness; the other is the degree of pain. The data are observational, because the researchers did not intervene when collecting the data.<br />
<br />
====2====<br />
;Question: Can you think of alternative explanations to causality? Confounding factors? Or explanations consistent with causality? Mediating factors? <br />
<br />
;Discussion: <br />
Andy: Yes, maybe a high degree of pain causes solicitousness, instead of solicitousness causing more pain. I don't think there exists a confounding factor. If there is one, I believe it is LOVE, because a man's LOVE wins a highly attentive spouse but doesn't work as a narcotic the way it does for a woman. He may then ask for more LOVE by reporting that he hurts. That is a sweet answer to any woman, but a tooth-aching one to any healthy man.<br />
<br />
LOVE could also be the mediating factor. With high solicitousness, a man's LOVE starts to fall in. He may be more likely to move his body, which may cause more hurt, and then his physical function may recover faster. A woman, by contrast, likes finding pieces of LOVE through solicitousness. Her heart is numbed by that LOVE, and she will then report that she is better as an affectionate payback. <br />
<br />
====3====<br />
;Question: Have any confounding factors been accounted for in the analysis? <br />
<br />
;Discussion: <br />
<br />
====4====<br />
;Question: Have any mediating factors been controlled for in a way that vitiates a causal interpretation of the relationship? <br />
<br />
;Discussion: <br />
<br />
====5====<br />
;Question: What is your personal assessment of the evidence for causality in the study that is the subject of the article? <br />
<br />
;Discussion: <br />
<br />
<br />
=== Paradoxes and Fallacies ===<br />
==== 2. ==== <br />
;Question:You are studying observational data on the relationship between Health and Coffee (measured in grams of caffeine consumed per day). Suppose you want to control for a possible confounding factor 'Stress'. In this kind of study it is more important to make sure that you measure coffee consumption accurately than it is to make sure that you measure 'stress' accurately. <br />
<br />
;Discussion: <br />
<br />
<br />
==== 5.==== <br />
;Question: In a multiple regression of Y on three predictors, X1, X2 and X3, if the coefficients of both X2 and X3 are not significant, it is safe to drop these two variables and perform a regression on X1 alone. <br />
<br />
;Discussion:<br />
<br />
<br />
==== 8.====<br />
;Question: In a multiple regression, if you drop a predictor whose effect is not significant, the p-values of the other predictors should not change very much. <br />
<br />
;Discussion:<br />
<br />
<br />
==== 11.==== <br />
;Question: In a model to assess the effect of a number of treatments on some outcome, we can estimate the difference between the best treatment and the worst treatment by using the difference in the mean outcomes. <br />
<br />
;Discussion: <br />
<br />
<br />
==== 14.==== <br />
;Question: If two variables have a strong interaction, this implies a strong correlation. <br />
<br />
;Discussion:<br />
Andy: No. Here is an example that illustrates the difference between the two:<br />
[http://cnx.org/content/m31440/latest/ Illustration of the difference between correlation and interaction amongst independent variables]</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Assignment_Teams/GrayMATH 6627 2010-11 Practicum in Statistical Consulting/Assignment Teams/Gray2011-01-24T03:44:11Z<p>Andytli: </p>
<hr />
<div>* http://en.wikipedia.org/wiki/Mary_W._Gray<br />
<br />
==Assignment 1== <br />
===1. Simpson's Paradox=== <br />
<br />
Example: My friend and I play a basketball game and each shoot 20 shots. Who is the better shooter?<br />
<br />
[[File:h65.PNG]]<br />
<br />
But, who is the better shooter if you control for the distance of the shot? Who would you rather have on your team?<br />
<br />
[[File:h32.PNG]]<br />
This is a question of Simpson's Paradox. <br />
<br />
[[File:h45.PNG]] <br />
<br />
We can see from this figure that the relationship changes from negative to positive once distance is taken into consideration. The black line links our two overall shooting percentages. The red line links our percentages on far shots, and the blue line those on close shots.<br />
<br />
Simpson’s paradox arises from one simple mathematical truth. Given eight real numbers a, b, c, d, A, B, C, D with the following properties: [[File:12.png]], it is not necessarily true that [[File:122.png]]. In fact, it may be true that [[File:13.png]].<br />
<br />
This is an obvious math reality, yet it has significant ramifications in Bayesian analysis, medical research, science and engineering studies, and societal statistical analysis. It is of concern for any statistical activity involving the calculation and analysis of ratios of two measurements.<br />
<br />
Example 2 (Real Income tax example)<br />
<br />
[[File:simpson1.pdf]]<br />
<br />
===2. Graphics to visualize data === <br />
Gray: rgl and p3d (include the use of the 'groups' parameter to produce trajectories)<br />
<br />
===Introduction===<br />
<br />
[[File:Grey_rgl_p3d_1.pdf]]<br />
[[File:Grey_rgl_p3d_2.pdf]]<br />
<br />
''rgl'' is a library of functions that offers 3D real-time visualization functionality to the R programming environment (Adler & Murdoch, 2010), providing an OpenGL implementation for R.<br />
<br />
''p3d'' is a library of functions which employs functions from RGL to help visualize statistical models expressed as a function of 2 independent variables with the possible addition of a categorical variable (Monette, 2009).<br />
<br />
===Package ''rgl''===<br />
<br />
With ''rgl'' we create a ‘device’ , which is simply a window, within which a ‘world’ is created where we can create 3 dimensional shapes and through which we can navigate.<br />
<br />
[[File:Grey_World.png|200px]]<br />
<br />
Functions within the rgl package can be divided into 6 categories:<br />
(1) Device management functions (open and close devices, control active device) <br />
(2) Scene management functions (option to remove certain or all objects from the scene)<br />
(3) Export functions (creating image files)<br />
(4) Shape functions - primitive plotting tools (points, lines, triangles, quads) as well as higher-level functions (text, spheres, surfaces).<br />
<br />
[[File:Grey-Shapes.png]]<br />
<br />
(5) Environment functions - modify the viewpoint, background and bounding box, adding light sources<br />
(6) Appearance function rgl.material(…).<br />
<br />
[[File:Grey_AppearanceOptions.png|300px]]<br />
<br />
Using shapes and surfaces within an ''rgl'' device, statistical data can be represented in 3 dimensions. Some advanced examples are available as demos or provided on the [http://rgl.neoscientists.org/docs.shtml rgl website].<br />
<br />
[[File:Gray_Rgl_3d_histogram.png|200px]]<br />
[[File:Grey_rgl_example_imulated_animal_abundance.png|200px]]<br />
<br />
A few of the functions from ''rgl'' are useful for manipulating 3D models created using ''p3d'', since ''p3d'' contains many functions that inherit from ''rgl'' but tailor them to statistical methods. Thus all but a few ''rgl'' functions are unnecessary for our purposes unless you would like to contribute functionality to ''p3d''!<br />
<br />
<br />
<br />
----<br />
<br />
===Package p3d===<br />
<br />
In this section I will focus on example code you may use to familiarize yourself with the capabilities of this package. You will require the [[File:Gray_p3d_ex_Tuition.txt]] and [[File:Gray_p3d_ex_USIndicesIndustrialProd.txt]] data sets. Note that a few of the commands employed in the sample code are from ''rgl'', but these will likely be superseded by ''p3d'' functions as the package matures.<br />
<br />
Initialization code:<br />
----<br />
library( lattice )<br><br />
library( nlme )<br><br />
library( car )<br><br />
library( spida )<br><br />
library( rgl )<br><br />
library( p3d )<br><br />
tuit = read.table('tuition.Rdata',header=TRUE)<br><br />
head(tuit)<br><br />
prod = read.table('USIndicesIndustrialProd.Rdata',header=TRUE)<br><br />
head(prod)<br><br />
----<br />
<br />
<br />
For the tuition data we will begin by plotting the annual cost of tuition from a sample of American Universities against the rates of faculty compensation and proportion of students who graduate.<br />
<br />
Using the mouse buttons you can change the field of view and zoom in and out. ''Plot3d'' creates the 3D plot as shown on the right.<br />
<br />
We can remove elements from the device using the function ''Pop3d()''. This function removes elements starting with the most recently added item. Multiple items can be removed by adding a numeric argument, e.g. ''Pop3d(4)''.<br><br />
[[File:Gray_rgl_navigation.png|200px]]<br />
[[File:Gray_p3d_ex1.png|200px]]<br />
----<br />
Init3d(cex = .8)<br><br />
Plot3d(tuition ~ fac_comp + graduat, col = c("blue"), data = tuit)<br><br />
----<br />
<br />
<br />
Next we will subdivide the data by category, in this case whether the school is private (red) or public (blue) (variable name public.private).<br><br />
[[File:Gray_p3d_ex2.png|200px]]<br />
----<br />
Init3d(cex = .8)<br><br />
Plot3d( tuition ~ fac_comp + graduat|public.private, col = c("blue", "red"), data = tuit)<br><br />
----<br />
<br />
<br />
Next we will add regression planes for private(red) and public(blue) schools using the lm() function to determine the fit, and Fit3d() to insert the plane in the graph. Axes and labels are added using Axes3d() and title3d().<br><br />
[[File:Gray_p3d_ex3.png]]<br />
----<br />
fitpub = lm(tuition ~ fac_comp + graduat,subset=(public.private==0),data = tuit)<br><br />
Fit3d( fitpub, col = c("blue"))<br><br />
<br><br />
fitpri = lm(tuition ~ fac_comp + graduat,subset=(public.private==1),data = tuit)<br><br />
Fit3d( fitpri, col = c("red"))<br><br />
<br><br />
Axes3d()<br><br />
title3d(main='Tuition predicted by grad rates and faculty salary -private (red) and public(blue) institutions')<br><br />
----<br />
<br />
<br />
Data ellipses are useful for understanding our data.<br><br />
[[File:Gray_p3d_ex4.png]]<br />
----<br />
Ell3d()<br />
----<br />
<br />
<br />
We can change the viewpoint of our graph using the function ''view3d(theta,phi,fov,zoom)'', which takes polar coordinates. Note that ''view3d(0,0,0)'' will rotate the image to face the x-z plane (y into the screen) and ''view3d(270,0,0)'' will rotate the image to face the y-z plane (x into the screen). The function ''snap()'' will capture a still image of the current view. Note that to use ''movie3d()'' you must have ImageMagick installed to automatically convert the PNG frames to a GIF; otherwise you must use external software.<br />
<br />
[[File:Gray_p3d_ex5.png]]<br />
[[File:GrayMovie.gif]]<br />
----<br />
view3d(0,0,0)<br><br />
snap()<br><br />
<br />
spin(theta = 0, phi = 0)<br><br />
<br />
spins(inc.theta = 1/4, inc.phi = 0, theta = NULL, phi = NULL)<br><br />
<br />
movie3d( spin3d(axis=c(0,1,0), rpm=20), duration=2, dir='movie' )<br><br />
----<br />
<br />
<br />
Here is an additional example, using data on the US indices of industrial products, plotting Mining production (MIN) over months and years. Adding the argument ‘groups=YR’ to ''Plot3d'' connects the months in a given year to produce trajectories.<br />
[[File:Gray_p3d_ex6.png]]<br />
----<br />
open3d(windowRect=c(100,100,800,800),cex = .8) <br><br />
prod = read.table('USIndicesIndustrialProd.Rdata',header=TRUE)<br><br />
head(prod)<br><br />
Plot3d(MIN ~ YR+MONTH,data=prod,groups=YR)<br><br />
Axes3d()<br><br />
title3d(main='Industrial Production Mining (1947-1993)')<br><br />
view3d(215,0,45)<br><br />
----<br />
<br />
<br />
==Assignment 2==<br />
<br />
Statistics in the News: "Spousal support a royal pain?"<br />
<br />
====1====<br />
;Question: Does the article suggest a causal relationship between two variables? If so, which? Are the data observational or experimental?<br />
<br />
;Discussion: <br />
Andy: Yes, it did. One variable is spousal solicitousness; the other is the degree of pain. The data are observational, because the researchers did not intervene when collecting the data.<br />
<br />
====2====<br />
;Question: Can you think of alternative explanations to causality? Confounding factors? Or explanations consistent with causality? Mediating factors? <br />
<br />
;Discussion: <br />
Andy: Yes, maybe a high degree of pain causes solicitousness, instead of solicitousness causing more pain. I don't think there exists a confounding factor. If there is one, I believe it is LOVE, because a man's LOVE wins a highly attentive spouse but doesn't work as a narcotic the way it does for a woman. He may then ask for more LOVE by reporting that he hurts. That is a sweet answer to any woman, but a tooth-aching one to any healthy man.<br />
<br />
LOVE could also be the mediating factor. With high solicitousness, a man's LOVE starts to fall in. He may be more likely to move his body, which may cause more hurt, and then his physical function may recover faster. A woman, by contrast, likes finding pieces of LOVE through solicitousness. Her heart is numbed by that LOVE, and she will then report that she is better as a sweet payback. <br />
<br />
====3====<br />
;Question: Have any confounding factors been accounted for in the analysis? <br />
<br />
;Discussion: <br />
<br />
====4====<br />
;Question: Have any mediating factors been controlled for in a way that vitiates a causal interpretation of the relationship? <br />
<br />
;Discussion: <br />
<br />
====5====<br />
;Question: What is your personal assessment of the evidence for causality in the study that is the subject of the article? <br />
<br />
;Discussion: <br />
<br />
<br />
=== Paradoxes and Fallacies ===<br />
==== 2. ==== <br />
;Question:You are studying observational data on the relationship between Health and Coffee (measured in grams of caffeine consumed per day). Suppose you want to control for a possible confounding factor 'Stress'. In this kind of study it is more important to make sure that you measure coffee consumption accurately than it is to make sure that you measure 'stress' accurately. <br />
<br />
;Discussion: <br />
<br />
<br />
==== 5.==== <br />
;Question: In a multiple regression of Y on three predictors, X1, X2 and X3, if the coefficients of both X2 and X3 are not significant, it is safe to drop these two variables and perform a regression on X1 alone. <br />
<br />
;Discussion:<br />
<br />
<br />
==== 8.====<br />
;Question: In a multiple regression, if you drop a predictor whose effect is not significant, the p-values of the other predictors should not change very much. <br />
<br />
;Discussion:<br />
<br />
<br />
==== 11.==== <br />
;Question: In a model to assess the effect of a number of treatments on some outcome, we can estimate the difference between the best treatment and the worst treatment by using the difference in the mean outcomes. <br />
<br />
;Discussion: <br />
<br />
<br />
==== 14.==== <br />
;Question: If two variables have a strong interaction, this implies a strong correlation. <br />
<br />
;Discussion:<br />
Andy: No. Here is an example that illustrates the difference between the two:<br />
[http://cnx.org/content/m31440/latest/ Illustration of the difference between correlation and interaction amongst independent variables]</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Assignment_Teams/GrayMATH 6627 2010-11 Practicum in Statistical Consulting/Assignment Teams/Gray2011-01-24T03:38:18Z<p>Andytli: </p>
<hr />
<div>* http://en.wikipedia.org/wiki/Mary_W._Gray<br />
<br />
==Assignment 1== <br />
===1. Simpson's Paradox=== <br />
<br />
Example: My friend and I play a basketball game and each shoot 20 shots. Who is the better shooter?<br />
<br />
[[File:h65.PNG]]<br />
<br />
But, who is the better shooter if you control for the distance of the shot? Who would you rather have on your team?<br />
<br />
[[File:h32.PNG]]<br />
This is a question of Simpson's Paradox. <br />
<br />
[[File:h45.PNG]] <br />
<br />
We can see from this figure that the relationship changes from negative to positive once distance is taken into consideration. The black line links our two overall shooting percentages. The red line links our percentages on far shots, and the blue line those on close shots.<br />
<br />
Simpson’s paradox arises from one simple mathematical truth. Given eight real numbers a, b, c, d, A, B, C, D with the following properties: [[File:12.png]], it is not necessarily true that [[File:122.png]]. In fact, it may be true that [[File:13.png]].<br />
<br />
This is an obvious math reality, yet it has significant ramifications in Bayesian analysis, medical research, science and engineering studies, and societal statistical analysis. It is of concern for any statistical activity involving the calculation and analysis of ratios of two measurements.<br />
<br />
Example 2 (Real Income tax example)<br />
<br />
[[File:simpson1.pdf]]<br />
<br />
===2. Graphics to visualize data === <br />
Gray: rgl and p3d (include the use of the 'groups' parameter to produce trajectories)<br />
<br />
===Introduction===<br />
<br />
[[File:Grey_rgl_p3d_1.pdf]]<br />
[[File:Grey_rgl_p3d_2.pdf]]<br />
<br />
''rgl'' is a library of functions that offers 3D real-time visualization functionality to the R programming environment (Adler & Murdoch, 2010), providing an OpenGL implementation for R.<br />
<br />
''p3d'' is a library of functions which employs functions from RGL to help visualize statistical models expressed as a function of 2 independent variables with the possible addition of a categorical variable (Monette, 2009).<br />
<br />
===Package ''rgl''===<br />
<br />
With ''rgl'' we create a ‘device’, which is simply a window, within which a ‘world’ is created where we can draw three-dimensional shapes and through which we can navigate.<br />
<br />
[[File:Grey_World.png|200px]]<br />
<br />
Functions within the rgl package can be divided into 6 categories:<br />
(1) Device management functions (open and close devices, control active device) <br />
(2) Scene management functions (option to remove certain or all objects from the scene)<br />
(3) Export functions (creating image files)<br />
(4) Shape functions - essential plotting primitives (points, lines, triangles, quads) as well as higher-level functions (text, spheres, surfaces).<br />
<br />
[[File:Grey-Shapes.png]]<br />
<br />
(5) Environment functions - modify the viewpoint, background and bounding box, adding light sources<br />
(6) Appearance function rgl.material(…).<br />
<br />
[[File:Grey_AppearanceOptions.png|300px]]<br />
<br />
Using shapes and surfaces within an ''rgl'' device, statistical data can be represented in 3 dimensions. Some advanced examples are available as demos or provided on the [http://rgl.neoscientists.org/docs.shtml rgl website].<br />
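<br />
As a rough illustration of the function categories listed above, here is a minimal sketch of an ''rgl'' scene. It assumes a reasonably recent version of ''rgl'' is installed; the simulated points and the plot title are made up for this example, and the off-screen option is only there so the code also runs on a machine without a display.<br />

```r
# Use the off-screen "null" device so the sketch also runs without a display.
# Remove this line to get an interactive window instead.
options(rgl.useNULL = TRUE)

has_rgl <- requireNamespace("rgl", quietly = TRUE)

if (has_rgl) {
  rgl::open3d()                                  # device management
  set.seed(1)
  xyz <- matrix(rnorm(30), ncol = 3)             # simulated points (hypothetical data)
  rgl::points3d(xyz, col = "blue", size = 5)     # shape function: primitive
  id <- rgl::spheres3d(0, 0, 0, radius = 0.3, col = "red")  # shape function: higher level
  rgl::axes3d()                                  # add axes
  rgl::title3d(main = "A minimal rgl scene")     # add a title
  rgl::view3d(theta = 30, phi = 20, zoom = 0.9)  # environment: change the viewpoint
  rgl::pop3d()                                   # scene management: remove the most recent shape
}
```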
<br />
[[File:Gray_Rgl_3d_histogram.png|200px]]<br />
[[File:Grey_rgl_example_imulated_animal_abundance.png|200px]]<br />
<br />
Only a few of the functions from ''rgl'' are needed for manipulating 3D models created using ''p3d'', since ''p3d'' contains many functions that inherit from ''rgl'' but tailor them to statistical methods. The rest are unnecessary for our purposes unless you would like to contribute functionality to ''p3d''!<br />
<br />
<br />
<br />
----<br />
<br />
===Package p3d===<br />
<br />
In this section I will focus on example code you may use to familiarize yourself with the capabilities of this package. You will require the [[File:Gray_p3d_ex_Tuition.txt]] and [[File:Gray_p3d_ex_USIndicesIndustrialProd.txt]] data sets; the sample code reads them from files named tuition.Rdata and USIndicesIndustrialProd.Rdata. Note that a few of the commands employed in the sample code are from ''rgl'', but these will likely be superseded by ''p3d'' functions as the package matures.<br />
<br />
Initialization code:<br />
----<br />
library( lattice )<br><br />
library( nlme )<br><br />
library( car )<br><br />
library( spida )<br><br />
library( rgl )<br><br />
library( p3d )<br><br />
tuit = read.table('tuition.Rdata',header=TRUE)<br><br />
head(tuit)<br><br />
prod = read.table('USIndicesIndustrialProd.Rdata',header=TRUE)<br><br />
head(prod)<br><br />
----<br />
<br />
<br />
For the tuition data we will begin by plotting the annual cost of tuition from a sample of American Universities against the rates of faculty compensation and proportion of students who graduate.<br />
<br />
Using the mouse buttons you can change the field of view and zoom in and out. ''Plot3d'' creates the 3D plot as shown on the right.<br />
<br />
We can remove elements from the device using the function ''Pop3d()''. This function removes elements starting with the most recently added item. Multiple items can be removed by passing a numeric argument, e.g. ''Pop3d(4)''.<br><br />
[[File:Gray_rgl_navigation.png|200px]]<br />
[[File:Gray_p3d_ex1.png|200px]]<br />
----<br />
Init3d(cex = .8)<br><br />
Plot3d(tuition ~ fac_comp + graduat, col = c("blue"), data = tuit)<br><br />
----<br />
<br />
<br />
Next we will subdivide the data by category, in this case whether the school is private (red) or public (blue) (variable name public.private).<br><br />
[[File:Gray_p3d_ex2.png|200px]]<br />
----<br />
Init3d(cex = .8)<br><br />
Plot3d( tuition ~ fac_comp + graduat|public.private, col = c("blue", "red"), data = tuit)<br><br />
----<br />
<br />
<br />
Next we will add regression planes for private(red) and public(blue) schools using the lm() function to determine the fit, and Fit3d() to insert the plane in the graph. Axes and labels are added using Axes3d() and title3d().<br><br />
[[File:Gray_p3d_ex3.png]]<br />
----<br />
fitpub = lm(tuition ~ fac_comp + graduat,subset=(public.private==0),data = tuit)<br><br />
Fit3d( fitpub, col = c("blue"))<br><br />
<br><br />
fitpri = lm(tuition ~ fac_comp + graduat,subset=(public.private==1),data = tuit)<br><br />
Fit3d( fitpri, col = c("red"))<br><br />
<br><br />
Axes3d()<br><br />
title3d(main='Tuition predicted by grad rates and faculty salary -private (red) and public(blue) institutions')<br><br />
----<br />
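<br />
The two planes above come from two separate fits. As a sketch of what that means, the code below simulates data with the same variable names (the values and coefficients are invented, not taken from the tuition file) and checks that fitting each group separately gives the same planes as a single model in which every term interacts with the public/private indicator.<br />

```r
# Simulated stand-in for the tuition data; all numbers here are hypothetical.
set.seed(1)
n <- 60
sim <- data.frame(
  fac_comp       = runif(n, 40, 100),
  graduat        = runif(n, 30, 95),
  public.private = rep(0:1, each = n / 2)   # 0 = public, 1 = private, as in the code above
)
sim$tuition <- 2000 + 50 * sim$fac_comp + 30 * sim$graduat +
  4000 * sim$public.private + rnorm(n, sd = 500)

# One plane per group, as in the p3d example
fitpub <- lm(tuition ~ fac_comp + graduat, data = sim, subset = public.private == 0)
fitpri <- lm(tuition ~ fac_comp + graduat, data = sim, subset = public.private == 1)

# A single model with all interactions describes the same two planes
fitall <- lm(tuition ~ (fac_comp + graduat) * factor(public.private), data = sim)
b <- coef(fitall)

plane_pub <- b[1:3]            # intercept and slopes for public schools
plane_pri <- b[1:3] + b[4:6]   # the same, shifted by the private-school terms

print(rbind(separate = coef(fitpub), combined = plane_pub))
print(rbind(separate = coef(fitpri), combined = plane_pri))
```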
<br />
<br />
Data ellipses are useful for understanding our data.<br><br />
[[File:Gray_p3d_ex4.png]]<br />
----<br />
Ell3d()<br />
----<br />
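<br />
To see what a data ellipse is made of, here is a two-dimensional sketch in base R. It is not the code behind ''Ell3d()''; the helper data_ellipse() and the simulated data are assumptions made for this illustration. The ellipse is centred at the means and shaped by the sample covariance matrix; for bivariate normal data an ellipse of radius 1 contains about 39% of the points and one of radius 2 about 86%.<br />

```r
# Hypothetical helper: points on the data ellipse of a given radius.
data_ellipse <- function(x, y, radius = 1, n = 100) {
  centre <- c(mean(x), mean(y))
  S <- cov(cbind(x, y))
  angles <- seq(0, 2 * pi, length.out = n)
  unit_circle <- rbind(cos(angles), sin(angles))
  # t(chol(S)) maps the unit circle onto an ellipse with the shape of S
  pts <- t(centre + radius * (t(chol(S)) %*% unit_circle))
  colnames(pts) <- c("x", "y")
  pts
}

# Simulated data, for illustration only
set.seed(42)
x <- rnorm(200, mean = 50, sd = 10)
y <- 0.5 * x + rnorm(200, sd = 5)
ell <- data_ellipse(x, y, radius = 2)

plot(x, y, asp = 1)
lines(ell, col = "red")                           # the data ellipse
points(mean(x), mean(y), pch = 19, col = "red")   # the centre point
```

Every point on the curve lies at the same Mahalanobis distance (here 2) from the centre, which is what makes it a data ellipse.<br />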
<br />
<br />
We can change the viewpoint of our graph using the function ''view3d(theta,phi,fov,zoom)'', which takes polar coordinates. Note that ''view3d(0,0,0)'' will rotate the image to face the x-z plane (y into the screen) and ''view3d(270,0,0)'' will rotate the image to face the y-z plane (x into the screen). The function ''snap()'' will capture a still image of the current view. Note that to use ''movie3d()'' you must have ImageMagick installed to automatically convert the PNG frames to a GIF; otherwise you must use external software.<br />
<br />
[[File:Gray_p3d_ex5.png]]<br />
[[File:GrayMovie.gif]]<br />
----<br />
view3d(0,0,0)<br><br />
snap()<br><br />
<br />
spin(theta = 0, phi = 0)<br><br />
<br />
spins(inc.theta = 1/4, inc.phi = 0, theta = NULL, phi = NULL)<br><br />
<br />
movie3d( spin3d(axis=c(0,1,0), rpm=20), duration=2, dir='movie' )<br><br />
----<br />
<br />
<br />
Here is an additional example, using data on the US indices of industrial products, plotting Mining production (MIN) over months and years. Adding the argument ‘groups=YR’ to ''Plot3d'' connects the months in a given year to produce trajectories.<br />
[[File:Gray_p3d_ex6.png]]<br />
----<br />
open3d(windowRect=c(100,100,800,800),cex = .8) <br><br />
prod = read.table('USIndicesIndustrialProd.Rdata',header=TRUE)<br><br />
head(prod)<br><br />
Plot3d(MIN ~ YR+MONTH,data=prod,groups=YR)<br><br />
Axes3d()<br><br />
title3d(main='Industrial Production Mining (1947-1993)')<br><br />
view3d(215,0,45)<br><br />
----<br />
<br />
<br />
==Assignment 2==<br />
<br />
Statistics in the News: "Spousal support a royal pain?"<br />
<br />
====1====<br />
;Question: Does the article suggest a causal relationship between two variables? If so, which? Are the data observational or experimental?<br />
<br />
;Discussion: <br />
Andy: Yes, it does. One variable is spousal solicitousness; the other is the degree of pain. The data are observational, because the researchers did not intervene when collecting the data.<br />
<br />
====2====<br />
;Question: Can you think of alternative explanations to causality? Confounding factors? Or explanations consistent with causality? Mediating factors? <br />
<br />
;Discussion: <br />
Andy: Yes, maybe a high degree of pain causes solicitousness, instead of solicitousness causing more pain. I don't think there exists a confounding factor. If there is one, I believe it is LOVE, because a man's LOVE wins a highly attentive spouse and doesn't work as a narcotic, as it does for a woman. That is a sweet answer to any woman, but tooth-hurting to any healthy man.<br />
<br />
Also, LOVE could be the mediating factor too. With high solicitousness, a man's LOVE starts to set in. He may be more likely to move his body, which may cause more pain, and then his physical function may recover faster. A woman, however, likes to find pieces of LOVE in solicitousness. Her heart is numbed by that LOVE, and then she will report that she is better as a sweet payback. <br />
<br />
====3====<br />
;Question: Have any confounding factors been accounted for in the analysis? <br />
<br />
;Discussion: <br />
<br />
====4====<br />
;Question: Have any mediating factors been controlled for in a way that vitiates a causal interpretation of the relationship? <br />
<br />
;Discussion: <br />
<br />
====5====<br />
;Question: What is your personal assessment of the evidence for causality in the study that is the subject of the article? <br />
<br />
;Discussion: <br />
<br />
<br />
=== Paradoxes and Fallacies ===<br />
==== 2. ==== <br />
;Question:You are studying observational data on the relationship between Health and Coffee (measured in grams of caffeine consumed per day). Suppose you want to control for a possible confounding factor 'Stress'. In this kind of study it is more important to make sure that you measure coffee consumption accurately than it is to make sure that you measure 'stress' accurately. <br />
<br />
;Discussion: <br />
<br />
<br />
==== 5.==== <br />
;Question: In a multiple regression of Y on three predictors, X1, X2 and X3, if the coefficients of both X2 and X3 are not significant, it is safe to drop these two variables and perform a regression on X1 alone. <br />
<br />
;Discussion:<br />
<br />
<br />
==== 8.====<br />
;Question: In a multiple regression, if you drop a predictor whose effect is not significant, the p-values of the other predictors should not change very much. <br />
<br />
;Discussion:<br />
<br />
<br />
==== 11.==== <br />
;Question: In a model to assess the effect of a number of treatments on some outcome, we can estimate the difference between the best treatment and the worst treatment by using the difference in the mean outcomes. <br />
<br />
;Discussion: <br />
<br />
<br />
==== 14.==== <br />
;Question: If two variables have a strong interaction, this implies a strong correlation. <br />
<br />
;Discussion:<br />
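Andy: No. Interaction and correlation are different things: two variables can interact strongly in their effect on an outcome without being correlated with each other. The linked example illustrates the difference between the two.<br />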
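<br />
A small constructed example (my own, not taken from the linked page) makes the point: on a balanced grid the two predictors are exactly uncorrelated, yet the outcome depends on them only through their product, which is as strong an interaction as one can have.<br />

```r
# Balanced grid: every value of x1 is paired with every value of x2,
# so the two predictors are uncorrelated by construction.
g <- expand.grid(x1 = -2:2, x2 = -2:2)

# The outcome depends on the predictors only through their product.
g$y <- 1 + g$x1 * g$x2

print(cor(g$x1, g$x2))            # correlation between the two predictors

fit <- lm(y ~ x1 * x2, data = g)
print(round(coef(fit), 3))        # the main effects vanish, the interaction does not
```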
[http://cnx.org/content/m31440/latest/ Illustration of the difference between correlation and interaction amongst independent variables]</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Assignment_Teams/GrayMATH 6627 2010-11 Practicum in Statistical Consulting/Assignment Teams/Gray2011-01-24T03:26:52Z<p>Andytli: </p>
<hr />
<div>* http://en.wikipedia.org/wiki/Mary_W._Gray<br />
<br />
==Assignment 1== <br />
===1. Simpson's Paradox=== <br />
<br />
Example: My friend and I play a basketball game and each shoot 20 shots. Who is the better shooter?<br />
<br />
[[File:h65.PNG]]<br />
<br />
But, who is the better shooter if you control for the distance of the shot? Who would you rather have on your team?<br />
<br />
[[File:h32.PNG]]<br />
This is a question of Simpson's Paradox. <br />
<br />
[[File:h45.PNG]] <br />
<br />
We can see from this figure that the relationship changes from negative to positive once we take distance into consideration. The black line links the two players' overall probabilities of making a shot. The red line links our performance on far shots, and the blue line our performance on close shots.<br />
<br />
Simpson’s paradox arises from one simple mathematical truth. Given eight real numbers: a, b, c, d, A, B, C, D with the following properties:[[File:12.png]], then it is not necessarily true that[[File:122.png]]. In fact, it may be true that:[[File:13.png]].<br />
<br />
This is an obvious math reality, yet it has significant ramifications in Bayesian analysis, medical research, science and engineering studies, and societal statistical analysis. It is of concern for any statistical activity involving the calculation and analysis of ratios of two measurements.<br />
<br />
Example 2 (Real Income tax example)<br />
<br />
[[File:simpson1.pdf]]<br />
<br />
===2. Graphics to visualize data === <br />
Gray: rgl and p3d (include the use of the 'groups' parameter to produce trajectories)<br />
<br />
===Introduction===<br />
<br />
[[File:Grey_rgl_p3d_1.pdf]]<br />
[[File:Grey_rgl_p3d_2.pdf]]<br />
<br />
''rgl'' is a library of functions that offers 3D real-time visualization functionality to the R programming environment (Adler & Murdoch, 2010), providing an OpenGL implementation for R.<br />
<br />
''p3d'' is a library of functions which employs functions from ''rgl'' to help visualize statistical models expressed as a function of two independent variables, with the possible addition of a categorical variable (Monette, 2009).<br />
<br />
===Package ''rgl''===<br />
<br />
With ''rgl'' we create a ‘device’, which is simply a window, within which a ‘world’ is created where we can draw three-dimensional shapes and through which we can navigate.<br />
<br />
[[File:Grey_World.png|200px]]<br />
<br />
Functions within the rgl package can be divided into 6 categories:<br />
(1) Device management functions (open and close devices, control active device) <br />
(2) Scene management functions (option to remove certain or all objects from the scene)<br />
(3) Export functions (creating image files)<br />
(4) Shape functions - essential plotting primitives (points, lines, triangles, quads) as well as higher-level functions (text, spheres, surfaces).<br />
<br />
[[File:Grey-Shapes.png]]<br />
<br />
(5) Environment functions - modify the viewpoint, background and bounding box, adding light sources<br />
(6) Appearance function rgl.material(…).<br />
<br />
[[File:Grey_AppearanceOptions.png|300px]]<br />
<br />
Using shapes and surfaces within an ''rgl'' device, statistical data can be represented in 3 dimensions. Some advanced examples are available as demos or provided on the [http://rgl.neoscientists.org/docs.shtml rgl website].<br />
<br />
[[File:Gray_Rgl_3d_histogram.png|200px]]<br />
[[File:Grey_rgl_example_imulated_animal_abundance.png|200px]]<br />
<br />
Only a few of the functions from ''rgl'' are needed for manipulating 3D models created using ''p3d'', since ''p3d'' contains many functions that inherit from ''rgl'' but tailor them to statistical methods. The rest are unnecessary for our purposes unless you would like to contribute functionality to ''p3d''!<br />
<br />
<br />
<br />
----<br />
<br />
===Package p3d===<br />
<br />
In this section I will focus on example code you may use to familiarize yourself with the capabilities of this package. You will require the [[File:Gray_p3d_ex_Tuition.txt]] and [[File:Gray_p3d_ex_USIndicesIndustrialProd.txt]] data sets; the sample code reads them from files named tuition.Rdata and USIndicesIndustrialProd.Rdata. Note that a few of the commands employed in the sample code are from ''rgl'', but these will likely be superseded by ''p3d'' functions as the package matures.<br />
<br />
Initialization code:<br />
----<br />
library( lattice )<br><br />
library( nlme )<br><br />
library( car )<br><br />
library( spida )<br><br />
library( rgl )<br><br />
library( p3d )<br><br />
tuit = read.table('tuition.Rdata',header=TRUE)<br><br />
head(tuit)<br><br />
prod = read.table('USIndicesIndustrialProd.Rdata',header=TRUE)<br><br />
head(prod)<br><br />
----<br />
<br />
<br />
For the tuition data we will begin by plotting the annual cost of tuition from a sample of American Universities against the rates of faculty compensation and proportion of students who graduate.<br />
<br />
Using the mouse buttons you can change the field of view and zoom in and out. ''Plot3d'' creates the 3D plot as shown on the right.<br />
<br />
We can remove elements from the device using the function ''Pop3d()''. This function removes elements starting with the most recently added item. Multiple items can be removed by passing a numeric argument, e.g. ''Pop3d(4)''.<br><br />
[[File:Gray_rgl_navigation.png|200px]]<br />
[[File:Gray_p3d_ex1.png|200px]]<br />
----<br />
Init3d(cex = .8)<br><br />
Plot3d(tuition ~ fac_comp + graduat, col = c("blue"), data = tuit)<br><br />
----<br />
<br />
<br />
Next we will subdivide the data by category, in this case whether the school is private (red) or public (blue) (variable name public.private).<br><br />
[[File:Gray_p3d_ex2.png|200px]]<br />
----<br />
Init3d(cex = .8)<br><br />
Plot3d( tuition ~ fac_comp + graduat|public.private, col = c("blue", "red"), data = tuit)<br><br />
----<br />
<br />
<br />
Next we will add regression planes for private(red) and public(blue) schools using the lm() function to determine the fit, and Fit3d() to insert the plane in the graph. Axes and labels are added using Axes3d() and title3d().<br><br />
[[File:Gray_p3d_ex3.png]]<br />
----<br />
fitpub = lm(tuition ~ fac_comp + graduat,subset=(public.private==0),data = tuit)<br><br />
Fit3d( fitpub, col = c("blue"))<br><br />
<br><br />
fitpri = lm(tuition ~ fac_comp + graduat,subset=(public.private==1),data = tuit)<br><br />
Fit3d( fitpri, col = c("red"))<br><br />
<br><br />
Axes3d()<br><br />
title3d(main='Tuition predicted by grad rates and faculty salary -private (red) and public(blue) institutions')<br><br />
----<br />
<br />
<br />
Data ellipses are useful for understanding our data.<br><br />
[[File:Gray_p3d_ex4.png]]<br />
----<br />
Ell3d()<br />
----<br />
<br />
<br />
We can change the viewpoint of our graph using the function ''view3d(theta,phi,fov,zoom)'', which takes polar coordinates. Note that ''view3d(0,0,0)'' will rotate the image to face the x-z plane (y into the screen) and ''view3d(270,0,0)'' will rotate the image to face the y-z plane (x into the screen). The function ''snap()'' will capture a still image of the current view. Note that to use ''movie3d()'' you must have ImageMagick installed to automatically convert the PNG frames to a GIF; otherwise you must use external software.<br />
<br />
[[File:Gray_p3d_ex5.png]]<br />
[[File:GrayMovie.gif]]<br />
----<br />
view3d(0,0,0)<br><br />
snap()<br><br />
<br />
spin(theta = 0, phi = 0)<br><br />
<br />
spins(inc.theta = 1/4, inc.phi = 0, theta = NULL, phi = NULL)<br><br />
<br />
movie3d( spin3d(axis=c(0,1,0), rpm=20), duration=2, dir='movie' )<br><br />
----<br />
<br />
<br />
Here is an additional example, using data on the US indices of industrial products, plotting Mining production (MIN) over months and years. Adding the argument ‘groups=YR’ to ''Plot3d'' connects the months in a given year to produce trajectories.<br />
[[File:Gray_p3d_ex6.png]]<br />
----<br />
open3d(windowRect=c(100,100,800,800),cex = .8) <br><br />
prod = read.table('USIndicesIndustrialProd.Rdata',header=TRUE)<br><br />
head(prod)<br><br />
Plot3d(MIN ~ YR+MONTH,data=prod,groups=YR)<br><br />
Axes3d()<br><br />
title3d(main='Industrial Production Mining (1947-1993)')<br><br />
view3d(215,0,45)<br><br />
----<br />
<br />
<br />
==Assignment 2==<br />
<br />
Statistics in the News: "Spousal support a royal pain?"<br />
<br />
====1====<br />
;Question: Does the article suggest a causal relationship between two variables? If so, which? Are the data observational or experimental?<br />
<br />
;Discussion: <br />
Andy: Yes, it does. One variable is spousal solicitousness; the other is the degree of pain. The data are observational, because the researchers did not intervene when collecting the data.<br />
<br />
====2====<br />
;Question: Can you think of alternative explanations to causality? Confounding factors? Or explanations consistent with causality? Mediating factors? <br />
<br />
;Discussion: <br />
Andy: Yes, maybe a high degree of pain causes solicitousness, instead of solicitousness causing more pain. I don't think there exists a confounding factor. If there is one, I believe it is LOVE, because a man's LOVE wins a highly attentive spouse and doesn't work as a narcotic, as it does for a woman. That is a sweet answer to any woman, but tooth-hurting to any healthy man.<br />
<br />
Also, LOVE could be the mediating factor too. With high solicitousness, a man's LOVE starts to set in. He may be more likely to move his body, which may cause more pain, and then his physical function may recover faster. A woman, however, likes to find pieces of LOVE in solicitousness. Her heart is numbed by that LOVE, and then she will report that she is better as a sweet payback. <br />
<br />
====3====<br />
;Question: Have any confounding factors been accounted for in the analysis? <br />
<br />
;Discussion: <br />
<br />
====4====<br />
;Question: Have any mediating factors been controlled for in a way that vitiates a causal interpretation of the relationship? <br />
<br />
;Discussion: <br />
<br />
====5====<br />
;Question: What is your personal assessment of the evidence for causality in the study that is the subject of the article? <br />
<br />
;Discussion: <br />
<br />
<br />
=== Paradoxes and Fallacies ===<br />
==== 2. ==== <br />
;Question:You are studying observational data on the relationship between Health and Coffee (measured in grams of caffeine consumed per day). Suppose you want to control for a possible confounding factor 'Stress'. In this kind of study it is more important to make sure that you measure coffee consumption accurately than it is to make sure that you measure 'stress' accurately. <br />
<br />
;Discussion: <br />
<br />
<br />
==== 5.==== <br />
;Question: In a multiple regression of Y on three predictors, X1, X2 and X3, if the coefficients of both X2 and X3 are not significant, it is safe to drop these two variables and perform a regression on X1 alone. <br />
<br />
;Discussion:<br />
<br />
<br />
==== 8.====<br />
;Question: In a multiple regression, if you drop a predictor whose effect is not significant, the p-values of the other predictors should not change very much. <br />
<br />
;Discussion:<br />
<br />
<br />
==== 11.==== <br />
;Question: In a model to assess the effect of a number of treatments on some outcome, we can estimate the difference between the best treatment and the worst treatment by using the difference in the mean outcomes. <br />
<br />
;Discussion: <br />
<br />
<br />
==== 14.==== <br />
;Question: If two variables have a strong interaction, this implies a strong correlation. <br />
<br />
;Discussion:</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Assignment_Teams/GrayMATH 6627 2010-11 Practicum in Statistical Consulting/Assignment Teams/Gray2011-01-24T03:24:44Z<p>Andytli: </p>
<hr />
<div>* http://en.wikipedia.org/wiki/Mary_W._Gray<br />
<br />
==Assignment 1== <br />
===1. Simpson's Paradox=== <br />
<br />
Example: My friend and I play a basketball game and each shoot 20 shots. Who is the better shooter?<br />
<br />
[[File:h65.PNG]]<br />
<br />
But, who is the better shooter if you control for the distance of the shot? Who would you rather have on your team?<br />
<br />
[[File:h32.PNG]]<br />
This is a question of Simpson's Paradox. <br />
<br />
[[File:h45.PNG]] <br />
<br />
We can see from this figure that the relationship changes from negative to positive once we take distance into consideration. The black line links the two players' overall probabilities of making a shot. The red line links our performance on far shots, and the blue line our performance on close shots.<br />
<br />
Simpson’s paradox arises from one simple mathematical truth. Given eight real numbers: a, b, c, d, A, B, C, D with the following properties:[[File:12.png]], then it is not necessarily true that[[File:122.png]]. In fact, it may be true that:[[File:13.png]].<br />
<br />
This is an obvious math reality, yet it has significant ramifications in Bayesian analysis, medical research, science and engineering studies, and societal statistical analysis. It is of concern for any statistical activity involving the calculation and analysis of ratios of two measurements.<br />
<br />
Example 2 (Real Income tax example)<br />
<br />
[[File:simpson1.pdf]]<br />
<br />
===2. Graphics to visualize data === <br />
Gray: rgl and p3d (include the use of the 'groups' parameter to produce trajectories)<br />
<br />
===Introduction===<br />
<br />
[[File:Grey_rgl_p3d_1.pdf]]<br />
[[File:Grey_rgl_p3d_2.pdf]]<br />
<br />
''rgl'' is a library of functions that offers 3D real-time visualization functionality to the R programming environment (Adler & Murdoch, 2010), providing an OpenGL implementation for R.<br />
<br />
''p3d'' is a library of functions which employs functions from ''rgl'' to help visualize statistical models expressed as a function of two independent variables, with the possible addition of a categorical variable (Monette, 2009).<br />
<br />
===Package ''rgl''===<br />
<br />
With ''rgl'' we create a ‘device’, which is simply a window, within which a ‘world’ is created where we can draw three-dimensional shapes and through which we can navigate.<br />
<br />
[[File:Grey_World.png|200px]]<br />
<br />
Functions within the rgl package can be divided into 6 categories:<br />
(1) Device management functions (open and close devices, control active device) <br />
(2) Scene management functions (option to remove certain or all objects from the scene)<br />
(3) Export functions (creating image files)<br />
(4) Shape functions - essential plotting primitives (points, lines, triangles, quads) as well as higher-level functions (text, spheres, surfaces).<br />
<br />
[[File:Grey-Shapes.png]]<br />
<br />
(5) Environment functions - modify the viewpoint, background and bounding box, adding light sources<br />
(6) Appearance function rgl.material(…).<br />
<br />
[[File:Grey_AppearanceOptions.png|300px]]<br />
<br />
Using shapes and surfaces within an ''rgl'' device, statistical data can be represented in 3 dimensions. Some advanced examples are available as demos or provided on the [http://rgl.neoscientists.org/docs.shtml rgl website].<br />
<br />
[[File:Gray_Rgl_3d_histogram.png|200px]]<br />
[[File:Grey_rgl_example_imulated_animal_abundance.png|200px]]<br />
<br />
Only a few of the functions from ''rgl'' are needed for manipulating 3D models created using ''p3d'', since ''p3d'' contains many functions that inherit from ''rgl'' but tailor them to statistical methods. The rest are unnecessary for our purposes unless you would like to contribute functionality to ''p3d''!<br />
<br />
<br />
<br />
----<br />
<br />
===Package p3d===<br />
<br />
In this section I will focus on example code you may use to familiarize yourself with the capabilities of this package. You will require the [[File:Gray_p3d_ex_Tuition.txt]] and [[File:Gray_p3d_ex_USIndicesIndustrialProd.txt]] data sets; the sample code reads them from files named tuition.Rdata and USIndicesIndustrialProd.Rdata. Note that a few of the commands employed in the sample code are from ''rgl'', but these will likely be superseded by ''p3d'' functions as the package matures.<br />
<br />
Initialization code:<br />
----<br />
library( lattice )<br><br />
library( nlme )<br><br />
library( car )<br><br />
library( spida )<br><br />
library( rgl )<br><br />
library( p3d )<br><br />
tuit = read.table('tuition.Rdata',header=TRUE)<br><br />
head(tuit)<br><br />
prod = read.table('USIndicesIndustrialProd.Rdata',header=TRUE)<br><br />
head(prod)<br><br />
----<br />
<br />
<br />
For the tuition data we will begin by plotting the annual cost of tuition from a sample of American Universities against the rates of faculty compensation and proportion of students who graduate.<br />
<br />
Using the mouse buttons you can change the field of view and zoom in and out. ''Plot3d'' creates the 3D plot as shown on the right.<br />
<br />
We can remove elements from the device using the function ''Pop3d()''. This function removes elements starting with the most recently added item. Multiple items can be removed by passing a numeric argument, e.g. ''Pop3d(4)''.<br><br />
[[File:Gray_rgl_navigation.png|200px]]<br />
[[File:Gray_p3d_ex1.png|200px]]<br />
----<br />
Init3d(cex = .8)<br><br />
Plot3d(tuition ~ fac_comp + graduat, col = c("blue"), data = tuit)<br><br />
----<br />
<br />
<br />
Next we will subdivide the data by category, in this case whether the school is private (red) or public (blue) (variable name public.private).<br><br />
[[File:Gray_p3d_ex2.png|200px]]<br />
----<br />
Init3d(cex = .8)<br><br />
Plot3d( tuition ~ fac_comp + graduat|public.private, col = c("blue", "red"), data = tuit)<br><br />
----<br />
<br />
<br />
Next we will add regression planes for private(red) and public(blue) schools using the lm() function to determine the fit, and Fit3d() to insert the plane in the graph. Axes and labels are added using Axes3d() and title3d().<br><br />
[[File:Gray_p3d_ex3.png]]<br />
----<br />
fitpub = lm(tuition ~ fac_comp + graduat,subset=(public.private==0),data = tuit)<br><br />
Fit3d( fitpub, col = c("blue"))<br><br />
<br><br />
fitpri = lm(tuition ~ fac_comp + graduat,subset=(public.private==1),data = tuit)<br><br />
Fit3d( fitpri, col = c("red"))<br><br />
<br><br />
Axes3d()<br><br />
title3d(main='Tuition predicted by grad rates and faculty salary -private (red) and public(blue) institutions')<br><br />
----<br />
<br />
<br />
Data ellipses are useful for understanding our data.<br><br />
[[File:Gray_p3d_ex4.png]]<br />
----<br />
Ell3d()<br />
----<br />
<br />
<br />
We can change the viewpoint of our graph using the function ''view3d(theta,phi,fov,zoom)'', which takes polar coordinates. Note that ''view3d(0,0,0)'' will rotate the image to face the x-z plane (y into the screen) and ''view3d(270,0,0)'' will rotate the image to face the y-z plane (x into the screen). The function ''snap()'' will capture a still image of the current view. Note that to use ''movie3d()'' you must have ImageMagick installed to automatically convert the PNG frames to a GIF; otherwise you must use external software.<br />
<br />
[[File:Gray_p3d_ex5.png]]<br />
[[File:GrayMovie.gif]]<br />
----<br />
view3d(0,0,0)<br><br />
snap()<br><br />
<br />
spin(theta = 0, phi = 0)<br><br />
<br />
spins(inc.theta = 1/4, inc.phi = 0, theta = NULL, phi = NULL)<br><br />
<br />
movie3d( spin3d(axis=c(0,1,0), rpm=20), duration=2, dir='movie' )<br><br />
----<br />
<br />
<br />
Here is an additional example, using data on the US indices of industrial products, plotting Mining production (MIN) over months and years. Adding the argument ‘groups=YR’ to ''Plot3d'' connects the months in a given year to produce trajectories.<br />
[[File:Gray_p3d_ex6.png]]<br />
----<br />
open3d(windowRect=c(100,100,800,800),cex = .8) <br><br />
prod = read.table('USIndicesIndustrialProd.Rdata',header=TRUE)<br><br />
head(prod)<br><br />
Plot3d(MIN ~ YR+MONTH,data=prod,groups=YR)<br><br />
Axes3d()<br><br />
title3d(main='Industrial Production Mining (1947-1993)')<br><br />
view3d(215,0,45)<br><br />
----<br />
<br />
<br />
==Assignment 2==<br />
<br />
Statistics in the News: "Spousal support a royal pain?"<br />
<br />
====1====<br />
;Question: Does the article suggest a causal relationship between two variables? If so, which? Are the data observational or experimental?<br />
<br />
;Discussion: <br />
Andy: Yes, it does. One variable is spousal solicitousness; the other is the degree of pain. The data are observational, because the researchers did not intervene when collecting the data.<br />
<br />
====2====<br />
;Question: Can you think of alternative explanations to causality? Confounding factors? Or explanations consistent with causality? Mediating factors? <br />
<br />
;Discussion: <br />
Andy: Yes, maybe high degree of pain causes solicitousness, instead of solicitousness causing towards more pain. I don't think there exits a confounding factor. If there is, I believe it is LOVE, because a man's LOVE wins a highly attentive spouse and doesn't work as a narcotic as a woman feels. That is a sweat answer to any woman, but tooth-hurting felt by any healthy man.<br />
<br />
LOVE could also be a mediating factor. With high solicitousness, a man's LOVE starts to take hold. He may be more likely to move his body, which may cause more pain, and then his physical function may recover faster. A woman, by contrast, likes to find signs of LOVE in solicitousness. Her heart is numbed by that LOVE, and in return she will report that she is better.<br />
<br />
====3====<br />
;Question: Have any confounding factors been accounted for in the analysis? <br />
<br />
;Discussion: <br />
<br />
====4====<br />
;Question: Have any mediating factors been controlled for in a way that vitiates a causal interpretation of the relationship? <br />
<br />
;Discussion: <br />
<br />
====5====<br />
;Question: What is your personal assessment of the evidence for causality in the study that is the subject of the article? <br />
<br />
;Discussion: <br />
<br />
<br />
=== Paradoxes and Fallacies ===<br />
==== 2. ==== <br />
;Question:You are studying observational data on the relationship between Health and Coffee (measured in grams of caffeine consumed per day). Suppose you want to control for a possible confounding factor 'Stress'. In this kind of study it is more important to make sure that you measure coffee consumption accurately than it is to make sure that you measure 'stress' accurately. <br />
<br />
;Discussion: <br />
<br />
<br />
==== 5.==== <br />
;Question: In a multiple regression of Y on three predictors, X1, X2 and X3, if the coefficients of both X2 and X3 are not significant, it is safe to drop these two variables and perform a regression on X1 alone.<br />
<br />
;Discussion:<br />
<br />
<br />
==== 8.====<br />
;Question: In a multiple regression, if you drop a predictor whose effect is not significant, the p-values of the other predictors should not change very much. <br />
<br />
;Discussion:<br />
<br />
<br />
==== 11.==== <br />
;Question: In a model to assess the effect of a number of treatments on some outcome, we can estimate the difference between the best treatment and the worst treatment by using the difference in the mean outcomes.<br />
<br />
;Discussion: <br />
<br />
<br />
==== 14.==== <br />
;Question: If two variables have a strong interaction, this implies a strong correlation. <br />
<br />
;Discussion:</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Assignment_Teams/GrayMATH 6627 2010-11 Practicum in Statistical Consulting/Assignment Teams/Gray2011-01-24T03:22:02Z<p>Andytli: </p>
<hr />
<div>* http://en.wikipedia.org/wiki/Mary_W._Gray<br />
<br />
==Assignment 1== <br />
===1. Simpson's Paradox=== <br />
<br />
Example: My friend and I play a basketball game and each shoot 20 shots. Who is the better shooter?<br />
<br />
[[File:h65.PNG]]<br />
<br />
But, who is the better shooter if you control for the distance of the shot? Who would you rather have on your team?<br />
<br />
[[File:h32.PNG]]<br />
This is an instance of Simpson's Paradox.<br />
<br />
[[File:h45.PNG]] <br />
<br />
The figure shows that the relationship changes from negative to positive once we take distance into consideration. The black line links the two players' overall probabilities of making a shot. The red line links our performance on far shots, and the blue line our performance on close shots.<br />
<br />
Simpson’s paradox arises from one simple mathematical truth. Given eight real numbers: a, b, c, d, A, B, C, D with the following properties:[[File:12.png]], then it is not necessarily true that[[File:122.png]]. In fact, it may be true that:[[File:13.png]].<br />
<br />
This is a simple mathematical fact, yet it has significant ramifications in Bayesian analysis, medical research, science and engineering studies, and societal statistical analysis. It is of concern for any statistical activity involving the calculation and analysis of ratios of two measurements.<br />
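<br />
A small numerical sketch of the reversal may help. The shot counts below are hypothetical (they are not the figures from the images above), chosen so that one player has the higher success rate at each distance but the lower rate overall. The names ''shots'' and ''totals'' are illustrative only.<br />

```r
# Hypothetical shot counts, 20 shots per player (assumed for illustration).
shots <- data.frame(
  player   = c("me", "me", "friend", "friend"),
  distance = c("close", "far", "close", "far"),
  made     = c(5, 6, 12, 1),
  attempts = c(5, 15, 15, 5)
)

# Success rate within each distance.
shots$rate <- shots$made / shots$attempts

# Overall success rate per player, pooling over distance.
totals <- aggregate(cbind(made, attempts) ~ player, data = shots, FUN = sum)
totals$rate <- totals$made / totals$attempts

print(shots)   # "me" has the higher rate at both distances
print(totals)  # but "friend" has the higher rate overall
```

The reversal comes from the mix of shots: in this made-up example "me" takes mostly far shots and "friend" takes mostly close ones, so pooling over distance compares unlike mixtures.<br />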
<br />
Example 2 (Real Income tax example)<br />
<br />
[[File:simpson1.pdf]]<br />
<br />
===2. Graphics to visualize data === <br />
Gray: rgl and p3d (include the use of the 'groups' parameter to produce trajectories)<br />
<br />
===Introduction===<br />
<br />
[[File:Grey_rgl_p3d_1.pdf]]<br />
[[File:Grey_rgl_p3d_2.pdf]]<br />
<br />
''rgl'' is a library of functions that offers real-time 3D visualization functionality to the R programming environment (Adler & Murdoch, 2010), providing an OpenGL implementation for R.<br />
<br />
''p3d'' is a library of functions which employs functions from RGL to help visualize statistical models expressed as a function of 2 independent variables with the possible addition of a categorical variable (Monette, 2009).<br />
<br />
===Package ''rgl''===<br />
<br />
With ''rgl'' we create a ‘device’, which is simply a window containing a ‘world’ in which we can create three-dimensional shapes and through which we can navigate.<br />
<br />
[[File:Grey_World.png|200px]]<br />
<br />
Functions within the rgl package can be divided into 6 categories:<br />
(1) Device management functions (open and close devices, control active device) <br />
(2) Scene management functions (option to remove certain or all objects from the scene)<br />
(3) Export functions (creating image files)<br />
(4) Shape functions - essential plotting primitives (points, lines, triangles, quads) as well as higher-level functions (text, spheres, surfaces).<br />
<br />
[[File:Grey-Shapes.png]]<br />
<br />
(5) Environment functions - modify the viewpoint, background and bounding box, adding light sources<br />
(6) Appearance function rgl.material(…).<br />
<br />
[[File:Grey_AppearanceOptions.png|300px]]<br />
<br />
Using shapes and surfaces within an ''rgl'' device, statistical data can be represented in 3 dimensions. Some advanced examples are available as demos or provided on the [http://rgl.neoscientists.org/docs.shtml rgl website].<br />
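<br />
As a rough sketch of how several of these categories fit together, the snippet below opens a device, draws a few spheres, adds axes and a title, sets the viewpoint and removes the last item added. The data are made up, and the ''rgl.useNULL'' option is an assumption so that the code can run without a display; drop that line to get an interactive window. Export functions are not shown.<br />

```r
options(rgl.useNULL = TRUE)  # assumption: no display; remove for an interactive window
library(rgl)

set.seed(1)
pts <- data.frame(x = rnorm(20), y = rnorm(20), z = rnorm(20))  # made-up data

open3d()                                  # (1) device management: open a device
ids <- spheres3d(pts$x, pts$y, pts$z,     # (4) shape function: spheres at each point
                 radius = 0.1,
                 col = "blue")            # (6) appearance passed as a material property
axes3d()                                  # add axes to the scene
title3d(main = "Made-up points")          # add a title
view3d(theta = 30, phi = 20,              # (5) environment: set the viewpoint
       fov = 60, zoom = 1)
pop3d()                                   # (2) scene management: remove the most recent item
```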
<br />
[[File:Gray_Rgl_3d_histogram.png|200px]]<br />
[[File:Grey_rgl_example_imulated_animal_abundance.png|200px]]<br />
<br />
A few of the functions from ''rgl'' are useful for manipulating 3D models created using ''p3d'', since ''p3d'' contains many functions that inherit from ''rgl'' but tailor them to statistical methods. Thus all but a few ''rgl'' functions are unnecessary for our purposes, unless you would like to contribute functionality to ''p3d''!<br />
<br />
<br />
<br />
----<br />
<br />
===Package p3d===<br />
<br />
In this section I will focus on example code you may use to familiarize yourself with the capabilities of this package. You will require the [[File:Gray_p3d_ex_Tuition.txt]] and [[File:Gray_p3d_ex_USIndicesIndustrialProd.txt]] data sets. Note that a few of the commands employed in the sample code are from ''rgl'', but these will likely be superseded by ''p3d'' functions as the package matures.<br />
<br />
Initialization code:<br />
----<br />
library( lattice )<br><br />
library( nlme )<br><br />
library( car )<br><br />
library( spida )<br><br />
library( rgl )<br><br />
library( p3d )<br><br />
tuit = read.table('tuition.Rdata',header=TRUE)<br><br />
head(tuit)<br><br />
prod = read.table('USIndicesIndustrialProd.Rdata',header=TRUE)<br><br />
head(prod)<br><br />
----<br />
<br />
<br />
For the tuition data we will begin by plotting the annual cost of tuition from a sample of American Universities against the rates of faculty compensation and proportion of students who graduate.<br />
<br />
Using the mouse buttons you can change the field of view and zoom in and out. ''Plot3d'' creates the 3D plot as shown on the right.<br />
<br />
We can remove elements from the device using the function ''Pop3d()''. This function removes elements starting with the most recently added item. Multiple items can be removed by adding a numeric argument, e.g. ''Pop3d(4)''.<br><br />
[[File:Gray_rgl_navigation.png|200px]]<br />
[[File:Gray_p3d_ex1.png|200px]]<br />
----<br />
Init3d(cex = .8)<br><br />
Plot3d(tuition ~ fac_comp + graduat, col = c("blue"), data = tuit)<br><br />
----<br />
<br />
<br />
Next we will subdivide the data by category, in this case whether the school is private (red) or public (blue) (variable name public.private).<br><br />
[[File:Gray_p3d_ex2.png|200px]]<br />
----<br />
Init3d(cex = .8)<br><br />
Plot3d( tuition ~ fac_comp + graduat|public.private, col = c("blue", "red"), data = tuit)<br><br />
----<br />
<br />
<br />
Next we will add regression planes for private(red) and public(blue) schools using the lm() function to determine the fit, and Fit3d() to insert the plane in the graph. Axes and labels are added using Axes3d() and title3d().<br><br />
[[File:Gray_p3d_ex3.png]]<br />
----<br />
fitpub = lm(tuition ~ fac_comp + graduat,subset=(public.private==0),data = tuit)<br><br />
Fit3d( fitpub, col = c("blue"))<br><br />
<br><br />
fitpri = lm(tuition ~ fac_comp + graduat,subset=(public.private==1),data = tuit)<br><br />
Fit3d( fitpri, col = c("red"))<br><br />
<br><br />
Axes3d()<br><br />
title3d(main='Tuition predicted by grad rates and faculty salary -private (red) and public(blue) institutions')<br><br />
----<br />
<br />
<br />
Data ellipses are useful for understanding our data.<br><br />
[[File:Gray_p3d_ex4.png]]<br />
----<br />
Ell3d()<br />
----<br />
<br />
<br />
We can change the viewpoint of our graph using the function ''view3d(theta,phi,fov,zoom)'', which takes polar coordinates. Note that ''view3d(0,0,0)'' will rotate the image to face the x-z plane (y into the screen) and ''view3d(270,0,0)'' will rotate the image to face the y-z plane (x into the screen). The function ''snap()'' will capture a still image of the current view. Note that to use ''movie3d()'' you must have ImageMagick installed to automatically convert the PNG frames to a GIF; otherwise you must use external software.<br />
<br />
[[File:Gray_p3d_ex5.png]]<br />
[[File:GrayMovie.gif]]<br />
----<br />
view3d(0,0,0)<br><br />
snap()<br><br />
<br />
spin(theta = 0, phi = 0)<br><br />
<br />
spins(inc.theta = 1/4, inc.phi = 0, theta = NULL, phi = NULL)<br><br />
<br />
movie3d( spin3d(axis=c(0,1,0), rpm=20), duration=2, dir='movie' )<br><br />
----<br />
<br />
<br />
Here is an additional example, using data on the US indices of industrial products, plotting Mining production (MIN) over months and years. Adding the argument ‘groups=YR’ to ''Plot3d'' connects the months in a given year to produce trajectories.<br />
[[File:Gray_p3d_ex6.png]]<br />
----<br />
open3d(windowRect=c(100,100,800,800),cex = .8) <br><br />
prod = read.table('USIndicesIndustrialProd.Rdata',header=TRUE)<br><br />
head(prod)<br><br />
Plot3d(MIN ~ YR+MONTH,data=prod,groups=YR)<br><br />
Axes3d()<br><br />
title3d(main='Industrial Production Mining (1947-1993)')<br><br />
view3d(215,0,45)<br><br />
----<br />
<br />
<br />
==Assignment 2==<br />
<br />
Statistics in the News: "Spousal support a royal pain?"<br />
<br />
====1====<br />
;Question: Does the article suggest a causal relationship between two variables? If so, which? Are the data observational or experimental?<br />
<br />
;Discussion: <br />
Andy: Yes, it does. One variable is spousal solicitousness; the other is the degree of pain. The data are observational, because the researchers did not intervene when collecting the data.<br />
<br />
====2====<br />
;Question: Can you think of alternative explanations to causality? Confounding factors? Or explanations consistent with causality? Mediating factors? <br />
<br />
;Discussion: <br />
Andy: Yes. Perhaps a high degree of pain causes solicitousness, rather than solicitousness causing more pain. I don't think there exists a confounding factor. If there is one, I believe it is LOVE, because a man's LOVE wins him a highly attentive spouse but does not work as a narcotic for him the way it does for a woman. That is a sweet answer to any woman, but one that makes any healthy man's teeth hurt.<br />
<br />
LOVE could also be a mediating factor. With high solicitousness, a man's LOVE starts to take hold. He may be more likely to move his body, which may cause more pain, and then his physical function may recover faster. A woman, by contrast, likes to find signs of LOVE in some solicitousness. Her heart is numbed by that LOVE, and then she reports that she is better.<br />
<br />
====3====<br />
;Question: Have any confounding factors been accounted for in the analysis? <br />
<br />
;Discussion: <br />
<br />
====4====<br />
;Question: Have any mediating factors been controlled for in a way that vitiates a causal interpretation of the relationship? <br />
<br />
;Discussion: <br />
<br />
====5====<br />
;Question: What is your personal assessment of the evidence for causality in the study that is the subject of the article? <br />
<br />
;Discussion: <br />
<br />
<br />
=== Paradoxes and Fallacies ===<br />
==== 2. ==== <br />
;Question:You are studying observational data on the relationship between Health and Coffee (measured in grams of caffeine consumed per day). Suppose you want to control for a possible confounding factor 'Stress'. In this kind of study it is more important to make sure that you measure coffee consumption accurately than it is to make sure that you measure 'stress' accurately. <br />
<br />
;Discussion: <br />
<br />
<br />
==== 5.==== <br />
;Question: In a multiple regression of Y on three predictors, X1, X2 and X3, if the coefficients of both X2 and X3 are not significant, it is safe to drop these two variables and perform a regression on X1 alone.<br />
<br />
;Discussion:<br />
<br />
<br />
==== 8.====<br />
;Question: In a multiple regression, if you drop a predictor whose effect is not significant, the p-values of the other predictors should not change very much. <br />
<br />
;Discussion:<br />
<br />
<br />
==== 11.==== <br />
;Question: In a model to assess the effect of a number of treatments on some outcome, we can estimate the difference between the best treatment and the worst treatment by using the difference in the mean outcomes.<br />
<br />
;Discussion: <br />
<br />
<br />
==== 14.==== <br />
;Question: If two variables have a strong interaction, this implies a strong correlation. <br />
<br />
;Discussion:</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Assignment_Teams/GrayMATH 6627 2010-11 Practicum in Statistical Consulting/Assignment Teams/Gray2011-01-24T03:08:34Z<p>Andytli: </p>
<hr />
<div>* http://en.wikipedia.org/wiki/Mary_W._Gray<br />
<br />
==Assignment 1== <br />
===1. Simpson's Paradox=== <br />
<br />
Example: My friend and I play a basketball game and each shoot 20 shots. Who is the better shooter?<br />
<br />
[[File:h65.PNG]]<br />
<br />
But, who is the better shooter if you control for the distance of the shot? Who would you rather have on your team?<br />
<br />
[[File:h32.PNG]]<br />
This is an instance of Simpson's Paradox.<br />
<br />
[[File:h45.PNG]] <br />
<br />
The figure shows that the relationship changes from negative to positive once we take distance into consideration. The black line links the two players' overall probabilities of making a shot. The red line links our performance on far shots, and the blue line our performance on close shots.<br />
<br />
Simpson’s paradox arises from one simple mathematical truth. Given eight real numbers: a, b, c, d, A, B, C, D with the following properties:[[File:12.png]], then it is not necessarily true that[[File:122.png]]. In fact, it may be true that:[[File:13.png]].<br />
<br />
This is a simple mathematical fact, yet it has significant ramifications in Bayesian analysis, medical research, science and engineering studies, and societal statistical analysis. It is of concern for any statistical activity involving the calculation and analysis of ratios of two measurements.<br />
<br />
Example 2 (Real Income tax example)<br />
<br />
[[File:simpson1.pdf]]<br />
<br />
===2. Graphics to visualize data === <br />
Gray: rgl and p3d (include the use of the 'groups' parameter to produce trajectories)<br />
<br />
===Introduction===<br />
<br />
[[File:Grey_rgl_p3d_1.pdf]]<br />
[[File:Grey_rgl_p3d_2.pdf]]<br />
<br />
''rgl'' is a library of functions that offers real-time 3D visualization functionality to the R programming environment (Adler & Murdoch, 2010), providing an OpenGL implementation for R.<br />
<br />
''p3d'' is a library of functions which employs functions from RGL to help visualize statistical models expressed as a function of 2 independent variables with the possible addition of a categorical variable (Monette, 2009).<br />
<br />
===Package ''rgl''===<br />
<br />
With ''rgl'' we create a ‘device’, which is simply a window containing a ‘world’ in which we can create three-dimensional shapes and through which we can navigate.<br />
<br />
[[File:Grey_World.png|200px]]<br />
<br />
Functions within the rgl package can be divided into 6 categories:<br />
(1) Device management functions (open and close devices, control active device) <br />
(2) Scene management functions (option to remove certain or all objects from the scene)<br />
(3) Export functions (creating image files)<br />
(4) Shape functions - essential plotting primitives (points, lines, triangles, quads) as well as higher-level functions (text, spheres, surfaces).<br />
<br />
[[File:Grey-Shapes.png]]<br />
<br />
(5) Environment functions - modify the viewpoint, background and bounding box, adding light sources<br />
(6) Appearance function rgl.material(…).<br />
<br />
[[File:Grey_AppearanceOptions.png|300px]]<br />
<br />
Using shapes and surfaces within an ''rgl'' device, statistical data can be represented in 3 dimensions. Some advanced examples are available as demos or provided on the [http://rgl.neoscientists.org/docs.shtml rgl website].<br />
<br />
[[File:Gray_Rgl_3d_histogram.png|200px]]<br />
[[File:Grey_rgl_example_imulated_animal_abundance.png|200px]]<br />
<br />
A few of the functions from ''rgl'' are useful for manipulating 3D models created using ''p3d'', since ''p3d'' contains many functions that inherit from ''rgl'' but tailor them to statistical methods. Thus all but a few ''rgl'' functions are unnecessary for our purposes, unless you would like to contribute functionality to ''p3d''!<br />
<br />
<br />
<br />
----<br />
<br />
===Package p3d===<br />
<br />
In this section I will focus on example code you may use to familiarize yourself with the capabilities of this package. You will require the [[File:Gray_p3d_ex_Tuition.txt]] and [[File:Gray_p3d_ex_USIndicesIndustrialProd.txt]] data sets. Note that a few of the commands employed in the sample code are from ''rgl'', but these will likely be superseded by ''p3d'' functions as the package matures.<br />
<br />
Initialization code:<br />
----<br />
library( lattice )<br><br />
library( nlme )<br><br />
library( car )<br><br />
library( spida )<br><br />
library( rgl )<br><br />
library( p3d )<br><br />
tuit = read.table('tuition.Rdata',header=TRUE)<br><br />
head(tuit)<br><br />
prod = read.table('USIndicesIndustrialProd.Rdata',header=TRUE)<br><br />
head(prod)<br><br />
----<br />
<br />
<br />
For the tuition data we will begin by plotting the annual cost of tuition from a sample of American Universities against the rates of faculty compensation and proportion of students who graduate.<br />
<br />
Using the mouse buttons you can change the field of view and zoom in and out. ''Plot3d'' creates the 3D plot as shown on the right.<br />
<br />
We can remove elements from the device using the function ''Pop3d()''. This function removes elements starting with the most recently added item. Multiple items can be removed by adding a numeric argument, e.g. ''Pop3d(4)''.<br><br />
[[File:Gray_rgl_navigation.png|200px]]<br />
[[File:Gray_p3d_ex1.png|200px]]<br />
----<br />
Init3d(cex = .8)<br><br />
Plot3d(tuition ~ fac_comp + graduat, col = c("blue"), data = tuit)<br><br />
----<br />
<br />
<br />
Next we will subdivide the data by category, in this case whether the school is private (red) or public (blue) (variable name public.private).<br><br />
[[File:Gray_p3d_ex2.png|200px]]<br />
----<br />
Init3d(cex = .8)<br><br />
Plot3d( tuition ~ fac_comp + graduat|public.private, col = c("blue", "red"), data = tuit)<br><br />
----<br />
<br />
<br />
Next we will add regression planes for private(red) and public(blue) schools using the lm() function to determine the fit, and Fit3d() to insert the plane in the graph. Axes and labels are added using Axes3d() and title3d().<br><br />
[[File:Gray_p3d_ex3.png]]<br />
----<br />
fitpub = lm(tuition ~ fac_comp + graduat,subset=(public.private==0),data = tuit)<br><br />
Fit3d( fitpub, col = c("blue"))<br><br />
<br><br />
fitpri = lm(tuition ~ fac_comp + graduat,subset=(public.private==1),data = tuit)<br><br />
Fit3d( fitpri, col = c("red"))<br><br />
<br><br />
Axes3d()<br><br />
title3d(main='Tuition predicted by grad rates and faculty salary -private (red) and public(blue) institutions')<br><br />
----<br />
<br />
<br />
Data ellipses are useful for understanding our data.<br><br />
[[File:Gray_p3d_ex4.png]]<br />
----<br />
Ell3d()<br />
----<br />
<br />
<br />
We can change the viewpoint of our graph using the function ''view3d(theta,phi,fov,zoom)'', which takes polar coordinates. Note that ''view3d(0,0,0)'' will rotate the image to face the x-z plane (y into the screen) and ''view3d(270,0,0)'' will rotate the image to face the y-z plane (x into the screen). The function ''snap()'' will capture a still image of the current view. Note that to use ''movie3d()'' you must have ImageMagick installed to automatically convert the PNG frames to a GIF; otherwise you must use external software.<br />
<br />
[[File:Gray_p3d_ex5.png]]<br />
[[File:GrayMovie.gif]]<br />
----<br />
view3d(0,0,0)<br><br />
snap()<br><br />
<br />
spin(theta = 0, phi = 0)<br><br />
<br />
spins(inc.theta = 1/4, inc.phi = 0, theta = NULL, phi = NULL)<br><br />
<br />
movie3d( spin3d(axis=c(0,1,0), rpm=20), duration=2, dir='movie' )<br><br />
----<br />
<br />
<br />
Here is an additional example, using data on the US indices of industrial products, plotting Mining production (MIN) over months and years. Adding the argument ‘groups=YR’ to ''Plot3d'' connects the months in a given year to produce trajectories.<br />
[[File:Gray_p3d_ex6.png]]<br />
----<br />
open3d(windowRect=c(100,100,800,800),cex = .8) <br><br />
prod = read.table('USIndicesIndustrialProd.Rdata',header=TRUE)<br><br />
head(prod)<br><br />
Plot3d(MIN ~ YR+MONTH,data=prod,groups=YR)<br><br />
Axes3d()<br><br />
title3d(main='Industrial Production Mining (1947-1993)')<br><br />
view3d(215,0,45)<br><br />
----<br />
<br />
<br />
==Assignment 2==<br />
<br />
Statistics in the News: "Spousal support a royal pain?"<br />
<br />
====1====<br />
;Question: Does the article suggest a causal relationship between two variables? If so, which? Are the data observational or experimental?<br />
<br />
;Discussion: <br />
Andy: Yes, it does. One variable is spousal solicitousness; the other is the degree of pain. The data are observational, because the researchers did not intervene when collecting the data.<br />
<br />
====2====<br />
;Question: Can you think of alternative explanations to causality? Confounding factors? Or explanations consistent with causality? Mediating factors? <br />
<br />
;Discussion: <br />
Andy: Yes. Perhaps a high degree of pain causes solicitousness, rather than solicitousness causing more pain. I don't think there exists a confounding factor. If there is one, I believe it is LOVE, because a man's LOVE wins him a highly attentive spouse but does not work as a narcotic for him the way it does for a woman. With that kind of LOVE, he may be more likely to move his body, which may cause more pain, and then his physical function may recover faster. That is a sweet answer to any woman, but one that makes any healthy man's teeth hurt.<br />
<br />
LOVE could also be a mediating factor.<br />
<br />
====3====<br />
;Question: Have any confounding factors been accounted for in the analysis? <br />
<br />
;Discussion: <br />
<br />
====4====<br />
;Question: Have any mediating factors been controlled for in a way that vitiates a causal interpretation of the relationship? <br />
<br />
;Discussion: <br />
<br />
====5====<br />
;Question: What is your personal assessment of the evidence for causality in the study that is the subject of the article? <br />
<br />
;Discussion: <br />
<br />
<br />
=== Paradoxes and Fallacies ===<br />
==== 2. ==== <br />
;Question:You are studying observational data on the relationship between Health and Coffee (measured in grams of caffeine consumed per day). Suppose you want to control for a possible confounding factor 'Stress'. In this kind of study it is more important to make sure that you measure coffee consumption accurately than it is to make sure that you measure 'stress' accurately. <br />
<br />
;Discussion: <br />
<br />
<br />
==== 5.==== <br />
;Question: In a multiple regression of Y on three predictors, X1, X2 and X3, if the coefficients of both X2 and X3 are not significant, it is safe to drop these two variables and perform a regression on X1 alone.<br />
<br />
;Discussion:<br />
<br />
<br />
==== 8.====<br />
;Question: In a multiple regression, if you drop a predictor whose effect is not significant, the p-values of the other predictors should not change very much. <br />
<br />
;Discussion:<br />
<br />
<br />
==== 11.==== <br />
;Question: In a model to assess the effect of a number of treatments on some outcome, we can estimate the difference between the best treatment and the worst treatment by using the difference in the mean outcomes.<br />
<br />
;Discussion: <br />
<br />
<br />
==== 14.==== <br />
;Question: If two variables have a strong interaction, this implies a strong correlation. <br />
<br />
;Discussion:</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Assignment_Teams/GrayMATH 6627 2010-11 Practicum in Statistical Consulting/Assignment Teams/Gray2011-01-24T02:37:59Z<p>Andytli: </p>
<hr />
<div>* http://en.wikipedia.org/wiki/Mary_W._Gray<br />
<br />
==Assignment 1== <br />
===1. Simpson's Paradox=== <br />
<br />
Example: My friend and I play a basketball game and each shoot 20 shots. Who is the better shooter?<br />
<br />
[[File:h65.PNG]]<br />
<br />
But, who is the better shooter if you control for the distance of the shot? Who would you rather have on your team?<br />
<br />
[[File:h32.PNG]]<br />
This is an instance of Simpson's Paradox.<br />
<br />
[[File:h45.PNG]] <br />
<br />
The figure shows that the relationship changes from negative to positive once we take distance into consideration. The black line links the two players' overall probabilities of making a shot. The red line links our performance on far shots, and the blue line our performance on close shots.<br />
<br />
Simpson’s paradox arises from one simple mathematical truth. Given eight real numbers: a, b, c, d, A, B, C, D with the following properties:[[File:12.png]], then it is not necessarily true that[[File:122.png]]. In fact, it may be true that:[[File:13.png]].<br />
<br />
This is a simple mathematical fact, yet it has significant ramifications in Bayesian analysis, medical research, science and engineering studies, and societal statistical analysis. It is of concern for any statistical activity involving the calculation and analysis of ratios of two measurements.<br />
<br />
Example 2 (Real Income tax example)<br />
<br />
[[File:simpson1.pdf]]<br />
<br />
===2. Graphics to visualize data === <br />
Gray: rgl and p3d (include the use of the 'groups' parameter to produce trajectories)<br />
<br />
===Introduction===<br />
<br />
[[File:Grey_rgl_p3d_1.pdf]]<br />
[[File:Grey_rgl_p3d_2.pdf]]<br />
<br />
''rgl'' is a library of functions that offers real-time 3D visualization functionality to the R programming environment (Adler & Murdoch, 2010), providing an OpenGL implementation for R.<br />
<br />
''p3d'' is a library of functions which employs functions from RGL to help visualize statistical models expressed as a function of 2 independent variables with the possible addition of a categorical variable (Monette, 2009).<br />
<br />
===Package ''rgl''===<br />
<br />
With ''rgl'' we create a ‘device’, which is simply a window containing a ‘world’ in which we can create three-dimensional shapes and through which we can navigate.<br />
<br />
[[File:Grey_World.png|200px]]<br />
<br />
Functions within the rgl package can be divided into 6 categories:<br />
(1) Device management functions (open and close devices, control active device) <br />
(2) Scene management functions (option to remove certain or all objects from the scene)<br />
(3) Export functions (creating image files)<br />
(4) Shape functions - essential plotting primitives (points, lines, triangles, quads) as well as higher-level functions (text, spheres, surfaces).<br />
<br />
[[File:Grey-Shapes.png]]<br />
<br />
(5) Environment functions - modify the viewpoint, background and bounding box, adding light sources<br />
(6) Appearance function rgl.material(…).<br />
<br />
[[File:Grey_AppearanceOptions.png|300px]]<br />
<br />
Using shapes and surfaces within an ''rgl'' device, statistical data can be represented in 3 dimensions. Some advanced examples are available as demos or provided on the [http://rgl.neoscientists.org/docs.shtml rgl website].<br />
<br />
[[File:Gray_Rgl_3d_histogram.png|200px]]<br />
[[File:Grey_rgl_example_imulated_animal_abundance.png|200px]]<br />
<br />
A few of the functions from ''rgl'' are useful for manipulating 3D models created using ''p3d'', since ''p3d'' contains many functions that inherit from ''rgl'' but tailor them to statistical methods. Thus all but a few ''rgl'' functions are unnecessary for our purposes, unless you would like to contribute functionality to ''p3d''!<br />
<br />
<br />
<br />
----<br />
<br />
===Package p3d===<br />
<br />
In this section I will focus on example code you may use to familiarize yourself with the capabilities of this package. You will require the [[File:Gray_p3d_ex_Tuition.txt]] and [[File:Gray_p3d_ex_USIndicesIndustrialProd.txt]] data sets. Note that a few of the commands employed in the sample code are from ''rgl'', but these will likely be superseded by ''p3d'' functions as the package matures.<br />
<br />
Initialization code:<br />
----<br />
library( lattice )<br><br />
library( nlme )<br><br />
library( car )<br><br />
library( spida )<br><br />
library( rgl )<br><br />
library( p3d )<br><br />
tuit = read.table('tuition.Rdata',header=TRUE)<br><br />
head(tuit)<br><br />
prod = read.table('USIndicesIndustrialProd.Rdata',header=TRUE)<br><br />
head(prod)<br><br />
----<br />
<br />
<br />
For the tuition data we will begin by plotting the annual cost of tuition from a sample of American Universities against the rates of faculty compensation and proportion of students who graduate.<br />
<br />
Using mouse keys you can change the field of view and zoom in and out. ''Plot3d'' creates the 3D plot as shown on the right.<br />
<br />
We can remove elements from the device using the function ''Pop3d()''. This function removes elements starting with the most recently added item. Multiple items can be removed by adding a numeric argument, e.g. ''Pop3d(4)''.<br><br />
[[File:Gray_rgl_navigation.png|200px]]<br />
[[File:Gray_p3d_ex1.png|200px]]<br />
----<br />
Init3d(cex = .8)<br><br />
Plot3d(tuition ~ fac_comp + graduat, col = c("blue"), data = tuit)<br><br />
----<br />
<br />
<br />
Next we will subdivide the data by category, in this case whether the school is private (red) or public (blue) (variable name public.private).<br><br />
[[File:Gray_p3d_ex2.png|200px]]<br />
----<br />
Init3d(cex = .8)<br><br />
Plot3d( tuition ~ fac_comp + graduat|public.private, col = c("blue", "red"), data = tuit)<br><br />
----<br />
<br />
<br />
Next we will add regression planes for private (red) and public (blue) schools, using the lm() function to determine the fit and Fit3d() to insert the plane in the graph. Axes and labels are added using Axes3d() and title3d().<br><br />
[[File:Gray_p3d_ex3.png]]<br />
----<br />
fitpub = lm(tuition ~ fac_comp + graduat,subset=(public.private==0),data = tuit)<br><br />
Fit3d( fitpub, col = c("blue"))<br><br />
<br><br />
fitpri = lm(tuition ~ fac_comp + graduat,subset=(public.private==1),data = tuit)<br><br />
Fit3d( fitpri, col = c("red"))<br><br />
<br><br />
Axes3d()<br><br />
title3d(main='Tuition predicted by grad rates and faculty salary -private (red) and public(blue) institutions')<br><br />
----<br />
<br />
<br />
Data ellipses summarize the centre, spread and correlation of the data; here ''Ell3d()'' adds them to the plot.<br><br />
[[File:Gray_p3d_ex4.png]]<br />
----<br />
Ell3d()<br />
----<br />
<br />
<br />
We can change the viewpoint of our graph using the function ''view3d(theta,phi,fov,zoom)'', which takes polar coordinates. Note that ''view3d(0,0,0)'' will rotate the image to face the x-z plane (y into the screen) and ''view3d(270,0,0)'' will rotate the image to face the y-z plane (x into the screen). The function ''snap()'' will capture a still image of the current view. Note that to use ''movie3d()'' you must have ImageMagick installed to convert the PNG frames to a GIF automatically; otherwise you must use external software.<br />
<br />
[[File:Gray_p3d_ex5.png]]<br />
[[File:GrayMovie.gif]]<br />
----<br />
view3d(0,0,0)<br><br />
snap()<br><br />
<br />
spin(theta = 0, phi = 0)<br><br />
<br />
spins(inc.theta = 1/4, inc.phi = 0, theta = NULL, phi = NULL)<br><br />
<br />
movie3d( spin3d(axis=c(0,1,0), rpm=20), duration=2, dir='movie' )<br><br />
----<br />
<br />
<br />
Here is an additional example, using data on the US indices of industrial production, plotting mining production (MIN) over months and years. Adding the argument ‘groups=YR’ to ''Plot3d'' connects the months in a given year to produce trajectories.<br />
[[File:Gray_p3d_ex6.png]]<br />
----<br />
open3d(windowRect=c(100,100,800,800),cex = .8) <br><br />
prod = read.table('USIndicesIndustrialProd.Rdata',header=TRUE)<br><br />
head(prod)<br><br />
Plot3d(MIN ~ YR+MONTH,data=prod,groups=YR)<br><br />
Axes3d()<br><br />
title3d(main='Industrial Production Mining (1947-1993)')<br><br />
view3d(215,0,45)<br><br />
----<br />
<br />
<br />
==Assignment 2==<br />
<br />
Statistics in the News: "Spousal support a royal pain?"<br />
<br />
====1====<br />
;Question: Does the article suggest a causal relationship between two variables? If so, which? Are the data observational or experimental?<br />
<br />
;Discussion: <br />
Andy: Yes, it does. One variable is spousal solicitousness; the other is the degree of pain. The data are observational, because the researchers did not intervene when collecting the data.<br />
<br />
====2====<br />
;Question: Can you think of alternative explanations to causality? Confounding factors? Or explanations consistent with causality? Mediating factors? <br />
<br />
;Discussion: <br />
<br />
====3====<br />
;Question: Have any confounding factors been accounted for in the analysis? <br />
<br />
;Discussion: <br />
<br />
====4====<br />
;Question: Have any mediating factors been controlled for in a way that vitiates a causal interpretation of the relationship? <br />
<br />
;Discussion: <br />
<br />
====5====<br />
;Question: What is your personal assessment of the evidence for causality in the study that is the subject of the article? <br />
<br />
;Discussion: <br />
<br />
<br />
=== Paradoxes and Fallacies ===<br />
==== 2. ==== <br />
;Question:You are studying observational data on the relationship between Health and Coffee (measured in grams of caffeine consumed per day). Suppose you want to control for a possible confounding factor 'Stress'. In this kind of study it is more important to make sure that you measure coffee consumption accurately than it is to make sure that you measure 'stress' accurately. <br />
<br />
;Discussion: <br />
<br />
<br />
==== 5.==== <br />
;Question: In a multiple regression of Y on three predictors, X1, X2 and X3, if the coefficients of both X2 and X3 are not significant, it is safe to drop these two variables and perform a regression on X1 alone. <br />
<br />
;Discussion:<br />
<br />
<br />
==== 8.====<br />
;Question: In a multiple regression, if you drop a predictor whose effect is not significant, the p-values of the other predictors should not change very much. <br />
<br />
;Discussion:<br />
<br />
<br />
==== 11.==== <br />
;Question: In a model to assess the effect of a number of treatments on some outcome, we can estimate the difference between the best treatment and the worst treatment by using the difference in the mean outcomes. <br />
<br />
;Discussion: <br />
<br />
<br />
==== 14.==== <br />
;Question: If two variables have a strong interaction, this implies a strong correlation. <br />
<br />
;Discussion:</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Assignment_Teams/GrayMATH 6627 2010-11 Practicum in Statistical Consulting/Assignment Teams/Gray2011-01-24T02:15:19Z<p>Andytli: </p>
<hr />
<div>* http://en.wikipedia.org/wiki/Mary_W._Gray<br />
<br />
==Assignment 1== <br />
===1. Simpson's Paradox=== <br />
<br />
Example: My friend and I play a basketball game and each shoot 20 shots. Who is the better shooter?<br />
<br />
[[File:h65.PNG]]<br />
<br />
But, who is the better shooter if you control for the distance of the shot? Who would you rather have on your team?<br />
<br />
[[File:h32.PNG]]<br />
This is an example of Simpson's Paradox. <br />
<br />
[[File:h45.PNG]] <br />
<br />
We can see from this figure that the relationship changes from negative to positive when we take the distance into consideration. The black line links the overall success probabilities of the two of us. The red line links our success rates on far shots, and the blue line links our success rates on close shots.<br />
<br />
Simpson’s paradox arises from one simple mathematical truth. Given eight real numbers a, b, c, d, A, B, C, D with the following properties: [[File:12.png]], it is not necessarily true that [[File:122.png]]. In fact, it may be true that [[File:13.png]].<br />
<br />
This is a simple mathematical fact, yet it has significant ramifications in Bayesian analysis, medical research, science and engineering studies, and societal statistical analysis. It is of concern for any statistical activity involving the calculation and analysis of ratios of two measurements.<br />
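
The reversal described above can be sketched with a few lines of R. The shot counts below are hypothetical numbers chosen for illustration; they are not the figures shown in the images on this page. Each player takes 20 shots, but the two players take very different mixes of close and far shots.

```r
# Hypothetical shot counts (illustrative only, not the data in the figures)
shots <- data.frame(
  player   = c("me", "me", "friend", "friend"),
  distance = c("close", "far", "close", "far"),
  made     = c(4, 6, 11, 1),
  attempts = c(5, 15, 15, 5)
)

# Success rate within each distance: "me" is better both close and far
shots$rate <- shots$made / shots$attempts
print(shots)

# Success rate pooled over distance: "friend" is better overall
totals <- aggregate(cbind(made, attempts) ~ player, data = shots, FUN = sum)
totals$rate <- totals$made / totals$attempts
print(totals)
```

In this sketch the pooled comparison is driven by how often each player shoots from far away, which is why controlling for distance reverses the ranking.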
<br />
Example 2 (real income tax example)<br />
<br />
[[File:simpson1.pdf]]<br />
<br />
===2. Graphics to visualize data === <br />
Gray: rgl and p3d (include the use of the 'groups' parameter to produce trajectories)<br />
<br />
===Introduction===<br />
<br />
[[File:Grey_rgl_p3d_1.pdf]]<br />
[[File:Grey_rgl_p3d_2.pdf]]<br />
<br />
''rgl'' is a library of functions that offers 3D real-time visualization functionality to the R programming environment (Adler & Murdoch, 2010), providing an OpenGL implementation for R.<br />
<br />
''p3d'' is a library of functions which employs functions from ''rgl'' to help visualize statistical models expressed as a function of 2 independent variables, with the possible addition of a categorical variable (Monette, 2009).<br />
<br />
===Package ''rgl''===<br />
<br />
With ''rgl'' we create a ‘device’, which is simply a window containing a ‘world’ in which we can create 3-dimensional shapes and through which we can navigate.<br />
<br />
[[File:Grey_World.png|200px]]<br />
<br />
Functions within the rgl package can be divided into 6 categories:<br />
(1) Device management functions (open and close devices, control active device) <br />
(2) Scene management functions (option to remove certain or all objects from the scene)<br />
(3) Export functions (creating image files)<br />
(4) Shape functions - essential plotting primitives (points, lines, triangles, quads) as well as higher-level functions (text, spheres, surfaces).<br />
<br />
[[File:Grey-Shapes.png]]<br />
<br />
(5) Environment functions - modify the viewpoint, background and bounding box, adding light sources<br />
(6) Appearance function rgl.material(…).<br />
<br />
[[File:Grey_AppearanceOptions.png|300px]]<br />
<br />
Using shapes and surfaces within an ''rgl'' device, statistical data can be represented in 3 dimensions. Some advanced examples are available as demos or provided on the [http://rgl.neoscientists.org/docs.shtml rgl website].<br />
<br />
[[File:Gray_Rgl_3d_histogram.png|200px]]<br />
[[File:Grey_rgl_example_imulated_animal_abundance.png|200px]]<br />
<br />
A few of the functions from ''rgl'' are useful for manipulating 3D models created using ''p3d''. Because ''p3d'' builds many of its functions on ''rgl'' and tailors them to statistical methods, most of the remaining ''rgl'' functions are unnecessary for our purposes unless you would like to contribute functionality to ''p3d''.<br />
<br />
<br />
<br />
----<br />
<br />
===Package p3d===<br />
<br />
In this section I will focus on example code you may use to familiarize yourself with the capabilities of this package. You will require the tuition data ([[File:Gray_p3d_ex_Tuition.txt]]) and the US indices of industrial production data ([[File:Gray_p3d_ex_USIndicesIndustrialProd.txt]]); the sample code reads them as tuition.Rdata and USIndicesIndustrialProd.Rdata. Note that a few of the commands employed in the sample code are from ''rgl'', but these will likely be superseded by ''p3d'' functions as the package matures.<br />
<br />
Initialization code:<br />
----<br />
library( lattice )<br><br />
library( nlme )<br><br />
library( car )<br><br />
library( spida )<br><br />
library( rgl )<br><br />
library( p3d )<br><br />
tuit = read.table('tuition.Rdata',header=TRUE)<br><br />
head(tuit)<br><br />
prod = read.table('USIndicesIndustrialProd.Rdata',header=TRUE)<br><br />
head(prod)<br><br />
----<br />
<br />
<br />
For the tuition data we will begin by plotting the annual cost of tuition from a sample of American Universities against the rates of faculty compensation and proportion of students who graduate.<br />
<br />
Using mouse keys you can change the field of view and zoom in and out. ''Plot3d'' creates the 3D plot as shown on the right.<br />
<br />
We can remove elements from the device using the function ''Pop3d()''. This function removes elements starting with the most recently added item. Multiple items can be removed by adding a numeric argument, e.g. ''Pop3d(4)''.<br><br />
[[File:Gray_rgl_navigation.png|200px]]<br />
[[File:Gray_p3d_ex1.png|200px]]<br />
----<br />
Init3d(cex = .8)<br><br />
Plot3d(tuition ~ fac_comp + graduat, col = c("blue"), data = tuit)<br><br />
----<br />
<br />
<br />
Next we will subdivide the data by category, in this case whether the school is private (red) or public (blue) (variable name public.private).<br><br />
[[File:Gray_p3d_ex2.png|200px]]<br />
----<br />
Init3d(cex = .8)<br><br />
Plot3d( tuition ~ fac_comp + graduat|public.private, col = c("blue", "red"), data = tuit)<br><br />
----<br />
<br />
<br />
Next we will add regression planes for private (red) and public (blue) schools, using the lm() function to determine the fit and Fit3d() to insert the plane in the graph. Axes and labels are added using Axes3d() and title3d().<br><br />
[[File:Gray_p3d_ex3.png]]<br />
----<br />
fitpub = lm(tuition ~ fac_comp + graduat,subset=(public.private==0),data = tuit)<br><br />
Fit3d( fitpub, col = c("blue"))<br><br />
<br><br />
fitpri = lm(tuition ~ fac_comp + graduat,subset=(public.private==1),data = tuit)<br><br />
Fit3d( fitpri, col = c("red"))<br><br />
<br><br />
Axes3d()<br><br />
title3d(main='Tuition predicted by grad rates and faculty salary -private (red) and public(blue) institutions')<br><br />
----<br />
<br />
<br />
Data ellipses summarize the centre, spread and correlation of the data; here ''Ell3d()'' adds them to the plot.<br><br />
[[File:Gray_p3d_ex4.png]]<br />
----<br />
Ell3d()<br />
----<br />
<br />
<br />
We can change the viewpoint of our graph using the function ''view3d(theta,phi,fov,zoom)'', which takes polar coordinates. Note that ''view3d(0,0,0)'' will rotate the image to face the x-z plane (y into the screen) and ''view3d(270,0,0)'' will rotate the image to face the y-z plane (x into the screen). The function ''snap()'' will capture a still image of the current view. Note that to use ''movie3d()'' you must have ImageMagick installed to convert the PNG frames to a GIF automatically; otherwise you must use external software.<br />
<br />
[[File:Gray_p3d_ex5.png]]<br />
[[File:GrayMovie.gif]]<br />
----<br />
view3d(0,0,0)<br><br />
snap()<br><br />
<br />
spin(theta = 0, phi = 0)<br><br />
<br />
spins(inc.theta = 1/4, inc.phi = 0, theta = NULL, phi = NULL)<br><br />
<br />
movie3d( spin3d(axis=c(0,1,0), rpm=20), duration=2, dir='movie' )<br><br />
----<br />
<br />
<br />
Here is an additional example, using data on the US indices of industrial production, plotting mining production (MIN) over months and years. Adding the argument ‘groups=YR’ to ''Plot3d'' connects the months in a given year to produce trajectories.<br />
[[File:Gray_p3d_ex6.png]]<br />
----<br />
open3d(windowRect=c(100,100,800,800),cex = .8) <br><br />
prod = read.table('USIndicesIndustrialProd.Rdata',header=TRUE)<br><br />
head(prod)<br><br />
Plot3d(MIN ~ YR+MONTH,data=prod,groups=YR)<br><br />
Axes3d()<br><br />
title3d(main='Industrial Production Mining (1947-1993)')<br><br />
view3d(215,0,45)<br><br />
----<br />
<br />
<br />
==Assignment 2==<br />
<br />
Statistics in the News: "Spousal support a royal pain?"<br />
<br />
====1====<br />
;Question: Does the article suggest a causal relationship between two variables? If so, which? Are the data observational or experimental?<br />
<br />
;Discussion: <br />
<br />
====2====<br />
;Question: Can you think of alternative explanations to causality? Confounding factors? Or explanations consistent with causality? Mediating factors? <br />
<br />
;Discussion: <br />
<br />
====3====<br />
;Question: Have any confounding factors been accounted for in the analysis? <br />
<br />
;Discussion: <br />
<br />
====4====<br />
;Question: Have any mediating factors been controlled for in a way that vitiates a causal interpretation of the relationship? <br />
<br />
;Discussion: <br />
<br />
====5====<br />
;Question: What is your personal assessment of the evidence for causality in the study that is the subject of the article? <br />
<br />
;Discussion: <br />
<br />
<br />
=== Paradoxes and Fallacies ===<br />
==== 2. ==== <br />
;Question:You are studying observational data on the relationship between Health and Coffee (measured in grams of caffeine consumed per day). Suppose you want to control for a possible confounding factor 'Stress'. In this kind of study it is more important to make sure that you measure coffee consumption accurately than it is to make sure that you measure 'stress' accurately. <br />
<br />
;Discussion: <br />
<br />
<br />
==== 5.==== <br />
;Question: In a multiple regression of Y on three predictors, X1, X2 and X3, if the coefficients of both X2 and X3 are not significant, it is safe to drop these two variables and perform a regression on X1 alone. <br />
<br />
;Discussion:<br />
<br />
<br />
==== 8.====<br />
;Question: In a multiple regression, if you drop a predictor whose effect is not significant, the p-values of the other predictors should not change very much. <br />
<br />
;Discussion:<br />
<br />
<br />
==== 11.==== <br />
;Question: In a model to assess the effect of a number of treatments on some outcome, we can estimate the difference between the best treatment and the worst treatment by using the difference in the mean outcomes. <br />
<br />
;Discussion: <br />
<br />
<br />
==== 14.==== <br />
;Question: If two variables have a strong interaction, this implies a strong correlation. <br />
<br />
;Discussion:</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Andy_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Andy Li2011-01-24T01:59:41Z<p>Andytli: </p>
<hr />
<div>====Welcome to my wiki page for the Statistical Consulting Practicum====<br />
<br />
==About Me==<br />
My name is Andy Li. I started undergraduate studies in Statistics at York in 2008. After two years of study, I am now a master's student in Applied Statistics. I also have a bachelor's degree in Chemical Engineering, and I worked as a B2B salesman for many years. I have used SAS, R, MATLAB, Minitab and Maple in my classes. I am very happy to join our 6627 team to learn more about Statistics. I hope we can build up not only our knowledge but also our friendship in the coming days. <br />
<br />
==Sample Exam Questions==<br />
=== Week 1 ===<br />
(In class) There are 11 students in the 6627 class, 5 of whom are male. There is a statistics meeting on campus. All the students want to attend as visitors, but there are only 6 seats left. So the professor needs to choose 6 students at random and wants exactly two of them to be male, to help set up the tables. In how many ways can he select the students?<br />
<br />
Answer: 5C2*6C4=150<br />
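
As a quick check of the answer above, here is a minimal sketch in base R. It assumes the professor wants exactly two of the five male students, and therefore four of the six other students; the variable names are only for illustration.

```r
# Ways to choose 2 of the 5 male students
ways_male  <- choose(5, 2)
# Ways to choose the remaining 4 seats from the 6 other students
ways_other <- choose(6, 4)

# The two choices are independent, so the counts multiply
ways_total <- ways_male * ways_other
print(ways_total)
```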
<br />
=== Week 2 ===<br />
[[File:d34.jpg]]<br />
<br />
(In class) We have the cloud of our data, the data ellipses and the centre point. We also know that about 40% of the data fall inside the inner ellipse and about 86% inside the outer ellipse. What is the radius of the outer ellipse? What are Sx, Sy, Sy.x and the 95% confidence interval?<br />
<br />
Answer: Please see the lecture notes of 2nd lecture. <br />
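
The 40% and 86% figures can be checked with a short sketch. It assumes the data are roughly bivariate normal, in which case the squared radius of a data ellipse (in standard-deviation units) follows a chi-square distribution with 2 degrees of freedom. This is an illustration of that assumption, not a substitute for the lecture notes.

```r
# Radii of the inner and outer ellipses, in standard-deviation units
radius <- c(1, 2)

# Expected proportion of bivariate normal data inside each ellipse
coverage <- pchisq(radius^2, df = 2)
print(round(coverage, 3))

# Radius of the ellipse expected to contain 95% of the data
radius_95 <- sqrt(qchisq(0.95, df = 2))
print(radius_95)
```

Under this assumption a radius of 1 covers about 39% of the data and a radius of 2 covers about 86%, which matches the proportions quoted in the question.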
<br />
=== Week 3 ===<br />
Q: Explain the multiple regression coefficients in three dimensions.<br />
<br />
A: [[File:g1.png]] Beta0 defines the Y-intercept of the plane. Beta1 defines the slope of the plane along the x1 axis, and beta2 defines the slope of the plane along the x2 axis. The error term ε scatters the observations above and below the plane, so the data cloud looks bumpy instead of lying perfectly on a smooth plane. <br />
<br />
[[File:g2.png]]<br />
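
A minimal sketch of this interpretation, using simulated data: the true values of beta0, beta1 and beta2 below (2, 0.5 and -1.5) and the noise level are arbitrary choices for illustration, not values from the lecture.

```r
set.seed(1)

# Simulate points scattered around a known plane
n  <- 200
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 10)
y  <- 2 + 0.5 * x1 - 1.5 * x2 + rnorm(n, sd = 0.5)

# Fit the regression plane y = beta0 + beta1 * x1 + beta2 * x2
fit <- lm(y ~ x1 + x2)

# (Intercept) estimates beta0, the height of the plane at x1 = x2 = 0;
# the x1 and x2 coefficients estimate the slopes along each axis
print(coef(fit))
```

The estimated coefficients should be close to the values used in the simulation, and the residuals of the fit play the role of the error term ε.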
<br />
==Statistics in the Media==<br />
=== Week 1 ===<br />
[http://endoftheamericandream.com/archives/50-statistics-about-the-u-s-economy-that-are-almost-too-crazy-to-believe 50 Statistics About The U.S. Economy That Are Almost Too Crazy To Believe]<br />
<br />
Can you see that 2010 is a crisis year? It's NOT the time to change! <br />
[http://moneywatch.bnet.com/economic-news/blog/financial-decoder/18-scary-us-debt-facts/2824/ 18 Scary US Debt Facts] <br />
<br />
Without statistics, the people across the border might be toasting the big success of this year right now.<br />
<br />
=== Week 2 ===<br />
[http://timesofindia.indiatimes.com/city/varanasi/4-day-conference-begins-at-BHU/articleshow/7224523.cms 4-day conference begins at BHU] Banaras Hindu University (BHU), Hindi: काशी हिन्दू विश्वविद्यालय, is a premier central university and a world class educational institution located in Varanasi, India. It is regarded as the largest residential university in Asia.<br />
<br />
=== Week 3 ===<br />
[http://www.newswire.ca/en/releases/archive/January2011/19/c3506.html Greater Toronto & Hamilton Area Transit Choices: Report finds much higher transportation emissions from road vs. rail]<br />
<br />
This is a current news item on a very old topic. The report, "Putting Transportation on Track", comes from two of Canada's leading think tanks, Sustainable Prosperity and the Pembina Institute, and it restates well-known common sense with nothing new.<br />
<br />
==Questions and Comments on Groupwork and Class Lectures==<br />
=== Week 1 ===<br />
For the example about smoking and life expectancy we discussed in class, I think there are two more factors we should take into consideration. One is the age at which people start smoking. I know that children smoke in some poor countries because they don't know it is harmful. The other is that different types of cigarette have different effects. Whatever type of cigarette you smoke, it will always be ‘injurious to health’, but the cigarettes in rich countries contain relatively less of the addictive nicotine and tar, are equipped with long filters, and are not allowed to contain harmful flavour additives. I know this because I used to be a heavy smoker and have smoked more than 100 brands of cigars from different countries. But yes, I am still alive. <br />
<br />
=== Week 2 ===<br />
Some classmates couldn't install the spida and p3d packages. Below is the procedure you should follow:<br />
<br />
1. uninstall your current R.<br />
<br />
2. go to [http://cran.r-project.org/bin/windows/base/ R download] to download a new R.<br />
<br />
3. do a short installation of R. <br />
<br />
4. run <br />
<br />
> install.packages( c("car","rgl"))<br />
<br />
> install.packages( c("spida","p3d"), repos = "http://R-Forge.R-project.org")<br />
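
After the steps above, a quick way to confirm that the four packages can be found is sketched below. It uses only base R; the vector name pkgs is just for illustration, and the result depends on your own installation.

```r
# Packages installed in step 4
pkgs <- c("car", "rgl", "spida", "p3d")

# TRUE for each package that R can find and load, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)
```

Any package reported as FALSE still needs to be installed with the matching install.packages() call from step 4.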
<br />
=== Week 3 ===<br />
It was really amazing to see the data moving using GoogleVis and p3d in the last lecture. This will be very useful in consulting and presentations. I think these methods are limited when it comes to analyzing complex models, but we may choose 3 variables at a time from a model with many variables and draw a 3D plot to see the trend.</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Andy_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Andy Li2011-01-24T01:55:43Z<p>Andytli: </p>
<hr />
<div>====Welcome to my wiki page for the Statistical Consulting Practicum====<br />
<br />
==About Me==<br />
My name is Andy Li. I started undergraduate studies in Statistics at York in 2008. After two years of study, I am now a master's student in Applied Statistics. I also have a bachelor's degree in Chemical Engineering, and I worked as a B2B salesman for many years. I have used SAS, R, MATLAB, Minitab and Maple in my classes. I am very happy to join our 6627 team to learn more about Statistics. I hope we can build up not only our knowledge but also our friendship in the coming days. <br />
<br />
==Sample Exam Questions==<br />
=== Week 1 ===<br />
(In class) There are 11 students in the 6627 class, 5 of whom are male. There is a statistics meeting on campus. All the students want to attend as visitors, but there are only 6 seats left. So the professor needs to choose 6 students at random and wants exactly two of them to be male, to help set up the tables. In how many ways can he select the students?<br />
<br />
Answer: 5C2*6C4=150<br />
<br />
=== Week 2 ===<br />
[[File:d34.jpg]]<br />
<br />
(In class) We have the cloud of our data, the data ellipses and the centre point. We also know that about 40% of the data fall inside the inner ellipse and about 86% inside the outer ellipse. What is the radius of the outer ellipse? What are Sx, Sy, Sy.x and the 95% confidence interval?<br />
<br />
Answer: Please see the lecture notes of 2nd lecture. <br />
<br />
=== Week 3 ===<br />
Q: Explain the multiple regression coefficients in three dimensions.<br />
<br />
A: [[File:g1.png]] Beta0 defines the Y-intercept of the plane. Beta1 defines the slope of the plane along the x1 axis, and beta2 defines the slope of the plane along the x2 axis. The error term ε scatters the observations above and below the plane, so the data cloud looks bumpy instead of lying perfectly on a smooth plane. <br />
<br />
[[File:g2.png]]<br />
<br />
==Statistics in the Media==<br />
=== Week 1 ===<br />
[http://endoftheamericandream.com/archives/50-statistics-about-the-u-s-economy-that-are-almost-too-crazy-to-believe 50 Statistics About The U.S. Economy That Are Almost Too Crazy To Believe]<br />
<br />
Can you see that 2010 is a crisis year? It's NOT the time to change! <br />
[http://moneywatch.bnet.com/economic-news/blog/financial-decoder/18-scary-us-debt-facts/2824/ 18 Scary US Debt Facts] <br />
<br />
Without statistics, the people across the border might be toasting the big success of this year right now.<br />
<br />
=== Week 2 ===<br />
[http://timesofindia.indiatimes.com/city/varanasi/4-day-conference-begins-at-BHU/articleshow/7224523.cms 4-day conference begins at BHU] Banaras Hindu University (BHU), Hindi: काशी हिन्दू विश्वविद्यालय, is a premier central university and a world class educational institution located in Varanasi, India. It is regarded as the largest residential university in Asia.<br />
<br />
=== Week 3 ===<br />
[http://www.newswire.ca/en/releases/archive/January2011/19/c3506.html Greater Toronto & Hamilton Area Transit Choices: Report finds much higher transportation emissions from road vs. rail]<br />
<br />
This is a current news item restating very old common sense. The report, "Putting Transportation on Track", is provided by two of Canada's leading think tanks, Sustainable Prosperity and the Pembina Institute, and it contains nothing new.<br />
<br />
==Questions and Comments on Groupwork and Class Lectures==<br />
=== Week 1 ===<br />
For the example about smoking and life expectancy we discussed in class, I think there are two more factors we should take into consideration. One is the age at which people start smoking. I know that children smoke in some poor countries because they don't know it is harmful. The other is that different types of cigarette have different effects. Whatever type of cigarette you smoke, it will always be ‘injurious to health’, but the cigarettes in rich countries contain relatively less of the addictive nicotine and tar, are equipped with long filters, and are not allowed to contain harmful flavour additives. I know this because I used to be a heavy smoker and have smoked more than 100 brands of cigars from different countries. But yes, I am still alive. <br />
<br />
=== Week 2 ===<br />
Some classmates couldn't install the spida and p3d packages. Below is the procedure you should follow:<br />
<br />
1. uninstall your current R.<br />
<br />
2. go to [http://cran.r-project.org/bin/windows/base/ R download] to download a new R.<br />
<br />
3. do a short installation of R. <br />
<br />
4. run <br />
<br />
> install.packages( c("car","rgl"))<br />
<br />
> install.packages( c("spida","p3d"), repos = "http://R-Forge.R-project.org")<br />
<br />
=== Week 3 ===<br />
It was really amazing to see the data moving using GoogleVis and p3d in the last lecture. This will be very useful in consulting and presentations. I think these methods are limited when it comes to analyzing complex models, but we may choose 3 variables at a time from a model with many variables and draw a 3D plot to see the trend.</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Andy_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Andy Li2011-01-24T01:30:43Z<p>Andytli: </p>
<hr />
<div>====Welcome to my wiki page for the Statistical Consulting Practicum====<br />
<br />
==About Me==<br />
My name is Andy Li. I started undergraduate studies in Statistics at York in 2008. After two years of study, I am now a master's student in Applied Statistics. I also have a bachelor's degree in Chemical Engineering, and I worked as a B2B salesman for many years. I have used SAS, R, MATLAB, Minitab and Maple in my classes. I am very happy to join our 6627 team to learn more about Statistics. I hope we can build up not only our knowledge but also our friendship in the coming days. <br />
<br />
==Sample Exam Questions==<br />
=== Week 1 ===<br />
(In class) There are 11 students in the 6627 class, 5 of whom are male. There is a statistics meeting on campus. All the students want to attend as visitors, but there are only 6 seats left. So the professor needs to choose 6 students at random and wants exactly two of them to be male, to help set up the tables. In how many ways can he select the students?<br />
<br />
Answer: 5C2*6C4=150<br />
<br />
=== Week 2 ===<br />
[[File:d34.jpg]]<br />
<br />
(In class) We have the cloud of our data, the data ellipses and the centre point. We also know that about 40% of the data fall inside the inner ellipse and about 86% inside the outer ellipse. What is the radius of the outer ellipse? What are Sx, Sy, Sy.x and the 95% confidence interval?<br />
<br />
Answer: Please see the lecture notes of 2nd lecture. <br />
<br />
=== Week 3 ===<br />
Q: Explain the multiple regression coefficients in three dimensions.<br />
<br />
A: [[File:g1.png]] Beta0 defines the Y-intercept of the plane. Beta1 defines the slope of the plane along the x1 axis, and beta2 defines the slope of the plane along the x2 axis. The error term ε scatters the observations above and below the plane, so the data cloud looks bumpy instead of lying perfectly on a smooth plane. <br />
<br />
[[File:g2.png]]<br />
<br />
==Statistics in the Media==<br />
=== Week 1 ===<br />
[http://endoftheamericandream.com/archives/50-statistics-about-the-u-s-economy-that-are-almost-too-crazy-to-believe 50 Statistics About The U.S. Economy That Are Almost Too Crazy To Believe]<br />
<br />
Can you see that 2010 is a crisis year? It's NOT the time to change! <br />
[http://moneywatch.bnet.com/economic-news/blog/financial-decoder/18-scary-us-debt-facts/2824/ 18 Scary US Debt Facts] <br />
<br />
Without statistics, the people across the border might be toasting the big success of this year right now.<br />
<br />
=== Week 2 ===<br />
[http://timesofindia.indiatimes.com/city/varanasi/4-day-conference-begins-at-BHU/articleshow/7224523.cms 4-day conference begins at BHU] Banaras Hindu University (BHU), Hindi: काशी हिन्दू विश्वविद्यालय, is a premier central university and a world class educational institution located in Varanasi, India. It is regarded as the largest residential university in Asia.<br />
<br />
=== Week 3 ===<br />
[http://www.newswire.ca/en/releases/archive/January2011/19/c3506.html Greater Toronto & Hamilton Area Transit Choices: Report finds much higher transportation emissions from road vs. rail]<br />
<br />
This is a current news item restating very old common sense. The report, "Putting Transportation on Track", is provided by two of Canada's leading think tanks, Sustainable Prosperity and the Pembina Institute, and it contains nothing new.<br />
<br />
==Questions and Comments on Groupwork and Class Lectures==<br />
=== Week 1 ===<br />
For the example about smoking and life expectancy we discussed in class, I think there are two more factors we should take into consideration. One is the age at which people start smoking. I know that children smoke in some poor countries because they don't know it is harmful. The other is that different types of cigarette have different effects. Whatever type of cigarette you smoke, it will always be ‘injurious to health’, but the cigarettes in rich countries contain relatively less of the addictive nicotine and tar, are equipped with long filters, and are not allowed to contain harmful flavour additives. I know this because I used to be a heavy smoker and have smoked more than 100 brands of cigars from different countries. But yes, I am still alive. <br />
<br />
=== Week 2 ===<br />
Some classmates couldn't install the spida and p3d packages. Below is the procedure to follow:<br />
<br />
1. Uninstall your current version of R.<br />
<br />
2. Go to [http://cran.r-project.org/bin/windows/base/ R download] to download a new version of R.<br />
<br />
3. Do a short installation of R. <br />
<br />
4. Run <br />
<br />
> install.packages( c("car","rgl"))<br />
<br />
> install.packages( c("spida","p3d"), repos = "http://R-Forge.R-project.org")</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Andy_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Andy Li2011-01-24T01:29:14Z<p>Andytli: </p>
<hr />
<div>====Welcome to my wiki page for the Statistical Consulting Practicum====<br />
<br />
==About Me==<br />
My name is Andy Li. I began my undergraduate studies in Statistics at York in 2008. After two years of study, I am now a master's student in Applied Statistics. I also hold a bachelor's degree in Chemical Engineering and worked as a B2B salesman for many years. I have used SAS, R, MATLAB, Minitab and Maple in my classes. I am very happy to join our 6627 team to learn more about Statistics. I hope we can build not only our knowledge but also our friendship in the days ahead. <br />
<br />
==Sample Exam Questions==<br />
=== Week 1 ===<br />
(In class) There are 11 students in the 6627 class, 5 of whom are male. There is a statistics meeting on campus. All the students want to attend as visitors, but only 6 seats are left. The professor therefore needs to choose 6 students at random and wants exactly two of them to be male, to help set up the tables. In how many ways can he select the students?<br />
<br />
Answer: 5C2*6C4=150<br />
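<br />
The answer can be checked with base R's choose(). This is only a quick sketch, and the variable names are illustrative rather than taken from the lecture.<br />
<pre>
# 11 students: 5 male, so 6 are not male; 6 seats, exactly 2 for male students
n_male   <- 5
n_other  <- 11 - n_male
seats    <- 6
males_in <- 2

# choose the 2 male students, then fill the remaining 4 seats from the other 6
ways <- choose(n_male, males_in) * choose(n_other, seats - males_in)
print(ways)  # 150
</pre>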
<br />
=== Week 2 ===<br />
[[File:d34.jpg]]<br />
<br />
(In class) We have the cloud of our data, the data ellipse and the centre point. We also know that about 40% of the data fall inside the inner ellipse and 86% inside the outer ellipse. What is the radius of the outer ellipse? What are Sx, Sy, Sy.x and the 95% confidence interval?<br />
<br />
Answer: Please see the lecture notes of 2nd lecture. <br />
<br />
=== Week 3 ===<br />
Q: Explain the multiple regression Coefficients in three dimensions.<br />
<br />
A: [[File:g1.png]] β0 defines the Y-intercept of the plane, β1 defines the slope of the plane along the x1 axis, and β2 defines the slope of the plane along the x2 axis. The error term ε adds noise to the plane so that it is bumpy instead of being perfectly smooth. <br />
<br />
[[File:g2.png]]<br />
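<br />
A small simulation may make the roles of the coefficients concrete. This is only a sketch: the true values (2, 0.5, -1.5), the noise level and the sample size are made up for illustration, and it assumes the usual model y = b0 + b1*x1 + b2*x2 + error.<br />
<pre>
set.seed(6627)                      # arbitrary seed, for reproducibility only
n  <- 500
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 10)
b  <- c(2, 0.5, -1.5)               # hypothetical intercept and two slopes
y  <- b[1] + b[2] * x1 + b[3] * x2 + rnorm(n, sd = 0.5)

fit <- lm(y ~ x1 + x2)              # least-squares plane through the cloud
est <- coef(fit)                    # (Intercept), x1, x2
print(round(est, 2))
</pre>
The estimates should land close to the values used to generate the data: the intercept is the height of the fitted plane at x1 = x2 = 0, and each slope is the change in y per unit change in one predictor with the other held fixed.<br />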
<br />
==Statistics in the Media==<br />
=== Week 1 ===<br />
[http://endoftheamericandream.com/archives/50-statistics-about-the-u-s-economy-that-are-almost-too-crazy-to-believe 50 Statistics About The U.S. Economy That Are Almost Too Crazy To Believe]<br />
<br />
Can you see that 2010 is a crisis year? It's NOT the time to change! <br />
[http://moneywatch.bnet.com/economic-news/blog/financial-decoder/18-scary-us-debt-facts/2824/ 18 Scary US Debt Facts] <br />
<br />
Were it not for statistics, the people across the border might be toasting the great success of this year right now.<br />
<br />
=== Week 2 ===<br />
[http://timesofindia.indiatimes.com/city/varanasi/4-day-conference-begins-at-BHU/articleshow/7224523.cms 4-day conference begins at BHU] Banaras Hindu University (BHU), Hindi: काशी हिन्दू विश्वविद्यालय, is a premier central university and a world class educational institution located in Varanasi, India. It is regarded as the largest residential university in Asia.<br />
<br />
=== Week 3 ===<br />
[http://www.newswire.ca/en/releases/archive/January2011/19/c3506.html Greater Toronto & Hamilton Area Transit Choices: Report finds much higher transportation emissions from road vs. rail]<br />
<br />
This is current news built on very old common sense. The report, "Putting Transportation on Track", comes from two of Canada's leading think tanks, Sustainable Prosperity and the Pembina Institute, and contains nothing new.<br />
<br />
==Questions and Comments on Groupwork and Class Lectures==<br />
=== Week 1 ===<br />
For the example about smoking and life expectancy that we discussed in class, I think there are two more factors we should take into consideration. One is the age at which people start smoking. I know that children smoke in some poor countries because they don't know it is harmful. The other is that different types of cigarette have different effects. Whatever type of cigarette you smoke, it will always be ‘injurious to health’, but cigarettes in rich countries contain relatively less addictive nicotine and tar, are equipped with long filters, and are not allowed to contain harmful flavour additives. I know this because I used to be a heavy smoker and have smoked more than 100 brands of cigars from different countries. But yes, I am still alive. <br />
<br />
=== Week 2 ===<br />
Some classmates couldn't install the spida and p3d packages. Below is the procedure to follow:<br />
<br />
1. Uninstall your current version of R.<br />
<br />
2. Go to [http://cran.r-project.org/bin/windows/base/ R download] to download a new version of R.<br />
<br />
3. Do a short installation of R. <br />
<br />
4. Run <br />
<br />
> install.packages( c("car","rgl"))<br />
<br />
> install.packages( c("spida","p3d"), repos = "http://R-Forge.R-project.org")</div>Andytlihttp://scs.math.yorku.ca/index.php/File:G2.pngFile:G2.png2011-01-24T01:27:17Z<p>Andytli: </p>
<hr />
<div></div>Andytlihttp://scs.math.yorku.ca/index.php/File:G1.pngFile:G1.png2011-01-24T01:26:50Z<p>Andytli: </p>
<hr />
<div></div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Andy_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Andy Li2011-01-24T01:26:32Z<p>Andytli: </p>
<hr />
<div>====Welcome to my wiki page for the Statistical Consulting Practicum====<br />
<br />
==About Me==<br />
My name is Andy Li. I began my undergraduate studies in Statistics at York in 2008. After two years of study, I am now a master's student in Applied Statistics. I also hold a bachelor's degree in Chemical Engineering and worked as a B2B salesman for many years. I have used SAS, R, MATLAB, Minitab and Maple in my classes. I am very happy to join our 6627 team to learn more about Statistics. I hope we can build not only our knowledge but also our friendship in the days ahead. <br />
<br />
==Sample Exam Questions==<br />
=== Week 1 ===<br />
(In class) There are 11 students in the 6627 class, 5 of whom are male. There is a statistics meeting on campus. All the students want to attend as visitors, but only 6 seats are left. The professor therefore needs to choose 6 students at random and wants exactly two of them to be male, to help set up the tables. In how many ways can he select the students?<br />
<br />
Answer: 5C2*6C4=150<br />
<br />
=== Week 2 ===<br />
[[File:d34.jpg]]<br />
<br />
(In class) We have the cloud of our data, the data ellipse and the centre point. We also know that about 40% of the data fall inside the inner ellipse and 86% inside the outer ellipse. What is the radius of the outer ellipse? What are Sx, Sy, Sy.x and the 95% confidence interval?<br />
<br />
Answer: Please see the lecture notes of 2nd lecture. <br />
<br />
=== Week 3 ===<br />
Q: Explain the multiple regression Coefficients in three dimensions.<br />
<br />
A: [[File:g1.png]] β0 defines the Y-intercept of the plane, β1 defines the slope of the plane along the x1 axis, and β2 defines the slope of the plane along the x2 axis. The error term ε adds noise to the plane so that it is bumpy instead of being perfectly smooth. <br />
<br />
[[File:g2.png]]<br />
<br />
==Statistics in the Media==<br />
=== Week 1 ===<br />
[http://endoftheamericandream.com/archives/50-statistics-about-the-u-s-economy-that-are-almost-too-crazy-to-believe 50 Statistics About The U.S. Economy That Are Almost Too Crazy To Believe]<br />
<br />
Can you see that 2010 is a crisis year? It's NOT the time to change! <br />
[http://moneywatch.bnet.com/economic-news/blog/financial-decoder/18-scary-us-debt-facts/2824/ 18 Scary US Debt Facts] <br />
<br />
Were it not for statistics, the people across the border might be toasting the great success of this year right now.<br />
<br />
=== Week 2 ===<br />
[http://timesofindia.indiatimes.com/city/varanasi/4-day-conference-begins-at-BHU/articleshow/7224523.cms 4-day conference begins at BHU] Banaras Hindu University (BHU), Hindi: काशी हिन्दू विश्वविद्यालय, is a premier central university and a world class educational institution located in Varanasi, India. It is regarded as the largest residential university in Asia.<br />
<br />
=== Week 3 ===<br />
[http://www.newswire.ca/en/releases/archive/January2011/19/c3506.html Greater Toronto & Hamilton Area Transit Choices: Report finds much higher transportation emissions from road vs. rail]<br />
<br />
This is current news built on very old common sense. The report, "Putting Transportation on Track", comes from two of Canada's leading think tanks, Sustainable Prosperity and the Pembina Institute, and contains nothing new.<br />
<br />
==Questions and Comments on Groupwork and Class Lectures==<br />
=== Week 1 ===<br />
For the example about smoking and life expectancy that we discussed in class, I think there are two more factors we should take into consideration. One is the age at which people start smoking. I know that children smoke in some poor countries because they don't know it is harmful. The other is that different types of cigarette have different effects. Whatever type of cigarette you smoke, it will always be ‘injurious to health’, but cigarettes in rich countries contain relatively less addictive nicotine and tar, are equipped with long filters, and are not allowed to contain harmful flavour additives. I know this because I used to be a heavy smoker and have smoked more than 100 brands of cigars from different countries. But yes, I am still alive. <br />
<br />
=== Week 2 ===<br />
Some classmates couldn't install the spida and p3d packages. Below is the procedure to follow:<br />
<br />
1. Uninstall your current version of R.<br />
<br />
2. Go to [http://cran.r-project.org/bin/windows/base/ R download] to download a new version of R.<br />
<br />
3. Do a short installation of R. <br />
<br />
4. Run <br />
<br />
> install.packages( c("car","rgl"))<br />
<br />
> install.packages( c("spida","p3d"), repos = "http://R-Forge.R-project.org")</div>Andytlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Assignment_Teams/GrayMATH 6627 2010-11 Practicum in Statistical Consulting/Assignment Teams/Gray2011-01-24T00:16:22Z<p>Andytli: </p>
<hr />
<div>* http://en.wikipedia.org/wiki/Mary_W._Gray<br />
<br />
==Assignment 1== <br />
===1. Simpson's Paradox=== <br />
<br />
Example: My friend and I play a basketball game and each shoot 20 shots. Who is the better shooter?<br />
<br />
[[File:h65.PNG]]<br />
<br />
But, who is the better shooter if you control for the distance of the shot? Who would you rather have on your team?<br />
<br />
[[File:h32.PNG]]<br />
This is a question of Simpson's Paradox. <br />
<br />
[[File:h45.PNG]] <br />
<br />
We can see from this figure that the relationship changes from negative to positive once we take distance into consideration. The black line joins the overall probabilities of the two of us making a shot. The red line joins our performance on far shots, and the blue line our performance on close shots.<br />
<br />
Simpson’s paradox arises from one simple mathematical truth. Given eight real numbers a, b, c, d, A, B, C, D with the following properties: [[File:12.png]], it is not necessarily true that [[File:122.png]]. In fact, it may be true that [[File:13.png]].<br />
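<br />
A numerical instance, in the spirit of the basketball example, can be sketched in R. The shot counts below are invented for illustration and are not the data behind the figures above; each player still takes 20 shots.<br />
<pre>
# Hypothetical shot counts: each player takes 20 shots in total
shots <- data.frame(
  player   = c("me", "me", "friend", "friend"),
  distance = c("close", "far", "close", "far"),
  made     = c(13, 1, 5, 6),
  attempts = c(15, 5, 5, 15)
)

# Success rate within each distance: the friend is better at both distances
by_distance <- transform(shots, rate = made / attempts)
print(by_distance)

# Pooled success rate, ignoring distance: the ordering reverses
overall <- aggregate(cbind(made, attempts) ~ player, data = shots, FUN = sum)
overall$rate <- overall$made / overall$attempts
print(overall)
</pre>
The reversal comes from the mix of shots: in this made-up data the friend takes mostly far shots, which are harder for both players, so the pooled rate is pulled down even though the friend shoots better at each distance.<br />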
<br />
This is a simple mathematical fact, yet it has significant ramifications in Bayesian analysis, medical research, science and engineering studies, and the statistical analysis of social data. It is of concern for any statistical activity involving the calculation and analysis of ratios of two measurements.<br />
<br />
Example 2 (real income tax example)<br />
<br />
[[File:simpson1.pdf]]<br />
<br />
===2. Graphics to visualize data === <br />
Gray: rgl and p3d (include the use of the 'groups' parameter to produce trajectories)<br />
<br />
===Introduction===<br />
<br />
[[File:Grey_rgl_p3d_1.pdf]]<br />
[[File:Grey_rgl_p3d_2.pdf]]<br />
<br />
''rgl'' is a library of functions that offers real-time 3D visualization to the R programming environment (Adler & Murdoch, 2010), providing an OpenGL implementation for R.<br />
<br />
''p3d'' is a library of functions which employs functions from ''rgl'' to help visualize statistical models expressed as a function of 2 independent variables, with the possible addition of a categorical variable (Monette, 2009).<br />
<br />
===Package ''rgl''===<br />
<br />
With ''rgl'' we create a ‘device’, which is simply a window, within which a ‘world’ is created where we can build 3-dimensional shapes and through which we can navigate.<br />
<br />
[[File:Grey_World.png|200px]]<br />
<br />
Functions within the ''rgl'' package can be divided into 6 categories:<br />
(1) Device management functions (open and close devices, control the active device)<br />
(2) Scene management functions (option to remove certain or all objects from the scene)<br />
(3) Export functions (creating image files)<br />
(4) Shape functions - essential plotting primitives (points, lines, triangles, quads) as well as higher level functions (text, spheres, surfaces).<br />
<br />
[[File:Grey-Shapes.png]]<br />
<br />
(5) Environment functions - modify the viewpoint, background and bounding box, and add light sources<br />
(6) Appearance function rgl.material(…).<br />
<br />
[[File:Grey_AppearanceOptions.png|300px]]<br />
<br />
Using shapes and surfaces within an ''rgl'' device, statistical data can be represented in 3 dimensions. Some advanced examples are available as demos or provided on the [http://rgl.neoscientists.org/docs.shtml rgl website].<br />
<br />
[[File:Gray_Rgl_3d_histogram.png|200px]]<br />
[[File:Grey_rgl_example_imulated_animal_abundance.png|200px]]<br />
<br />
A few of the functions from ''rgl'' are useful for manipulating 3D models created using ''p3d'', since ''p3d'' contains many functions that inherit from ''rgl'' but tailor them to statistical methods. Thus all but a few are unnecessary for our purposes unless you would like to contribute functionality to ''p3d''!<br />
<br />
<br />
<br />
----<br />
<br />
===Package p3d===<br />
<br />
In this section I will focus on example code you may use to familiarize yourself with the capabilities of this package. You will require the [[File:Gray_p3d_ex_Tuition.txt]] and [[File:Gray_p3d_ex_USIndicesIndustrialProd.txt]] data sets. Note that a few of the commands employed in the sample code are from ''rgl'', but these will likely be superseded by ''p3d'' functions as the package matures.<br />
<br />
Initialization code:<br />
----<br />
library( lattice )<br><br />
library( nlme )<br><br />
library( car )<br><br />
library( spida )<br><br />
library( rgl )<br><br />
library( p3d )<br><br />
tuit = read.table('tuition.Rdata',header=TRUE)<br><br />
head(tuit)<br><br />
prod = read.table('USIndicesIndustrialProd.Rdata',header=TRUE)<br><br />
head(prod)<br><br />
----<br />
<br />
<br />
For the tuition data we will begin by plotting the annual cost of tuition from a sample of American Universities against the rates of faculty compensation and proportion of students who graduate.<br />
<br />
Using the mouse buttons you can change the field of view and zoom in and out. ''Plot3d'' creates the 3D plot as shown on the right.<br />
<br />
We can remove elements from the device using the function ''Pop3d()''. This function removes elements starting with the most recently added item. Multiple items can be removed by supplying a numeric argument, e.g. ''Pop3d(4)''.<br><br />
[[File:Gray_rgl_navigation.png|200px]]<br />
[[File:Gray_p3d_ex1.png|200px]]<br />
----<br />
Init3d(cex = .8)<br><br />
Plot3d(tuition ~ fac_comp + graduat, col = c("blue"), data = tuit)<br><br />
----<br />
<br />
<br />
Next we will subdivide the data by category, in this case whether the school is private (red) or public (blue) (variable name public.private).<br><br />
[[File:Gray_p3d_ex2.png|200px]]<br />
----<br />
Init3d(cex = .8)<br><br />
Plot3d( tuition ~ fac_comp + graduat|public.private, col = c("blue", "red"), data = tuit)<br><br />
----<br />
<br />
<br />
Next we will add regression planes for private(red) and public(blue) schools using the lm() function to determine the fit, and Fit3d() to insert the plane in the graph. Axes and labels are added using Axes3d() and title3d().<br><br />
[[File:Gray_p3d_ex3.png]]<br />
----<br />
fitpub = lm(tuition ~ fac_comp + graduat,subset=(public.private==0),data = tuit)<br><br />
Fit3d( fitpub, col = c("blue"))<br><br />
<br><br />
fitpri = lm(tuition ~ fac_comp + graduat,subset=(public.private==1),data = tuit)<br><br />
Fit3d( fitpri, col = c("red"))<br><br />
<br><br />
Axes3d()<br><br />
title3d(main='Tuition predicted by grad rates and faculty salary -private (red) and public(blue) institutions')<br><br />
----<br />
<br />
<br />
Data ellipses are useful for understanding our data.<br><br />
[[File:Gray_p3d_ex4.png]]<br />
----<br />
Ell3d()<br />
----<br />
<br />
<br />
We can change the viewpoint of our graph using the function ''view3d(theta,phi,fov,zoom)'', which takes polar coordinates. Note that ''view3d(0,0,0)'' will rotate the image to face the x-z plane (y into the screen) and ''view3d(270,0,0)'' will rotate the image to face the y-z plane (x into the screen). The function ''snap()'' will capture a still image of the current view. Note that to use ''movie3d()'' you must have ImageMagick installed to automatically convert the png files to a gif; otherwise you must use external software.<br />
<br />
[[File:Gray_p3d_ex5.png]]<br />
[[File:GrayMovie.gif]]<br />
----<br />
view3d(0,0,0)<br><br />
snap()<br><br />
<br />
spin(theta = 0, phi = 0)<br><br />
<br />
spins(inc.theta = 1/4, inc.phi = 0, theta = NULL, phi = NULL)<br><br />
<br />
movie3d( spin3d(axis=c(0,1,0), rpm=20), duration=2, dir='movie' )<br><br />
----<br />
<br />
<br />
Here is an additional example, using data on the US indices of industrial production, plotting mining production (MIN) over months and years. Adding the argument ‘groups=YR’ to ''Plot3d'' connects the months in a given year to produce trajectories.<br />
[[File:Gray_p3d_ex6.png]]<br />
----<br />
open3d(windowRect=c(100,100,800,800),cex = .8) <br><br />
prod = read.table('USIndicesIndustrialProd.Rdata',header=TRUE)<br><br />
head(prod)<br><br />
Plot3d(MIN ~ YR+MONTH,data=prod,groups=YR)<br><br />
Axes3d()<br><br />
title3d(main='Industrial Production Mining (1947-1993)')<br><br />
view3d(215,0,45)<br><br />
----<br />
<br />
<br />
==Assignment 2==<br />
<br />
Statistics in the News: "Spousal support a royal pain?"<br />
<br />
====1====<br />
;Question: <br />
<br />
;Discussion: <br />
<br />
====2====<br />
;Question: <br />
<br />
;Discussion: <br />
<br />
====3====<br />
;Question: <br />
<br />
;Discussion: <br />
<br />
====4====<br />
;Question: <br />
<br />
;Discussion: <br />
<br />
====5====<br />
;Question: <br />
<br />
;Discussion: <br />
<br />
<br />
=== Paradoxes and Fallacies ===<br />
==== 2. ==== <br />
;Question:You are studying observational data on the relationship between Health and Coffee (measured in grams of caffeine consumed per day). Suppose you want to control for a possible confounding factor 'Stress'. In this kind of study it is more important to make sure that you measure coffee consumption accurately than it is to make sure that you measure 'stress' accurately. <br />
<br />
;Discussion: <br />
<br />
<br />
==== 5.==== <br />
;Question: In a multiple regression of Y on three predictors, X1, X2 and X3, if the coefficients of both X2 and X3 are not significant, it is safe to drop these two variables and perform a regression on X1 alone. <br />
<br />
;Discussion:<br />
<br />
<br />
==== 8.====<br />
;Question: In a multiple regression, if you drop a predictor whose effect is not significant, the p-values of the other predictors should not change very much. <br />
<br />
;Discussion:<br />
<br />
<br />
==== 11.==== <br />
;Question: In a model to assess the effect of a number of treatments on some outcome, we can estimate the difference between the best treatment and the worst treatment by using the difference in the mean outcomes. <br />
<br />
;Discussion: <br />
<br />
<br />
==== 14.==== <br />
;Question: If two variables have a strong interaction, this implies a strong correlation. <br />
<br />
;Discussion:</div>Andytli