MATH 6627 2010-11 Practicum in Statistical Consulting/Students/Andy Li

Revision as of 22:07, 28 March 2011

Welcome to my wiki page for the Statistical Consulting Practicum

About Me

My name is Andy Li. In 2008 I started an undergraduate degree in Statistics at York. After two years of study, I am now a master's student in Applied Statistics. I also have a bachelor's degree in Chemical Engineering, and I worked as a B2B salesman for many years. I have used SAS, R, MATLAB, Minitab and Maple in my classes. I am very happy to join our 6627 team to learn more about Statistics. I hope we can build up not only our knowledge but also our friendship in the coming days.

Sample Exam Questions

Week 1

(In class) There are 11 students in the 6627 class, 5 of whom are male. There is a statistics meeting on campus. All the students want to attend as visitors, but only 6 seats are left. So the professor needs to choose 6 students at random and wants exactly two of them to be male, to help set up the tables. How many ways can he select the students?

Answer: 5C2 * 6C4 = 10 * 15 = 150
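The count can be checked in R. This is a minimal sketch that reads "two of them to be male" as exactly two, so the other four seats go to the six female students; the variable names are only illustrative.

```r
# 11 students, 5 male, so 6 female; 6 seats, exactly 2 for male students
n_male <- 5
n_female <- 11 - n_male

# Choose 2 of the 5 men and 4 of the 6 women independently
ways <- choose(n_male, 2) * choose(n_female, 4)
print(ways)
```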

Week 2

D34.jpg

(In class) We have the data cloud, the data ellipses and the centre point. We also know that about 40% of the data fall inside the inner ellipse and about 86% inside the outer ellipse. What is the radius of the outer ellipse? What are Sx, Sy, Sy.x and the 95% confidence interval?

Answer: Please see the lecture notes of 2nd lecture.
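As a rough check on the percentages in the question, the sketch below assumes the data are approximately bivariate normal (an assumption, not something stated in the question). Under that assumption the squared standardized radius follows a chi-square distribution with 2 degrees of freedom, so the coverage of an ellipse of a given radius can be computed directly.

```r
# Assumption: bivariate normal data, so (radius)^2 ~ chi-square with 2 df
radius <- c(1, 2)
coverage <- pchisq(radius^2, df = 2)
print(round(coverage, 3))  # proportion of data inside ellipses of radius 1 and 2

# Radius needed for an ellipse that encloses 95% of the data
r95 <- sqrt(qchisq(0.95, df = 2))
print(round(r95, 3))
```

Under this assumption, radius 1 and radius 2 correspond to the roughly 40% and 86% ellipses in the question.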

Week 3

Q: Explain the multiple regression coefficients in three dimensions.

A: G1.png Beta0 defines the Y-intercept of the plane. Beta1 defines the slope of the plane along the x1 axis, and beta2 defines the slope of the plane along the x2 axis. The error term ε adds noise, so the observed points scatter around the plane instead of lying exactly on it.

G2.png
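The interpretation above can be tried on simulated data. The following is only a sketch: the true coefficients (2, 1.5, -0.5), the sample size and the noise level are made-up values chosen for illustration.

```r
set.seed(1)
n <- 200
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 10)

# Plane with intercept 2, slope 1.5 along x1 and slope -0.5 along x2, plus noise
y <- 2 + 1.5 * x1 - 0.5 * x2 + rnorm(n, sd = 1)

fit <- lm(y ~ x1 + x2)
b <- coef(fit)
print(round(b, 2))  # estimates of beta0, beta1, beta2
```

The fitted coefficients should land close to the values used to generate the plane, with the residuals playing the role of ε.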

Week 4

Q: Which statements describe correlation and which describe interaction?

a) People with more years in jail tend to have fewer years of education.

b) The more money I save, the more financially secure I feel.

c) To give it a good taste, she squeezed a lemon into her black tea and stirred the tea.

d) Helping others in the Stats lab is one type of TA job. Another type is grader.

e) Steel can become harder and stronger by adding carbon to it and quenching (rapidly cooling) it.

f) The more years of education I complete, the higher my earning potential.

g) Professors drink too much coffee because their work is stressful.

A: Correlation: a, b, f, g; interaction: c, e

Week 5

Q: We have 5 points at (1,1), (1,2), (2,1), (2,2) and (2,3). What does the linear regression line look like?

A: Go to this web page and try it. Add more points to get an idea of the rule.
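The line for these five points can also be computed directly in R. This is a small sketch using ordinary least squares with `lm()`; it is offered alongside the web page, not as a replacement for it.

```r
x <- c(1, 1, 2, 2, 2)
y <- c(1, 2, 1, 2, 3)

fit <- lm(y ~ x)
print(coef(fit))  # intercept and slope of the least-squares line

# x takes only two values, so the fitted line passes through
# the mean of y at x = 1 and the mean of y at x = 2
group_means <- tapply(y, x, mean)
print(group_means)
```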

Week 6

Q: Fill in the blanks: MANOVA is the abbreviation of ______(1)______. Where sums of squares appear in univariate analysis of variance, in MANOVA certain ________(2)________ appear, and the sums of squares appear as the ____(3)_____ entries.

A: (1) Multivariate analysis of variance (2) positive-definite matrices (3) diagonal
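To make blanks (2) and (3) concrete, the sketch below builds a total sums-of-squares-and-cross-products matrix for two variables from R's built-in `iris` data. The choice of data set and of the total (rather than hypothesis or error) matrix is just for illustration.

```r
# Two response variables from the built-in iris data
Y <- as.matrix(iris[, c("Sepal.Length", "Sepal.Width")])

# Centre each column at its mean
Yc <- scale(Y, center = TRUE, scale = FALSE)

# Multivariate analogue of the total sum of squares
total_sscp <- crossprod(Yc)
print(total_sscp)

# Univariate total sums of squares, one per variable
ss_univariate <- colSums(Yc^2)
print(ss_univariate)

# All eigenvalues positive means the matrix is positive definite
eig <- eigen(total_sscp, symmetric = TRUE)$values
print(eig)
```

The diagonal of `total_sscp` matches `ss_univariate`, and the off-diagonal entry is the sum of cross-products between the two variables.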

Week 7

Q: What are the assumptions of MANOVA?

A: 1. Independent Random Sampling: MANOVA assumes that the observations are independent of one another and that the sample is completely random, with no pattern in how it was selected.

2. Level and Measurement of the Variables: MANOVA assumes that the independent variables are categorical and the dependent variables are continuous or scale variables.

3. Linearity of dependent variables: The dependent variables may be correlated with each other or independent of each other. Moderately correlated dependent variables are preferred; if the dependent variables are independent of each other, we sacrifice degrees of freedom, which decreases the power of the analysis.

4. Multivariate Normality: The dependent variables are assumed to follow a multivariate normal distribution.

5. Multivariate Homogeneity of Variance: The variances and covariances of the dependent variables are equal across groups.

Week 8

Q: Given a set of complex data in a mixed model, please suggest a way to select all the observations from a sample of clusters.

A: There are many ways to do this. In the example, the professor gave the following one: first create the school summary file and take a sample of schools from that file. Then merge the sample file with the long file. Merge matches on variables that have the same name. By default, it only keeps records that match in both files, so it produces the result we want.
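The procedure can be sketched in base R. The data frame `longfile`, its columns and the sample size of 4 schools are all hypothetical, made up here only to show the mechanics of sampling clusters and merging.

```r
set.seed(42)

# Hypothetical long file: one row per student, 5 students in each of 20 schools
longfile <- data.frame(
  school  = rep(1:20, each = 5),
  student = 1:100,
  score   = rnorm(100, mean = 50, sd = 10)
)

# School summary file: one row per school
schools <- aggregate(score ~ school, data = longfile, FUN = mean)
names(schools)[2] <- "school_mean"  # rename so only "school" is shared

# Take a random sample of 4 schools from the summary file
sampled <- schools[sample(nrow(schools), 4), ]

# merge() matches on the shared column name ("school") and by default
# keeps only rows that match in both files
sub <- merge(sampled, longfile)
print(table(sub$school))
```

The summary column is renamed before merging because `merge()` uses every shared column name as a key; leaving it named `score` would make the merge try to match on the score values as well.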

Week 9

Q: True or False for the statements below:

a) Autocorrelation could be negative.

b) Strong positive autocorrelation can be a symptom of lack of fit.

c) Occasional large measurement errors will contribute positively to the estimate of autocorrelation.

d) Autocorrelation of a random process describes the correlation between values of the process at different points in time.

A: T T F T
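Statements (a) and (b) can be illustrated with two toy series. This is only a sketch: the alternating series and the straight-line trend are artificial examples, and the trend stands in for structure that a model has failed to capture.

```r
# (a) An alternating series has negative lag-1 autocorrelation
x_alt <- rep(c(1, -1), times = 10)
r1_alt <- acf(x_alt, lag.max = 1, plot = FALSE)$acf[2]
print(r1_alt)

# (b) A trend left in the series (for example, residuals from a model that
# lacks fit) shows up as strong positive lag-1 autocorrelation
x_trend <- 1:20
r1_trend <- acf(x_trend, lag.max = 1, plot = FALSE)$acf[2]
print(r1_trend)
```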

Week 10

Q: Why do we say HLM is superior to OLS?

A: Because HLM theoretically produces appropriate error terms that control for potential dependency due to nesting effects, while OLS does not.

An additional argument favoring HLM is that it is a generalization of OLS that better handles variables reflecting random-effects designs and, therefore, produces more accurate error terms and Type I error rates.

Many of the cited advantages of HLM depend on the intraclass correlation, which is the between-group variance divided by the total variance. If that correlation is zero, there seems to be less advantage to using HLM because there is no intraclass correlation to account for.
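The intraclass correlation mentioned above can be estimated from a one-way ANOVA when the groups are balanced. The sketch below simulates nested data with made-up variance components (group standard deviation 3, within-group standard deviation 1) and uses the usual ANOVA estimator; it does not fit an HLM itself.

```r
set.seed(123)
n_groups <- 50
n_per <- 10
group <- factor(rep(seq_len(n_groups), each = n_per))

# Simulated nesting: a shared group effect plus individual noise
u <- rnorm(n_groups, sd = 3)
y <- u[as.integer(group)] + rnorm(n_groups * n_per, sd = 1)

# Between-group and within-group mean squares from a one-way ANOVA
ms <- anova(lm(y ~ group))[["Mean Sq"]]
msb <- ms[1]
msw <- ms[2]

# ANOVA estimator of the intraclass correlation for balanced groups
icc <- (msb - msw) / (msb + (n_per - 1) * msw)
print(round(icc, 3))
```

With these settings the true value is 9 / (9 + 1) = 0.9, so observations in the same group are far from independent, which is the situation where the OLS error terms are least trustworthy.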

Statistics in the Media

Week 1

50 Statistics About The U.S. Economy That Are Almost Too Crazy To Believe

Can you see that 2010 was a crisis year? It's NOT the time to change! 18 Scary US Debt Facts

Without statistics, the people across the border might be toasting the big success of this year right now.

Week 2

4-day conference begins at BHU. Banaras Hindu University (BHU), Hindi: काशी हिन्दू विश्वविद्यालय, is a premier central university and a world-class educational institution located in Varanasi, India. It is regarded as the largest residential university in Asia.

Week 3

Greater Toronto & Hamilton Area Transit Choices: Report finds much higher transportation emissions from road vs. rail

This is current news on a very old topic. The report comes from two of Canada's leading think tanks, Sustainable Prosperity and the Pembina Institute, and restates what is already common sense. It is titled "Putting Transportation on Track" and says nothing new.

Week 4

2010 US Census results are now available

As of Jan. 18, 2011, the 2010 US Census data is accessible online. It is easy to reach and will be appreciated by many fact finders. The other highlight is the survey itself: it asks just 10 questions and takes about 10 minutes to complete.

Week 5

So now everyone knows that Statistics is not useless and that statisticians are a group of supermen: Toronto man cracked the code to scratch-lottery tickets

Week 6

City growing quickly - Saskatoon outpaces all metro areas in Canada. You can choose to find a job there, and they have a well-known immigration program for international students.

Week 7

Crime stats in Canada: Brian Lee Crowley: Crime stats change based on who counts them


Week 8

Let us pray for Japan. No nuclear catastrophe again. I visited in 2000, and it was a really beautiful country. I hope I can do something for the victims. I don't know exactly what this news item is about, but I hope it is helpful: Data management system aids Japan relief effort

Week 9

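Really good articles about the effects of the Japan crisis: Japanese Crisis Analysis Part 1: Currencies and the Return of Panic (http://lakshmi-capital.com/2011/03/japanese-crisis-analysis-part-1-currencies-and-the-return-of-panic/) and Japanese Crisis Analysis Part 2: US Stocks and the Return of Panic (http://www.benzinga.com/11/03/936488/japanese-crisis-analysis-part-2-us-stocks-and-the-return-of-panic/)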

Week 10

Simpson's Paradox in sports: Explaining the Simpson Paradox (http://sportsillustrated.cnn.com/2011/writers/scorecasting/03/24/simpson-paradox/)

Questions and Comments on Groupwork and Class Lectures

Week 1

For the example about smoking and life expectancy that we discussed in class, I think there are two more factors we should take into consideration. One is the age at which people start smoking. I know that kids smoke in some poor countries because they don't know it is harmful. Another is that different types of cigarette have different effects. Whatever type of cigarette you smoke, it will always be ‘injurious to health’. But the cigarettes in rich countries have relatively less addictive nicotine and tar, are equipped with long filters, and are not allowed to contain harmful flavor additives. I know this because I was a heavy smoker before and have smoked more than 100 brands of cigars from different countries. But yes, I am still alive.

Week 2

Some classmates couldn't install the spida and p3d packages. Below is the procedure you should follow:

1. Uninstall your current R.

2. Go to the R download page and download a new R.

3. Do a short installation of R.

4. Run

> install.packages( c("car","rgl"))

> install.packages( c("spida","p3d"), repos = "http://R-Forge.R-project.org")

Week 3

It was really amazing to see the data moving with GoogleVis and p3d in the last lecture. This will be very useful in consulting or presentations. I think these methods are limited when it comes to analyzing complex models, but we can choose 3 variables at a time from a multiple regression model and draw a 3D plot to see the trend.

Week 4

Carrie has a question about the sign: In the slides for Visualizing Multiple Regression, in the section on the data ellipse for predictors (page 30), the general predictor has a notation I'm not familiar with and missed in lecture. What does the plus sign in a circle mean?

Answer: In logical operations, the plus sign in a circle is called "exclusive or". Wiki link is here. But in the notes it is specially defined in the summary, page 88.

Week 5

Here is a set of slides on outlier detection. I hope it is helpful: slides

Week 6

What does this mean: "The coefficient for "ses" (2.2999) is NOT "the estimated effect of ses" – it is the estimated "effect" of ses" (on page 46 of the Hierarchical_Models slides)?

Week 7

What does this mean: "The smaller ellipse has 95% shadows." (on page 61 of the Hierarchical_Models slides)?

  • I think it means that the shadows of the smaller ellipse give the regular 95% confidence intervals for each of the betas - Constance

Week 8

What does the "-1" mean in the code "model.matrix( ~ Sex + Minority -1, dd)"? Thanks in advance.

  • -1 is used to remove the intercept term: when fitting a linear model, y ~ x - 1 specifies a line through the origin. In the above script, the model matrix is built from the terms Sex and Minority, with the intercept term removed. - Crystal
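The effect of "-1" is easiest to see by comparing the column names of the two model matrices. The small data frame `dd` below is made up for illustration (the course data are not reproduced here), with Sex and Minority as two-level factors.

```r
# Hypothetical stand-in for the course data frame dd
dd <- data.frame(
  Sex      = factor(c("Female", "Male", "Male", "Female")),
  Minority = factor(c("No", "No", "Yes", "Yes"))
)

# With the intercept: a column of 1s plus one indicator per factor
X_with <- model.matrix(~ Sex + Minority, dd)
print(colnames(X_with))

# With "-1": no intercept column, so the first factor (Sex)
# gets an indicator column for every level
X_without <- model.matrix(~ Sex + Minority - 1, dd)
print(colnames(X_without))
```

So with factors, removing the intercept does not drop information; it changes the coding so that the Sex columns represent each level directly instead of a difference from a reference level.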

Week 9

What is the difference between panel data, longitudinal data, time series data and cross-sectional data?

Week 10

What is the difference between dropout, missing data and censored data? What is informative dropout?
