# MATH 6627 2010-11 Practicum in Statistical Consulting/Students/Gurpreet Saini

My name is Gurpreet Saini. Those who find my name difficult to pronounce can call me 'Preety'. I am a Master's student in Applied Statistics at York University. I completed my undergraduate degree in Statistics at Panjab University, India. I have learned many new and interesting things in this program. I am familiar with the statistical software packages SAS and R.

## Sample Exam Questions

### Week 1

Q-What is statistical consulting? Why is it needed?

A-While demographic and scientific studies are the two pillars of modern statistics, statistical consulting is somewhat unique in the sense that we are always analysing somebody else's data. Statistics has certainly benefited from rapid developments in computer technology, and statistical software is now accessible to a wide audience. However, the complexity of the questions under study in most disciplines requires at least some expertise in data analysis. Researchers don't have time to acquire this specialized knowledge, along with the practical experience to apply it appropriately. So there is a need to involve someone who understands the scientific process and has the quantitative skills to fulfill this important role: the STATISTICAL CONSULTANT.

### Week 2

Q-In the example of the heights of fathers and their sons, we found that the predicted height of a father or a son should be close to the mean height of 68. This suggests that there will be no very tall or very short people, which is contrary to what actually happens. How can we resolve this paradox?

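A-The paradox results from our failure to consider that each of the predictions we have made is subject to error. The predictions are pulled toward the mean, but an actual height is the prediction plus an error. The lower the correlation r, the greater the error of prediction, and so the actual heights retain more variability than the predictions do.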
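A minimal R sketch of this resolution, using simulated heights rather than the class data: the mean of 68 comes from the question, while the SD `s = 2.7`, the correlation `r = 0.5` and the sample size are assumptions chosen only for illustration. It shows that the fitted values shrink toward 68, while the actual heights keep their full spread because the prediction errors add the missing variability back.

```r
# Hypothetical father/son heights: mean 68 for both; the common SD s and
# the correlation r are illustrative assumptions, not the class data.
set.seed(1)
n  <- 10000
mu <- 68
s  <- 2.7
r  <- 0.5

father <- rnorm(n, mean = mu, sd = s)
# Son = mean + r * (father's deviation) + independent error; this gives
# the sons roughly the same SD s and a correlation of about r with fathers.
son <- mu + r * (father - mu) + rnorm(n, mean = 0, sd = s * sqrt(1 - r^2))

fit  <- lm(son ~ father)
pred <- fitted(fit)

# Predictions are pulled toward 68: their SD is only about r * s ...
sd(pred)
# ... while the sons' heights keep the full spread s, because each
# prediction misses by an error whose SD is about s * sqrt(1 - r^2).
sd(son)
sd(residuals(fit))

# Least squares splits the variance exactly: var(son) = var(pred) + var(error)
all.equal(var(son), var(pred) + var(residuals(fit)))
```

The ratio `sd(pred) / sd(son)` equals the sample correlation, so the smaller r is, the more the predictions shrink and the larger the share of variability carried by the errors.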

### Week 3

• Q- The data ellipse has a shape given by the variance-covariance matrix of the predictors. What would happen if X1 and X2 were uncorrelated? What would happen to the marginal and conditional effects?
• A- If the predictors X1 and X2 are uncorrelated and Var(X1) = Var(X2), then the data ellipse becomes a circle (and so does the confidence ellipse), and consequently the downward projection of the centre and the oblique projection are identical. That is, the marginal effect equals the conditional effect. Even if Var(X1) ≠ Var(X2), the downward projection of the centre of the data ellipse and the oblique projection are still identical, because neither the data ellipse nor the confidence ellipse is tilted.
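A small R sketch of this answer, assuming a hypothetical balanced design in which X1 and X2 are centred, have exactly zero sample correlation, and have different variances. The marginal slope (Y on X1 alone) and the conditional slope (Y on X1 adjusting for X2) then coincide. A correlated version of the second predictor is included for contrast.

```r
# Hypothetical balanced design: x1 and x2 are centred and have zero
# sample correlation, but different variances.
x1 <- rep(c(-1, 1), times = 50)
x2 <- rep(c(-2, 2), each = 50)
cor(x1, x2)    # 0 for this design

set.seed(2)
y <- 3 + 2 * x1 - 1 * x2 + rnorm(100)

marginal    <- coef(lm(y ~ x1))[["x1"]]       # Y on X1 alone
conditional <- coef(lm(y ~ x1 + x2))[["x1"]]  # X1 adjusting for X2
c(marginal = marginal, conditional = conditional)   # identical

# For contrast, make the second predictor correlated with x1
x2c <- x2 + 2 * x1
y2  <- 3 + 2 * x1 - 1 * x2c + rnorm(100)
c(marginal    = coef(lm(y2 ~ x1))[["x1"]],
  conditional = coef(lm(y2 ~ x1 + x2c))[["x1"]])   # now they differ
```

With orthogonal, centred predictors the cross-product matrix is diagonal, so each slope is estimated as if the other predictor were absent; that is the algebraic form of "the ellipse is not tilted".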

### Week 4

• Q- Why is the confidence ellipse narrower in directions in which the data ellipse is larger?
• A- It's because the shape of the confidence ellipse is the inverse of the shape of the data ellipse, i.e. the length of the shadow of one ellipse is inversely proportional to the size of the slice of the other ellipse.
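The "inverse shape" statement can be checked numerically. A minimal R sketch on hypothetical simulated data: the slope block of `vcov()` from `lm()` equals sigma^2 / (n - 1) times the inverse of the predictors' covariance matrix, so both ellipses share the same axes, but the longest axis of the data ellipse becomes the shortest axis of the confidence ellipse.

```r
# Hypothetical data with two correlated predictors
set.seed(3)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)
y  <- 1 + x1 + x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

S <- cov(cbind(x1, x2))                       # shape of the data ellipse
V <- vcov(fit)[c("x1", "x2"), c("x1", "x2")]  # shape of the confidence ellipse
sigma2 <- summary(fit)$sigma^2

# The slope block of vcov() is sigma^2 (Xc' Xc)^{-1} = sigma^2 / (n - 1) * S^{-1}
all.equal(V, sigma2 / (n - 1) * solve(S), check.attributes = FALSE)

eS <- eigen(S)
eV <- eigen(V)
# Eigenvalues of V are reciprocals of those of S (times sigma^2 / (n - 1)) ...
all.equal(sort(eV$values), sort(sigma2 / (n - 1) / eS$values))
# ... so the long axis of the data ellipse (first eigenvector of S) is the
# short axis of the confidence ellipse (second eigenvector of V)
abs(sum(eS$vectors[, 1] * eV$vectors[, 2]))   # close to 1
```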

### Week 5

• Q- What is an added variable plot? Why do we need it?
• A- In a multiple regression, the added variable plot for a predictor X, say, is the plot of the residuals from regressing Y on all predictors except X against the residuals from regressing X on all the other predictors. In simple regression, the relationship between the response Y and the predictor X is displayed by a scatter plot. In multiple regression, the situation is complicated by the relationships among the several predictors, so a scatter plot of Y against any one of the X's need not reflect the relationship when adjusted for the other X's. The added variable plot is a graphical device that displays exactly this adjusted relationship.
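A minimal R sketch of an added variable plot built by hand on hypothetical simulated data: regress Y and X1 each on the other predictor, then plot residual against residual. The slope of that residual-on-residual fit equals the multiple regression coefficient of X1 (the Frisch-Waugh-Lovell result), which is why the plot shows the adjusted relationship.

```r
# Hypothetical data with correlated predictors
set.seed(4)
n  <- 100
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n)
y  <- 2 + 1.5 * x1 - 0.5 * x2 + rnorm(n)

full <- lm(y ~ x1 + x2)

# Added variable plot for x1
e_y  <- residuals(lm(y ~ x2))   # part of Y not explained by the other predictor
e_x1 <- residuals(lm(x1 ~ x2))  # part of X1 not explained by the other predictor
av   <- lm(e_y ~ e_x1)

plot(e_x1, e_y, xlab = "x1 | others", ylab = "y | others",
     main = "Added variable plot for x1")
abline(av)

# The slope in the AV plot equals the multiple regression coefficient of x1
c(av_slope = coef(av)[["e_x1"]], full_coef = coef(full)[["x1"]])
```

The `car` package's `avPlots()` draws these plots for every predictor of a fitted `lm` model; the manual version above just makes the two residual regressions explicit.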

### Week 6

• Q- Why should we use multilevel models?
• A- 1. Correct inferences: Traditional multiple regression techniques treat the units of analysis as independent observations. One consequence of failing to recognise hierarchical structure is that the standard errors of regression coefficients will be underestimated, leading to an overstatement of statistical significance. Standard errors for the coefficients of higher-level predictor variables are the most affected by ignoring grouping.

2. Estimating group effects simultaneously with the effects of group-level predictors: An alternative way to allow for group effects is to include dummy variables for groups in a traditional (ordinary least squares) regression model. Such a model is called an analysis of variance or fixed effects model. In many cases there will be predictors defined at the group level, e.g. type of school (mixed vs. single sex). In a fixed effects model, the effects of group-level predictors are confounded with the effects of the group dummies, i.e. it is not possible to separate out effects due to observed and unobserved group characteristics. In a multilevel (random effects) model, the effects of both types of variable can be estimated.
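Point 1 can be illustrated in the simplest possible case, the standard error of an overall mean. A minimal R sketch, assuming hypothetical values (50 schools of 20 students, between-school SD 1, within-school SD 2): the naive SE that treats all students as independent is much smaller than an SE that treats schools as the independent units, and their ratio is roughly the square root of the usual design effect 1 + (m - 1) * ICC.

```r
set.seed(5)
J <- 50; m <- 20        # hypothetical: 50 schools, 20 students each
tau <- 1; sigma <- 2    # between-school and within-school SDs (assumed)

school <- rep(seq_len(J), each = m)
u <- rnorm(J, mean = 0, sd = tau)                        # school effects
y <- 50 + u[school] + rnorm(J * m, mean = 0, sd = sigma)

# Naive SE of the overall mean, treating all students as independent
naive_se <- sd(y) / sqrt(length(y))
# SE that respects the grouping: schools are the independent units
school_means <- tapply(y, school, mean)
cluster_se <- sd(school_means) / sqrt(J)

icc  <- tau^2 / (tau^2 + sigma^2)   # intra-class correlation
deff <- 1 + (m - 1) * icc           # design effect for equal group sizes

c(naive = naive_se, cluster = cluster_se,
  ratio = cluster_se / naive_se, sqrt_deff = sqrt(deff))
```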

### Week 7

• Q- What is the difference between the Scheffé and Bonferroni methods? When can we use these methods?
• A- The Scheffé method provides simultaneous confidence over all possible linear combinations of the coefficients. It can be used in any regression problem, but it is most useful when the predictors are continuous.

The Bonferroni method provides simultaneous confidence over a finite set of linear combinations. It cannot be used with continuous predictors without restricting the number of possible predictor values to a finite set.
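A quick R comparison of the two critical multipliers, assuming hypothetical values of p = 3 coefficients and 30 residual degrees of freedom. The Scheffé multiplier is fixed no matter how many combinations are examined, while the Bonferroni multiplier grows with the number k of combinations. This is why Bonferroni is shorter for a few pre-specified contrasts, but is of no use when infinitely many values (a continuous predictor) must be covered.

```r
alpha <- 0.05
p  <- 3     # number of coefficients (hypothetical)
df <- 30    # residual degrees of freedom (hypothetical)

# Scheffe: one multiplier covers ALL linear combinations of the p coefficients
scheffe <- sqrt(p * qf(1 - alpha, p, df))

# Bonferroni: the multiplier depends on how many (k) combinations are tested
bonferroni <- function(k) qt(1 - alpha / (2 * k), df)

k <- c(1, 3, 10, 100, 1000)
round(rbind(bonferroni = bonferroni(k), scheffe = scheffe), 3)
```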

### Week 8

• Q- When are BLUEs and BLUPs best?
• A- The BLUE is best for resampling from the same school over and over again. The BLUP is best on average for resampling from the population of schools and students.
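A minimal R simulation of the two resampling schemes, assuming a random-intercept model with known, hypothetical values (population mean 50, between-school SD 2, within-school SD 8, 10 students per school). Here the BLUE of a school mean is the school's sample mean, and the BLUP shrinks it toward the population mean with weight tau^2 / (tau^2 + sigma^2 / n). In practice these variance components would be estimated, not known.

```r
set.seed(6)
mu <- 50; tau <- 2; sigma <- 8; n <- 10   # hypothetical, treated as known
lambda <- tau^2 / (tau^2 + sigma^2 / n)    # shrinkage weight (about 0.38)
reps <- 20000

# (1) Resample from the population of schools: a new school every time
theta <- mu + rnorm(reps, 0, tau)                 # true school means
ybar  <- theta + rnorm(reps, 0, sigma / sqrt(n))  # observed school means
blue  <- ybar                                     # fixed-effect estimate
blup  <- mu + lambda * (ybar - mu)                # shrunk toward mu
mse_pop <- c(BLUE = mean((blue - theta)^2), BLUP = mean((blup - theta)^2))

# (2) Resample from the same school over and over (true mean 56, far from mu)
theta0 <- 56
ybar0  <- theta0 + rnorm(reps, 0, sigma / sqrt(n))
blup0  <- mu + lambda * (ybar0 - mu)
mse_same <- c(BLUE = mean((ybar0 - theta0)^2), BLUP = mean((blup0 - theta0)^2))

rbind(population = mse_pop, same_school = mse_same)
```

Averaged over schools, shrinkage wins; for one particular school far from the mean, the shrinkage bias makes the BLUP worse than the unbiased BLUE.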

### Week 9

• Q- What are GEEs (generalized estimating equations)?
• A- GEE takes into account the dependence among observations within a cluster by specifying a "working correlation structure".
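To make "correlation structure" concrete, here is a minimal R sketch of two common working correlation matrices for a cluster of m repeated measurements. The helper names `exchangeable()` and `ar1()` are hypothetical, written for illustration only. In R, the `geepack` package's `geeglm()` takes this choice through its `corstr` argument (for example `"exchangeable"` or `"ar1"`).

```r
# Hypothetical helper: exchangeable structure, where every pair of
# observations in a cluster has the same correlation rho
exchangeable <- function(m, rho) {
  R <- matrix(rho, nrow = m, ncol = m)
  diag(R) <- 1
  R
}

# Hypothetical helper: AR(1) structure, where correlation decays as
# rho^|j - k| with the time lag between observations j and k
ar1 <- function(m, rho) rho^abs(outer(seq_len(m), seq_len(m), "-"))

exchangeable(4, 0.3)
ar1(4, 0.5)
```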

### Week 10

• Q- Can G-side parameter estimates be highly collinear even if the matrix X is orthogonal?
• A- Yes. Centering the variables of the random-effects (RE) model around the “point of minimal variance” will help, but the resulting design matrix may still be highly collinear.
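One way to read "point of minimal variance" for a random intercept and slope model is sketched below in R, under the assumption of a hypothetical G matrix. The variance of b0 + b1 * t is smallest at t = -G[1,2] / G[2,2]. Re-centering time at that point makes the new random intercept uncorrelated with the random slope, which is the kind of reparametrization that can reduce collinearity among G-side estimates.

```r
# Hypothetical G matrix for (random intercept, random slope), with time
# measured from t = 0
G <- matrix(c(4.0, -1.5,
              -1.5, 1.0), nrow = 2)

# Var(b0 + b1 * t) = G[1,1] + 2 t G[1,2] + t^2 G[2,2] is smallest at:
t_min <- -G[1, 2] / G[2, 2]

# Re-centre time at t_min: new intercept b0* = b0 + t_min * b1, slope unchanged
A <- matrix(c(1, t_min,
              0, 1), nrow = 2, byrow = TRUE)
G_star <- A %*% G %*% t(A)

t_min    # 1.5
G_star   # the intercept-slope covariance is now 0
```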

### Week 11

• Q- How do we decide what kind of model to use for the given data?
• A- Use a model that captures the characteristics of the process under study. For example, if you are looking at the recovery of a patient, we know that recovery reaches a plateau after a while. We need a model that rises at first and then flattens, so we can fit an asymptotic model in this case.
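A minimal R sketch of fitting such an asymptotic model with base R's self-starting `SSasymp()` model in `nls()`. The recovery scores are simulated. The curve starting near 20, the plateau near 80, the rate 0.2 and the noise SD are all assumptions made for illustration.

```r
set.seed(7)
day <- 0:30
# Hypothetical recovery scores that start near 20 and level off near 80
score <- 80 + (20 - 80) * exp(-0.2 * day) + rnorm(length(day), sd = 2)

# SSasymp: Asym + (R0 - Asym) * exp(-exp(lrc) * input); it is self-starting,
# so no starting values are needed
fit <- nls(score ~ SSasymp(day, Asym, R0, lrc))
coef(fit)                # Asym = plateau, R0 = value at day 0
exp(coef(fit)[["lrc"]])  # rate constant (true value here is 0.2)

plot(day, score)
lines(day, fitted(fit))
```

Each parameter has a direct meaning for the process (plateau, starting level, speed of recovery), which is the practical advantage over a polynomial that merely bends.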

## Statistics in the Media

### Week 1

• Study Indicates Changes In TV Viewing Habits

The study, titled “Multi Screen Media Consumption 2010,” revealed that 50 percent of the 300 million consumers polled were viewing Internet TV on a weekly basis. The study also indicated that individuals are now spending up to 35 percent of their leisure time watching television. The study also indicated that 93 percent of those surveyed are still watching “linear” broadcast television, and 70 percent report that they are streaming, downloading, or watching recorded broadcast television offerings on a weekly basis. source

### Week 2

Most People Spend More Than Half Their Day Consuming Media [1]

### Week 3

Dieting in pregnancy can lower baby's IQ.

Cutting back on vital nutrients and calories in the first half of pregnancy stunts the development of an unborn child's brain, says a new study. Lack of nutrients interfered with the way brain cells connected in the unborn babies and altered the expression of hundreds of genes, many involved in cell growth and development, the researchers reported.

Read more: Dieting in pregnancy can lower baby's IQ - The Times of India http://timesofindia.indiatimes.com/home/science/Dieting-in-pregnancy-can-lower-babys-IQ/articleshow/7312064.cms#ixzz1BnJJo84c [2]

Comment: I think that, apart from dieting in pregnancy, there are many other factors which may lower a baby's IQ, such as the mother's past health and smoking status.

### Week 4

• Aerobics help keep brain young

Aerobic activity may help keep the brain young, according to new research from the University of North Carolina at Chapel Hill School of Medicine.

Read more: Aerobics help keep brain young - The Times of India http://timesofindia.indiatimes.com/life-style/health-fitness/fitness/Aerobics-help-keep-brain-young/articleshow/4720083.cms#ixzz1ColJGk90

### Week 5

• Unhappy meal? Junk food lowers children's IQ

Researchers in Britain carried out a study of 4,000 kids and found that those who ate a diet of processed food, fat and sugar before the age of four had lower brain power at eight and a half years. Their IQ fell by 1.67 points for every increase on a chart which reflected how much processed fat they ate. And the damage could not be reversed, as diet at the ages of four and seven had no effect on IQ scores.

Read more: Unhappy meal? Junk food lowers children's IQ - The Times of India http://timesofindia.indiatimes.com/home/science/Unhappy-meal-Junk-food-lowers-childrens-IQ/articleshow/7457956.cms#ixzz1E2p0TOgz

### Week 6

• Obesity increases breast cancer risk

New research suggests that obese women who do not exercise are more likely to get one of the most aggressive forms of breast cancer.

Read more: Obesity increases breast cancer risk - The Times of India http://timesofindia.indiatimes.com/life-style/health-fitness/health/Obesity-increases-breast-cancer-risk/articleshow/7610882.cms#ixzz1FT0fNe6c

### Week 7

• More genes tied to heart risk found

In what could soon pave the way for identifying people at risk of a future heart attack, a team led by an Indian-origin scientist claims to have discovered over a dozen genes associated with coronary heart disease.

### Week 8

• Adults, Cell Phones and Texting (Pew)

Texting by adults has increased over the past nine months from 65% of adults sending and receiving texts in September 2009 to 72% texting in May 2010. Still, adults do not send nearly the same number of texts per day as teens ages 12-17, who send and receive, on average, five times more texts per day than adult texters. http://pewresearch.org/pubs/1716/adults-cell-phones-text-messages

### Week 9

• We give up dieting everyday at 3.23 pm

3.23pm is the time when dieters are most likely to give in to temptation and reach for a sweet treat, according to experts.

It was previously believed that elevenses or a crafty midnight snack spelled doom for the weight watcher. But experts now believe that mid-afternoon is the most dangerous time of the day. And they worked out the exact time we are most vulnerable. http://timesofindia.indiatimes.com/life-style/health-fitness/diet/We-give-up-dieting-everyday-at-323-pm/articleshow/7761544.cms

### Week 10

• Teen death rates outpace child rates

Child mortality rates declined so steeply in the second half of the 20th century that death rates among those aged 15 to 19 are now higher than for children, a new study finds.

### Week 11

• Many cancers avoidable with less drinking: study

Drinking too much alcohol might account for as much as 10 per cent of cancer cases in men and three per cent in women in Europe, according to results of a new study. Too much alcohol might also be responsible for almost 45 per cent of cancers in the mouth, larynx and throat in men and 25 per cent of those cancers in women, according to the results published Friday in a medical journal, BMJ. http://www.cbc.ca/news/health/story/2011/04/08/alcohol-cancer-deaths-europe.html

## Questions and Comments on Groupwork and Class Lectures

### Week 1

We all know that smoking can't increase life expectancy, but the example we discussed in class showed the opposite result. Based on this known fact, we started looking for a confounding factor. My question is: do we still look for confounding factor(s) if we don't know such facts?

### Week 2

I really like the new packages p3d and rgl in R. With the help of these packages, we can actually visualise the relationship between two or more variables. They are really amazing packages!

### Week 3

• Answer to question on Jessica's Blog

If we have 3 or more parameters, then we need a 3D or higher-dimensional ellipsoid. This higher-dimensional ellipsoid is larger because it is trying to be correct for 3 or more parameters at the same time.
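The growth with dimension can be quantified. A minimal R sketch, assuming the known-variance (large-sample) case: the radius of a 95% confidence region for p parameters, measured in standard-error units, is the square root of the 95% chi-square quantile with p degrees of freedom. It equals 1.96 for one parameter and increases with p. With an estimated error variance, `sqrt(p * qf(0.95, p, df))` would play the same role.

```r
p <- 1:5
# Radius of a joint 95% confidence region for p parameters (known-variance,
# large-sample case), in standard-error units
radius <- sqrt(qchisq(0.95, df = p))
round(radius, 2)
```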

• Some Important points from last lecture:

a) The marginal relationship is obtained by taking a weighted average of the conditional relationships. If the weights are the same, then the marginal relationship is equal to the conditional relationship. But if the weights are different, then the marginal and conditional relationships can be different, and this is when Simpson's paradox comes into the picture.

b) Confidence intervals estimate the effects of coffee and stress separately, whereas a confidence ellipse is used for estimating the effects of coffee and stress together.

c) The relationship between the confidence ellipse and the confidence intervals: if we shrink the confidence ellipse by a certain factor, then the shrunken ellipse has the property that its horizontal and vertical shadows give us the confidence intervals for stress and coffee, respectively.
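The "certain factor" in c) can be made concrete. A minimal R sketch, assuming the known-variance case and a hypothetical covariance matrix `V` for the (coffee, stress) estimates: shrinking the joint 95% ellipse by sqrt(qchisq(.95, 1)) / sqrt(qchisq(.95, 2)), about 0.80, gives an ellipse whose shadows on the two axes have exactly the half-widths of the ordinary 95% intervals. With an estimated error variance, the t and F quantiles would replace these chi-square quantiles.

```r
z1 <- sqrt(qchisq(0.95, df = 1))  # multiplier for a single 95% interval (1.96)
z2 <- sqrt(qchisq(0.95, df = 2))  # radius of the joint 95% ellipse
shrink <- z1 / z2                 # about 0.80

# Hypothetical covariance matrix of the (coffee, stress) estimates
V <- matrix(c(0.040, 0.018,
              0.018, 0.090), nrow = 2)
L <- t(chol(V))                   # V = L %*% t(L)

# Trace the shrunken ellipse: radius shrink * z2 (= z1) in the metric of V
theta <- seq(0, 2 * pi, length.out = 10000)
pts <- shrink * z2 * L %*% rbind(cos(theta), sin(theta))

shadows   <- apply(abs(pts), 1, max)  # half-widths of the two shadows
intervals <- z1 * sqrt(diag(V))       # usual 95% interval half-widths
rbind(shadows, intervals)
```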

### Week 4

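• Q- In slide #37 of Visualizing Multiple Regression, how did we get 1.265 for the indirect effect?
• A- It is the slope from regressing Stress on Coffee alone:
```
# Assumes the class coffee data (variables Stress and Coffee) are loaded
Fit3 <- lm(Stress ~ Coffee)   # regress Stress on Coffee only
summary(Fit3)
```
Relevant part of the output:
```
Call:
lm(formula = Stress ~ Coffee)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.26708    6.07776  -0.208    0.837
Coffee       1.26515    0.07077  17.877 6.62e-13 ***
```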

### Week 6

• Q- In the school example on the HLM slides, what should we do if we have missing observations in our data?
• Also, comment on Laura Warren's page.

### Week 7

• Q- In slide #89 of Hierarchical Models, why are we using the Poisson distribution to generate the Nj's?

### Week 8

• Q- What is Cook's distance?
• A- Cook's distance is a commonly used measure of the influence of a data point in least squares regression analysis. In a practical ordinary least squares analysis, Cook's distance can be used in several ways: to indicate data points that are particularly worth checking for validity, and to indicate regions of the design space where it would be good to obtain more data points.
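A minimal R sketch computing Cook's distance by hand on hypothetical data, using the standard formula D_i = e_i^2 / (p s^2) * h_i / (1 - h_i)^2, where e_i is the residual, h_i the leverage, p the number of coefficients and s^2 the residual mean square. It then checks the result against base R's `cooks.distance()`. One point is deliberately given high leverage and a large residual, so the formula shows how both ingredients combine.

```r
set.seed(8)
x <- c(rnorm(20), 6)          # hypothetical data; point 21 has high leverage
y <- 1 + 2 * x + rnorm(21)
y[21] <- y[21] - 10           # ... and is pulled off the line
fit <- lm(y ~ x)

e  <- residuals(fit)
h  <- hatvalues(fit)          # leverages
p  <- length(coef(fit))       # number of coefficients
s2 <- summary(fit)$sigma^2    # residual mean square

# Cook's distance: D_i = e_i^2 / (p s^2) * h_i / (1 - h_i)^2
D <- e^2 / (p * s2) * h / (1 - h)^2

all.equal(unname(D), unname(cooks.distance(fit)))
which.max(D)
```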

### Week 9

• I have 2 questions:
• Q1- In the HLM slides of the school example, we had 2 categories of school, and we treated category as a binary variable, i.e. 0 or 1. My question is: what should we do if we have 3 or more categories of school? Should we code it as 0, 1, 2 and so on? How should we interpret the estimates in this situation?
• Q2- How should we write our model when we have multiple responses instead of just a single response?
• Also comment on Luis's page.

### Week 10

• Point to remember

The difference between two simple effects is equal to the interaction.
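A minimal R check of this point in a 2 x 2 design with hypothetical cell means: the simple effect of A at B = 1 minus the simple effect of A at B = 0 equals the `A:B` interaction coefficient from a model fitted with 0/1 coding.

```r
# Hypothetical 2 x 2 cell means; expand.grid varies A fastest, so the rows
# are (A=0,B=0), (A=1,B=0), (A=0,B=1), (A=1,B=1)
d <- expand.grid(A = c(0, 1), B = c(0, 1))
d$y <- c(10, 14, 11, 19)

# Simple effects of A at each level of B
simple_B0 <- with(d, y[A == 1 & B == 0] - y[A == 0 & B == 0])   # 4
simple_B1 <- with(d, y[A == 1 & B == 1] - y[A == 0 & B == 1])   # 8

fit <- lm(y ~ A * B, data = d)
c(difference = simple_B1 - simple_B0, interaction = coef(fit)[["A:B"]])
```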

### Week 11

• I cannot open this link, 'Practical overview of GLMMs'. Can someone help me?