# MATH 6627 2010-11 Practicum in Statistical Consulting/Students/Luis Palma


## About me

Hi, my name is Luis Palma. I'm a master's student in Applied Statistics, and my undergraduate degree is in Industrial Engineering. I am very interested in gaining more skills and knowledge in the field of statistics, which is why I am doing a master's in Statistics. I am familiar with the SAS and R programs.

## Sample Exam Questions

### Week 1

Q. In two dimensions, the relationship between sigma and sigma inverse is very simple: the confidence ellipse has the shape of the data ellipse rotated by 90 degrees, and the projection line for the simple regression is the 90-degree rotation of the regression line. What happens if X1 and X2 are uncorrelated, and when can Simpson's paradox occur?

A. If X1 and X2 are uncorrelated, then the data ellipse is not tilted and neither is the confidence ellipse. Consequently, the downward projection of the center and the oblique projection are identical, so gamma bar one = beta bar one. In particular, Simpson's paradox can only occur if X1 and X2 are correlated.
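The claim in the answer above can be checked numerically. The following is a minimal sketch in base R on simulated data (the variable names and coefficients are made up for illustration): when x2 is constructed to be exactly uncorrelated with x1 in the sample, the slope of x1 in the simple regression and in the multiple regression should agree.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
# Make x2 exactly uncorrelated with x1 in the sample by
# taking residuals from a regression of random noise on x1.
x2 <- resid(lm(rnorm(n) ~ x1))
y  <- 1 + 2 * x1 - 3 * x2 + rnorm(n)

b_simple   <- unname(coef(lm(y ~ x1))["x1"])       # marginal slope of x1
b_multiple <- unname(coef(lm(y ~ x1 + x2))["x1"])  # conditional slope of x1

# The two slopes coincide up to rounding error.
c(simple = b_simple, multiple = b_multiple)
```

If x2 is instead generated so that it is correlated with x1, the two slopes generally differ, which is the situation in which Simpson's paradox becomes possible.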

### Week 2

Q. The figure shows that the two linear regressions do not fit well. Why do our linear regressions give a poor approximation of the data?

A. In some cases linear regression is not the right tool for analyzing our data, because all we get from this kind of model is a poor fit. In this case we have to fit the data with a bivariate model and trace the confidence ellipse and the confidence intervals. In the figure we can see that the bivariate model fits better than the linear regression model because our data lie within the confidence ellipse.

### Week 3

Q. What is the difference between data-space and beta-space and what is the importance of beta-space?

A. In 'data space' the axes are variables and the points are observations; in other words, this is the natural space in which to look at the data. But if we need to understand regression more deeply, which will be particularly useful when we get to hierarchical data, we want to see models in a space that is more natural for models: 'beta space'. In beta space the axes are coefficients (for example, beta-coffee and beta-stress) and the points are models (true models or fitted models) represented by their coefficients. We can also see confidence regions and confidence intervals in beta space because these are merely sets of models. The simple geometry of beta space elucidates some mysteries of data space.

### Week 4

Q. According to the figure, what is the relationship between

A. see the class notes of this week

### Week 5

Q. In the heart attack example, for a general linear combination of beta-coffee and beta-stress, how can we obtain the confidence interval if both graphs are Euclidean?

A. We can obtain the confidence interval by taking the shadow of the confidence ellipse onto the corresponding axis in beta space.
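The shadow idea can be sketched in base R. This is only an illustration on simulated data: the variables coffee, stress and heart are hypothetical stand-ins for the class example. The ellipse radius is also an assumption. It is set to the t quantile so that the shadows reproduce the ordinary 95% intervals. A larger radius, sqrt(2 * qf(0.95, 2, df)), would give the joint 95% region, whose shadows are Scheffe intervals.

```r
set.seed(2)
n <- 100
coffee <- rnorm(n)
stress <- 0.8 * coffee + rnorm(n, sd = 0.6)
heart  <- 1 - 0.5 * coffee + 2 * stress + rnorm(n)
fit <- lm(heart ~ coffee + stress)

b <- coef(fit)[c("coffee", "stress")]
V <- vcov(fit)[c("coffee", "stress"), c("coffee", "stress")]
r <- qt(0.975, df.residual(fit))  # radius that yields ordinary 95% intervals

# Boundary of the ellipse in beta space: b + r * A u with |u| = 1 and A A' = V
theta <- seq(0, 2 * pi, length.out = 2001)
A   <- t(chol(V))
ell <- t(b + r * A %*% rbind(cos(theta), sin(theta)))  # 2001 x 2 matrix

# Shadow on the beta-coffee axis versus the usual confidence interval
shadow_coffee <- range(ell[, 1])
ci_coffee     <- confint(fit)["coffee", ]

# Shadow along a general linear combination L'beta
L        <- c(1, 1)
shadow_L <- range(ell %*% L)
ci_L     <- sum(L * b) + c(-1, 1) * r * sqrt(drop(t(L) %*% V %*% L))
```

Comparing shadow_coffee with ci_coffee, and shadow_L with ci_L, shows that each interval is the projection of the same ellipse onto the relevant direction.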

### Week 6

Q. Name four possible approaches to comparing mathach and its relationship with ses across the two school sectors in the example.

A. - Fit a 'between school' model: take the average ses and average mathach from each school and then perform a regression on the resulting means.

- Use a hierarchical model with a contextual variable to see that we were really estimating two things to begin with.

- Pool the data from the schools within each sector and analyze with OLS. i.e. completely ignore the individual schools and regress mathach on ses and sector alone.

- Use a two-step approach: fit a regression to each school and then estimate the mean intercept and slope of the schools in each sector with a multivariate analysis using the fitted intercepts and slopes as data.
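Three of the approaches listed above can be sketched in base R. The data below are simulated, and the data frame dd with columns school, sector, ses and mathach is a hypothetical stand-in for the class data set. The hierarchical model is left out because it needs a mixed-model package, and the second step of the two-step approach is reduced here to simple sector means rather than a full multivariate analysis.

```r
set.seed(3)
n_school <- 40
n_per    <- 25
school   <- rep(seq_len(n_school), each = n_per)
sector   <- rep(rep(c("Public", "Catholic"), each = n_school / 2), each = n_per)
ses_mean <- rep(rnorm(n_school, sd = 0.5), each = n_per)
ses      <- ses_mean + rnorm(n_school * n_per)
mathach  <- 12 + 2 * (ses - ses_mean) + 5 * ses_mean +
  2 * (sector == "Catholic") + rnorm(n_school * n_per, sd = 3)
dd <- data.frame(school, sector, ses, mathach)

# Between-school model: regress school means on school means
means <- aggregate(cbind(ses, mathach) ~ school + sector, data = dd, FUN = mean)
fit_between <- lm(mathach ~ ses + sector, data = means)

# Pooled OLS: ignore the individual schools entirely
fit_pooled <- lm(mathach ~ ses + sector, data = dd)

# Two-step approach, step 1: one regression per school
per_school <- t(sapply(split(dd, dd$school),
                       function(d) coef(lm(mathach ~ ses, data = d))))

# Step 2 (simplified): mean intercept and slope within each sector
school_sector <- as.vector(tapply(dd$sector, dd$school, function(s) s[1]))
colMeans(per_school[school_sector == "Catholic", ])
colMeans(per_school[school_sector == "Public", ])
```

In this simulation the between-school slope and the pooled slope differ because the within-school and between-school effects of ses were deliberately set to different values.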

### Week 7

Q. How can we manipulate multilevel data?

A. In order to manipulate multilevel data we have to:

- Create a data set for each level.

- Include an index variable for each level – a variable that has a unique value for each row of its data set.

- Make sure all variable names are unique across all data sets, except for the index variables, which need to have the same name in a data set and in the data set immediately below it.
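The steps above can be sketched with two small made-up data sets, one per level, joined through a shared index variable with base R's merge. The names schools, students, school and id are illustrative only.

```r
# Level 2: one row per school, indexed by 'school'
schools <- data.frame(school = c(1, 2, 3),
                      sector = c("Public", "Catholic", "Public"))

# Level 1: one row per student, indexed by 'id', with 'school' as the
# link to the level above (same name in both data sets)
students <- data.frame(id     = 1:6,
                       school = c(1, 1, 2, 2, 3, 3),
                       ses    = c(-0.5, 0.2, 1.1, 0.3, -1.0, 0.4))

# Each index variable must be unique within its own data set
stopifnot(!anyDuplicated(schools$school), !anyDuplicated(students$id))

# Combine the levels into one long data set, one row per student
long <- merge(students, schools, by = "school")
long
```

Because all other variable names are unique across the two data sets, merge does not need to rename any column.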

### Week 8

Q. In Mixed Model for Longitudinal Data, What is the G matrix?

A.

- The G matrix is the variance covariance matrix for the random effect. Here

- It is usually a free positive definite matrix, or it may be a structured positive definite matrix.
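For context, the role of G can be written out in the usual mixed-model notation. This notation is an assumption here, since the formula from the class notes is not reproduced on this page: y is the response vector, X and Z are the design matrices for the fixed and random effects, u is the vector of random effects, and R is the within-subject error covariance.

```latex
y = X\beta + Zu + \varepsilon, \qquad
u \sim N(0, G), \qquad
\varepsilon \sim N(0, R), \qquad
\operatorname{Var}(y) = Z G Z^{\top} + R
```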

### Week 9

Q. What is the principle of marginality?

A.

1. Do not interpret p-values for terms that are included in higher-order terms.

2. Do not drop more than one variable at a time just because they are not significant.

3. Do not necessarily drop variables because they are not significant.

### Week 10

Q. What do we have to do with high-order interactions? What are stages 1 and 2?

A. With high-order interactions we have to visualize the model.

In stage 1 we visualize the response as a function of the predictors, and in stage 2 we determine which predictors are numeric and which are factors.

### Week 11

Q. Can an nlme model be specified as a hierarchical model, and what is the difference?

A. We can specify an nlme model like a hierarchical model, except that you can mix variable levels.

## Statistics in the Media, Paradoxes and Fallacies, Consulting Reflections

### Week 1

I am very surprised by the new methods for solving problems. An article in The American Statistician by George Casella [1] explains the Gibbs sampler method.

Comment: This method is on the rise thanks to new technology such as high-speed computers, without which most people could never use it. Statisticians are nevertheless still working out new ways to obtain some of its parameters, such as k.

### Week 2

On the web [ http://www.amstat.org/] there are many papers showing new applications of statistics, and the development of statistics is moving very fast!

### Week 3

Understanding the anesthetized brain

Since 1846, when a Boston dentist named William Morton gave the first public demonstration of general anesthesia using ether, scientists and doctors have tried to figure out what happens to the brain during general anesthesia. Though much has been learned since then, many aspects of general anesthesia remain a mystery. How do anesthetic drugs interfere with neurons and brain chemicals to produce the profound loss of consciousness and lack of pain typical of general anesthesia? And, how does general anesthesia differ from sleep or coma? Read more in [2]

Comment: Many people believe that general anesthesia is simply a deep sleep, but it is not true. In fact, part of the reason that researchers wrote this paper is to make doctors more aware of the differences and similarities between general anesthesia, sleep and coma.

### Week 4

Ecologists study complex systems, and often need to use non-standard methods of sampling and data analysis. The data might be collected over a long-time scale, involve little spatial replication, or be highly aggregated in space. There have been many fruitful collaborations between ecologists and statisticians, often leading to the development of new statistical methods. In this brief overview of the subject, I will focus on three areas that have been of particular interest in the management of animal populations. I will also discuss the use of statistical methods in other areas of ecology, the aim being to highlight interesting areas of development rather than a comprehensive review. Read more in [3]

Comment: It is remarkable that statistics is everywhere. In particular, the increasing popularity of computationally intensive Bayesian methods of analysis will allow ecologists to fit statistical models that give them a better understanding of the spatial and temporal processes operating in their study populations.

### Week 5

Here is a very interesting article that gives some tips for getting the most out of your gasoline in order to save money:
*How the fuel price rise could save you money*[4]

### Week 6

The Earth's Temperature [5]

Comment: In recent years the temperature of our planet has been increasing. Could statistics help us predict the consequences, such as the temperature in the coming years?

### Week 7

A uniquely American institution, the Electoral College consists of popularly elected representatives apportioned to each state according to its size. According to Berkeley statistician Elchanan Mossel, this system of electing the president is significantly more likely to result in an erroneous election outcome than the simple majority voting system. Read more in **Fair and accurate elections**, statistically speaking [6].

### Week 8

**Official statistics and statistical ethics: selected issues** [7]

This paper addresses some very important questions related to statistical ethics and official statistics.

### Week 9

Another useful paper to understand ethics in statistics is **Statistics and Ethics: Some Advice for Young Statisticians**

### Week 10

**Almost 4 in 5 Irish smokers want to kick the habit**

Okay, Sean Penn smokes - as seen here last month in Berlin - but it's not big and it's not clever, right? . . .

### Week 11

15 Facts About The Cigarette Industry That Will Blow Your Mind [10]

This article contains some interesting facts about the cigarette industry, such as:

- All of the cigarettes smoked per year in the U.S. weigh as much as 350,000 VW Beetles

- The Imperial Tobacco Company holds more than half of the total cigarette market in India, where it sells about 98 billion cigarettes per year

## Questions and Comments

### Week 1

Q. I have never come across the p3d package for R in books or on the internet, and I think it has very useful applications for analyzing and interpreting our results. Where can I find more information to make better use of this package? Furthermore, in the example we saw in class, a possible explanation for why the analysis concluded that smoking is healthy is that the variable X affects Z and Z affects the response Y. Is there an easy way to find the variable Z that really affects the response Y?

A1. See the notes of the class.

A2. By using a well-planned experimental design to support our poor model and get a good result about the confounding factor. We therefore have to review experimental design in order to find and use the methods we might need.

### Week 2

Comment. In http://www.math.yorku.ca/people/georges/Files/MATH6627/Slides/Week2/Visualizing_Regression-II-Multiple.pdf I found some mistakes:

SLIDE #87

> library(car)

> data( *Ginzburg* )

> dd <- Ginzburg

The dataset name should be Ginzberg, not Ginzburg.

SLIDE #90 and #91

> fit.add <- lm( Dep ~ Simp + Fatal, dd)

We have to write:

> fit.add <- lm( depression ~ simplicity + fatalism, dd)

and,

> fit.int <- lm( Dep ~ Simp + Fatal, dd)

We have to write:

> fit.int <- lm( depression ~ simplicity + fatalism, dd)

### Week 3

The three presentations were quite interesting to me, and there are some points I want to mention:

- p3d is very well suited to applied statistics because we can see how our model fits with 2 variables.

- Graphs can be rotated to give a better presentation of our model.

- The interaction between p3d and googleVis is very useful because we can share our information with many people and give them more detailed information about what we are doing.

- Lattice gives us the chance to make more powerful and flexible graphs than the traditional commands.

But everything has some disadvantages, and these functions are no exception. The disadvantages are:

- googleVis can't handle too many variables. I mention this disadvantage because, as statisticians, we know that modern statistics has to manage many variables in its models.

- p3d only graphs two variables, but what happens if I have three or more of them?

### Week 4

It is surprising how important it is to interpret ellipses in data analysis: we not only have to calculate them, but also have to interpret all the possible ellipses that we can get from our formulas. We can use ellipses in both data space and beta space, depending on the analysis we want to do.

### Week 5

In class we saw, in a general way, some methods for getting confidence intervals from the ellipses: the Scheffe and Bonferroni methods. But I think we also need to know how to trace these ellipses (the formula) and which functions plot them in a program (R or SAS), because they are very important for analyzing our data better.

### Week 6

As we saw in the last lecture, a trellis plot is very useful for displaying data so that we can analyze it and get a better idea of what we are doing. For more details on the application of trellis we can look at the high school example. This example has a complex dataset, so it is hard to graph, but trellis can give us a very detailed graph of the observations.

### Week 7

In the R code example there is a very useful function, so here is a brief description of it.

capply (in the spida package; used to create additional level 2 and level 1 variables): Applies a function to each cell of a ragged array, that is, to each (non-empty) group of values given by a unique combination of the levels of certain factors. In contrast with tapply, it returns within each cell a vector of the same length as the cell, and these are then ordered to match the positions of the cells in the input. capply tends to be slow when there are many cells and by is a factor. This may be due to the need to process all factor levels for each cell. Turning by into a numeric or character vector improves speed: e.g. capply( x, as.numeric(by), FUN).
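The contrast with tapply described above can be illustrated without the spida package. The sketch below uses base R's ave as a stand-in. It is not capply itself, but it likewise returns one value per observation, in the original order. The vectors ses and school are made-up data.

```r
ses    <- c(-0.5, 0.2, 1.1, 0.3, -1.0, 0.4)
school <- c("a", "a", "b", "b", "b", "c")

# tapply collapses to one value per school (length 3)
school_means <- tapply(ses, school, mean)

# ave returns the school mean repeated for every student (length 6),
# aligned with the input, which is what a level 2 variable needs
ses_mean <- ave(ses, school, FUN = mean)

# A level 1 variable: each student's deviation from the school mean
ses_dev <- ses - ses_mean

data.frame(school, ses, ses_mean, ses_dev)
```

The result can be added directly as new columns of the long data set, since it has one entry per row.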

### Week 8

At the beginning of the course we introduced two spaces: data space and beta space. In the example from the last class (male vs female) we can see that it is very useful to use both spaces to find the patterns that our data might have. If we only plot the fitted lines in data space, we can miss some important information that beta space could show us.

**Data space**

**Beta space**

### Week 9

I can't run this part of the R code because there is an error involving coefp, coefm and ses. Does anybody know how I can run this part?

    xyplot( coef + coefp + coefm ~ ses, dwgap, type = 'l')

    # add labels and grid
    xyplot( coef + coefp + coefm ~ ses, dwgap, type = 'l',
            lwd = 2,
            ylab = "Gap in Math Achievement (+/- 2 SEs)",
            xlab = "SES",
            xlim = c(-2,2),
            scale = list( x = list( at = seq(-2,2,.5))),
            sub = 'Minority minus Majority gap as a function of SES',
            panel = function(x, y, ...) {
                panel.superpose(x,y,...)
                panel.abline(v=seq(-1.5,1.5,.5), col = 'gray')
                panel.abline(h=seq( -6,0, 1), col = 'gray')
            })

- Use the following in your code:

names(dwgap)<-c("coef","coefp","coefm") - Gurpreet

### Week 10

For non-linear mixed effects models the useful function in R is "nlme". Here is a description and an example:

1. We have to download the nlme package.

Description: This generic function fits a nonlinear mixed-effects model in the formulation described in Lindstrom and Bates (1990) but allowing for nested random effects. The within-group errors are allowed to be correlated and/or have unequal variances.

How to use: nlme(model, data, fixed, random, groups, start, correlation, weights, subset, method, na.action, naPattern, control, verbose)

Example:

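    library(nlme)  # provides nlme(), pdDiag() and the Loblolly example fit

    # Asymptotic growth model with a random effect on Asym only
    fm1 <- nlme(height ~ SSasymp(age, Asym, R0, lrc),
                data = Loblolly,
                fixed = Asym + R0 + lrc ~ 1,
                random = Asym ~ 1,
                start = c(Asym = 103, R0 = -8.5, lrc = -3.3))
    summary(fm1)

    # Refit with uncorrelated random effects on Asym and lrc
    fm2 <- update(fm1, random = pdDiag(Asym + lrc ~ 1))
    summary(fm2)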

### Week 11

This is the last week to update the wiki, and there is still so much to learn about the functions of R and about how to apply all the methods we learned during this course in our project.