# MATH 6627 2010-11 Practicum in Statistical Consulting/Students/JaneDoe


Hi. My name is Jane Doe. Welcome to my wiki page for the Statistical Consulting Practicum. A few facts about me: I have a reasonable amount of experience with applied statistics in the social sciences. I have been working as a sociologist for the past decade or so and want to become skilled at generating and interpreting my own data. I have experience with SAS, SPSS, Mathematica and Minitab, although I am kinda rusty at a lot of it. I have recent experience with designing and analyzing surveys and with regression. Jane Doe is a fictional student and any resemblance to a real person is purely coincidental!

## Sample Exam Questions

Note that about half of the questions should be suitable for an in-class exam and the other half for a take-home exam.

### Week 1

• (in-class exam) We will call a "square root" of a square matrix M any square matrix A such that M = AA'. Show that a matrix has a square root if and only if it is a variance matrix.
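As a numerical illustration of the claim (not a proof), the R sketch below takes the sample variance matrix of some simulated data, uses its Cholesky factor as one possible square root, and then checks the converse direction on an arbitrary matrix. The simulated data, the seed, and the names `V`, `A`, `B` and `M` are assumptions made for the example.

```r
set.seed(42)                       # arbitrary seed, for reproducibility only

# Direction 1: a variance matrix has a square root.
X <- matrix(rnorm(60), 20, 3)      # made-up data: 20 observations, 3 variables
V <- var(X)                        # 3 x 3 sample variance matrix
A <- t(chol(V))                    # chol() gives upper-triangular R with R'R = V, so A = R'
err <- max(abs(A %*% t(A) - V))    # should be zero up to rounding error

# Direction 2: any matrix of the form BB' is symmetric and non-negative definite.
B <- matrix(rnorm(9), 3, 3)        # arbitrary square matrix
M <- tcrossprod(B)                 # same as B %*% t(B)
min_eig <- min(eigen(M, symmetric = TRUE)$values)

err                                # size of the reconstruction error
isSymmetric(M)                     # symmetry check
min_eig                            # smallest eigenvalue, should not be negative
```

Note that `chol()` needs a positive-definite matrix; for a singular variance matrix a square root can be built from the eigendecomposition instead.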

### Week 2

• (take-home exam) Generate 100 observations for three variables Y, X and Z so that in the separate simple regressions of Y on each of X and Z neither regression coefficient is significant (at the 5% level) but a test of the hypothesis that both coefficients are 0 in a multiple regression of Y on both X and Z is rejected at the 5% level. Explain your strategy in generating the data. How should the data be generated to produce the required result? Show a data ellipse for X and Z and appropriate confidence ellipses for their two regression coefficients. Explain the relationship between the ellipses and the phenomenon exhibited in this problem. What does this example illustrate about the appropriateness of forward stepwise regression to identify a suitable model to predict Y using both X and Z?
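One possible strategy for the data-generation part, offered as a sketch rather than as the intended answer: make X and Z nearly collinear and let Y depend on their difference, so that each predictor alone explains almost nothing while the pair explains a lot. The coefficients 0.1 and 0.05, the seed, and the QR orthogonalization step are assumptions chosen for illustration; the QR step makes the three building blocks exactly uncorrelated in the sample, so the outcome should not depend on the particular random draw.

```r
set.seed(1)                                     # arbitrary seed
n <- 100

# Three centred, mutually orthogonal columns with sample variance 1
raw <- scale(matrix(rnorm(n * 3), n, 3), center = TRUE, scale = FALSE)
Q <- qr.Q(qr(raw)) * sqrt(n - 1)

X <- Q[, 1]
e <- Q[, 2]                                     # the small part of Z not shared with X
u <- Q[, 3]                                     # noise in Y

Z <- X + 0.1 * e                                # Z is almost collinear with X
Y <- X - Z + 0.05 * u                           # Y depends on the difference X - Z

# p-values of the slopes in the two simple regressions
p_x <- summary(lm(Y ~ X))$coefficients["X", "Pr(>|t|)"]
p_z <- summary(lm(Y ~ Z))$coefficients["Z", "Pr(>|t|)"]

# overall F test that both coefficients are 0 in the multiple regression
fs <- summary(lm(Y ~ X + Z))$fstatistic
p_joint <- unname(pf(fs["value"], fs["numdf"], fs["dendf"], lower.tail = FALSE))

c(p_x = p_x, p_z = p_z, p_joint = p_joint)
```

With this construction, forward stepwise selection that starts from the simple regressions would find no predictor worth entering, even though the two predictors together are highly informative.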

## Statistics in the Media

### Week 1

• The numbers can't stand alone: how paradigms matter. York's Dr. Pat Armstrong on some common quantitative issues: Media: Doubtful Data.pdf
• Why are women in Canada getting paid less than men?

In the Globe and Mail newspaper on Thursday, Margaret Wente raised some questions concerning various measures of the “gender gap” in Canada. You’ll find the full article here. Ms. Wente seems to accuse various labour reports and feminist activists of promoting their agenda with “a heaping cup of statistical abuse.” It is usually fascinating to watch people in extreme ideological and political positions attempt to harness statistics to make their point. In this case, it is not really a case of tampering with the numbers themselves – both sides are using nearly the exact same numbers to make two very different arguments.

One way Ms. Wente believes the statistics are being abused is in the reporting that the gender wage gap in Canada is currently 70.5 cents, meaning that for every dollar a male worker earns, a female worker earns 70.5 cents.

Ms. Wente goes on to demonstrate how she thinks this statement is statistical abuse: “Take the gender wage gap. To arrive at 70.5 cents, the report compares full-time annual wages between men and women. What it doesn’t mention is that men work more hours in a year than women do. Once you adjust for that, the gap narrows to 84 cents. And when you adjust for work experience and women’s preference for jobs in the public sector and social services, the gap shrinks to 93 cents.”

Juxtapose this with a recent article by leading feminist researchers on the economy. Dr. Sylvia Fuller of UBC and Dr. Leah Vosko, the Canada Research Chair in Feminist Political Economy, have recently published an article that arrives at very similar statistics. The full article is available here, for a fee or with a subscription.

The difference is the lens used to interpret these statistics. Ms. Wente believes that the narrowing of the wage gap as you control for different variables demonstrates that the gap is not really as big as it might initially appear. Drs. Fuller and Vosko instead see the fact that accounting for job characteristics decreases the gap as an indication that women are experiencing discrimination and less choice in those areas as well.

For example, there is currently no dispute that the gender wage gap narrows when you control for the fact that more women work in public sector and social service jobs. But does this mean that women really get paid nearly the same as men, but simply prefer to work in sectors and industries that pay less to everyone overall? Or does it mean that not only do women get paid less than men overall, they are also confined to industries and sectors that pay less?

### Week 2

I found an amusing cartoon about correlation and causality. Let your mouse hover over the cartoon and an interesting message appears. It seems that the cartoonist has thought fairly deeply about correlation and causality.

## Questions and Comments on Groupwork and Class Lectures

### Week 1

• Q: I have not been able to figure out the code to create the graph that we used in class to look at the pay equity vs. employment equity issue. I am working on the assignment and want to create one for the question on police discrimination. I have found the functions in car that create data ellipses and confidence ellipses, and I have the R code from your visualizing regression notebook, but I just can't get it to work with the categorical variables. Is this possible? Issues with categorical variables seem to come up for me very frequently!

A: Categorical variables can be tricky in R. We'll probably revisit this during the course. Here's a straightforward way. I'll use the Prestige data frame in library(car) as an example. Its variable type takes three distinct non-missing values.

```
library(car)
data(Prestige)
dd = Prestige

cols = c('red','green','blue')   # the colours you'd like to use for the three types
```

The line farther below, where we plot income vs education, shows the power and the danger of factors in R. The variable 'type' is a factor. When you print it with

```
dd$type
```

it looks like a character variable. But it's 'really' integers that index into a vector of 'levels'. To see the 'real' dd$type, you use:

```
unclass(dd$type)
```

The following line uses the two personalities of 'type'. When 'type' is used to index 'cols', it acts like an integer. The first character of 'type' is used as a plotting character (pch):

```
plot( income ~ education , dd, pch = as.character(type), col = cols[type])
```
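The two roles of a factor described above can be checked on a small made-up vector before trying them on real data. In this sketch the vector `type` and its values are hypothetical stand-ins for the factor in the data frame; only base R is used.

```r
# A small made-up factor, including a missing value
type <- factor(c("prof", "bc", "wc", "bc", NA, "prof"))
cols <- c("red", "green", "blue")     # one colour per level, in level order

levels(type)          # the lookup table of labels, sorted alphabetically
as.integer(type)      # the underlying integer codes that index into levels(type)
as.character(type)    # the labels themselves, as used for pch

# Indexing with a factor uses the integer codes, not the labels
by_factor <- cols[type]
by_factor

# Indexing with the labels fails here because cols has no names
by_label <- cols[as.character(type)]
by_label
```

A named colour vector such as `c(bc = "red", prof = "green", wc = "blue")` indexed by `as.character(type)` is a more explicit alternative when the level order is not obvious.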

There are more 'traditional' ways of doing the following, but here's an 'Rish' approach illustrating the use of the functions 'split' and 'lapply'. It also uses the 'dell' function in 'fun.R', but this could be done with John's functions in car:

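```
# fun.R provides dell(), used here to compute a data ellipse that lines() can draw
source( 'http://www.math.yorku.ca/~georges/R/fun.R')

# one ellipse per level of type, added to the existing scatterplot
lapply( split( dd, dd$type), function(x) with( x, lines(dell( education, income), col = cols[type], lwd=2)))
```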
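If sourcing a remote script is not an option, the same split-and-draw pattern can be sketched in base R. The helper `data_ellipse()` below is hypothetical (it is not the 'dell' function from fun.R and may differ from it in scaling), and the built-in `iris` data stand in for the data frame used above so that the block runs without extra packages.

```r
# Hypothetical helper: points on the ellipse (v - m)' S^{-1} (v - m) = radius^2,
# where m and S are the sample mean and variance matrix of (x, y)
data_ellipse <- function(x, y, radius = 1, n = 100) {
  ok <- complete.cases(x, y)
  xy <- cbind(x[ok], y[ok])
  m <- colMeans(xy)
  A <- t(chol(var(xy)))                        # var(xy) = A A'
  theta <- seq(0, 2 * pi, length.out = n)
  circle <- rbind(cos(theta), sin(theta))      # unit circle, 2 x n
  t(m + radius * A %*% circle)                 # n x 2 matrix of ellipse points
}

dd2 <- iris                                    # stand-in data with a 3-level factor
cols <- c("red", "green", "blue")

plot(Sepal.Length ~ Sepal.Width, dd2, col = cols[dd2$Species])

# one ellipse per level of the factor, in the order of its levels
ells <- lapply(split(dd2, dd2$Species),
               function(d) data_ellipse(d$Sepal.Width, d$Sepal.Length))
for (i in seq_along(ells)) lines(ells[[i]], col = cols[i], lwd = 2)
```

The same calls should carry over to a data frame with columns education, income and type by swapping in those names.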

Finally, let people know what they're looking at:

```
# locator(1) waits for a mouse click on the plot and puts the legend there
legend( locator(1), levels( dd$type) , col = cols, pch = substring( levels(dd$type), 1,1))
```

which produces a scatterplot of income against education, with one data ellipse and one legend entry per type.

### Week 2

• Q2: What is the difference between running a Generalized Mixed Model Regression and a Repeated Measures ANOVA?