# MATH 6627 2010-11 Practicum in Statistical Consulting/Students/Carrie Smith


This year I began working towards my PhD in Psychology in the Quantitative Methods Area. I completed my Master's degree at York studying depth perception (within the Centre for Vision Research) and my undergraduate degree at the University of Toronto in Engineering Science, Aerospace option. As you can see, I have quite a varied (and some might say strange) educational background. I have been attending SCS meetings since September and would like to consult in the coming academic year.

I have experience working with R, Matlab, SPSS and SAS.

I am also a proud member of Team Gray.

## Sample Exam Questions

### Week 1

The cigarette data discussed in class indicated a positive relationship between smoking and life expectancy. Ample evidence exists that smoking causes life expectancy to decrease, so there is likely more going on. Beyond causality, what are some important alternative possibilities that may complicate interpretation of the results of regression from observational data? (1) There may exist mediating variables; (2) there may exist confounding factors; (3) the sample employed might not be a good representation of the population, either due to the choice of participants or because the sample is small; (4) a linear regression may not be capturing the true pattern; (5) the dependent variable may in fact be causing the independent variable; (6) selection bias.

### Week 2

How, by sketching a few lines on this graph, may we satisfy ourselves that the correlation between education and prestige is significant? Explain why. (P.S. Thanks for lending me your image, Andy!)

### Week 3

A researcher studying a schizophrenia medication in a clinical population discovers that the dosage is positively correlated with strength of symptoms. She is about to begin a recall because the drug appears to be making patients worse, when it occurs to her that perhaps there is another variable in play which restores the good name of her drug. What might that variable be? How could this variable have this effect (sketch!) and would you describe it as a 'confounding' or 'mediating' variable?
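One way to explore this question is with a small simulation. The sketch below assumes, purely for illustration, that the lurking variable is baseline illness severity: sicker patients are prescribed higher doses and also show stronger symptoms. All variable names and coefficients are made up, and the simulated drug is built to help (a true dose effect of -0.5).

```r
set.seed(1)
n <- 500

# Hypothetical lurking variable: severity of illness before treatment
severity <- rnorm(n)

# Sicker patients are prescribed higher doses (assumed mechanism)
dose <- 2 * severity + rnorm(n)

# Symptoms rise with severity, but the drug helps: true dose effect is -0.5
symptoms <- 3 * severity - 0.5 * dose + rnorm(n)

# Ignoring severity, dose appears to make symptoms worse
marginal <- coef(lm(symptoms ~ dose))["dose"]

# Adjusting for severity recovers the beneficial effect
conditional <- coef(lm(symptoms ~ dose + severity))["dose"]

print(c(marginal = unname(marginal), conditional = unname(conditional)))
```

Under these assumptions the slope for dose is positive when severity is ignored and negative once severity is included, which is the pattern a sketch with one line per severity level would show.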

## Statistics in the Media

### Week 1

Researchers from Newcastle University conducted a study in which signs with pictures of staring eyes were posted in a busy little cafeteria to see whether the images would encourage patrons to act responsibly and tidy up after themselves. I came across this study via an article titled "Fake Watchful Eyes Discourage Naughty Behavior" on Wired Magazine's website.

The Wired article reports that, "The number of people who paid attention to the sign, and cleaned up after their meal, doubled when confronted with a pair of gazing peepers." Doubled, huh? Would that be 1 in 100 people to 2? 40% to nearly 80%? So I went to the source, an article by Ernest-Jones, Nettle & Bateson (2010). As it turns out, the sample size was reasonable, and I think the effect size in terms of proportions is quite respectable (see figure below), but Wired did make a mistake in reading the graph. The proportion of individuals who left litter decreased from .4 to .2, thus the proportion of individuals who cleaned up after themselves increased from .6 to .8. In other words, under the watchful eye of the creepy posters, littering was halved, but cafeteria patrons increased their pro-social behaviour by a factor of only about 1.3 (.8/.6), not 2.
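The arithmetic can be checked in a few lines of R. This is only a sketch using the rounded proportions quoted above (.4 and .2), not the exact values from the paper; the odds ratio is added as one more way of expressing the same change.

```r
# Proportions of patrons who littered, as read from the figure (rounded)
litter_before <- 0.4
litter_after  <- 0.2

# Complementary proportions who cleaned up after themselves
tidy_before <- 1 - litter_before
tidy_after  <- 1 - litter_after

# Relative change in each behaviour
litter_ratio <- litter_after / litter_before  # littering is halved
tidy_ratio   <- tidy_after / tidy_before      # tidying rises by about a third

# Odds ratio for tidying up, eyes poster versus control
odds <- function(p) p / (1 - p)
tidy_odds_ratio <- odds(tidy_after) / odds(tidy_before)

print(round(c(litter_ratio = litter_ratio,
              tidy_ratio = tidy_ratio,
              tidy_odds_ratio = tidy_odds_ratio), 2))
```

So "doubled" only holds if one reads the halving of littering backwards. The proportion who tidied went up by a factor of about 1.33.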

Though this error is hardly earth-shattering, it is an example of how easily results can be misinterpreted and misrepresented in public forums.

### Week 2

A blog post written by a well-respected author on Business Insider reports on a study of Twitter usage. The business news blogger concluded that Twitter is not a viable marketing tool, because half of Twitter users never read anyone else's Tweets ("THE TRUTH ABOUT TWITTER: Half Of Twitter Users Never Listen To A Word Anyone Else Says").

However, according to commentary on other sites (such as "Lies Damned Lies and Statistics"), it would seem that the researchers who conducted the study and the Business Insider blogger who reported on it failed to account for a very important detail: it is estimated that 40%-60% of Twitter accounts are abandoned. So, of course many people never read anybody's Tweets, because they don't use Twitter!!

### Week 3

[Zyzmor’s Revenge?] This short article summarizes some cute findings regarding relationships between the alphabetical position of surnames and various life outcomes. One study showed that researchers with alphabetically early surnames are more likely to gain tenure at a top university, become a fellow in the Econometric Society (it was a study by economists), and even win the Nobel Prize. This effect is explained by the fact that in many areas authors are listed alphabetically by last name, thus authors with early surnames are likely to have more citations, since many people (incorrectly) cite papers as Smith et al., 2000.

On the flip-side, people with late surnames also had to wait longer in lines at school, and as a result are slower (and presumably more thoughtful) at making buying decisions because they weren't rushed like the kids with early last names. I think causality there is pretty thin, and I wonder just how tiny this effect size is!!

### Week 4

CBC reports on a study finding that video game play is associated with anxiety and depression in youth ("Kids' excess video gaming tied to anxiety"). Researchers used latent growth mixture modeling, and concluded that pathological video game play causes depression and anxiety. This counters the conventional assumption that youth who are depressed and/or anxious 'retreat' into game play to avoid their feelings. I look forward to learning more about this methodology in the future! The study was published in the January 2011 issue of Pediatrics ("Pathological Video Game Use Among Youths: A Two-Year Longitudinal Study").

## Questions and Comments on Groupwork and Class Lectures

### Week 1

This is an oft-cited problem in Psychology research, but still an interesting one. A large portion of the studies coming out of university Psychology departments exclusively use undergraduate students as participants. Often, these are students who are required to participate to receive some credit for their courses. How much of what we think we 'know' about psychology might not actually apply to the general population??

Comment/Answer: - A whole lot!! Let's also consider the fact that these psychology students are primarily North American and Caucasian! That makes the results of the vast majority of psychological studies even less generalizable! - Constance

### Week 2

As a few other members of the course have already commented, it is becoming increasingly clear from the course that consulting is quite a bit more involved than it might at first seem. Taking the time to step away from the problem and assess the basic elements of the analysis at hand couldn't be more important. I know I have rushed head first into experiments and/or analysis before really sitting down to get a clear idea of my objectives, and wasted a lot of time in the process!!

### Week 3

I would like to do something in R similar to what Team Rubin showed with the lattice package, but also want the fit summaries for the sub-plots.

This code splits the continuous X2 variable into 3 'shingles' and plots: `X2group <- equal.count(data$X2, number=3, overlap=0)` followed by `xyplot(Y ~ X1 | X2group, data=data)`

But this doesn't work: `fit = lm(Y ~ X1 | X2group, data=data)` followed by `summary(fit)`

I would also love to have the data presented in one Y vs X1 plot, with data corresponding to the 3 levels (the categorization of a continuous var) of X2 in different colours.

Any help?

Hi Carrie,

You can try `fit <- lm(Y ~ X1 | X2group, data=data)` instead of `fit = lm(Y ~ X1 | X2group, data=data)` and then try the `summary(fit)` statement again. Hopefully that works!

--Lawarren 23:56, 25 January 2011 (EST)

Thanks for the suggestion, but unfortunately it didn't fix the problem. It doesn't throw an error, but gives 'NA' as the Estimate, Std. Error, t-value and p-value! I had to write some ugly unwieldy code to get what I wanted, and I'm sure there must be a better way...
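A possible tidier route, offered as a sketch rather than a definitive answer: as far as I can tell, `|` is a conditioning operator only in lattice formulas. In an `lm` formula it is evaluated as the ordinary logical OR, which yields a constant column that is collinear with the intercept, hence the `NA` estimates. The example below uses simulated data as a stand-in for the real data frame, and swaps `equal.count` for base R's `cut` at tercile break points so that the grouping is an ordinary factor. The names `dat`, `fits`, `fit_all` and `p` are illustrative.

```r
library(lattice)

set.seed(42)
# Simulated stand-in for the original data frame (an assumption)
dat <- data.frame(X1 = rnorm(150), X2 = runif(150))
dat$Y <- 1 + 2 * dat$X1 * dat$X2 + rnorm(150, sd = 0.5)

# Categorize continuous X2 into three equal-count, non-overlapping groups
dat$X2group <- cut(dat$X2,
                   breaks = quantile(dat$X2, probs = c(0, 1/3, 2/3, 1)),
                   include.lowest = TRUE,
                   labels = c("low", "mid", "high"))

# Option 1: a separate regression of Y on X1 within each group
fits <- lapply(split(dat, dat$X2group), function(d) lm(Y ~ X1, data = d))
fit_summaries <- lapply(fits, summary)

# Option 2: one model with group-specific intercepts and slopes
fit_all <- lm(Y ~ X1 * X2group, data = dat)
print(coef(fit_all))

# A single Y vs X1 plot, with points and fitted lines coloured by group
p <- xyplot(Y ~ X1, groups = X2group, data = dat,
            type = c("p", "r"), auto.key = TRUE)
# print(p) draws the plot
```

`fit_summaries$low`, `fit_summaries$mid` and `fit_summaries$high` then hold the per-panel fit summaries. Option 2 gives the same fitted lines in a single model, which makes it easier to test whether the slopes differ across groups.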

### Week 4

In the slides for Visualizing Multiple Regression, in the section on the data ellipse for predictors (page 30), the general predictor has a notation I'm not familiar with and missed in lecture. What does the plus sign in a circle mean? Thanks!

Andy: As a logical operation, the circled plus sign is called "exclusive or". Wiki link is here. But in the notes it is specially defined in the summary on page 88.