# MATH 6627 2010-11 Practicum in Statistical Consulting/Students/Laura Warren

I am a master's student in applied statistics. I got my undergrad degree from the University of Guelph with a major in biology and a minor in statistics. I also have an M.Sc. degree from Guelph in epidemiology.

## Sample Exam Questions

### Week 1

The Warm-up...

Q1: What are the three kinds of lies?

A1: Lies, damned lies and statistics

The Real McCoy...

Q2: Given that the number of teen crimes was on the rise, which key piece of information was missing from the Harris government's misleading pitch suggesting the need for the development of more teen "boot camps"? If you do not remember the specific piece of information you may improvise.

A2: The teen crime rate was actually decreasing. While the number of crimes was increasing, the number of teens was increasing at a faster rate.
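The difference between a count and a rate can be checked with a few lines of arithmetic. The sketch below uses made-up figures (they are not the actual Ontario numbers) in which the number of crimes goes up while the rate per 1,000 teens goes down, because the teen population grows faster than the crime count.

```python
# Hypothetical figures for illustration only; NOT the real Ontario data.
def crime_rate(crimes, population, per=1000):
    """Return the number of crimes per `per` people."""
    return crimes / population * per

year1 = {"crimes": 1000, "teens": 100_000}
year2 = {"crimes": 1100, "teens": 125_000}

rate1 = crime_rate(year1["crimes"], year1["teens"])
rate2 = crime_rate(year2["crimes"], year2["teens"])

# Relative growth of the numerator (crimes) and the denominator (teens).
count_change = year2["crimes"] / year1["crimes"] - 1
pop_change = year2["teens"] / year1["teens"] - 1

print(f"Crimes grew by {count_change:.0%}, teen population by {pop_change:.0%}")
print(f"Rate per 1,000 teens: {rate1:.1f} -> {rate2:.1f}")
```

With these invented numbers the count rises by 10% while the population rises by 25%, so the rate falls. Reporting only the count hides that.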

### Week 2

Q: Give one reason why plots are used in statistics. Explain your answer.

A1: Model diagnostics (e.g. regression plots to assess how well the chosen model fits the data, outlier detection, etc.)

A2: To get a general overview of the data. It is often easier to visualize patterns in your data set using graphical analyses than it is to detect them using numerical analyses.

I am sure there are many other appropriate answers. If somebody has another suitable answer, please feel free to post it here.

### Week 3

Q3:

The graph depicted is in beta space. What are the fitted slopes for stress and coffee respectively? Indicate with an asterisk which slopes (if any) are significantly different from zero.

A3: Stress slope = 1.1993*; Coffee slope = -0.4091

### Week 4

Q: What are two ways one can control for confounding factors when dealing with observational data?

A: Any two of the following:

1) Matching - make comparisons between observations with similar confounding factors
2) Statistical control - include confounding factors in your model and adjust for them statistically
3) Structural method - build a structural causal model
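To see why comparing like with like matters, here is a minimal sketch with invented counts (not taken from the course). The confounder is a hypothetical "baseline risk" stratum. The crude comparison suggests the exposure doubles the risk, but within each stratum the exposed and unexposed groups have identical risk.

```python
# Hypothetical counts: (events, group size) for each exposure group,
# split by a confounder ("low" vs "high" baseline risk).
strata = {
    "low":  {"exposed": (10, 100),  "unexposed": (40, 400)},
    "high": {"exposed": (160, 400), "unexposed": (40, 100)},
}

def risk(events, n):
    """Proportion of the group that experienced the event."""
    return events / n

def crude_risk(group):
    """Risk for a group when the strata are pooled (confounder ignored)."""
    events = sum(s[group][0] for s in strata.values())
    n = sum(s[group][1] for s in strata.values())
    return risk(events, n)

# Crude risk ratio: ignores the confounder.
crude_rr = crude_risk("exposed") / crude_risk("unexposed")

# Stratum-specific risk ratios: compare only observations that share
# the same level of the confounder.
stratum_rr = {
    name: risk(*s["exposed"]) / risk(*s["unexposed"])
    for name, s in strata.items()
}

print(f"Crude risk ratio: {crude_rr:.3f}")
for name, rr in stratum_rr.items():
    print(f"Risk ratio within '{name}' stratum: {rr:.3f}")
```

The crude ratio differs from 1 only because exposed subjects are concentrated in the high-risk stratum. Stratifying, like matching or adding the confounder to a model, removes that imbalance from the comparison.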

### Week 5

Q: What happens to the data ellipse and standard error when you have a type II outlier?

A: The data ellipse shrinks and the standard error will be smaller

### Week 6

Q: When is it best to use BLUEs (Best Linear Unbiased Estimators) vs. BLUPs (Best Linear Unbiased Predictors)?

A: It is best to use BLUEs when you are resampling from the same subsection of the population as a whole (for example, when you are resampling from the same school in the math achievement example used in class). BLUPs are best on average for resampling from the entire population of schools and students.

### Week 7

Q: What is the defining characteristic of longitudinal data analysis?

A: Repeated measures (i.e. data collected on the same subject over a period of time)

### Week 8

Q: Is the between- or within-subject variable time-invariant?

A: Between-subject variable

### Week 9

Q: Strong positive autocorrelation can be a symptom of ____________?

A: Lack of fit
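One way to see the connection is to fit a straight line to data that actually follow a curve. The sketch below uses a made-up, noise-free quadratic trend, so it is an illustration and not a course data set. The residuals come in long runs of the same sign, which shows up as a strongly positive lag-1 autocorrelation.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for a straight line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def lag1_autocorr(values):
    """Sample lag-1 autocorrelation of a sequence."""
    m = sum(values) / len(values)
    num = sum((values[i] - m) * (values[i - 1] - m) for i in range(1, len(values)))
    den = sum((v - m) ** 2 for v in values)
    return num / den

# Hypothetical data: the true trend is curved, but we fit a line anyway.
t = list(range(20))
y = [0.5 * ti ** 2 for ti in t]

intercept, slope = fit_line(t, y)
residuals = [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]
rho = lag1_autocorr(residuals)

print(f"Lag-1 autocorrelation of residuals: {rho:.2f}")
```

Adding the missing quadratic term would remove the pattern in the residuals, which is why a large positive autocorrelation is worth checking against the model's functional form before modelling it as a correlation structure.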

### Week 10

Q: What is the difference between OLS and GLS?

A: OLS is an estimate based on pooled data, while GLS provides an estimate that is closer to the unpooled data. With balanced data the estimates will be the same, but for unbalanced data the estimates from OLS and GLS may be very different.

### Week 11

Q: What is the half recovery time (or half life)?

A: The half recovery time is the time it takes for a person (or any living thing, for that matter) undergoing recovery to close half of the remaining gap between their current level and their eventual long-run level.
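As a rough illustration, assume recovery follows an exponential approach to a long-run level, y(t) = asymptote - (asymptote - start) * exp(-rate * t). This model and all the numbers below are assumptions made for the sketch, not something stated in the answer above. Under that model the half recovery time is ln(2) / rate, regardless of where the person starts.

```python
import math

def recovery(t, asymptote, start, rate):
    """Assumed exponential recovery curve approaching `asymptote`."""
    return asymptote - (asymptote - start) * math.exp(-rate * t)

def half_recovery_time(rate):
    """Time needed to close half of the remaining gap to the asymptote."""
    return math.log(2) / rate

# Hypothetical values chosen only for illustration.
asymptote, start, rate = 100.0, 60.0, 0.1

t_half = half_recovery_time(rate)
gap_at_start = asymptote - start
gap_after_one = asymptote - recovery(t_half, asymptote, start, rate)
gap_after_two = asymptote - recovery(2 * t_half, asymptote, start, rate)

print(f"Half recovery time: {t_half:.2f}")
print(f"Gap at start: {gap_at_start:.1f}")
print(f"Gap after one half recovery time: {gap_after_one:.1f}")
print(f"Gap after two half recovery times: {gap_after_two:.1f}")
```

Each additional half recovery time halves whatever gap remains, in the same way that a radioactive half-life halves the remaining quantity.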

## Statistics in the Media

### Week 2

I see I am not the only person who found this article interesting!

I find this article, titled "Why a cloned cat isn't exactly like the original: A new statistical law for cell differentiation", interesting because it shows just how all-encompassing the field of statistics is. Statistics, in this instance, is used to predict a biological phenomenon. A group of researchers at the Institute of Physical Chemistry of the Polish Academy of Sciences discovered a statistical law that allows one to predict when cell differentiation will occur using a simple method of geometric construction. Previously, cell differentiation was thought to be a random process; nobody could explain why two genetically identical entities kept under the same conditions would diverge (an effect known as population bimodality). Population divergence is an important aspect of a species' fitness. A genetically diverse population is more likely to withstand an unfavourable shift in its environment than its genetically uniform counterpart. This survival mechanism is especially important in this day and age, as the number of antibiotic-resistant bacteria is on the rise. Being able to predict the antibiotic concentration at which population bimodality will occur may help to reduce the prevalence of antibiotic-resistant bacteria.

### Week 3

Who knew?! The media has always fascinated me. It's interesting how easily it seems to influence so many people's opinions. Given the huge amount of media attention dedicated to covering the war in Afghanistan, I imagine many people would find it shocking to learn that NYC is, statistically speaking, a more dangerous place for children to live than Kabul. I'll leave it here, as this isn't a course taught by Noam Chomsky on manufacturing consent. Certainly an interesting read.

### Week 4

The above article, titled "Privatizing kills! Or does it?", suggests that privatizing alcohol sales leads to more alcohol-related deaths. The findings are misleading in the same way that the teen crime statistics were misleading: the increase in the number of deaths was a function of population growth, not of alcohol consumption. Despite the statistical flaws, the article was published in Addiction, a peer-reviewed medical journal.

### Week 5

It's Super Bowl Sunday, so here is an article claiming a causal link between Super Bowl losses and deaths from circulatory diseases. The article, titled "How the Super Bowl can cause a heart attack", outlines a study published in the journal Clinical Cardiology. The study bases its findings on increased death rates from circulatory diseases in LA in the 2 weeks after the 1980 Super Bowl loss to the Steelers, compared with other periods in January and February from 1980-1983. The study made the same comparisons for 1984-1987; LA won the Super Bowl in 1984. It's obvious that the Toronto Star doesn't have a statistical editor; even the title is misleading! While the stress associated with a Super Bowl loss MAY put an individual at higher risk for a heart attack, it is highly unlikely that it would cause a heart attack in a healthy individual with no pre-existing heart problems. The author actually corrects herself (possibly without even knowing it) in the first sentence, stating that a Super Bowl loss may trigger a heart attack, which is very different from causing one. The same study would also need to be conducted in multiple cities, as the results could be specific to LA. Also, the article doesn't even mention whether the individuals who died were football fans or even watched the game. I think the author of this article needs to go back to the whiteboard...

### Week 6

http://stats.org/stories/2011/unbearable_costs_feb28_11.html This article deals with the cost effectiveness of country-wide meningitis vaccinations for children in the US. If implemented, the Vaccines for Children (VFC) program would cost the government \$640 million / year. While this figure seems daunting, in my opinion, an ounce of prevention is worth a pound of cure. If the VFC program is not implemented, a meningitis outbreak may occur leading to millions of dollars in outbreak control, not to mention the emotional and health costs associated with such an outbreak.

### Week 7

The first link brings you to the webpage for a Marketplace episode on "superbugs in the supermarket". I actually saw this episode of Marketplace when it aired a few weeks ago and found a link to it through an article on the CBC this week. The episode deals with antimicrobial-resistant bacteria found in chicken products. All chicken products tested contained at least one "superbug" (strain of antimicrobial-resistant bacteria), and one product contained eight varieties of superbugs. The overall message of the broadcast is that the use in food animals of antimicrobials that are also used in humans should ideally be stopped or, failing that, drastically reduced. The second link refers to the lack of statistical information available on the use of antimicrobials in food animal production in Canada. The FDA recently released the US numbers, but without the Canadian numbers (and evidence of causality), preventing the use of antimicrobials in chicken production would be putting the cart before the horse and would surely lead to the destruction of the Canadian poultry farming industry, as we would no longer be competitive with the US market.

### Week 8

This article on the potential correlation between diet soda consumption and stroke got my attention because I drank copious amounts of Diet Coke during my undergrad. At first glance, the numbers are scary: people who drank diet soda daily had a 61% higher risk of stroke. However, upon further investigation of the data, we find out that this conclusion was based on a sample size of only 116, and while confounding factors were accounted for, their control was limited. For example, eating habits were only recorded at a single time point, which may not reflect that person's general eating habits. Regardless, the study certainly warrants further investigation.

### Week 9

Given that we're all involved with the educational process in one form or another, I'm sure we can all relate to this article about sleep deprivation. The article cites that 35% of Canadian youth (aged 13-17) and 61% of Canadian adults get fewer than eight hours of sleep on average per night. No mention is made of how these figures were arrived at, but based on the wording in parts of the article, a survey seems likely. It's unlikely that the sample size and diversity are great enough to represent the Canadian population as a whole; however, the numbers are still cause for concern given the myriad negative health effects of sleep deprivation.

### Week 10

The number of countries carrying out executions rose by four in 2010, to a total of 23, according to an Amnesty International report. The report also included figures on the number of executions taking place. In the United States, there were 46 executions, down from 52 in 2009. China is the biggest executioner; however, the number of people executed there is not known, as the country does not make information on executions public. Specific numbers were published for a variety of countries, primarily in the Middle East. The report did recognize that many of the figures are likely under-reported, as many executions take place without being documented. It did not, however, acknowledge that executions likely occur in many countries that do not technically allow the death penalty. I thought it was interesting that the report said that the momentum is in favour of abolishing the death penalty and that countries that still allow it are isolated. Just how isolated are the United States and China???

### Week 11

Perhaps the title for this post should be "statistics not in the media". This article deals with increased radiation levels in Ontario as a result of the earthquakes in Japan. However, Health Canada and the Safety Commission don't feel it is necessary for us to know the facts and figures behind this headline. They merely reassure the reader that the increased level of radiation is not something Ontarians need to worry about. I, for one, would like to form my own opinion about what is safe and what is not.

### Week 1

Comment on Bin Sun's page

### Week 2

This question may be trivial, but I am interested to hear the answer nonetheless. Why do the 3-D packages in R listed on the course homepage work now, but not last week?

### Week 3

Comment on Carrie Smith's page

### Week 4

Posted link to journal article on team Gray's discussion page

### Week 5

We have spent a fair amount of time in this course on data ellipses and 3-D plots and they seem to be an important aspect of data analysis. This is the first time I have seen either used in a stats course. I am curious to know if others have seen either data ellipses or 3-D plots in other courses? If so, which course(s)?

• I have seen data ellipses in a multivariate analysis course. In that course, we even learned how to draw these ellipses manually. The textbook for that course is Applied Multivariate Statistical Analysis by Johnson and Wichern. - Gurpreet

### Week 6

I wasn't able to find the code for the R lab. Does anyone know where it is located?

### Week 7

Comment on Jessica Li's page.

### Week 8

I had a really hard time with this week's assignment. It seemed as though the majority of the answers weren't contained in the "Assignment" section of the Lab. Did other people encounter the same thing?

### Week 9

We had our first team meeting with our client this past week. The data seems as though it'll be a challenge to analyze, with so many variables and issues with missing data. The project is really interesting though and it's obvious a lot of thought went into the study design. I'm really looking forward to seeing what the analysis turns up.

### Week 10

We had our second team meeting this past week with a different member of the research team heading the project we are working on. It's interesting to see what different members of the same team feel is important. As a consultant, I think it is important to stay focused on the primary research questions and to always be clear about what the final objective is.

### Week 11

Does anybody know where the sample midterm is posted?