# MATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li

Revision as of 01:29, 5 March 2011, by Jyjli

## Contents

I am a Master's student in Applied Statistics. I have a Bachelor's degree in Actuarial Science and some background in statistics and R.

## Sample Exam Questions

### Week 1

• Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption across countries. In reality, these countries could differ in all sorts of ways besides this factor. We know that the ideal solution, introduced by Fisher, is the experiment. What is Fisher's idea of an experiment?
• A: 1) Make sure the groups being compared are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare the life expectancy of these groups to see whether the groups with lower cigarette consumption have higher life expectancy.
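The random split in step 2 can be sketched in a few lines of Python. This is only an illustration: the subject labels, the number of groups and the seed are made-up assumptions, not part of the course example.

```python
import random

def randomize(subjects, n_groups, seed=0):
    """Randomly split subjects into n_groups groups of near-equal size."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)              # chance alone decides the ordering
    # Deal the shuffled subjects out like cards, one group at a time.
    return [shuffled[i::n_groups] for i in range(n_groups)]

# Hypothetical subjects and three hypothetical consumption levels.
subjects = [f"subject_{i}" for i in range(12)]
groups = randomize(subjects, n_groups=3)
for level, members in enumerate(groups):
    print("consumption level", level, sorted(members))
```

Because assignment depends only on the shuffle, the groups differ in age, health and so on only by chance, which is the point of the design.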

### Week 2

• Q: In the example studying the inheritance of height from father to son, the graph of the ellipse shows that the SD line is not the regression line of son's height on father's height: the son's expected deviation from the mean is only a proportion of the father's deviation from the mean. Why do we use ellipses?
• A: Ellipses make it easy to obtain formulas for statistical theory and models. In addition, drawing ellipses on scatterplots and other graphs helps with data analysis, for example in studying the correlation between variables, judging how significant a variable is, and calculating confidence intervals.
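The difference between the SD line and the regression line can be checked numerically. The sketch below uses simulated heights (the means, SDs and the slope of 0.5 are arbitrary assumptions, not the data from class) and relies on the standard identity that the regression slope equals the correlation r times the SD-line slope.

```python
import math
import random

def mean(v):
    return sum(v) / len(v)

def sd(v):
    m = mean(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))

def corr(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return sxy / (sd(x) * sd(y))

# Simulated father/son heights in inches (made-up parameters).
rng = random.Random(1)
fathers = [rng.gauss(68, 2.7) for _ in range(2000)]
sons = [69 + 0.5 * (f - 68) + rng.gauss(0, 2.3) for f in fathers]

r = corr(fathers, sons)
sd_slope = sd(sons) / sd(fathers)   # slope of the SD line
reg_slope = r * sd_slope            # slope of the regression of son on father

print("correlation:", round(r, 3))
print("SD line slope:", round(sd_slope, 3))
print("regression slope:", round(reg_slope, 3))
```

Since r is below 1, the regression line is flatter than the SD line, which is the regression to the mean described in the question.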

### Week 3

• Q: What is beta space, and why do we want to look at our data in beta space?
• A: In data space, the axes are variables and the points are observations. To understand hierarchical data better, we want a more natural display of our models, in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients.
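A small sketch of the move from data space to beta space: each group's fitted line becomes a single point (intercept, slope). The three "schools" and their data are invented for illustration, and `fit_line` is a hypothetical helper, not a function from the course materials.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Data space: each school is a cloud of (x, y) observations (made-up values).
schools = {
    "A": ([1, 2, 3, 4], [3, 5, 7, 9]),
    "B": ([1, 2, 3, 4], [2, 2.5, 3, 3.5]),
    "C": ([1, 2, 3, 4], [6, 5, 4, 3]),
}

# Beta space: each school is one point whose coordinates are its coefficients.
beta_space = {name: fit_line(x, y) for name, (x, y) in schools.items()}
for name, (intercept, slope) in beta_space.items():
    print(name, "intercept =", intercept, "slope =", slope)
```

Plotting these points with intercept on one axis and slope on the other gives the beta-space display of the three models.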

### Week 4

• Q: Thinking about strategy with observational data, what tools can we use if we want to control for all the Zs that we can?
• A: We can use any one of the three below. The first is statistical control: include the Zs in a statistical model and adjust statistically. Modern thinking: we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail with the wrong model. The second is matching: this technique only performs comparisons between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, if you have the Zs. The third is structural methods, which build a structural causal model.

### Week 5

• Q: In the example data set, Health as predicted by Weight and Height, we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?
• A: The first has typical values for the predictors but an atypical Y. It has little impact on beta-hat, but it increases the size of confidence intervals and decreases power. The second has atypical values for the predictors but a Y consistent with the other data. It has little impact on beta-hat, it shrinks confidence intervals, and it creates a false sense of power if the point is not valid. The third has atypical values for the predictors and a Y not consistent with the other data. It has a large impact on beta-hat, could shrink or expand CIs, and makes a mess of everything.

### Week 6

• Q: In the hierarchical data example on high schools, what is the difference between Robinson's Paradox and Simpson's Paradox?
• A: Robinson's Paradox refers to the fact that Beta-W (the within-group slope) and Beta-B (the between-group slope) can have different signs. Simpson's Paradox refers to the fact that Beta-W and Beta-P (the pooled slope) can have different signs.
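Both sign reversals can be reproduced with a tiny made-up data set. The sketch assumes the usual reading of the three slopes: within-group, between-group (fitted to the group means) and pooled (ignoring groups). The numbers are invented so that each group trends downward while the group means trend upward.

```python
def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Three hypothetical groups of (x values, y values).
groups = [
    ([1, 2, 3], [3, 2, 1]),
    ([4, 5, 6], [9, 8, 7]),
    ([7, 8, 9], [15, 14, 13]),
]

# Within: centre each group at its own means, then fit one common slope.
xc, yc = [], []
for x, y in groups:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    xc += [a - mx for a in x]
    yc += [b - my for b in y]
beta_w = slope(xc, yc)

# Between: fit a line through the group means.
beta_b = slope([sum(x) / len(x) for x, _ in groups],
               [sum(y) / len(y) for _, y in groups])

# Pooled: ignore the grouping entirely.
beta_p = slope([a for x, _ in groups for a in x],
               [b for _, y in groups for b in y])

print("within:", beta_w, "between:", beta_b, "pooled:", beta_p)
```

Here the within slope is negative while both the between and pooled slopes are positive, so the same data show both paradoxes.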


## Statistics in the Media

### Week 1

• Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season.

The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.

Based on data from the 1993/1994 through 2008/2009 seasons, the research shows that the severity of infections with the Influenza A virus is related to its novelty. In addition, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus' hemagglutinin protein. It is thought that this result can help improve the ability to predict influenza severity accurately, so that scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. However, in reality there are many other sources of variation. For example, people differ in age, live in different areas and have different health conditions.

### Week 2

• High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime

Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can look for crime activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes. In the news report, people said that "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Obviously, crimes are more likely at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice. Crimes happen all over the city. It is also possible that the criminals come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.

### Week 3

• Drinkers Down Under switching from beer to wine

In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wines and spirits has increased. From the statistical data, the researchers find that at the start of the 1960s, beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same time period, wine consumption has increased threefold to 36 percent, while the intake of spirits has nearly doubled to 20 percent. Why have wine and spirits increased so much in the past forty years? The researchers noted that increased consumption was likely to have been affected by numerous factors such as different age patterns in the population, increasing affluence and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people have also come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians \$36 billion a year.

### Week 4

• Canadians spend most of waking life sedentary

### Week 5

• More private liquor stores, more alcohol deaths?

In a study of 89 local areas of British Columbia, researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales. It is a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. Over the same period, the number of government-run stores in the province declined slightly. The findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.

### Week 6

• Video Games Are Good for Girls

Researchers from Brigham Young University's School of Family Life conducted a study on video games and children between 11 and 16 years old. They found that girls who played video games with a parent enjoyed a number of advantages. A father who still hasn't given up video games now has some justification to keep on playing, if he has a daughter. The study involved 287 families with an adolescent child. The researchers measured several outcomes, such as positive behavior, aggression, family connection and mental health. The result is interesting: for boys, playing with a parent was not a statistically significant factor for any of the outcomes the researchers measured. Yet for girls, playing with a parent accounted for as much as 20 percent of the variation in those measured outcomes.

### Week 7

Ranking tennis players is a novel way to show how complex network analysis can reveal interesting facts hidden in statistical data. Male tennis players who played in at least one Association of Tennis Professionals match between 1968 and 2010 were evaluated through network analysis. The researchers ran an algorithm, similar to the one used by Google to rank Web pages, on digital data from hundreds of thousands of matches. The data was pulled from the Association of Tennis Professionals website. They quantified the importance of players and ranked them by a "tennis prestige" score. This score is determined by a player's competitiveness, the quality of his performance and his number of victories. Fans may think of Jimmy Connors as an "old school" tennis player, but he comes out on top in the new ranking system. One of the reasons Connors ranks on top is that he played for more than 20 years and had the opportunity to win a lot of matches against other very good players. "The rankings are a snapshot of who is at the top at this time," Radicchi said. "Players who have yet to retire are penalized with respect to those who have ended their careers. Prestige scores strongly correlate with the number of victories, and active players haven't played all the matches of their careers yet." Researching and ranking sports stars gives a glimpse of the power of complex network analysis.
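To give a feel for what "an algorithm similar to the one used by Google" might look like on match data, here is a plain PageRank sketch in which each loser passes prestige to the players who beat him. This is not the researchers' actual algorithm: the damping factor of 0.85, the handling of undefeated players and the four players A to D are all assumptions made for illustration.

```python
def pagerank(matches, damping=0.85, iterations=100):
    """Plain PageRank on a match graph; matches are (winner, loser) pairs."""
    players = sorted({p for match in matches for p in match})
    n = len(players)
    # beaten_by[p] lists the winners of every match that p lost.
    beaten_by = {p: [] for p in players}
    for winner, loser in matches:
        beaten_by[loser].append(winner)

    score = {p: 1 / n for p in players}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in players}
        for p in players:
            if beaten_by[p]:
                # A loser splits his prestige among the players who beat him.
                share = damping * score[p] / len(beaten_by[p])
                for w in beaten_by[p]:
                    new[w] += share
            else:
                # An undefeated player spreads his prestige evenly over everyone.
                for q in players:
                    new[q] += damping * score[p] / n
        score = new
    return score

# Hypothetical round robin: A beats everyone, B beats C and D, C beats D.
matches = [("A", "B"), ("A", "C"), ("B", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]
prestige = pagerank(matches)
ranking = sorted(prestige, key=prestige.get, reverse=True)
print(ranking)
```

A player gains more from beating someone who himself has high prestige, which is one way a score can reflect the quality of victories and not only their number.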

### Week 1

• The Question posted on Bin's blog.

I guess that one could select the sample from different cities in one country instead of selecting different countries, in order to eliminate the differences in conditions.

### Week 2

• Before I took this course, I had no idea what consulting involves or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realized that it is not easy to be a good consultant. Usually the situation is very complicated, and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem, then pick the relationships among these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for the analysis and whether y appears to be caused by x only by chance. I think it is a big challenge for me, and I find it very interesting as well.

### Week 3

• In class we saw the multiple regression model represented by a plane in data space, which is represented in beta space by a point showing the fitted slope with respect to Coffee and the fitted slope with respect to Stress. Here we only need 2 dimensions, but for some complex models we may need 3 or more dimensions for the beta space. I just wonder, in these situations, how can we draw ellipses to illustrate the interactions between the variables?
• Comment: If we have 3 or more parameters, then we need a 3D or higher-dimensional ellipsoid. This ellipsoid in higher dimensions would be larger because it is trying to be correct for 3 or more parameters at the same time. -Gurpreet
• Thank you very much!

### Week 4

• On controlling for all the Zs, the lecture mentions that we can use one of three techniques. The first is statistical control. I am not sure about the meaning of the propensity score, although it says "prediction of X from Zs".

• Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does not make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - Constance
• Thank you Constance!
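With a single categorical covariate, the propensity score described above is just the share of treated people at each covariate level, so it can be computed by counting. The sketch below uses made-up records and compares a naive treated-vs-control difference with one computed within propensity-score strata. A real analysis would typically estimate the score with a logistic regression on several Zs, and all names here are hypothetical.

```python
from collections import defaultdict

# Made-up records: (z, x, y) = (covariate level, treated 1/0, outcome).
records = [
    ("young", 1, 5), ("young", 0, 4), ("young", 0, 4), ("young", 0, 3),
    ("old", 1, 9), ("old", 1, 8), ("old", 1, 9), ("old", 0, 7),
]

def mean(v):
    return sum(v) / len(v)

def propensity_scores(records):
    """Estimate P(X = 1 | Z = z) as the treated share at each level of z."""
    counts = defaultdict(lambda: [0, 0])   # z -> [number treated, number total]
    for z, x, _ in records:
        counts[z][0] += x
        counts[z][1] += 1
    return {z: treated / total for z, (treated, total) in counts.items()}

def naive_difference(records):
    """Treated mean minus control mean, ignoring Z."""
    treated = [y for _, x, y in records if x == 1]
    control = [y for _, x, y in records if x == 0]
    return mean(treated) - mean(control)

def stratified_difference(records, ps):
    """Compare treated and control only within the same propensity score."""
    strata = defaultdict(list)
    for z, x, y in records:
        strata[ps[z]].append((x, y))
    total = 0.0
    for rows in strata.values():
        treated = [y for x, y in rows if x == 1]
        control = [y for x, y in rows if x == 0]
        total += len(rows) * (mean(treated) - mean(control))
    return total / len(records)

ps = propensity_scores(records)
naive = naive_difference(records)
adjusted = stratified_difference(records, ps)
print(ps, naive, adjusted)
```

In this toy data the older group is both more likely to be treated and has higher outcomes, so the naive difference is much larger than the difference within strata.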

### Week 5

• In a multiple regression model, it is a challenge to determine the type of a canonical outlier and how to deal with it, that is, whether to remove it or not. The choice can change our results substantially.

### Week 6

• From the slides on Hierarchical Models, it is good to know the idea of the two-stage approach (derived variables approach): estimate the slope and intercept for each group, then use them as a multivariate sample and do a MANOVA test.

### Week 7

• The comparison of the Scheffé Confidence Interval and the Bonferroni Confidence Interval: when the number of contrasts to be estimated is small (about as many as there are factors), Bonferroni is better than Scheffé. Actually, unless the number of desired contrasts is at least twice the number of factors, Scheffé will always show wider confidence bands than Bonferroni.
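The comparison comes down to the two critical values. As a sketch in standard textbook notation (the symbols are assumptions, not taken from the lecture slides): let $m$ be the number of contrasts of interest, $p$ the dimension of the space of contrasts, $\nu$ the error degrees of freedom and $\alpha$ the overall error rate. For a contrast estimate $\hat\psi$ with standard error $\operatorname{SE}(\hat\psi)$, the two intervals are

$$
\text{Bonferroni:}\quad \hat\psi \pm t_{1-\alpha/(2m),\,\nu}\,\operatorname{SE}(\hat\psi),
\qquad
\text{Scheffé:}\quad \hat\psi \pm \sqrt{p\,F_{1-\alpha;\,p,\,\nu}}\;\operatorname{SE}(\hat\psi).
$$

The Bonferroni multiplier grows with $m$, while the Scheffé multiplier depends only on $p$ and $\nu$. Bonferroni is therefore narrower when only a few contrasts are planned, and Scheffé becomes the narrower of the two once $m$ is large enough that $t_{1-\alpha/(2m),\,\nu} > \sqrt{p\,F_{1-\alpha;\,p,\,\nu}}$.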