# MATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li

## Revision as of 22:03, 15 February 2011

## About Me

I am a Master's student in Applied Statistics. I have a Bachelor's degree in Actuarial Science, and some background in statistics and R.

## Sample Exam Questions

### Week 1

- Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption across different countries. In reality, these countries could differ in all sorts of ways other than this factor. We know that the ideal solution, introduced by Fisher, is the experiment. What is Fisher's idea of an experiment?
- A: 1) Make sure the groups being compared are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare life expectancy across these groups. Because the groups differ only by chance apart from consumption, a higher life expectancy in the groups that smoke less can be attributed to the lower cigarette consumption.
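
The random split in step 2 can be sketched in a few lines of Python. This is only an illustration: the subject IDs, the three consumption levels, and the helper name `randomize` are made up here, not taken from the lecture.

```python
import random

def randomize(subjects, n_groups, seed=None):
    """Shuffle the subjects, then deal them into n_groups groups of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    # Dealing every n_groups-th subject keeps group sizes within one of each other.
    return [shuffled[i::n_groups] for i in range(n_groups)]

# Hypothetical subject IDs and consumption levels, for illustration only.
subjects = [f"subject_{i:02d}" for i in range(1, 31)]
groups = randomize(subjects, n_groups=3, seed=6627)
for level, members in zip(["none", "moderate", "heavy"], groups):
    print(level, len(members), members[:3])
```

Because chance alone decides who lands in which group, any other characteristic of the subjects is balanced across groups on average.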

### Week 2

- Q: In the example of studying the inheritance of height from father to son, the graph of the ellipse shows that the SD line is not the regression line of son's height on father's height: the change in the son's expected height from the mean is only a proportion of the change in the father's height from the mean. Why do we use ellipses?
- A: Ellipses make it easy to derive the formulas for statistical theory and models. In addition, ellipses drawn on scatterplots and other graphs help with data analysis, for example in studying the correlation between variables, judging how significant a variable is, and calculating confidence intervals.
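
The difference between the SD line and the regression line can be checked numerically. The sketch below uses made-up heights (not the data from class) and the standard fact that the regression slope is the correlation r times the slope of the SD line.

```python
import statistics

# Made-up father and son heights in inches, for illustration only.
fathers = [65.0, 66.5, 68.0, 69.0, 70.5, 72.0, 73.5]
sons = [67.0, 66.0, 69.5, 68.5, 69.0, 71.5, 71.0]

def sd_and_regression_slopes(x, y):
    """Return (r, slope of the SD line, slope of the regression of y on x)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    r = cov / (sx * sy)
    sd_slope = sy / sx          # SD line: one SD of y per SD of x
    reg_slope = r * sd_slope    # regression line: only r SDs of y per SD of x
    return r, sd_slope, reg_slope

r, sd_slope, reg_slope = sd_and_regression_slopes(fathers, sons)
print(f"r = {r:.3f}, SD line slope = {sd_slope:.3f}, regression slope = {reg_slope:.3f}")
```

Whenever r is below 1, the regression line is flatter than the SD line, which is the "proportion of the change" described in the question.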

### Week 3

- Q: What is beta space, and why do we want to look at our data in beta space?
- A: In data space, the axes are variables and the points are observations. To understand hierarchical data better, we want a more natural display of our models, in a space called beta space. In beta space, the axes are coefficients and the points are models, each represented by its coefficients.
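
As a minimal sketch of moving from data space to beta space, assume hierarchical data in which each group gets its own straight-line fit. The group names and numbers below are hypothetical, and `fit_line` is a helper written for this illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1

# Data space: hypothetical (x, y) observations, grouped by cluster.
data_space = {
    "group_A": ([1, 2, 3, 4], [3.1, 4.9, 7.2, 8.8]),
    "group_B": ([1, 2, 3, 4], [5.0, 5.4, 6.1, 6.5]),
    "group_C": ([1, 2, 3, 4], [2.0, 4.1, 5.9, 8.2]),
}

# Beta space: each group's fitted model becomes one point (intercept, slope).
beta_space = {name: fit_line(xs, ys) for name, (xs, ys) in data_space.items()}
for name, (b0, b1) in beta_space.items():
    print(f"{name}: intercept = {b0:.2f}, slope = {b1:.2f}")
```

Plotting the `beta_space` points with intercept and slope as the axes gives the display described in the answer: one point per model rather than one point per observation.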

### Week 4

- Q: Thinking about strategy with observational data, what tools can we use if we want to control for all the Zs that we can?
- A: We can use any one of the three below. The first is statistical control: include the Zs in a statistical model and adjust statistically. Modern thinking is that we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail with the wrong model. The second is matching, which only compares observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, if you have the Zs. The third is structural methods, which build a structural causal model.
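
The claim that only the propensity score matters can be illustrated with a small, fully made-up data set. In the sketch below the covariate Z has three levels, two of which ("a" and "b") share the same propensity, and the true effect of the treatment X on Y is set to 1 by construction. Estimating the propensity as the treated share within each level of Z is an assumption that works here only because Z is discrete; with continuous Zs one would typically fit a model such as logistic regression instead.

```python
from collections import defaultdict

# Hypothetical set-up: outcome shift per covariate level, and
# (controls, treated) counts per level. True effect of treatment is 1.
Z_EFFECT = {"a": 0.0, "b": 2.0, "c": 5.0}
COUNTS = {"a": (8, 2), "b": (8, 2), "c": (2, 8)}

records = []  # each record is (z, x, y) with x = 1 for treated, 0 for control
for z, (n_control, n_treated) in COUNTS.items():
    records += [(z, 0, Z_EFFECT[z])] * n_control
    records += [(z, 1, Z_EFFECT[z] + 1.0)] * n_treated

def mean(values):
    return sum(values) / len(values)

def propensity_by_level(rows):
    """Estimate P(x = 1 | z) as the treated share within each level of z."""
    by_level = defaultdict(list)
    for z, x, _ in rows:
        by_level[z].append(x)
    return {z: round(mean(xs), 6) for z, xs in by_level.items()}

def naive_difference(rows):
    """Treated mean minus control mean, ignoring z entirely."""
    return mean([y for _, x, y in rows if x == 1]) - mean([y for _, x, y in rows if x == 0])

def stratified_difference(rows, propensity):
    """Average the treated-minus-control difference within propensity strata."""
    strata = defaultdict(list)
    for z, x, y in rows:
        strata[propensity[z]].append((x, y))
    total = 0.0
    for members in strata.values():
        diff = mean([y for x, y in members if x == 1]) - mean([y for x, y in members if x == 0])
        total += diff * len(members)  # weight each stratum by its size
    return total / len(rows)

propensity = propensity_by_level(records)
naive = naive_difference(records)
adjusted = stratified_difference(records, propensity)
print(propensity, round(naive, 3), round(adjusted, 3))
```

Levels "a" and "b" are pooled into one stratum because they have the same propensity, even though their outcomes differ, and the comparison within strata still recovers the effect that was built into the data, while the naive comparison does not.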

### Week 5

- Q: From the example data set, Health as predicted by Weight and Height, we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?
- A: The first has typical values for the predictors but an atypical Y. It has little impact on beta-hat, increases the size of confidence intervals and decreases power. The second has atypical values for the predictors but a Y consistent with the other data. It has little impact on beta-hat, shrinks confidence intervals, and creates a false sense of power if the point is not valid. The third has atypical values for the predictors and a Y not consistent with the other data. It has a large impact on beta-hat, could shrink or expand CIs, and makes a mess of everything.
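
The three types can be compared on a toy simple regression. The numbers below are made up for illustration (they are not the Health, Weight and Height data), and the standard error of the slope stands in for the width of the confidence interval.

```python
import math

def fit(points):
    """OLS of y on x; returns (slope, standard error of the slope)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in points)
    return slope, math.sqrt(rss / (n - 2) / sxx)

# Made-up base data: y is roughly x, with a small alternating wiggle.
base = [(float(x), x + (0.5 if x % 2 else -0.5)) for x in range(1, 11)]

cases = {
    "base": base,
    "type1": base + [(5.5, 15.5)],   # typical x, atypical y
    "type2": base + [(30.0, 30.0)],  # atypical x, y consistent with the trend
    "type3": base + [(30.0, 5.0)],   # atypical x, y inconsistent with the trend
}
results = {name: fit(points) for name, points in cases.items()}
for name, (slope, se) in results.items():
    print(f"{name}: slope = {slope:.3f}, SE = {se:.3f}")
```

In this toy example the type 1 point leaves the slope unchanged but inflates its standard error, the type 2 point barely moves the slope while shrinking the standard error, and the type 3 point pulls the slope far from its original value.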

## Statistics in the Media

### Week 1

- Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season.

The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.

Using data from the 1993/1994 through 2008/2009 seasons, the research shows that the severity of infections with the influenza A virus is related to its novelty. In addition, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus' hemagglutinin protein. This result may help improve the ability to predict influenza severity accurately, so that scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. However, in reality there are many other sources of variation. For example, people differ in age, in where they live, and in their health conditions.

### Week 2

- High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime

Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can look for criminal activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes. People quoted in the news report said: "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Obviously, crimes are more likely at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice, and crimes happen everywhere in the city. It is also possible that the criminals come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.

### Week 3

- Drinkers Down Under switching from beer to wine

In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wines and spirits has increased. From the statistical data, the researchers find that at the start of the 1960s, beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same period, wine consumption has increased threefold, to 36 percent, while the intake of spirits has nearly doubled, to 20 percent. Why have wine and spirits increased so much in the past forty years? The researchers noted that increased consumption was likely to have been affected by numerous factors, such as different age patterns in the population, increasing affluence and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people have also come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.

### Week 4

- Canadians spend most of waking life sedentary

StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and kids, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results of this study, from the Canadian Health Measures Survey (CHMS) released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. In addition, the survey mentions that young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. In addition, the survey found that men are more physically active than women on average. The same pattern appears among teenagers: more than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, the survey should consider people whose jobs require heavy lifting or being out and about for most of the work day. This would be more than enough exercise for most people, and counting it would improve these numbers dramatically. Likewise for teenagers, the survey should also ask whether they work part-time while studying full-time.

### Week 5

- More private liquor stores, more alcohol deaths?

In a study of 89 local areas of British Columbia, the researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales. It is a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. Over the same period, the number of government-run stores in the province declined slightly. The findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.

## Questions and Comments

### Week 1

- The Question posted on Bin's blog.

I guess that one could select the sample from different cities within one country, instead of from different countries, in order to reduce the differences in conditions.

### Week 2

- Before I took this course, I had no idea what consulting involves or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realized that it is not easy to be a good consultant. Usually the situation is very complicated, and our options for manipulation are limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem, then pick the relationships among these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis, whether the relationship between y and x arose by chance, etc. I think it is a big challenge for me, and I find it very interesting as well.

### Week 3

- In class we saw the multiple regression model represented by a plane in data space, and by a point in beta space showing the fitted slope with respect to Coffee and the fitted slope with respect to Stress. There we only need 2 dimensions, but for more complex models we may need 3 or more dimensions for the beta space. I just wonder, in these situations, how can we draw ellipses to illustrate the interactions between the variables?
- Comment: If we have 3 or more parameters, then we need a 3D or higher-dimensional ellipse. This ellipse in higher dimensions would be larger because it is trying to be correct for 3 or more parameters at the same time. - *Gurpreet*
- Thank you very much!
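
Gurpreet's comment can be made concrete with the usual joint confidence region for the coefficients of a linear model. This is only a sketch under standard normal-theory assumptions, and the symbols $p$, $\nu$ and $\hat{V}$ are notation introduced here rather than taken from the lecture.

$$
\mathcal{E}_p = \left\{ \beta : (\beta - \hat{\beta})^{\top} \hat{V}^{-1} (\beta - \hat{\beta}) \le p \, F_{p,\nu}^{\,1-\alpha} \right\}
$$

Here $\hat{\beta}$ is the vector of the $p$ estimated coefficients of interest, $\hat{V}$ is its estimated covariance matrix, and $\nu$ is the residual degrees of freedom. For $p = 2$ this is the familiar ellipse, and for $p = 3$ it is an ellipsoid. Because $p \, F_{p,\nu}^{\,1-\alpha}$ grows with $p$, the two-dimensional shadow of a higher-dimensional region is larger than the ellipse built for two parameters alone, which matches the comment above.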

### Week 4

- For controlling for all the Zs, the notes mention that we can use one of three techniques. The first is statistical control. I am not sure about the meaning of the propensity score, although it is described as "prediction of X from Zs".

Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does **not** make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - Constance
Thank you Constance!