http://scs.math.yorku.ca/index.php?title=Special:Contributions/Jyjli&feed=atom&limit=50&target=Jyjli&year=&month=Wiki1 - User contributions [en]2020-03-31T10:49:25ZFrom Wiki1MediaWiki 1.16.1http://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-04-12T02:03:16Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master's student in Applied Statistics. I earned my Bachelor's degree in Actuarial Science, and I have some background in statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption across different countries. In reality, these countries could differ in all sorts of ways other than this factor. We know that the ideal solution, introduced by Fisher, is the experiment. What is Fisher's idea of an experiment? <br />
*A: 1) Make sure all the countries are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare the life expectancy of these groups to see whether the groups with lower cigarette consumption have higher life expectancy.<br />
<br />
===Week 2===<br />
*Q: From the example of studying the inheritance of height from father to son, and the graph of the ellipse, we find that the SD line is not the regression line of son's height on father's height: the change of the son's expected height from the mean is only a proportion of the change of the father's height from the mean. Why do we use ellipses?<br />
*A: By using ellipses, we can easily derive the formulas for statistical theory and models. In addition, drawn on scatterplots and other graphs, ellipses help with data analysis, for example in studying the correlation between variables, judging how significant a variable is, and calculating confidence intervals. <br />
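The difference between the SD line and the regression line can be sketched numerically. This is a minimal illustration in Python, assuming the standard result that the regression slope equals the correlation times the ratio of standard deviations; the function names and all the numbers (means, SDs, correlation) are hypothetical and not taken from the class example.<br />

```python
def sd_line_slope(sd_father, sd_son):
    """Slope of the SD line: one SD of father's height maps to one SD of son's."""
    return sd_son / sd_father


def regression_slope(r, sd_father, sd_son):
    """Slope of the regression of son's height on father's height."""
    return r * sd_son / sd_father


def predicted_son_height(father, mean_father, mean_son, r, sd_father, sd_son):
    """Expected son's height for a given father's height."""
    slope = regression_slope(r, sd_father, sd_son)
    return mean_son + slope * (father - mean_father)


# Hypothetical values, in inches, chosen only for illustration.
mean_father, mean_son = 68.0, 69.0
sd_father, sd_son = 2.7, 2.7
r = 0.5

father = 72.0  # 4 inches above the fathers' mean
on_regression_line = predicted_son_height(
    father, mean_father, mean_son, r, sd_father, sd_son
)
on_sd_line = mean_son + sd_line_slope(sd_father, sd_son) * (father - mean_father)

print(on_regression_line)  # son expected only r * 4 = 2 inches above the sons' mean
print(on_sd_line)          # the SD line would put him the full 4 inches above
```

With a correlation below 1, the regression line is always flatter than the SD line, which is the "proportion of the change" described in the question.<br />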
<br />
===Week 3===<br />
*Q: What is beta space, and why do we want to look at our data in beta space?<br />
*A: In data space, the axes are variables and the points are observations. To better understand hierarchical data, we want a more natural display of our models, in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients. <br />
<br />
===Week 4===<br />
*Q: Thinking about strategy with observational data, what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the three below. The first is statistical control: include the Zs in a statistical model and adjust statistically. Modern thinking: we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail with the wrong model. The second is matching: this technique only performs comparisons between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, if you have the Zs. The third is structural methods, which build a structural causal model.<br />
<br />
===Week 5===<br />
*Q: From the example data set, Health as predicted by Weight and Height, we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?<br />
*A: The first type has typical values for the predictors but an atypical Y; it has little impact on beta-hat, increases the size of confidence intervals, and decreases power. The second has atypical values for the predictors but a Y consistent with the other data; it has little impact on beta-hat, shrinks confidence intervals, and creates a false sense of power if the point is not valid. The third has atypical values for the predictors and a Y not consistent with the other data; it has a large impact on beta-hat, could shrink or expand CIs, and makes a mess of everything.<br />
<br />
===Week 6===<br />
*Q: In the hierarchical data (high school) example, what is the difference between Robinson's Paradox and Simpson's Paradox?<br />
*A: Robinson's Paradox refers to the fact that Beta-W (the within-group slope) and Beta-B (the between-group slope) can have different signs. Simpson's Paradox refers to the fact that Beta-W and Beta-P (the pooled slope) can have different signs.<br />
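A small sketch of how the three slopes can disagree, in Python. The data below are made up for illustration (three hypothetical groups, not the high school data from class), and the helper name is an assumption: the slope is negative within every group, while the group means rise together.<br />

```python
from statistics import mean


def ols_slope(xs, ys):
    """Least-squares slope of y on x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical groups: (x values, y values).
groups = {
    "A": ([1, 2, 3], [3, 2, 1]),
    "B": ([4, 5, 6], [9, 8, 7]),
    "C": ([7, 8, 9], [15, 14, 13]),
}

# Within-group slope: centre x and y on their group means, then pool.
x_within, y_within = [], []
for xs, ys in groups.values():
    x_within += [x - mean(xs) for x in xs]
    y_within += [y - mean(ys) for y in ys]
beta_within = ols_slope(x_within, y_within)

# Between-group slope: regress the group means of y on the group means of x.
beta_between = ols_slope(
    [mean(xs) for xs, _ in groups.values()],
    [mean(ys) for _, ys in groups.values()],
)

# Pooled slope: ignore the grouping entirely.
all_x = [x for xs, _ in groups.values() for x in xs]
all_y = [y for _, ys in groups.values() for y in ys]
beta_pooled = ols_slope(all_x, all_y)

print(beta_within, beta_between, beta_pooled)
```

Here the within-group slope is negative while both the between-group and pooled slopes are positive, so the same toy data show the sign reversal in both senses described above.<br />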
<br />
===Week 7===<br />
*Q: Compare the Scheffé method and confidence intervals with the Bonferroni method and confidence intervals.<br />
*A: The Bonferroni adjustment applies very generally, but it suffers from the limitation that the number of simultaneous confidence intervals must be specified in advance in order to maintain the validity of the simultaneous coverage. The Scheffé method is an alternative that controls the simultaneous coverage over all contrasts in a subspace. If a large number of contrasts are of interest, or it is not known in advance which ones are of interest, the Scheffé method provides a way to “snoop” through all the possibilities while maintaining control of the coverage of confidence intervals and the false positive rates of tests. When the number of contrasts to be estimated is small, Bonferroni is better than Scheffé, in the sense of giving narrower intervals. Actually, unless the number of desired contrasts is at least twice the number of factors, Scheffé will always show wider confidence bands than Bonferroni.<br />
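A rough numeric sketch of this comparison, in Python with only the standard library. It rests on assumptions that are not in the notes: large-sample (normal and chi-square) critical values instead of t and F, a contrast subspace of dimension 2 (where the chi-square quantile has the closed form -2 ln(alpha)), and alpha = 0.05. The function names are made up for illustration.<br />

```python
from math import log, sqrt
from statistics import NormalDist


def bonferroni_multiplier(k, alpha=0.05):
    """Large-sample multiplier of the standard error for k pre-specified intervals."""
    return NormalDist().inv_cdf(1 - alpha / (2 * k))


def scheffe_multiplier_dim2(alpha=0.05):
    """Large-sample Scheffe multiplier for all contrasts in a 2-dimensional subspace.

    Uses sqrt of the chi-square(2 df) quantile, which equals -2 * ln(alpha).
    """
    return sqrt(-2 * log(alpha))


scheffe = scheffe_multiplier_dim2()
for k in range(1, 7):
    bonf = bonferroni_multiplier(k)
    narrower = "Bonferroni" if bonf < scheffe else "Scheffe"
    print(k, round(bonf, 3), round(scheffe, 3), narrower)
```

Under these assumptions the Bonferroni intervals are narrower for up to three pre-specified contrasts, and the Scheffé intervals become narrower from the fourth contrast on. With other subspace dimensions or small samples the crossover point will differ.<br />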
<br />
===Week 8===<br />
*Q: Briefly introduce the Wald test. What are the problems with the Wald test?<br />
*A: The Wald test is a way of testing the significance of particular explanatory variables in a statistical model. In logistic regression, for example, we have a binary outcome variable and one or more explanatory variables, and each explanatory variable in the model has an associated parameter. The Wald test is one of a number of ways of testing whether the parameters associated with a group of explanatory variables are zero. If, for a particular explanatory variable or group of explanatory variables, the Wald test is significant, then we would conclude that the parameters associated with these variables are not zero, so the variables should be included in the model. If the Wald test is not significant, then these explanatory variables can be omitted from the model. However, there are problems with the use of the Wald statistic: for large coefficients, the standard error is inflated, lowering the Wald statistic (chi-square) value, and the likelihood-ratio test is more reliable than the Wald test for small sample sizes.<br />
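For a single parameter, the Wald test can be sketched in a few lines of Python. This assumes the usual large-sample form, in which the estimate divided by its standard error is compared with a standard normal distribution (equivalently, its square with a chi-square on 1 degree of freedom). The function name and the coefficient and standard error below are hypothetical, not output from a fitted model.<br />

```python
from statistics import NormalDist


def wald_test(estimate, std_error):
    """Return (z, chi-square statistic, two-sided p-value) for H0: parameter = 0."""
    z = estimate / std_error
    chi_square = z ** 2
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, chi_square, p_value


# Hypothetical logistic regression coefficient and its standard error.
z, chi_square, p_value = wald_test(estimate=0.8, std_error=0.4)
print(z, chi_square, round(p_value, 4))
```

The sketch also shows why an inflated standard error is a problem: doubling `std_error` for the same estimate halves z and can turn a significant result into a non-significant one.<br />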
<br />
===Week 9===<br />
*Q: In the process of fitting a model, is it correct to drop variables from the model by looking only at the p-values of these variables in the fitted model?<br />
*A: We cannot drop more than one variable at a time from the model by looking at the individual p-values, since after we drop one variable, the p-values of the other variables change. Therefore, we should do a Wald test on the whole group if we would like to drop two or more variables from the model at the same time.<br />
<br />
===Week 10===<br />
*Q: What is EBLUP, and what does it do?<br />
*A: If we replace the unknown parameters with their estimates, we get the EBLUP (Empirical BLUP). The EBLUP optimally combines the information from the ith cluster with the information from the other clusters. We borrow strength from the other clusters.<br />
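The idea of borrowing strength can be sketched for the simplest case. This Python example assumes a random-intercept model in which the predictor for a cluster is a weighted average of the cluster's own mean and the overall mean, with the weight depending on the estimated between-cluster variance, the estimated within-cluster variance, and the cluster size. The function name and all numbers are hypothetical.<br />

```python
def eblup_cluster_mean(cluster_mean, overall_mean, between_var, within_var, n):
    """Shrink a cluster mean toward the overall mean (random-intercept sketch).

    between_var and within_var stand for the estimated variance components
    plugged in for the unknown ones, which is what makes the BLUP 'empirical'.
    """
    weight = between_var / (between_var + within_var / n)
    return overall_mean + weight * (cluster_mean - overall_mean)


# Hypothetical values: a small cluster is shrunk more than a large one.
small = eblup_cluster_mean(10.0, 6.0, between_var=4.0, within_var=16.0, n=4)
large = eblup_cluster_mean(10.0, 6.0, between_var=4.0, within_var=16.0, n=100)
print(small, large)
```

A cluster with few observations is pulled halfway to the overall mean in this example, while a cluster with many observations stays close to its own mean.<br />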
<br />
===Week 11===<br />
*Q: Compare model fitting for non-linear and linear models. What are the differences?<br />
*A: 1. With a linear model we only need to specify the predictors. We don't need to say anything about the parameters because it is understood that there is exactly one parameter for each regressor and each parameter multiplies its regressor. The formula for a non-linear model needs to specify both the parameters and the regressors. 2. The algorithm for fitting a non-linear model is iterative and needs starting values, which you generally need to supply. 3. In non-linear mixed effects models (with nlme), the parameters in the non-linear model can themselves be modeled through linear models, potentially based on other predictors. This allows the non-linear model to be simpler since it only needs to capture the essentially non-linear aspects of the model. This formulation is easier to fit numerically.<br />
<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm|Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
From the study of the 1993/1994 through 2008/2009 seasons, the research showed that the severity of infections with the Influenza A virus is related to its novelty. In addition, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus' hemagglutinin protein. This result may help improve the ability to accurately predict influenza severity, so scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. However, in reality there are many other sources of variation. For example, people are in different age ranges, live in different areas and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm|High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can look for crime activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
In the news report, people said that "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Admittedly, crimes are more likely at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice. Crimes occur everywhere in the city. It is also possible that the criminals come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120|Drinkers Down Under switching from beer to wine]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wines and spirits has increased. From the statistical data, the researchers found that at the start of the 1960s, beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same period, wine consumption has increased threefold, to 36 percent, while the intake of spirits has nearly doubled, to 20 percent. Why have wine and spirits increased so much over the past forty years? The researchers noted that the increased consumption was likely to have been affected by numerous factors such as different age patterns in the population, increasing affluence and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people have also come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome|Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and kids, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results of this study from the Canadian Health Measures Survey (CHMS), released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. In addition, the survey reports that young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. In addition, the survey found that men are more physically active than women on average. The same pattern appears among teenagers: more than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, the survey should account for people whose jobs require them to do heavy lifting and be out and about for most of their work day. This would be more than enough exercise for most people and would improve these numbers dramatically. Similarly, for teenagers, the survey should also consider whether they work part-time while studying as full-time students.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1|More private liquor stores, more alcohol deaths?]<br />
<br />
Based on a study of 89 local areas of British Columbia, the researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales, a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. Over the same period, the number of government-run stores in the province declined slightly. The findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
===Week 6===<br />
* Video Games Are Good for Girls<br />
[http://www.sciencedaily.com/releases/2011/02/110201083341.htm|Video Games Are Good for Girls]<br />
<br />
Researchers from Brigham Young University's School of Family Life conducted a study on video games and children between 11 and 16 years old. They found that girls who played video games with a parent enjoyed a number of advantages. A father who still hasn't given up video games now has some justification to keep on playing, if he has a daughter. The study involved 287 families with an adolescent child. The researchers measured several outcomes, such as positive behavior, aggression, family connection and mental health. The result is interesting: for boys, playing with a parent was not a statistically significant factor for any of the outcomes the researchers measured. Yet for girls, playing with a parent accounted for as much as 20 percent of the variation in those measured outcomes.<br />
<br />
===Week 7===<br />
[http://www.sciencedaily.com/releases/2011/03/110301091225.htm|Who's the Best Tennis Player of All Time?]<br />
<br />
Ranking tennis players is a novel way to show how complex network analysis can reveal interesting facts hidden in statistical data. Male tennis players who played in at least one Association of Tennis Professionals match between 1968 and 2010 were evaluated through network analysis. The researchers ran an algorithm, similar to the one used by Google to rank Web pages, on digital data from hundreds of thousands of matches. The data was pulled from the Association of Tennis Professionals website. They quantified the importance of players and ranked them by a "tennis prestige" score. This score is determined by a player's competitiveness, the quality of his performance and number of victories. Fans may think of Jimmy Connors as an "old school" tennis player, but he ranks on top under the new ranking system. One of the reasons is that he played for more than 20 years and had the opportunity to win a lot of matches against other very good players. "The rankings are a snapshot of who is at the top at this time," Radicchi said. "Players who have yet to retire are penalized with respect to those who have ended their careers. Prestige scores strongly correlate with the number of victories, and active players haven't played all the matches of their careers yet." Researching and ranking sports stars gives a glimpse at the power of complex network analysis. <br />
<br />
Hi Jessica, I'm not sure why, but the link you have provided for the tennis article isn't working. I searched for the article in google and found it at the following link in the event somebody else is interested in viewing the article. --[[User:Lawarren|Lawarren]] 10:48, 9 March 2011 (EST)<br />
<br />
http://www.sciencedaily.com/releases/2011/03/110301091225.htm<br />
* thank you very much Lawarren.<br />
<br />
===Week 8===<br />
[http://www.traveldailynews.com/pages/show_page/42076-Hotel-capitalization-rates-fall-further|Hotel capitalization rates fall further]<br />
<br />
In 2011, the USRC Hotel Investment Survey predicted good growth in the hotel industry, based on data viewed in the context of longer-term trends. The data include parameters such as capitalization rates, discount rates, income and expense growth expectations, marketing time, debt parameters, and other data for both full-service and limited-service hotels. The researchers found that full-service discount rates are now at their lowest level since the Mid Year 2007 survey, just prior to rates beginning to creep up in 2008, and of course the subsequent economic and credit collapse of the Fall of 2008. However, they are now just 10 basis points higher than the record low seen in the Winter of 2007 survey, and are hovering very near the survey results of the last recovery years of 2005-2006, following the post 9-11 downturn. As yield requirements lessened, anticipated growth continued to increase, even beyond the very healthy movement we saw in the last Mid Year 2010 Survey. <br />
<br />
===Week 9===<br />
[http://www.sciencedaily.com/releases/2011/03/110316084421.htm|Poorly Presented Risk Statistics Could Misinform Health Decisions]<br />
<br />
Risk statistics can be used persuasively to present health interventions in different lights. The different ways of expressing risk can prove confusing and there has been much debate about how to improve the communication of health statistics. Choosing the appropriate way to present risk statistics is key to helping people make well-informed decisions. A new Cochrane Systematic Review found that health professionals and consumers may change their perceptions when the same risks and risk reductions are presented using alternative statistical formats. In the new study, Cochrane researchers reviewed data from 35 studies assessing understanding of risk statistics by health professionals and consumers. They found that participants in the studies understood frequencies better than probabilities. Although the researchers say further studies are required to explore how different risk formats affect behaviour, they believe there are strong logical arguments for not reporting relative values alone. <br />
<br />
===Week 10===<br />
[http://www.traveldailynews.com/pages/show_page/42272-90%25-of-travelers-would-choose-rail-over-air|90% of travelers would choose rail over air]<br />
<br />
A recent poll of global travellers studied travellers' preferences between rail and air, with 90% of respondents saying they would like to see rail options displayed alongside flights when searching for travel. The poll concluded that time, cost and comfort are the 3 key factors considered by consumers when booking travel, and that flying is coming in second increasingly often. On time: the results reveal that travellers consider total travel time, getting from door to door, and the full travel experience when choosing a mode of transport. Most people would accept a longer door-to-door time to avoid the process of checking in, security and boarding; they would willingly add an hour or more of total travel to their trips to avoid the hassles of long lines, airport security and baggage fees. On cost: most people take action to avoid paying bag fees, strategizing packing days in advance and stuffing carry-ons to maximum capacity to avoid checking bags. On comfort: most people expect more comfortable waiting areas for their travel in the future. <br />
<br />
===Week 11===<br />
[http://www.nimh.nih.gov/science-news/2009/national-survey-tracks-rates-of-common-mental-disorders-among-american-youth.shtml|National Survey Tracks Rates of Common Mental Disorders Among American Youth]<br />
<br />
This is a survey that provides a comprehensive look at the prevalence of common mental disorders. It was conducted from 2001 to 2004 and had 3042 participants. The results include data from children and adolescents ages 8 to 15. In the study, the young people were interviewed directly. Family members also provided information about their children's mental health. The researchers tracked six mental disorders: generalized anxiety disorder (GAD), panic disorder, eating disorders (anorexia and bulimia), depression, attention deficit hyperactivity disorder (ADHD) and conduct disorder. The participants were also asked about what treatment, if any, they were receiving. The results show that 13 percent of respondents met criteria for having at least one of the six mental disorders within the last year. About 1.8 percent of the respondents had more than one disorder, usually a combination of ADHD and conduct disorder. Researchers found that among the patients, males are more likely than females to have ADHD, but females are more likely than males to feel depressed. I think the reasons might be that boys are usually more active than girls, and girls are more emotional and soft than boys. The purpose of the study is to transform the understanding and treatment of mental illnesses through basic and clinical research, paving the way for prevention, recovery and cure.<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities in one country instead of selecting different countries, in order to eliminate the differences in conditions.<br />
<br />
===Week 2===<br />
* Before I took this course, I had no idea what consulting involves or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realize that it is not easy to be a good consultant. Usually, the situation is very complicated and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem, then pick the relationships among these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis, whether the apparent effect of x on y arises by chance, etc. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw the multiple regression model represented by a plane in data space, which is represented in beta space by a point showing the fitted slope with respect to Coffee and the fitted slope with respect to Stress. There we only need 2 dimensions, but for some complex models we may need 3 or more dimensions for the beta space. I just wonder, in these situations, how can we draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need 3D or higher dimensional ellipse. This ellipse in higher dimension would be larger because it is trying to be correct for 3 or more parameters at the same time.-''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===<br />
* For controlling for all the Zs, the notes mention that we can use one of three techniques. The first one is statistical control. I am not sure about the meaning of propensity score, although it says "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]<br />
Thank you Constance!<br />
<br />
===Week 5===<br />
* In a multiple regression model, it is a challenge to determine the type of a canonical outlier and how to deal with these outliers, whether to remove them or not. The choice can change our results hugely.<br />
<br />
===Week 6===<br />
* From the slides on Hierarchical Models, it is good to know the idea of the two-stage approach (derived variables approach): estimate the slope and intercept, then use them as a multivariate sample and do a MANOVA test.<br />
<br />
===Week 7===<br />
*How does the Scheffe method connect with the F-test? <br />
*The F-test for the model is significant if and only if some contrast in the model space has a significant Scheffe confidence interval. If the F-test is not significant, then none of the Scheffe confidence intervals will be significant.<br />
<br />
===Week 8===<br />
*TED Talks on Statistics <br />
*[http://www.kdnuggets.com/2010/11/ted-talks-lies-damned-lies-and-statistics.html|Sebastian Wernicke:Lies, damned lies and statistics]<br />
*[http://www.ted.com/talks/chris_jordan_pictures_some_shocking_stats.html|Chris Jordan pictures some shocking stats]<br />
*Official Statistics and Statistical Ethics : Selected Issues [http://pdfcast.org/pdf/official-statistics-and-statistical-ethics-selected-issues]<br />
<br />
===Week 9===<br />
*Meaning of Apply Functions in R:<br />
1. lapply returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X. <br />
<br />
2. sapply is a user-friendly version of lapply by default returning a vector or matrix if appropriate.<br />
The lapply() function works on any list. The "l" in "lapply" stands for list. The "s" in "sapply" stands for simplify.<br />
<br />
3. vapply is similar to sapply, but has a pre-specified type of return value, so it can be safer (and sometimes faster) to use.<br />
<br />
4. tapply is a very powerful function that lets us break a vector into pieces and apply some function to each of the pieces. We need to specify how to break the vector into pieces.<br />
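For readers who know Python better than R, the four functions above can be mimicked roughly as follows. This is only an analogy, not R code and not how R implements them: the helper names `vapply_like` and `tapply_like` and the small data sets are made up for illustration.<br />

```python
from collections import defaultdict
from statistics import mean

scores = {"a": [1, 2, 3], "b": [4, 5, 6]}

# Like lapply: one result per element, returned in a container of the same length.
lapply_like = {name: mean(values) for name, values in scores.items()}

# Like sapply: the same results simplified to a flat sequence.
sapply_like = [mean(values) for values in scores.values()]


def vapply_like(items, func, expected_type):
    """Like vapply: apply func, but insist on a pre-specified type of result."""
    results = [func(item) for item in items]
    for result in results:
        if not isinstance(result, expected_type):
            raise TypeError(f"expected {expected_type}, got {type(result)}")
    return results


def tapply_like(values, groups, func):
    """Like tapply: split values into pieces by group, apply func to each piece."""
    pieces = defaultdict(list)
    for value, group in zip(values, groups):
        pieces[group].append(value)
    return {group: func(piece) for group, piece in pieces.items()}


heights = [150, 160, 170, 180]
sex = ["F", "F", "M", "M"]
by_sex = tapply_like(heights, sex, mean)

print(lapply_like, sapply_like, by_sex)
print(vapply_like(scores.values(), mean, (int, float)))
```

The second argument of `tapply_like` plays the role of the grouping factor in tapply: it is what specifies how the vector is broken into pieces.<br />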
<br />
===Week 10===<br />
*I cannot find the R code that George explained to us during last lecture. Could someone show me where it is, please? Thank you.<br />
<br />
===Week 11===<br />
*Our group had a meeting with our client, and this meeting was very helpful for our analysis. Since we have received several different datasets, and some of these datasets are duplicates with fewer variables, we were not sure which datasets we should use for our analysis questions. In addition, we have also specified the response variables for our analysis.</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-04-12T01:29:55Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master's student in Applied Statistics. I earned my Bachelor's degree in Actuarial Science, and I have some background in statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption across different countries. In reality, these countries could differ in all sorts of ways other than this factor. We know that the ideal solution, introduced by Fisher, is the experiment. What is Fisher's idea of an experiment? <br />
*A: 1) Make sure all the countries are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare the life expectancy of these groups to see whether the groups with lower cigarette consumption have higher life expectancy.<br />
<br />
===Week 2===<br />
*Q: From the example of studying the inheritance of height from father to son, and the graph of the ellipse, we find that the SD line is not the regression line of son's height on father's height: the change of the son's expected height from the mean is only a proportion of the change of the father's height from the mean. Why do we use ellipses?<br />
*A: By using ellipses, we can easily derive the formulas for statistical theory and models. In addition, drawn on scatterplots and other graphs, ellipses help with data analysis, for example in studying the correlation between variables, judging how significant a variable is, and calculating confidence intervals. <br />
<br />
===Week 3===<br />
*Q: What is beta space, and why do we want to look at our data in beta space?<br />
*A: In data space, the axes are variables and the points are observations. To better understand hierarchical data, we want a more natural display of our models, in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients. <br />
<br />
===Week 4===<br />
*Q: Thinking about strategy with observational data, what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the following three. The first is statistical control: include the Zs in a statistical model and adjust for them statistically. Modern thinking is that we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail if the model is wrong. The second is matching: this technique only compares observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, provided you have the Zs. The third is structural methods, which build a structural causal model.<br />
<br />
===Week 5===<br />
*Q: In the example data set with Health predicted by Weight and Height, we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?<br />
*A: The first type has typical values for the predictors but an atypical Y: it has little impact on beta-hat, but it increases the size of confidence intervals and decreases power. The second type has atypical values for the predictors but a Y consistent with the other data: it has little impact on beta-hat, but it shrinks confidence intervals and creates a false sense of power if the point is not valid. The third type has atypical values for the predictors and a Y not consistent with the other data: it has a large impact on beta-hat, could shrink or expand CIs, and makes a mess of everything.<br />
<br />
===Week 6===<br />
*Q: In the hierarchical data (high school) example, what is the difference between Robinson's Paradox and Simpson's Paradox?<br />
*A: Robinson's Paradox refers to the fact that Beta-W (the within-school slope) and Beta-B (the between-school slope) can have different signs. Simpson's Paradox refers to the fact that Beta-W and Beta-P (the pooled slope, which ignores schools) can have different signs.<br />
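A small numerical illustration of the Week 6 answer, as a Python sketch under stated assumptions: the three "schools" and their values are invented, and the names Beta-W, Beta-B and Beta-P are read here as the within-school, between-school (school means) and pooled least-squares slopes.<br />

```python
def ls_slope(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data: within each school y falls as x rises,
# but schools with higher x also have higher y on average.
schools = {
    "A": ([1, 2, 3], [5, 4, 3]),
    "B": ([4, 5, 6], [8, 7, 6]),
    "C": ([7, 8, 9], [11, 10, 9]),
}

# Beta-W: slope inside each school.
within = {name: ls_slope(xs, ys) for name, (xs, ys) in schools.items()}

# Beta-B: slope through the school means.
mean_xs = [sum(xs) / len(xs) for xs, _ in schools.values()]
mean_ys = [sum(ys) / len(ys) for _, ys in schools.values()]
between = ls_slope(mean_xs, mean_ys)

# Beta-P: slope through all points, ignoring school membership.
all_xs = [x for xs, _ in schools.values() for x in xs]
all_ys = [y for _, ys in schools.values() for y in ys]
pooled = ls_slope(all_xs, all_ys)

print(within, between, pooled)
```

In this constructed example every within-school slope is negative while both the between-school and pooled slopes are positive, so the data show both paradoxes at once.<br />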
<br />
===Week 7===<br />
*Q: Compare the Scheffé method/confidence interval with the Bonferroni method/confidence interval.<br />
*A: The Bonferroni adjustment applies very generally, but it suffers from the limitation that the number of simultaneous confidence intervals must be specified in advance in order to maintain the validity of the simultaneous coverage. The Scheffé method is an alternative that controls the simultaneous coverage over all contrasts in a subspace. If a large number of contrasts are of interest, or it is not known in advance which ones are of interest, the Scheffé method provides a way to “snoop” through all the possibilities while maintaining control of the coverage of confidence intervals and the false positive rates of tests. When the number of contrasts to be estimated is small, Bonferroni is better than Scheffé. In fact, unless the number of desired contrasts is at least twice the number of factors, Scheffé will always show wider confidence bands than Bonferroni.<br />
<br />
===Week 8===<br />
*Q: Briefly introduce the Wald test. What are the problems with the Wald test?<br />
*A: The Wald test is a way of testing the significance of particular explanatory variables in a statistical model. In logistic regression we have a binary outcome variable and one or more explanatory variables. For each explanatory variable in the model there will be an associated parameter. The Wald test is one of a number of ways of testing whether the parameters associated with a group of explanatory variables are zero. If, for a particular explanatory variable or group of explanatory variables, the Wald test is significant, then we would conclude that the parameters associated with these variables are not zero, so that the variables should be included in the model. If the Wald test is not significant, then these explanatory variables can be omitted from the model. However, the Wald statistic has problems: for large coefficients the standard error is inflated, which lowers the Wald statistic (chi-square) value, and for small sample sizes the likelihood-ratio test is more reliable than the Wald test.<br />
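The problem with large coefficients mentioned in the Week 8 answer (sometimes called the Hauck-Donner effect) can be seen in the simplest logistic regression, a binary outcome with one binary predictor. In that case the fitted coefficient is the log odds ratio of the 2×2 table and its standard error is the square root of the sum of reciprocal cell counts. The Python sketch below uses made-up counts and a hypothetical helper <code>wald_z</code>; it is an illustration, not the course's code.<br />

```python
import math

def wald_z(success_0, failure_0, success_1, failure_1):
    """Wald z statistic for the log odds ratio of a 2x2 table.

    Group 0 is the reference group; group 1 is the comparison group.
    """
    log_odds_ratio = math.log((success_1 / failure_1) / (success_0 / failure_0))
    std_error = math.sqrt(
        1 / success_0 + 1 / failure_0 + 1 / success_1 + 1 / failure_1
    )
    return log_odds_ratio / std_error

# Hypothetical counts: the reference group has 50 successes out of 100.
z_moderate = wald_z(50, 50, 90, 10)  # comparison group: 90 successes out of 100
z_extreme = wald_z(50, 50, 99, 1)    # comparison group: 99 successes out of 100

print(z_moderate, z_extreme)
```

The second table shows a much stronger effect, yet its Wald statistic is smaller because the standard error grows faster than the coefficient. A likelihood-ratio test does not suffer from this behaviour, which is one reason it is often preferred.<br />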
<br />
===Week 9===<br />
*Q: When fitting a model, is it correct to drop variables from the model by looking only at the p-values of these variables in the fitted model?<br />
*A: We cannot drop more than one variable at a time from the model by looking at the individual p-values, since after we drop one variable the p-values of the other variables will change. Therefore, if we would like to drop two or more variables at the same time, we should do a Wald test of the joint hypothesis that all of their coefficients are zero.<br />
<br />
===Week 10===<br />
*Q: What is EBLUP, and what does it do?<br />
*A: If we replace the unknown parameters with their estimates, we get the EBLUP (Empirical BLUP). The EBLUP optimally combines the information from the ith cluster with the information from the other clusters. We borrow strength from the other clusters.<br />
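To make "borrowing strength" concrete, here is a minimal Python sketch for the simplest case, a random-intercept model with one observation type per cluster. It assumes that estimates of the overall mean, the between-cluster variance and the within-cluster variance are already available; all numbers and the function name <code>eblup_cluster_mean</code> are hypothetical.<br />

```python
def eblup_cluster_mean(cluster_mean, n, mu_hat, tau2_hat, sigma2_hat):
    """Shrink a cluster's sample mean toward the estimated overall mean.

    tau2_hat   : estimated between-cluster variance
    sigma2_hat : estimated within-cluster variance
    The weight on the cluster's own mean grows with the cluster size n.
    """
    weight = tau2_hat / (tau2_hat + sigma2_hat / n)
    return mu_hat + weight * (cluster_mean - mu_hat)

# Hypothetical estimates from a fitted model.
mu_hat, tau2_hat, sigma2_hat = 70.0, 25.0, 100.0

# Two clusters with the same sample mean but very different sizes.
small_cluster = eblup_cluster_mean(80.0, 2, mu_hat, tau2_hat, sigma2_hat)
large_cluster = eblup_cluster_mean(80.0, 50, mu_hat, tau2_hat, sigma2_hat)

print(small_cluster, large_cluster)
```

The small cluster is pulled most of the way toward the overall mean, because its own mean is poorly estimated, while the large cluster stays close to its own sample mean.<br />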
<br />
===Week 11===<br />
*Q: Compare model fitting for non-linear and linear models. What are the differences?<br />
*A: 1. With a linear model we only need to specify the predictors. We don't need to say anything about the parameters because it is understood that there is exactly one parameter for each regressor and each parameter multiplies its regressor. The formula for a non-linear model needs to specify both the parameters and the regressors. 2. The algorithm for fitting a non-linear model is iterative and needs starting values, which you generally need to supply. 3. In non-linear mixed effects models (with nlme), parameters in the non-linear model can themselves be modeled through linear models, potentially based on other predictors. This allows the non-linear model to be simpler, since it only needs to capture the essentially non-linear aspects of the model. This formulation is easier to fit numerically.<br />
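Point 2 above, that non-linear fitting is iterative and needs a starting value, can be illustrated with a one-parameter Gauss-Newton iteration. This is a Python sketch under simplifying assumptions, not what nlme does internally: the model y = exp(b·x), the noise-free data and the function name <code>fit_exponential_rate</code> are all invented for illustration.<br />

```python
import math

def fit_exponential_rate(xs, ys, start, iterations=50):
    """Fit b in y = exp(b * x) by Gauss-Newton, beginning at `start`."""
    b = start
    for _ in range(iterations):
        # Residuals and derivative of the model with respect to b.
        residuals = [y - math.exp(b * x) for x, y in zip(xs, ys)]
        gradient = [x * math.exp(b * x) for x in xs]
        # Linearize the model around the current b and solve for the update.
        step = sum(r * g for r, g in zip(residuals, gradient)) / sum(
            g * g for g in gradient
        )
        b += step
    return b

# Hypothetical noise-free data generated with b = 0.5.
xs = [0, 1, 2, 3, 4]
ys = [math.exp(0.5 * x) for x in xs]

b_hat = fit_exponential_rate(xs, ys, start=0.1)
print(b_hat)
```

A linear model has a closed-form least-squares solution and needs no such loop. Here each pass only improves the previous guess, so a poor starting value can slow down or derail the fit.<br />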
<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
In a study covering the 1993/1994 through 2008/2009 seasons, the researchers showed that the severity of infections with the Influenza A virus is related to its novelty. In addition, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus' hemagglutinin protein. This result is expected to improve the ability to predict influenza severity accurately, so that scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. However, in reality there are many other sources of variation. For example, people fall into different age ranges, live in different areas and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime<br />
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online and spot trends that show what types of crimes happen most often, and where. Users can look for criminal activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
In the news report, those interviewed said: "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Obviously, crimes are more likely at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice. Crimes occur everywhere across the city. It is also possible that the criminals come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120 Drinkers Down Under switching from beer to wine]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wines and spirits has increased. From the statistical data, the researchers found that at the start of the 1960s beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same time period, wine consumption has increased threefold, to 36 percent, while the intake of spirits has nearly doubled, to 20 percent. Why have wine and spirits increased so much in the past forty years? The researchers noted that the increased consumption was likely to have been affected by numerous factors, such as different age patterns in the population, increasing affluence and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people have also come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes with a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and children, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results of this study from the Canadian Health Measures Survey (CHMS), released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. In addition, the survey reports that young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. The survey also found that men are more physically active than women on average, and the same pattern appears among teenagers: more than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, the survey should take into account people whose jobs require heavy lifting or being out and about for most of the work day. This would be more than enough exercise for most people and would improve these numbers dramatically. Similarly, for teenagers, the survey should also ask whether they work part-time while studying as full-time students.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1 More private liquor stores, more alcohol deaths?]<br />
<br />
In a study of 89 local areas of British Columbia, researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales, a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. Over the same period, the number of government-run stores in the province declined slightly. The findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
===Week 6===<br />
* Video Games Are Good for Girls<br />
[http://www.sciencedaily.com/releases/2011/02/110201083341.htm Video Games Are Good for Girls]<br />
<br />
Researchers from Brigham Young University's School of Family Life conducted a study on video games and children between 11 and 16 years old. They found that girls who played video games with a parent enjoyed a number of advantages. A father who still hasn't given up video games now has some justification to keep playing, if he has a daughter. The study involved 287 families with an adolescent child. The researchers measured several outcomes, such as positive behavior, aggression, family connection and mental health. The result is interesting: for boys, playing with a parent was not a statistically significant factor for any of the outcomes the researchers measured. Yet for girls, playing with a parent accounted for as much as 20 percent of the variation in those measured outcomes.<br />
<br />
===Week 7===<br />
[http://www.sciencedaily.com/releases/2011/03/110301091225.htm|Who's the Best Tennis Player of All Time?]<br />
<br />
Ranking tennis players is a novel way to show how complex network analysis can reveal interesting facts hidden in statistical data. Male tennis players who played in at least one Association of Tennis Professionals match between 1968 and 2010 were evaluated through network analysis. The researchers ran an algorithm, similar to the one used by Google to rank Web pages, on digital data from hundreds of thousands of matches. The data was pulled from the Association of Tennis Professionals website. They quantified the importance of players and ranked them by a "tennis prestige" score. This score is determined by a player's competitiveness, the quality of his performance and his number of victories. Fans may think of Jimmy Connors as an "old school" tennis player, but according to the new ranking system he ranks on top. One of the reasons is that he played for more than 20 years and had the opportunity to win a lot of matches against other very good players. "The rankings are a snapshot of who is at the top at this time," Radicchi said. "Players who have yet to retire are penalized with respect to those who have ended their careers. Prestige scores strongly correlate with the number of victories, and active players haven't played all the matches of their careers yet." Researching and ranking sports stars gives a glimpse of the power of complex network analysis. <br />
<br />
Hi Jessica, I'm not sure why, but the link you have provided for the tennis article isn't working. I searched for the article in google and found it at the following link in the event somebody else is interested in viewing the article. --[[User:Lawarren|Lawarren]] 10:48, 9 March 2011 (EST)<br />
<br />
http://www.sciencedaily.com/releases/2011/03/110301091225.htm<br />
* thank you very much Lawarren.<br />
<br />
===Week 8===<br />
[http://www.traveldailynews.com/pages/show_page/42076-Hotel-capitalization-rates-fall-further Hotel capitalization rates fall further]<br />
<br />
The 2011 USRC Hotel Investment Survey predicts good growth in the hotel industry, based on data viewed in the context of longer-term trends. The data include parameters such as capitalization rates, discount rates, income and expense growth expectations, marketing time, debt parameters, and other data for both full-service and limited-service hotels. The researchers found that full-service discount rates are now at their lowest level since the Mid Year 2007 survey, just prior to rates beginning to creep up in 2008 and, of course, the subsequent economic and credit collapse of the Fall of 2008. They are now just 10 basis points higher than the record low seen in the Winter of 2007 survey, and are hovering very near the survey results of the last recovery years of 2005-2006, following the post 9-11 downturn. As yield requirements lessened, anticipated growth continued to increase, even beyond the very healthy movement seen in the Mid Year 2010 Survey. <br />
<br />
===Week 9===<br />
[http://www.sciencedaily.com/releases/2011/03/110316084421.htm Poorly Presented Risk Statistics Could Misinform Health Decisions]<br />
<br />
Risk statistics can be used persuasively to present health interventions in different lights. The different ways of expressing risk can prove confusing and there has been much debate about how to improve the communication of health statistics. Choosing the appropriate way to present risk statistics is key to helping people make well-informed decisions. A new Cochrane Systematic Review found that health professionals and consumers may change their perceptions when the same risks and risk reductions are presented using alternative statistical formats. In the new study, Cochrane researchers reviewed data from 35 studies assessing understanding of risk statistics by health professionals and consumers. They found that participants in the studies understood frequencies better than probabilities. Although the researchers say further studies are required to explore how different risk formats affect behaviour, they believe there are strong logical arguments for not reporting relative values alone. <br />
<br />
===Week 10===<br />
[http://www.traveldailynews.com/pages/show_page/42272-90%25-of-travelers-would-choose-rail-over-air 90% of travelers would choose rail over air]<br />
<br />
A recent poll of global travellers studied their preference between rail and air, with 90% of respondents saying they would like to see rail options displayed alongside flights when searching for travel. The poll concluded that time, cost and comfort are the 3 key factors considered by consumers when booking travel, and that flying is coming in second increasingly often. On time: the results reveal that travellers consider total travel time, from door to door, and the full travel experience when choosing a mode of transport. Most people would accept a longer door-to-door time to avoid the process of checking in, security and boarding; they would willingly add an hour or more of total travel to their trips to avoid the hassles of long lines, airport security and baggage fees. On cost: most people take action to avoid paying bag fees, planning their packing days in advance and stuffing carry-ons to maximum capacity to avoid checking bags. On comfort: most people expect more comfortable waiting areas for their travel in the future. <br />
<br />
===Week 11===<br />
[http://www.nimh.nih.gov/science-news/2009/national-survey-tracks-rates-of-common-mental-disorders-among-american-youth.shtml National Survey Tracks Rates of Common Mental Disorders Among American Youth]<br />
<br />
This survey provides a comprehensive look at the prevalence of common mental disorders. It was conducted from 2001 to 2004 and had 3042 participants. The results include data from children and adolescents ages 8 to 15. In the study, the young people were interviewed directly. Family members also provided information about their children's mental health. The researchers tracked six mental disorders: generalized anxiety disorder (GAD), panic disorder, eating disorders (anorexia and bulimia), depression, attention deficit hyperactivity disorder (ADHD) and conduct disorder. The participants were also asked about what treatment, if any, they were receiving. The results show that 13 percent of respondents met criteria for having at least one of the six mental disorders within the last year. About 1.8 percent of the respondents had more than one disorder, usually a combination of ADHD and conduct disorder. The researchers found that males are more likely than females to have ADHD, but females are more likely than males to experience depression. I think the reasons might be that boys are usually more active than girls, and girls are more emotional and sensitive than boys. The purpose of the study is to transform the understanding and treatment of mental illnesses through basic and clinical research, paving the way for prevention, recovery and cure.<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities in one country instead of selecting different countries, in order to eliminate the differences in conditions.<br />
<br />
===Week 2===<br />
* Before taking this course, I had no idea what consulting involves or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realized that it is not easy to be a good consultant. Usually the situation is very complicated, and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem and then pick out the relationships among these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for the analysis, whether the apparent effect of x on y arose by chance, etc. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw the multiple regression model represented by a plane in data space and by a point in beta space, showing the fitted slope with respect to Coffee and the fitted slope with respect to Stress. There we only needed 2 dimensions, but for more complex models we may need 3 or more dimensions for beta space. I wonder how, in these situations, we can draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need 3D or higher dimensional ellipse. This ellipse in higher dimension would be larger because it is trying to be correct for 3 or more parameters at the same time.-''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===<br />
* For controlling for all the Zs, the notes mention that we can use one of three techniques. The first one is statistical control. I am not sure about the meaning of the propensity score, although the notes say it is the "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]<br />
Thank you Constance!<br />
<br />
===Week 5===<br />
* In a multiple regression model, it is a challenge to determine the type of a canonical outlier and to decide how to deal with these outliers, that is, whether or not to remove them. The choice can change our results hugely.<br />
<br />
===Week 6===<br />
* From the slides on Hierarchical Models, it is good to know the idea of the two-stage approach (derived variables approach): estimate the slope and intercept for each cluster, then use them as a multivariate sample and do a MANOVA test.<br />
<br />
===Week 7===<br />
*How does the Scheffe method connect with the F-test? <br />
*The F-test for the model is significant if and only if some contrast in the model space has a significant Scheffe confidence interval. If the F-test is not significant, then none of the Scheffe confidence intervals will be significant.<br />
<br />
===Week 8===<br />
*TED Talks on Statistics <br />
*[http://www.kdnuggets.com/2010/11/ted-talks-lies-damned-lies-and-statistics.html Sebastian Wernicke: Lies, damned lies and statistics]<br />
*[http://www.ted.com/talks/chris_jordan_pictures_some_shocking_stats.html Chris Jordan pictures some shocking stats]<br />
*Official Statistics and Statistical Ethics : Selected Issues [http://pdfcast.org/pdf/official-statistics-and-statistical-ethics-selected-issues]<br />
<br />
===Week 9===<br />
*Meaning of Apply Functions in R:<br />
1. lapply returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X. <br />
<br />
2. sapply is a user-friendly version of lapply by default returning a vector or matrix if appropriate.<br />
The lapply() function works on any list. The "l" in "lapply" stands for list. The "s" in "sapply" stands for simplify.<br />
<br />
3. vapply is similar to sapply, but has a pre-specified type of return value, so it can be safer (and sometimes faster) to use.<br />
<br />
4. tapply is a very powerful function that lets us break a vector into pieces and apply some function to each of the pieces. We need to specify how to break the vector down into pieces.<br />
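For readers who know Python better than R, the idea behind these functions can be sketched as follows. This is only a rough analogue written for illustration, not R code: the helper <code>tapply</code> below is a hypothetical Python function that mimics the behaviour described in item 4, and the heights are made-up values.<br />

```python
from collections import defaultdict
from statistics import mean

def tapply(values, groups, func):
    """Split `values` into pieces by `groups`, then apply `func` to each piece."""
    pieces = defaultdict(list)
    for value, group in zip(values, groups):
        pieces[group].append(value)
    return {group: func(piece) for group, piece in pieces.items()}

# Hypothetical data: heights in cm and a grouping factor.
heights = [160, 172, 168, 181, 175]
sex = ["F", "M", "F", "M", "M"]

# Like tapply(heights, sex, mean) in R: one result per group.
mean_by_sex = tapply(heights, sex, mean)

# Like lapply/sapply: apply a function to every element and collect the results.
word_lengths = [len(word) for word in ["beta", "space"]]

print(mean_by_sex, word_lengths)
```

The grouping vector plays the role of the factor that tells tapply how to break the data into pieces.<br />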
<br />
===Week 10===<br />
*I cannot find the R code that George explained to us during last lecture. Could someone show me where it is, please? Thank you.</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-04-12T01:12:59Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master's student in Applied Statistics. I earned my Bachelor's degree in Actuarial Science, and I have some background in Statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption across different countries. In reality, these countries could differ in all sorts of ways other than this factor. We know that the ideal solution, introduced by Fisher, is the experiment. What is Fisher's idea of an experiment? <br />
*A: 1) Make sure the groups being compared are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare the life expectancy of these groups. If the groups with lower cigarette consumption have higher life expectancy, the difference can be attributed to cigarette consumption rather than to other factors.<br />
<br />
===Week 2===<br />
*Q: In the example of studying the inheritance of height from father to son, and from the graph of the ellipse, we find that the SD line is not the regression line of son's height on father's height: the deviation of the son's expected height from the mean is only a proportion of the deviation of the father's height from the mean. Why do we use ellipses?<br />
*A: Using ellipses, we can easily derive the formulas for statistical theory and models. In addition, drawing ellipses on scatterplots and other graphs helps with data analysis, for example in studying the correlation between variables, judging how significant a variable is, and calculating confidence intervals. <br />
<br />
===Week 3===<br />
*Q: What is beta space, and why do we want to look at our data from beta space?<br />
*A: In data space, the axes are variables and the points are observations. To better understand hierarchical data, we want a more natural display of our models in a space called beta space. In beta space, the axes are coefficients and the points are models, each represented by its coefficients. <br />
<br />
===Week 4===<br />
*Q: Thinking about strategy with observational data, what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the following three. The first is statistical control: include the Zs in a statistical model and adjust for them statistically. Modern thinking is that we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail if the model is wrong. The second is matching: this technique only compares observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, provided you have the Zs. The third is structural methods, which build a structural causal model.<br />
<br />
===Week 5===<br />
*Q: In the example data set with Health predicted by Weight and Height, we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?<br />
*A: The first type has typical values for the predictors but an atypical Y: it has little impact on beta-hat, but it increases the size of confidence intervals and decreases power. The second type has atypical values for the predictors but a Y consistent with the other data: it has little impact on beta-hat, but it shrinks confidence intervals and creates a false sense of power if the point is not valid. The third type has atypical values for the predictors and a Y not consistent with the other data: it has a large impact on beta-hat, could shrink or expand CIs, and makes a mess of everything.<br />
<br />
===Week 6===<br />
*Q: In the hierarchical data (high school) example, what is the difference between Robinson's Paradox and Simpson's Paradox?<br />
*A: Robinson's Paradox refers to the fact that Beta-W (the within-school slope) and Beta-B (the between-school slope) can have different signs. Simpson's Paradox refers to the fact that Beta-W and Beta-P (the pooled slope, which ignores schools) can have different signs.<br />
<br />
===Week 7===<br />
*Q: Compare the Scheffé method/confidence interval with the Bonferroni method/confidence interval.<br />
*A: The Bonferroni adjustment applies very generally, but it suffers from the limitation that the number of simultaneous confidence intervals must be specified in advance in order to maintain the validity of the simultaneous coverage. The Scheffé method is an alternative that controls the simultaneous coverage over all contrasts in a subspace. If a large number of contrasts are of interest, or it is not known in advance which ones are of interest, the Scheffé method provides a way to “snoop” through all the possibilities while maintaining control of the coverage of confidence intervals and the false positive rates of tests. When the number of contrasts to be estimated is small, Bonferroni is better than Scheffé. In fact, unless the number of desired contrasts is at least twice the number of factors, Scheffé will always show wider confidence bands than Bonferroni.<br />
<br />
===Week 8===<br />
*Q: Briefly introduce the Wald test. What are the problems with the Wald test?<br />
*A: The Wald test is a way of testing the significance of particular explanatory variables in a statistical model. In logistic regression we have a binary outcome variable and one or more explanatory variables. For each explanatory variable in the model there will be an associated parameter. The Wald test is one of a number of ways of testing whether the parameters associated with a group of explanatory variables are zero. If for a particular explanatory variable, or group of explanatory variables, the Wald test is significant, then we would conclude that the parameters associated with these variables are not zero, so that the variables should be included in the model. If the Wald test is not significant then these explanatory variables can be omitted from the model. However, there are problems with the use of the Wald statistic: for large coefficients, the standard error is inflated, which lowers the Wald statistic (chi-square) value, and the likelihood-ratio test is more reliable than the Wald test for small sample sizes.<br />
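
For a single coefficient, the Wald statistic is the squared ratio of the estimate to its standard error, compared with a chi-square distribution on 1 degree of freedom. A minimal Python sketch of that calculation follows; the function name <code>wald_test</code> and the numbers passed to it are made up for illustration, and it relies on the large-sample normal approximation.

```python
import math

def wald_test(estimate, std_error, null_value=0.0):
    """Wald test for one coefficient (large-sample normal approximation).

    Returns (z, W, p) where W = z**2 is referred to chi-square with 1 df.
    """
    z = (estimate - null_value) / std_error
    w = z * z
    # Two-sided normal p-value; identical to the chi-square(1) tail area of W.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, w, p

# Hypothetical logistic-regression coefficient and standard error.
z, w, p = wald_test(estimate=0.98, std_error=0.5)
print(round(z, 2), round(w, 3), round(p, 3))
```

The formula shows the problem mentioned above: if the standard error is inflated, z and W shrink, and a real effect can look non-significant.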
<br />
===Week 9===<br />
*Q: In the process of fitting a model, is it correct to drop variables from the model by looking only at the p-values of these variables in the fitted model?<br />
*A: We cannot drop more than one variable at a time from the model by testing the significance of each variable separately (looking at its p-value), since after we drop one variable from the model, the p-values of the other variables will change. Therefore, we should do a Wald test on the variables jointly if we would like to drop two or more variables from the model at the same time.<br />
<br />
===Week 10===<br />
*Q: What is EBLUP, and what does it do?<br />
*A: If we replace the unknown parameters with their estimates, we get the EBLUP (Empirical BLUP). The EBLUP optimally combines the information from the ith cluster with the information from the other clusters. We borrow strength from the other clusters.<br />
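
The "borrowing strength" idea can be sketched for the simplest case, a random-intercept model, where the predicted cluster mean is a weighted average of the cluster's own mean and the overall mean. The Python sketch below assumes the overall mean and the two variance components have already been estimated; the function name and all numbers are hypothetical.

```python
def eblup_cluster_mean(ybar_i, n_i, mu_hat, tau2_hat, sigma2_hat):
    """Shrunken prediction of a cluster mean in a random-intercept model.

    tau2_hat   : estimated between-cluster variance
    sigma2_hat : estimated within-cluster variance
    The weight on the cluster's own mean grows with the cluster size n_i.
    """
    weight = tau2_hat / (tau2_hat + sigma2_hat / n_i)
    return mu_hat + weight * (ybar_i - mu_hat)

# Hypothetical estimates: overall mean 50, tau^2 = 4, sigma^2 = 16.
small = eblup_cluster_mean(ybar_i=60, n_i=4, mu_hat=50, tau2_hat=4, sigma2_hat=16)
large = eblup_cluster_mean(ybar_i=60, n_i=16, mu_hat=50, tau2_hat=4, sigma2_hat=16)
print(small, large)
```

The smaller cluster is pulled further toward the overall mean, because it carries less information of its own and borrows more from the other clusters.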
<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm|Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
From a study covering the 1993/1994 through 2008/2009 seasons, the research has shown that the severity of infections with the Influenza A virus is related to its novelty. In addition, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus' hemagglutinin protein. People think that this result can help to improve the ability to accurately predict influenza severity. Therefore, scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. However, in reality there are many sources of variation. For example, people are in different age ranges, live in different areas and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm|High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can look for criminal activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
In the news report, people indicated that "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Obviously, crimes are more likely to happen at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice. Crimes happen everywhere in the city. It is also possible that the criminals come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120|Weight Cited in Women's Heartburn]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wines and spirits has increased. From the statistical data, the researchers find that at the start of the 1960s, beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years, this has fallen to 44 percent. Over the same time period, wine consumption has increased threefold, to 36 percent, while the intake of spirits has nearly doubled to 20 percent. Why have wine and spirits had such a large increase over the past forty years? The researchers noted that increased consumption was likely to have been affected by numerous factors such as different age patterns in the population, increasing affluence and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time people have also come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome|Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data set in a survey of the physical activity patterns of Canadian adults and kids, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results of this study from the Canadian Health Measures Survey (CHMS), released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. In addition, the survey mentions that young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. In addition, the survey found that men have more physical activity than women on average. The same pattern also appears in the teenage group. More than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, they should consider people whose jobs require them to perform heavy lifting and to be out and about for most of their work day. This would be more than enough exercise for most people and would improve these numbers dramatically. Similarly for teenagers, the survey should also study whether they work part-time while studying as full-time students.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1|More private liquor stores, more alcohol deaths?]<br />
<br />
In a study of 89 local areas of British Columbia, the researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales, a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. Over the same period, the number of government-run stores in the province declined slightly. Moreover, the findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
===Week 6===<br />
* Video Games Are Good for Girls<br />
[http://www.sciencedaily.com/releases/2011/02/110201083341.htm|Video Games Are Good for Girls]<br />
<br />
Researchers from Brigham Young University's School of Family Life conducted a study on video games and children between 11 and 16 years old. They found that girls who played video games with a parent enjoyed a number of advantages. A father who still hasn't given up video games now has some justification to keep on playing, if he has a daughter. The study involved 287 families with an adolescent child. The researchers measured several outcomes, such as positive behavior, aggression, family connection and mental health. The result is interesting: for boys, playing with a parent was not a statistically significant factor for any of the outcomes the researchers measured. Yet for girls, playing with a parent accounted for as much as 20 percent of the variation in those measured outcomes.<br />
<br />
===Week 7===<br />
[http://www.sciencedaily.com/releases/2011/03/110301091225.htm|Who's the Best Tennis Player of All Time?]<br />
<br />
Ranking tennis players is a novel way to show how complex network analysis can reveal interesting facts hidden in statistical data. Male tennis players who played in at least one Association of Tennis Professionals match between 1968 and 2010 were evaluated through network analysis. The researchers ran an algorithm, similar to the one used by Google to rank Web pages, on digital data from hundreds of thousands of matches. The data was pulled from the Association of Tennis Professionals website. They quantified the importance of players and ranked them by a "tennis prestige" score. This score is determined by a player's competitiveness, the quality of his performance and number of victories. Fans may think of Jimmy Connors as an "old school" tennis player, but according to the new ranking system he ranks on top, and one of the reasons is that he played for more than 20 years and had the opportunity to win a lot of matches against other very good players. "The rankings are a snapshot of who is at the top at this time," Radicchi said. "Players who have yet to retire are penalized with respect to those who have ended their careers. Prestige scores strongly correlate with the number of victories, and active players haven't played all the matches of their careers yet." Researching and ranking sports stars gives a glimpse at the power of complex network analysis. <br />
<br />
Hi Jessica, I'm not sure why, but the link you have provided for the tennis article isn't working. I searched for the article in google and found it at the following link in the event somebody else is interested in viewing the article. --[[User:Lawarren|Lawarren]] 10:48, 9 March 2011 (EST)<br />
<br />
http://www.sciencedaily.com/releases/2011/03/110301091225.htm<br />
* thank you very much Lawarren.<br />
<br />
===Week 8===<br />
[http://www.traveldailynews.com/pages/show_page/42076-Hotel-capitalization-rates-fall-further|Hotel capitalization rates fall further]<br />
<br />
In 2011, the USRC Hotel Investment Survey released a prediction of good growth in the hotel industry, based on data viewed in the context of longer-term trends. The data include parameters such as capitalization rates, discount rates, income and expense growth expectations, marketing time, debt parameters, and other data for both full-service and limited-service hotels. The researchers found that full-service discount rates are now at their lowest level since the Mid Year 2007 survey, just prior to rates beginning to creep up in 2008, and of course the subsequent economic and credit collapse of the Fall of 2008. However, they are now just 10 basis points higher than the record low seen in the Winter of 2007 survey, and are hovering very near the survey results of the last recovery years of 2005-2006, following the post 9-11 downturn. As yield requirements lessened, anticipated growth continued to increase, even beyond the very healthy movement seen in the last Mid Year 2010 Survey. <br />
<br />
===Week 9===<br />
[http://www.sciencedaily.com/releases/2011/03/110316084421.htm|Poorly Presented Risk Statistics Could Misinform Health Decisions]<br />
<br />
Risk statistics can be used persuasively to present health interventions in different lights. The different ways of expressing risk can prove confusing and there has been much debate about how to improve the communication of health statistics. Choosing the appropriate way to present risk statistics is key to helping people make well-informed decisions. A new Cochrane Systematic Review found that health professionals and consumers may change their perceptions when the same risks and risk reductions are presented using alternative statistical formats. In the new study, Cochrane researchers reviewed data from 35 studies assessing understanding of risk statistics by health professionals and consumers. They found that participants in the studies understood frequencies better than probabilities. Although the researchers say further studies are required to explore how different risk formats affect behaviour, they believe there are strong logical arguments for not reporting relative values alone. <br />
<br />
===Week 10===<br />
[http://www.traveldailynews.com/pages/show_page/42272-90%25-of-travelers-would-choose-rail-over-air|90% of travelers would choose rail over air]<br />
<br />
A recent poll of global travelers studied travellers' preference between rail and air, with 90% of respondents saying they would like to see rail options displayed alongside flights when searching for travel. The poll concluded that time, cost and comfort are the 3 key factors considered by consumers when booking travel, and that flying is coming in second increasingly often. On time: the results reveal that travellers consider total travel time, getting from door to door, and the full travel experience when choosing a mode of transport. Most people would accept a longer door-to-door time to avoid the process of checking in, security and boarding; they would willingly add an hour or more of total travel to their trips to avoid the hassles of long lines, airport security and baggage fees. On cost: most people take action to avoid paying bag fees, strategizing packing days in advance and stuffing carry-ons to maximum capacity to avoid checking bags. On comfort: most people are expecting more comfortable waiting areas for their travel in the future. <br />
<br />
===Week 11===<br />
[http://www.nimh.nih.gov/science-news/2009/national-survey-tracks-rates-of-common-mental-disorders-among-american-youth.shtml|National Survey Tracks Rates of Common Mental Disorders Among American Youth]<br />
<br />
This is a survey that provides a comprehensive look at the prevalence of common mental disorders. It was conducted from 2001 to 2004 and had 3042 participants. The results include data from children and adolescents ages 8 to 15. In the study, the young people were interviewed directly. Family members also provided information about their children's mental health. The researchers tracked six mental disorders: generalized anxiety disorder (GAD), panic disorder, eating disorders (anorexia and bulimia), depression, attention deficit hyperactivity disorder (ADHD) and conduct disorder. The participants were also asked about what treatment, if any, they were receiving. The results show that 13 percent of respondents met criteria for having at least one of the six mental disorders within the last year. About 1.8 percent of the respondents had more than one disorder, usually a combination of ADHD and conduct disorder. Researchers found that among the patients, males are more likely than females to have ADHD, but females are more likely than males to feel depressed. I think the reasons might be that boys are usually more active than girls, and girls are more emotional than boys. The purpose of the study is to transform the understanding and treatment of mental illnesses through basic and clinical research, paving the way for prevention, recovery and cure.<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities in one country instead of selecting different countries, in order to eliminate the differences in conditions.<br />
<br />
===Week 2===<br />
* Before I took this course, I had no idea what consulting involves or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realize that it is not easy to be a good consultant. Usually, the situation is very complicated and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem, then pick the relationships between these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to be considered, such as how to select the sample for analysis, whether the relationship between x and y arises only by chance, etc. I think it is a big challenge for me and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw the multiple regression model represented by a plane in data space, which is represented in beta space by a point showing the fitted slope with respect to Coffee and the fitted slope with respect to Stress. Here we only need 2 dimensions, but for some more complex models we may need 3 or more dimensions for the beta space. I just wonder how, in these situations, we can draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need 3D or higher dimensional ellipse. This ellipse in higher dimension would be larger because it is trying to be correct for 3 or more parameters at the same time.-''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===<br />
* For controlling for all the Zs, it is mentioned that we can use one of three techniques. The first one is statistical control. I am not sure about the meaning of the propensity score, although it says "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]<br />
Thank you Constance!<br />
<br />
===Week 5===<br />
* In a multiple regression model, it is a challenge to determine the type of a canonical outlier and how to deal with these outliers, that is, whether to remove them or not. The choice can change our results greatly.<br />
<br />
===Week 6===<br />
* From the slides on Hierarchical Models, it is good to know the idea of the two-stage approach (derived variables approach): estimate the slope and intercept for each cluster, then use them as a multivariate sample and do a MANOVA test.<br />
<br />
===Week 7===<br />
*How does the Scheffe method connect with the F-test? <br />
*The F-test for the model is significant if and only if some contrast in the model space has a significant Scheffe confidence interval. If the F-test is not significant, then none of the Scheffe confidence intervals will be significant.<br />
<br />
===Week 8===<br />
*TED Talks on Statistics <br />
*[http://www.kdnuggets.com/2010/11/ted-talks-lies-damned-lies-and-statistics.html|Sebastian Wernicke:Lies, damned lies and statistics]<br />
*[http://www.ted.com/talks/chris_jordan_pictures_some_shocking_stats.html|Chris Jordan pictures some shocking stats]<br />
*Official Statistics and Statistical Ethics : Selected Issues [http://pdfcast.org/pdf/official-statistics-and-statistical-ethics-selected-issues]<br />
<br />
===Week 9===<br />
*Meaning of Apply Functions in R:<br />
1. lapply returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X. <br />
<br />
2. sapply is a user-friendly version of lapply by default returning a vector or matrix if appropriate.<br />
The lapply() function works on any list. The "l" in "lapply" stands for list. The "s" in "sapply" stands for simplify.<br />
<br />
3. vapply is similar to sapply, but has a pre-specified type of return value, so it can be safer (and sometimes faster) to use.<br />
<br />
4. tapply is a very powerful function that lets us break a vector into pieces and apply some function to each of the pieces. We need to specify how to break the vector down into pieces.<br />
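
To make the "break into pieces, then apply" idea concrete, here is a rough analogue written in Python rather than R. It is only a sketch of the concept: the helper named <code>tapply</code> below is defined for this example and is not R's function, and the data are made up. A list comprehension plays the role of lapply/sapply.

```python
from collections import defaultdict
from statistics import mean

def tapply(values, groups, func):
    """Split `values` by the parallel labels in `groups`, apply `func` to each piece."""
    pieces = defaultdict(list)
    for value, group in zip(values, groups):
        pieces[group].append(value)
    return {group: func(vals) for group, vals in pieces.items()}

# Hypothetical data: heights with a grouping variable.
heights = [150, 160, 170, 180]
sex = ["F", "F", "M", "M"]
group_means = tapply(heights, sex, mean)

# Analogue of lapply/sapply: apply a function to every element of a list.
squares = [x ** 2 for x in [1, 2, 3]]

print(group_means, squares)
```

In R itself, the corresponding calls would use tapply with the vector, the grouping factor and the function, and sapply with the vector and the function.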
<br />
===Week 10===<br />
*I cannot find the R code that George explained to us during last lecture. Could someone show me where it is, please? Thank you.</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-04-12T01:08:12Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master's student in Applied Statistics. I got my Bachelor's degree in Actuarial Science, and I have some background in Statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption in different countries. In reality, these countries could differ in all sorts of ways other than this factor. We know that the ideal solution introduced by Fisher is the experiment. What is Fisher's idea for an experiment? <br />
*A: 1) Make sure all the groups being compared are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare the life expectancy of these groups, to see whether the groups with lower cigarette consumption have higher life expectancy.<br />
<br />
===Week 2===<br />
*Q: In the example of studying the inheritance of height from father to son, and the graph of the ellipse, we find that the SD line is not the regression line of the son's height on the father's height: the deviation of the son's expected height from the mean is a proportion of the deviation of the father's height from the mean. Why do we use ellipses?<br />
*A: By using ellipses, we can easily get the formulas for statistical theory and models. In addition, ellipses on scatterplots and other graphs help us with the data analysis, such as studying the correlation between the variables, judging how significant a variable is, and calculating confidence intervals. <br />
<br />
===Week 3===<br />
*Q: What is the beta space, and why do we want to look at our data in the beta space?<br />
*A: In the data space, the axes are variables and the points are observations. For a better understanding of hierarchical data, we want a more natural display for our models, in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients. <br />
<br />
===Week 4===<br />
*Q: Thinking about the strategy with observational data, what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the three techniques below. The first is statistical control: include the Zs in a statistical model and adjust statistically. Modern thinking: we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail with the wrong model. The second is matching: this technique only performs comparisons between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, if you have the Zs. The third is structural methods, which build a structural causal model.<br />
<br />
===Week 5===<br />
*Q: In the example data set where Health is predicted by Weight and Height, we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?<br />
*A: The first type has typical values for the predictors but an atypical Y: it has little impact on beta-hat, but it increases the size of confidence intervals and decreases power. The second type has atypical values for the predictors but a Y that is consistent with the other data: it has little impact on beta-hat, but it shrinks confidence intervals and creates a false sense of power if the point is not valid. The third type has atypical values for the predictors and a Y that is not consistent with the other data: it has a large impact on beta-hat, could shrink or expand the CIs, and makes a mess of everything.<br />
<br />
===Week 6===<br />
*Q: In the hierarchical data (high school) example, what is the difference between Robinson's Paradox and Simpson's Paradox?<br />
*A: Robinson's Paradox refers to the fact that the within-cluster slope (Beta-W) and the between-cluster slope (Beta-B) can have different signs. Simpson's Paradox refers to the fact that the within-cluster slope (Beta-W) and the pooled slope (Beta-P) can have different signs.<br />
<br />
===Week 7===<br />
*Q: A comparison of the Scheffé method/Confidence Interval and the Bonferroni method/Confidence Interval.<br />
*A: The Bonferroni adjustment applies very generally, but it suffers from the limitation that the number of simultaneous confidence intervals must be specified in advance in order to maintain the validity of the simultaneous coverage. The Scheffé method is an alternative that controls the simultaneous coverage over all contrasts in a subspace. If a large number of contrasts are of interest, or it is not known in advance which ones are of interest, the Scheffé method provides a way to “snoop” through all the possibilities while maintaining control of the coverage of confidence intervals and the false positive rates of tests. When the number of contrasts to be estimated is small, Bonferroni is better than Scheffé. In fact, unless the number of desired contrasts is at least twice the number of factors, Scheffé will always show wider confidence bands than Bonferroni.<br />
<br />
===Week 8===<br />
*Q: Briefly introduce the Wald test. What are the problems with the Wald test?<br />
*A: The Wald test is a way of testing the significance of particular explanatory variables in a statistical model. In logistic regression we have a binary outcome variable and one or more explanatory variables. For each explanatory variable in the model there will be an associated parameter. The Wald test is one of a number of ways of testing whether the parameters associated with a group of explanatory variables are zero. If for a particular explanatory variable, or group of explanatory variables, the Wald test is significant, then we would conclude that the parameters associated with these variables are not zero, so that the variables should be included in the model. If the Wald test is not significant then these explanatory variables can be omitted from the model. However, there are problems with the use of the Wald statistic: for large coefficients, the standard error is inflated, which lowers the Wald statistic (chi-square) value, and the likelihood-ratio test is more reliable than the Wald test for small sample sizes.<br />
<br />
===Week 9===<br />
*Q: In the process of fitting a model, is it correct to drop variables from the model by looking only at the p-values of these variables in the fitted model?<br />
*A: We cannot drop more than one variable at a time from the model by testing the significance of each variable separately (looking at its p-value), since after we drop one variable from the model, the p-values of the other variables will change. Therefore, we should do a Wald test on the variables jointly if we would like to drop two or more variables from the model at the same time.<br />
<br />
===Week 10===<br />
*Q: What is EBLUP, and what does it do?<br />
*A: If we replace the unknown parameters with their estimates, we get the EBLUP (Empirical BLUP). The EBLUP optimally combines the information from the ith cluster with the information from the other clusters. We borrow strength from the other clusters.<br />
<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm|Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
From a study covering the 1993/1994 through 2008/2009 seasons, the research has shown that the severity of infections with the Influenza A virus is related to its novelty. In addition, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus' hemagglutinin protein. People think that this result can help to improve the ability to accurately predict influenza severity. Therefore, scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. However, in reality there are many sources of variation. For example, people are in different age ranges, live in different areas and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm|High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can look for criminal activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
In the news report, people indicated that "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Obviously, crimes are more likely to happen at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice. Crimes happen everywhere in the city. It is also possible that the criminals come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120 Drinkers Down Under switching from beer to wine]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wines and spirits has increased. The statistics show that at the start of the 1960s beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same period, wine consumption has increased threefold, to 36 percent, while the intake of spirits has nearly doubled, to 20 percent. Why have wine and spirits increased so much since the 1960s? The researchers noted that increased consumption was likely to have been affected by numerous factors such as different age patterns in the population, increasing affluence and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people have also come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data set in a survey of the physical activity patterns of Canadian adults and kids, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results of this study, from the Canadian Health Measures Survey (CHMS) released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. The survey also reports that young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. In addition, the survey found that men are more physically active than women on average. The same pattern appears among teenagers: more than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, the survey should take into account people whose jobs require heavy lifting or being out and about for most of the work day. That would be more than enough exercise for most people and would improve these numbers dramatically. Similarly, for teenagers, the survey should also ask whether they work part-time while studying as full-time students.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1 More private liquor stores, more alcohol deaths?]<br />
<br />
In a study of 89 local areas of British Columbia, researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales, a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. Over the same period, the number of government-run stores in the province declined slightly. The findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
===Week 6===<br />
* Video Games Are Good for Girls<br />
[http://www.sciencedaily.com/releases/2011/02/110201083341.htm Video Games Are Good for Girls]<br />
<br />
Researchers from Brigham Young University's School of Family Life conducted a study on video games and children between 11 and 16 years old. They found that girls who played video games with a parent enjoyed a number of advantages. A father who still hasn't given up video games now has some justification to keep on playing, if he has a daughter. The study involved 287 families with an adolescent child. The researchers measured several outcomes, such as positive behavior, aggression, family connection and mental health. The result is interesting: for boys, playing with a parent was not a statistically significant factor for any of the outcomes the researchers measured. Yet for girls, playing with a parent accounted for as much as 20 percent of the variation in those measured outcomes.<br />
<br />
===Week 7===<br />
[http://www.sciencedaily.com/releases/2011/03/110301091225.htm|Who's the Best Tennis Player of All Time?]<br />
<br />
Ranking tennis players is a novel way to show how complex network analysis can reveal interesting facts hidden in statistical data. Male tennis players who played in at least one Association of Tennis Professionals match between 1968 and 2010 were evaluated through network analysis. The researchers ran an algorithm, similar to the one used by Google to rank Web pages, on digital data from hundreds of thousands of matches. The data was pulled from the Association of Tennis Professionals website. They quantified the importance of players and ranked them by a "tennis prestige" score. This score is determined by a player's competitiveness, the quality of his performance and number of victories. Fans may think of Jimmy Connors as an "old school" tennis player, but the new ranking system puts him on top, partly because he played for more than 20 years and had the opportunity to win a lot of matches against other very good players. "The rankings are a snapshot of who is at the top at this time," Radicchi said. "Players who have yet to retire are penalized with respect to those who have ended their careers. Prestige scores strongly correlate with the number of victories, and active players haven't played all the matches of their careers yet." Researching and ranking sports stars gives a glimpse at the power of complex network analysis. <br />
<br />
Hi Jessica, I'm not sure why, but the link you have provided for the tennis article isn't working. I searched for the article in google and found it at the following link in the event somebody else is interested in viewing the article. --[[User:Lawarren|Lawarren]] 10:48, 9 March 2011 (EST)<br />
<br />
http://www.sciencedaily.com/releases/2011/03/110301091225.htm<br />
* Thank you very much, Lawarren.<br />
<br />
===Week 8===<br />
[http://www.traveldailynews.com/pages/show_page/42076-Hotel-capitalization-rates-fall-further Hotel capitalization rates fall further]<br />
<br />
The 2011 USRC Hotel Investment Survey predicts good growth in the hotel industry, based on data viewed in the context of longer-term trends. The data include parameters such as capitalization rates, discount rates, income and expense growth expectations, marketing time, debt parameters, and other data for both full-service and limited-service hotels. The researchers found that full-service discount rates are now at their lowest level since the Mid Year 2007 survey, just before rates began to creep up in 2008 and before the subsequent economic and credit collapse of the fall of 2008. They are now just 10 basis points higher than the record low seen in the Winter 2007 survey, and are hovering very near the survey results of the last recovery years of 2005-2006, following the post-9/11 downturn. As yield requirements lessened, anticipated growth continued to increase, even beyond the very healthy movement seen in the Mid Year 2010 survey. <br />
<br />
===Week 9===<br />
[http://www.sciencedaily.com/releases/2011/03/110316084421.htm Poorly Presented Risk Statistics Could Misinform Health Decisions]<br />
<br />
Risk statistics can be used persuasively to present health interventions in different lights. The different ways of expressing risk can prove confusing and there has been much debate about how to improve the communication of health statistics. Choosing the appropriate way to present risk statistics is key to helping people make well-informed decisions. A new Cochrane Systematic Review found that health professionals and consumers may change their perceptions when the same risks and risk reductions are presented using alternative statistical formats. In the new study, Cochrane researchers reviewed data from 35 studies assessing understanding of risk statistics by health professionals and consumers. They found that participants in the studies understood frequencies better than probabilities. Although the researchers say further studies are required to explore how different risk formats affect behaviour, they believe there are strong logical arguments for not reporting relative values alone. <br />
<br />
===Week 10===<br />
[http://www.traveldailynews.com/pages/show_page/42272-90%25-of-travelers-would-choose-rail-over-air 90% of travelers would choose rail over air]<br />
<br />
A recent poll of global travelers studied travellers' preference between rail and air, with 90% of respondents saying they would like to see rail options displayed alongside flights when searching for travel. The poll concluded that time, cost and comfort are the three key factors considered by consumers when booking travel, and that flying is coming in second increasingly often. On time: the results reveal that travellers consider total door-to-door travel time and the full travel experience when choosing a mode of transport. Most people would accept a longer door-to-door journey to avoid the process of checking in, security and boarding; they would willingly add an hour or more of total travel to their trips to avoid the hassles of long lines, airport security and baggage fees. On cost: most people take action to avoid paying bag fees, planning their packing days in advance and stuffing carry-ons to maximum capacity to avoid checking bags. On comfort: most people expect more comfortable waiting areas for their travel in the future. <br />
<br />
===Week 11===<br />
[http://www.nimh.nih.gov/science-news/2009/national-survey-tracks-rates-of-common-mental-disorders-among-american-youth.shtml National Survey Tracks Rates of Common Mental Disorders Among American Youth]<br />
<br />
This survey provides a comprehensive look at the prevalence of common mental disorders. It was conducted from 2001 to 2004 and had 3,042 participants. The results include data from children and adolescents ages 8 to 15. In the study, the young people were interviewed directly. Family members also provided information about their children's mental health. The researchers tracked six mental disorders: generalized anxiety disorder (GAD), panic disorder, eating disorders (anorexia and bulimia), depression, attention deficit hyperactivity disorder (ADHD) and conduct disorder. The participants were also asked about what treatment, if any, they were receiving. The results show that 13 percent of respondents met criteria for having at least one of the six mental disorders within the last year. About 1.8 percent of the respondents had more than one disorder, usually a combination of ADHD and conduct disorder. Researchers found that males are more likely than females to have ADHD, while females are more likely than males to feel depressed. I think the reasons might be that boys are more <br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities in one country instead of selecting different countries, in order to reduce the differences in conditions.<br />
<br />
===Week 2===<br />
* Before taking this course, I had no idea what consulting involves or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realize that it is not easy to be a good consultant. Usually the situation is very complicated, and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem and then pick out the relationships among these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis, whether an apparent effect of x on y arose by chance, etc. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw that the multiple regression model, represented by a plane in data space, is represented in beta space by a point showing the fitted slope with respect to Coffee and the fitted slope with respect to Stress. Here we only need 2 dimensions, but for some more complex models we may need 3 or more dimensions for the beta space. I just wonder how, in these situations, we can draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need a 3D or higher-dimensional ellipsoid. This ellipsoid in higher dimensions would be larger because it is trying to be correct for 3 or more parameters at the same time.-''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===<br />
* On controlling for all the Zs, the notes mention that we can use one of three techniques. The first one is statistical control. I am not sure about the meaning of propensity score, although it says "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]<br />
Thank you Constance!<br />
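<br />
In symbols, the definition above can be sketched as follows. The notation is an assumption made here for illustration: X is a 0/1 group indicator, Z is the vector of measured covariates, and the logistic form is one common modelling choice rather than the only one.<br />

```latex
e(z) = P(X = 1 \mid Z = z), \qquad
\log\frac{e(z)}{1 - e(z)} = \gamma_0 + \gamma^{\top} z
```

In a completely randomized study e(z) is the same constant for every z, which is one way to read the statement that group allocation is "unconfounded" with the covariates.<br />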
<br />
===Week 5===<br />
* In a multiple regression model, it is a challenge to determine the type of a canonical outlier and to decide how to deal with it, i.e. whether or not to remove it. The choice can change our results considerably.<br />
<br />
===Week 6===<br />
* From the slides on Hierarchical Models, it is good to know the idea of the two-stage approach (derived variables approach): estimate a slope and an intercept for each cluster, then use them as a multivariate sample and do a MANOVA test.<br />
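<br />
A minimal sketch of the first stage of this approach, in plain Python. The school names and data are hypothetical, and the second stage here only averages the derived coefficients; the MANOVA test itself is not implemented.<br />

```python
def ols_line(xs, ys):
    """Least-squares intercept and slope for one cluster."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Hypothetical clusters: (x values, y values) for each school.
schools = {
    "A": ([0, 1, 2], [1, 3, 5]),
    "B": ([0, 1, 2], [2, 3, 4]),
    "C": ([0, 1, 2, 3], [0, 3, 6, 9]),
}

# Stage 1: one (intercept, slope) pair per cluster.
derived = {name: ols_line(xs, ys) for name, (xs, ys) in schools.items()}

# Stage 2 (summary only): treat the pairs as a multivariate sample.
mean_intercept = sum(b0 for b0, _ in derived.values()) / len(derived)
mean_slope = sum(b1 for _, b1 in derived.values()) / len(derived)
print(derived, mean_intercept, mean_slope)
```

Each (intercept, slope) pair is one observation in the derived sample, so clusters of different sizes count equally in the second stage.<br />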
<br />
===Week 7===<br />
*How does the Scheffé method connect with the F-test? <br />
*The F-test for the model is significant if and only if some contrast in the model space has a significant Scheffé confidence interval. If the F-test is not significant, then none of the Scheffé confidence intervals will be significant.<br />
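<br />
The connection can be written out as a short sketch. The symbols are assumptions made here for illustration: the hypothesis is Lβ = 0 with L of rank p, V-hat is the estimated covariance matrix of beta-hat, and ν is the residual degrees of freedom.<br />

```latex
F = \frac{(L\hat\beta)^{\top} (L \hat V L^{\top})^{-1} (L\hat\beta)}{p},
\qquad
\max_{c \neq 0} \frac{(c^{\top} L\hat\beta)^{2}}{c^{\top} L \hat V L^{\top} c} = p\,F,
\qquad
\hat\psi \pm \sqrt{p\,F_{\alpha;\,p,\,\nu}}\;\widehat{\mathrm{se}}(\hat\psi)
```

The Scheffé interval for a contrast ψ = c'Lβ excludes 0 exactly when that contrast's squared t-ratio exceeds p times the critical F value. By the maximum in the middle, some contrast does so if and only if F itself exceeds the critical value.<br />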
<br />
===Week 8===<br />
*TED Talks on Statistics <br />
*[http://www.kdnuggets.com/2010/11/ted-talks-lies-damned-lies-and-statistics.html Sebastian Wernicke: Lies, damned lies and statistics]<br />
*[http://www.ted.com/talks/chris_jordan_pictures_some_shocking_stats.html Chris Jordan pictures some shocking stats]<br />
*Official Statistics and Statistical Ethics: Selected Issues [http://pdfcast.org/pdf/official-statistics-and-statistical-ethics-selected-issues]<br />
<br />
===Week 9===<br />
*Meaning of the apply functions in R:<br />
1. lapply returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X. It works on any list; the "l" in "lapply" stands for list.<br />
<br />
2. sapply is a user-friendly version of lapply that by default returns a vector or matrix if appropriate. The "s" in "sapply" stands for simplify.<br />
<br />
3. vapply is similar to sapply, but has a pre-specified type of return value, so it can be safer (and sometimes faster) to use.<br />
<br />
4. tapply is a very powerful function that lets us break a vector into pieces and apply some function to each of the pieces. We need to specify how to break the vector into pieces.<br />
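<br />
To make the list-in/list-out and split-then-apply ideas concrete, here is a rough plain-Python analogue. This is not R: the helper names lapply and tapply are hypothetical functions defined below only to mimic the behaviour described above, and the data are made up.<br />

```python
from collections import defaultdict


def lapply(items, fun):
    """Apply fun to every element and always return a list of the same length."""
    return [fun(item) for item in items]


def tapply(values, groups, fun):
    """Split values by the parallel group labels, then apply fun to each piece."""
    pieces = defaultdict(list)
    for value, group in zip(values, groups):
        pieces[group].append(value)
    return {group: fun(piece) for group, piece in pieces.items()}


def mean(numbers):
    return sum(numbers) / len(numbers)


# Hypothetical data: heights in cm with a parallel vector of group labels.
heights = [170, 160, 180, 165]
sex = ["M", "F", "M", "F"]

sums = lapply([[1, 2, 3], [4, 5]], sum)
group_means = tapply(heights, sex, mean)
print(sums)
print(group_means)
```

The second argument of the tapply analogue plays the role of the grouping factor: it is what specifies how the vector is broken into pieces.<br />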
<br />
===Week 10===<br />
*I cannot find the R code that George explained to us during last lecture. Could someone show me where it is, please? Thank you.</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-03-30T22:16:14Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master's student in Applied Statistics. I received my Bachelor's degree in Actuarial Science, and I have some background in Statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption in different countries. In reality, apart from this factor, the countries could differ in all sorts of ways. We know that the ideal solution, introduced by Fisher, is an experiment. What is Fisher's idea for an experiment? <br />
*A: 1) Make sure all the groups being compared are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare the life expectancy of these groups to see whether the groups with lower cigarette consumption have higher life expectancy.<br />
<br />
===Week 2===<br />
*Q: In the example of studying the inheritance of height from father to son, and the graph of the ellipse, we find that the SD line is not the regression line of son's height on father's height: the deviation of the son's expected height from the mean is only a proportion of the deviation of the father's height from the mean. Why do we use ellipses?<br />
*A: Ellipses make it easy to derive the formulas for statistical theory and models. In addition, ellipses drawn on scatterplots and other graphs help with data analysis, for example in studying the correlation between variables, judging how significant a variable is, and calculating confidence intervals. <br />
<br />
===Week 3===<br />
*Q: What is beta space, and why do we want to look at our data in beta space?<br />
*A: In data space, the axes are variables and the points are observations. To understand hierarchical data better, we want a more natural display of our models, in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients. <br />
<br />
===Week 4===<br />
*Q: Thinking about strategy with observational data, what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the three techniques below. The first is statistical control, which includes the Zs in a statistical model and adjusts for them statistically. Modern thinking: we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail with the wrong model. The second is matching: this technique only performs comparisons between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, if you have the Zs. The third is structural methods, which build a structural causal model.<br />
<br />
===Week 5===<br />
*Q: In the example data set, Health as predicted by Weight and Height, we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?<br />
*A: The first has typical values for the predictors but an atypical Y: it has little impact on beta-hat, increases the size of confidence intervals and decreases power. The second has atypical values for the predictors but a Y consistent with the other data: it has little impact on beta-hat, shrinks confidence intervals, and creates a false sense of power if the point is not valid. The third has atypical values for the predictors and a Y not consistent with the other data: it has a large impact on beta-hat, could shrink or expand CIs, and makes a mess of everything.<br />
<br />
===Week 6===<br />
*Q: In the hierarchical data (high school) example, what is the difference between Robinson's Paradox and Simpson's Paradox?<br />
*A: Robinson's Paradox refers to the fact that Beta-w and Beta-B can have different signs. Simpson's Paradox refers to the fact that Beta-w and Beta-p can have different signs.<br />
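<br />
A small numerical sketch of how the signs can disagree. It reads Beta-w, Beta-B and Beta-p as the within-cluster, between-cluster and pooled slopes, which is an interpretation assumed here, and the three clusters are made-up data.<br />

```python
def mean(values):
    return sum(values) / len(values)


def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical clusters: y falls with x inside each cluster,
# but clusters with a larger mean x also have a larger mean y.
clusters = [
    ([9, 10, 11], [11, 10, 9]),
    ([19, 20, 21], [21, 20, 19]),
    ([29, 30, 31], [31, 30, 29]),
]

# Within slope: centre x and y on their own cluster means first.
x_within, y_within = [], []
for xs, ys in clusters:
    x_within += [x - mean(xs) for x in xs]
    y_within += [y - mean(ys) for y in ys]
beta_w = slope(x_within, y_within)

# Between slope: regress the cluster means of y on the cluster means of x.
beta_b = slope([mean(xs) for xs, _ in clusters], [mean(ys) for _, ys in clusters])

# Pooled slope: ignore the clustering altogether.
all_x = [x for xs, _ in clusters for x in xs]
all_y = [y for _, ys in clusters for y in ys]
beta_p = slope(all_x, all_y)

print(beta_w, beta_b, beta_p)
```

In this example the within slope is negative while both the between and the pooled slopes are positive, so both sign reversals show up in the same data.<br />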
<br />
===Week 7===<br />
*Q: Compare the Scheffé method/confidence interval with the Bonferroni method/confidence interval.<br />
*A: The Bonferroni adjustment applies very generally, but it suffers from the limitation that the number of simultaneous confidence intervals must be specified in advance in order to maintain the validity of the simultaneous coverage. The Scheffé method is an alternative that controls the simultaneous coverage over all contrasts in a subspace. If a large number of contrasts are of interest, or it is not known in advance which ones are of interest, the Scheffé method provides a way to “snoop” through all the possibilities while maintaining control on the coverage of confidence intervals and the false positive rates of tests. When the number of contrasts to be estimated is small, Bonferroni is better than Scheffé. Actually, unless the number of desired contrasts is at least twice the number of factors, Scheffé will always show wider confidence bands than Bonferroni.<br />
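<br />
The two intervals can be put side by side as a sketch. The symbols are assumptions made here for illustration: m is the number of contrasts fixed in advance, p is the dimension of the subspace of contrasts, ν is the residual degrees of freedom, and α is the simultaneous error rate.<br />

```latex
\text{Bonferroni: } \hat\psi_j \pm t_{\alpha/(2m);\,\nu}\;\widehat{\mathrm{se}}(\hat\psi_j),
\qquad
\text{Scheff\'e: } \hat\psi \pm \sqrt{p\,F_{\alpha;\,p,\,\nu}}\;\widehat{\mathrm{se}}(\hat\psi)
```

Both intervals use the same standard error, so Bonferroni gives the narrower interval whenever its t multiplier is smaller than the Scheffé multiplier. The t multiplier grows with m, while the Scheffé multiplier does not depend on how many contrasts are examined.<br />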
<br />
===Week 8===<br />
*Q: Briefly introduce the Wald test. What is the problem with the Wald test?<br />
*A: The Wald test is a way of testing the significance of particular explanatory variables in a statistical model. In logistic regression we have a binary outcome variable and one or more explanatory variables. For each explanatory variable in the model there will be an associated parameter. The Wald test is one of a number of ways of testing whether the parameters associated with a group of explanatory variables are zero. If, for a particular explanatory variable or group of explanatory variables, the Wald test is significant, then we would conclude that the parameters associated with these variables are not zero, so that the variables should be included in the model. If the Wald test is not significant, then these explanatory variables can be omitted from the model. However, there are problems with the Wald statistic: for large coefficients the standard error is inflated, which lowers the Wald (chi-square) statistic, and for small sample sizes the likelihood-ratio test is more reliable than the Wald test.<br />
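<br />
For a single parameter, the calculation can be sketched in a few lines of plain Python. The estimate and standard error below are hypothetical numbers, and the p-value relies on the usual large-sample normal approximation.<br />

```python
import math


def wald_test(beta_hat, std_err):
    """Wald z, Wald chi-square (1 df) and two-sided p-value for H0: beta = 0."""
    z = beta_hat / std_err
    w = z * z
    # Two-sided normal tail area; equals the chi-square(1) tail area of w.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, w, p_value


# Hypothetical logistic-regression output for one explanatory variable.
z, w, p = wald_test(beta_hat=0.8, std_err=0.25)
print(z, w, p)
```

Because the statistic divides by the standard error, an inflated standard error pulls the statistic towards zero, which is the weakness described in the answer above.<br />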
<br />
===Week 9===<br />
*Q: In the process of fitting a model, is it correct to drop variables from the model by looking only at the p-values of these variables in the fitted model?<br />
*A: We cannot drop more than one variable at a time from the model on the basis of the individual significance tests (p-values), since after we drop one variable, the p-values of the other variables change. Therefore, if we would like to drop two or more variables at the same time, we should do a Wald test of those variables jointly.<br />
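<br />
A small sketch of why the joint test matters, in plain Python for the special case of two coefficients. The estimates and their covariance matrix are hypothetical, and the closed-form p-value applies only to a chi-square with 2 degrees of freedom.<br />

```python
import math


def wald_single_p(beta_hat, variance):
    """Two-sided p-value for one coefficient (normal approximation)."""
    z = beta_hat / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2.0))


def wald_joint_two(b1, b2, v11, v12, v22):
    """Joint Wald test of H0: beta1 = beta2 = 0 from estimates and their covariance."""
    det = v11 * v22 - v12 * v12
    # Quadratic form b' V^{-1} b written out with the 2x2 inverse.
    w = (v22 * b1 * b1 - 2.0 * v12 * b1 * b2 + v11 * b2 * b2) / det
    # Tail area of a chi-square with 2 degrees of freedom is exp(-w / 2).
    return w, math.exp(-w / 2.0)


# Hypothetical estimates with strongly negatively correlated sampling errors,
# as can happen when the two predictors are highly correlated.
b1, b2 = 1.5, 1.5
v11, v12, v22 = 1.0, -0.8, 1.0

p1 = wald_single_p(b1, v11)
p2 = wald_single_p(b2, v22)
w_joint, p_joint = wald_joint_two(b1, b2, v11, v12, v22)
print(p1, p2, w_joint, p_joint)
```

In this made-up case neither coefficient is significant on its own, yet the joint test is highly significant, so dropping both variables on the basis of the individual p-values would be a mistake.<br />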
<br />
===Week 10===<br />
*Q: What is EBLUP, and what does it do?<br />
*A: If we replace the unknown parameters with their estimates, we get the EBLUP (Empirical BLUP). The EBLUP optimally combines the information from the ith cluster with the information from the other clusters. We borrow strength from the other clusters.<br />
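<br />
For the simplest case, a random-intercept model, borrowing strength can be sketched in plain Python. The cluster data are hypothetical, and the variance components tau2 (between clusters) and sigma2 (within clusters) are assumed to have been estimated already rather than computed here.<br />

```python
def eblup_cluster_means(clusters, tau2, sigma2):
    """Shrink each cluster mean towards the overall mean (random-intercept model)."""
    means = {name: sum(ys) / len(ys) for name, ys in clusters.items()}
    # Precision weights for the overall mean: 1 / Var(cluster mean).
    weights = {name: 1.0 / (tau2 + sigma2 / len(ys)) for name, ys in clusters.items()}
    mu_hat = sum(weights[n] * means[n] for n in clusters) / sum(weights.values())
    predictions = {}
    for name, ys in clusters.items():
        # Shrinkage factor: close to 1 for large clusters, smaller for small ones.
        lam = tau2 / (tau2 + sigma2 / len(ys))
        predictions[name] = mu_hat + lam * (means[name] - mu_hat)
    return mu_hat, predictions


# Hypothetical clusters of different sizes.
clusters = {
    "A": [10, 12],
    "B": [20, 21, 19, 20, 22, 18, 20, 20],
    "C": [15, 14, 16, 15],
}
mu_hat, eblup = eblup_cluster_means(clusters, tau2=4.0, sigma2=4.0)
print(mu_hat, eblup)
```

The small cluster A is pulled further towards the overall mean than the large cluster B, because its own mean carries less information.<br />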
<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
Drawing on data from the 1993/1994 through 2008/2009 seasons, the research shows that the severity of infections with the Influenza A virus is related to the virus's novelty. In fact, 90% of the variation in influenza severity over the period studied could be explained by the novelty of the virus' hemagglutinin protein. It is thought that this result can improve the ability to predict influenza severity accurately, so that scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. In reality, however, there are many other sources of variation: for example, people differ in age, live in different areas and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime<br />
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can look for criminal activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
In the news story, people are quoted as saying: "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Obviously, crimes are more likely at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice, and crimes occur throughout the city. The criminals may also come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120 Drinkers Down Under switching from beer to wine]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wines and spirits has increased. The statistics show that at the start of the 1960s beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same period, wine consumption has increased threefold, to 36 percent, while the intake of spirits has nearly doubled, to 20 percent. Why have wine and spirits increased so much since the 1960s? The researchers noted that increased consumption was likely to have been affected by numerous factors such as different age patterns in the population, increasing affluence and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people have also come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data set in a survey of the physical activity patterns of Canadian adults and kids, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results of this study, from the Canadian Health Measures Survey (CHMS) released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. The survey also reports that young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. In addition, the survey found that men are more physically active than women on average. The same pattern appears among teenagers: more than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, the survey should take into account people whose jobs require heavy lifting or being out and about for most of the work day. That would be more than enough exercise for most people and would improve these numbers dramatically. Similarly, for teenagers, the survey should also ask whether they work part-time while studying as full-time students.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1 More private liquor stores, more alcohol deaths?]<br />
<br />
Based on a study of 89 local areas of British Columbia, the researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales, a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. Over the same period, the number of government-run stores in the province declined slightly. The findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
===Week 6===<br />
* Video Games Are Good for Girls<br />
[http://www.sciencedaily.com/releases/2011/02/110201083341.htm Video Games Are Good for Girls]<br />
<br />
Researchers from Brigham Young University's School of Family Life conducted a study on video games and children between 11 and 16 years old. They found that girls who played video games with a parent enjoyed a number of advantages. A father who still hasn't given up video games now has some justification to keep on playing, if he has a daughter. The study involved 287 families with an adolescent child. The researchers measured several outcomes, such as positive behavior, aggression, family connection and mental health. The result is interesting: for boys, playing with a parent was not a statistically significant factor for any of the outcomes the researchers measured. Yet for girls, playing with a parent accounted for as much as 20 percent of the variation in those measured outcomes.<br />
<br />
===Week 7===<br />
[http://www.sciencedaily.com/releases/2011/03/110301091225.htm Who's the Best Tennis Player of All Time?]<br />
<br />
Ranking tennis players is a novel way to show how complex network analysis can reveal interesting facts hidden in statistical data. Male tennis players who played in at least one Association of Tennis Professionals match between 1968 and 2010 were evaluated through network analysis. The researchers ran an algorithm, similar to the one used by Google to rank Web pages, on digital data from hundreds of thousands of matches. The data were pulled from the Association of Tennis Professionals website. They quantified the importance of players and ranked them by a "tennis prestige" score. This score is determined by a player's competitiveness, the quality of his performance and his number of victories. Fans may think of Jimmy Connors as an "old school" tennis player, but he comes out on top in the new ranking system. One of the reasons is that he played for more than 20 years and had the opportunity to win a lot of matches against other very good players. "The rankings are a snapshot of who is at the top at this time," Radicchi said. "Players who have yet to retire are penalized with respect to those who have ended their careers. Prestige scores strongly correlate with the number of victories, and active players haven't played all the matches of their careers yet." Researching and ranking sports stars gives a glimpse of the power of complex network analysis. <br />
<br />
Hi Jessica, I'm not sure why, but the link you have provided for the tennis article isn't working. I searched for the article in google and found it at the following link in the event somebody else is interested in viewing the article. --[[User:Lawarren|Lawarren]] 10:48, 9 March 2011 (EST)<br />
<br />
http://www.sciencedaily.com/releases/2011/03/110301091225.htm<br />
* thank you very much Lawarren.<br />
<br />
===Week 8===<br />
[http://www.traveldailynews.com/pages/show_page/42076-Hotel-capitalization-rates-fall-further Hotel capitalization rates fall further]<br />
<br />
In 2011, the USRC Hotel Investment survey predicted good growth in the hotel industry, based on data viewed in the context of longer-term trends. The data include parameters such as capitalization rates, discount rates, income and expense growth expectations, marketing time, debt parameters, and other data for both full-service and limited-service hotels. The researchers found that full-service discount rates are now at their lowest level since the Mid Year 2007 survey, just prior to rates beginning to creep up in 2008 and the subsequent economic and credit collapse of the Fall of 2008. They are now just 10 basis points higher than the record low seen in the Winter of 2007 survey, and are hovering very near the survey results of the last recovery years of 2005-2006, following the post 9-11 downturn. As yield requirements lessened, anticipated growth continued to increase, even beyond the very healthy movement seen in the last Mid Year 2010 Survey. <br />
<br />
===Week 9===<br />
[http://www.sciencedaily.com/releases/2011/03/110316084421.htm Poorly Presented Risk Statistics Could Misinform Health Decisions]<br />
<br />
Risk statistics can be used persuasively to present health interventions in different lights. The different ways of expressing risk can prove confusing and there has been much debate about how to improve the communication of health statistics. Choosing the appropriate way to present risk statistics is key to helping people make well-informed decisions. A new Cochrane Systematic Review found that health professionals and consumers may change their perceptions when the same risks and risk reductions are presented using alternative statistical formats. In the new study, Cochrane researchers reviewed data from 35 studies assessing understanding of risk statistics by health professionals and consumers. They found that participants in the studies understood frequencies better than probabilities. Although the researchers say further studies are required to explore how different risk formats affect behaviour, they believe there are strong logical arguments for not reporting relative values alone. <br />
<br />
===Week 10===<br />
[http://www.traveldailynews.com/pages/show_page/42272-90%25-of-travelers-would-choose-rail-over-air 90% of travelers would choose rail over air]<br />
<br />
A recent poll of global travelers studied travellers' preference between rail and air, with 90% of respondents saying they would like to see rail options displayed alongside flights when searching for travel. The poll concluded that time, cost and comfort are the three key factors consumers consider when booking travel, and that flying is coming in second increasingly often. On time: the results reveal that travellers consider total travel time, getting from door to door, and the full travel experience when choosing a mode of transport. Most people would accept a longer door-to-door time to avoid the process of checking in, security and boarding; they would willingly add an hour or more of total travel to their trips to avoid the hassles of long lines, airport security and baggage fees. On cost: most people take action to avoid paying bag fees, strategizing packing days in advance and stuffing carry-ons to maximum capacity to avoid checking bags. On comfort: most people expect more comfortable waiting areas for their travel in the future. <br />
<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities in one country instead of from different countries, in order to eliminate the differences in conditions.<br />
<br />
===Week 2===<br />
* Before I took this course, I had no idea what consulting involves or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realized that it is not easy to be a good consultant. Usually the situation is very complicated, and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem, then pick the relationships among these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis and whether an apparent effect of x on y arose by chance, etc. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw the multiple regression model represented by a plane in data space, and by a point in beta space whose coordinates are the fitted slope with respect to Coffee and the fitted slope with respect to Stress. There we only need 2 dimensions, but for more complex models we may need 3 or more dimensions for the beta space. I just wonder, in these situations, how can we draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need a 3D or higher-dimensional ellipsoid. This ellipsoid in higher dimensions would be larger because it is trying to be correct for 3 or more parameters at the same time.-''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===<br />
* For controlling for all the Zs, the slides mention that we can use one of three techniques. The first one is statistical control. I am not sure about the meaning of the propensity score, although it says "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]<br />
Thank you Constance!<br />
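<br />
The idea of "prediction of X from Zs" can be sketched in R. This is only a minimal illustration on simulated data, not the method used in class: the covariates z1 and z2, the group indicator x and all coefficient values are made up for the example, and a logistic regression is assumed as the propensity model.<br />

```r
# Simulated observational data (all names and numbers are hypothetical)
set.seed(42)
n  <- 500
z1 <- rnorm(n)                 # continuous covariate
z2 <- rbinom(n, 1, 0.4)        # binary covariate
# Group membership depends on the covariates, so assignment is NOT random
x  <- rbinom(n, 1, plogis(-0.3 + 0.8 * z1 + 0.5 * z2))

# Propensity model: predict X from the Zs with logistic regression
ps_model <- glm(x ~ z1 + z2, family = binomial)

# The propensity score is the fitted probability of being in group x = 1
pscore <- fitted(ps_model)

summary(pscore)
tapply(pscore, x, mean)        # average score within each observed group
```

If assignment had been truly random, the covariates would not help predict x and the scores would all sit near the overall proportion in group 1.<br />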
<br />
===Week 5===<br />
* In a multiple regression model, it is a challenge to determine the type of a canonical outlier and how to deal with it, whether to remove it or not. The choice can change our results substantially.<br />
<br />
===Week 6===<br />
* From the slides on Hierarchical Models, it is good to know the idea of the two-stage approach (derived variables approach): estimate a slope and intercept for each group, then use them as a multivariate sample and do a MANOVA test.<br />
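<br />
A rough sketch of the two-stage approach in R, on simulated data. The variable names (school, sector, ses, y) and all numbers are assumptions made up for illustration, not the course data set.<br />

```r
# Simulated hierarchical data: 20 schools with 15 students each (hypothetical)
set.seed(7)
n_school <- 20
n_per    <- 15
school <- rep(seq_len(n_school), each = n_per)
sector <- rep(c("public", "private"), each = n_school / 2)  # one value per school
ses    <- rnorm(n_school * n_per)
slope_true <- ifelse(sector[school] == "public", 2, 1)
y <- 10 + slope_true * ses + rnorm(n_school * n_per)
d <- data.frame(school, ses, y)

# Stage 1: fit a separate regression in each school and keep the coefficients
betas <- t(sapply(split(d, d$school),
                  function(s) coef(lm(y ~ ses, data = s))))
colnames(betas) <- c("intercept", "slope")

# Stage 2: treat the (intercept, slope) pairs as a multivariate sample
# and compare them across the school-level variable with MANOVA
fit <- manova(betas ~ sector)
summary(fit)
```

Each row of betas is one school seen as a point in beta space, so plotting betas[, "intercept"] against betas[, "slope"] gives the beta-space picture of the same data.<br />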
<br />
===Week 7===<br />
*How does the Scheffe method connect with the F-test? <br />
*The F-test for the model is significant if and only if some contrast in the model space has a significant Scheffe confidence interval. If the F-test is not significant, then none of the Scheffe confidence intervals will be significant.<br />
<br />
===Week 8===<br />
*TED Talks on Statistics <br />
*[http://www.kdnuggets.com/2010/11/ted-talks-lies-damned-lies-and-statistics.html Sebastian Wernicke: Lies, damned lies and statistics]<br />
*[http://www.ted.com/talks/chris_jordan_pictures_some_shocking_stats.html Chris Jordan pictures some shocking stats]<br />
*Official Statistics and Statistical Ethics : Selected Issues [http://pdfcast.org/pdf/official-statistics-and-statistical-ethics-selected-issues]<br />
<br />
===Week 9===<br />
*Meaning of Apply Functions in R:<br />
1. lapply returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X. <br />
<br />
2. sapply is a user-friendly version of lapply by default returning a vector or matrix if appropriate.<br />
The lapply() function works on any list. The "l" in "lapply" stands for list. The "s" in "sapply" stands for simplify.<br />
<br />
3. vapply is similar to sapply, but has a pre-specified type of return value, so it can be safer (and sometimes faster) to use.<br />
<br />
4. tapply is a very powerful function that lets us break a vector into pieces and apply some function to each of the pieces. We need to specify how to break the vector into pieces, usually with a grouping factor.<br />
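<br />
A small example may help compare the four functions side by side. The data below (scores, height, sex) are made up for illustration.<br />

```r
# Toy data (hypothetical): three numeric vectors stored in a named list
scores <- list(a = c(1, 2, 3), b = c(4, 5), c = c(6, 7, 8, 9))

# lapply: always returns a list, one element per input element
means_list <- lapply(scores, mean)

# sapply: same computation, simplified to a named numeric vector
means_vec <- sapply(scores, mean)

# vapply: like sapply, but the type and length of each result are declared
means_safe <- vapply(scores, mean, FUN.VALUE = numeric(1))

# tapply: split a vector by a grouping variable, then apply a function per group
height <- c(170, 180, 165, 175)
sex    <- c("F", "M", "F", "M")
mean_by_sex <- tapply(height, sex, mean)
```

Here sapply and vapply give the same answer; the difference is that vapply stops with an error if FUN ever returns something other than a single number.<br />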
<br />
===Week 10===<br />
*I cannot find the R code that George explained to us during last lecture. Could someone show me where it is, please? Thank you.</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-03-29T01:07:40Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master student of Applied Statistics. I got my Bachelor Degree in Actuarial Science, and I have some background in Statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption in different countries. In reality, the countries could differ in all sorts of ways other than this factor. We know that the ideal solution introduced by Fisher is the experiment. What is Fisher's idea of an experiment? <br />
*A: 1) Make sure all the countries are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare the life expectancy of these groups, to see whether the groups with lower cigarette consumption have higher life expectancy.<br />
<br />
===Week 2===<br />
*Q: From the example of studying the inheritance of height from father to son, and the graph of the ellipse, we find that the SD line is not the regression line of son's height on father's height: the deviation of the son's expected height from the mean is only a proportion of the deviation of the father's height from the mean. Why do we use ellipses?<br />
*A: By using ellipses, we can easily obtain the formulas of statistical theory and models. In addition, ellipses drawn on scatterplots and other graphs help with data analysis, such as studying the correlation between the variables, judging how significant a variable is, and calculating confidence intervals. <br />
<br />
===Week 3===<br />
*Q: What is beta space, and why do we want to look at our data in beta space?<br />
*A: In data space, the axes are variables and the points are observations. To better understand hierarchical data, we want a more natural display of our models, in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients. <br />
<br />
===Week 4===<br />
*Q: Thinking about strategy with observational data, what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the three techniques below. The first is statistical control: include the Zs in a statistical model and adjust statistically. Modern thinking: we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail with the wrong model. The second is matching: only perform comparisons between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, if you have the Zs. The third is structural methods, which build a structural causal model.<br />
<br />
===Week 5===<br />
*Q: In the example data set, Health as predicted by Weight and Height, we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?<br />
*A: The first type has typical values for the predictors but an atypical Y: it has little impact on beta-hat, but it increases the size of confidence intervals and decreases power. The second type has atypical values for the predictors but Y consistent with the other data: it has little impact on beta-hat and shrinks confidence intervals, which creates a false sense of power if the point is not valid. The third type has atypical values for the predictors and Y not consistent with the other data: it has a large impact on beta-hat, could shrink or expand CIs, and makes a mess of everything.<br />
<br />
===Week 6===<br />
*Q: In the hierarchical data example (High school), what is the difference between Robinson's Paradox and Simpson's Paradox?<br />
*A: Robinson's Paradox refers to the fact that Beta-W (within-group) and Beta-B (between-group) can have different signs. Simpson's Paradox refers to the fact that Beta-W (within-group) and Beta-P (pooled) can have different signs.<br />
<br />
===Week 7===<br />
*Q: Compare the Scheffé method/confidence interval with the Bonferroni method/confidence interval.<br />
*A: The Bonferroni adjustment applies very generally, but it suffers from the limitation that the number of simultaneous confidence intervals must be specified in advance in order to maintain the validity of the simultaneous coverage. The Scheffé method is an alternative that controls the simultaneous coverage over all contrasts in a subspace. If a large number of contrasts are of interest, or it is not known in advance which ones are of interest, the Scheffé method provides a way to “snoop” through all the possibilities while maintaining control of the coverage of confidence intervals and the false positive rates of tests. When the number of contrasts to be estimated is small, Bonferroni is better than Scheffé. Actually, unless the number of desired contrasts is at least twice the number of factors, Scheffé will always show wider confidence bands than Bonferroni.<br />
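<br />
The trade-off can be checked numerically by comparing the multipliers that each method puts on the standard error. This is a sketch only: the residual degrees of freedom (30), the dimension of the contrast subspace (3) and the numbers of contrasts m are hypothetical values chosen for illustration.<br />

```r
alpha    <- 0.05
df_resid <- 30   # residual degrees of freedom (hypothetical)
k        <- 3    # dimension of the subspace of contrasts (hypothetical)

# Scheffe multiplier: one value that covers ALL contrasts in the subspace
scheffe <- sqrt(k * qf(1 - alpha, k, df_resid))

# Bonferroni multiplier: depends on the number m of pre-specified intervals
m <- c(1, 2, 5, 10, 50)
bonferroni <- qt(1 - alpha / (2 * m), df_resid)

# Each interval is estimate +/- multiplier * standard error
data.frame(m, bonferroni, scheffe)
```

The Bonferroni multiplier grows with m, while the Scheffé multiplier stays fixed, so Bonferroni gives the narrower intervals only when few contrasts are specified in advance.<br />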
<br />
===Week 8===<br />
*Q: Briefly introduce the Wald test. What are the problems with the Wald test?<br />
*A: The Wald test is a way of testing the significance of particular explanatory variables in a statistical model. In logistic regression, for example, we have a binary outcome variable and one or more explanatory variables, and each explanatory variable in the model has an associated parameter. The Wald test is one of a number of ways of testing whether the parameters associated with a group of explanatory variables are zero. If the Wald test is significant for a particular explanatory variable, or group of explanatory variables, then we would conclude that the parameters associated with these variables are not all zero, so the variables should be included in the model. If the Wald test is not significant, then these explanatory variables can be omitted from the model. However, there are problems with the Wald statistic: for large coefficients the standard error is inflated, lowering the Wald statistic (chi-square) value, and the likelihood-ratio test is more reliable than the Wald test for small sample sizes.<br />
<br />
===Week 9===<br />
*Q: In the process of fitting a model, is it correct to drop variables from the model by looking only at the p-values of those variables in the fitted model?<br />
*A: We cannot drop more than one variable at a time based on the individual p-values in the fitted model, because after we drop one variable the p-values of the other variables change. Therefore, if we would like to drop two or more variables from the model at the same time, we should do a Wald test of those variables jointly.<br />
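<br />
A joint Wald test for dropping two variables at once can be sketched in base R as follows. The data are simulated and the names (x1, x2, x3, y) are hypothetical; the statistic is computed by hand from the coefficients and their covariance matrix, and a likelihood-ratio test is shown for comparison.<br />

```r
# Simulated logistic regression data in which only x1 matters (hypothetical)
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- rbinom(n, 1, plogis(-0.5 + 1.0 * x1))

fit <- glm(y ~ x1 + x2 + x3, family = binomial)

# Joint Wald test of H0: the coefficients of x2 and x3 are both zero
idx <- c("x2", "x3")
b <- coef(fit)[idx]                 # estimates under test
V <- vcov(fit)[idx, idx]            # their estimated covariance matrix
W <- drop(t(b) %*% solve(V) %*% b)  # Wald chi-square statistic
p_wald <- pchisq(W, df = length(idx), lower.tail = FALSE)

# Likelihood-ratio test of the same hypothesis, for comparison
fit0 <- update(fit, . ~ x1)
p_lr <- anova(fit0, fit, test = "Chisq")[2, "Pr(>Chi)"]

c(wald = p_wald, lr = p_lr)
```

The joint test uses the covariance between the two estimates, which is exactly the information that the two separate p-values in the summary table ignore.<br />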
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
Based on data from the 1993/1994 through 2008/2009 seasons, the research has shown that the severity of infections with the Influenza A virus is related to its novelty. In addition, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus' hemagglutinin protein. People think that this result can help improve the ability to predict influenza severity accurately. Scientists can therefore use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. However, in reality there are many sources of variation. For example, people are of different ages, live in different areas and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime<br />
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can look for crime action by typing in specific dates, choosing types of crimes, choose locations, or find out what weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
In the news story, people said that "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Obviously, crimes are more likely at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice. Crimes happen everywhere in the city. It is also possible that the criminals come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120|Weight Cited in Women's Heartburn]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wine and spirits has increased. According to the data, beer made up 76 percent of all pure alcohol consumed in Australia at the start of the 1960s, but in recent years this has fallen to 44 percent. Over the same period, wine consumption has increased threefold, to 36 percent, while the intake of spirits has nearly doubled, to 20 percent. Why have wine and spirits increased so much in the past forty years? The researchers noted that consumption was likely to have been affected by numerous factors, such as changing age patterns in the population, increasing affluence and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people have also come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome|Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and children, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results from the Canadian Health Measures Survey (CHMS) released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. Young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. The survey also found that men are more physically active than women on average, and the same pattern appears among teenagers: more than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, the survey should take into account people whose jobs require heavy lifting or being out and about for most of the work day. This would be more than enough exercise for most people, and counting it would improve these numbers dramatically. Similarly, for teenagers the survey should also ask whether they work part-time while studying as full-time students.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1|More private liquor stores, more alcohol deaths?]<br />
<br />
Based on a study of 89 local areas of British Columbia, the researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales, a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. Over the same period, the number of government-run stores in the province declined slightly. The findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
===Week 6===<br />
* Video Games Are Good for Girls<br />
[http://www.sciencedaily.com/releases/2011/02/110201083341.htm|Video Games Are Good for Girls]<br />
<br />
Researchers from Brigham Young University's School of Family Life conducted a study on video games and children between 11 and 16 years old. They found that girls who played video games with a parent enjoyed a number of advantages. A father who still hasn't given up video games now has some justification to keep on playing, if he has a daughter. The study involved 287 families with an adolescent child. The researchers measured several outcomes, such as positive behavior, aggression, family connection and mental health. The result is interesting: for boys, playing with a parent was not a statistically significant factor for any of the outcomes the researchers measured. Yet for girls, playing with a parent accounted for as much as 20 percent of the variation in those measured outcomes.<br />
<br />
===Week 7===<br />
[http://www.sciencedaily.com/releases/2011/03/110301091225.htm|Who's the Best Tennis Player of All Time?]<br />
<br />
Ranking tennis players is a novel way to show how complex network analysis can reveal interesting facts hidden in statistical data. Male tennis players who played in at least one Association of Tennis Professionals match between 1968 and 2010 were evaluated through network analysis. The researchers ran an algorithm, similar to the one used by Google to rank Web pages, on digital data from hundreds of thousands of matches. The data were pulled from the Association of Tennis Professionals website. They quantified the importance of players and ranked them by a "tennis prestige" score. This score is determined by a player's competitiveness, the quality of his performance and his number of victories. Fans may think of Jimmy Connors as an "old school" tennis player, but he comes out on top in the new ranking system. One of the reasons is that he played for more than 20 years and had the opportunity to win a lot of matches against other very good players. "The rankings are a snapshot of who is at the top at this time," Radicchi said. "Players who have yet to retire are penalized with respect to those who have ended their careers. Prestige scores strongly correlate with the number of victories, and active players haven't played all the matches of their careers yet." Researching and ranking sports stars gives a glimpse of the power of complex network analysis. <br />
<br />
Hi Jessica, I'm not sure why, but the link you have provided for the tennis article isn't working. I searched for the article in google and found it at the following link in the event somebody else is interested in viewing the article. --[[User:Lawarren|Lawarren]] 10:48, 9 March 2011 (EST)<br />
<br />
http://www.sciencedaily.com/releases/2011/03/110301091225.htm<br />
* thank you very much Lawarren.<br />
<br />
===Week 8===<br />
[http://www.traveldailynews.com/pages/show_page/42076-Hotel-capitalization-rates-fall-further|Hotel capitalization rates fall further]<br />
<br />
In 2011, the USRC Hotel Investment survey predicted good growth in the hotel industry, based on data viewed in the context of longer-term trends. The data include parameters such as capitalization rates, discount rates, income and expense growth expectations, marketing time, debt parameters, and other data for both full-service and limited-service hotels. The researchers found that full-service discount rates are now at their lowest level since the Mid Year 2007 survey, just prior to rates beginning to creep up in 2008 and the subsequent economic and credit collapse of the Fall of 2008. They are now just 10 basis points higher than the record low seen in the Winter of 2007 survey, and are hovering very near the survey results of the last recovery years of 2005-2006, following the post 9-11 downturn. As yield requirements lessened, anticipated growth continued to increase, even beyond the very healthy movement seen in the last Mid Year 2010 Survey. <br />
<br />
===Week 9===<br />
[http://www.sciencedaily.com/releases/2011/03/110316084421.htm Poorly Presented Risk Statistics Could Misinform Health Decisions]<br />
<br />
Risk statistics can be used persuasively to present health interventions in different lights. The different ways of expressing risk can prove confusing and there has been much debate about how to improve the communication of health statistics. Choosing the appropriate way to present risk statistics is key to helping people make well-informed decisions. A new Cochrane Systematic Review found that health professionals and consumers may change their perceptions when the same risks and risk reductions are presented using alternative statistical formats. In the new study, Cochrane researchers reviewed data from 35 studies assessing understanding of risk statistics by health professionals and consumers. They found that participants in the studies understood frequencies better than probabilities. Although the researchers say further studies are required to explore how different risk formats affect behaviour, they believe there are strong logical arguments for not reporting relative values alone. <br />
<br />
===Week 10===<br />
[http://www.traveldailynews.com/pages/show_page/42272-90%25-of-travelers-would-choose-rail-over-air 90% of travelers would choose rail over air]<br />
<br />
A recent poll of global travellers studied their preference between rail and air, with 90% of respondents saying they would like to see rail options displayed alongside flights when searching for travel. The poll concluded that time, cost and comfort are the three key factors consumers consider when booking travel, and that flying is coming in second increasingly often. On time: the results reveal that travellers consider total travel time, getting from door to door, and the full travel experience when choosing a mode of transport. Most people would accept a longer door-to-door time to avoid the process of checking in, security and boarding; they would willingly add an hour or more of total travel to their trips to avoid the hassles of long lines, airport security and baggage fees. On cost: most people take action to avoid paying bag fees, strategizing their packing days in advance and stuffing carry-ons to maximum capacity to avoid checking bags. On comfort: most people expect more comfortable waiting areas for their travel in the future. <br />
<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities in one country instead of selecting different countries, in order to eliminate the differences in conditions.<br />
<br />
===Week 2===<br />
* Before I took this course, I had no idea what consulting involves or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realize that it is not easy to be a good consultant. Usually, the situation is very complicated and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem, then pick the relationships among these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis, whether the apparent effect of x on y arose by chance, etc. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw the multiple regression model represented by a plane in data space, and by a point in beta space showing the fitted slope with respect to Coffee and the fitted slope with respect to Stress. There we only needed 2 dimensions, but for more complex models we may need 3 or more dimensions for the beta space. I just wonder how, in these situations, we can draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need 3D or higher dimensional ellipse. This ellipse in higher dimension would be larger because it is trying to be correct for 3 or more parameters at the same time.-''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===<br />
* For controlling for all the Zs, it mentions that we can use one of three techniques. The first one is statistical control. I am not sure about the meaning of the propensity score, although it says "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]<br />
Thank you Constance!<br />
<br />
===Week 5===<br />
* In a multiple regression model, it is a challenge to determine the type of a canonical outlier and to decide how to deal with it, whether to remove it or not. The choice can change our results greatly.<br />
<br />
===Week 6===<br />
* From the slides on Hierarchical Models, it is good to know the idea of the two-stage approach (derived variables approach): estimate the slope and intercept for each cluster, then use them as a multivariate sample and do a MANOVA test.<br />
<br />
===Week 7===<br />
*How does the Scheffe method connect with the F-test? <br />
*The F-test for the model is significant if and only if some contrast in the model space has a significant Scheffe confidence interval. If the F-test is not significant, then none of the Scheffe confidence intervals will be significant.<br />
<br />
===Week 8===<br />
*TED Talks on Statistics <br />
*[http://www.kdnuggets.com/2010/11/ted-talks-lies-damned-lies-and-statistics.html Sebastian Wernicke: Lies, damned lies and statistics]<br />
*[http://www.ted.com/talks/chris_jordan_pictures_some_shocking_stats.html Chris Jordan pictures some shocking stats]<br />
*Official Statistics and Statistical Ethics : Selected Issues [http://pdfcast.org/pdf/official-statistics-and-statistical-ethics-selected-issues]<br />
<br />
===Week 9===<br />
*Meaning of Apply Functions in R:<br />
1. lapply returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X. <br />
<br />
2. sapply is a user-friendly version of lapply that by default returns a vector or matrix if appropriate.<br />
The lapply() function works on any list. The "l" in "lapply" stands for list. The "s" in "sapply" stands for simplify.<br />
<br />
3. vapply is similar to sapply, but has a pre-specified type of return value, so it can be safer (and sometimes faster) to use.<br />
<br />
4. tapply is a very powerful function that lets us break a vector into pieces and apply some function to each of the pieces. We need to specify how to break the vector into pieces.<br />
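
The behaviour described in this list can be mimicked in a few lines of Python. This is only a rough analogue for illustration: the helper names `lapply` and `tapply` below are hypothetical Python functions written for this sketch, not the R functions themselves, and the data are made up.

```python
from collections import defaultdict

def lapply(xs, fun):
    """Analogue of R's lapply: always returns a list, one result per element."""
    return [fun(x) for x in xs]

def tapply(values, groups, fun):
    """Analogue of R's tapply: split `values` by `groups`, apply `fun` to each piece."""
    pieces = defaultdict(list)
    for value, group in zip(values, groups):
        pieces[group].append(value)
    return {group: fun(piece) for group, piece in pieces.items()}

# Made-up data: four heights and the group each one belongs to.
heights = [150, 160, 170, 180]
sex = ["F", "F", "M", "M"]

squares = lapply([1, 2, 3], lambda x: x * x)
means = tapply(heights, sex, lambda piece: sum(piece) / len(piece))
print(squares)
print(means)
```

The second argument of `tapply` plays the role of the grouping factor: it is what tells the function how to break the vector into pieces.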
<br />
===Week 10===<br />
*I cannot find the R code that George explained to us during last lecture. Could someone show me where it is, please? Thank you.</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-03-29T00:58:16Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master's student in Applied Statistics. I earned my Bachelor's degree in Actuarial Science, and I have some background in Statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption in different countries. In reality, these countries could differ in all sorts of ways other than this factor. We know that the ideal solution introduced by Fisher is the experiment. What is Fisher's idea of an experiment? <br />
*A: 1) Make sure all the countries are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare the life expectancy of these groups to see whether the groups with lower cigarette consumption have higher life expectancy.<br />
<br />
===Week 2===<br />
*Q: From the example of studying the inheritance of height from father to son, and the graph of the ellipse, we find that the SD line is not the regression line of son's height on father's height: the expected deviation of the son's height from the mean is only a proportion of the deviation of the father's height from the mean. Why do we use ellipses?<br />
*A: By using ellipses, we can easily derive the formulas for statistical theory and models. In addition, ellipses added to scatterplots and other graphs help with data analysis, for example in studying the correlation between the variables, judging how significant a variable is, and calculating confidence intervals. <br />
<br />
===Week 3===<br />
*Q: What is beta space, and why do we want to look at our data in beta space?<br />
*A: In data space, the axes are variables and the points are observations. For a better understanding of hierarchical data, we want a more natural display for our models, in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients. <br />
<br />
===Week 4===<br />
*Q: Thinking about strategy with observational data, what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the three below. The first is statistical control: include the Zs in a statistical model and adjust statistically. Modern thinking: we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail with the wrong model. The second is matching: this technique only performs comparisons between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, if you have the Zs. The third is structural methods, which build a structural causal model.<br />
<br />
===Week 5===<br />
*Q: From the example data set, Health as predicted by Weight and Height, we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?<br />
*A: The first has typical values for the predictors but an atypical Y: it has little impact on beta-hat, increases the size of confidence intervals and decreases power. The second has atypical values for the predictors but a Y consistent with the other data: it has little impact on beta-hat, shrinks confidence intervals, and creates a false sense of power if the point is not valid. The third has atypical values for the predictors and a Y not consistent with the other data: it has a large impact on beta-hat, could shrink or expand CIs, and makes a mess of everything.<br />
<br />
===Week 6===<br />
*Q: In the hierarchical data High School example, what is the difference between Robinson's Paradox and Simpson's Paradox?<br />
*A: Robinson's Paradox refers to the fact that the within-group slope (Beta-W) and the between-group slope (Beta-B) can have different signs. Simpson's Paradox refers to the fact that the within-group slope (Beta-W) and the pooled slope (Beta-P) can have different signs.<br />
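
A small numerical sketch may make the sign reversal concrete. The three "schools" below are invented data, and `ols_slope` is a helper written for this illustration; the within, between and pooled slopes are computed by ordinary least squares.

```python
def ols_slope(xs, ys):
    """Least-squares slope of y on x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Invented data: within each school y falls as x rises,
# but schools with higher x also have higher y overall.
schools = {
    "school_1": ([1, 2, 3], [10, 9, 8]),
    "school_2": ([4, 5, 6], [14, 13, 12]),
    "school_3": ([7, 8, 9], [18, 17, 16]),
}

# Within-school slopes: one regression per school.
within = {name: ols_slope(xs, ys) for name, (xs, ys) in schools.items()}

# Between-school slope: regression of school mean y on school mean x.
mean_x = [sum(xs) / len(xs) for xs, _ in schools.values()]
mean_y = [sum(ys) / len(ys) for _, ys in schools.values()]
between = ols_slope(mean_x, mean_y)

# Pooled slope: one regression ignoring school membership.
all_x = [x for xs, _ in schools.values() for x in xs]
all_y = [y for _, ys in schools.values() for y in ys]
pooled = ols_slope(all_x, all_y)

print(within, between, pooled)
```

In this toy data every within-school slope is negative while both the between-school and pooled slopes are positive, so ignoring the grouping would reverse the apparent direction of the relationship.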
<br />
===Week 7===<br />
*Q: Compare the Scheffé method/confidence interval with the Bonferroni method/confidence interval.<br />
*A: The Bonferroni adjustment applies very generally, but it suffers from the limitation that the number of simultaneous confidence intervals must be specified in advance in order to maintain the validity of the simultaneous coverage. The Scheffé method is an alternative that controls the simultaneous coverage over all contrasts in a subspace. If a large number of contrasts are of interest, or it is not known in advance which ones are of interest, the Scheffé method provides a way to “snoop” through all the possibilities while maintaining control of the coverage of confidence intervals and the false positive rates of tests. When the number of contrasts to be estimated is small, Bonferroni is better than Scheffé. In fact, unless the number of desired contrasts is at least twice the number of factors, Scheffé will always show wider confidence bands than Bonferroni.<br />
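
For reference, the two kinds of interval can be written side by side. The notation is an assumption made for this sketch and is not taken from the course notes: ψ is a contrast, m is the number of contrasts chosen in advance, d is the dimension of the space of contrasts, ν is the error degrees of freedom and 1 − α is the simultaneous coverage.

```latex
% Bonferroni: m contrasts chosen in advance
\hat{\psi}_i \pm t_{1-\alpha/(2m),\,\nu}\;\widehat{\mathrm{SE}}(\hat{\psi}_i),
\qquad i = 1, \dots, m

% Scheffé: every contrast \psi in a d-dimensional space of contrasts
\hat{\psi} \pm \sqrt{d\,F_{1-\alpha;\,d,\,\nu}}\;\widehat{\mathrm{SE}}(\hat{\psi})
```

The Bonferroni multiplier grows with m, while the Scheffé multiplier depends only on d, which is why Scheffé intervals stay valid however many contrasts are examined afterwards. With m = 1 and d = 1 both reduce to the ordinary t interval, since the square of a t quantile with ν degrees of freedom is the corresponding F quantile with 1 and ν degrees of freedom.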
<br />
===Week 8===<br />
*Q: Briefly introduce the Wald test. What are the problems with the Wald test?<br />
*A: The Wald test is a way of testing the significance of particular explanatory variables in a statistical model. In logistic regression we have a binary outcome variable and one or more explanatory variables. For each explanatory variable in the model there will be an associated parameter. The Wald test is one of a number of ways of testing whether the parameters associated with a group of explanatory variables are zero. If, for a particular explanatory variable or group of explanatory variables, the Wald test is significant, then we would conclude that the parameters associated with these variables are not zero, so that the variables should be included in the model. If the Wald test is not significant, then these explanatory variables can be omitted from the model. However, there are problems with the use of the Wald statistic: for large coefficients, the standard error is inflated, lowering the Wald statistic (chi-square) value, and the likelihood-ratio test is more reliable than the Wald test for small sample sizes.<br />
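
For a single parameter, the Wald statistic is simply the estimate divided by its standard error, compared with a standard normal distribution (or its square compared with a chi-square on 1 degree of freedom). The sketch below uses made-up numbers; the coefficient and standard error are hypothetical and do not come from any fitted model in the course.

```python
from statistics import NormalDist

def wald_test(estimate, std_error, null_value=0.0):
    """Single-parameter Wald test: z statistic, chi-square statistic, two-sided p-value."""
    z = (estimate - null_value) / std_error
    chi_square = z ** 2
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, chi_square, p_value

# Hypothetical logistic regression coefficient and its standard error.
z, chi_sq, p = wald_test(estimate=0.8, std_error=0.25)
print(z, chi_sq, p)

# Hypothetical large coefficient with an inflated standard error:
# the effect looks big, yet the Wald statistic is small.
z_big, chi_sq_big, p_big = wald_test(estimate=12.0, std_error=9.0)
print(z_big, chi_sq_big, p_big)
```

The second call illustrates the weakness mentioned in the answer: when the standard error is inflated, a large coefficient can still give a non-significant Wald statistic, which is one reason to check the likelihood-ratio test as well.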
<br />
===Week 9===<br />
*Q: In the process of fitting a model, is it correct to drop variables from the model by looking only at the p-values of these variables in the fitted model?<br />
*A: We cannot drop more than one variable at a time from the model by looking at the individual p-values, since after we drop one variable the p-values of the other variables will change. Therefore, we should do a Wald test of the variables jointly if we would like to drop two or more variables from the model at the same time.<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
From a study covering the 1993/1994 through 2008/2009 seasons, the research has shown that the severity of infections with the Influenza A virus is related to its novelty. In addition, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus' hemagglutinin protein. It is thought that this result can help to improve the ability to accurately predict influenza severity, so that scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. However, in reality there are many sources of variation. For example, people are in different age ranges, live in different areas and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can look for criminal activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.
In the news story, people said, "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Obviously, crimes are more likely at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice. There are crimes everywhere in the city. It is also possible that the criminals come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120 Drinkers Down Under switching from beer to wine]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wine and spirits has increased. From the statistical data, the researchers find that at the start of the 1960s beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same period, wine consumption has increased threefold, to 36 percent, while the intake of spirits has nearly doubled to 20 percent. Why have wine and spirits seen such a large increase over the past forty years? The researchers noted that increased consumption was likely to have been affected by numerous factors, such as different age patterns in the population, increasing affluence and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people have also come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and kids, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results of this study, from the Canadian Health Measures Survey (CHMS) released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. In addition, the survey mentions that young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. In addition, the survey found that men are more physically active than women on average. The same pattern appears among teenagers: more than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, the survey should take into account people whose jobs require heavy lifting and being out and about for most of the work day. This would be more than enough exercise for most people and would improve these numbers dramatically. Similarly, for teenagers, the survey should also study whether they work part-time while studying as full-time students.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1 More private liquor stores, more alcohol deaths?]<br />
<br />
Based on a study of 89 local areas of British Columbia, the researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales, a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. Over the same period, the number of government-run stores in the province declined slightly. The findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
===Week 6===<br />
* Video Games Are Good for Girls<br />
[http://www.sciencedaily.com/releases/2011/02/110201083341.htm Video Games Are Good for Girls]<br />
<br />
Researchers from Brigham Young University's School of Family Life conducted a study on video games and children between 11 and 16 years old. They found that girls who played video games with a parent enjoyed a number of advantages. A father who still hasn't given up video games now has some justification to keep on playing, if he has a daughter. The study involved 287 families with an adolescent child. The researchers measured several outcomes, such as positive behavior, aggression, family connection and mental health. The result is interesting: for boys, playing with a parent was not a statistically significant factor for any of the outcomes the researchers measured. Yet for girls, playing with a parent accounted for as much as 20 percent of the variation in those measured outcomes.<br />
<br />
===Week 7===<br />
[http://www.sciencedaily.com/releases/2011/03/110301091225.htm|Who's the Best Tennis Player of All Time?]<br />
<br />
Ranking tennis players is a novel way to show how complex network analysis can reveal interesting facts hidden in statistical data. Male tennis players who played in at least one Association of Tennis Professionals match between 1968 and 2010 were evaluated through network analysis. The researchers ran an algorithm, similar to the one used by Google to rank Web pages, on digital data from hundreds of thousands of matches. The data were pulled from the Association of Tennis Professionals website. They quantified the importance of players and ranked them by a "tennis prestige" score. This score is determined by a player's competitiveness, the quality of his performance and his number of victories. Fans may think of Jimmy Connors as an "old school" tennis player, but according to the new ranking system he ranks on top, in part because he played for more than 20 years and had the opportunity to win a lot of matches against other very good players. "The rankings are a snapshot of who is at the top at this time," Radicchi said. "Players who have yet to retire are penalized with respect to those who have ended their careers. Prestige scores strongly correlate with the number of victories, and active players haven't played all the matches of their careers yet." Researching and ranking sports stars gives a glimpse at the power of complex network analysis. <br />
<br />
Hi Jessica, I'm not sure why, but the link you have provided for the tennis article isn't working. I searched for the article in google and found it at the following link in the event somebody else is interested in viewing the article. --[[User:Lawarren|Lawarren]] 10:48, 9 March 2011 (EST)<br />
<br />
http://www.sciencedaily.com/releases/2011/03/110301091225.htm<br />
* Thank you very much, Lawarren.<br />
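
To give a feel for the kind of algorithm the article describes, here is a toy PageRank-style ranking. It is only a sketch under stated assumptions: the match results are invented, the function name `prestige_scores` and the damping value are choices made for this illustration, and this is plain power iteration rather than the researchers' exact method.

```python
def prestige_scores(matches, damping=0.85, iterations=100):
    """Rank players from (winner, loser) pairs.

    Each loss is treated as a link from the loser to the winner, so a player
    gains prestige by beating opponents who themselves have prestige.
    """
    players = sorted({p for match in matches for p in match})
    n = len(players)
    # losses[p][w] = number of times player p lost to player w
    losses = {p: {} for p in players}
    for winner, loser in matches:
        losses[loser][winner] = losses[loser].get(winner, 0) + 1

    score = {p: 1.0 / n for p in players}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in players}
        for loser in players:
            total_losses = sum(losses[loser].values())
            if total_losses == 0:
                # An unbeaten player spreads their score evenly over everyone.
                for p in players:
                    new[p] += damping * score[loser] / n
            else:
                for winner, count in losses[loser].items():
                    new[winner] += damping * score[loser] * count / total_losses
        score = new
    return score

# Invented results: each pair is (winner, loser).
matches = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B"), ("C", "D"), ("B", "D")]
scores = prestige_scores(matches)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Because scores flow from losers to winners, a player with a long career and many wins over strong opponents accumulates prestige, which matches the article's explanation of why retired players are favoured over active ones.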
<br />
===Week 8===<br />
[http://www.traveldailynews.com/pages/show_page/42076-Hotel-capitalization-rates-fall-further Hotel capitalization rates fall further]<br />
<br />
In 2011, the USRC Hotel Investment Survey predicted good growth in the hotel industry, based on data viewed in the context of longer-term trends. The data include parameters such as capitalization rates, discount rates, income and expense growth expectations, marketing time, debt parameters, and other data for both full-service and limited-service hotels. The researchers found that full-service discount rates are now at their lowest level since the Mid Year 2007 survey, just prior to rates beginning to creep up in 2008 and the subsequent economic and credit collapse of the Fall of 2008. However, they are now just 10 basis points higher than the record low seen in the Winter 2007 survey, and are hovering very near the survey results of the last recovery years of 2005-2006, following the post-9/11 downturn. As yield requirements lessened, anticipated growth continued to increase, even beyond the very healthy movement seen in the Mid Year 2010 Survey. <br />
<br />
===Week 9===<br />
[http://www.sciencedaily.com/releases/2011/03/110316084421.htm Poorly Presented Risk Statistics Could Misinform Health Decisions]<br />
<br />
Risk statistics can be used persuasively to present health interventions in different lights. The different ways of expressing risk can prove confusing and there has been much debate about how to improve the communication of health statistics. Choosing the appropriate way to present risk statistics is key to helping people make well-informed decisions. A new Cochrane Systematic Review found that health professionals and consumers may change their perceptions when the same risks and risk reductions are presented using alternative statistical formats. In the new study, Cochrane researchers reviewed data from 35 studies assessing understanding of risk statistics by health professionals and consumers. They found that participants in the studies understood frequencies better than probabilities. Although the researchers say further studies are required to explore how different risk formats affect behaviour, they believe there are strong logical arguments for not reporting relative values alone. <br />
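
The point about not reporting relative values alone can be shown with a little arithmetic. The trial counts below are invented for illustration, and `risk_summary` is a helper written for this sketch.

```python
def risk_summary(events_control, n_control, events_treated, n_treated):
    """Express the same result as absolute, relative and frequency-based measures."""
    risk_control = events_control / n_control
    risk_treated = events_treated / n_treated
    absolute_reduction = risk_control - risk_treated
    return {
        "risk_control": risk_control,
        "risk_treated": risk_treated,
        "absolute_risk_reduction": absolute_reduction,
        "relative_risk_reduction": absolute_reduction / risk_control,
        "number_needed_to_treat": 1.0 / absolute_reduction,
    }

# Invented trial: 20 of 1000 untreated people and 10 of 1000 treated people fall ill.
summary = risk_summary(20, 1000, 10, 1000)
for name, value in summary.items():
    print(name, round(value, 4))
```

The same data can be described as "halves the risk" (relative reduction of 50 percent), as "lowers the risk by one percentage point" (absolute reduction), or as "10 fewer cases per 1000 people" (frequency format), which is why the choice of format can change how persuasive an intervention looks.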
<br />
===Week 10===<br />
[http://www.traveldailynews.com/pages/show_page/42272-90%25-of-travelers-would-choose-rail-over-air 90% of travelers would choose rail over air]<br />
<br />
A recent poll of global travellers studied their preference between rail and air, with 90% of respondents saying they would like to see rail options displayed alongside flights when searching for travel. The poll concluded that time, cost and comfort are the three key factors consumers consider when booking travel, and that flying is coming in second increasingly often. On time: the results reveal that travellers consider total travel time, getting from door to door, and the full travel experience when choosing a mode of transport. Most people would accept a longer door-to-door time to avoid the process of checking in, security and boarding; they would willingly add an hour or more of total travel to their trips to avoid the hassles of long lines, airport security and baggage fees. On cost: most people take action to avoid paying bag fees, strategizing their packing days in advance and stuffing carry-ons to maximum capacity to avoid checking bags. On comfort: most people expect more comfortable waiting areas for their travel in the future. <br />
<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities in one country instead of selecting different countries, in order to eliminate the differences in conditions.<br />
<br />
===Week 2===<br />
* Before I took this course, I had no idea what consulting involves or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realize that it is not easy to be a good consultant. Usually, the situation is very complicated and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem, then pick the relationships among these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis, whether the apparent effect of x on y arose by chance, etc. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw the multiple regression model represented by a plane in data space, and by a point in beta space showing the fitted slope with respect to Coffee and the fitted slope with respect to Stress. There we only needed 2 dimensions, but for more complex models we may need 3 or more dimensions for the beta space. I just wonder how, in these situations, we can draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need 3D or higher dimensional ellipse. This ellipse in higher dimension would be larger because it is trying to be correct for 3 or more parameters at the same time.-''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===<br />
* For controlling for all the Zs, it mentions that we can use one of three techniques. The first one is statistical control. I am not sure about the meaning of the propensity score, although it says "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]<br />
Thank you Constance!<br />
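
As a concrete illustration of "prediction of X from Zs", the sketch below estimates a propensity score for a single discrete covariate as the proportion of people in each stratum who ended up in the treated group. The records are invented and `stratum_propensity` is a helper written for this example; with several covariates one would usually fit a logistic regression of group membership on the Zs instead, as described in the comment above.

```python
from collections import defaultdict

def stratum_propensity(records):
    """Estimate P(treated | Z) for one discrete covariate Z.

    `records` is a list of (covariate_value, treated) pairs with treated in {0, 1}.
    """
    counts = defaultdict(lambda: [0, 0])  # stratum -> [number treated, number total]
    for z, treated in records:
        counts[z][0] += treated
        counts[z][1] += 1
    return {z: n_treated / n_total for z, (n_treated, n_total) in counts.items()}

# Invented observational data: (smoking status, whether the person is in the treated group).
records = [
    ("smoker", 1), ("smoker", 1), ("smoker", 1), ("smoker", 0),
    ("non-smoker", 1), ("non-smoker", 0), ("non-smoker", 0), ("non-smoker", 0),
]
scores = stratum_propensity(records)
print(scores)
```

In a randomized study the two estimated scores would be close to each other, because the covariate would not influence group assignment. Here they differ, which signals that comparisons should be made between people with similar propensity scores.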
<br />
===Week 5===<br />
* In a multiple regression model, it is a challenge to determine the type of a canonical outlier and to decide how to deal with it, whether to remove it or not. The choice can change our results greatly.<br />
<br />
===Week 6===<br />
* From the slides on Hierarchical Models, it is good to know the idea of the two-stage approach (derived variables approach): estimate the slope and intercept within each group, then use them as a multivariate sample and do a MANOVA test.<br />
<br />
===Week 7===<br />
*How does the Scheffé method connect with the F-test? <br />
*The F-test for the model is significant if and only if some contrast in the model space has a significant Scheffé confidence interval (one that excludes zero). If the F-test is not significant, then none of the Scheffé confidence intervals will be significant.<br />
<br />
===Week 8===<br />
*TED Talks on Statistics <br />
*[http://www.kdnuggets.com/2010/11/ted-talks-lies-damned-lies-and-statistics.html Sebastian Wernicke: Lies, damned lies and statistics]<br />
*[http://www.ted.com/talks/chris_jordan_pictures_some_shocking_stats.html Chris Jordan pictures some shocking stats]<br />
*Official Statistics and Statistical Ethics : Selected Issues [http://pdfcast.org/pdf/official-statistics-and-statistical-ethics-selected-issues]<br />
<br />
===Week 9===<br />
*Meaning of Apply Functions in R:<br />
1. lapply returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X. <br />
<br />
2. sapply is a user-friendly version of lapply that by default returns a vector or matrix if appropriate.<br />
The lapply() function works on any list. The "l" in "lapply" stands for list. The "s" in "sapply" stands for simplify.<br />
<br />
3. vapply is similar to sapply, but has a pre-specified type of return value, so it can be safer (and sometimes faster) to use.<br />
<br />
4. tapply is a very powerful function that lets us break a vector into pieces and apply some function to each of the pieces. We need to specify how to break the vector into pieces.</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-03-29T00:24:57Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master's student in Applied Statistics. I have a Bachelor's degree in Actuarial Science, and I have some background in Statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption across different countries. In reality, these countries could differ in all sorts of ways other than this factor. We know that the ideal solution, introduced by Fisher, is the experiment. What is Fisher's idea of an experiment? <br />
*A: 1) Make sure all the groups being compared (here, the "countries") are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups that are assigned different amounts of cigarette consumption. 3) Compare the life expectancy of these groups: because the groups differ only by chance and by treatment, a higher life expectancy in the groups with lower cigarette consumption can be attributed to smoking less.<br />
<br />
===Week 2===<br />
*Q: From the example of studying the inheritance of height from father to son, and the graph of the ellipse, we find that the SD line is not the regression line of son's height on father's height: the deviation of the son's expected height from the mean is only a proportion of the deviation of the father's height from the mean. Why do we use ellipses?<br />
*A: By using ellipses, we can easily get the formulas for statistical theory and models. In addition, ellipses drawn on scatterplots and other graphs help with data analysis, for example to study the correlation between the variables, to judge how significant a variable is, and to calculate confidence intervals. <br />
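The difference between the SD line and the regression line mentioned in the question can be checked numerically. The sketch below is only an illustration: the heights are made-up numbers (not the data from class) and the helper names are hypothetical. It computes the slope of the SD line, sd(son)/sd(father), and the slope of the regression line, r × sd(son)/sd(father), so the regression line is flatter whenever |r| < 1.

```python
from math import sqrt

def mean(xs):
    return sum(xs) / len(xs)

def sd(xs):
    # Sample standard deviation (divides by n - 1).
    m = mean(xs)
    return sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

def correlation(xs, ys):
    # Pearson correlation: sample covariance over the product of the SDs.
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return sxy / (sd(xs) * sd(ys))

# Hypothetical father/son heights in inches, invented for illustration.
fathers = [64, 66, 67, 68, 69, 70, 72, 74]
sons = [66, 66, 69, 67, 70, 69, 71, 72]

r = correlation(fathers, sons)
sd_line_slope = sd(sons) / sd(fathers)   # slope of the SD line
regression_slope = r * sd_line_slope     # slope of son's height on father's height

print(f"r = {r:.3f}")
print(f"SD line slope = {sd_line_slope:.3f}")
print(f"regression slope = {regression_slope:.3f}")
```

Because the regression slope is r times the SD-line slope, a father who is one SD above the mean has sons who are expected to be only r SDs above the mean.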
<br />
===Week 3===<br />
*Q: What is beta space, and why do we want to look at our data in beta space?<br />
*A: In data space, the axes are variables and the points are observations. To better understand hierarchical data, we want a more natural display for our models, in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients. <br />
<br />
===Week 4===<br />
*Q: Thinking about strategy with observational data, what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the three below. The first is statistical control: include the Zs in a statistical model and adjust statistically. Modern thinking: we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail with the wrong model. The second is matching: this technique only performs comparisons between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, if you have the Zs. The third is structural methods, which build a structural causal model.<br />
<br />
===Week 5===<br />
*Q: From the example data set, Health as predicted by Weight and Height, we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?<br />
*A: The first type has typical values for the predictors but an atypical Y: it has little impact on beta-hat, but it increases the size of confidence intervals and decreases power. The second type has atypical values for the predictors but a Y consistent with the other data: it has little impact on beta-hat, but it shrinks confidence intervals and creates a false sense of power if the point is not valid. The third type has atypical values for the predictors and a Y not consistent with the other data: it has a large impact on beta-hat, could shrink or expand CIs, and makes a mess of everything.<br />
<br />
===Week 6===<br />
*Q: In the hierarchical data (high school) example, what is the difference between Robinson's Paradox and Simpson's Paradox?<br />
*A: Robinson's Paradox refers to the fact that Beta-w (the within-group slope) and Beta-B (the between-group slope) can have different signs. Simpson's Paradox refers to the fact that Beta-w and Beta-p (the pooled slope) can have different signs.<br />
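A small numeric sketch can make the sign reversals concrete. It assumes that Beta-w, Beta-B and Beta-p denote the within-group, between-group and pooled slopes; the three "schools" and their values are invented for illustration and are not the high school data from class.

```python
def slope(xs, ys):
    # Least-squares slope of y on x.
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data: within every school y falls as x rises,
# but schools with a higher mean x also have a higher mean y.
schools = {
    "A": ([1, 2, 3], [5, 4, 3]),
    "B": ([4, 5, 6], [8, 7, 6]),
    "C": ([7, 8, 9], [11, 10, 9]),
}

def within_slope(groups):
    # Pool the deviations from each group's own means.
    sxy = sxx = 0.0
    for xs, ys in groups.values():
        mx = sum(xs) / len(xs)
        my = sum(ys) / len(ys)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx += sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Between-group slope: regress the group means of y on the group means of x.
mean_x = [sum(xs) / len(xs) for xs, _ in schools.values()]
mean_y = [sum(ys) / len(ys) for _, ys in schools.values()]

# Pooled slope: ignore the grouping altogether.
all_x = [x for xs, _ in schools.values() for x in xs]
all_y = [y for _, ys in schools.values() for y in ys]

beta_w = within_slope(schools)    # -1.0
beta_b = slope(mean_x, mean_y)    # 1.0
beta_p = slope(all_x, all_y)      # about 0.8

print(beta_w, beta_b, beta_p)
```

Here the within-school slope is negative while both the between-school and the pooled slopes are positive, so the same data show both sign reversals described in the answer.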
<br />
===Week 7===<br />
*Q: Compare the Scheffé method/confidence interval with the Bonferroni method/confidence interval.<br />
*A: The Bonferroni adjustment applies very generally, but it suffers from the limitation that the number of simultaneous confidence intervals must be specified in advance in order to maintain the validity of the simultaneous coverage. The Scheffé method is an alternative that controls the simultaneous coverage over all contrasts in a subspace. If a large number of contrasts are of interest, or it is not known in advance which ones are of interest, the Scheffé method provides a way to “snoop” through all the possibilities while maintaining control on the coverage of confidence intervals and the false positive rates of tests. When the number of contrasts to be estimated is small, Bonferroni is better than Scheffé. In fact, unless the number of desired contrasts is at least twice the number of factors, Scheffé will always show wider confidence bands than Bonferroni.<br />
<br />
===Week 8===<br />
*Q: Briefly introduce the Wald test. What are the problems with the Wald test?<br />
*A: The Wald test is a way of testing the significance of particular explanatory variables in a statistical model. In logistic regression we have a binary outcome variable and one or more explanatory variables. For each explanatory variable in the model there will be an associated parameter. The Wald test is one of a number of ways of testing whether the parameters associated with a group of explanatory variables are zero. If for a particular explanatory variable, or group of explanatory variables, the Wald test is significant, then we would conclude that the parameters associated with these variables are not zero, so that the variables should be included in the model. If the Wald test is not significant, then these explanatory variables can be omitted from the model. However, there are problems with the use of the Wald statistic: for large coefficients, the standard error is inflated, which lowers the Wald statistic (chi-square) value; and for small sample sizes the likelihood-ratio test is more reliable than the Wald test.<br />
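For a single parameter, the Wald test described above reduces to simple arithmetic, which the following sketch illustrates. The estimate and standard error are hypothetical numbers chosen for illustration, not output from a fitted model, and the function name is an assumption. The statistic is z = estimate / standard error, its square is the chi-square form on 1 degree of freedom, and the p-value uses the large-sample normal approximation.

```python
from statistics import NormalDist

def wald_test(estimate, std_error):
    """Wald test of H0: parameter = 0 for a single coefficient."""
    z = estimate / std_error           # Wald z statistic
    chi_square = z ** 2                # equivalent chi-square statistic, 1 df
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided normal p-value
    return z, chi_square, p_value

# Hypothetical logistic regression coefficient and its standard error.
z, chi_square, p_value = wald_test(estimate=0.8, std_error=0.4)
print(f"z = {z:.2f}, chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

The formula also shows why an inflated standard error is a problem: the standard error is in the denominator, so inflating it pulls z toward zero even when the coefficient itself is large.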
<br />
===Week 9===<br />
*Q: In the process of fitting a model, is it correct to drop variables from the model by looking only at the p-values of these variables in the fitted model?<br />
*A: We cannot drop more than one variable at the same time from the model by testing the significance of each variable separately (looking at its p-value), since after we drop one variable from the model, the p-values of the other variables would change. Therefore, we should do a Wald test on the variables jointly if we would like to drop two or more variables from the model at the same time.<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
Studying the 1993/1994 through 2008/2009 seasons, the research has shown that the severity of infections with the Influenza A virus is related to its novelty. In addition, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus' hemagglutinin protein. People think that this result can help to improve the ability to accurately predict influenza severity. Therefore, scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. However, in reality there are many other sources of variation. For example, people are in different age ranges, live in different areas and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can look for criminal activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high-crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
In the news report, people said that "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Obviously, crimes are more likely at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice. There are crimes all over the city. It is also possible that the criminals come from other cities or countries. The situation is very complicated, and it is very hard to predict future crimes, such as their time, location and type.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120 Drinkers Down Under switching from beer to wine]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wines and spirits has increased. From the statistical data, the researchers find that at the start of the 1960s beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same period, wine consumption has increased threefold, to 36 percent, while the intake of spirits has nearly doubled to 20 percent. Why have wine and spirits increased so much over this period? The researchers noted that increased consumption was likely to have been affected by numerous factors such as different age patterns in the population, increasing affluence and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people have also come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and kids, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results from the Canadian Health Measures Survey (CHMS) released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. In addition, the survey mentions that young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. In addition, the survey found that men are more physically active than women on average. This pattern also appears among teenagers: more than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, the survey should take into account people whose jobs require heavy lifting and being out and about for most of their work day. This would be more than enough exercise for most people and would improve these numbers dramatically. Similarly for teenagers, the survey should also study whether they work part-time while studying as full-time students.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1 More private liquor stores, more alcohol deaths?]<br />
<br />
In a study of 89 local areas of British Columbia, researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. Over the same period, the number of government-run stores in the province declined slightly. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales, a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. The findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
===Week 6===<br />
* Video Games Are Good for Girls<br />
[http://www.sciencedaily.com/releases/2011/02/110201083341.htm Video Games Are Good for Girls]<br />
<br />
Researchers from Brigham Young University's School of Family Life conducted a study on video games and children between 11 and 16 years old. They found that girls who played video games with a parent enjoyed a number of advantages. A father who still hasn't given up video games now has some justification to keep on playing, if he has a daughter. The study involved 287 families with an adolescent child. The researchers measured several outcomes, such as positive behavior, aggression, family connection and mental health. The result is interesting: for boys, playing with a parent was not a statistically significant factor for any of the outcomes the researchers measured. Yet for girls, playing with a parent accounted for as much as 20 percent of the variation in those measured outcomes.<br />
<br />
===Week 7===<br />
[http://www.sciencedaily.com/releases/2011/03/110301091225.htm Who's the Best Tennis Player of All Time?]<br />
<br />
Ranking tennis players is a novel way to show how complex network analysis can reveal interesting facts hidden in statistical data. Male tennis players who played in at least one Association of Tennis Professionals match between 1968 and 2010 were evaluated through network analysis. The researchers ran an algorithm, similar to the one used by Google to rank Web pages, on digital data from hundreds of thousands of matches. The data was pulled from the Association of Tennis Professionals website. They quantified the importance of players and ranked them by a "tennis prestige" score. This score is determined by a player's competitiveness, the quality of his performance and number of victories. Fans may think of Jimmy Connors as an "old school" tennis player, but according to the new ranking system he ranks on top, in part because he played for more than 20 years and had the opportunity to win a lot of matches against other very good players. "The rankings are a snapshot of who is at the top at this time," Radicchi said. "Players who have yet to retire are penalized with respect to those who have ended their careers. Prestige scores strongly correlate with the number of victories, and active players haven't played all the matches of their careers yet." Researching and ranking sports stars gives a glimpse at the power of complex network analysis. <br />
<br />
Hi Jessica, I'm not sure why, but the link you have provided for the tennis article isn't working. I searched for the article in google and found it at the following link in the event somebody else is interested in viewing the article. --[[User:Lawarren|Lawarren]] 10:48, 9 March 2011 (EST)<br />
<br />
http://www.sciencedaily.com/releases/2011/03/110301091225.htm<br />
* Thank you very much, Lawarren.<br />
<br />
===Week 8===<br />
[http://www.traveldailynews.com/pages/show_page/42076-Hotel-capitalization-rates-fall-further Hotel capitalization rates fall further]<br />
<br />
In 2011, the USRC Hotel Investment survey predicted good growth in the hotel industry, based on data viewed in the context of longer-term trends. The data include parameters such as capitalization rates, discount rates, income and expense growth expectations, marketing time, debt parameters, and other data for both full-service and limited-service hotels. The researchers found that full-service discount rates are now at their lowest level since the Mid Year 2007 survey, just prior to rates beginning to creep up in 2008 and, of course, the subsequent economic and credit collapse of the Fall of 2008. They are now just 10 basis points higher than the record low seen in the Winter 2007 survey, and are hovering very near the survey results of the last recovery years of 2005-2006, following the post-9/11 downturn. As yield requirements lessened, anticipated growth continued to increase, even beyond the very healthy movement seen in the Mid Year 2010 Survey. <br />
<br />
===Week 9===<br />
[http://www.sciencedaily.com/releases/2011/03/110316084421.htm Poorly Presented Risk Statistics Could Misinform Health Decisions]<br />
<br />
Risk statistics can be used persuasively to present health interventions in different lights. The different ways of expressing risk can prove confusing and there has been much debate about how to improve the communication of health statistics. Choosing the appropriate way to present risk statistics is key to helping people make well-informed decisions. A new Cochrane Systematic Review found that health professionals and consumers may change their perceptions when the same risks and risk reductions are presented using alternative statistical formats. In the new study, Cochrane researchers reviewed data from 35 studies assessing understanding of risk statistics by health professionals and consumers. They found that participants in the studies understood frequencies better than probabilities. Although the researchers say further studies are required to explore how different risk formats affect behaviour, they believe there are strong logical arguments for not reporting relative values alone. <br />
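The argument against reporting relative values alone can be seen with a little arithmetic. The sketch below uses hypothetical event counts (2 in 1,000 without treatment, 1 in 1,000 with treatment) that are not taken from the Cochrane review, and the function name is an assumption. It reports the same effect as a relative risk reduction, an absolute risk reduction and a number needed to treat.

```python
def risk_summary(control_events, control_n, treated_events, treated_n):
    """Express one treatment effect in three common risk formats."""
    control_risk = control_events / control_n
    treated_risk = treated_events / treated_n
    absolute_reduction = control_risk - treated_risk
    relative_reduction = absolute_reduction / control_risk
    number_needed_to_treat = 1 / absolute_reduction
    return {
        "control_risk": control_risk,
        "treated_risk": treated_risk,
        "absolute_risk_reduction": absolute_reduction,
        "relative_risk_reduction": relative_reduction,
        "number_needed_to_treat": number_needed_to_treat,
    }

# Hypothetical trial: 2 events per 1,000 untreated vs 1 per 1,000 treated.
summary = risk_summary(2, 1000, 1, 1000)
for name, value in summary.items():
    print(f"{name}: {value:.4g}")
```

With these numbers, "halves the risk" (a 50% relative reduction) and "prevents one event per 1,000 people treated" (a 0.1 percentage point absolute reduction) describe exactly the same result, which is why the format chosen can change how persuasive a statistic sounds.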
<br />
===Week 10===<br />
[http://www.traveldailynews.com/pages/show_page/42272-90%25-of-travelers-would-choose-rail-over-air 90% of travelers would choose rail over air]<br />
<br />
<br />
<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities in one country, instead of from different countries, in order to eliminate the differences in conditions.<br />
<br />
===Week 2===<br />
* Before I took this course, I had no idea what consulting involves or how to work through problems step by step to help others make the right decision. From this week's lecture, I realize that it is not easy to be a good consultant. Usually, the situation is very complicated and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem, then pick the relationships among these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis and whether an apparent effect of x on y arose by chance, etc. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw that a multiple regression model, which is a plane in data space, is represented in beta space by a single point whose coordinates are the fitted slope with respect to Coffee and the fitted slope with respect to Stress. That example needs only 2 dimensions, but some complex models may need 3 or more dimensions for the beta space. In those situations, how can we draw ellipses to illustrate the relationships between the variables?<br />
* Comment: If we have 3 or more parameters, then we need a 3D or higher-dimensional ellipsoid. The ellipsoid in higher dimensions would be larger because it is trying to be correct for 3 or more parameters at the same time. -''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===<br />
* To control for all the Zs, the notes mention that we can use one of three techniques. The first one is statistical control. I am not sure about the meaning of the propensity score, although the notes say it is the "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]<br />
Thank you Constance!<br />
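A tiny example may help make the "prediction of X from Zs" concrete. Constance's comment describes regressing group membership on the covariates; the sketch below swaps in the simplest possible version of that idea, which is only an assumption for illustration: with a single categorical covariate, the estimated propensity score is just the proportion of people in each stratum who ended up in the treatment group. The records and names are made up.

```python
from collections import defaultdict

def propensity_by_stratum(records):
    """Estimate P(treated | Z = z) as the treated fraction within each stratum."""
    counts = defaultdict(lambda: [0, 0])  # z -> [number treated, total]
    for z, treated in records:
        counts[z][0] += treated
        counts[z][1] += 1
    return {z: n_treated / total for z, (n_treated, total) in counts.items()}

# Hypothetical observational data: (covariate Z, treated indicator X).
records = [
    ("smoker", 1), ("smoker", 1), ("smoker", 1), ("smoker", 0),
    ("nonsmoker", 1), ("nonsmoker", 0), ("nonsmoker", 0), ("nonsmoker", 0),
]

scores = propensity_by_stratum(records)
print(scores)
```

In a randomized study the two strata would have roughly equal scores. Here smokers are much more likely to be in the treated group, so a raw comparison of treated and untreated people would be confounded with smoking.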
<br />
===Week 5===<br />
* In a multiple regression model, it is a challenge to determine the type of a canonical outlier and to decide how to deal with it, that is, whether to remove it or not. The decision can change our results substantially.<br />
<br />
===Week 6===<br />
* From the slides on Hierarchical Models, it is good to know the idea of the two-stage approach (derived variables approach): estimate the slope and intercept within each group, then use them as a multivariate sample and do a MANOVA test.<br />
<br />
===Week 7===<br />
*How does the Scheffé method connect with the F-test? <br />
*The F-test for the model is significant if and only if some contrast in the model space has a significant Scheffé confidence interval (one that excludes zero). If the F-test is not significant, then none of the Scheffé confidence intervals will be significant.<br />
<br />
===Week 8===<br />
*TED Talks on Statistics <br />
*[http://www.kdnuggets.com/2010/11/ted-talks-lies-damned-lies-and-statistics.html Sebastian Wernicke: Lies, damned lies and statistics]<br />
*[http://www.ted.com/talks/chris_jordan_pictures_some_shocking_stats.html Chris Jordan pictures some shocking stats]<br />
*Official Statistics and Statistical Ethics : Selected Issues [http://pdfcast.org/pdf/official-statistics-and-statistical-ethics-selected-issues]<br />
<br />
===Week 9===<br />
*Meaning of Apply Functions in R:<br />
1. lapply returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X. <br />
<br />
2. sapply is a user-friendly version of lapply that by default returns a vector or matrix if appropriate.<br />
The lapply() function works on any list. The "l" in "lapply" stands for list. The "s" in "sapply" stands for simplify.<br />
<br />
3. vapply is similar to sapply, but has a pre-specified type of return value, so it can be safer (and sometimes faster) to use.<br />
<br />
4. tapply is a very powerful function that lets us break a vector into pieces and apply some function to each of the pieces. We need to specify how to break the vector into pieces.</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-03-20T02:17:03Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master's student in Applied Statistics. I have a Bachelor's degree in Actuarial Science, and I have some background in Statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption across different countries. In reality, these countries could differ in all sorts of ways other than this factor. We know that the ideal solution, introduced by Fisher, is the experiment. What is Fisher's idea of an experiment? <br />
*A: 1) Make sure all the groups being compared (here, the "countries") are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups that are assigned different amounts of cigarette consumption. 3) Compare the life expectancy of these groups: because the groups differ only by chance and by treatment, a higher life expectancy in the groups with lower cigarette consumption can be attributed to smoking less.<br />
<br />
===Week 2===<br />
*Q: From the example of studying the inheritance of height from father to son, and the graph of the ellipse, we find that the SD line is not the regression line of son's height on father's height: the deviation of the son's expected height from the mean is only a proportion of the deviation of the father's height from the mean. Why do we use ellipses?<br />
*A: By using ellipses, we can easily get the formulas for statistical theory and models. In addition, ellipses drawn on scatterplots and other graphs help with data analysis, for example to study the correlation between the variables, to judge how significant a variable is, and to calculate confidence intervals. <br />
<br />
===Week 3===<br />
*Q: What is beta space, and why do we want to look at our data in beta space?<br />
*A: In data space, the axes are variables and the points are observations. To better understand hierarchical data, we want a more natural display for our models, in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients. <br />
<br />
===Week 4===<br />
*Q: Thinking about strategy with observational data, what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the three below. The first is statistical control: include the Zs in a statistical model and adjust statistically. Modern thinking: we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail with the wrong model. The second is matching: this technique only performs comparisons between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, if you have the Zs. The third is structural methods, which build a structural causal model.<br />
<br />
===Week 5===<br />
*Q: From the example data set, Health as predicted by Weight and Height, we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?<br />
*A: The first type has typical values for the predictors but an atypical Y: it has little impact on beta-hat, but it increases the size of confidence intervals and decreases power. The second type has atypical values for the predictors but a Y consistent with the other data: it has little impact on beta-hat, but it shrinks confidence intervals and creates a false sense of power if the point is not valid. The third type has atypical values for the predictors and a Y not consistent with the other data: it has a large impact on beta-hat, could shrink or expand CIs, and makes a mess of everything.<br />
<br />
===Week 6===<br />
*Q: In the hierarchical data (high school) example, what is the difference between Robinson's Paradox and Simpson's Paradox?<br />
*A: Robinson's Paradox refers to the fact that Beta-w (the within-group slope) and Beta-B (the between-group slope) can have different signs. Simpson's Paradox refers to the fact that Beta-w and Beta-p (the pooled slope) can have different signs.<br />
<br />
===Week 7===<br />
*Q: Compare the Scheffé method/confidence interval with the Bonferroni method/confidence interval.<br />
*A: The Bonferroni adjustment applies very generally, but it suffers from the limitation that the number of simultaneous confidence intervals must be specified in advance in order to maintain the validity of the simultaneous coverage. The Scheffé method is an alternative that controls the simultaneous coverage over all contrasts in a subspace. If a large number of contrasts are of interest, or it is not known in advance which ones are of interest, the Scheffé method provides a way to “snoop” through all the possibilities while maintaining control on the coverage of confidence intervals and the false positive rates of tests. When the number of contrasts to be estimated is small, Bonferroni is better than Scheffé. In fact, unless the number of desired contrasts is at least twice the number of factors, Scheffé will always show wider confidence bands than Bonferroni.<br />
<br />
===Week 8===<br />
*Q: Briefly introduce the Wald test. What are the problems with the Wald test?<br />
*A: The Wald test is a way of testing the significance of particular explanatory variables in a statistical model. In logistic regression we have a binary outcome variable and one or more explanatory variables. For each explanatory variable in the model there will be an associated parameter. The Wald test is one of a number of ways of testing whether the parameters associated with a group of explanatory variables are zero. If, for a particular explanatory variable or group of explanatory variables, the Wald test is significant, then we would conclude that the parameters associated with these variables are not zero, so that the variables should be included in the model. If the Wald test is not significant, then these explanatory variables can be omitted from the model. However, there are problems with the use of the Wald statistic: for large coefficients the standard error is inflated, which lowers the Wald statistic (chi-square) value, and the likelihood-ratio test is more reliable than the Wald test for small sample sizes.<br />
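For a single parameter, the Wald calculation can be sketched as follows. This is a minimal illustration, not output from any fitted model: the estimate 0.8 and standard error 0.4 are made-up numbers, the function name is hypothetical, and the large-sample normal reference distribution is assumed.<br />

```python
from statistics import NormalDist


def wald_test(estimate, std_error):
    """Wald test of H0: parameter = 0 for one coefficient.

    Returns the z statistic, the chi-square form (z squared, 1 df)
    and the two-sided large-sample p-value.
    """
    z = estimate / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, z * z, p_value


# Hypothetical coefficient and standard error from a logistic regression.
z, chi_square, p_value = wald_test(0.8, 0.4)
print(f"z = {z:.2f}, chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

Because the statistic divides by the standard error, an inflated standard error for a large coefficient pulls the statistic toward zero, which is the weakness noted above.<br />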
<br />
===Week 9===<br />
*Q: In the process of fitting a model, is it correct to drop variables from the model by looking only at the p-values of these variables in the fitted model?<br />
*A: We cannot drop more than one variable at a time from the model by looking at the individual p-values, since after we drop one variable from the model, the p-values of the other variables will change. Therefore, if we would like to drop two or more variables from the model at the same time, we should do a Wald test of all of them jointly.<br />
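A small numerical sketch shows why individual p-values can mislead. Everything here is hypothetical: two coefficient estimates of 1.0 with standard errors 0.7 and a correlation of -0.9 between the estimates (as can happen with two collinear predictors), large-sample reference distributions, and made-up function names. For two parameters the chi-square(2) tail probability has the closed form exp(-W/2).<br />

```python
import math
from statistics import NormalDist


def single_wald_p(b, var):
    """Two-sided large-sample p-value for one coefficient."""
    z = b / math.sqrt(var)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def joint_wald_2(b1, b2, var1, var2, cov12):
    """Joint Wald test of H0: both coefficients are zero.

    W = b' V^{-1} b with V the 2x2 covariance matrix of the estimates;
    the chi-square(2) p-value is exp(-W / 2).
    """
    det = var1 * var2 - cov12 ** 2
    w = (b1 * b1 * var2 - 2 * b1 * b2 * cov12 + b2 * b2 * var1) / det
    return w, math.exp(-w / 2)


# Hypothetical estimates: each looks non-significant on its own ...
var = 0.7 ** 2
cov = -0.9 * var
p_single = single_wald_p(1.0, var)
# ... but the joint test of both coefficients is highly significant.
w_joint, p_joint = joint_wald_2(1.0, 1.0, var, var, cov)
print(f"individual p = {p_single:.3f}, joint W = {w_joint:.1f}, joint p = {p_joint:.2e}")
```

In this sketch neither variable is individually significant at the 5% level, yet dropping both at once would discard a strongly significant pair.<br />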
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
Studying the seasons from 1993/1994 through 2008/2009, the researchers showed that the severity of infections with the Influenza A virus is related to its novelty. In addition, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus' hemagglutinin protein. This result may help to improve the ability to accurately predict influenza severity, so that scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. However, in reality there are many other sources of variation. For example, people differ in age, live in different areas and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime<br />
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can look for criminal activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
In the news story, people said that "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Obviously, crimes are more likely at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice. Crimes occur everywhere in a city. It is also possible that the criminals come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120 Drinkers Down Under switching from beer to wine]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wines and spirits has increased. From the statistical data, the researchers found that at the start of the 1960s beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same period, wine consumption has increased threefold, to 36 percent, while the intake of spirits has nearly doubled to 20 percent. Why have wine and spirits increased so much in the past forty years? The researchers noted that increased consumption was likely to have been affected by numerous factors such as different age patterns in the population, increasing affluence and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people may also have come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and children, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results of this study from the Canadian Health Measures Survey (CHMS), released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. In addition, the survey reports that young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. In addition, the survey found that men are more physically active than women on average. The same pattern appears among teenagers: more than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, the survey should take into account people whose jobs require them to perform heavy lifting or to be out and about for most of their work day. This would be more than enough exercise for most people and would improve these numbers dramatically. Likewise, for teenagers, the survey should also record whether they work part-time while studying as full-time students.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1 More private liquor stores, more alcohol deaths?]<br />
<br />
Based on a study of 89 local areas of British Columbia, the researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales, a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. Over the same period, the number of government-run stores in the province declined slightly. The findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
===Week 6===<br />
* Video Games Are Good for Girls<br />
[http://www.sciencedaily.com/releases/2011/02/110201083341.htm Video Games Are Good for Girls]<br />
<br />
Researchers from Brigham Young University's School of Family Life conducted a study on video games and children between 11 and 16 years old. They found that girls who played video games with a parent enjoyed a number of advantages. A father who still hasn't given up video games now has some justification to keep on playing, if he has a daughter. The study involved 287 families with an adolescent child. The researchers measured several outcomes, such as positive behavior, aggression, family connection and mental health. The result is interesting: for boys, playing with a parent was not a statistically significant factor for any of the outcomes the researchers measured. Yet for girls, playing with a parent accounted for as much as 20 percent of the variation in those measured outcomes.<br />
<br />
===Week 7===<br />
[http://www.sciencedaily.com/releases/2011/03/110301091225.htm Who's the Best Tennis Player of All Time?]<br />
<br />
Ranking tennis players is a novel way to show how complex network analysis can reveal interesting facts hidden in statistical data. Male tennis players who played in at least one Association of Tennis Professionals match between 1968 and 2010 were evaluated through network analysis. The researchers ran an algorithm, similar to the one used by Google to rank Web pages, on digital data from hundreds of thousands of matches. The data were pulled from the Association of Tennis Professionals website. They quantified the importance of players and ranked them by a "tennis prestige" score. This score is determined by a player's competitiveness, the quality of his performance and his number of victories. Fans may think of Jimmy Connors as an "old school" tennis player, but according to the new ranking system he ranks on top, in part because he played for more than 20 years and had the opportunity to win a lot of matches against other very good players. "The rankings are a snapshot of who is at the top at this time," Radicchi said. "Players who have yet to retire are penalized with respect to those who have ended their careers. Prestige scores strongly correlate with the number of victories, and active players haven't played all the matches of their careers yet." Researching and ranking sports stars gives a glimpse of the power of complex network analysis.<br />
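The flavour of such a ranking can be sketched with a PageRank-style power iteration in which each loser passes prestige to the players who beat him. This is only a toy illustration under stated assumptions: the four players and their results are invented, the damping factor 0.85 and the function name are hypothetical choices, and the code is not the researchers' actual algorithm.<br />

```python
def prestige(matches, damping=0.85, iterations=100):
    """PageRank-style scores from a list of (winner, loser) pairs."""
    players = sorted({p for match in matches for p in match})
    n = len(players)
    # For every player, the list of opponents who beat him (one entry per loss).
    beaten_by = {p: [w for w, l in matches if l == p] for p in players}
    score = {p: 1.0 / n for p in players}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in players}
        for loser in players:
            winners = beaten_by[loser]
            if winners:
                # A loser splits his prestige among the players who beat him.
                share = damping * score[loser] / len(winners)
                for winner in winners:
                    new[winner] += share
            else:
                # An undefeated player spreads his prestige evenly over everyone.
                for p in players:
                    new[p] += damping * score[loser] / n
        score = new
    return score


# Invented results: A beat B and C, B beat C, C beat D.
scores = prestige([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
for player, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{player}: {value:.3f}")
```

Because prestige flows along every match played, a player with a long career and many wins against strong opponents accumulates a high score, which is consistent with the remark that active players are penalized.<br />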
<br />
Hi Jessica, I'm not sure why, but the link you have provided for the tennis article isn't working. I searched for the article in google and found it at the following link in the event somebody else is interested in viewing the article. --[[User:Lawarren|Lawarren]] 10:48, 9 March 2011 (EST)<br />
<br />
http://www.sciencedaily.com/releases/2011/03/110301091225.htm<br />
* thank you very much Lawarren.<br />
<br />
===Week 8===<br />
[http://www.traveldailynews.com/pages/show_page/42076-Hotel-capitalization-rates-fall-further Hotel capitalization rates fall further]<br />
<br />
In 2011, the USRC Hotel Investment Survey released a prediction of good growth in the hotel industry, based on data viewed in the context of longer-term trends. The data include parameters such as capitalization rates, discount rates, income and expense growth expectations, marketing time, debt parameters, and other data for both full-service and limited-service hotels. The researchers found that full-service discount rates are now at their lowest level since the Mid Year 2007 survey, just prior to rates beginning to creep up in 2008 and the subsequent economic and credit collapse of the Fall of 2008. However, they are now just 10 basis points higher than the record low seen in the Winter 2007 survey, and are hovering very near the survey results of the last recovery years of 2005-2006, following the post 9-11 downturn. As yield requirements lessened, anticipated growth continued to increase, even beyond the very healthy movement seen in the last Mid Year 2010 Survey.<br />
<br />
===Week 9===<br />
[http://www.sciencedaily.com/releases/2011/03/110316084421.htm Poorly Presented Risk Statistics Could Misinform Health Decisions]<br />
<br />
Risk statistics can be used persuasively to present health interventions in different lights. The different ways of expressing risk can prove confusing and there has been much debate about how to improve the communication of health statistics. Choosing the appropriate way to present risk statistics is key to helping people make well-informed decisions. A new Cochrane Systematic Review found that health professionals and consumers may change their perceptions when the same risks and risk reductions are presented using alternative statistical formats. In the new study, Cochrane researchers reviewed data from 35 studies assessing understanding of risk statistics by health professionals and consumers. They found that participants in the studies understood frequencies better than probabilities. Although the researchers say further studies are required to explore how different risk formats affect behaviour, they believe there are strong logical arguments for not reporting relative values alone. <br />
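A short calculation illustrates how the same result looks in different risk formats. The counts are invented for illustration (2 events per 100 people without an intervention, 1 per 100 with it), and the function name is hypothetical.<br />

```python
def risk_summary(events_control, n_control, events_treated, n_treated):
    """Express one comparison as relative and absolute risk measures."""
    risk_control = events_control / n_control
    risk_treated = events_treated / n_treated
    absolute_reduction = risk_control - risk_treated
    return {
        "relative_risk_reduction": absolute_reduction / risk_control,
        "absolute_risk_reduction": absolute_reduction,
        "number_needed_to_treat": 1 / absolute_reduction,
    }


# Invented frequencies: 2 in 100 without the intervention, 1 in 100 with it.
summary = risk_summary(2, 100, 1, 100)
print(f"Relative risk reduction: {summary['relative_risk_reduction']:.0%}")
print(f"Absolute risk reduction: {summary['absolute_risk_reduction']:.1%}")
print(f"Number needed to treat:  {summary['number_needed_to_treat']:.0f}")
```

A "50% reduction in risk" and "1 fewer case per 100 people" describe the same invented data, which is why reporting the relative value alone can mislead.<br />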
<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities in one country instead of selecting different countries, in order to eliminate the differences in conditions.<br />
<br />
===Week 2===<br />
* Before taking this course, I had no idea what consulting involves or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realize that it is not easy to be a good consultant. Usually the situation is very complicated, and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem, then pick out the relationships among these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis and whether an apparent effect of x on y arose by chance, etc. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw the multiple regression model represented by a plane in data space, and by a point in beta space showing the fitted slope with respect to Coffee and the fitted slope with respect to Stress. Here we only need 2 dimensions, but for some more complex models we may need 3 or more dimensions for the beta space. I wonder how, in these situations, we can draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need 3D or higher dimensional ellipse. This ellipse in higher dimension would be larger because it is trying to be correct for 3 or more parameters at the same time.-''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===<br />
* For controlling for all the Zs, the slides mention that we can use one of three techniques. The first one is statistical control. I am not sure about the meaning of the propensity score, although it is described as the "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]<br />
Thank you Constance!<br />
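<br />
The idea of "prediction of X from Zs" can be sketched in a very simple case. The snippet below is a simplified stand-in for the regression described in the comment above: with a single categorical covariate, the propensity score can be estimated as the proportion of treated subjects within each covariate level. The records and the function name are invented for illustration.<br />

```python
from collections import defaultdict


def propensity_by_stratum(records):
    """Estimate P(treated | Z) as the treated proportion within each level of Z.

    records: iterable of (z_level, treated) pairs, with treated equal to 0 or 1.
    """
    counts = defaultdict(lambda: [0, 0])  # z_level -> [number treated, total]
    for z_level, treated in records:
        counts[z_level][0] += int(treated)
        counts[z_level][1] += 1
    return {z: n_treated / total for z, (n_treated, total) in counts.items()}


# Invented observational data: younger subjects end up treated more often.
records = [
    ("young", 1), ("young", 1), ("young", 1), ("young", 0),
    ("old", 1), ("old", 0), ("old", 0), ("old", 0),
]
scores = propensity_by_stratum(records)
print(scores)
```

In a randomized study these proportions would be about equal across levels of Z; the gap between them in this made-up example is the kind of selection that the propensity score is meant to capture.<br />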
<br />
===Week 5===<br />
* In a multiple regression model, it is a challenge to determine the type of a canonical outlier and to decide how to deal with it, that is, whether or not to remove it. The choice can change our results a great deal.<br />
<br />
===Week 6===<br />
* From the slides on Hierarchical Models, it is good to know the idea of the two-stage approach (derived variables approach): estimate the slope and intercept for each group, then use them as a multivariate sample and do a MANOVA test.<br />
<br />
===Week 7===<br />
*How does the Scheffe method connect with the F-test? <br />
*The F-test for the model is significant if and only if some contrast in the model space has a significant Scheffe confidence interval. If the F-test is not significant, then none of the Scheffe confidence intervals will be significant.<br />
<br />
===Week 8===<br />
*TED Talks on Statistics <br />
*[http://www.kdnuggets.com/2010/11/ted-talks-lies-damned-lies-and-statistics.html Sebastian Wernicke: Lies, damned lies and statistics]<br />
*[http://www.ted.com/talks/chris_jordan_pictures_some_shocking_stats.html Chris Jordan pictures some shocking stats]<br />
*Official Statistics and Statistical Ethics : Selected Issues [http://pdfcast.org/pdf/official-statistics-and-statistical-ethics-selected-issues]<br />
<br />
===Week 9===<br />
*Meaning of Apply Functions in R:<br />
1. lapply returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X. <br />
<br />
2. sapply is a user-friendly version of lapply by default returning a vector or matrix if appropriate.<br />
The lapply() function works on any list. The "l" in "lapply" stands for list. The "s" in "sapply" stands for simplify.<br />
<br />
3. vapply is similar to sapply, but has a pre-specified type of return value, so it can be safer (and sometimes faster) to use.<br />
<br />
4. tapply is a very powerful function that lets us break a vector into pieces and apply some function to each of the pieces. We need to specify how to break the vector into pieces.</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-03-20T02:15:25Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master student of Applied Statistics. I got my Bachelor Degree in Actuarial Science, and I have some background in Statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption across different countries. In reality, these countries could differ in all sorts of ways other than this factor. We know that the ideal solution, introduced by Fisher, is the experiment. What is Fisher's idea of an experiment?<br />
*A: 1) Make sure all the groups being compared are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare the life expectancy of these groups to see whether the groups with less cigarette consumption have higher life expectancy.<br />
<br />
===Week 2===<br />
*Q: In the example of studying the inheritance of height from father to son, the graph of the ellipse shows that the SD line is not the regression line of son's height on father's height: the deviation of the son's expected height from the mean is only a proportion of the deviation of the father's height from the mean. Why do we use ellipses?<br />
*A: Using ellipses, we can easily obtain the formulas of statistical theory and models. In addition, ellipses drawn on scatterplots and other graphs help with data analysis, for example in studying the correlation between variables, judging how significant a variable is, and calculating confidence intervals.<br />
<br />
===Week 3===<br />
*Q: What is beta space, and why do we want to look at our data from beta space?<br />
*A: In data space, the axes are variables and the points are observations. To better understand hierarchical data, we want a more natural display for our models, in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients.<br />
<br />
===Week 4===<br />
*Q: Thinking about strategy with observational data, what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the three techniques below. The first is statistical control: include the Zs in a statistical model and adjust statistically. Modern thinking is that we don't need all the Zs in the model to avoid bias, only the “propensity score”; this can fail with the wrong model. The second is matching: only perform comparisons between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, if you have the Zs. The third is structural methods, which build a structural causal model.<br />
<br />
===Week 5===<br />
*Q: In the example data set (Health as predicted by Weight and Height), we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?<br />
*A: The first type has typical values for the predictors but an atypical Y; it has little impact on beta-hat, but it increases the size of confidence intervals and decreases power. The second type has atypical values for the predictors but a Y consistent with the other data; it has little impact on beta-hat, but it shrinks confidence intervals and creates a false sense of power if the point is not valid. The third type has atypical values for the predictors and a Y not consistent with the other data; it has a large impact on beta-hat, could shrink or expand confidence intervals, and makes a mess of everything.<br />
<br />
===Week 6===<br />
*Q: In the hierarchical data example (High school), what is the difference between Robinson's Paradox and Simpson's Paradox?<br />
*A: Robinson's Paradox refers to the fact that Beta-W (the within-group slope) and Beta-B (the between-group slope) can have different signs. Simpson's Paradox refers to the fact that Beta-W and Beta-P (the pooled slope) can have different signs.<br />
<br />
===Week 7===<br />
*Q: Compare the Scheffé method and the Bonferroni method for simultaneous confidence intervals.<br />
*A: The Bonferroni adjustment applies very generally, but it suffers from the limitation that the number of simultaneous confidence intervals must be specified in advance in order to maintain the validity of the simultaneous coverage. The Scheffé method is an alternative that controls the simultaneous coverage over all contrasts in a subspace. If a large number of contrasts are of interest, or it is not known in advance which ones are of interest, the Scheffé method provides a way to “snoop” through all the possibilities while maintaining control of the coverage of confidence intervals and the false positive rates of tests. When the number of contrasts to be estimated is small, Bonferroni is better than Scheffé. In fact, unless the number of desired contrasts is at least twice the number of factors, Scheffé will always show wider confidence bands than Bonferroni.<br />
<br />
===Week 8===<br />
*Q: Briefly introduce the Wald test. What are the problems with the Wald test?<br />
*A: The Wald test is a way of testing the significance of particular explanatory variables in a statistical model. In logistic regression we have a binary outcome variable and one or more explanatory variables. For each explanatory variable in the model there will be an associated parameter. The Wald test is one of a number of ways of testing whether the parameters associated with a group of explanatory variables are zero. If, for a particular explanatory variable or group of explanatory variables, the Wald test is significant, then we would conclude that the parameters associated with these variables are not zero, so that the variables should be included in the model. If the Wald test is not significant, then these explanatory variables can be omitted from the model. However, there are problems with the use of the Wald statistic: for large coefficients the standard error is inflated, which lowers the Wald statistic (chi-square) value, and the likelihood-ratio test is more reliable than the Wald test for small sample sizes.<br />
<br />
===Week 9===<br />
*Q: In the process of fitting a model, is it correct to drop variables from the model by looking only at the p-values of these variables in the fitted model?<br />
*A: We cannot drop more than one variable at a time from the model by looking at the individual p-values, since after we drop one variable from the model, the p-values of the other variables will change. Therefore, if we would like to drop two or more variables from the model at the same time, we should do a Wald test of all of them jointly.<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
Studying the seasons from 1993/1994 through 2008/2009, the researchers showed that the severity of infections with the Influenza A virus is related to its novelty. In addition, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus' hemagglutinin protein. This result may help to improve the ability to accurately predict influenza severity, so that scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. However, in reality there are many other sources of variation. For example, people differ in age, live in different areas and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime<br />
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can look for criminal activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
In the news story, people said that "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Obviously, crimes are more likely at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice. Crimes occur everywhere in a city. It is also possible that the criminals come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120 Drinkers Down Under switching from beer to wine]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wines and spirits has increased. From the statistical data, the researchers found that at the start of the 1960s beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same period, wine consumption has increased threefold, to 36 percent, while the intake of spirits has nearly doubled to 20 percent. Why have wine and spirits increased so much in the past forty years? The researchers noted that increased consumption was likely to have been affected by numerous factors such as different age patterns in the population, increasing affluence and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people may also have come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and children, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results of this study from the Canadian Health Measures Survey (CHMS), released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. In addition, the survey reports that young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. In addition, the survey found that men are more physically active than women on average. The same pattern appears among teenagers: more than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, the survey should take into account people whose jobs require them to perform heavy lifting or to be out and about for most of their work day. This would be more than enough exercise for most people and would improve these numbers dramatically. Likewise, for teenagers, the survey should also record whether they work part-time while studying as full-time students.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1 More private liquor stores, more alcohol deaths?]<br />
<br />
Based on a study of 89 local areas of British Columbia, researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales, a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. Over the same period, the number of government-run stores in the province declined slightly. The findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
===Week 6===<br />
* Video Games Are Good for Girls<br />
[http://www.sciencedaily.com/releases/2011/02/110201083341.htm Video Games Are Good for Girls]<br />
<br />
Researchers from Brigham Young University's School of Family Life conducted a study on video games and children between 11 and 16 years old. They found that girls who played video games with a parent enjoyed a number of advantages. A father who still hasn't given up video games now has some justification to keep on playing, if he has a daughter. The study involved 287 families with an adolescent child. The researchers measured several outcomes, such as positive behavior, aggression, family connection and mental health. The result is interesting: for boys, playing with a parent was not a statistically significant factor for any of the outcomes the researchers measured. Yet for girls, playing with a parent accounted for as much as 20 percent of the variation in those measured outcomes.<br />
<br />
===Week 7===<br />
[http://www.sciencedaily.com/releases/2011/03/110301091225.htm|Who's the Best Tennis Player of All Time?]<br />
<br />
Ranking tennis players is a novel way to show how complex network analysis can reveal interesting facts hidden in statistical data. Male tennis players who played in at least one Association of Tennis Professionals match between 1968 and 2010 were evaluated through network analysis. The researchers ran an algorithm, similar to the one used by Google to rank Web pages, on digital data from hundreds of thousands of matches. The data were pulled from the Association of Tennis Professionals website. They quantified the importance of players and ranked them by a "tennis prestige" score. This score is determined by a player's competitiveness, the quality of his performance and his number of victories. Fans may think of Jimmy Connors as an "old school" tennis player, but according to the new ranking system he ranks on top; one of the reasons is that he played for more than 20 years and had the opportunity to win a lot of matches against other very good players. "The rankings are a snapshot of who is at the top at this time," Radicchi said. "Players who have yet to retire are penalized with respect to those who have ended their careers. Prestige scores strongly correlate with the number of victories, and active players haven't played all the matches of their careers yet." Researching and ranking sports stars gives a glimpse of the power of complex network analysis. <br />
<br />
Hi Jessica, I'm not sure why, but the link you have provided for the tennis article isn't working. I searched for the article in google and found it at the following link in the event somebody else is interested in viewing the article. --[[User:Lawarren|Lawarren]] 10:48, 9 March 2011 (EST)<br />
<br />
http://www.sciencedaily.com/releases/2011/03/110301091225.htm<br />
* thank you very much Lawarren.<br />
<br />
===Week 8===<br />
[http://www.traveldailynews.com/pages/show_page/42076-Hotel-capitalization-rates-fall-further Hotel capitalization rates fall further]<br />
<br />
In 2011, the USRC Hotel Investment survey predicted good growth in the hotel industry, based on its data viewed in the context of longer-term trends. The data include parameters such as capitalization rates, discount rates, income and expense growth expectations, marketing time, debt parameters, and other data for both full-service and limited-service hotels. The researchers found that full-service discount rates are now at their lowest level since the Mid Year 2007 survey, just prior to rates beginning to creep up in 2008, and of course the subsequent economic and credit collapse of the Fall of 2008. They are now just 10 basis points higher than the record low seen in the Winter 2007 survey, and are hovering very near the survey results of the last recovery years of 2005-2006, following the post-9/11 downturn. As yield requirements lessened, anticipated growth continued to increase, even beyond the very healthy movement seen in the Mid Year 2010 survey. <br />
<br />
===Week 9===<br />
[http://www.sciencedaily.com/releases/2011/03/110316084421.htm Poorly Presented Risk Statistics Could Misinform Health Decisions]<br />
<br />
Risk statistics can be used persuasively to present health interventions in different lights. The different ways of expressing risk can prove confusing and there has been much debate about how to improve the communication of health statistics. Choosing the appropriate way to present risk statistics is key to helping people make well-informed decisions. A new Cochrane Systematic Review found that health professionals and consumers may change their perceptions when the same risks and risk reductions are presented using alternative statistical formats. In the new study, Cochrane researchers reviewed data from 35 studies assessing understanding of risk statistics by health professionals and consumers. They found that participants in the studies understood frequencies better than probabilities. Although the researchers say further studies are required to explore how different risk formats affect behaviour, they believe there are strong logical arguments for not reporting relative values alone. <br />
<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities in one country instead of from different countries, in order to eliminate differences in conditions.<br />
<br />
===Week 2===<br />
* Before I took this course, I had no idea what consulting involves or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realize that it is not easy to be a good consultant. Usually, the situation is very complicated, and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem, then pick the relationships among these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis and whether an apparent effect of x on y arose by chance. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw the multiple regression model represented by a plane in data space, which is represented in beta space by a point showing the fitted slope with respect to Coffee and the fitted slope with respect to Stress. Here we only need 2 dimensions, but for some complex models we may need 3 or more dimensions for the beta space. I just wonder, in these situations, how can we draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need a 3D or higher-dimensional ellipsoid. This ellipsoid in higher dimensions would be larger because it is trying to be correct for 3 or more parameters at the same time. -''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===<br />
* On controlling for all the Zs, the slides mention that we can use one of three techniques. The first one is statistical control. I am not sure about the meaning of propensity score, although it says "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]<br />
Thank you Constance!<br />
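<br />
To write this definition down (a sketch in assumed notation, not taken from the slides: <math>X = 1</math> marks membership in the group of interest and <math>Z</math> is the vector of measured covariates), the propensity score is<br />
:<math>e(z) = \Pr(X = 1 \mid Z = z).</math><br />
One common way to estimate it, consistent with "prediction of X from Zs", is a logistic regression of group membership on the covariates, with <math>\gamma_0</math> and <math>\gamma</math> as hypothetical coefficient names:<br />
:<math>\log\frac{e(z)}{1 - e(z)} = \gamma_0 + \gamma' z.</math><br />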
<br />
===Week 5===<br />
* In a multiple regression model, it is a challenge to determine the type of a canonical outlier and how to deal with such outliers, that is, whether to remove them or not. The choice can change our results greatly.<br />
<br />
===Week 6===<br />
* From the slides on Hierarchical Models, it is good to know the idea of the two-stage approach (derived variables approach): estimate the slope and intercept for each group, then use them as a multivariate sample and do a MANOVA test.<br />
<br />
===Week 7===<br />
*How does the Scheffe method connect with the F-test? <br />
*The F-test for the model is significant if and only if some contrast in the model space has a significant Scheffe confidence interval. If the F-test is not significant, then none of the Scheffe confidence intervals will be significant.<br />
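<br />
As a sketch of why this holds (the notation is assumed here, not taken from the slides): let <math>\hat\psi_c = c'\hat\beta</math> be the estimate of a contrast <math>c</math> in a subspace of dimension <math>d</math>, let <math>\nu</math> be the error degrees of freedom, and let <math>F</math> be the F statistic for the hypothesis that every contrast in the subspace is zero. Then<br />
:<math>\sup_{c \neq 0} \frac{\hat\psi_c^{\,2}}{\widehat{\mathrm{SE}}(\hat\psi_c)^2} = d\,F ,</math><br />
so <math>F</math> exceeds the upper critical value <math>F_{\alpha;\,d,\,\nu}</math> exactly when at least one Scheffé interval <math>\hat\psi_c \pm \sqrt{d\,F_{\alpha;\,d,\,\nu}}\;\widehat{\mathrm{SE}}(\hat\psi_c)</math> excludes zero.<br />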
<br />
===Week 8===<br />
*TED Talks on Statistics <br />
*[http://www.kdnuggets.com/2010/11/ted-talks-lies-damned-lies-and-statistics.html Sebastian Wernicke: Lies, damned lies and statistics]<br />
*[http://www.ted.com/talks/chris_jordan_pictures_some_shocking_stats.html Chris Jordan pictures some shocking stats]<br />
*Official Statistics and Statistical Ethics : Selected Issues [http://pdfcast.org/pdf/official-statistics-and-statistical-ethics-selected-issues]<br />
<br />
===Week 9===<br />
*Meaning of Apply Functions in R:<br />
1. lapply returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X. <br />
2. sapply is a user-friendly version of lapply by default returning a vector or matrix if appropriate.<br />
The lapply() function works on any list. The "l" in "lapply" stands for list. The "s" in "sapply" stands for simplify.<br />
3. vapply is similar to sapply, but has a pre-specified type of return value, so it can be safer (and sometimes faster) to use.<br />
4. tapply is a very powerful function that lets us break a vector into pieces and apply some function to each of the pieces. We need to specify how to break the vector into pieces.</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-03-20T01:57:10Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master's student in Applied Statistics. I earned my Bachelor's degree in Actuarial Science, and I have some background in Statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption in different countries. In reality, besides this factor, these countries could differ in all sorts of ways. We know that the ideal solution introduced by Fisher is the experiment. What is Fisher's idea for an experiment? <br />
*A: 1) Make sure all the countries are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare the life expectancy of these groups to see whether the groups with less cigarette consumption have higher life expectancy.<br />
<br />
===Week 2===<br />
*Q: From the example of studying the inheritance of height from father to son, and the graph of the ellipse, we find that the SD line is not the regression line of son's height on father's height; the deviation of the son's expected height from the mean is only a proportion of the deviation of the father's height from the mean. Why do we use ellipses?<br />
*A: By using ellipses, we can easily derive the formulas for statistical theory and models. In addition, ellipses drawn on scatterplots and other graphs help us with data analysis, for example to study the correlation between the variables, to judge how significant a variable is, and to calculate confidence intervals. <br />
<br />
===Week 3===<br />
*Q: What is beta space, and why do we want to look at our data in beta space?<br />
*A: In data space, the axes are variables and the points are observations. For a better understanding of hierarchical data, we want a more natural display for our models, in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients. <br />
<br />
===Week 4===<br />
*Q: Thinking about strategy with observational data, what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the three tools below. The first is statistical control: include the Zs in a statistical model and adjust statistically. Modern thinking is that we don't need all the Zs in the model to avoid bias, only the “propensity score”; this can fail with the wrong model. The second is matching, which only performs comparisons between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, if you have the Zs. The third is structural methods, which build a structural causal model.<br />
<br />
===Week 5===<br />
*Q: In the example data set, Health as predicted by Weight and Height, we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?<br />
*A: The first type has typical values for the predictors but an atypical Y: it has little impact on beta-hat, increases the size of confidence intervals and decreases power. The second has atypical values for the predictors but a Y consistent with the other data: it has little impact on beta-hat, shrinks confidence intervals, and creates a false sense of power if the point is not valid. The third has atypical values for the predictors and a Y not consistent with the other data: it has a large impact on beta-hat, could shrink or expand CIs, and makes a mess of everything.<br />
<br />
===Week 6===<br />
*Q: In the hierarchical data (high school) example, what is the difference between Robinson's Paradox and Simpson's Paradox?<br />
*A: Robinson's Paradox refers to the fact that Beta-w and Beta-B can have different signs. Simpson's Paradox refers to the fact that Beta-w and Beta-p can have different signs.<br />
<br />
===Week 7===<br />
*Q: Compare the Scheffé method/confidence interval with the Bonferroni method/confidence interval.<br />
*A: The Bonferroni adjustment applies very generally, but it suffers from the limitation that the number of simultaneous confidence intervals must be specified in advance in order to maintain the validity of the simultaneous coverage. The Scheffé method is an alternative that controls the simultaneous coverage over all contrasts in a subspace. If a large number of contrasts are of interest, or it is not known in advance which ones are of interest, the Scheffé method provides a way to “snoop” through all the possibilities while maintaining control of the coverage of confidence intervals and the false positive rates of tests. When the number of contrasts to be estimated is small, Bonferroni is better than Scheffé. Actually, unless the number of desired contrasts is at least twice the number of factors, Scheffé will always show wider confidence bands than Bonferroni.<br />
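<br />
For reference, the two kinds of interval can be written side by side. This is a sketch in assumed notation, not taken from the course notes: <math>\hat\psi</math> is an estimated contrast, <math>\nu</math> is the error degrees of freedom, <math>m</math> is the number of contrasts specified in advance, <math>d</math> is the dimension of the subspace of contrasts, and <math>t</math> and <math>F</math> with a subscript denote upper critical values.<br />
:<math>\text{Bonferroni:}\quad \hat\psi \pm t_{\alpha/(2m);\,\nu}\;\widehat{\mathrm{SE}}(\hat\psi)</math><br />
:<math>\text{Scheff\'e:}\quad \hat\psi \pm \sqrt{d\,F_{\alpha;\,d,\,\nu}}\;\widehat{\mathrm{SE}}(\hat\psi)</math><br />
The Bonferroni multiplier grows with <math>m</math>, whereas the Scheffé multiplier depends only on <math>d</math>, which is why Bonferroni is narrower for a few contrasts and Scheffé is narrower once enough contrasts are examined.<br />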
<br />
===Week 8===<br />
*Q: Briefly introduce the Wald test. What is the problem with the Wald test?<br />
*A: The Wald test is a way of testing the significance of particular explanatory variables in a statistical model. In logistic regression we have a binary outcome variable and one or more explanatory variables. For each explanatory variable in the model there will be an associated parameter. The Wald test is one of a number of ways of testing whether the parameters associated with a group of explanatory variables are zero. If, for a particular explanatory variable or group of explanatory variables, the Wald test is significant, then we would conclude that the parameters associated with these variables are not zero, so that the variables should be included in the model. If the Wald test is not significant, then these explanatory variables can be omitted from the model. However, there are problems with the Wald statistic: for large coefficients, the standard error is inflated, lowering the Wald statistic (chi-square) value, and the likelihood-ratio test is more reliable than the Wald test for small sample sizes.<br />
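<br />
In symbols (a sketch in assumed notation, not taken from the slides): to test <math>H_0: L\beta = 0</math>, where <math>L</math> is a <math>q \times p</math> matrix of full row rank that picks out the parameters of interest and <math>\hat V</math> is the estimated covariance matrix of <math>\hat\beta</math>, the Wald statistic is<br />
:<math>W = (L\hat\beta)'\,(L \hat V L')^{-1}\,(L\hat\beta),</math><br />
which is compared with a chi-square distribution on <math>q</math> degrees of freedom. For a single coefficient this reduces to <math>W = \left(\hat\beta_j / \widehat{\mathrm{SE}}(\hat\beta_j)\right)^2</math>, which shows how an inflated standard error lowers the statistic.<br />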
<br />
===Week 9===<br />
*Q: In the process of fitting a model, is it correct to drop variables from the model by looking only at the p-values of these variables in the fitted model?<br />
*A: We cannot drop more than one variable from the model at the same time by testing the significance of each variable separately (looking at its p-value), since after we drop one variable from the model, the p-values of the other variables will change. Therefore, we should do a Wald test on the variables jointly if we would like to drop two or more of them from the model at the same time.<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
Using data from the 1993/1994 through 2008/2009 seasons, the research has shown that the severity of infections with the influenza A virus is related to its novelty. In addition, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus' hemagglutinin protein. This result may help to improve the ability to predict influenza severity accurately, so that scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. However, in reality there is much more variation to account for. For example, people fall in different age ranges, live in different areas and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime<br />
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can search for criminal activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out which weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
In the news report, people said: "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Obviously, crimes are more likely at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice. Crimes occur all over the city. It is also possible that the criminals come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120 Drinkers Down Under switching from beer to wine]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wine and spirits has increased. According to the data, at the start of the 1960s beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same period, wine consumption has increased threefold to 36 percent, while the intake of spirits has nearly doubled to 20 percent. Why have wine and spirits increased so much over the past forty years? The researchers noted that the increased consumption was likely affected by numerous factors, such as different age patterns in the population, increasing affluence and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people may also have come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome Canadians spend most of waking life sedentary]<br />
<br />
Statistics Canada collected the data in a survey of the physical activity patterns of Canadian adults and children, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results from the Canadian Health Measures Survey (CHMS) released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. Young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. In addition, the survey found that men are more physically active than women on average. The same pattern appears among teenagers: more than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, the survey should also take into account people whose jobs require heavy lifting or being out and about for most of the work day. This would be more than enough exercise for most people, and counting it would improve these numbers dramatically. Similarly, for teenagers, the survey should also study whether they work part-time while studying as full-time students.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1 More private liquor stores, more alcohol deaths?]<br />
<br />
Based on a study of 89 local areas of British Columbia, researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales, a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. Over the same period, the number of government-run stores in the province declined slightly. The findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
===Week 6===<br />
* Video Games Are Good for Girls<br />
[http://www.sciencedaily.com/releases/2011/02/110201083341.htm Video Games Are Good for Girls]<br />
<br />
Researchers from Brigham Young University's School of Family Life conducted a study on video games and children between 11 and 16 years old. They found that girls who played video games with a parent enjoyed a number of advantages. A father who still hasn't given up video games now has some justification to keep on playing, if he has a daughter. The study involved 287 families with an adolescent child. The researchers measured several outcomes, such as positive behavior, aggression, family connection and mental health. The result is interesting: for boys, playing with a parent was not a statistically significant factor for any of the outcomes the researchers measured. Yet for girls, playing with a parent accounted for as much as 20 percent of the variation in those measured outcomes.<br />
<br />
===Week 7===<br />
[http://www.sciencedaily.com/releases/2011/03/110301091225.htm|Who's the Best Tennis Player of All Time?]<br />
<br />
Ranking tennis players is a novel way to show how complex network analysis can reveal interesting facts hidden in statistical data. Male tennis players who played in at least one Association of Tennis Professionals match between 1968 and 2010 were evaluated through network analysis. The researchers ran an algorithm, similar to the one used by Google to rank Web pages, on digital data from hundreds of thousands of matches. The data were pulled from the Association of Tennis Professionals website. They quantified the importance of players and ranked them by a "tennis prestige" score. This score is determined by a player's competitiveness, the quality of his performance and his number of victories. Fans may think of Jimmy Connors as an "old school" tennis player, but according to the new ranking system he ranks on top; one of the reasons is that he played for more than 20 years and had the opportunity to win a lot of matches against other very good players. "The rankings are a snapshot of who is at the top at this time," Radicchi said. "Players who have yet to retire are penalized with respect to those who have ended their careers. Prestige scores strongly correlate with the number of victories, and active players haven't played all the matches of their careers yet." Researching and ranking sports stars gives a glimpse of the power of complex network analysis. <br />
<br />
Hi Jessica, I'm not sure why, but the link you have provided for the tennis article isn't working. I searched for the article in google and found it at the following link in the event somebody else is interested in viewing the article. --[[User:Lawarren|Lawarren]] 10:48, 9 March 2011 (EST)<br />
<br />
http://www.sciencedaily.com/releases/2011/03/110301091225.htm<br />
* thank you very much Lawarren.<br />
<br />
===Week 8===<br />
[http://www.traveldailynews.com/pages/show_page/42076-Hotel-capitalization-rates-fall-further Hotel capitalization rates fall further]<br />
<br />
In 2011, the USRC Hotel Investment survey predicted good growth in the hotel industry, based on its data viewed in the context of longer-term trends. The data include parameters such as capitalization rates, discount rates, income and expense growth expectations, marketing time, debt parameters, and other data for both full-service and limited-service hotels. The researchers found that full-service discount rates are now at their lowest level since the Mid Year 2007 survey, just prior to rates beginning to creep up in 2008, and of course the subsequent economic and credit collapse of the Fall of 2008. They are now just 10 basis points higher than the record low seen in the Winter 2007 survey, and are hovering very near the survey results of the last recovery years of 2005-2006, following the post-9/11 downturn. As yield requirements lessened, anticipated growth continued to increase, even beyond the very healthy movement seen in the Mid Year 2010 survey. <br />
<br />
===Week 9===<br />
[http://www.sciencedaily.com/releases/2011/03/110316084421.htm Poorly Presented Risk Statistics Could Misinform Health Decisions]<br />
<br />
Risk statistics can be used persuasively to present health interventions in different lights. The different ways of expressing risk can prove confusing and there has been much debate about how to improve the communication of health statistics. Choosing the appropriate way to present risk statistics is key to helping people make well-informed decisions. A new Cochrane Systematic Review found that health professionals and consumers may change their perceptions when the same risks and risk reductions are presented using alternative statistical formats. In the new study, Cochrane researchers reviewed data from 35 studies assessing understanding of risk statistics by health professionals and consumers. They found that participants in the studies understood frequencies better than probabilities. Although the researchers say further studies are required to explore how different risk formats affect behaviour, they believe there are strong logical arguments for not reporting relative values alone. <br />
<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities in one country instead of from different countries, in order to eliminate differences in conditions.<br />
<br />
===Week 2===<br />
* Before taking this course, I had no idea what consulting involves or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realize that it is not easy to be a good consultant. Usually, the situation is very complicated and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem, then pick the relationships among these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis, whether y appears to be caused by x only by chance, etc. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw the multiple regression model represented by a plane in data space, which is represented in beta space by a point showing the fitted slope with respect to Coffee and the fitted slope with respect to Stress. Here we only need 2 dimensions, but for some more complex models we may need 3 or more dimensions for the beta space. I just wonder how, in these situations, we can draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need 3D or higher dimensional ellipse. This ellipse in higher dimension would be larger because it is trying to be correct for 3 or more parameters at the same time.-''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===<br />
* On controlling for all the Zs, the notes mention that we can use one of three techniques. The first one is statistical control. I am not sure about the meaning of the propensity score, although it is described as the "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]<br />
Thank you Constance!<br />
<br />
===Week 5===<br />
* In a multiple regression model, it is a challenge to determine the type of a canonical outlier and to decide how to deal with such outliers, whether to remove them or not. The choice can change our results substantially.<br />
<br />
===Week 6===<br />
* From the slides on Hierarchical Models, it is good to know the idea of the two-stage approach (derived variables approach): estimate the slope and intercept, then use them as a multivariate sample and do a MANOVA test.<br />
<br />
===Week 7===<br />
*How does the Scheffe method connect with the F-test? <br />
*The F-test for the model is significant if and only if some contrast in the model space has a significant Scheffe confidence interval. If the F-test is not significant, then none of the Scheffe confidence intervals will be significant.<br />
<br />
===Week 8===<br />
*TED Talks on Statistics <br />
*[http://www.kdnuggets.com/2010/11/ted-talks-lies-damned-lies-and-statistics.html Sebastian Wernicke: Lies, damned lies and statistics]<br />
*[http://www.ted.com/talks/chris_jordan_pictures_some_shocking_stats.html Chris Jordan pictures some shocking stats]<br />
*Official Statistics and Statistical Ethics : Selected Issues [http://pdfcast.org/pdf/official-statistics-and-statistical-ethics-selected-issues]<br />
<br />
===Week 9===</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-03-20T01:56:16Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master's student in Applied Statistics. I have a Bachelor's degree in Actuarial Science, and I have some background in Statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption in different countries. In reality, these countries could differ in all sorts of ways other than this factor. We know that the ideal solution introduced by Fisher is the experiment. What is Fisher's idea for an experiment? <br />
*A: 1) Make sure the groups being compared are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare the life expectancy of these groups; if the groups with lower cigarette consumption have higher life expectancy, the difference can be attributed to cigarette consumption.<br />
<br />
===Week 2===<br />
*Q: From the example of studying the inheritance of height from father to son, and the graph of the ellipse, we find that the SD line is not the regression line of son's height on father's height: the change of the son's expected height from the mean is only a proportion of the change of the father's height from the mean. Why do we use ellipses?<br />
*A: By using ellipses, we can easily derive the formulas for statistical theory and models. In addition, ellipses, together with scatterplots and other graphs, help us in data analysis, for example to study the correlation between the variables, to judge how significant a variable is, and to calculate confidence intervals. <br />
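
The difference between the SD line and the regression line in the Week 2 question can be checked numerically. The following is only a minimal sketch: the father and son heights are made-up numbers for illustration, not data from the course.

```python
import math

# Hypothetical father/son heights in inches (illustration only, not course data).
fathers = [65.0, 67.0, 68.0, 70.0, 72.0, 74.0]
sons = [67.0, 67.5, 69.0, 69.5, 70.5, 72.5]


def mean(xs):
    return sum(xs) / len(xs)


def sd(xs):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def correlation(xs, ys):
    """Pearson correlation between xs and ys."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (sd(xs) * sd(ys))


r = correlation(fathers, sons)
sd_line_slope = sd(sons) / sd(fathers)   # slope of the SD line
regression_slope = r * sd_line_slope     # slope of the regression of son on father

print(f"correlation r         = {r:.3f}")
print(f"SD line slope         = {sd_line_slope:.3f}")
print(f"regression line slope = {regression_slope:.3f}")
```

Because the correlation is below 1, the regression slope is smaller than the SD line slope, which is the "proportion of the change" mentioned in the question.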
<br />
===Week 3===<br />
*Q: What is beta space, and why do we want to look at our data in beta space?<br />
*A: In data space, the axes are variables and the points are observations. To better understand hierarchical data, we want a more natural display of our models, in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients. <br />
<br />
===Week 4===<br />
*Q: Thinking about the strategy for observational data, what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the following three. The first is statistical control, which includes the Zs in a statistical model and adjusts for them statistically. Modern thinking: we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail with the wrong model. The second is matching: this technique only performs comparisons between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, if you have the Zs. The third is structural methods, which build a structural causal model.<br />
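
The propensity score mentioned in the Week 4 answer can be written compactly. This is a sketch that assumes X is a binary indicator of group membership (1 for one group, 0 for the other), which the notes do not state explicitly:

```latex
e(Z) = \Pr(X = 1 \mid Z), \qquad X \perp Z \mid e(Z)
```

The second statement is the balancing property: among observations with the same propensity score, group membership is unrelated to the Zs, which is why adjusting for or matching on e(Z) can stand in for all the Zs.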
<br />
===Week 5===<br />
*Q: From the example data set, Health as predicted by Weight and Height, we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?<br />
*A: The first type has typical values for the predictors but an atypical Y: it has little impact on beta-hat, increases the size of confidence intervals and decreases power. The second has atypical values for the predictors but a Y consistent with the other data: it has little impact on beta-hat and shrinks confidence intervals, creating a false sense of power if the point is not valid. The third has atypical values for the predictors and a Y not consistent with the other data: it has a large impact on beta-hat, could shrink or expand CIs, and makes a mess of everything.<br />
<br />
===Week 6===<br />
*Q: In the hierarchical data (high school) example, what is the difference between Robinson's Paradox and Simpson's Paradox?<br />
*A: Robinson's Paradox refers to the fact that the within-group slope (Beta-w) and the between-group slope (Beta-B) can have different signs. Simpson's Paradox refers to the fact that the within-group slope (Beta-w) and the pooled slope (Beta-p) can have different signs.<br />
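
The sign reversals in the Week 6 answer can be reproduced with a tiny example. This is a sketch under assumptions: the three "schools" and their values are invented, and Beta-w, Beta-B and Beta-p are read here as the within-school, between-school and pooled slopes.

```python
def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data: school -> (x values, y values). Illustration only.
schools = {
    "A": ([1, 2, 3], [8, 7, 6]),
    "B": ([4, 5, 6], [11, 10, 9]),
    "C": ([7, 8, 9], [14, 13, 12]),
}

# Beta-w: slope after centering x and y within each school.
centered_x, centered_y = [], []
for xs, ys in schools.values():
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    centered_x += [x - mx for x in xs]
    centered_y += [y - my for y in ys]
beta_w = slope(centered_x, centered_y)

# Beta-B: slope through the school means.
mean_x = [sum(xs) / len(xs) for xs, _ in schools.values()]
mean_y = [sum(ys) / len(ys) for _, ys in schools.values()]
beta_b = slope(mean_x, mean_y)

# Beta-p: slope ignoring schools altogether.
all_x = [x for xs, _ in schools.values() for x in xs]
all_y = [y for _, ys in schools.values() for y in ys]
beta_p = slope(all_x, all_y)

print(f"within  (Beta-w) = {beta_w:+.2f}")
print(f"between (Beta-B) = {beta_b:+.2f}")
print(f"pooled  (Beta-p) = {beta_p:+.2f}")
```

In this made-up data the within-school slope is negative while both the between-school and pooled slopes are positive, so both paradoxes appear at once.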
<br />
===Week 7===<br />
*Q: Compare the Scheffé method/confidence interval and the Bonferroni method/confidence interval.<br />
*A: The Bonferroni adjustment applies very generally, but it suffers from the limitation that the number of simultaneous confidence intervals must be specified in advance in order to maintain the validity of the simultaneous coverage. The Scheffé method is an alternative that controls the simultaneous coverage over all contrasts in a subspace. If a large number of contrasts are of interest, or it is not known in advance which ones are of interest, the Scheffé method provides a way to “snoop” through all the possibilities while maintaining control of the coverage of confidence intervals and the false positive rates of tests. When the number of contrasts to be estimated is small, Bonferroni is better than Scheffé. Actually, unless the number of desired contrasts is at least twice the number of factors, Scheffé will always show wider confidence bands than Bonferroni.<br />
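
For reference, the two kinds of interval compared in the Week 7 answer can be written side by side. The notation is an assumption made for this sketch: ψ̂ is an estimated contrast, ν is the residual degrees of freedom, g is the number of contrasts fixed in advance, and d is the dimension of the subspace of contrasts.

```latex
\begin{align*}
\text{Bonferroni:} \quad & \hat\psi \pm t_{1-\alpha/(2g),\,\nu}\;\mathrm{SE}(\hat\psi) \\
\text{Scheff\'e:}  \quad & \hat\psi \pm \sqrt{d\,F_{1-\alpha;\,d,\,\nu}}\;\mathrm{SE}(\hat\psi)
\end{align*}
```

The Bonferroni multiplier grows with g, while the Scheffé multiplier depends only on d, which is why Scheffé allows "snooping" through any number of contrasts in the subspace but is wider when g is small.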
<br />
===Week 8===<br />
*Q: Briefly introduce the Wald test. What are the problems with the Wald test?<br />
*A: The Wald test is a way of testing the significance of particular explanatory variables in a statistical model. In logistic regression we have a binary outcome variable and one or more explanatory variables. For each explanatory variable in the model there will be an associated parameter. The Wald test is one of a number of ways of testing whether the parameters associated with a group of explanatory variables are zero. If, for a particular explanatory variable or group of explanatory variables, the Wald test is significant, then we would conclude that the parameters associated with these variables are not zero, so that the variables should be included in the model. If the Wald test is not significant, then these explanatory variables can be omitted from the model. However, there are problems with the use of the Wald statistic: for large coefficients, the standard error is inflated, lowering the Wald statistic (chi-square) value, and the likelihood-ratio test is more reliable for small sample sizes than the Wald test.<br />
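
For a single coefficient, the Wald statistic from the Week 8 answer is just the squared ratio of the estimate to its standard error. The sketch below assumes the usual large-sample normal approximation; the function name `wald_test` and the numbers are made up for illustration and do not come from a fitted model.

```python
import math


def wald_test(estimate, std_error):
    """Wald test of H0: coefficient = 0 for one parameter.

    Returns the z statistic, the chi-square statistic (1 df) and the
    two-sided p-value under the large-sample normal approximation.
    """
    z = estimate / std_error
    chi_square = z * z
    p_value = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| > |z|) for standard normal Z
    return z, chi_square, p_value


# Hypothetical logistic regression coefficient and standard error.
z, chi_square, p_value = wald_test(estimate=1.5, std_error=0.5)
print(f"z = {z:.2f}, chi-square = {chi_square:.2f}, p = {p_value:.4f}")

# The problem noted above: if a large coefficient comes with an inflated
# standard error, the statistic shrinks and the test can miss a real effect.
z_big, _, p_big = wald_test(estimate=15.0, std_error=12.0)
print(f"z = {z_big:.2f}, p = {p_big:.4f}")
```

To test a group of coefficients at once, the same idea uses the estimated covariance matrix of the coefficients instead of a single standard error.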
<br />
===Week 9===<br />
*Q: In the process of fitting a model, is it correct to drop variables from the model by looking only at the p-values of these variables in the fitted model?<br />
*A: We cannot drop more than one variable at a time from the model based on the individual p-values of the variables, since after we drop one variable the p-values of the other variables will change. Therefore, we should do a Wald test on the group of variables if we would like to drop two or more variables from the model at the same time.<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
Studying the 1993/1994 through 2008/2009 seasons, the research has shown that the severity of infections with the Influenza A virus is related to its novelty. In addition, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus' hemagglutinin protein. It is thought that this result can help to improve the ability to accurately predict influenza severity. Therefore, scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. However, in reality there are many other sources of variation. For example, people are in different age ranges, live in different areas and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime<br />
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online and spot trends that show what types of crimes happen most often, and where. Users can look for crime activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
The news story includes the quote: "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Obviously, crimes are more likely at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice. Crimes happen everywhere across the city. It is also possible that the criminals come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120 Weight Cited in Women's Heartburn]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wines and spirits has increased. From the statistical data, the researchers find that at the start of the 1960s, beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same time period, wine consumption has increased threefold, to 36 percent, while the intake of spirits has nearly doubled to 20 percent. Why have wine and spirits increased so much in the past forty years? The researchers noted that consumption was likely to have been affected by numerous factors such as different age patterns in the population, increasing affluence and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people have also come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and kids, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results of this study from the Canadian Health Measures Survey (CHMS), released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. In addition, the survey mentions that young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. In addition, the survey found that men have more physical activity than women on average. This pattern also appears among teenagers: more than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, the survey should consider people whose jobs require them to perform heavy lifting and to be out and about for most of their work day. This would be more than enough exercise for most people and would improve these numbers dramatically. Likewise, for teenagers, the survey should also study whether they work part-time while studying as full-time students.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1 More private liquor stores, more alcohol deaths?]<br />
<br />
Based on a study of 89 local areas of British Columbia, the researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales, a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. Over the same period, the number of government-run stores in the province declined slightly. The findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
===Week 6===<br />
* Video Games Are Good for Girls<br />
[http://www.sciencedaily.com/releases/2011/02/110201083341.htm Video Games Are Good for Girls]<br />
<br />
Researchers from Brigham Young University's School of Family Life conducted a study on video games and children between 11 and 16 years old. They found that girls who played video games with a parent enjoyed a number of advantages. A father who still hasn't given up video games now has some justification to keep on playing, if he has a daughter. The study involved 287 families with an adolescent child. The researchers measured several outcomes, such as positive behavior, aggression, family connection and mental health. The result is interesting: for boys, playing with a parent was not a statistically significant factor for any of the outcomes the researchers measured. Yet for girls, playing with a parent accounted for as much as 20 percent of the variation on those measured outcomes.<br />
<br />
===Week 7===<br />
[http://www.sciencedaily.com/releases/2011/03/110301091225.htm Who's the Best Tennis Player of All Time?]<br />
<br />
Ranking tennis players is a novel way to show how complex network analysis can reveal interesting facts hidden in statistical data. Male tennis players who played in at least one Association of Tennis Professionals match between 1968 and 2010 were evaluated through network analysis. The researchers ran an algorithm, similar to the one used by Google to rank Web pages, on digital data from hundreds of thousands of matches. The data were pulled from the Association of Tennis Professionals website. They quantified the importance of players and ranked them by a "tennis prestige" score. This score is determined by a player's competitiveness, the quality of his performance and number of victories. Fans may think of Jimmy Connors as an "old school" tennis player, but according to the new ranking system he ranks on top; one of the reasons is that he played for more than 20 years and had the opportunity to win a lot of matches against other very good players. "The rankings are a snapshot of who is at the top at this time," Radicchi said. "Players who have yet to retire are penalized with respect to those who have ended their careers. Prestige scores strongly correlate with the number of victories, and active players haven't played all the matches of their careers yet." Researching and ranking sports stars gives a glimpse at the power of complex network analysis. <br />
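
To get a feeling for what an algorithm "similar to the one used by Google" might do here, the sketch below runs a basic PageRank-style iteration on a few made-up match results. It is only an illustration under assumptions: the rule that each loss passes prestige from the loser to the winner, the damping value 0.85, the function name `prestige_scores` and the players are all hypothetical, and this is not claimed to be the researchers' actual method.

```python
def prestige_scores(matches, damping=0.85, iterations=100):
    """PageRank-style scores for a list of (winner, loser) match results."""
    players = sorted({p for match in matches for p in match})
    n = len(players)

    # Number of losses per player: a player splits his score among his losses.
    losses = {p: 0 for p in players}
    for _, loser in matches:
        losses[loser] += 1

    scores = {p: 1.0 / n for p in players}
    for _ in range(iterations):
        # Unbeaten players have nobody to pass score to, so share it evenly.
        unbeaten = sum(scores[p] for p in players if losses[p] == 0)
        new_scores = {p: (1 - damping) / n + damping * unbeaten / n for p in players}
        for winner, loser in matches:
            new_scores[winner] += damping * scores[loser] / losses[loser]
        scores = new_scores
    return scores


# Hypothetical results: (winner, loser).
matches = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B")]
scores = prestige_scores(matches)
for player, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{player}: {score:.3f}")
```

A win counts for more when the beaten opponent has a high score himself, which fits the remark that beating other very good players matters.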
<br />
Hi Jessica, I'm not sure why, but the link you have provided for the tennis article isn't working. I searched for the article in google and found it at the following link in the event somebody else is interested in viewing the article. --[[User:Lawarren|Lawarren]] 10:48, 9 March 2011 (EST)<br />
<br />
http://www.sciencedaily.com/releases/2011/03/110301091225.htm<br />
* thank you very much Lawarren.<br />
<br />
===Week 8===<br />
[http://www.traveldailynews.com/pages/show_page/42076-Hotel-capitalization-rates-fall-further Hotel capitalization rates fall further]<br />
<br />
In 2011, the USRC Hotel Investment Survey predicted good growth in the hotel industry, based on data viewed in the context of longer-term trends. The data include parameters such as capitalization rates, discount rates, income and expense growth expectations, marketing time, debt parameters, and other data for both full-service and limited-service hotels. The researchers found that full-service discount rates are now at their lowest level since the Mid Year 2007 survey, just prior to rates beginning to creep up in 2008, and of course the subsequent economic and credit collapse of the Fall of 2008. They are now just 10 basis points higher than the record low seen in the Winter of 2007 survey, and are hovering very near the survey results of the last recovery years of 2005-2006, following the post 9-11 downturn. As yield requirements lessened, anticipated growth continued to increase, even beyond the very healthy movement seen in the last Mid Year 2010 Survey. <br />
<br />
===Week 9===<br />
[http://www.sciencedaily.com/releases/2011/03/110316084421.htm Poorly Presented Risk Statistics Could Misinform Health Decisions]<br />
<br />
Risk statistics can be used persuasively to present health interventions in different lights. The different ways of expressing risk can prove confusing and there has been much debate about how to improve the communication of health statistics. Choosing the appropriate way to present risk statistics is key to helping people make well-informed decisions. A new Cochrane Systematic Review found that health professionals and consumers may change their perceptions when the same risks and risk reductions are presented using alternative statistical formats. In the new study, Cochrane researchers reviewed data from 35 studies assessing understanding of risk statistics by health professionals and consumers. They found that participants in the studies understood frequencies better than probabilities. Although the researchers say further studies are required to explore how different risk formats affect behaviour, they believe there are strong logical arguments for not reporting relative values alone. <br />
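
The point about alternative formats can be made concrete by expressing one result in several ways. The sketch below uses invented trial counts (not numbers from the Cochrane review), and the function name `risk_summary` is hypothetical.

```python
def risk_summary(control_events, control_total, treated_events, treated_total):
    """Express the same comparison of two groups in several risk formats."""
    control_risk = control_events / control_total
    treated_risk = treated_events / treated_total
    absolute_risk_reduction = control_risk - treated_risk
    return {
        "control_risk": control_risk,
        "treated_risk": treated_risk,
        "absolute_risk_reduction": absolute_risk_reduction,
        "relative_risk_reduction": absolute_risk_reduction / control_risk,
        "number_needed_to_treat": 1 / absolute_risk_reduction,
    }


# Hypothetical trial: 20 of 1000 untreated and 10 of 1000 treated people fall ill.
summary = risk_summary(20, 1000, 10, 1000)

print(f"Frequencies: 20 in 1000 without treatment, 10 in 1000 with treatment")
print(f"Relative risk reduction: {summary['relative_risk_reduction']:.0%}")
print(f"Absolute risk reduction: {summary['absolute_risk_reduction']:.1%}")
print(f"Number needed to treat:  {summary['number_needed_to_treat']:.0f}")
```

The relative figure sounds much more impressive than the absolute one even though both describe the same data, which is one reason for not reporting relative values alone.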
<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities in one country instead of selecting different countries, in order to reduce the differences in conditions.<br />
<br />
===Week 2===<br />
* Before taking this course, I had no idea what consulting involves or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realize that it is not easy to be a good consultant. Usually, the situation is very complicated and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem, then pick the relationships among these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis, whether y appears to be caused by x only by chance, etc. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw the multiple regression model represented by a plane in data space, which is represented in beta space by a point showing the fitted slope with respect to Coffee and the fitted slope with respect to Stress. Here we only need 2 dimensions, but for some more complex models we may need 3 or more dimensions for the beta space. I just wonder how, in these situations, we can draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need 3D or higher dimensional ellipse. This ellipse in higher dimension would be larger because it is trying to be correct for 3 or more parameters at the same time.-''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===<br />
* On controlling for all the Zs, the notes mention that we can use one of three techniques. The first one is statistical control. I am not sure about the meaning of the propensity score, although it is described as the "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]<br />
Thank you Constance!<br />
<br />
===Week 5===<br />
* In a multiple regression model, it is a challenge to determine the type of a canonical outlier and to decide how to deal with such outliers, whether to remove them or not. The choice can change our results substantially.<br />
<br />
===Week 6===<br />
* From the slides on Hierarchical Models, it is good to know the idea of the two-stage approach (derived variables approach): estimate the slope and intercept, then use them as a multivariate sample and do a MANOVA test.<br />
<br />
===Week 7===<br />
*How does the Scheffe method connect with the F-test? <br />
*The F-test for the model is significant if and only if some contrast in the model space has a significant Scheffe confidence interval. If the F-test is not significant, then none of the Scheffe confidence intervals will be significant.<br />
<br />
===Week 8===<br />
*TED Talks on Statistics <br />
*[http://www.kdnuggets.com/2010/11/ted-talks-lies-damned-lies-and-statistics.html Sebastian Wernicke: Lies, damned lies and statistics]<br />
*[http://www.ted.com/talks/chris_jordan_pictures_some_shocking_stats.html Chris Jordan pictures some shocking stats]<br />
*Official Statistics and Statistical Ethics : Selected Issues [http://pdfcast.org/pdf/official-statistics-and-statistical-ethics-selected-issues]<br />
<br />
===Week 9===</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-03-20T01:36:03Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master's student in Applied Statistics. I have a Bachelor's degree in Actuarial Science, and I have some background in Statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption in different countries. In reality, these countries could differ in all sorts of ways other than this factor. We know that the ideal solution introduced by Fisher is the experiment. What is Fisher's idea for an experiment? <br />
*A: 1) Make sure the groups being compared are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare the life expectancy of these groups; if the groups with lower cigarette consumption have higher life expectancy, the difference can be attributed to cigarette consumption.<br />
<br />
===Week 2===<br />
*Q: From the example of studying the inheritance of height from father to son, and the graph of the ellipse, we find that the SD line is not the regression line of son's height on father's height: the change of the son's expected height from the mean is only a proportion of the change of the father's height from the mean. Why do we use ellipses?<br />
*A: By using ellipses, we can easily derive the formulas for statistical theory and models. In addition, ellipses, together with scatterplots and other graphs, help us in data analysis, for example to study the correlation between the variables, to judge how significant a variable is, and to calculate confidence intervals. <br />
<br />
===Week 3===<br />
*Q: What is beta space, and why do we want to look at our data in beta space?<br />
*A: In data space, the axes are variables and the points are observations. To better understand hierarchical data, we want a more natural display of our models, in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients. <br />
<br />
===Week 4===<br />
*Q: Thinking about the strategy for observational data, what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the following three. The first is statistical control, which includes the Zs in a statistical model and adjusts for them statistically. Modern thinking: we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail with the wrong model. The second is matching: this technique only performs comparisons between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, if you have the Zs. The third is structural methods, which build a structural causal model.<br />
<br />
===Week 5===<br />
*Q: From the example data set, Health as predicted by Weight and Height, we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?<br />
*A: The first type has typical values for the predictors but an atypical Y: it has little impact on beta-hat, increases the size of confidence intervals and decreases power. The second has atypical values for the predictors but a Y consistent with the other data: it has little impact on beta-hat and shrinks confidence intervals, creating a false sense of power if the point is not valid. The third has atypical values for the predictors and a Y not consistent with the other data: it has a large impact on beta-hat, could shrink or expand CIs, and makes a mess of everything.<br />
<br />
===Week 6===<br />
*Q: In the hierarchical data (high school) example, what is the difference between Robinson's Paradox and Simpson's Paradox?<br />
*A: Robinson's Paradox refers to the fact that Beta-W (within-group) and Beta-B (between-group) can have different signs. Simpson's Paradox refers to the fact that Beta-W and Beta-P (pooled) can have different signs.<br />
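A small sketch of how the three slopes can disagree, using made-up data rather than the high school data from class. The school names and points are hypothetical: within each school y falls as x rises, but schools with higher mean x also have higher mean y.

```python
def slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


def group_mean(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def within_slope(groups):
    """Beta-W: slope after centering each group at its own mean."""
    centered = []
    for points in groups.values():
        mean_x, mean_y = group_mean(points)
        centered += [(x - mean_x, y - mean_y) for x, y in points]
    return slope(centered)


def between_slope(groups):
    """Beta-B: slope through the group means."""
    return slope([group_mean(points) for points in groups.values()])


def pooled_slope(groups):
    """Beta-P: slope ignoring the grouping altogether."""
    return slope([p for points in groups.values() for p in points])


# Hypothetical schools, three students each.
SCHOOLS = {
    "A": [(-1, 1), (0, 0), (1, -1)],
    "B": [(9, 11), (10, 10), (11, 9)],
    "C": [(19, 21), (20, 20), (21, 19)],
}

if __name__ == "__main__":
    print("Beta-W:", within_slope(SCHOOLS))
    print("Beta-B:", between_slope(SCHOOLS))
    print("Beta-P:", pooled_slope(SCHOOLS))
```

Here Beta-W is negative while Beta-B and Beta-P are both positive, so the same data illustrate both paradoxes at once.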
<br />
===Week 7===<br />
*Q: Compare the Scheffé method and its confidence intervals with the Bonferroni method and its confidence intervals.<br />
*A: The Bonferroni adjustment applies very generally, but it suffers from the limitation that the number of simultaneous confidence intervals must be specified in advance in order to maintain the validity of the simultaneous coverage. The Scheffé method is an alternative that controls the simultaneous coverage over all contrasts in a subspace. If a large number of contrasts are of interest, or it is not known in advance which ones are of interest, the Scheffé method provides a way to “snoop” through all the possibilities while maintaining control on the coverage of confidence intervals and the false positive rates of tests. When the number of contrasts to be estimated is small, Bonferroni is better than Scheffé. Actually, unless the number of desired contrasts is at least twice the number of factors, Scheffé will always show wider confidence bands than Bonferroni.<br />
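For reference, the two kinds of interval can be written side by side. The notation is an assumption, not taken from the lecture: ψ = c'β is a contrast, m is the number of contrasts fixed in advance, d is the dimension of the subspace of contrasts, ν is the residual degrees of freedom, and α is the simultaneous error rate.

```latex
% Bonferroni: m contrasts \psi_1, \dots, \psi_m chosen in advance
\hat{\psi}_i \;\pm\; t_{\nu,\,1-\alpha/(2m)} \,\mathrm{SE}(\hat{\psi}_i),
\qquad i = 1, \dots, m

% Scheffe: every contrast \psi in a d-dimensional subspace, simultaneously
\hat{\psi} \;\pm\; \sqrt{d \, F_{d,\,\nu,\,1-\alpha}} \;\mathrm{SE}(\hat{\psi})
```

The Bonferroni multiplier grows with m, while the Scheffé multiplier depends only on d, which is why Bonferroni tends to give narrower intervals for a few planned contrasts and Scheffé wins once many contrasts are examined.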
<br />
===Week 8===<br />
*Q: Briefly introduce the Wald test. What are the problems with the Wald test?<br />
*A: The Wald test is a way of testing the significance of particular explanatory variables in a statistical model. In logistic regression, for example, we have a binary outcome variable and one or more explanatory variables, and each explanatory variable in the model has an associated parameter. The Wald test is one of a number of ways of testing whether the parameters associated with a group of explanatory variables are zero. If the Wald test is significant for a particular explanatory variable, or group of explanatory variables, we conclude that the associated parameters are not zero, so the variables should be included in the model. If the Wald test is not significant, these explanatory variables can be omitted from the model. However, the Wald statistic has problems: for large coefficients the standard error is inflated, which lowers the Wald (chi-square) statistic, and for small sample sizes the likelihood-ratio test is more reliable than the Wald test.<br />
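The problem with large coefficients can be seen in a minimal sketch, assuming the simplest case of a logistic regression with a single binary explanatory variable. In that case the fitted slope is the log odds ratio of the 2x2 table and its Wald standard error is the square root of the sum of reciprocal cell counts. The counts below are made up for illustration.

```python
import math


def wald_statistic(a, b, c, d):
    """Wald chi-square for the log odds ratio of a 2x2 table.

    Group 1 has a successes and b failures; group 0 has c successes and
    d failures. All four counts must be positive.
    """
    log_odds_ratio = math.log((a * d) / (b * c))
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (log_odds_ratio / standard_error) ** 2


def lr_statistic(a, b, c, d):
    """Likelihood-ratio chi-square (G^2) for the same table."""
    n = a + b + c + d
    observed = [a, b, c, d]
    row_totals = [a + b, a + b, c + d, c + d]
    col_totals = [a + c, b + d, a + c, b + d]
    return 2 * sum(
        o * math.log(o * n / (r * k))
        for o, r, k in zip(observed, row_totals, col_totals)
    )


if __name__ == "__main__":
    # Group 0 always has 50 successes out of 100; group 1 becomes more extreme.
    for successes in (75, 90, 99):
        table = (successes, 100 - successes, 50, 50)
        print(successes, round(wald_statistic(*table), 2), round(lr_statistic(*table), 2))
```

As group 1 moves from 90 to 99 successes the effect gets stronger and the likelihood-ratio statistic keeps growing, but the Wald statistic drops because the standard error grows faster than the coefficient.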
<br />
===Week 9===<br />
<br />
<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm|Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
Studying the 1993/1994 through 2008/2009 seasons, the researchers showed that the severity of infections with the influenza A virus is related to the virus's novelty. In addition, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus' hemagglutinin protein. The hope is that this result will improve the ability to predict influenza severity accurately, so that scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. However, in reality there are many other sources of variation. For example, people differ in age, live in different areas and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm|High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can look up crime activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
In the news story, people indicated that "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Obviously, crimes are more likely at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice. Crimes occur all over the city, and the criminals may also come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120|Drinkers Down Under switching from beer to wine]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wine and spirits has increased. According to the statistics, at the start of the 1960s beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same period, wine consumption has increased threefold to 36 percent, while the intake of spirits has nearly doubled to 20 percent. Why have wine and spirits increased so much over the past forty years? The researchers noted that increased consumption was likely to have been affected by numerous factors such as different age patterns in the population, increasing affluence and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people have also come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome|Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and kids, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results of this study from the Canadian Health Measures Survey (CHMS), released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. In addition, the survey reports that young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. The survey also found that men are more physically active than women on average, and the same pattern appears among teenagers: more than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, the survey should take into account people whose jobs require them to do heavy lifting and to be out and about for most of their work day. That would be more than enough exercise for most people and would improve these numbers dramatically. Likewise for teenagers, the survey should also ask whether they work part-time while studying as full-time students.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1|More private liquor stores, more alcohol deaths?]<br />
<br />
Based on a study of 89 local areas of British Columbia, the researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. Over the same period, the number of government-run stores in the province declined slightly. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales, a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. The findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
===Week 6===<br />
* Video Games Are Good for Girls<br />
[http://www.sciencedaily.com/releases/2011/02/110201083341.htm|Video Games Are Good for Girls]<br />
<br />
Researchers from Brigham Young University's School of Family Life conducted a study on video games and children between 11 and 16 years old. They found that girls who played video games with a parent enjoyed a number of advantages. A father who still hasn't given up video games now has some justification to keep on playing, if he has a daughter. The study involved 287 families with an adolescent child. The researchers measured several outcomes, such as positive behavior, aggression, family connection and mental health. The result is interesting: for boys, playing with a parent was not a statistically significant factor for any of the outcomes the researchers measured. Yet for girls, playing with a parent accounted for as much as 20 percent of the variation in those measured outcomes.<br />
<br />
===Week 7===<br />
[http://www.sciencedaily.com/releases/2011/03/110301091225.htm|Who's the Best Tennis Player of All Time?]<br />
<br />
Ranking tennis players is a novel way to show how complex network analysis can reveal interesting facts hidden in statistical data. Male tennis players who played in at least one Association of Tennis Professionals match between 1968 and 2010 were evaluated through network analysis. The researchers ran an algorithm, similar to the one used by Google to rank Web pages, on digital data from hundreds of thousands of matches. The data was pulled from the Association of Tennis Professionals website. They quantified the importance of players and ranked them by a "tennis prestige" score. This score is determined by a player's competitiveness, the quality of his performance and his number of victories. Fans may think of Jimmy Connors as an "old school" tennis player, but he comes out on top of the new ranking system; one of the reasons is that he played for more than 20 years and had the opportunity to win a lot of matches against other very good players. "The rankings are a snapshot of who is at the top at this time," Radicchi said. "Players who have yet to retire are penalized with respect to those who have ended their careers. Prestige scores strongly correlate with the number of victories, and active players haven't played all the matches of their careers yet." Researching and ranking sports stars gives a glimpse of the power of complex network analysis. <br />
<br />
Hi Jessica, I'm not sure why, but the link you have provided for the tennis article isn't working. I searched for the article in google and found it at the following link in the event somebody else is interested in viewing the article. --[[User:Lawarren|Lawarren]] 10:48, 9 March 2011 (EST)<br />
<br />
http://www.sciencedaily.com/releases/2011/03/110301091225.htm<br />
* Thank you very much, Lawarren.<br />
<br />
===Week 8===<br />
[http://www.traveldailynews.com/pages/show_page/42076-Hotel-capitalization-rates-fall-further|Hotel capitalization rates fall further]<br />
<br />
In 2011, the USRC Hotel Investment Survey predicted good growth in the hotel industry, based on data viewed in the context of longer-term trends. The data include parameters such as capitalization rates, discount rates, income and expense growth expectations, marketing time, debt parameters, and other data for both full-service and limited-service hotels. The researchers found that full-service discount rates are now at their lowest level since the Mid Year 2007 survey, just prior to rates beginning to creep up in 2008 and the subsequent economic and credit collapse of the Fall of 2008. They are now just 10 basis points higher than the record low seen in the Winter 2007 survey, and are hovering very near the survey results of the last recovery years of 2005-2006, following the post-9/11 downturn. As yield requirements lessened, anticipated growth continued to increase, even beyond the very healthy movement seen in the Mid Year 2010 survey. <br />
<br />
===Week 9===<br />
[http://www.sciencedaily.com/releases/2011/03/110316084421.htm|Poorly Presented Risk Statistics Could Misinform Health Decisions]<br />
<br />
Risk statistics can be used persuasively to present health interventions in different lights. The different ways of expressing risk can prove confusing and there has been much debate about how to improve the communication of health statistics. Choosing the appropriate way to present risk statistics is key to helping people make well-informed decisions. A new Cochrane Systematic Review found that health professionals and consumers may change their perceptions when the same risks and risk reductions are presented using alternative statistical formats. In the new study, Cochrane researchers reviewed data from 35 studies assessing understanding of risk statistics by health professionals and consumers. They found that participants in the studies understood frequencies better than probabilities. Although the researchers say further studies are required to explore how different risk formats affect behaviour, they believe there are strong logical arguments for not reporting relative values alone. <br />
<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities in one country instead of selecting different countries, in order to eliminate differences in conditions.<br />
<br />
===Week 2===<br />
* Before I took this course, I had no idea what consulting involves or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realize that it is not easy to be a good consultant. Usually the situation is very complicated, and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem, then pick out the relationships among these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis, whether an apparent effect of x on y arose by chance, etc. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw the multiple regression model represented by a plane in data space, which is represented in beta space by a point showing the fitted slope with respect to Coffee and the fitted slope with respect to Stress. There we only need 2 dimensions, but for some more complex models we may need 3 or more dimensions for the beta space. I just wonder how, in these situations, we can draw ellipses to illustrate the relationships between the variables?<br />
* Comment: If we have 3 or more parameters, then we need a 3D or higher-dimensional ellipsoid. This ellipsoid in higher dimensions would be larger because it is trying to be correct for 3 or more parameters at the same time.-''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===<br />
* For controlling for all the Zs, the notes mention that we can use one of three techniques. The first one is statistical control. I am not sure about the meaning of the propensity score, although the notes say "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]<br />
Thank you Constance!<br />
<br />
===Week 5===<br />
* In a multiple regression model, it is a challenge to determine the type of a canonical outlier and to decide how to deal with such outliers, that is, whether to remove them or not. The decision can change our results considerably.<br />
<br />
===Week 6===<br />
* From the slides on Hierarchical Models, it is good to know the idea of the two-stage approach (derived variables approach): estimate the slope and intercept, then use them as a multivariate sample and do a MANOVA test.<br />
<br />
===Week 7===<br />
*How does the Scheffé method connect with the F-test? <br />
*The F-test for the model is significant if and only if some contrast in the model space has a significant Scheffé confidence interval. If the F-test is not significant, then none of the Scheffé confidence intervals will be significant.<br />
<br />
===Week 8===<br />
*TED Talks on Statistics <br />
*[http://www.kdnuggets.com/2010/11/ted-talks-lies-damned-lies-and-statistics.html|Sebastian Wernicke:Lies, damned lies and statistics]<br />
*[http://www.ted.com/talks/chris_jordan_pictures_some_shocking_stats.html|Chris Jordan pictures some shocking stats]<br />
*Official Statistics and Statistical Ethics : Selected Issues [http://pdfcast.org/pdf/official-statistics-and-statistical-ethics-selected-issues]<br />
<br />
===Week 9===</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-03-20T01:35:08Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master student of Applied Statistics. I got my Bachelor Degree in Actuarial Science, and I have some background in Statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption across countries. In reality, these countries could differ in all sorts of ways other than this factor. We know that the ideal solution, introduced by Fisher, is the experiment. What is Fisher's idea of an experiment? <br />
*A: 1) Make sure all the countries are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare life expectancy across these groups to see whether the groups with lower cigarette consumption have higher life expectancy.<br />
<br />
===Week 2===<br />
*Q: In the example of studying the inheritance of height from father to son, and from the graph of the ellipse, we find that the SD line is not the regression line of son's height on father's height: the change in the son's expected height from the mean is only a proportion of the change in the father's height from the mean. Why do we use ellipses?<br />
*A: By using ellipses, we can easily get the formulas for statistical theory and models. In addition, ellipses drawn on scatterplots and other graphs help with data analysis, such as studying the correlation between variables, judging how significant a variable is, and calculating confidence intervals. <br />
<br />
===Week 3===<br />
*Q: What is beta space, and why do we want to look at our data from beta space?<br />
*A: In data space, the axes are variables and the points are observations. To understand hierarchical data better, we want a more natural display of our models, which beta space provides. In beta space, the axes are coefficients and the points are models, each represented by its coefficients. <br />
<br />
===Week 4===<br />
*Q: Thinking about strategies for observational data, what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the following three. The first is statistical control: include the Zs in a statistical model and adjust statistically. Modern thinking: we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail with the wrong model. The second is matching, which only compares observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, if you have the Zs. The third is structural methods, which build a structural causal model.<br />
<br />
===Week 5===<br />
*Q: In the example data set of Health as predicted by Weight and Height, we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?<br />
*A: The first type has typical values for the predictors but an atypical Y: it has little impact on beta-hat, but it increases the size of confidence intervals and decreases power. The second has atypical values for the predictors but a Y consistent with the other data: it has little impact on beta-hat and shrinks confidence intervals, creating a false sense of power if the point is not valid. The third has atypical values for the predictors and a Y not consistent with the other data: it has a large impact on beta-hat, could shrink or expand CIs, and makes a mess of everything.<br />
<br />
===Week 6===<br />
*Q: In the hierarchical data (high school) example, what is the difference between Robinson's Paradox and Simpson's Paradox?<br />
*A: Robinson's Paradox refers to the fact that Beta-W (within-group) and Beta-B (between-group) can have different signs. Simpson's Paradox refers to the fact that Beta-W and Beta-P (pooled) can have different signs.<br />
<br />
===Week 7===<br />
*Q: Compare the Scheffé method and its confidence intervals with the Bonferroni method and its confidence intervals.<br />
*A: The Bonferroni adjustment applies very generally, but it suffers from the limitation that the number of simultaneous confidence intervals must be specified in advance in order to maintain the validity of the simultaneous coverage. The Scheffé method is an alternative that controls the simultaneous coverage over all contrasts in a subspace. If a large number of contrasts are of interest, or it is not known in advance which ones are of interest, the Scheffé method provides a way to “snoop” through all the possibilities while maintaining control on the coverage of confidence intervals and the false positive rates of tests. When the number of contrasts to be estimated is small, Bonferroni is better than Scheffé. Actually, unless the number of desired contrasts is at least twice the number of factors, Scheffé will always show wider confidence bands than Bonferroni.<br />
<br />
===Week 8===<br />
*Q: Briefly introduce the Wald test. What are the problems with the Wald test?<br />
*A: The Wald test is a way of testing the significance of particular explanatory variables in a statistical model. In logistic regression, for example, we have a binary outcome variable and one or more explanatory variables, and each explanatory variable in the model has an associated parameter. The Wald test is one of a number of ways of testing whether the parameters associated with a group of explanatory variables are zero. If the Wald test is significant for a particular explanatory variable, or group of explanatory variables, we conclude that the associated parameters are not zero, so the variables should be included in the model. If the Wald test is not significant, these explanatory variables can be omitted from the model. However, the Wald statistic has problems: for large coefficients the standard error is inflated, which lowers the Wald (chi-square) statistic, and for small sample sizes the likelihood-ratio test is more reliable than the Wald test.<br />
<br />
===Week 9===<br />
<br />
<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm|Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
Studying the 1993/1994 through 2008/2009 seasons, the researchers showed that the severity of infections with the influenza A virus is related to the virus's novelty. In addition, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus' hemagglutinin protein. The hope is that this result will improve the ability to predict influenza severity accurately, so that scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. However, in reality there are many other sources of variation. For example, people differ in age, live in different areas and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm|High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can look up crime activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
In the news story, people indicated that "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Obviously, crimes are more likely at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice. Crimes occur all over the city, and the criminals may also come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120|Drinkers Down Under switching from beer to wine]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wine and spirits has increased. According to the statistics, at the start of the 1960s beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same period, wine consumption has increased threefold to 36 percent, while the intake of spirits has nearly doubled to 20 percent. Why have wine and spirits increased so much over the past forty years? The researchers noted that increased consumption was likely to have been affected by numerous factors such as different age patterns in the population, increasing affluence and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people have also come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome|Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and kids, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results of this study from the Canadian Health Measures Survey (CHMS), released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. In addition, the survey reports that young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. The survey also found that men are more physically active than women on average, and the same pattern appears among teenagers: more than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, the survey should take into account people whose jobs require them to do heavy lifting and to be out and about for most of their work day. That would be more than enough exercise for most people and would improve these numbers dramatically. Likewise for teenagers, the survey should also ask whether they work part-time while studying as full-time students.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1 More private liquor stores, more alcohol deaths?]<br />
<br />
Based on a study of 89 local areas of British Columbia, the researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales, a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. Over the same period, the number of government-run stores in the province declined slightly. The findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
===Week 6===<br />
* Video Games Are Good for Girls<br />
[http://www.sciencedaily.com/releases/2011/02/110201083341.htm Video Games Are Good for Girls]<br />
<br />
Researchers from Brigham Young University's School of Family Life conducted a study on video games and children between 11 and 16 years old. They found that girls who played video games with a parent enjoyed a number of advantages. A father who still hasn't given up video games now has some justification to keep on playing, if he has a daughter. The study involved 287 families with an adolescent child. The researchers measured several outcomes, such as positive behavior, aggression, family connection, and mental health. The result is interesting: for boys, playing with a parent was not a statistically significant factor for any of the outcomes the researchers measured. Yet for girls, playing with a parent accounted for as much as 20 percent of the variation in those measured outcomes.<br />
<br />
===Week 7===<br />
[http://www.sciencedaily.com/releases/2011/03/110301091225.htm Who's the Best Tennis Player of All Time?]<br />
<br />
Ranking tennis players is a novel way to show how complex network analysis can reveal interesting facts hidden in statistical data. Male tennis players who played in at least one Association of Tennis Professionals match between 1968 and 2010 were evaluated through network analysis. The researchers ran an algorithm, similar to the one used by Google to rank Web pages, on digital data from hundreds of thousands of matches. The data was pulled from the Association of Tennis Professionals website. They quantified the importance of players and ranked them by a "tennis prestige" score. This score is determined by a player's competitiveness, the quality of his performance and his number of victories. Fans may think of Jimmy Connors as an "old school" tennis player, but he ranks on top under the new system, in part because he played for more than 20 years and had the opportunity to win a lot of matches against other very good players. "The rankings are a snapshot of who is at the top at this time," Radicchi said. "Players who have yet to retire are penalized with respect to those who have ended their careers. Prestige scores strongly correlate with the number of victories, and active players haven't played all the matches of their careers yet." Researching and ranking sports stars gives a glimpse of the power of complex network analysis.<br />
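
The article only says that the algorithm is similar to the one Google uses to rank Web pages. As a rough sketch of that idea, and not the researchers' actual method, the following applies a basic PageRank iteration to a small made-up set of results. The players A, B and C, the match counts, the direction of the links (each loser passes prestige to the players who beat him) and the damping factor of 0.85 are all assumptions for illustration.

```python
def pagerank(losses, damping=0.85, iterations=100):
    """Basic PageRank on a weighted graph.

    losses[a][b] is the number of matches player a lost to player b,
    so prestige flows along links from losers to winners.
    """
    players = sorted(set(losses) | {w for ws in losses.values() for w in ws})
    n = len(players)
    score = {p: 1.0 / n for p in players}
    for _ in range(iterations):
        # Every player receives a small baseline share each round.
        new = {p: (1.0 - damping) / n for p in players}
        for p in players:
            out = losses.get(p, {})
            total = sum(out.values())
            if total == 0:
                # A player with no recorded losses spreads his score evenly.
                for q in players:
                    new[q] += damping * score[p] / n
            else:
                # Otherwise the score is split in proportion to losses.
                for q, count in out.items():
                    new[q] += damping * score[p] * count / total
        score = new
    return score


# Hypothetical results: A lost twice to B and once to C; B lost three times to C.
losses = {"A": {"B": 2, "C": 1}, "B": {"C": 3}}
prestige = pagerank(losses)
ranking = sorted(prestige, key=prestige.get, reverse=True)
print(ranking)
```

In this toy example a player gains prestige both by winning often and by beating opponents who themselves have high prestige, which matches the article's remark that a long career against strong opponents helps a player's score.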
<br />
Hi Jessica, I'm not sure why, but the link you have provided for the tennis article isn't working. I searched for the article in google and found it at the following link in the event somebody else is interested in viewing the article. --[[User:Lawarren|Lawarren]] 10:48, 9 March 2011 (EST)<br />
<br />
http://www.sciencedaily.com/releases/2011/03/110301091225.htm<br />
* thank you very much Lawarren.<br />
<br />
===Week 8===<br />
[http://www.traveldailynews.com/pages/show_page/42076-Hotel-capitalization-rates-fall-further Hotel capitalization rates fall further]<br />
<br />
In 2011, the USRC Hotel Investment survey predicted good growth in the hotel industry, based on data viewed in the context of longer-term trends. The data include parameters such as capitalization rates, discount rates, income and expense growth expectations, marketing time, debt parameters, and other data for both full-service and limited-service hotels. The researchers found that full-service discount rates are now at their lowest level since the Mid Year 2007 survey, just prior to rates beginning to creep up in 2008 and, of course, the subsequent economic and credit collapse of the fall of 2008. They are now just 10 basis points higher than the record low seen in the Winter 2007 survey, and are hovering very near the survey results of the last recovery years of 2005-2006, following the post-9/11 downturn. As yield requirements lessened, anticipated growth continued to increase, even beyond the very healthy movement seen in the Mid Year 2010 survey.<br />
<br />
===Week 9===<br />
[http://www.sciencedaily.com/releases/2011/03/110316084421.htm Poorly Presented Risk Statistics Could Misinform Health Decisions]<br />
<br />
Risk statistics can be used persuasively to present health interventions in different lights. The different ways of expressing risk can prove confusing and there has been much debate about how to improve the communication of health statistics. Choosing the appropriate way to present risk statistics is key to helping people make well-informed decisions. A new Cochrane Systematic Review found that health professionals and consumers may change their perceptions when the same risks and risk reductions are presented using alternative statistical formats. In the new study, Cochrane researchers reviewed data from 35 studies assessing understanding of risk statistics by health professionals and consumers. They found that participants in the studies understood frequencies better than probabilities. Although the researchers say further studies are required to explore how different risk formats affect behaviour, they believe there are strong logical arguments for not reporting relative values alone. <br />
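
To see why reporting relative values alone can mislead, the same result can be expressed in several formats. The sketch below uses made-up trial counts (20 events per 1,000 people without treatment and 10 per 1,000 with treatment); the function name and the numbers are assumptions for illustration, not figures from the review.

```python
def risk_summary(control_events, control_n, treated_events, treated_n):
    """Express one comparison of two groups in several risk formats."""
    control_risk = control_events / control_n
    treated_risk = treated_events / treated_n
    absolute_reduction = control_risk - treated_risk
    return {
        "control_risk": control_risk,
        "treated_risk": treated_risk,
        # Absolute risk reduction: difference between the two risks.
        "absolute_risk_reduction": absolute_reduction,
        # Relative risk reduction: the same difference as a share of the control risk.
        "relative_risk_reduction": absolute_reduction / control_risk,
        # Number needed to treat to prevent one event.
        "number_needed_to_treat": 1 / absolute_reduction,
    }


# Hypothetical counts: 20 of 1000 untreated and 10 of 1000 treated people have the event.
summary = risk_summary(20, 1000, 10, 1000)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

With these counts the treatment "halves the risk" (a relative reduction of 50 percent), yet the absolute reduction is only one percentage point, or about 10 fewer events per 1,000 people.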
<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities in one country instead of selecting different countries, in order to eliminate the differences in conditions.<br />
<br />
===Week 2===<br />
* Before I took this course, I had no idea what consulting involves or how to work through problems step by step to help others make the right decision. From this week's lecture, I realize that it is not easy to be a good consultant. Usually the situation is very complicated, and our ability to manipulate variables is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem, then pick the relationships between these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis, and whether an apparent effect of x on y arose by chance. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw that a multiple regression model, represented by a plane in data space, is represented in beta space by a point showing the fitted slope with respect to Coffee and the fitted slope with respect to Stress. Here we only need 2 dimensions, but for more complex models we may need 3 or more dimensions for the beta space. I wonder how, in those situations, we can draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need a 3D or higher-dimensional ellipsoid. This ellipsoid would be larger because it is trying to be correct for 3 or more parameters at the same time. -''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===<br />
* For controlling for all the Zs, the notes mention that we can use one of three techniques. The first one is statistical control. I am not sure about the meaning of the propensity score, although the notes describe it as the "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]<br />
Thank you Constance!<br />
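
As a small illustration of the definition above, with a single categorical covariate the propensity score can be estimated as the fraction of people in each stratum who ended up in the treatment group. The records, the stratum names and the function name below are made up for illustration; with many covariates one would usually fit a model such as a logistic regression instead.

```python
from collections import defaultdict


def propensity_by_stratum(records):
    """Estimate P(treated | stratum) as the treated fraction in each stratum.

    records is an iterable of (stratum, treated) pairs, with treated 0 or 1.
    """
    counts = defaultdict(lambda: [0, 0])  # stratum -> [number treated, total]
    for stratum, treated in records:
        counts[stratum][0] += int(treated)
        counts[stratum][1] += 1
    return {s: treated / total for s, (treated, total) in counts.items()}


# Hypothetical observational data: younger people chose the treatment more often.
records = [
    ("young", 1), ("young", 1), ("young", 1), ("young", 0),
    ("old", 1), ("old", 0), ("old", 0), ("old", 0),
]
scores = propensity_by_stratum(records)
print(scores)
```

In a randomized study the two strata would have roughly equal scores; here they differ (0.75 versus 0.25), which signals that age is confounded with group allocation.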
<br />
===Week 5===<br />
* In a multiple regression model, it is a challenge to determine the type of a canonical outlier and to decide how to deal with it, in particular whether or not to remove it. The choice can change our results substantially.<br />
<br />
===Week 6===<br />
* From the slides on Hierarchical Models, it is good to know the idea of the two-stage approach (derived variables approach): estimate the slope and intercept for each group, then treat them as a multivariate sample and do a MANOVA test.<br />
<br />
===Week 7===<br />
*How does the Scheffe method connect with the F-test? <br />
*The F-test for the model is significant if and only if some contrast in the model space has a significant Scheffe confidence interval. If the F-test is not significant, then none of the Scheffe confidence intervals will be significant.<br />
<br />
===Week 8===<br />
*TED Talks on Statistics <br />
*[http://www.kdnuggets.com/2010/11/ted-talks-lies-damned-lies-and-statistics.html Sebastian Wernicke: Lies, damned lies and statistics]<br />
*[http://www.ted.com/talks/chris_jordan_pictures_some_shocking_stats.html Chris Jordan pictures some shocking stats]<br />
*Official Statistics and Statistical Ethics : Selected Issues [http://pdfcast.org/pdf/official-statistics-and-statistical-ethics-selected-issues]<br />
<br />
===Week 9===</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-03-20T01:24:23Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master's student in Applied Statistics. I earned my Bachelor's degree in Actuarial Science, and I have some background in Statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption in different countries. In reality, these countries could differ in all sorts of ways other than this factor. We know that the ideal solution, introduced by Fisher, is the experiment. What is Fisher's idea for an experiment? <br />
*A: 1) Make sure the groups being compared are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare the life expectancy of these groups to see whether the groups with less cigarette consumption have higher life expectancy.<br />
<br />
===Week 2===<br />
*Q: In the example of studying the inheritance of height from father to son, and the graph of the ellipse, we find that the SD line is not the regression line of son's height on father's height: the change in the son's expected height from the mean is only a proportion of the change in the father's height from the mean. Why do we use ellipses?<br />
*A: By using ellipses, we can easily obtain the formulas for statistical theory and models. In addition, ellipses drawn on scatterplots and other graphs help us with data analysis, for example to study the correlation between variables, to judge how significant a variable is, and to calculate confidence intervals. <br />
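
The difference between the SD line and the regression line in the question above can be checked with a short calculation. The means, standard deviations and correlation below are made-up values for illustration, not the figures from the course example.

```python
def regression_prediction(father, mean_f, mean_s, sd_f, sd_s, r):
    """Expected son's height from the regression of son on father."""
    return mean_s + r * (sd_s / sd_f) * (father - mean_f)


def sd_line_prediction(father, mean_f, mean_s, sd_f, sd_s):
    """Height read off the SD line, which ignores the correlation."""
    return mean_s + (sd_s / sd_f) * (father - mean_f)


# Hypothetical population: both means 68 inches, both SDs 2.7 inches, r = 0.5.
father = 72.0
on_regression_line = regression_prediction(father, 68.0, 68.0, 2.7, 2.7, 0.5)
on_sd_line = sd_line_prediction(father, 68.0, 68.0, 2.7, 2.7)
print(on_regression_line, on_sd_line)
```

A father 4 inches above the mean has sons who are, on average, only 2 inches above the mean when r = 0.5, whereas the SD line would put them 4 inches above. The two lines coincide only when the correlation is 1.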
<br />
===Week 3===<br />
*Q: What is beta space, and why do we want to look at our data in beta space?<br />
*A: In data space, the axes are variables and the points are observations. To better understand hierarchical data, we want a more natural display for our models, in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients. <br />
<br />
===Week 4===<br />
*Q: Think about the strategy with observational data: what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the three below. The first is statistical control: include the Zs in a statistical model and adjust statistically. Modern thinking: we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail with the wrong model. The second is matching: perform comparisons only between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, if you have the Zs. The third is structural methods, which build a structural causal model.<br />
<br />
===Week 5===<br />
*Q: From the example data set, Health as predicted by Weight and Height, we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?<br />
*A: The first type has typical values for the predictors but an atypical Y: it has little impact on beta-hat, increases the size of confidence intervals and decreases power. The second has atypical values for the predictors but a Y consistent with the other data: it has little impact on beta-hat, shrinks confidence intervals, and creates a false sense of power if the point is not valid. The third has atypical values for the predictors and a Y not consistent with the other data: it has a large impact on beta-hat, could shrink or expand CIs, and makes a mess of everything.<br />
<br />
===Week 6===<br />
*Q: In the hierarchical data (high school) example, what is the difference between Robinson's Paradox and Simpson's Paradox?<br />
*A: Robinson's Paradox refers to the fact that Beta-w and Beta-B can have different signs. Simpson's Paradox refers to the fact that Beta-w and Beta-p can have different signs.<br />
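
A small numerical sketch shows how these slopes can disagree in sign. It assumes that Beta-w, Beta-B and Beta-p denote the within-group, between-group and pooled slopes, which the answer above does not spell out; the two schools and their data are made up for illustration.

```python
def slope(xs, ys):
    """Least-squares slope of y on x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data: (x values, y values) for two schools.
groups = {
    "school_1": ([1, 2, 3], [5, 4, 3]),
    "school_2": ([6, 7, 8], [10, 9, 8]),
}

# Within-group slopes: one regression per school.
within = {name: slope(xs, ys) for name, (xs, ys) in groups.items()}

# Pooled slope: ignore the grouping and regress on all points together.
all_x = [x for xs, _ in groups.values() for x in xs]
all_y = [y for _, ys in groups.values() for y in ys]
pooled = slope(all_x, all_y)

# Between-group slope: regress the school means of y on the school means of x.
mean_x = [sum(xs) / len(xs) for xs, _ in groups.values()]
mean_y = [sum(ys) / len(ys) for _, ys in groups.values()]
between = slope(mean_x, mean_y)

print(within, pooled, between)
```

Within each school the slope is negative, while both the pooled and the between-school slopes are positive, so conclusions drawn from aggregated data can reverse the relationship seen inside every group.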
<br />
===Week 7===<br />
*Q: Compare the Scheffé method and the Bonferroni method for confidence intervals.<br />
*A: The Bonferroni adjustment applies very generally, but it suffers from the limitation that the number of simultaneous confidence intervals must be specified in advance in order to maintain the validity of the simultaneous coverage. The Scheffé method is an alternative that controls the simultaneous coverage over all contrasts in a subspace. If a large number of contrasts are of interest, or it is not known in advance which ones are of interest, the Scheffé method provides a way to “snoop” through all the possibilities while maintaining control of the coverage of confidence intervals and the false positive rates of tests. When the number of contrasts to be estimated is small, Bonferroni is better than Scheffé. In fact, unless the number of desired contrasts is at least twice the number of factors, Scheffé will always show wider confidence bands than Bonferroni.<br />
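
For reference, the two intervals compared above can be written in a standard textbook form. The notation is an assumption and is not taken from the course notes: \hat\psi = c^\top\hat\beta is an estimated contrast, \nu is the residual degrees of freedom, m is the number of contrasts fixed in advance, and d is the dimension of the subspace of contrasts.

```latex
\begin{align*}
\text{Bonferroni:}\quad & \hat\psi \pm t_{1-\alpha/(2m),\,\nu}\;\widehat{\mathrm{SE}}(\hat\psi) \\
\text{Scheff\'e:}\quad & \hat\psi \pm \sqrt{d\,F_{1-\alpha;\,d,\,\nu}}\;\widehat{\mathrm{SE}}(\hat\psi)
\end{align*}
```

The Bonferroni multiplier grows with m, so it is narrow only when few contrasts are planned. The Scheffé multiplier depends only on d, which is why it stays valid however many contrasts are examined after looking at the data.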
<br />
===Week 8===<br />
*Q: Briefly introduce the Wald test. What is the problem with the Wald test?<br />
*A: The Wald test is a way of testing the significance of particular explanatory variables in a statistical model. In logistic regression, for example, we have a binary outcome variable and one or more explanatory variables, and each explanatory variable in the model has an associated parameter. The Wald test is one of a number of ways of testing whether the parameters associated with a group of explanatory variables are zero. If the Wald test is significant for a particular explanatory variable, or group of explanatory variables, we conclude that the associated parameters are not zero, so the variables should be included in the model. If the Wald test is not significant, these explanatory variables can be omitted from the model. However, the Wald statistic has problems: for large coefficients the standard error is inflated, which lowers the Wald (chi-square) statistic, and for small sample sizes the likelihood-ratio test is more reliable than the Wald test.<br />
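
For a single parameter, the Wald statistic is the estimate divided by its standard error, compared with a standard normal distribution (its square is the chi-square form with one degree of freedom). A minimal sketch follows; the coefficient and standard error are made-up values and the function name is hypothetical.

```python
import math


def wald_test(estimate, std_error):
    """Wald z statistic and two-sided p-value for H0: parameter = 0."""
    z = estimate / std_error
    # Two-sided tail probability of a standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical logistic regression coefficient 0.8 with standard error 0.4.
z, p_value = wald_test(0.8, 0.4)
print(f"z = {z:.2f}, chi-square = {z ** 2:.2f}, p = {p_value:.4f}")
```

Because the statistic divides by the standard error, an inflated standard error for a large coefficient pulls z toward zero, which is the weakness noted in the answer above.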
<br />
===Week 9===<br />
<br />
<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
Studying the seasons from 1993/1994 through 2008/2009, the research has shown that the severity of infections with the influenza A virus is related to its novelty. In addition, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus's hemagglutinin protein. This result may help to improve the ability to accurately predict influenza severity, so that scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. However, in reality there are many sources of variation. For example, people are in different age ranges, live in different areas, and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime<br />
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can look for criminal activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
In the news report, people said: "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Obviously, crimes are more likely to happen at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice. Crimes occur all over the city, and the criminals may also come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120 Drinkers Down Under switching from beer to wine]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wine and spirits has increased. According to the data, at the start of the 1960s beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same period, wine's share has increased threefold to 36 percent, while the intake of spirits has nearly doubled to 20 percent. Why have wine and spirits increased so much over the past forty years? The researchers noted that the increased consumption was likely affected by numerous factors, such as changing age patterns in the population, increasing affluence, and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are among the factors that could have cut consumption. Over time, people may also have come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and children, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results of the Canadian Health Measures Survey (CHMS) released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. Young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. The survey also found that men are more physically active than women on average, and the same pattern appears among teenagers: more than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, the survey should take into account people whose jobs require heavy lifting or being out and about for most of the work day. That would be more than enough exercise for most people and would improve these numbers dramatically. Similarly, for teenagers, the survey should also ask whether they work part-time while studying as full-time students.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1 More private liquor stores, more alcohol deaths?]<br />
<br />
Based on a study of 89 local areas of British Columbia, the researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales, a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. Over the same period, the number of government-run stores in the province declined slightly. The findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
===Week 6===<br />
* Video Games Are Good for Girls<br />
[http://www.sciencedaily.com/releases/2011/02/110201083341.htm Video Games Are Good for Girls]<br />
<br />
Researchers from Brigham Young University's School of Family Life conducted a study on video games and children between 11 and 16 years old. They found that girls who played video games with a parent enjoyed a number of advantages. A father who still hasn't given up video games now has some justification to keep on playing, if he has a daughter. The study involved 287 families with an adolescent child. The researchers measured several outcomes, such as positive behavior, aggression, family connection, and mental health. The result is interesting: for boys, playing with a parent was not a statistically significant factor for any of the outcomes the researchers measured. Yet for girls, playing with a parent accounted for as much as 20 percent of the variation in those measured outcomes.<br />
<br />
===Week 7===<br />
[http://www.sciencedaily.com/releases/2011/03/110301091225.htm Who's the Best Tennis Player of All Time?]<br />
<br />
Ranking tennis players is a novel way to show how complex network analysis can reveal interesting facts hidden in statistical data. Male tennis players who played in at least one Association of Tennis Professionals match between 1968 and 2010 were evaluated through network analysis. The researchers ran an algorithm, similar to the one used by Google to rank Web pages, on digital data from hundreds of thousands of matches. The data was pulled from the Association of Tennis Professionals website. They quantified the importance of players and ranked them by a "tennis prestige" score. This score is determined by a player's competitiveness, the quality of his performance and his number of victories. Fans may think of Jimmy Connors as an "old school" tennis player, but he ranks on top under the new system, in part because he played for more than 20 years and had the opportunity to win a lot of matches against other very good players. "The rankings are a snapshot of who is at the top at this time," Radicchi said. "Players who have yet to retire are penalized with respect to those who have ended their careers. Prestige scores strongly correlate with the number of victories, and active players haven't played all the matches of their careers yet." Researching and ranking sports stars gives a glimpse of the power of complex network analysis.<br />
<br />
Hi Jessica, I'm not sure why, but the link you have provided for the tennis article isn't working. I searched for the article in google and found it at the following link in the event somebody else is interested in viewing the article. --[[User:Lawarren|Lawarren]] 10:48, 9 March 2011 (EST)<br />
<br />
http://www.sciencedaily.com/releases/2011/03/110301091225.htm<br />
* thank you very much Lawarren.<br />
<br />
===Week 8===<br />
[http://www.traveldailynews.com/pages/show_page/42076-Hotel-capitalization-rates-fall-further Hotel capitalization rates fall further]<br />
<br />
In 2011, the USRC Hotel Investment survey predicted good growth in the hotel industry, based on data viewed in the context of longer-term trends. The data include parameters such as capitalization rates, discount rates, income and expense growth expectations, marketing time, debt parameters, and other data for both full-service and limited-service hotels. The researchers found that full-service discount rates are now at their lowest level since the Mid Year 2007 survey, just prior to rates beginning to creep up in 2008 and, of course, the subsequent economic and credit collapse of the fall of 2008. They are now just 10 basis points higher than the record low seen in the Winter 2007 survey, and are hovering very near the survey results of the last recovery years of 2005-2006, following the post-9/11 downturn. As yield requirements lessened, anticipated growth continued to increase, even beyond the very healthy movement seen in the Mid Year 2010 survey.<br />
<br />
===Week 9===<br />
[http://www.sciencedaily.com/releases/2011/03/110316084421.htm Poorly Presented Risk Statistics Could Misinform Health Decisions]<br />
<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities in one country instead of selecting different countries, in order to eliminate the differences in conditions.<br />
<br />
===Week 2===<br />
* Before I took this course, I had no idea what consulting involves or how to work through problems step by step to help others make the right decision. From this week's lecture, I realize that it is not easy to be a good consultant. Usually the situation is very complicated, and our ability to manipulate variables is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem, then pick the relationships between these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis, and whether an apparent effect of x on y arose by chance. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw that a multiple regression model, represented by a plane in data space, is represented in beta space by a point showing the fitted slope with respect to Coffee and the fitted slope with respect to Stress. Here we only need 2 dimensions, but for more complex models we may need 3 or more dimensions for the beta space. I wonder how, in those situations, we can draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need a 3D or higher-dimensional ellipsoid. This ellipsoid would be larger because it is trying to be correct for 3 or more parameters at the same time. -''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===<br />
* For controlling for all the Zs, the notes mention that we can use one of three techniques. The first one is statistical control. I am not sure about the meaning of the propensity score, although the notes describe it as the "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]<br />
Thank you Constance!<br />
<br />
===Week 5===<br />
* In a multiple regression model, it is a challenge to determine the type of a canonical outlier and how to deal with such outliers, that is, whether to remove them or not. The choice can change our results hugely.<br />
<br />
===Week 6===<br />
* From the slides on Hierarchical Models, it is good to know the idea of the two-stage approach (derived variables approach): estimate slope and intercept, then use them as a multivariate sample and do a MANOVA test.<br />
<br />
===Week 7===<br />
*How does the Scheffe method connect with the F-test? <br />
*The F-test for the model is significant if and only if some contrast in the model space has a significant Scheffe confidence interval. If the F-test is not significant, then none of the Scheffe confidence intervals will be significant.<br />
<br />
===Week 8===<br />
*TED Talks on Statistics <br />
*[http://www.kdnuggets.com/2010/11/ted-talks-lies-damned-lies-and-statistics.html Sebastian Wernicke: Lies, damned lies and statistics]<br />
*[http://www.ted.com/talks/chris_jordan_pictures_some_shocking_stats.html Chris Jordan pictures some shocking stats]<br />
*Official Statistics and Statistical Ethics : Selected Issues [http://pdfcast.org/pdf/official-statistics-and-statistical-ethics-selected-issues]<br />
<br />
===Week 9===</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-03-20T01:20:50Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master student of Applied Statistics. I got my Bachelor Degree in Actuarial Science, and I have some background in Statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption in different countries. In reality, these countries could differ in all sorts of ways other than this factor. We know that the ideal solution introduced by Fisher is the experiment. What is Fisher's idea for an experiment? <br />
*A: 1) Make sure all the countries are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare the life expectancy of these groups to see whether the groups with lower cigarette consumption have higher life expectancy.<br />
<br />
===Week 2===<br />
*Q: In the example of studying the inheritance of height from father to son, and the graph of the ellipse, we find that the SD line is not the regression line of son's height on father's height: the change of the son's expected height from the mean is a proportion of the change of the father's height from the mean. Why do we use ellipses?<br />
*A: By using ellipses, we can easily get the formulas for statistical theory and models. In addition, ellipses drawn on scatterplots and other graphs help us with data analysis, such as studying the correlation between the variables, how significant a variable is, and the calculation of confidence intervals. <br />
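<br />
The point in the Week 2 question, that the regression line is flatter than the SD line by a factor equal to the correlation, can be checked numerically. The sketch below is only an illustration: the heights are made-up numbers, not the data from class, and the function name is hypothetical.<br />

```python
from math import sqrt

def sd_and_regression_slopes(x, y):
    """Return (r, SD-line slope, regression slope of y on x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / sqrt(sxx * syy)      # correlation coefficient
    sd_slope = sqrt(syy / sxx)     # SD line: slope = SD(y) / SD(x)
    reg_slope = r * sd_slope       # regression line: slope = r * SD(y) / SD(x)
    return r, sd_slope, reg_slope

# Hypothetical father and son heights in inches (illustrative only)
fathers = [65, 67, 68, 70, 72, 74]
sons = [67, 68, 68, 70, 70, 72]

r, sd_slope, reg_slope = sd_and_regression_slopes(fathers, sons)
print(f"r = {r:.3f}, SD line slope = {sd_slope:.3f}, regression slope = {reg_slope:.3f}")
```

Whenever the correlation is below 1, the regression slope is smaller than the SD-line slope, which is the "regression to the mean" seen in the ellipse picture.<br />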
<br />
===Week 3===<br />
*Q: What is the beta space, and why do we want to look at our data from the beta space?<br />
*A: In the data space, the axes are variables and the points are observations. To better understand hierarchical data, we want a more natural display of our models in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients. <br />
<br />
===Week 4===<br />
*Q: Think about the strategy with observational data. What tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the three below. The first is statistical control, which includes the Zs in a statistical model and adjusts for them statistically. Modern thinking: we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail with the wrong model. The second is matching, which only performs comparisons between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, if you have the Zs. The third is structural methods, which build a structural causal model.<br />
<br />
===Week 5===<br />
*Q: In the example data set, Health as predicted by Weight and Height, we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?<br />
*A: The first has typical values for the predictors but an atypical Y; it has little impact on beta-hat, increases the size of confidence intervals and decreases power. The second has atypical values for the predictors but Y consistent with the other data; it has little impact on beta-hat, shrinks confidence intervals, and creates a false sense of power if the point is not valid. The third has atypical values for the predictors and Y not consistent with the other data; it has a large impact on beta-hat, could shrink or expand CIs, and makes a mess of everything.<br />
<br />
===Week 6===<br />
*Q: In the hierarchical data example (high schools), what is the difference between Robinson's Paradox and Simpson's Paradox?<br />
*A: Robinson's Paradox refers to the fact that Beta-w and Beta-B can have different signs. Simpson's Paradox refers to the fact that Beta-w and Beta-p can have different signs.<br />
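<br />
A small numerical sketch of how these slopes can disagree in sign. It assumes Beta-w is the within-group slope, Beta-B the slope between group means, and Beta-p the pooled slope that ignores groups; the three "schools" and their numbers are invented for illustration.<br />

```python
from statistics import mean

def slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def within_between_pooled(groups):
    """groups maps a group name to (xs, ys); returns (beta_w, beta_b, beta_p)."""
    all_x, all_y = [], []            # raw data, groups ignored
    cen_x, cen_y = [], []            # data centered on each group's own mean
    mean_x, mean_y = [], []          # one point per group: the group means
    for xs, ys in groups.values():
        mx, my = mean(xs), mean(ys)
        all_x += xs
        all_y += ys
        cen_x += [a - mx for a in xs]
        cen_y += [b - my for b in ys]
        mean_x.append(mx)
        mean_y.append(my)
    return slope(cen_x, cen_y), slope(mean_x, mean_y), slope(all_x, all_y)

# Hypothetical schools: y falls with x inside each school,
# but schools with higher average x have higher average y.
schools = {
    "A": ([1, 2, 3], [5, 4, 3]),
    "B": ([4, 5, 6], [8, 7, 6]),
    "C": ([7, 8, 9], [11, 10, 9]),
}

beta_w, beta_b, beta_p = within_between_pooled(schools)
print(beta_w, beta_b, beta_p)
```

In this toy data the within-school slope is negative while both the between-school and the pooled slopes are positive, so both sign reversals described in the answer appear at once.<br />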
<br />
===Week 7===<br />
*Q: A comparison of the Scheffé method/Confidence Interval and the Bonferroni method/Confidence Interval.<br />
*A: The Bonferroni adjustment applies very generally, but it suffers from the limitation that the number of simultaneous confidence intervals must be specified in advance in order to maintain the validity of the simultaneous coverage. The Scheffé method is an alternative that controls the simultaneous coverage over all contrasts in a subspace. If a large number of contrasts are of interest, or it is not known in advance which ones are of interest, the Scheffé method provides a way to “snoop” through all the possibilities while maintaining control on the coverage of confidence intervals and the false positive rates of tests. When the number of contrasts to be estimated is small, Bonferroni is better than Scheffé. Actually, unless the number of desired contrasts is at least twice the number of factors, Scheffé will always show wider confidence bands than Bonferroni.<br />
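<br />
A rough numerical sketch of this trade-off. It assumes a large-sample setting with known error variance, so normal and chi-square quantiles stand in for the t and F quantiles, and a 2-dimensional subspace of contrasts (chosen because the chi-square quantile with 2 degrees of freedom has a closed form). The function names are illustrative only.<br />

```python
from math import log, sqrt
from statistics import NormalDist

def bonferroni_multiplier(k, alpha=0.05):
    """Half-width multiplier for k pre-specified two-sided intervals."""
    return NormalDist().inv_cdf(1 - alpha / (2 * k))

def scheffe_multiplier_2d(alpha=0.05):
    """Large-sample Scheffe multiplier for all contrasts in a 2-d subspace.

    A chi-square variable with 2 df has P(X > x) = exp(-x / 2), so its
    (1 - alpha) quantile is -2 * ln(alpha); the multiplier is its square root.
    """
    return sqrt(-2 * log(alpha))

scheffe = scheffe_multiplier_2d()
for k in range(1, 7):
    bonf = bonferroni_multiplier(k)
    narrower = "Bonferroni" if bonf < scheffe else "Scheffe"
    print(f"k = {k}: Bonferroni {bonf:.3f}, Scheffe {scheffe:.3f}, narrower: {narrower}")
```

Under these assumptions the Bonferroni intervals are narrower for a few pre-planned contrasts, and the Scheffé multiplier, which does not grow with the number of contrasts, becomes the smaller one once enough contrasts are examined.<br />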
<br />
===Week 8===<br />
*Q: Briefly introduce the Wald test. What are the problems with the Wald test?<br />
*A: The Wald test is a way of testing the significance of particular explanatory variables in a statistical model. In logistic regression we have a binary outcome variable and one or more explanatory variables. For each explanatory variable in the model there will be an associated parameter. The Wald test is one of a number of ways of testing whether the parameters associated with a group of explanatory variables are zero. If for a particular explanatory variable, or group of explanatory variables, the Wald test is significant, then we would conclude that the parameters associated with these variables are not zero, so that the variables should be included in the model. If the Wald test is not significant then these explanatory variables can be omitted from the model. However, there are problems with the use of the Wald statistic: for large coefficients the standard error is inflated, lowering the Wald statistic (chi-square) value, and the likelihood-ratio test is more reliable than the Wald test for small sample sizes.<br />
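<br />
For a single parameter, the Wald statistic is just the estimate divided by its standard error. A minimal sketch, assuming the estimate and standard error have already been obtained from a fitted model (the numbers below are hypothetical) and that the large-sample normal approximation holds:<br />

```python
from statistics import NormalDist

def wald_test(estimate, std_error, null_value=0.0):
    """Return (z, chi-square with 1 df, two-sided p-value) for one coefficient."""
    z = (estimate - null_value) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, z * z, p_value

# Hypothetical logistic regression coefficient and its standard error
z, chi_square, p_value = wald_test(estimate=1.0, std_error=0.5)
print(f"z = {z:.2f}, chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

Because the standard error sits in the denominator, an inflated standard error for a large coefficient pulls the statistic toward zero, which is the weakness mentioned in the answer.<br />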
<br />
<br />
<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
Studying the 1993/1994 through 2008/2009 seasons, the research has shown that the severity of infections with the Influenza A virus is related to its novelty. In addition, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus' hemagglutinin protein. People think that this result can help to improve the ability to accurately predict influenza severity. Scientists could therefore use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. However, in reality there are many sources of variation. For example, people fall in different age ranges, live in different areas and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime<br />
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can look for criminal activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
In the news report, people indicated that "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Obviously, crimes are more likely at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice. Crimes occur all over the city. It is also possible that the criminals come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120 Drinkers Down Under switching from beer to wine]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wines and spirits has increased. From the statistical data, the researchers find that at the start of the 1960s, beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same time period, wine consumption has increased threefold to 36 percent, while the intake of spirits has nearly doubled to 20 percent. Why have wine and spirits increased so much in the past forty years? The researchers noted that increased consumption was likely to have been affected by numerous factors such as different age patterns in the population, increasing affluence and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people have also come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and kids, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results of this study from the Canadian Health Measures Survey (CHMS), released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. In addition, the survey mentions that young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. In addition, the survey found that men have more physical activity than women on average. The same pattern appears in the teenage group: more than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, the survey should take into account people whose jobs require them to perform heavy lifting and to be out and about for most of their work day. This would be more than enough exercise for most people and would improve these numbers dramatically. Similarly, for teenagers, the survey should also study whether they work part-time while studying as full-time students.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1 More private liquor stores, more alcohol deaths?]<br />
<br />
The research is based on a study of 89 local areas of British Columbia. The researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales, a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. Over the same period, the number of government-run stores in the province declined slightly. Moreover, the findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
===Week 6===<br />
* Video Games Are Good for Girls<br />
[http://www.sciencedaily.com/releases/2011/02/110201083341.htm Video Games Are Good for Girls]<br />
<br />
Researchers from Brigham Young University's School of Family Life conducted a study on video games and children between 11 and 16 years old. They found that girls who played video games with a parent enjoyed a number of advantages. A father who still hasn't given up video games now has some justification to keep on playing, if he has a daughter. The study involved 287 families with an adolescent child. The researchers measured several outcomes, such as positive behavior, aggression, family connection, mental health, etc. The result is interesting: for boys, playing with a parent was not a statistically significant factor for any of the outcomes the researchers measured. Yet for girls, playing with a parent accounted for as much as 20 percent of the variation in those measured outcomes.<br />
<br />
===Week 7===<br />
[http://www.sciencedaily.com/releases/2011/03/110301091225.htm Who's the Best Tennis Player of All Time?]<br />
<br />
Ranking tennis players is a novel way to show how complex network analysis can reveal interesting facts hidden in statistical data. Male tennis players who played in at least one Association of Tennis Professionals match between 1968 and 2010 were evaluated through network analysis. The researchers ran an algorithm, similar to the one used by Google to rank Web pages, on digital data from hundreds of thousands of matches. The data was pulled from the Association of Tennis Professionals website. They quantified the importance of players and ranked them by a "tennis prestige" score. This score is determined by a player's competitiveness, the quality of his performance and number of victories. Fans may think of Jimmy Connors as an "old school" tennis player, but according to the new ranking system he ranks on top, in part because he played for more than 20 years and had the opportunity to win a lot of matches against other very good players. "The rankings are a snapshot of who is at the top at this time," Radicchi said. "Players who have yet to retire are penalized with respect to those who have ended their careers. Prestige scores strongly correlate with the number of victories, and active players haven't played all the matches of their careers yet." Researching and ranking sports stars gives a glimpse at the power of complex network analysis. <br />
<br />
Hi Jessica, I'm not sure why, but the link you have provided for the tennis article isn't working. I searched for the article in google and found it at the following link in the event somebody else is interested in viewing the article. --[[User:Lawarren|Lawarren]] 10:48, 9 March 2011 (EST)<br />
<br />
http://www.sciencedaily.com/releases/2011/03/110301091225.htm<br />
* thank you very much Lawarren.<br />
<br />
===Week 8===<br />
[http://www.traveldailynews.com/pages/show_page/42076-Hotel-capitalization-rates-fall-further Hotel capitalization rates fall further]<br />
<br />
In 2011, the USRC Hotel Investment Survey predicted good growth in the hotel industry, based on data viewed in the context of longer-term trends. The data include parameters such as capitalization rates, discount rates, income and expense growth expectations, marketing time, debt parameters, and other data for both full-service and limited-service hotels. The researchers found that full-service discount rates are now at their lowest level since the Mid Year 2007 survey, just prior to rates beginning to creep up in 2008, and of course the subsequent economic and credit collapse of the Fall of 2008. However, they are now just 10 basis points higher than the record low seen in the Winter of 2007 survey, and are hovering very near the survey results of the last recovery years of 2005-2006, following the post 9-11 downturn. As yield requirements lessened, anticipated growth continued to increase, even beyond the very healthy movement we saw in the last Mid Year 2010 Survey. <br />
<br />
<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities in one country instead of selecting different countries, in order to eliminate the differences in conditions.<br />
<br />
===Week 2===<br />
* Before taking this course, I had no idea what consulting involves or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realize that it is not easy to be a good consultant. Usually the situation is very complicated, and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem, then pick the relationships among these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis and whether an apparent effect of x on y arose by chance, etc. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw the multiple regression model represented by a plane in data space, which is represented in beta space by a point showing the fitted slope with respect to Coffee and the fitted slope with respect to Stress. Here we only need 2 dimensions, but for some complex models we may need 3 or more dimensions for the beta space. I just wonder, in these situations, how can we draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need a 3D or higher-dimensional ellipse. This ellipse in higher dimensions would be larger because it is trying to be correct for 3 or more parameters at the same time. -''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===<br />
* For the control for all the Zs, it is mentioned that we can use one of three techniques. The first one is statistical control. I am not sure about the meaning of propensity score, although it is described as "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]<br />
Thank you Constance!<br />
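<br />
To make the idea concrete, here is a minimal sketch of estimating a propensity score. It swaps in a simpler technique than the regression on covariates described in the comment above: with a single categorical covariate Z, the estimated propensity score is just the proportion of people in each Z category who ended up in the treated group. The records and the function name are hypothetical.<br />

```python
from collections import defaultdict

def propensity_by_stratum(records):
    """Estimate P(treated | Z) as the treated proportion within each Z level.

    records is a list of (z, treated) pairs, where treated is 0 or 1.
    """
    counts = defaultdict(lambda: [0, 0])   # z -> [number treated, total]
    for z, treated in records:
        counts[z][0] += treated
        counts[z][1] += 1
    return {z: n_treated / total for z, (n_treated, total) in counts.items()}

# Hypothetical observational data: (age group, in treatment group?)
records = [
    ("young", 1), ("young", 0), ("young", 0), ("young", 0),
    ("old", 1), ("old", 1), ("old", 1), ("old", 0),
]

scores = propensity_by_stratum(records)
print(scores)   # each person's score is the value for their age group
```

In a randomized study these proportions would be about the same in every stratum; here they differ by age group, which signals that group membership is confounded with age. With several or continuous covariates, a model such as logistic regression would be used instead.<br />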
<br />
===Week 5===<br />
* In a multiple regression model, it is a challenge to determine the type of a canonical outlier and how to deal with such outliers, that is, whether to remove them or not. The choice can change our results hugely.<br />
<br />
===Week 6===<br />
* From the slides on Hierarchical Models, it is good to know the idea of the two-stage approach (derived variables approach): estimate slope and intercept, then use them as a multivariate sample and do a MANOVA test.<br />
<br />
===Week 7===<br />
*How does the Scheffe method connect with the F-test? <br />
*The F-test for the model is significant if and only if some contrast in the model space has a significant Scheffe confidence interval. If the F-test is not significant, then none of the Scheffe confidence intervals will be significant.<br />
<br />
===Week 8===<br />
*TED Talks on Statistics <br />
*[http://www.kdnuggets.com/2010/11/ted-talks-lies-damned-lies-and-statistics.html Sebastian Wernicke: Lies, damned lies and statistics]<br />
*[http://www.ted.com/talks/chris_jordan_pictures_some_shocking_stats.html Chris Jordan pictures some shocking stats]<br />
*Official Statistics and Statistical Ethics : Selected Issues [http://pdfcast.org/pdf/official-statistics-and-statistical-ethics-selected-issues]</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-03-20T01:20:08Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master student of Applied Statistics. I got my Bachelor Degree in Actuarial Science, and I have some background in Statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption in different countries. In reality, these countries could differ in all sorts of ways other than this factor. We know that the ideal solution introduced by Fisher is the experiment. What is Fisher's idea for an experiment? <br />
*A: 1) Make sure all the countries are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare the life expectancy of these groups to see whether the groups with lower cigarette consumption have higher life expectancy.<br />
<br />
===Week 2===<br />
*Q: In the example of studying the inheritance of height from father to son, and the graph of the ellipse, we find that the SD line is not the regression line of son's height on father's height: the change of the son's expected height from the mean is a proportion of the change of the father's height from the mean. Why do we use ellipses?<br />
*A: By using ellipses, we can easily get the formulas for statistical theory and models. In addition, ellipses drawn on scatterplots and other graphs help us with data analysis, such as studying the correlation between the variables, how significant a variable is, and the calculation of confidence intervals. <br />
<br />
===Week 3===<br />
*Q: What is the beta space, and why do we want to look at our data from the beta space?<br />
*A: In the data space, the axes are variables and the points are observations. To better understand hierarchical data, we want a more natural display of our models in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients. <br />
<br />
===Week 4===<br />
*Q: Think about the strategy with observational data. What tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the three below. The first is statistical control, which includes the Zs in a statistical model and adjusts for them statistically. Modern thinking: we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail with the wrong model. The second is matching, which only performs comparisons between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, if you have the Zs. The third is structural methods, which build a structural causal model.<br />
<br />
===Week 5===<br />
*Q: In the example data set, Health as predicted by Weight and Height, we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?<br />
*A: The first has typical values for the predictors but an atypical Y; it has little impact on beta-hat, increases the size of confidence intervals and decreases power. The second has atypical values for the predictors but Y consistent with the other data; it has little impact on beta-hat, shrinks confidence intervals, and creates a false sense of power if the point is not valid. The third has atypical values for the predictors and Y not consistent with the other data; it has a large impact on beta-hat, could shrink or expand CIs, and makes a mess of everything.<br />
<br />
===Week 6===<br />
*Q: In the hierarchical data example (high schools), what is the difference between Robinson's Paradox and Simpson's Paradox?<br />
*A: Robinson's Paradox refers to the fact that Beta-w and Beta-B can have different signs. Simpson's Paradox refers to the fact that Beta-w and Beta-p can have different signs.<br />
<br />
===Week 7===<br />
*Q: A comparison of the Scheffé method/Confidence Interval and the Bonferroni method/Confidence Interval.<br />
*A: The Bonferroni adjustment applies very generally, but it suffers from the limitation that the number of simultaneous confidence intervals must be specified in advance in order to maintain the validity of the simultaneous coverage. The Scheffé method is an alternative that controls the simultaneous coverage over all contrasts in a subspace. If a large number of contrasts are of interest, or it is not known in advance which ones are of interest, the Scheffé method provides a way to “snoop” through all the possibilities while maintaining control on the coverage of confidence intervals and the false positive rates of tests. When the number of contrasts to be estimated is small, Bonferroni is better than Scheffé. Actually, unless the number of desired contrasts is at least twice the number of factors, Scheffé will always show wider confidence bands than Bonferroni.<br />
<br />
===Week 8===<br />
*Q: Briefly introduce the Wald test. What are the problems with the Wald test?<br />
*A: The Wald test is a way of testing the significance of particular explanatory variables in a statistical model. In logistic regression we have a binary outcome variable and one or more explanatory variables. For each explanatory variable in the model there will be an associated parameter. The Wald test is one of a number of ways of testing whether the parameters associated with a group of explanatory variables are zero. If for a particular explanatory variable, or group of explanatory variables, the Wald test is significant, then we would conclude that the parameters associated with these variables are not zero, so that the variables should be included in the model. If the Wald test is not significant then these explanatory variables can be omitted from the model. However, there are problems with the use of the Wald statistic: for large coefficients the standard error is inflated, lowering the Wald statistic (chi-square) value, and the likelihood-ratio test is more reliable than the Wald test for small sample sizes.<br />
<br />
<br />
<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
Studying the 1993/1994 through 2008/2009 seasons, the research has shown that the severity of infections with the Influenza A virus is related to its novelty. In addition, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus' hemagglutinin protein. People think that this result can help to improve the ability to accurately predict influenza severity. Scientists could therefore use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. However, in reality there are many sources of variation. For example, people fall in different age ranges, live in different areas and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime<br />
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can look for criminal activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
In the news report, people indicated that "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Obviously, crimes are more likely at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice. Crimes occur all over the city. It is also possible that the criminals come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120 Weight Cited in Women's Heartburn]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wine and spirits has increased. The statistics show that at the start of the 1960s beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same period, wine consumption has increased threefold to 36 percent, while the intake of spirits has nearly doubled to 20 percent. Why have wine and spirits increased so much since the 1960s? The researchers noted that the increased consumption was likely to have been affected by numerous factors, such as different age patterns in the population, increasing affluence and the growth of the Australian wine industry. Changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people may also have come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and children, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results from the Canadian Health Measures Survey (CHMS) released by Statistics Canada, only 15 per cent of adults achieve the minimum recommended amount of daily exercise. Young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. The survey also found that men are more physically active than women on average, and the same pattern appears among teenagers: more than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, the study should take into account people whose jobs require heavy lifting and being out and about for most of the work day. Such work would be more than enough exercise for most people and would improve these numbers dramatically. Similarly, for teenagers, the survey should also record whether full-time students are working part-time.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1 More private liquor stores, more alcohol deaths?]<br />
<br />
In a study of 89 local areas of British Columbia, the researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. Over the same period, the number of government-run stores in the province declined slightly. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales, a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. The findings are based on government data for these 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
===Week 6===<br />
* Video Games Are Good for Girls<br />
[http://www.sciencedaily.com/releases/2011/02/110201083341.htm Video Games Are Good for Girls]<br />
<br />
Researchers from Brigham Young University's School of Family Life conducted a study on video games and children between 11 and 16 years old. They found that girls who played video games with a parent enjoyed a number of advantages. A father who still hasn't given up video games now has some justification to keep on playing, if he has a daughter. The study involved 287 families with an adolescent child. The researchers measured several outcomes, such as positive behavior, aggression, family connection and mental health. The result is interesting: for boys, playing with a parent was not a statistically significant factor for any of the outcomes the researchers measured, yet for girls, playing with a parent accounted for as much as 20 percent of the variation in those measured outcomes.<br />
<br />
===Week 7===<br />
[http://www.sciencedaily.com/releases/2011/03/110301091225.htm Who's the Best Tennis Player of All Time?]<br />
<br />
Ranking tennis players is a novel way to show how complex network analysis can reveal interesting facts hidden in statistical data. Male tennis players who played in at least one Association of Tennis Professionals match between 1968 and 2010 were evaluated through network analysis. The researchers ran an algorithm, similar to the one used by Google to rank Web pages, on digital data from hundreds of thousands of matches. The data was pulled from the Association of Tennis Professionals website. They quantified the importance of players and ranked them by a "tennis prestige" score, which is determined by a player's competitiveness, the quality of his performance and his number of victories. Fans may think of Jimmy Connors as an "old school" tennis player, but he comes out on top in the new ranking system. One of the reasons is that he played for more than 20 years and had the opportunity to win a lot of matches against other very good players. "The rankings are a snapshot of who is at the top at this time," Radicchi said. "Players who have yet to retire are penalized with respect to those who have ended their careers. Prestige scores strongly correlate with the number of victories, and active players haven't played all the matches of their careers yet." Researching and ranking sports stars gives a glimpse at the power of complex network analysis. <br />
<br />
Hi Jessica, I'm not sure why, but the link you have provided for the tennis article isn't working. I searched for the article in google and found it at the following link in the event somebody else is interested in viewing the article. --[[User:Lawarren|Lawarren]] 10:48, 9 March 2011 (EST)<br />
<br />
http://www.sciencedaily.com/releases/2011/03/110301091225.htm<br />
* Thank you very much, Lawarren.<br />
<br />
===Week 8===<br />
[http://www.traveldailynews.com/pages/show_page/42076-Hotel-capitalization-rates-fall-further Hotel capitalization rates fall further]<br />
<br />
In 2011, the USRC Hotel Investment Survey released a prediction of good growth in the hotel industry, based on data viewed in the context of longer-term trends. The data include parameters such as capitalization rates, discount rates, income and expense growth expectations, marketing time, debt parameters, and other data for both full-service and limited-service hotels. The researchers found that full-service discount rates are now at their lowest level since the Mid Year 2007 survey, just prior to rates beginning to creep up in 2008 and the subsequent economic and credit collapse of the Fall of 2008. They are now just 10 basis points higher than the record low seen in the Winter 2007 survey, and are hovering very near the survey results of the last recovery years of 2005-2006, following the post 9-11 downturn. As yield requirements lessened, anticipated growth continued to increase, even beyond the very healthy movement seen in the last Mid Year 2010 Survey. <br />
<br />
<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities in one country instead of selecting different countries, in order to reduce the differences in conditions.<br />
<br />
===Week 2===<br />
* Before I took this course, I had no idea what consulting involves or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realized that it is not easy to be a good consultant. Usually the situation is very complicated, and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem and then pick out the relationships among these variables that we are interested in. Moreover, we have to decide whether to use observational or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis and whether the apparent effect of x on y arose by chance, etc. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw the multiple regression model represented by a plane in data space, which corresponds in beta space to a point showing the fitted slope with respect to Coffee and the fitted slope with respect to stress. There we only need 2 dimensions, but for more complex models we may need 3 or more dimensions for the beta space. I wonder how, in these situations, we can draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need a 3D or higher-dimensional ellipsoid. This ellipsoid in higher dimensions would be larger because it is trying to be correct for 3 or more parameters at the same time.-''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===<br />
* For controlling for all the Zs, the notes mention that we can use one of three techniques. The first one is statistical control. I am not sure about the meaning of propensity score, although it says "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]<br />
Thank you Constance!<br />
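
The phrase "prediction of X from Zs" can be illustrated with a minimal Python sketch. It assumes a single binary covariate, so that the estimated propensity score is simply the fraction of treated subjects within each stratum of Z. The data and the function name `propensity_by_stratum` are hypothetical and only for illustration; with several covariates one would usually fit a logistic regression instead.

```python
from collections import defaultdict

# Hypothetical observational data: (z, x) pairs, where z is a binary
# covariate and x is group membership (1 = treated, 0 = control).
# These values are invented for illustration only.
records = [(1, 1), (1, 1), (1, 1), (1, 0),
           (0, 1), (0, 0), (0, 0), (0, 0)]

def propensity_by_stratum(data):
    """Estimate P(x = 1 | z) as the treated fraction within each z stratum."""
    treated = defaultdict(int)
    total = defaultdict(int)
    for z, x in data:
        total[z] += 1
        treated[z] += x
    return {z: treated[z] / total[z] for z in total}

scores = propensity_by_stratum(records)
print(scores)
```

If the two strata have very different scores, as in this made-up data, group membership is confounded with Z, which is what random assignment is meant to prevent.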
<br />
===Week 5===<br />
* In a multiple regression model, it is a challenge to determine the type of the canonical outlier and how to deal with such outliers, i.e. whether to remove them or not. The decision can change our results substantially.<br />
<br />
===Week 6===<br />
* From the slides on Hierarchical Models, it is good to know the idea of the two-stage approach (derived variables approach): estimate the slope and intercept, then use the estimates as a multivariate sample and do a MANOVA test.<br />
<br />
===Week 7===<br />
*How does the Scheffe method connect with the F-test? <br />
*The F-test for the model is significant if and only if some contrast in the model space has a significant Scheffe confidence interval. If the F-test is not significant, then none of the Scheffe confidence intervals will be significant.<br />
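
The connection stated above can be written out as follows. This is a sketch under assumed notation that does not appear in the course notes: p is the dimension of the space of coefficients being tested, ν is the residual degrees of freedom, and ψ = cᵀβ is a contrast in those coefficients.

```latex
% Scheffe simultaneous confidence interval for a contrast \psi = c^\top \beta
\hat\psi \;\pm\; \sqrt{p \, F_{1-\alpha;\, p,\, \nu}} \;\; \widehat{\mathrm{SE}}(\hat\psi)

% Link to the overall F-test: the largest squared t-ratio over all
% contrasts equals p times the F statistic
\max_{c \neq 0} \frac{(c^\top \hat\beta)^2}{\widehat{\mathrm{Var}}(c^\top \hat\beta)} \;=\; p \, F
```

Since the overall F statistic exceeds its critical value exactly when the largest squared t-ratio exceeds p times that critical value, the F-test rejects if and only if at least one Scheffe interval excludes zero.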
<br />
===Week 8===<br />
*TED Talks on Statistics <br />
*[http://www.kdnuggets.com/2010/11/ted-talks-lies-damned-lies-and-statistics.html Sebastian Wernicke: Lies, damned lies and statistics]<br />
*[http://www.ted.com/talks/chris_jordan_pictures_some_shocking_stats.html Chris Jordan pictures some shocking stats]<br />
*Official Statistics and Statistical Ethics : Selected Issues [http://pdfcast.org/pdf/official-statistics-and-statistical-ethics-selected-issues]</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Assignment_Teams/RubinMATH 6627 2010-11 Practicum in Statistical Consulting/Assignment Teams/Rubin2011-03-16T03:33:05Z<p>Jyjli: </p>
<hr />
<div>* http://en.wikipedia.org/wiki/Donald_Rubin<br />
=='''SIMPSON'S PARADOX'''==<br />
<br />
<br />
==='''DEFINITION'''===<br />
<br />
* An apparent paradox in which the association between two variables (X and Y) changes when a third variable (Z) is taken into account.<br />
<br />
* Z is called a confounding factor.<br />
<br />
<br />
==='''CONFOUNDING FACTOR'''===<br />
<br />
A confounding factor is associated with both the outcome variable and the primary risk factor (or independent variable) of interest.<br />
<br />
<br />
[[File:SP1.jpg]]<br />
<br />
<br />
==='''EXAMPLE I: UNIVERSITY ADMISSION'''===<br />
<br />
The University of California, Berkeley was sued for biased acceptance rates with respect to gender: men applying to graduate school were more likely to be accepted than women.<br />
<br />
<br />
[[File:sp2.jpg]]<br />
<br />
<br />
However, when the male to female acceptance rates were considered for each individual department there was actually a slight favourable bias toward the acceptance of women.<br />
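
The reversal can be reproduced with a small Python sketch. The counts below are hypothetical (they are not the Berkeley figures, which appear only in the images on this page) and are chosen so that women have the higher acceptance rate in each department while men have the higher rate overall.

```python
# Hypothetical admission counts, NOT the Berkeley data:
# (department, gender) -> (admitted, applicants)
counts = {
    ("A", "men"): (80, 100),
    ("A", "women"): (18, 20),
    ("B", "men"): (4, 20),
    ("B", "women"): (30, 100),
}

def rate(admitted, applicants):
    """Acceptance rate as a proportion."""
    return admitted / applicants

def overall_rate(gender):
    """Acceptance rate for one gender, pooling all departments."""
    admitted = sum(a for (_, g), (a, _) in counts.items() if g == gender)
    applicants = sum(n for (_, g), (_, n) in counts.items() if g == gender)
    return rate(admitted, applicants)

for dept in ("A", "B"):
    print(dept, rate(*counts[(dept, "men")]), rate(*counts[(dept, "women")]))
print("overall", overall_rate("men"), overall_rate("women"))
```

In this made-up example most women apply to department B, which accepts few applicants of either gender, so the department acts as the confounding factor Z.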
<br />
<br />
[[File:sp3.jpg]]<br />
<br />
<br />
[[File:sp4.jpg]]<br />
<br />
<br />
==='''EXAMPLE II: MALARIA INFECTION'''===<br />
<br />
Male gender as a risk factor for malaria.<br />
<br />
<br />
[[File:sp5.jpg]]<br />
<br />
<br />
Odds Ratio = 1.7 (P<0.05).<br />
<br />
Confounder vs exposure<br />
<br />
<br />
[[File:sp6.jpg]]<br />
<br />
<br />
Odds Ratio = 7.8<br />
<br />
Confounder vs outcome<br />
<br />
<br />
[[File:sp7.jpg]]<br />
<br />
<br />
Odds Ratio = 5.3<br />
<br />
<br />
[[File:sp8.jpg]]<br />
<br />
<br />
Odds Ratio Outdoor Occupation = 1.06<br />
Odds Ratio Indoor Occupation = 1.00<br />
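
The pattern above, a crude odds ratio well above 1 that shrinks to about 1 within each occupation stratum, can be sketched in Python. The 2x2 counts are hypothetical, since the actual tables are only available as images, and the helper `odds_ratio` is introduced here for illustration.

```python
def odds_ratio(exposed_cases, exposed_noncases,
               unexposed_cases, unexposed_noncases):
    """Odds ratio of a 2x2 table: (a * d) / (b * c)."""
    return (exposed_cases * unexposed_noncases) / (
        exposed_noncases * unexposed_cases)

# Hypothetical counts per stratum:
# (male cases, male non-cases, female cases, female non-cases)
strata = {
    "outdoor": (40, 60, 10, 15),
    "indoor": (5, 45, 10, 90),
}

# Stratum-specific odds ratios (male vs female within each occupation type)
stratum_or = {name: odds_ratio(*table) for name, table in strata.items()}

# Crude odds ratio: collapse the strata by adding the tables cell by cell
pooled = [sum(cells) for cells in zip(*strata.values())]
crude_or = odds_ratio(*pooled)

print(stratum_or, crude_or)
```

Here men are far more likely to work outdoors, and outdoor work carries the higher malaria risk, so the crude association with gender disappears once occupation is held fixed.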
<br />
<br />
[[File:sp9.jpg]]<br />
<br />
<br />
=='''LATTICE'''==<br />
<br />
[[Media:LATTICE.pdf.pdf]]<br />
<br />
<br />
=='''ASSIGNMENT 2'''==<br />
<br />
'''Pulse Article: For our own good''' [http://www.math.yorku.ca/~georges/Courses/2565/StatisticsInTheNews030926.html].<br />
<br />
<br />
==='''Q1'''===<br />
<br />
Does the article suggest a causal relationship between two variables? If so, which? Are the data observational or experimental?<br />
<br />
<br />
'''DISCUSSION''' <br />
<br />
Yes, this article suggests a causal relationship between the type of advertising and unnecessary prescriptions. The article also puts great emphasis on the claim that drug companies spend too much money on direct-to-consumer advertising of prescription medicines, which may lead to health care costs associated with unnecessary prescriptions. The data appear to be observational, because the article does not mention any assignment of people to be exposed or not exposed to advertising. <br />
<br />
<br />
==='''Q2'''===<br />
<br />
Can you think of alternative explanations to causality? Confounding factors? Or explanations consistent with causality? Mediating factors?<br />
<br />
<br />
'''DISCUSSION'''<br />
<br />
<br />
I think that there could be a mediating factor leading to unnecessary health care costs. This mediating factor could be the secondary reactions (side effects) of the drug. In some cases these reactions matter a great deal, because people can spend a lot of money as a result of drug complications. I agree with the article's skepticism toward the statement by the president of Pfizer that advertising constitutes one of the largest and most successful public health campaigns in US history. A confounding factor may also exist that leads people to require additional prescriptions. One example is gender, since in some cases secondary reactions differ between men and women. <br />
<br />
<br />
==='''Q3'''=== <br />
<br />
Have any confounding factors been accounted for in the analysis?<br />
<br />
<br />
'''DISCUSSION'''<br />
<br />
No, confounding factors were not accounted for. They emphasize the type of advertising. <br />
<br />
<br />
==='''Q4'''===<br />
<br />
Have any mediating factors been controlled for in a way that vitiates a causal interpretation of the relationship?<br />
<br />
<br />
'''DISCUSSION'''<br />
<br />
<br />
The article does not mention anything about controlling for mediating factors. Also, the authors did not carry out a causal analysis. There was no mention of a relationship between the type of drug and the case group. <br />
<br />
<br />
==='''Q5'''===<br />
<br />
What is your personal assessment of the evidence for causality in the study that is the subject of the article?<br />
<br />
<br />
'''DISCUSSION'''<br />
<br />
<br />
When I read the article I expected it to be more interesting, but I was disappointed by the lack of information. The authors don't support their claims. In other words, they don't prove what they write, and they give only a very general point of view. Instead of helping us understand the issue, the article leaves us with too many questions. <br />
<br />
Laura: I found the topic of the article quite interesting, but like Luis, felt the article was lacking in substantive information. <br />
<br />
<br />
===QUESTIONS 2 PART===<br />
<br />
Q3: A survey of students at York reveals that the average class size of the classes they attend is 130. A survey of faculty shows an average class size of 30. The students must be exaggerating their class sizes or the faculty under-reporting.<br />
<br />
A3: This could be an instance of sampling bias. Maybe all of the students surveyed were coming from the same 130 person class, while the professors surveyed were all coming from smaller classes. Could be the difference between surveying 1st and 2nd year courses vs. 3rd year courses, etc.<br />
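
One way to see that both surveys can be accurate is that a student survey samples classes in proportion to their size, while a faculty survey counts each class once. The Python sketch below uses hypothetical class sizes, chosen only so that the two averages come out as 30 and 130.

```python
# Hypothetical class sizes: 15 small classes and 2 very large ones
class_sizes = [10] * 15 + [180] * 2

# Faculty view: every class counts once
faculty_average = sum(class_sizes) / len(class_sizes)

# Student view: a class of size s is reported by s students,
# so each class is weighted by its own size
student_average = sum(s * s for s in class_sizes) / sum(class_sizes)

print(faculty_average, student_average)
```

Most students sit in the two large classes, so the average class size as experienced by students is much larger than the average over classes, without anyone misreporting.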
<br />
Q6: If smoking really is bad for your health, you expect a comparison of a group of people who have quit smoking with a group that have continued to reveal that the group quitting is, on average, healthier than the group that continued.<br />
<br />
A6: Maybe the people who quit are quitting for health reasons (i.e. heavy smokers tend to quit while light smokers tend to continue smoking). Also, when you stop smoking your stress level may go up, which would decrease your overall level of health.<br />
<br />
Q9: If you want to reduce the number of predictor variables in a model, a technique like forward stepwise regression will generally do a good job of identifying which variables you should keep.<br />
<br />
A9: For a regression model with n possible predictor variables, the first step involves evaluating n predictor variable subsets, each consisting of a single predictor variable, and selecting the one with the highest evaluation criterion. The next step selects from among n-1 subsets, the next step from n-2 subsets, and so on. It is not guaranteed to find the subset with the highest evaluation criterion.<br />
Stepwise selection also ignores the problem of multiple inference (performing more than one statistical inference procedure on the same data set), which can lead to erroneous conclusions. One should not rely solely on automated procedures; it may make sense from a logical perspective to include variables irrespective of their significance (e.g. confounders).<br />
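
The counting argument in A9 can be sketched in Python. The two helper functions are hypothetical names introduced here; they only count how many candidate subsets each strategy scores, which shows how small a part of the search space forward selection visits.

```python
def forward_stepwise_evaluations(n):
    """Subsets scored if forward selection runs until all n predictors are in:
    n at the first step, n - 1 at the second, and so on."""
    return sum(n - step for step in range(n))

def all_subsets_evaluations(n):
    """Non-empty subsets an exhaustive search would have to score."""
    return 2 ** n - 1

n_predictors = 10
print(forward_stepwise_evaluations(n_predictors),
      all_subsets_evaluations(n_predictors))
```

With 10 predictors, forward selection scores 55 subsets out of 1023 possible ones, which is why it is fast but cannot guarantee finding the best subset.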
<br />
Q12: In general we don’t need to worry about interactions between variables unless there is a correlation between them.<br />
<br />
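A12: Interaction and correlation are different things: an interaction means that the effect of one variable depends on the level of another, and this can happen whether or not the two variables are correlated. In addition, correlation only measures the strength of the linear relationship between quantitative variables, and the relationship may not be linear. <br />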
<br />
Q15: Statistical theory shows that the best way to impute a mid-term grade for a student who missed the test with a valid excuse is to use the predicted mid-term grade based on the other grades in the course.<br />
<br />
A15: The mid-term mark could be affected by numerous factors. What if the student was cheating on his/her assignment? What if the difficulty level of the mid term is not equivalent to the other assigned work in the course?<br />
In addition to which, other grades in the course are comprised of a variety of testing methods. Students do not necessarily perform equivalently across all testing methods. For example, John does well on assignments and presentations, but he performs poorly on tests.<br />
<br />
<br />
=='''ASSIGNMENT 3'''==<br />
<br />
===Q1===<br />
As we saw, 'Sector' appears to be an important predictor. Consider models using ses and Sector. Aim to estimate the between Sector gap as a function of ses if there is an interaction between Sector and ses. Check for and provide for a possible contextual effect of ses. Plot expected math achievement in each sector. Plot the gap with SEs. Consider the possibility that the apparently flatter effect of ses in Catholic school could be due to a non-linear effect of ses. How would you test whether this is a reasonable alternative explanation?<br />
<br />
The interaction between Sector and ses was significant (p<0.05), and the contextual effect of ses was also significant. This implies that the effect of ses on math achievement differs between public and Catholic schools. <br />
<br />
wald( fitn, L.context )<br />
numDF denDF F.value p.value<br />
Contextual effect of ses 1 77 27.47158 <.00001<br />
<br />
<br />
[[File:graph10.jpg]]<br />
<br />
Figure 1: Plot of the gap between ses scores for catholic and public schools<br />
<br />
[[File:graph7.jpg]]<br />
<br />
Figure 2: Plot of expected math achievement in each sector for ses levels of -0.5, 0 and 0.5<br />
<br />
In order to test whether the flatter effect of ses in Catholic schools could be due to a non-linear effect of ses, we would add a quadratic ses term to the model and check whether it is significant. If the quadratic term is significant, then the flatter effect of ses in Catholic schools may be a result of a non-linear effect of ses.<br />
<br />
<br />
===Q2===<br />
<br />
Take the example further by incorporating Sex. Consider the 'contextual effect' of Sex, which is school sex composition. Note that there are three types of schools: Girls, Boys and Coed schools. If you consider an interaction between Sector and school gender composition, you will see that the Public Sector only has Coed schools. What is the consequence of this fact for modelling sex composition and Sector effects? <br />
<br />
<br />
[[File:Q2(1).jpg]]<br />
<br />
<br />
[[File:Q2(3).jpg]]<br />
<br />
<br />
===Q3===<br />
<br />
Does it appear that boys are better off in a boys' school and girls in a girls' school or are they better off in coed schools? How would you qualify your findings so parents don't misinterpret them in making decisions for their children? <br />
<br />
As can be seen in Figure 3, Figure 4 and the following R output, girls and boys both perform better in single-sex schools. However, all public schools are co-ed, while Catholic schools can be all male, all female or co-ed, so sector may be acting as a confounding factor in the analysis of sex category on math achievement. <br />
<br />
[[File:graph12.jpg]]<br />
<br />
Figure 3: Plot of Math Achievement by Sex Category<br />
<br />
[[File:graph13.jpg]]<br />
<br />
Figure 4: Plot of Math Achievement by ses with respect to sex category<br />
<br />
Call:<br />
lm(formula = mathach ~ ses + factor(Sex.cat) + ses:Sex.cat, data = hs)<br />
<br />
Residuals:<br />
Min 1Q Median 3Q Max <br />
-19.0096 -4.9276 0.1901 4.9705 15.2090 <br />
<br />
Coefficients:<br />
Estimate Std. Error t value Pr(>|t|) <br />
(Intercept) 14.6252 0.4398 33.251 < 2e-16 ***<br />
ses 1.6955 0.6140 2.761 0.005807 ** <br />
factor(Sex.cat)Coed -2.1118 0.4726 -4.469 8.32e-06 ***<br />
factor(Sex.cat)Girls -1.8262 0.5442 -3.355 0.000807 ***<br />
ses:Sex.catCoed 2.0028 0.6545 3.060 0.002242 ** <br />
ses:Sex.catGirls 0.7560 0.7519 1.005 0.314815 <br />
---<br />
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 <br />
<br />
Residual standard error: 6.351 on 1971 degrees of freedom<br />
Multiple R-squared: 0.1396, Adjusted R-squared: 0.1374 <br />
F-statistic: 63.95 on 5 and 1971 DF, p-value: < 2.2e-16 <br />
<br />
<br />
lm(formula = mathach ~ ses + factor(Sex.cat) * factor(Sector), <br />
data = hs)<br />
<br />
Residuals:<br />
Min 1Q Median 3Q Max <br />
-19.4102 -4.8167 0.1175 5.0436 15.3054 <br />
<br />
Coefficients: (2 not defined because of singularities)<br />
                                          Estimate Std. Error t value Pr(>|t|)    <br />
(Intercept)                                14.9586     0.4175  35.832  < 2e-16 ***<br />
ses                                         3.1046     0.1952  15.907  < 2e-16 ***<br />
factor(Sex.cat)Coed                        -1.5700     0.5090  -3.084 0.002068 ** <br />
factor(Sex.cat)Girls                       -2.1432     0.5257  -4.077 4.74e-05 ***<br />
factor(Sector)Public                       -1.4099     0.3634  -3.879 0.000108 ***<br />
factor(Sex.cat)Coed:factor(Sector)Public        NA         NA      NA       NA    <br />
factor(Sex.cat)Girls:factor(Sector)Public       NA         NA      NA       NA    <br />
---<br />
<br />
===Q4===<br />
<br />
*Is a low ses child better off in a high ses school or are they better off in a school of a similar ses? How about a high ses child in a low ses school? How would you qualify your findings so parents don't misinterpret them in making decisions for their children?<br />
<br />
*<br />
<br />
[[File:22.jpeg]]<br />
<br />
[[File:33.jpeg]]<br />
<br />
<br />
===Q5===<br />
<br />
Is a minority status child better off in a school with a higher proportion of minority status children or are they better off in a school with a low proportion? How would you qualify your findings so parents don't misinterpret them in making decisions for their children? <br />
<br />
<br />
<br />
[[File:graph2.jpg]]<br />
<br />
<br />
[[File:graph3.jpg]]<br />
<br />
<br />
According to the first graphs, it appears better to send a child with low SES to a school with a low proportion of minority status students but, on the other hand, better to send a child with higher SES to a school with a high proportion of minority status students. The same holds for a child with low Mathach. <br />
<br />
<br />
[[File:graph1.jpg]]<br />
<br />
This graph is very useful for understanding the role of the proportion of minority status students, because we can compare Mathach versus SES for minority and majority students together. Parents should be careful about making decisions for their children based on this graph, because it differs from the others: it introduces a new predictor and so carries out a new analysis of contextual effects.</div>Jyjlihttp://scs.math.yorku.ca/index.php/File:33.jpegFile:33.jpeg2011-03-16T03:31:30Z<p>Jyjli: </p>
<hr />
<div></div>Jyjlihttp://scs.math.yorku.ca/index.php/File:22.jpegFile:22.jpeg2011-03-16T03:29:56Z<p>Jyjli: </p>
<hr />
<div></div>Jyjlihttp://scs.math.yorku.ca/index.php/File:4.jpegFile:4.jpeg2011-03-16T03:25:22Z<p>Jyjli: uploaded a new version of &quot;File:4.jpeg&quot;</p>
<hr />
<div></div>Jyjlihttp://scs.math.yorku.ca/index.php/File:4.jpegFile:4.jpeg2011-03-16T03:23:32Z<p>Jyjli: uploaded a new version of &quot;File:4.jpeg&quot;: Reverted to version as of 02:08, 16 March 2011</p>
<hr />
<div></div>Jyjlihttp://scs.math.yorku.ca/index.php/File:4.jpegFile:4.jpeg2011-03-16T03:23:02Z<p>Jyjli: uploaded a new version of &quot;File:4.jpeg&quot;</p>
<hr />
<div></div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Assignment_Teams/RubinMATH 6627 2010-11 Practicum in Statistical Consulting/Assignment Teams/Rubin2011-03-16T02:08:37Z<p>Jyjli: </p>
<hr />
<div>* http://en.wikipedia.org/wiki/Donald_Rubin<br />
=='''SIMPSON'S PARADOX'''==<br />
<br />
<br />
==='''DEFINITION'''===<br />
<br />
* An apparent paradox in which the association between two variables (X and Y) changes when a third variable (Z) is taken into account.<br />
<br />
* Z is called a confounding factor.<br />
<br />
<br />
==='''CONFOUNDING FACTOR'''===<br />
<br />
A confounding factor is associated with both the outcome variable and the primary risk factor (or independent variable) of interest.<br />
<br />
<br />
[[File:SP1.jpg]]<br />
<br />
<br />
==='''EXAMPLE I: UNIVERSITY ADMISSION'''===<br />
<br />
The University of California, Berkeley was sued for biased acceptance rates with respect to gender: men applying to graduate school were more likely to be accepted than women.<br />
<br />
<br />
[[File:sp2.jpg]]<br />
<br />
<br />
However, when the male to female acceptance rates were considered for each individual department there was actually a slight favourable bias toward the acceptance of women.<br />
<br />
<br />
[[File:sp3.jpg]]<br />
<br />
<br />
[[File:sp4.jpg]]<br />
<br />
<br />
==='''EXAMPLE II: MALARIA INFECTION'''===<br />
<br />
Male gender as a risk factor for malaria.<br />
<br />
<br />
[[File:sp5.jpg]]<br />
<br />
<br />
Odds Ratio = 1.7 (P<0.05).<br />
<br />
Confounder vs exposure<br />
<br />
<br />
[[File:sp6.jpg]]<br />
<br />
<br />
Odds Ratio = 7.8<br />
<br />
Confounder vs outcome<br />
<br />
<br />
[[File:sp7.jpg]]<br />
<br />
<br />
Odds Ratio = 5.3<br />
<br />
<br />
[[File:sp8.jpg]]<br />
<br />
<br />
Odds Ratio Outdoor Occupation = 1.06<br />
Odds Ratio Indoor Occupation = 1.00<br />
<br />
<br />
[[File:sp9.jpg]]<br />
<br />
<br />
=='''LATTICE'''==<br />
<br />
[[Media:LATTICE.pdf.pdf]]<br />
<br />
<br />
=='''ASSIGNMENT 2'''==<br />
<br />
'''Pulse Article: For our own good''' [http://www.math.yorku.ca/~georges/Courses/2565/StatisticsInTheNews030926.html].<br />
<br />
<br />
==='''Q1'''===<br />
<br />
Does the article suggest a causal relationship between two variables? If so, which? Are the data observational or experimental?<br />
<br />
<br />
'''DISCUSSION''' <br />
<br />
Yes, this article suggests a causal relationship between the type of advertising and unnecessary prescriptions. It also places heavy emphasis on the claim that drug companies spend too much money on direct-to-consumer advertising of prescription medicines, which may lead to health care costs associated with unnecessary prescriptions. The data appear to be observational because the article does not mention that people were assigned to be exposed or not exposed to advertising. <br />
<br />
<br />
==='''Q2'''===<br />
<br />
Can you think of alternative explanations to causality? Confounding factors? Or explanations consistent with causality? Mediating factors?<br />
<br />
<br />
'''DISCUSSION'''<br />
<br />
<br />
I think that there could be a mediating factor leading to unnecessary health care costs. This mediating factor could be the secondary reactions (side effects) of the drug. In some cases the secondary reactions matter a great deal because people can spend a lot of money as a result of drug complications. I agree with the causal argument of this article because the authors do not accept what the president of Pfizer said regarding the unspoken truth about advertising constituting one of the largest and most successful public health campaigns in US history. Also, a confounding factor may exist which leads people to require additional prescriptions. One example is gender: in some cases secondary reactions differ between men and women. <br />
<br />
<br />
==='''Q3'''=== <br />
<br />
Have any confounding factors been accounted for in the analysis?<br />
<br />
<br />
'''DISCUSSION'''<br />
<br />
No, confounding factors were not accounted for. They emphasize the type of advertising. <br />
<br />
<br />
==='''Q4'''===<br />
<br />
Have any mediating factors been controlled for in a way that vitiates a causal interpretation of the relationship?<br />
<br />
<br />
'''DISCUSSION'''<br />
<br />
<br />
The article does not mention anything about controlling for mediating factors. Also, the authors did not present a causal interpretation. There was no mention of a relationship between the type of drug and the case group. <br />
<br />
<br />
==='''Q5'''===<br />
<br />
What is your personal assessment of the evidence for causality in the study that is the subject of the article?<br />
<br />
<br />
'''DISCUSSION'''<br />
<br />
<br />
When I read the article I expected it to be more interesting, but I was disappointed by the lack of information. The authors don't support their claims. In other words, they don't prove what they write, and they give only a very general point of view. Instead of helping us understand their argument, they leave us with too many questions. <br />
<br />
Laura: I found the topic of the article quite interesting, but like Luis, felt the article was lacking in substantive information. <br />
<br />
<br />
===QUESTIONS 2 PART===<br />
<br />
Q3: A survey of students at York reveals that the average class size of the classes they attend is 130. A survey of faculty shows an average class size of 30. The students must be exaggerating their class sizes or the faculty under-reporting.<br />
<br />
A3: This could be an instance of sampling bias. Maybe all of the students surveyed were coming from the same 130-person class, while the professors surveyed were all coming from smaller classes. It could also reflect the difference between surveying 1st and 2nd year courses vs. 3rd year courses, etc.<br />
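<br />
A related possibility is that nobody is misreporting and the two surveys simply weight classes differently: each class counts once in a faculty survey, but a class of size s is reported by s students. The sketch below assumes a hypothetical department with two large classes and eight small ones; the class sizes and function names are illustrative only.<br />

```python
# Hypothetical class sizes: two large lectures and eight small seminars
class_sizes = [200, 200] + [20] * 8


def faculty_average(sizes):
    """Average class size when each class is counted once."""
    return sum(sizes) / len(sizes)


def student_average(sizes):
    """Average class size as experienced by students.

    A class of size s is reported by s students, so it gets weight s.
    """
    return sum(s * s for s in sizes) / sum(sizes)


print("faculty view:", faculty_average(class_sizes))
print("student view:", round(student_average(class_sizes), 1))
```

With these sizes the faculty average is 56 while the student average is about 148.6, even though both groups describe exactly the same classes.<br />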
<br />
Q6: If smoking really is bad for your health, you expect a comparison of a group of people who have quit smoking with a group that have continued to reveal that the group quitting is, on average, healthier than the group that continued.<br />
<br />
A6: Maybe the people who quit are quitting for health reasons (i.e. heavy smokers tend to quit while light smokers tend to continue smoking). Also, when you stop smoking your stress level may go up, which would decrease your overall level of health.<br />
<br />
Q9: If you want to reduce the number of predictor variables in a model, a technique like forward stepwise regression will generally do a good job of identifying which variables you should keep.<br />
<br />
A9: For a regression model with n possible predictor variables, the first step involves evaluating n predictor variable subsets, each consisting of a single predictor variable, and selecting the one with the highest evaluation criterion. The next step selects from among n-1 subsets, the next step from n-2 subsets, and so on. It is not guaranteed to find the subset with the highest evaluation criterion.<br />
Stepwise selection also ignores the problem of multiple inference (performing more than one statistical inference procedure on the same data set), which can lead to erroneous conclusions. We should not rely solely on automated procedures: it may make sense on subject-matter grounds to include variables irrespective of their significance (e.g. confounders).<br />
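<br />
The greedy search described in A9 can be sketched as follows. To keep the example self-contained, the evaluation criterion is a lookup table of hypothetical scores (for instance R-squared values) for every subset of three predictors, not the result of fitting real models; the names <code>forward_stepwise</code> and <code>scores</code> are illustrative.<br />

```python
from itertools import combinations

# Hypothetical evaluation criterion for each subset of predictors
scores = {
    frozenset({"x1"}): 0.50,
    frozenset({"x2"}): 0.40,
    frozenset({"x3"}): 0.35,
    frozenset({"x1", "x2"}): 0.55,
    frozenset({"x1", "x3"}): 0.56,
    frozenset({"x2", "x3"}): 0.80,
    frozenset({"x1", "x2", "x3"}): 0.81,
}


def forward_stepwise(candidates, score, max_size):
    """Greedy forward selection: add the single best variable at each step."""
    selected = []
    evaluated = 0
    while len(selected) < max_size:
        remaining = [c for c in candidates if c not in selected]
        evaluated += len(remaining)   # n subsets, then n-1, then n-2, ...
        best = max(remaining, key=lambda c: score[frozenset(selected + [c])])
        selected.append(best)
    return selected, evaluated


candidates = ["x1", "x2", "x3"]
selected, evaluated = forward_stepwise(candidates, scores, max_size=2)

# Exhaustive search over all pairs, for comparison
best_pair = max((frozenset(pair) for pair in combinations(candidates, 2)),
                key=lambda subset: scores[subset])

print("forward stepwise:", selected, "after", evaluated, "evaluations")
print("best pair       :", sorted(best_pair))
```

Because x1 is the best single predictor, the greedy search is locked into pairs containing x1 and never evaluates the pair (x2, x3), which has the highest score in this table.<br />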
<br />
Q12: In general we don’t need to worry about interactions between variables unless there is a correlation between them.<br />
<br />
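A12: Correlation measures the strength of the linear relationship between quantitative variables, but the relationship may not be linear. Also, inference about a correlation usually assumes the variables are normally distributed, which is not always the case. More fundamentally, an interaction means that the effect of one variable on the response depends on the level of another variable, and this can occur whether or not the two variables are correlated. <br />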
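<br />
A small constructed example makes the distinction concrete. It assumes a balanced 2x2 design in which the two predictors are exactly uncorrelated and a hypothetical response equal to their product; all names are illustrative.<br />

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def response(x1, x2):
    """Hypothetical response that is a pure interaction of x1 and x2."""
    return x1 * x2


def slope_of_y_on_x1(x2_level):
    """Change in the response per unit of x1, holding x2 fixed."""
    return (response(1, x2_level) - response(-1, x2_level)) / 2


# Balanced 2x2 design: every combination of levels appears once
design = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
x1 = [a for a, _ in design]
x2 = [b for _, b in design]

print("corr(x1, x2)     :", pearson(x1, x2))
print("slope at x2 = +1 :", slope_of_y_on_x1(1))
print("slope at x2 = -1 :", slope_of_y_on_x1(-1))
```

The predictors have zero correlation, yet the effect of x1 changes sign with the level of x2, so the absence of correlation says nothing about the absence of interaction.<br />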
<br />
Q15: Statistical theory shows that the best way to impute a mid-term grade for a student who missed the test with a valid excuse is to use the predicted mid-term grade based on the other grades in the course.<br />
<br />
A15: The mid-term mark could be affected by numerous factors. What if the student was cheating on his/her assignment? What if the difficulty level of the mid-term is not equivalent to the other assigned work in the course?<br />
In addition, the other grades in the course are based on a variety of testing methods. Students do not necessarily perform equivalently across all testing methods. For example, John does well on assignments and presentations, but he performs poorly on tests.<br />
<br />
<br />
=='''ASSIGNMENT 3'''==<br />
<br />
===Q1===<br />
As we saw, 'Sector' appears to be an important predictor. Consider models using ses and Sector. Aim to estimate the between Sector gap as a function of ses if there is an interaction between Sector and ses. Check for and provide for a possible contextual effect of ses. Plot expected math achievement in each sector. Plot the gap with SEs. Consider the possibility that the apparently flatter effect of ses in Catholic school could be due to a non-linear effect of ses. How would you test whether this is a reasonable alternative explanation?<br />
<br />
The interaction between Sector and ses was significant (p<0.05) and the contextual effect of ses was also significant. The significant interaction implies that the effect of ses on math achievement differs between public and Catholic schools. <br />
<br />
wald( fitn, L.context )<br />
numDF denDF F.value p.value<br />
Contextual effect of ses 1 77 27.47158 <.00001<br />
<br />
<br />
[[File:graph10.jpg]]<br />
<br />
Figure 1: Plot of the gap between ses scores for Catholic and public schools<br />
<br />
[[File:graph7.jpg]]<br />
<br />
Figure 2: Plot of expected math achievement in each sector for ses levels of -0.5, 0 and 0.5<br />
<br />
To test whether the flatter effect of ses in Catholic schools could be due to a non-linear effect of ses, we would add a quadratic ses term to the model and check whether it is significant. If the quadratic term is significant, then the flatter effect of ses in Catholic schools may be a result of a non-linear effect of ses.<br />
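<br />
One way to write down the model behind this test is sketched below. The notation is an assumption made for illustration and is not taken from the fitted object <code>fitn</code>: i indexes students, j indexes schools, the bar denotes the school mean of ses (the contextual term), and u_j is a random school intercept.<br />

```latex
\begin{aligned}
\text{mathach}_{ij} ={}& \beta_0
  + \beta_1\,\text{ses}_{ij}
  + \beta_2\,\text{ses}_{ij}^{2}
  + \beta_3\,\text{Sector}_{j}
  + \beta_4\,\text{ses}_{ij}\times\text{Sector}_{j} \\
 &+ \beta_5\,\overline{\text{ses}}_{j}
  + u_j + \varepsilon_{ij},
\qquad H_0:\ \beta_2 = 0 .
\end{aligned}
```

Under this formulation, rejecting the null hypothesis supports a non-linear effect of ses. It would also be worth checking whether the interaction coefficient (beta_4 in this notation) remains clearly different from zero once the quadratic term is included, since only then does the flatter slope in Catholic schools look like more than curvature.<br />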
<br />
<br />
===Q2===<br />
<br />
Take the example further by incorporating Sex. Consider the 'contextual effect' of Sex, which is school sex composition. Note that there are three types of schools: Girls, Boys and Coed schools. If you consider an interaction between Sector and school gender composition, you will see that the Public Sector only has Coed schools. What is the consequence of this fact for modelling sex composition and Sector effects? <br />
<br />
<br />
[[File:Q2(1).jpg]]<br />
<br />
<br />
[[File:Q2(3).jpg]]<br />
<br />
<br />
===Q3===<br />
<br />
Does it appear that boys are better off in a boy's school and girls in a girl's school or are they better off in coed schools? How would you qualify your findings so parents don't misinterpret them in making decisions for their children? <br />
<br />
<br />
<br />
===Q4===<br />
<br />
*Is a low ses child better off in a high ses school or are they better off in a school of a similar ses? How about a high ses child in a low ses school? How would you qualify your findings so parents don't misinterpret them in making decisions for their children?<br />
<br />
*<br />
<br />
[[File:1.jpeg]]<br />
<br />
[[File:2.jpeg]]<br />
<br />
[[File:3.jpeg]]<br />
<br />
[[File:4.jpeg]]<br />
<br />
<br />
===Q5===<br />
<br />
Is a minority status child better off in a school with a higher proportion of minority status children or are they better off in a school with a low proportion? How would you qualify your findings so parents don't misinterpret them in making decisions for their children? <br />
<br />
<br />
<br />
[[File:graph2.jpg]]<br />
<br />
<br />
[[File:graph3.jpg]]<br />
<br />
<br />
According to the first graphs, it appears better to send a child with low SES to a school with a low proportion of minority-status children; on the other hand, it appears better to send a child with higher SES to a school with a high proportion of minority-status children. The same holds for a child with low Mathach. <br />
<br />
<br />
[[File:graph1.jpg]]<br />
<br />
This graph helps in understanding the role of the proportion of minority status because it compares the new predictor Mathach with SES while showing the minority and majority gap together. Parents should be careful about basing a decision for their children on this graph, because it differs from the others: it introduces a new predictor, which leads to a new analysis of contextual effects.</div>Jyjlihttp://scs.math.yorku.ca/index.php/File:4.jpegFile:4.jpeg2011-03-16T02:08:08Z<p>Jyjli: </p>
<hr />
<div></div>Jyjlihttp://scs.math.yorku.ca/index.php/File:3.jpegFile:3.jpeg2011-03-16T02:07:53Z<p>Jyjli: </p>
<hr />
<div></div>Jyjlihttp://scs.math.yorku.ca/index.php/File:2.jpegFile:2.jpeg2011-03-16T02:06:36Z<p>Jyjli: </p>
<hr />
<div></div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Assignment_Teams/RubinMATH 6627 2010-11 Practicum in Statistical Consulting/Assignment Teams/Rubin2011-03-16T02:05:57Z<p>Jyjli: </p>
<hr />
<div>* http://en.wikipedia.org/wiki/Donald_Rubin<br />
=='''SIMPSON'S PARADOX'''==<br />
<br />
<br />
==='''DEFINITION'''===<br />
<br />
* An apparent paradox in which the association between two variables (X and Y) changes when a third variable (Z) is taken into account.<br />
<br />
* Z is called a confounding factor.<br />
<br />
<br />
==='''CONFOUNDING FACTOR'''===<br />
<br />
A confounding factor is associated with both the outcome variable and the primary risk factor (or independent variable) of interest.<br />
<br />
<br />
[[File:SP1.jpg]]<br />
<br />
<br />
==='''EXAMPLE I: UNIVERSITY ADMISSION'''===<br />
<br />
The University of California, Berkeley was sued for biased acceptance rates with respect to gender: men applying to graduate school were more likely to be accepted than women.<br />
<br />
<br />
[[File:sp2.jpg]]<br />
<br />
<br />
However, when the acceptance rates of men and women were compared within each individual department, there was actually a slight bias in favour of women.<br />
<br />
<br />
[[File:sp3.jpg]]<br />
<br />
<br />
[[File:sp4.jpg]]<br />
<br />
<br />
==='''EXAMPLE II: MALARIA INFECTION'''===<br />
<br />
Male gender as a risk factor for malaria.<br />
<br />
<br />
[[File:sp5.jpg]]<br />
<br />
<br />
Odds Ratio = 1.7 (P<0.05).<br />
<br />
Confounder vs exposure<br />
<br />
<br />
[[File:sp6.jpg]]<br />
<br />
<br />
Odds Ratio = 7.8<br />
<br />
Confounder vs outcome<br />
<br />
<br />
[[File:sp7.jpg]]<br />
<br />
<br />
Odds Ratio = 5.3<br />
<br />
<br />
[[File:sp8.jpg]]<br />
<br />
<br />
Odds Ratio Outdoor Occupation = 1.06<br />
Odds Ratio Indoor Occupation = 1.00<br />
<br />
<br />
[[File:sp9.jpg]]<br />
<br />
<br />
=='''LATTICE'''==<br />
<br />
[[Media:LATTICE.pdf.pdf]]<br />
<br />
<br />
=='''ASSIGNMENT 2'''==<br />
<br />
'''Pulse Article: For our own good''' [http://www.math.yorku.ca/~georges/Courses/2565/StatisticsInTheNews030926.html].<br />
<br />
<br />
==='''Q1'''===<br />
<br />
Does the article suggest a causal relationship between two variables? If so, which? Are the data observational or experimental?<br />
<br />
<br />
'''DISCUSSION''' <br />
<br />
Yes, this article suggests a causal relationship between the type of advertising and unnecessary prescriptions. It also places heavy emphasis on the claim that drug companies spend too much money on direct-to-consumer advertising of prescription medicines, which may lead to health care costs associated with unnecessary prescriptions. The data appear to be observational because the article does not mention that people were assigned to be exposed or not exposed to advertising. <br />
<br />
<br />
==='''Q2'''===<br />
<br />
Can you think of alternative explanations to causality? Confounding factors? Or explanations consistent with causality? Mediating factors?<br />
<br />
<br />
'''DISCUSSION'''<br />
<br />
<br />
I think that there could be a mediating factor leading to unnecessary health care costs. This mediating factor could be the secondary reactions (side effects) of the drug. In some cases the secondary reactions matter a great deal because people can spend a lot of money as a result of drug complications. I agree with the causal argument of this article because the authors do not accept what the president of Pfizer said regarding the unspoken truth about advertising constituting one of the largest and most successful public health campaigns in US history. Also, a confounding factor may exist which leads people to require additional prescriptions. One example is gender: in some cases secondary reactions differ between men and women. <br />
<br />
<br />
==='''Q3'''=== <br />
<br />
Have any confounding factors been accounted for in the analysis?<br />
<br />
<br />
'''DISCUSSION'''<br />
<br />
No, confounding factors were not accounted for. They emphasize the type of advertising. <br />
<br />
<br />
==='''Q4'''===<br />
<br />
Have any mediating factors been controlled for in a way that vitiates a causal interpretation of the relationship?<br />
<br />
<br />
'''DISCUSSION'''<br />
<br />
<br />
The article does not mention anything about controlling for mediating factors. Also, the authors did not present a causal interpretation. There was no mention of a relationship between the type of drug and the case group. <br />
<br />
<br />
==='''Q5'''===<br />
<br />
What is your personal assessment of the evidence for causality in the study that is the subject of the article?<br />
<br />
<br />
'''DISCUSSION'''<br />
<br />
<br />
When I read the article I expected it to be more interesting, but I was disappointed by the lack of information. The authors don't support their claims. In other words, they don't prove what they write, and they give only a very general point of view. Instead of helping us understand their argument, they leave us with too many questions. <br />
<br />
Laura: I found the topic of the article quite interesting, but like Luis, felt the article was lacking in substantive information. <br />
<br />
<br />
===QUESTIONS 2 PART===<br />
<br />
Q3: A survey of students at York reveals that the average class size of the classes they attend is 130. A survey of faculty shows an average class size of 30. The students must be exaggerating their class sizes or the faculty under-reporting.<br />
<br />
A3: This could be an instance of sampling bias. Maybe all of the students surveyed were coming from the same 130-person class, while the professors surveyed were all coming from smaller classes. It could also reflect the difference between surveying 1st and 2nd year courses vs. 3rd year courses, etc.<br />
<br />
Q6: If smoking really is bad for your health, you expect a comparison of a group of people who have quit smoking with a group that have continued to reveal that the group quitting is, on average, healthier than the group that continued.<br />
<br />
A6: Maybe the people who quit are quitting for health reasons (i.e. heavy smokers tend to quit while light smokers tend to continue smoking). Also, when you stop smoking your stress level may go up, which would decrease your overall level of health.<br />
<br />
Q9: If you want to reduce the number of predictor variables in a model, a technique like forward stepwise regression will generally do a good job of identifying which variables you should keep.<br />
<br />
A9: For a regression model with n possible predictor variables, the first step involves evaluating n predictor variable subsets, each consisting of a single predictor variable, and selecting the one with the highest evaluation criterion. The next step selects from among n-1 subsets, the next step from n-2 subsets, and so on. It is not guaranteed to find the subset with the highest evaluation criterion.<br />
Stepwise selection also ignores the problem of multiple inference (performing more than one statistical inference procedure on the same data set), which can lead to erroneous conclusions. We should not rely solely on automated procedures: it may make sense on subject-matter grounds to include variables irrespective of their significance (e.g. confounders).<br />
<br />
Q12: In general we don’t need to worry about interactions between variables unless there is a correlation between them.<br />
<br />
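A12: Correlation measures the strength of the linear relationship between quantitative variables, but the relationship may not be linear. Also, inference about a correlation usually assumes the variables are normally distributed, which is not always the case. More fundamentally, an interaction means that the effect of one variable on the response depends on the level of another variable, and this can occur whether or not the two variables are correlated. <br />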
<br />
Q15: Statistical theory shows that the best way to impute a mid-term grade for a student who missed the test with a valid excuse is to use the predicted mid-term grade based on the other grades in the course.<br />
<br />
A15: The mid-term mark could be affected by numerous factors. What if the student was cheating on his/her assignment? What if the difficulty level of the mid-term is not equivalent to the other assigned work in the course?<br />
In addition, the other grades in the course are based on a variety of testing methods. Students do not necessarily perform equivalently across all testing methods. For example, John does well on assignments and presentations, but he performs poorly on tests.<br />
<br />
<br />
=='''ASSIGNMENT 3'''==<br />
<br />
===Q1===<br />
As we saw, 'Sector' appears to be an important predictor. Consider models using ses and Sector. Aim to estimate the between Sector gap as a function of ses if there is an interaction between Sector and ses. Check for and provide for a possible contextual effect of ses. Plot expected math achievement in each sector. Plot the gap with SEs. Consider the possibility that the apparently flatter effect of ses in Catholic school could be due to a non-linear effect of ses. How would you test whether this is a reasonable alternative explanation?<br />
<br />
The interaction between Sector and ses was significant (p<0.05) and the contextual effect of ses was also significant. The significant interaction implies that the effect of ses on math achievement differs between public and Catholic schools. <br />
<br />
wald( fitn, L.context )<br />
numDF denDF F.value p.value<br />
Contextual effect of ses 1 77 27.47158 <.00001<br />
<br />
<br />
[[File:graph10.jpg]]<br />
<br />
Figure 1: Plot of the gap between ses scores for Catholic and public schools<br />
<br />
[[File:graph7.jpg]]<br />
<br />
Figure 2: Plot of expected math achievement in each sector for ses levels of -0.5, 0 and 0.5<br />
<br />
To test whether the flatter effect of ses in Catholic schools could be due to a non-linear effect of ses, we would add a quadratic ses term to the model and check whether it is significant. If the quadratic term is significant, then the flatter effect of ses in Catholic schools may be a result of a non-linear effect of ses.<br />
<br />
<br />
===Q2===<br />
<br />
Take the example further by incorporating Sex. Consider the 'contextual effect' of Sex, which is school sex composition. Note that there are three types of schools: Girls, Boys and Coed schools. If you consider an interaction between Sector and school gender composition, you will see that the Public Sector only has Coed schools. What is the consequence of this fact for modelling sex composition and Sector effects? <br />
<br />
<br />
[[File:Q2(1).jpg]]<br />
<br />
<br />
[[File:Q2(3).jpg]]<br />
<br />
<br />
===Q3===<br />
<br />
Does it appear that boys are better off in a boy's school and girls in a girl's school or are they better off in coed schools? How would you qualify your findings so parents don't misinterpret them in making decisions for their children? <br />
<br />
<br />
<br />
===Q4===<br />
<br />
*Is a low ses child better off in a high ses school or are they better off in a school of a similar ses? How about a high ses child in a low ses school? How would you qualify your findings so parents don't misinterpret them in making decisions for their children?<br />
<br />
*<br />
<br />
[[File:1.jpeg]]<br />
<br />
<br />
<br />
<br />
===Q5===<br />
<br />
Is a minority status child better off in a school with a higher proportion of minority status children or are they better off in a school with a low proportion? How would you qualify your findings so parents don't misinterpret them in making decisions for their children? <br />
<br />
<br />
<br />
[[File:graph2.jpg]]<br />
<br />
<br />
[[File:graph3.jpg]]<br />
<br />
<br />
According to the first graphs, it appears better to send a child with low SES to a school with a low proportion of minority-status children; on the other hand, it appears better to send a child with higher SES to a school with a high proportion of minority-status children. The same holds for a child with low Mathach. <br />
<br />
<br />
[[File:graph1.jpg]]<br />
<br />
This graph helps in understanding the role of the proportion of minority status because it compares the new predictor Mathach with SES while showing the minority and majority gap together. Parents should be careful about basing a decision for their children on this graph, because it differs from the others: it introduces a new predictor, which leads to a new analysis of contextual effects.</div>Jyjlihttp://scs.math.yorku.ca/index.php/File:1.jpegFile:1.jpeg2011-03-16T02:03:14Z<p>Jyjli: </p>
<hr />
<div></div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Assignment_Teams/RubinMATH 6627 2010-11 Practicum in Statistical Consulting/Assignment Teams/Rubin2011-03-16T02:00:57Z<p>Jyjli: </p>
<hr />
<div>* http://en.wikipedia.org/wiki/Donald_Rubin<br />
=='''SIMPSON'S PARADOX'''==<br />
<br />
<br />
==='''DEFINITION'''===<br />
<br />
* An apparent paradox in which the association between two variables (X and Y) changes when a third variable (Z) is taken into account.<br />
<br />
* Z is called a confounding factor.<br />
<br />
<br />
==='''CONFOUNDING FACTOR'''===<br />
<br />
A confounding factor is associated with both the outcome variable and the primary risk factor (or independent variable) of interest.<br />
<br />
<br />
[[File:SP1.jpg]]<br />
<br />
<br />
==='''EXAMPLE I: UNIVERSITY ADMISSION'''===<br />
<br />
The University of California, Berkeley was sued for biased acceptance rates with respect to gender: men applying to graduate school were more likely to be accepted than women.<br />
<br />
<br />
[[File:sp2.jpg]]<br />
<br />
<br />
However, when the acceptance rates of men and women were compared within each individual department, there was actually a slight bias in favour of women.<br />
<br />
<br />
[[File:sp3.jpg]]<br />
<br />
<br />
[[File:sp4.jpg]]<br />
<br />
<br />
==='''EXAMPLE II: MALARIA INFECTION'''===<br />
<br />
Male gender as a risk factor for malaria.<br />
<br />
<br />
[[File:sp5.jpg]]<br />
<br />
<br />
Odds Ratio = 1.7 (P<0.05).<br />
<br />
Confounder vs exposure<br />
<br />
<br />
[[File:sp6.jpg]]<br />
<br />
<br />
Odds Ratio = 7.8<br />
<br />
Confounder vs outcome<br />
<br />
<br />
[[File:sp7.jpg]]<br />
<br />
<br />
Odds Ratio = 5.3<br />
<br />
<br />
[[File:sp8.jpg]]<br />
<br />
<br />
Odds Ratio Outdoor Occupation = 1.06<br />
Odds Ratio Indoor Occupation = 1.00<br />
<br />
<br />
[[File:sp9.jpg]]<br />
<br />
<br />
=='''LATTICE'''==<br />
<br />
[[Media:LATTICE.pdf.pdf]]<br />
<br />
<br />
=='''ASSIGNMENT 2'''==<br />
<br />
'''Pulse Article: For our own good''' [http://www.math.yorku.ca/~georges/Courses/2565/StatisticsInTheNews030926.html].<br />
<br />
<br />
==='''Q1'''===<br />
<br />
Does the article suggest a causal relationship between two variables? If so, which? Are the data observational or experimental?<br />
<br />
<br />
'''DISCUSSION''' <br />
<br />
Yes, this article suggests a causal relationship between the type of advertising and unnecessary prescriptions. It also places heavy emphasis on the claim that drug companies spend too much money on direct-to-consumer advertising of prescription medicines, which may lead to health care costs associated with unnecessary prescriptions. The data appear to be observational because the article does not mention that people were assigned to be exposed or not exposed to advertising. <br />
<br />
<br />
==='''Q2'''===<br />
<br />
Can you think of alternative explanations to causality? Confounding factors? Or explanations consistent with causality? Mediating factors?<br />
<br />
<br />
'''DISCUSSION'''<br />
<br />
<br />
I think that there could be a mediating factor leading to unnecessary health care costs. This mediating factor could be the secondary reactions (side effects) of the drug. In some cases the secondary reactions matter a great deal because people can spend a lot of money as a result of drug complications. I agree with the causal argument of this article because the authors do not accept what the president of Pfizer said regarding the unspoken truth about advertising constituting one of the largest and most successful public health campaigns in US history. Also, a confounding factor may exist which leads people to require additional prescriptions. One example is gender: in some cases secondary reactions differ between men and women. <br />
<br />
<br />
==='''Q3'''=== <br />
<br />
Have any confounding factors been accounted for in the analysis?<br />
<br />
<br />
'''DISCUSSION'''<br />
<br />
No, confounding factors were not accounted for. They emphasize the type of advertising. <br />
<br />
<br />
==='''Q4'''===<br />
<br />
Have any mediating factors been controlled for in a way that vitiates a causal interpretation of the relationship?<br />
<br />
<br />
'''DISCUSSION'''<br />
<br />
<br />
The article does not mention anything about controlling for mediating factors. Also, the authors did not present a causal interpretation. There was no mention of a relationship between the type of drug and the case group. <br />
<br />
<br />
==='''Q5'''===<br />
<br />
What is your personal assessment of the evidence for causality in the study that is the subject of the article?<br />
<br />
<br />
'''DISCUSSION'''<br />
<br />
<br />
When I read the article I expected it to be more interesting, but I was disappointed by the lack of information. The authors don't support their claims. In other words, they don't prove what they write, and they give only a very general point of view. Instead of helping us understand their argument, they leave us with too many questions. <br />
<br />
Laura: I found the topic of the article quite interesting, but like Luis, felt the article was lacking in substantive information. <br />
<br />
<br />
===QUESTIONS 2 PART===<br />
<br />
Q3: A survey of students at York reveals that the average class size of the classes they attend is 130. A survey of faculty shows an average class size of 30. The students must be exaggerating their class sizes or the faculty under-reporting.<br />
<br />
A3: This could be an instance of sampling bias. Maybe all of the students surveyed were coming from the same 130-person class, while the professors surveyed were all coming from smaller classes. It could also reflect the difference between surveying 1st and 2nd year courses vs. 3rd year courses, etc.<br />
<br />
Q6: If smoking really is bad for your health, you expect a comparison of a group of people who have quit smoking with a group that have continued to reveal that the group quitting is, on average, healthier than the group that continued.<br />
<br />
A6: Maybe the people who quit are quitting for health reasons (i.e. heavy smokers tend to quit while light smokers tend to continue smoking). Also, when you stop smoking your stress level may go up, which would decrease your overall level of health.<br />
<br />
Q9: If you want to reduce the number of predictor variables in a model, a technique like forward stepwise regression will generally do a good job of identifying which variables you should keep.<br />
<br />
A9: For a regression model with n possible predictor variables, the first step involves evaluating n predictor variable subsets, each consisting of a single predictor variable, and selecting the one with the highest evaluation criterion. The next step selects from among n-1 subsets, the next step from n-2 subsets, and so on. It is not guaranteed to find the subset with the highest evaluation criterion.<br />
Stepwise selection also ignores the problem of multiple inference (performing more than one statistical inference procedure on the same data set), which can lead to erroneous conclusions. We should not rely solely on automated procedures: it may make sense on subject-matter grounds to include variables irrespective of their significance (e.g. confounders).<br />
<br />
Q12: In general we don’t need to worry about interactions between variables unless there is a correlation between them.<br />
<br />
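A12: Correlation measures the strength of the linear relationship between quantitative variables, but the relationship may not be linear. Also, inference about a correlation usually assumes the variables are normally distributed, which is not always the case. More fundamentally, an interaction means that the effect of one variable on the response depends on the level of another variable, and this can occur whether or not the two variables are correlated. <br />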
<br />
Q15: Statistical theory shows that the best way to impute a mid-term grade for a student who missed the test with a valid excuse is to use the predicted mid-term grade based on the other grades in the course.<br />
<br />
A15: The mid-term mark could be affected by numerous factors. What if the student was cheating on his/her assignment? What if the difficulty level of the mid-term is not equivalent to the other assigned work in the course?<br />
In addition, the other grades in the course are based on a variety of testing methods. Students do not necessarily perform equivalently across all testing methods. For example, John does well on assignments and presentations, but he performs poorly on tests.<br />
<br />
<br />
=='''ASSIGNMENT 3'''==<br />
<br />
===Q1===<br />
As we saw, 'Sector' appears to be an important predictor. Consider models using ses and Sector. Aim to estimate the between Sector gap as a function of ses if there is an interaction between Sector and ses. Check for and provide for a possible contextual effect of ses. Plot expected math achievement in each sector. Plot the gap with SEs. Consider the possibility that the apparently flatter effect of ses in Catholic school could be due to a non-linear effect of ses. How would you test whether this is a reasonable alternative explanation?<br />
<br />
The interaction between Sector and ses was significant (p<0.05), and the contextual effect of ses was also significant. The significant interaction implies that the effect of ses on math achievement differs between public and Catholic schools. <br />
<br />
wald( fitn, L.context )<br />
numDF denDF F.value p.value<br />
Contextual effect of ses 1 77 27.47158 <.00001<br />
<br />
<br />
[[File:graph10.jpg]]<br />
<br />
Figure 1: Plot of the gap in math achievement between Catholic and public schools as a function of ses<br />
<br />
[[File:graph7.jpg]]<br />
<br />
Figure 2: Plot of expected math achievement in each sector for ses levels of -0.5, 0 and 0.5<br />
<br />
To test whether the flatter effect of ses in Catholic schools could be due to a non-linear effect of ses, we would add a quadratic ses term to the model and check whether it is significant. If the quadratic term is significant, the flatter effect of ses in Catholic schools may be a result of a non-linear effect of ses; it is then also worth checking whether the Sector by ses interaction remains significant once the quadratic term is in the model.<br />
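The idea of testing a quadratic term can be sketched in a simplified setting. The example below is only an illustration under stated assumptions: it uses made-up data (`ses`, `mathach`), an ordinary least-squares fit rather than the mixed model used in the assignment, and a hypothetical helper `fit_polynomial`. It compares the residual sum of squares with and without the quadratic term and forms the usual F statistic on 1 and n - 3 degrees of freedom.<br />

```python
def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit; returns (coefficients, residual sum of squares)."""
    p = degree + 1
    # Augmented normal equations (X'X | X'y) for design columns 1, x, x^2, ...
    aug = [
        [sum(xi ** (i + j) for xi in x) for j in range(p)]
        + [sum((xi ** i) * yi for xi, yi in zip(x, y))]
        for i in range(p)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coefs = [0.0] * p
    for i in reversed(range(p)):
        rest = sum(aug[i][j] * coefs[j] for j in range(i + 1, p))
        coefs[i] = (aug[i][p] - rest) / aug[i][i]
    fitted = [sum(b * xi ** k for k, b in enumerate(coefs)) for xi in x]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return coefs, rss


# Made-up data with a curved ses effect (not the high school data set).
ses = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
noise = [0.1, -0.2, 0.15, -0.1, 0.05, -0.15, 0.2, -0.05]
mathach = [12 + 3 * s - s * s + e for s, e in zip(ses, noise)]

_, rss_linear = fit_polynomial(ses, mathach, 1)
coefs, rss_quadratic = fit_polynomial(ses, mathach, 2)

# F statistic for the added quadratic term.
n = len(ses)
f_stat = (rss_linear - rss_quadratic) / (rss_quadratic / (n - 3))
print("quadratic coefficient:", coefs[2])
print("F statistic for quadratic term:", f_stat)
```

A large F statistic relative to the F(1, n - 3) distribution suggests curvature in the ses effect. In the actual analysis the same comparison would be made inside the mixed model, with the quadratic term added alongside Sector and the contextual ses effect.<br />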
<br />
<br />
===Q2===<br />
<br />
Take the example further by incorporating Sex. Consider the the 'contextual effect' of Sex which is school sex composition. Note that there are three types of schools: Girls, Boys and Coed schools. If you consider an interaction between Sector and school gender composition, you will see that the Public Sector only has Coed schools. What is the consequence of this fact for modelling sex composition and Sector effects. <br />
<br />
<br />
[[File:Q2(1).jpg]]<br />
<br />
<br />
[[File:Q2(3).jpg]]<br />
<br />
<br />
===Q3===<br />
<br />
Does it appear that boys are better off in a boy's school and girls in a girl's school or are they better off in coed schools? How would you qualify your findings so parents don't misinterpret them in making decisions for their children? <br />
<br />
<br />
<br />
===Q4===<br />
<br />
*Is a low ses child better off in a high ses school or are they better off in a school of a similar ses? How about a high ses child in a low ses school? How would you qualify your findings so parents don't misinterpret them in making decisions for their children?<br />
<br />
*<br />
<br />
<br />
===Q5===<br />
<br />
Is a minority status child better off in a school with a higher proportion of minority status children or are they better off in a school with a low proportion? How would you qualify your findings so parents don't misinterpret them in making decisions for their children? <br />
<br />
<br />
<br />
[[File:graph2.jpg]]<br />
<br />
<br />
[[File:graph3.jpg]]<br />
<br />
<br />
According to the first graphs, a child with low SES appears to do better in a school with a low proportion of minority-status students, whereas a child with higher SES appears to do better in a school with a high proportion of minority-status students. The same pattern holds for a child with low Mathach.<br />
<br />
<br />
[[File:graph1.jpg]]<br />
<br />
This graph helps in understanding the role of the proportion of minority-status students, because it shows Mathach against SES together with the gap between minority and majority students. Parents should be careful about basing a decision for their children on this graph: it differs from the others because it introduces a new predictor and therefore a new analysis of contextual effects.</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-03-12T04:08:20Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master's student in Applied Statistics. I earned my Bachelor's degree in Actuarial Science, and I have some background in Statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption in different countries. In reality, these countries could differ in all sorts of ways other than this factor. We know that the ideal solution, introduced by Fisher, is the experiment. What is Fisher's idea for an experiment? <br />
*A: 1) Make sure all the groups being compared are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare the life expectancy of these groups: if the groups with lower cigarette consumption have higher life expectancy, the difference can be attributed to cigarette consumption.<br />
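Step 2 of the answer, random assignment, can be sketched as follows. This is a minimal illustration only: the subject labels, the three consumption levels and the helper `randomize` are hypothetical and do not come from the course notes.<br />

```python
import random


def randomize(subjects, n_groups, seed=0):
    """Randomly split subjects into n_groups of (nearly) equal size."""
    rng = random.Random(seed)      # fixed seed so the allocation is reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    # Deal the shuffled subjects round-robin into the groups.
    return [shuffled[i::n_groups] for i in range(n_groups)]


subjects = [f"subject_{i:02d}" for i in range(12)]
groups = randomize(subjects, n_groups=3)
for label, members in zip(["none", "light", "heavy"], groups):
    print(label, members)
```

Because chance alone decides who lands in which group, the groups differ systematically only in the assigned treatment.<br />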
<br />
===Week 2===<br />
*Q: From the example of studying the inheritance of height from father to son, and the graph of the ellipse, we find that the SD line is not the regression line of son's height on father's height: the deviation of the son's expected height from the mean is only a proportion of the deviation of the father's height from the mean. Why do we use ellipses?<br />
*A: By using ellipses, we can easily derive the formulas of statistical theory and models. In addition, ellipses drawn on scatterplots and other graphs help with data analysis, for example in studying the correlation between variables, judging how significant a variable is, and calculating confidence intervals. <br />
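The difference between the SD line and the regression line mentioned in the question can be checked numerically. The sketch below uses made-up heights (in inches) rather than the data from class: the SD line has slope sd(son)/sd(father), while the regression line has slope r times that, so it is flatter whenever |r| < 1.<br />

```python
from statistics import mean, stdev

# Made-up father and son heights, for illustration only.
father = [65, 66, 67, 68, 69, 70, 71, 72]
son = [67, 66, 69, 68, 70, 69, 72, 70]

mf, ms = mean(father), mean(son)
sf, ss = stdev(father), stdev(son)
n = len(father)

# Sample correlation coefficient.
r = sum((f - mf) * (s - ms) for f, s in zip(father, son)) / ((n - 1) * sf * ss)

sd_line_slope = ss / sf          # slope of the SD line
regression_slope = r * ss / sf   # slope of the regression of son on father


def predicted_son(height):
    """Expected son's height from the regression line."""
    return ms + regression_slope * (height - mf)


print("r =", r)
print("SD line slope =", sd_line_slope)
print("regression slope =", regression_slope)
print("predicted son of a 72-inch father =", predicted_son(72))
```

The predicted son of a tall father is above average, but by less than the SD line would suggest, which is regression toward the mean.<br />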
<br />
===Week 3===<br />
*Q: What is beta space, and why do we want to look at our data in beta space?<br />
*A: In data space, the axes are variables and the points are observations. To better understand hierarchical data, we want a more natural display of our models, in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients. <br />
<br />
===Week 4===<br />
*Q: Think about the strategy with observational data: what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the three tools below. The first is statistical control: include the Zs in a statistical model and adjust statistically. Modern thinking: we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail with the wrong model. The second is matching: this technique only performs comparisons between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, if you have the Zs. The third is structural methods, which build a structural causal model.<br />
<br />
===Week 5===<br />
*Q: From the example data set, Health as predicted by Weight and Height, we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?<br />
*A: The first type has typical values for the predictors but an atypical Y: it has little impact on beta-hat, increases the size of confidence intervals and decreases power. The second type has atypical values for the predictors but a Y consistent with the other data: it has little impact on beta-hat and shrinks confidence intervals, which creates a false sense of power if the point is not valid. The third type has atypical values for the predictors and a Y not consistent with the other data: it has a large impact on beta-hat, could shrink or expand CIs, and makes a mess of everything.<br />
<br />
===Week 6===<br />
*Q: In the hierarchical data (high school) example, what is the difference between Robinson's Paradox and Simpson's Paradox?<br />
*A: Robinson's Paradox refers to the fact that the within-group slope (Beta-W) and the between-group slope (Beta-B) can have different signs. Simpson's Paradox refers to the fact that the within-group slope (Beta-W) and the pooled slope (Beta-P) can have different signs.<br />
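A small numerical example may make the three slopes concrete. The sketch below assumes that W, B and P stand for the within-group, between-group and pooled slopes, and uses three made-up "schools" in which the slope inside every school is negative while the school means rise together.<br />

```python
def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Made-up data: school -> (x values, y values).
schools = {
    "A": ([1, 2, 3], [5, 4, 3]),
    "B": ([4, 5, 6], [8, 7, 6]),
    "C": ([7, 8, 9], [11, 10, 9]),
}

# Pooled slope: ignore the grouping entirely.
all_x = [v for xs, _ in schools.values() for v in xs]
all_y = [v for _, ys in schools.values() for v in ys]
beta_pooled = slope(all_x, all_y)

# Between slope: regress school means of y on school means of x.
mean_x = [sum(xs) / len(xs) for xs, _ in schools.values()]
mean_y = [sum(ys) / len(ys) for _, ys in schools.values()]
beta_between = slope(mean_x, mean_y)

# Within slope: regress deviations from school means on each other.
dev_x = [v - sum(xs) / len(xs) for xs, _ in schools.values() for v in xs]
dev_y = [v - sum(ys) / len(ys) for _, ys in schools.values() for v in ys]
beta_within = slope(dev_x, dev_y)

print("within:", beta_within, "between:", beta_between, "pooled:", beta_pooled)
```

Here the within slope is negative while both the between and pooled slopes are positive, so the same data show the sign reversals described in the answer.<br />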
<br />
===Week 7===<br />
*Q: Compare the Scheffé method/confidence interval with the Bonferroni method/confidence interval.<br />
*A: The Bonferroni adjustment applies very generally, but it suffers from the limitation that the number of simultaneous confidence intervals must be specified in advance in order to maintain the validity of the simultaneous coverage. The Scheffé method is an alternative that controls the simultaneous coverage over all contrasts in a subspace. If a large number of contrasts are of interest, or it is not known in advance which ones are of interest, the Scheffé method provides a way to “snoop” through all the possibilities while maintaining control of the coverage of confidence intervals and the false positive rates of tests. When the number of contrasts to be estimated is small, Bonferroni is better than Scheffé. As a rule of thumb, unless the number of desired contrasts is at least twice the number of factors, Scheffé will show wider confidence bands than Bonferroni.<br />
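The comparison can be written out with the usual interval formulas. The notation here is an assumption for illustration and is not taken from the course notes: m is the number of contrasts fixed in advance, d is the dimension of the space of contrasts, and ν is the error degrees of freedom.<br />

```latex
\begin{align*}
\text{Bonferroni:}\quad & \hat{\psi}_i \pm t_{1-\alpha/(2m),\,\nu}\,\mathrm{SE}(\hat{\psi}_i),
  \qquad i = 1,\dots,m \\
\text{Scheff\'e:}\quad & \hat{\psi} \pm \sqrt{d\,F_{1-\alpha;\,d,\,\nu}}\;\mathrm{SE}(\hat{\psi})
  \qquad \text{for every contrast } \psi \text{ in the subspace}
\end{align*}
```

The Bonferroni multiplier grows with m, while the Scheffé multiplier depends only on d, so Bonferroni gives narrower intervals for a few prespecified contrasts and Scheffé wins once m is large or the contrasts are chosen after looking at the data.<br />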
<br />
===Week 8===<br />
*Q: Briefly introduce the Wald test. What are the problems with the Wald test?<br />
*A: The Wald test is a way of testing the significance of particular explanatory variables in a statistical model. In logistic regression we have a binary outcome variable and one or more explanatory variables, and each explanatory variable in the model has an associated parameter. The Wald test is one of a number of ways of testing whether the parameters associated with a group of explanatory variables are zero. If the Wald test is significant for a particular explanatory variable, or group of explanatory variables, we conclude that the parameters associated with these variables are not zero, so the variables should be included in the model. If the Wald test is not significant, these explanatory variables can be omitted from the model. However, the Wald statistic has problems: for large coefficients the standard error is inflated, which lowers the Wald (chi-square) statistic, and the likelihood-ratio test is more reliable than the Wald test for small sample sizes.<br />
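For a single parameter, the Wald statistic is just the squared ratio of the estimate to its standard error. The sketch below uses a made-up estimate and standard error (not output from any model in this course) and a hypothetical helper `wald_test`, with the large-sample normal reference distribution.<br />

```python
from statistics import NormalDist


def wald_test(estimate, std_error):
    """Return (z, chi-square statistic, two-sided p-value) for H0: parameter = 0."""
    z = estimate / std_error
    chi_square = z * z                                # 1 degree of freedom
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # large-sample normal approximation
    return z, chi_square, p_value


# Made-up logistic regression coefficient and standard error.
z, w, p = wald_test(0.8, 0.25)
print("z =", z, "Wald chi-square =", w, "p =", p)
```

Because the statistic divides by the standard error, an inflated standard error for a very large coefficient shrinks the statistic, which is the weakness noted in the answer.<br />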
<br />
<br />
<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
Studying the 1993/1994 through 2008/2009 seasons, the researchers showed that the severity of infections with the Influenza A virus is related to its novelty. In addition, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus' hemagglutinin protein. This result may help to improve the ability to predict influenza severity accurately, so that scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. However, in reality there are many other sources of variation. For example, people differ in age, live in different areas and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime<br />
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can look for crime activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
According to the news report, "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Crimes are probably more likely at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice, and crimes occur throughout a city. It is also possible that criminals come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120 Weight Cited in Women's Heartburn]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wine and spirits has increased. The statistics show that at the start of the 1960s beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same period, wine consumption has increased threefold to 36 percent, while the intake of spirits has nearly doubled to 20 percent. Why have wine and spirits increased so much since the 1960s? The researchers noted that the increased consumption was likely affected by numerous factors, such as different age patterns in the population, increasing affluence and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people may also have come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and kids, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to these results from the Canadian Health Measures Survey (CHMS), released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. In addition, the survey mentions that young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. The survey also found that men are more physically active than women on average, and the same pattern appears among teenagers. More than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, the survey should take into account people whose jobs require heavy lifting or being out and about for most of the work day. That would be more than enough exercise for most people, and accounting for it would improve these numbers dramatically. Similarly, for teenagers, the survey should consider whether they work part-time while studying full-time.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1 More private liquor stores, more alcohol deaths?]<br />
<br />
Based on a study of 89 local areas of British Columbia, the researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales, a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. Over the same period, the number of government-run stores in the province declined slightly. The findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
===Week 6===<br />
* Video Games Are Good for Girls<br />
[http://www.sciencedaily.com/releases/2011/02/110201083341.htm Video Games Are Good for Girls]<br />
<br />
Researchers from Brigham Young University's School of Family Life conducted a study on video games and children between 11 and 16 years old. They found that girls who played video games with a parent enjoyed a number of advantages. A father who still hasn't given up video games now has some justification to keep on playing, if he has a daughter. The study involved 287 families with an adolescent child. The researchers measured several outcomes, such as positive behavior, aggression, family connection and mental health. The result is interesting: for boys, playing with a parent was not a statistically significant factor for any of the outcomes the researchers measured. Yet for girls, playing with a parent accounted for as much as 20 percent of the variation in those measured outcomes.<br />
<br />
===Week 7===<br />
[http://www.sciencedaily.com/releases/2011/03/110301091225.htm Who's the Best Tennis Player of All Time?]<br />
<br />
Ranking tennis players is a novel way to show how complex network analysis can reveal interesting facts hidden in statistical data. Male tennis players who played in at least one Association of Tennis Professionals match between 1968 and 2010 were evaluated through network analysis. The researchers ran an algorithm, similar to the one used by Google to rank Web pages, on digital data from hundreds of thousands of matches. The data were pulled from the Association of Tennis Professionals website. They quantified the importance of players and ranked them by a "tennis prestige" score. This score is determined by a player's competitiveness, the quality of his performance and his number of victories. Fans may think of Jimmy Connors as an "old school" tennis player, but according to the new ranking system he ranks on top, partly because he played for more than 20 years and had the opportunity to win a lot of matches against other very good players. "The rankings are a snapshot of who is at the top at this time," Radicchi said. "Players who have yet to retire are penalized with respect to those who have ended their careers. Prestige scores strongly correlate with the number of victories, and active players haven't played all the matches of their careers yet." Researching and ranking sports stars gives a glimpse of the power of complex network analysis. <br />
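To give a feel for how a PageRank-style score can rank players, here is a minimal sketch. It is not the published algorithm: the players, the match list, the damping value and the function `prestige` are all hypothetical. Each player passes a share of his current score to the players who beat him, so beating highly rated opponents is worth more.<br />

```python
def prestige(matches, damping=0.85, iterations=100):
    """PageRank-style scores from (winner, loser) pairs; scores sum to 1."""
    players = sorted({p for match in matches for p in match})
    n = len(players)
    score = {p: 1.0 / n for p in players}
    # For each player, the list of opponents who beat him.
    beaten_by = {p: [w for w, l in matches if l == p] for p in players}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in players}
        for loser in players:
            if beaten_by[loser]:
                share = damping * score[loser] / len(beaten_by[loser])
                for winner in beaten_by[loser]:
                    new[winner] += share
            else:
                # An undefeated player spreads his score evenly over everyone.
                for p in players:
                    new[p] += damping * score[loser] / n
        score = new
    return score


# Hypothetical results: A beats everyone, B beats C and D, C beats D.
matches = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]
scores = prestige(matches)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy tournament the ranking simply follows the win record, but with real data a win over a strong opponent counts for more than a win over a weak one.<br />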
<br />
Hi Jessica, I'm not sure why, but the link you have provided for the tennis article isn't working. I searched for the article in google and found it at the following link in the event somebody else is interested in viewing the article. --[[User:Lawarren|Lawarren]] 10:48, 9 March 2011 (EST)<br />
<br />
http://www.sciencedaily.com/releases/2011/03/110301091225.htm<br />
* thank you very much Lawarren.<br />
<br />
===Week 8===<br />
[http://www.traveldailynews.com/pages/show_page/42076-Hotel-capitalization-rates-fall-further Hotel capitalization rates fall further]<br />
<br />
In 2011, the USRC Hotel Investment survey predicted good growth in the hotel industry, based on data viewed in the context of longer-term trends. The data include parameters such as capitalization rates, discount rates, income and expense growth expectations, marketing time, debt parameters, and other data for both full-service and limited-service hotels. The researchers found that full-service discount rates are now at their lowest level since the Mid Year 2007 survey, just prior to rates beginning to creep up in 2008, and of course the subsequent economic and credit collapse of the Fall of 2008. However, they are now just 10 basis points higher than the record low seen in the Winter of 2007 survey, and are hovering very near the survey results of the last recovery years of 2005-2006, following the post 9-11 downturn. As yield requirements lessened, anticipated growth continued to increase, even beyond the very healthy movement seen in the Mid Year 2010 Survey. <br />
<br />
<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities in one country instead of selecting different countries, in order to reduce the differences in conditions.<br />
<br />
===Week 2===<br />
* Before I took this course, I had no idea what consulting involves or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realize that it is not easy to be a good consultant. Usually, the situation is very complicated, and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem, then pick the relationships among these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis, whether the relationship between y and x arises by chance, etc. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw the multiple regression model represented by a plane in data space, which is represented in beta space by a point showing the fitted slope with respect to Coffee and the fitted slope with respect to Stress. There we only needed 2 dimensions, but for some complex models we may need 3 or more dimensions for the beta space. I just wonder how, in these situations, we can draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need 3D or higher dimensional ellipse. This ellipse in higher dimension would be larger because it is trying to be correct for 3 or more parameters at the same time.-''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===<br />
* On controlling for all the Zs, the notes mention that we can use one of three techniques. The first one is statistical control. I am not sure about the meaning of the propensity score, although it is described as the "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]<br />
Thank you Constance!<br />
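As a small illustration of the comment above, the propensity score for a single categorical covariate can be estimated by the proportion treated in each stratum. This is only a sketch with made-up records; with several covariates one would normally fit a logistic regression of group membership on the covariates instead.<br />

```python
from collections import Counter

# Hypothetical observational records: (covariate stratum, treated? 1 or 0).
records = [
    ("young", 1), ("young", 1), ("young", 1), ("young", 0),
    ("old", 1), ("old", 0), ("old", 0), ("old", 0),
]

totals = Counter(z for z, _ in records)
treated = Counter(z for z, t in records if t == 1)

# Estimated propensity score: P(treated | stratum).
propensity = {z: treated[z] / totals[z] for z in totals}
print(propensity)
```

Unequal propensities across strata show that the covariate predicts group membership, which is the selection bias that adjustment or matching on the propensity score is meant to remove.<br />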
<br />
===Week 5===<br />
* In a multiple regression model, it is a challenge to determine the type of a canonical outlier and to decide how to deal with it, whether to remove it or not. The choice can change our results substantially.<br />
<br />
===Week 6===<br />
* From the slides on Hierarchical Models, it is good to know the idea of the two-stage approach (derived variables approach): estimate the slope and intercept for each cluster, then use them as a multivariate sample and do a MANOVA test.<br />
<br />
===Week 7===<br />
*How does the Scheffe method connect with the F-test? <br />
*The F-test for the model is significant if and only if some contrast in the model space has a significant Scheffe confidence interval. If the F-test is not significant, then none of the Scheffe confidence intervals will be significant.<br />
<br />
===Week 8===<br />
*TED Talks on Statistics <br />
*[http://www.kdnuggets.com/2010/11/ted-talks-lies-damned-lies-and-statistics.html Sebastian Wernicke: Lies, damned lies and statistics]<br />
*[http://www.ted.com/talks/chris_jordan_pictures_some_shocking_stats.html Chris Jordan pictures some shocking stats]<br />
*Official Statistics and Statistical Ethics : Selected Issues [http://pdfcast.org/pdf/official-statistics-and-statistical-ethics-selected-issues]</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-03-12T04:00:39Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master's student in Applied Statistics. I earned my Bachelor's degree in Actuarial Science, and I have some background in Statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption in different countries. In reality, these countries could differ in all sorts of ways other than this factor. We know that the ideal solution, introduced by Fisher, is the experiment. What is Fisher's idea for an experiment? <br />
*A: 1) Make sure all the groups being compared are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare the life expectancy of these groups: if the groups with lower cigarette consumption have higher life expectancy, the difference can be attributed to cigarette consumption.<br />
<br />
===Week 2===<br />
*Q: From the example of studying the inheritance of height from father to son, and the graph of the ellipse, we find that the SD line is not the regression line of son's height on father's height: the deviation of the son's expected height from the mean is only a proportion of the deviation of the father's height from the mean. Why do we use ellipses?<br />
*A: By using ellipses, we can easily derive the formulas of statistical theory and models. In addition, ellipses drawn on scatterplots and other graphs help with data analysis, for example in studying the correlation between variables, judging how significant a variable is, and calculating confidence intervals. <br />
<br />
===Week 3===<br />
*Q: What is beta space, and why do we want to look at our data in beta space?<br />
*A: In data space, the axes are variables and the points are observations. To better understand hierarchical data, we want a more natural display of our models, in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients. <br />
<br />
===Week 4===<br />
*Q: Think about the strategy with observational data: what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the three tools below. The first is statistical control: include the Zs in a statistical model and adjust statistically. Modern thinking: we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail with the wrong model. The second is matching: this technique only performs comparisons between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, if you have the Zs. The third is structural methods, which build a structural causal model.<br />
<br />
===Week 5===<br />
*Q: From the example data set, Health as predicted by Weight and Height, we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?<br />
*A: The first type has typical values for the predictors but an atypical Y: it has little impact on beta-hat, increases the size of confidence intervals and decreases power. The second type has atypical values for the predictors but a Y consistent with the other data: it has little impact on beta-hat and shrinks confidence intervals, which creates a false sense of power if the point is not valid. The third type has atypical values for the predictors and a Y not consistent with the other data: it has a large impact on beta-hat, could shrink or expand CIs, and makes a mess of everything.<br />
<br />
===Week 6===<br />
*Q: In the hierarchical data (high school) example, what is the difference between Robinson's Paradox and Simpson's Paradox?<br />
*A: Robinson's Paradox refers to the fact that the within-group slope (Beta-W) and the between-group slope (Beta-B) can have different signs. Simpson's Paradox refers to the fact that the within-group slope (Beta-W) and the pooled slope (Beta-P) can have different signs.<br />
<br />
===Week 7===<br />
*Q: Compare the Scheffé method/confidence interval with the Bonferroni method/confidence interval.<br />
*A: The Bonferroni adjustment applies very generally, but it suffers from the limitation that the number of simultaneous confidence intervals must be specified in advance in order to maintain the validity of the simultaneous coverage. The Scheffé method is an alternative that controls the simultaneous coverage over all contrasts in a subspace. If a large number of contrasts are of interest, or it is not known in advance which ones are of interest, the Scheffé method provides a way to “snoop” through all the possibilities while maintaining control of the coverage of confidence intervals and the false positive rates of tests. When the number of contrasts to be estimated is small, Bonferroni is better than Scheffé. As a rule of thumb, unless the number of desired contrasts is at least twice the number of factors, Scheffé will show wider confidence bands than Bonferroni.<br />
<br />
===Week 8===<br />
*Q: Briefly introduce the Wald test. What are the problems with the Wald test?<br />
*A: The Wald test is a way of testing the significance of particular explanatory variables in a statistical model. In logistic regression we have a binary outcome variable and one or more explanatory variables, and each explanatory variable in the model has an associated parameter. The Wald test is one of a number of ways of testing whether the parameters associated with a group of explanatory variables are zero. If the Wald test is significant for a particular explanatory variable, or group of explanatory variables, we conclude that the parameters associated with these variables are not zero, so the variables should be included in the model. If the Wald test is not significant, these explanatory variables can be omitted from the model. However, the Wald statistic has problems: for large coefficients the standard error is inflated, which lowers the Wald (chi-square) statistic, and the likelihood-ratio test is more reliable than the Wald test for small sample sizes.<br />
<br />
<br />
<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
Based on data from the 1993/1994 through 2008/2009 seasons, the research shows that the severity of infections with the influenza A virus is related to its novelty. In particular, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus's hemagglutinin protein. It is thought that this result can help to improve the ability to predict influenza severity accurately, so that scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. In reality, however, there are many other sources of variation. For example, people differ in age, in where they live, and in their health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime<br />
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can look for crime activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
In the news report, the people interviewed said: "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Obviously, crimes are more likely at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice. Crimes occur all over the city, and the criminals may also come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120 Drinkers Down Under switching from beer to wine]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wines and spirits has increased. The statistics show that at the start of the 1960s beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same period, wine consumption has increased threefold, to 36 percent, while the intake of spirits has nearly doubled, to 20 percent. Why have wine and spirits increased so much over the past forty years? The researchers noted that increased consumption was likely to have been affected by numerous factors, such as different age patterns in the population, increasing affluence and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people may also have come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and kids, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results from the Canadian Health Measures Survey (CHMS), released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. Young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. In addition, the survey found that men are more physically active than women on average. The same pattern appears among teenagers. More than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. In my view, the survey should take into account people whose jobs require heavy lifting or being out and about for most of the work day. This would be more than enough exercise for most people and would improve these numbers dramatically. Similarly, for teenagers, the survey should also examine whether full-time students are working part-time.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1 More private liquor stores, more alcohol deaths?]<br />
<br />
Based on a study of 89 local areas of British Columbia, the researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales, a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. Over the same period, the number of government-run stores in the province declined slightly. The findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
===Week 6===<br />
* Video Games Are Good for Girls<br />
[http://www.sciencedaily.com/releases/2011/02/110201083341.htm Video Games Are Good for Girls]<br />
<br />
Researchers from Brigham Young University's School of Family Life conducted a study on video games and children between 11 and 16 years old. They found that girls who played video games with a parent enjoyed a number of advantages. A father who still hasn't given up video games now has some justification to keep on playing, if he has a daughter. The study involved 287 families with an adolescent child. The researchers measured several outcomes, such as positive behavior, aggression, family connection and mental health. The result is interesting: for boys, playing with a parent was not a statistically significant factor for any of the outcomes the researchers measured. Yet for girls, playing with a parent accounted for as much as 20 percent of the variation in those measured outcomes.<br />
<br />
===Week 7===<br />
[http://www.sciencedaily.com/releases/2011/03/110301091225.htm|Who's the Best Tennis Player of All Time?]<br />
<br />
Ranking tennis players is a novel way to show how complex network analysis can reveal interesting facts hidden in statistical data. Male tennis players who played in at least one Association of Tennis Professionals match between 1968 and 2010 were evaluated through network analysis. The researchers ran an algorithm, similar to the one used by Google to rank Web pages, on digital data from hundreds of thousands of matches. The data was pulled from the Association of Tennis Professionals website. They quantified the importance of players and ranked them by a "tennis prestige" score. This score is determined by a player's competitiveness, the quality of his performance and number of victories. Fans may think of Jimmy Connors as an "old school" tennis player, but according to the new ranking system he ranks on top. One of the reasons is that he played for more than 20 years and had the opportunity to win a lot of matches against other very good players. "The rankings are a snapshot of who is at the top at this time," Radicchi said. "Players who have yet to retire are penalized with respect to those who have ended their careers. Prestige scores strongly correlate with the number of victories, and active players haven't played all the matches of their careers yet." Researching and ranking sports stars gives a glimpse at the power of complex network analysis.<br />
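<br />
To make the idea of a Google-style ranking concrete, here is a minimal sketch of a PageRank-like score on a toy match graph. The summary above only says the algorithm is similar to the one used to rank Web pages, so everything specific here is an assumption rather than the researchers' method: each loss sends score from the loser to the winner, the damping factor is 0.85, and an unbeaten player's score is spread evenly over all players. The players and results are hypothetical.<br />

```python
def prestige_scores(matches, damping=0.85, iterations=100):
    """PageRank-style scores; each match passes score from loser to winner."""
    players = sorted({p for match in matches for p in match})
    # For each player, the winners of the matches that player lost.
    # Repeated losses to the same opponent appear repeatedly.
    beaten_by = {p: [winner for winner, loser in matches if loser == p]
                 for p in players}
    scores = {p: 1 / len(players) for p in players}
    for _ in range(iterations):
        new = {p: (1 - damping) / len(players) for p in players}
        for loser in players:
            winners = beaten_by[loser]
            # An unbeaten player has no outgoing edges; spread that score evenly.
            targets = winners if winners else players
            share = damping * scores[loser] / len(targets)
            for target in targets:
                new[target] += share
        scores = new
    return scores


if __name__ == "__main__":
    # Hypothetical results as (winner, loser) pairs.
    toy_matches = [("A", "B"), ("A", "C"), ("B", "C")]
    for player, score in sorted(prestige_scores(toy_matches).items()):
        print(player, round(score, 3))
```

In this toy example player A, who beat both opponents, receives the highest score, and the scores sum to one. A player with few recorded matches has few chances to collect score, which is the sense in which active players are penalized.<br />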
<br />
Hi Jessica, I'm not sure why, but the link you have provided for the tennis article isn't working. I searched for the article in google and found it at the following link in the event somebody else is interested in viewing the article. --[[User:Lawarren|Lawarren]] 10:48, 9 March 2011 (EST)<br />
<br />
http://www.sciencedaily.com/releases/2011/03/110301091225.htm<br />
* thank you very much Lawarren.<br />
<br />
===Week 8===<br />
[http://www.traveldailynews.com/pages/show_page/42076-Hotel-capitalization-rates-fall-further Hotel capitalization rates fall further]<br />
<br />
The USRC Hotel Investment survey released in 2011 predicts good growth in the hotel industry, based on data viewed in the context of longer-term trends. The data include parameters such as capitalization rates, discount rates, income and expense growth expectations, marketing time, debt parameters, and other data for both full-service and limited-service hotels. The researchers found that full-service discount rates are now at their lowest level since the Mid Year 2007 survey, just prior to rates beginning to creep up in 2008, and of course the subsequent economic and credit collapse of the Fall of 2008. However, they are now just 10 basis points higher than the record low seen in the Winter of 2007 survey, and are hovering very near the survey results of the last recovery years of 2005-2006, following the post 9-11 downturn. As yield requirements lessened, anticipated growth continued to increase, even beyond the very healthy movement we saw in the last Mid Year 2010 Survey.<br />
<br />
<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities in one country instead of from different countries, in order to reduce the differences in conditions.<br />
<br />
===Week 2===<br />
* Before I took this course, I had no idea what consultants do or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realized that it is not easy to be a good consultant. Usually the situation is very complicated, and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem and then pick the relationships among these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis and whether y appears to be caused by x only by chance, etc. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw the multiple regression model represented by a plane in data space, which is represented in beta space by a point showing the fitted slope with respect to Coffee and the fitted slope with respect to Stress. Here we only need 2 dimensions, but for more complex models we may need 3 or more dimensions for the beta space. I wonder how, in those situations, we can draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need 3D or higher dimensional ellipse. This ellipse in higher dimension would be larger because it is trying to be correct for 3 or more parameters at the same time.-''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===<br />
* To control for all the Zs, the lecture mentions that we can use one of three techniques. The first one is statistical control. I am not sure what a propensity score means, although it is described as the "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]<br />
Thank you Constance!<br />
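<br />
A minimal sketch of the propensity score described in the comment above, assuming a single categorical covariate. In that special case the fitted probabilities from a saturated logistic regression of group on the covariate coincide with the proportion treated within each level, so no model fitting is needed. The data and the function name are hypothetical.<br />

```python
from collections import defaultdict


def propensity_by_stratum(records):
    """Estimate P(treated | z) as the treated fraction within each level of z.

    records: iterable of (z, treated) pairs with treated equal to 0 or 1.
    """
    counts = defaultdict(lambda: [0, 0])  # z -> [number treated, total]
    for z, treated in records:
        counts[z][0] += treated
        counts[z][1] += 1
    return {z: treated / total for z, (treated, total) in counts.items()}


if __name__ == "__main__":
    # Hypothetical observational data: (smoking status, received treatment).
    data = ([("smoker", 1)] * 30 + [("smoker", 0)] * 10
            + [("non-smoker", 1)] * 10 + [("non-smoker", 0)] * 30)
    print(propensity_by_stratum(data))
```

In a randomized study the scores for the two levels would be about equal. Here they differ (0.75 for smokers and 0.25 for non-smokers), which signals that group allocation is confounded with the covariate.<br />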
<br />
===Week 5===<br />
* In a multiple regression model, it is a challenge to determine the type of a canonical outlier and to decide how to deal with these outliers, in particular whether or not to remove them. The choice can change our results substantially.<br />
<br />
===Week 6===<br />
* From the slides on Hierarchical Models, it is good to know the idea of the two-stage approach (derived variables approach): estimate the slope and intercept for each cluster, then use the estimates as a multivariate sample and do a MANOVA test.<br />
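<br />
The first stage of the two-stage approach can be sketched as follows: fit a separate least-squares line within each cluster and keep the (intercept, slope) pairs as the derived variables. The second stage (for example a MANOVA on those pairs) is not shown. The data and the function names are hypothetical.<br />

```python
def ols_line(xs, ys):
    """Least-squares intercept and slope for one cluster."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def stage_one(clusters):
    """Map each cluster name to its (intercept, slope) pair."""
    return {name: ols_line(xs, ys) for name, (xs, ys) in clusters.items()}


if __name__ == "__main__":
    # Hypothetical data: two schools, each with (x values, y values).
    schools = {
        "school_1": ([0, 1, 2, 3], [1.0, 3.0, 5.0, 7.0]),
        "school_2": ([0, 1, 2, 3], [4.0, 3.5, 3.0, 2.5]),
    }
    for name, (intercept, slope) in stage_one(schools).items():
        print(name, intercept, slope)
```

Each cluster is reduced to one point whose coordinates are its coefficients, and those points form the multivariate sample that the second stage analyzes.<br />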
<br />
===Week 7===<br />
*How does the Scheffé method connect with the F-test?<br />
*The F-test for the model is significant if and only if some contrast in the model space has a Scheffé confidence interval that excludes zero. If the F-test is not significant, then every Scheffé confidence interval will include zero.<br />
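<br />
The connection can be written out as follows. The notation is my own assumption rather than the lecture's: p is the dimension of the contrast space, ν the residual degrees of freedom, and V-hat the estimated covariance matrix of the coefficient estimates.<br />

```latex
% Scheffe interval for a contrast \psi = c^\top \beta
\hat\psi \;\pm\; \sqrt{p\,F_{p,\nu;\,1-\alpha}}\;\widehat{\mathrm{se}}(\hat\psi),
\qquad \hat\psi = c^\top\hat\beta,
\qquad \widehat{\mathrm{se}}(\hat\psi) = \sqrt{c^\top \hat V c}

% By the Cauchy-Schwarz inequality, the largest squared t-ratio over all
% contrasts equals p times the observed F statistic
\max_{c \neq 0} \frac{(c^\top\hat\beta)^2}{c^\top \hat V c}
  \;=\; \hat\beta^\top \hat V^{-1} \hat\beta
  \;=\; p\,F_{\mathrm{obs}}
```

Some interval excludes zero exactly when the largest squared t-ratio exceeds p times the F critical value, which is the same condition as the observed F statistic exceeding its critical value.<br />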
<br />
===Week 8===<br />
*TED Talks on Statistics <br />
*[http://www.kdnuggets.com/2010/11/ted-talks-lies-damned-lies-and-statistics.html Sebastian Wernicke: Lies, damned lies and statistics]<br />
*[http://www.ted.com/talks/chris_jordan_pictures_some_shocking_stats.html|Chris Jordan pictures some shocking stats]</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-03-12T03:49:26Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master's student in Applied Statistics. I received my Bachelor's degree in Actuarial Science, and I have some background in Statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption in different countries. In reality, these countries could differ in all sorts of ways other than this factor. We know that the ideal solution, introduced by Fisher, is an experiment. What is Fisher's idea of an experiment?<br />
*A: 1) Make sure the groups being compared are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare the life expectancy of these groups to see whether the groups with lower cigarette consumption have higher life expectancy.<br />
<br />
===Week 2===<br />
*Q: In the example of studying the inheritance of height from father to son, and from the graph of the ellipse, we find that the SD line is not the regression line of son's height on father's height: the deviation of the son's expected height from the mean is only a proportion of the deviation of the father's height from the mean. Why do we use ellipses?<br />
*A: By using ellipses, we can easily derive the formulas for statistical theory and models. In addition, ellipses drawn on scatterplots and other graphs help us with data analysis, such as studying the correlation between the variables, judging how significant a variable is, and calculating confidence intervals.<br />
<br />
===Week 3===<br />
*Q: What is beta space, and why do we want to look at our data in beta space?<br />
*A: In data space, the axes are variables and the points are observations. To better understand hierarchical data, we want a more natural display of our models in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients.<br />
<br />
===Week 4===<br />
*Q: Thinking about strategy with observational data, what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the three techniques below. The first is statistical control: include the Zs in a statistical model and adjust statistically. Modern thinking is that we don't need all the Zs in the model to avoid bias, only the “propensity score”; this can fail with the wrong model. The second is matching: perform comparisons only between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, if you have the Zs. The third is structural methods, which build a structural causal model.<br />
<br />
===Week 5===<br />
*Q: In the example data set, Health as predicted by Weight and Height, we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?<br />
*A: 1) Typical values for the predictors but atypical Y: it has little impact on beta-hat, increases the size of confidence intervals and decreases power. 2) Atypical values for the predictors but Y consistent with the other data: it has little impact on beta-hat, shrinks confidence intervals, and creates a false sense of power if the point is not valid. 3) Atypical values for the predictors and Y not consistent with the other data: it has a large impact on beta-hat, could shrink or expand CIs, and makes a mess of everything.<br />
<br />
===Week 6===<br />
*Q: In the hierarchical data example (high schools), what is the difference between Robinson's Paradox and Simpson's Paradox?<br />
*A: Robinson's Paradox refers to the fact that Beta-w (the within-school slope) and Beta-B (the between-school slope) can have different signs. Simpson's Paradox refers to the fact that Beta-w and Beta-p (the pooled slope) can have different signs.<br />
<br />
===Week 7===<br />
*Q: Compare the Scheffé method (confidence interval) with the Bonferroni method (confidence interval).<br />
*A: The Bonferroni adjustment applies very generally, but it has one limitation: the number of simultaneous confidence intervals must be specified in advance for the simultaneous coverage to remain valid. The Scheffé method is an alternative that controls the simultaneous coverage over all contrasts in a subspace. If a large number of contrasts are of interest, or it is not known in advance which ones are of interest, the Scheffé method provides a way to “snoop” through all the possibilities while maintaining control of the coverage of confidence intervals and the false positive rates of tests. When the number of contrasts to be estimated is small, Bonferroni gives narrower intervals than Scheffé. As a rule of thumb, unless the number of desired contrasts is at least twice the number of factors, Scheffé will show wider confidence bands than Bonferroni.<br />
<br />
===Week 8===<br />
*Q: Briefly introduce the Wald test. What are the problems with the Wald test?<br />
*A: The Wald test is a way of testing the significance of particular explanatory variables in a statistical model. In logistic regression we have a binary outcome variable and one or more explanatory variables, and each explanatory variable in the model has an associated parameter. The Wald test is one of a number of ways of testing whether the parameters associated with a group of explanatory variables are zero. If the Wald test for a particular explanatory variable, or group of explanatory variables, is significant, we conclude that the parameters associated with these variables are not zero, so the variables should be included in the model. If the Wald test is not significant, these explanatory variables can be omitted from the model. However, the Wald statistic has two problems: for large coefficients the standard error is inflated, which lowers the Wald (chi-square) statistic, and for small sample sizes the likelihood-ratio test is more reliable than the Wald test.<br />
<br />
<br />
<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
Based on data from the 1993/1994 through 2008/2009 seasons, the research shows that the severity of infections with the influenza A virus is related to its novelty. In particular, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus's hemagglutinin protein. It is thought that this result can help to improve the ability to predict influenza severity accurately, so that scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. In reality, however, there are many other sources of variation. For example, people differ in age, in where they live, and in their health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime<br />
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can look for crime activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
In the news report, the people interviewed said: "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Obviously, crimes are more likely at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice. Crimes occur all over the city, and the criminals may also come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120 Drinkers Down Under switching from beer to wine]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wines and spirits has increased. The statistics show that at the start of the 1960s beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same period, wine consumption has increased threefold, to 36 percent, while the intake of spirits has nearly doubled, to 20 percent. Why have wine and spirits increased so much over the past forty years? The researchers noted that increased consumption was likely to have been affected by numerous factors, such as different age patterns in the population, increasing affluence and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people may also have come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and kids, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results from the Canadian Health Measures Survey (CHMS), released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. Young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. In addition, the survey found that men are more physically active than women on average. The same pattern appears among teenagers. More than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. In my view, the survey should take into account people whose jobs require heavy lifting or being out and about for most of the work day. This would be more than enough exercise for most people and would improve these numbers dramatically. Similarly, for teenagers, the survey should also examine whether full-time students are working part-time.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1 More private liquor stores, more alcohol deaths?]<br />
<br />
Based on a study of 89 local areas of British Columbia, the researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales, a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. Over the same period, the number of government-run stores in the province declined slightly. The findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
===Week 6===<br />
* Video Games Are Good for Girls<br />
[http://www.sciencedaily.com/releases/2011/02/110201083341.htm Video Games Are Good for Girls]<br />
<br />
Researchers from Brigham Young University's School of Family Life conducted a study on video games and children between 11 and 16 years old. They found that girls who played video games with a parent enjoyed a number of advantages. A father who still hasn't given up video games now has some justification to keep on playing, if he has a daughter. The study involved 287 families with an adolescent child. The researchers measured several outcomes, such as positive behavior, aggression, family connection and mental health. The result is interesting: for boys, playing with a parent was not a statistically significant factor for any of the outcomes the researchers measured. Yet for girls, playing with a parent accounted for as much as 20 percent of the variation in those measured outcomes.<br />
<br />
===Week 7===<br />
[http://www.sciencedaily.com/releases/2011/03/110301091225.htm|Who's the Best Tennis Player of All Time?]<br />
<br />
Ranking tennis players is a novel way to show how complex network analysis can reveal interesting facts hidden in statistical data. Male tennis players who played in at least one Association of Tennis Professionals match between 1968 and 2010 were evaluated through network analysis. The researchers ran an algorithm, similar to the one used by Google to rank Web pages, on digital data from hundreds of thousands of matches. The data was pulled from the Association of Tennis Professionals website. They quantified the importance of players and ranked them by a "tennis prestige" score. This score is determined by a player's competitiveness, the quality of his performance and number of victories. Fans may think of Jimmy Connors as an "old school" tennis player, but according to the new ranking system he ranks on top. One of the reasons is that he played for more than 20 years and had the opportunity to win a lot of matches against other very good players. "The rankings are a snapshot of who is at the top at this time," Radicchi said. "Players who have yet to retire are penalized with respect to those who have ended their careers. Prestige scores strongly correlate with the number of victories, and active players haven't played all the matches of their careers yet." Researching and ranking sports stars gives a glimpse at the power of complex network analysis.<br />
<br />
Hi Jessica, I'm not sure why, but the link you have provided for the tennis article isn't working. I searched for the article in google and found it at the following link in the event somebody else is interested in viewing the article. --[[User:Lawarren|Lawarren]] 10:48, 9 March 2011 (EST)<br />
<br />
http://www.sciencedaily.com/releases/2011/03/110301091225.htm<br />
* thank you very much Lawarren.<br />
<br />
===Week 8===<br />
[http://www.traveldailynews.com/pages/show_page/42076-Hotel-capitalization-rates-fall-further Hotel capitalization rates fall further]<br />
<br />
In 2011, the USRC Hotel Investment survey predicted good growth in the hotel industry, based on data viewed in the context of longer-term trends. The data includes parameters such as capitalization rates, discount rates, income and expense growth expectations, marketing time, debt parameters, and other data for both full-service and limited-service hotels. The researchers found that full-service discount rates are now at their lowest level since the Mid Year 2007 survey, just prior to rates beginning to creep up in 2008, and of course the subsequent economic and credit collapse of the Fall of 2008. They are now just 10 basis points higher than the record low seen in the Winter of 2007 survey, and are hovering very near the survey results of the last recovery years of 2005-2006, following the post 9-11 downturn. As yield requirements lessened, anticipated growth continued to increase, even beyond the very healthy movement we saw in the last Mid Year 2010 Survey. <br />
<br />
<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities in one country instead of selecting different countries, in order to reduce the differences in conditions.<br />
<br />
===Week 2===<br />
* Before I took this course, I had no idea what consulting involves or how to work through problems step by step to help others make the right decision. From this week's lecture, I realize that it is not easy to be a good consultant. Usually, the situation is very complicated, and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem, then pick the relationships among these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis, whether the apparent effect of x on y arose by chance, etc. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw the multiple regression model represented by a plane in data space, which is represented in beta space by a point showing the fitted slope with respect to Coffee and the fitted slope with respect to Stress. There we only need 2 dimensions, but for some complex models we may need 3 or more dimensions for the beta space. I just wonder, in these situations, how can we draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need 3D or higher dimensional ellipse. This ellipse in higher dimension would be larger because it is trying to be correct for 3 or more parameters at the same time.-''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===<br />
* For controlling for all the Zs, the notes mention that we can use one of three techniques. The first one is statistical control. I am not sure about the meaning of the propensity score, although it says "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]<br />
Thank you Constance!<br />
<br />
===Week 5===<br />
* In a multiple regression model, it is a challenge to determine the type of a canonical outlier and to decide how to deal with these outliers, whether to remove them or not. The choice can change our results substantially.<br />
<br />
===Week 6===<br />
* From the slides on Hierarchical Models, it is good to know the idea of the two-stage approach (derived variables approach): estimate the slope and intercept, then use them as a multivariate sample and do a MANOVA test.<br />
<br />
===Week 7===<br />
*How does the Scheffe method connect with the F-test? <br />
*The F-test for the model is significant if and only if some contrast in the model space has a significant Scheffe confidence interval. If the F-test is not significant, then none of the Scheffe confidence intervals will be significant.</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-03-12T03:22:33Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master's student in Applied Statistics. I received my Bachelor's degree in Actuarial Science, and I have some background in Statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption across countries. In reality, these countries could differ in all sorts of ways other than this factor. We know that the ideal solution, introduced by Fisher, is the experiment. What is Fisher's idea of an experiment? <br />
*A: 1) Make sure all the countries are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare the life expectancy of these groups to see whether the groups with lower cigarette consumption have higher life expectancy.<br />
<br />
===Week 2===<br />
*Q: From the example of studying the inheritance of height from father to son, and the graph of the ellipse, we find that the SD line is not the regression line of son's height on father's height: the change in the son's expected height from the mean is only a proportion of the change in the father's height from the mean. Why do we use ellipses?<br />
*A: By using ellipses, we can easily get the formulas for statistical theory and models. In addition, ellipses drawn on scatterplots and other graphs help with data analysis, such as studying the correlation between the variables, judging how significant a variable is, and calculating confidence intervals. <br />
<br />
===Week 3===<br />
*Q: What is beta space, and why do we want to look at our data in beta space?<br />
*A: In data space, the axes are variables and the points are observations. To better understand hierarchical data, we want a more natural display of our models in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients. <br />
<br />
===Week 4===<br />
*Q: Thinking about the strategy for observational data, what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the three below. The first is statistical control: include the Zs in a statistical model and adjust statistically. Modern thinking: we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail with the wrong model. The second is matching: this technique only performs comparisons between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, if you have the Zs. The third is structural methods, which build a structural causal model.<br />
<br />
===Week 5===<br />
*Q: From the example data set, Health as predicted by Weight and Height, we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?<br />
*A: The first type has typical values for the predictors but an atypical Y; it has little impact on beta-hat, increases the size of confidence intervals, and decreases power. The second has atypical values for the predictors but Y consistent with the other data; it has little impact on beta-hat, shrinks confidence intervals, and creates a false sense of power if the point is not valid. The third has atypical values for the predictors and Y not consistent with the other data; it has a large impact on beta-hat, could shrink or expand CIs, and makes a mess of everything.<br />
<br />
===Week 6===<br />
*Q: In the hierarchical data (high school) example, what is the difference between Robinson's Paradox and Simpson's Paradox?<br />
*A: Robinson's Paradox refers to the fact that Beta-w (the within-group slope) and Beta-B (the between-group slope) can have different signs. Simpson's Paradox refers to the fact that Beta-w and Beta-p (the pooled slope) can have different signs.<br />
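
A small numeric sketch may make the sign difference concrete. The Python below uses a made-up data set of three hypothetical groups (not the high school data from class) and reads Beta-w, Beta-B and Beta-p as the within-group, between-group and pooled slopes, which is an assumption about the notation. All function and variable names are invented for illustration.

```python
def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def three_slopes(groups):
    """Return (within, between, pooled) slopes for {name: [(x, y), ...]}."""
    all_x, all_y = [], []        # raw data, for the pooled slope
    cen_x, cen_y = [], []        # data centered at group means, for the within slope
    mean_x, mean_y = [], []      # one point per group, for the between slope
    for points in groups.values():
        xs = [x for x, _ in points]
        ys = [y for _, y in points]
        gx = sum(xs) / len(xs)
        gy = sum(ys) / len(ys)
        all_x += xs
        all_y += ys
        cen_x += [x - gx for x in xs]
        cen_y += [y - gy for y in ys]
        mean_x.append(gx)
        mean_y.append(gy)
    return slope(cen_x, cen_y), slope(mean_x, mean_y), slope(all_x, all_y)


# Hypothetical data: inside each group y falls as x rises,
# but groups with a higher mean x also have a higher mean y.
groups = {
    "A": [(-1, 1), (0, 0), (1, -1)],
    "B": [(4, 6), (5, 5), (6, 4)],
    "C": [(9, 11), (10, 10), (11, 9)],
}

within, between, pooled = three_slopes(groups)
print("within :", within)
print("between:", between)
print("pooled :", pooled)
```

In this toy data the within-group slope is negative while the between-group and pooled slopes are positive, so the within slope disagrees in sign with both of the others.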
<br />
===Week 7===<br />
*Q: A comparison of the Scheffé method/Confidence Interval and the Bonferroni method/Confidence Interval.<br />
*A: The Bonferroni adjustment applies very generally, but it suffers from the limitation that the number of simultaneous confidence intervals must be specified in advance in order to maintain the validity of the simultaneous coverage. The Scheffé method is an alternative that controls the simultaneous coverage over all contrasts in a subspace. If a large number of contrasts are of interest, or it is not known in advance which ones are of interest, the Scheffé method provides a way to “snoop” through all the possibilities while maintaining control on the coverage of confidence intervals and the false positive rates of tests. When the number of contrasts to be estimated is small, Bonferroni is better than Scheffé. In fact, unless the number of desired contrasts is at least twice the number of factors, Scheffé will always show wider confidence bands than Bonferroni.<br />
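
The trade-off can be sketched numerically by comparing the critical multipliers that each method applies to a contrast's standard error. The Python below is only a rough illustration under two loudly stated assumptions: the error variance is treated as known (a large-sample approximation, so normal and chi-square quantiles stand in for t and F quantiles), and the contrast subspace has dimension 2, where the chi-square quantile has a closed form. The function names are hypothetical.

```python
import math
from statistics import NormalDist

ALPHA = 0.05


def bonferroni_multiplier(m, alpha=ALPHA):
    """Two-sided normal critical value for m pre-specified intervals."""
    return NormalDist().inv_cdf(1 - alpha / (2 * m))


def scheffe_multiplier_2d(alpha=ALPHA):
    """Large-sample Scheffe multiplier for a 2-dimensional contrast subspace.

    For a chi-square variable with 2 degrees of freedom, P(X > x) = exp(-x / 2),
    so the upper-alpha quantile is -2 * ln(alpha); the multiplier is its square root.
    """
    return math.sqrt(-2 * math.log(alpha))


scheffe = scheffe_multiplier_2d()

# Smallest number of contrasts at which Bonferroni intervals become wider.
first_wider = next(m for m in range(1, 100) if bonferroni_multiplier(m) > scheffe)

for m in range(1, 7):
    print(m, round(bonferroni_multiplier(m), 3), round(scheffe, 3))
print("Bonferroni first wider than Scheffe at m =", first_wider)
```

Under these assumptions the Scheffé multiplier does not depend on the number of contrasts, while the Bonferroni multiplier grows with it and overtakes Scheffé at four contrasts. With exact t and F quantiles the crossover point can differ.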
<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
Covering the 1993/1994 through 2008/2009 seasons, the research showed that the severity of infections with the Influenza A virus is related to the virus's novelty. In addition, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus' hemagglutinin protein. People think that this result can help to improve the ability to accurately predict influenza severity. Scientists could then use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. However, in reality there are many sources of variation. For example, people fall into different age ranges, live in different areas, and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime<br />
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can look for criminal activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out which weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
In the news story, people indicated that "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Obviously, crimes are more likely at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice. Crimes occur throughout the city. It is also possible that the criminals come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120 Drinkers Down Under switching from beer to wine]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wine and spirits has increased. The statistics show that at the start of the 1960s, beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same period, wine consumption has increased threefold, to 36 percent, while the intake of spirits has nearly doubled to 20 percent. Why have wine and spirits increased so much over the past forty years? The researchers noted that the increased consumption was likely to have been affected by numerous factors, such as different age patterns in the population, increasing affluence and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people have also come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and children, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results of this study from the Canadian Health Measures Survey (CHMS), released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. In addition, the survey mentions that young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. The survey also found that men are more physically active than women on average. The same pattern appears among teenagers: more than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, the survey should take into account people whose jobs require heavy lifting and being out and about for most of the work day. This would be more than enough exercise for most people and would improve these numbers dramatically. Similarly, for teenagers, the survey should also examine whether they work part-time while studying as full-time students.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1 More private liquor stores, more alcohol deaths?]<br />
<br />
Based on a study of 89 local areas of British Columbia, the researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales, a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. Over the same period, the number of government-run stores in the province declined slightly. The findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
===Week 6===<br />
* Video Games Are Good for Girls<br />
[http://www.sciencedaily.com/releases/2011/02/110201083341.htm Video Games Are Good for Girls]<br />
<br />
Researchers from Brigham Young University's School of Family Life conducted a study on video games and children between 11 and 16 years old. They found that girls who played video games with a parent enjoyed a number of advantages. A father who still hasn't given up video games now has some justification to keep playing, if he has a daughter. The study involved 287 families with an adolescent child. The researchers measured several outcomes, such as positive behavior, aggression, family connection, and mental health. The result is interesting: for boys, playing with a parent was not a statistically significant factor for any of the outcomes the researchers measured. Yet for girls, playing with a parent accounted for as much as 20 percent of the variation on those measured outcomes.<br />
<br />
===Week 7===<br />
[http://www.sciencedaily.com/releases/2011/03/110301091225.htm|Who's the Best Tennis Player of All Time?]<br />
<br />
Ranking tennis players is a novel way to show how complex network analysis can reveal interesting facts hidden in statistical data. Male tennis players who played in at least one Association of Tennis Professionals match between 1968 and 2010 were evaluated through network analysis. The researchers ran an algorithm, similar to the one used by Google to rank Web pages, on digital data from hundreds of thousands of matches. The data was pulled from the Association of Tennis Professionals website. They quantified the importance of players and ranked them by a "tennis prestige" score. This score is determined by a player's competitiveness, the quality of his performance and number of victories. Fans may think of Jimmy Connors as an "old school" tennis player, but according to the new ranking system he ranks on top, one reason being that he played for more than 20 years and had the opportunity to win a lot of matches against other very good players. "The rankings are a snapshot of who is at the top at this time," Radicchi said. "Players who have yet to retire are penalized with respect to those who have ended their careers. Prestige scores strongly correlate with the number of victories, and active players haven't played all the matches of their careers yet." Researching and ranking sports stars gives a glimpse at the power of complex network analysis. <br />
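
To give a feel for how a PageRank-style ranking works on match results, here is a minimal Python sketch. It is not the researchers' actual algorithm: it assumes that each loss passes a share of the loser's score to the winner, uses the customary damping factor of 0.85, and runs on a handful of invented matches between hypothetical players A, B and C.

```python
from collections import defaultdict


def prestige_scores(matches, damping=0.85, iterations=100):
    """PageRank-style scores from a list of (winner, loser) pairs."""
    players = sorted({p for match in matches for p in match})
    n = len(players)

    # losses[loser][winner] = number of times loser lost to winner
    losses = defaultdict(lambda: defaultdict(int))
    for winner, loser in matches:
        losses[loser][winner] += 1

    scores = {p: 1.0 / n for p in players}
    for _ in range(iterations):
        new_scores = {p: (1.0 - damping) / n for p in players}
        for loser in players:
            beaten_by = losses.get(loser)
            if not beaten_by:
                # An unbeaten player has no one to pass score to,
                # so the score is spread evenly over all players.
                for p in players:
                    new_scores[p] += damping * scores[loser] / n
                continue
            total = sum(beaten_by.values())
            for winner, count in beaten_by.items():
                new_scores[winner] += damping * scores[loser] * count / total
        scores = new_scores
    return scores


# Invented results: A beats B twice and C once, B beats C once.
matches = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B")]
scores = prestige_scores(matches)
for player in sorted(scores, key=scores.get, reverse=True):
    print(player, round(scores[player], 3))
```

Because a win over a highly ranked opponent transfers more score than a win over a weak one, the ranking rewards both the number and the quality of victories, which matches the article's description of the prestige score.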
<br />
Hi Jessica, I'm not sure why, but the link you have provided for the tennis article isn't working. I searched for the article in google and found it at the following link in the event somebody else is interested in viewing the article. --[[User:Lawarren|Lawarren]] 10:48, 9 March 2011 (EST)<br />
<br />
http://www.sciencedaily.com/releases/2011/03/110301091225.htm<br />
* thank you very much Lawarren.<br />
<br />
===Week 8===<br />
[http://www.traveldailynews.com/pages/show_page/42076-Hotel-capitalization-rates-fall-further Hotel capitalization rates fall further]<br />
<br />
In 2011, the USRC Hotel Investment survey predicted good growth in the hotel industry, based on data viewed in the context of longer-term trends. The data includes parameters such as capitalization rates, discount rates, income and expense growth expectations, marketing time, debt parameters, and other data for both full-service and limited-service hotels. The researchers found that full-service discount rates are now at their lowest level since the Mid Year 2007 survey, just prior to rates beginning to creep up in 2008, and of course the subsequent economic and credit collapse of the Fall of 2008. They are now just 10 basis points higher than the record low seen in the Winter of 2007 survey, and are hovering very near the survey results of the last recovery years of 2005-2006, following the post 9-11 downturn. As yield requirements lessened, anticipated growth continued to increase, even beyond the very healthy movement we saw in the last Mid Year 2010 Survey. <br />
<br />
<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities in one country instead of selecting different countries, in order to reduce the differences in conditions.<br />
<br />
===Week 2===<br />
* Before I took this course, I had no idea what consulting involves or how to work through problems step by step to help others make the right decision. From this week's lecture, I realize that it is not easy to be a good consultant. Usually, the situation is very complicated, and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem, then pick the relationships among these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis, whether the apparent effect of x on y arose by chance, etc. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw the multiple regression model represented by a plane in data space, which is represented in beta space by a point showing the fitted slope with respect to Coffee and the fitted slope with respect to Stress. There we only need 2 dimensions, but for some complex models we may need 3 or more dimensions for the beta space. I just wonder, in these situations, how can we draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need 3D or higher dimensional ellipse. This ellipse in higher dimension would be larger because it is trying to be correct for 3 or more parameters at the same time.-''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===<br />
* For controlling for all the Zs, the notes mention that we can use one of three techniques. The first one is statistical control. I am not sure about the meaning of the propensity score, although it says "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]<br />
Thank you Constance!<br />
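
The idea of predicting group membership from the Zs can be sketched with a small logistic regression. The Python below is only an illustration under stated assumptions: a single simulated covariate z, group membership generated so that larger z makes membership in group 1 more likely, and a plain gradient-ascent fit written by hand. None of the names or numbers come from the course notes.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_logistic(zs, groups, rate=0.5, steps=2000):
    """Fit P(group = 1 | z) = sigmoid(b0 + b1 * z) by gradient ascent."""
    b0 = b1 = 0.0
    n = len(zs)
    for _ in range(steps):
        grad0 = grad1 = 0.0
        for z, g in zip(zs, groups):
            error = g - sigmoid(b0 + b1 * z)
            grad0 += error
            grad1 += error * z
        b0 += rate * grad0 / n
        b1 += rate * grad1 / n
    return b0, b1


# Simulated observational data: no random assignment, so z influences the group.
rng = random.Random(0)
zs = [rng.gauss(0.0, 1.0) for _ in range(300)]
groups = [1 if rng.random() < sigmoid(z) else 0 for z in zs]

b0, b1 = fit_logistic(zs, groups)

# The propensity score is the fitted probability of being in group 1 given z.
propensity = [sigmoid(b0 + b1 * z) for z in zs]

in_group = [p for p, g in zip(propensity, groups) if g == 1]
out_group = [p for p, g in zip(propensity, groups) if g == 0]
print("fitted slope on z:", round(b1, 3))
print("mean propensity, group 1:", round(sum(in_group) / len(in_group), 3))
print("mean propensity, group 0:", round(sum(out_group) / len(out_group), 3))
```

A clearly positive slope on z and a gap between the two mean propensities indicate that group allocation is confounded with the covariate. In a randomized study both groups would have similar propensity scores.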
<br />
===Week 5===<br />
* In a multiple regression model, it is a challenge to determine the type of a canonical outlier and to decide how to deal with these outliers, whether to remove them or not. The choice can change our results substantially.<br />
<br />
===Week 6===<br />
* From the slides on Hierarchical Models, it is good to know the idea of the two-stage approach (derived variables approach): estimate the slope and intercept, then use them as a multivariate sample and do a MANOVA test.<br />
<br />
===Week 7===<br />
*How does the Scheffe method connect with the F-test? <br />
*The F-test for the model is significant if and only if some contrast in the model space has a significant Scheffe confidence interval. If the F-test is not significant, then none of the Scheffe confidence intervals will be significant.</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-03-12T03:09:15Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master's student in Applied Statistics. I received my Bachelor's degree in Actuarial Science, and I have some background in Statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption across countries. In reality, these countries could differ in all sorts of ways other than this factor. We know that the ideal solution, introduced by Fisher, is the experiment. What is Fisher's idea of an experiment? <br />
*A: 1) Make sure all the countries are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare the life expectancy of these groups to see whether the groups with lower cigarette consumption have higher life expectancy.<br />
<br />
===Week 2===<br />
*Q: From the example of studying the inheritance of height from father to son, and the graph of the ellipse, we find that the SD line is not the regression line of son's height on father's height: the change in the son's expected height from the mean is only a proportion of the change in the father's height from the mean. Why do we use ellipses?<br />
*A: By using ellipses, we can easily get the formulas for statistical theory and models. In addition, ellipses drawn on scatterplots and other graphs help with data analysis, such as studying the correlation between the variables, judging how significant a variable is, and calculating confidence intervals. <br />
<br />
===Week 3===<br />
*Q: What is beta space, and why do we want to look at our data in beta space?<br />
*A: In data space, the axes are variables and the points are observations. To better understand hierarchical data, we want a more natural display of our models in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients. <br />
<br />
===Week 4===<br />
*Q: Thinking about the strategy for observational data, what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the three below. The first is statistical control: include the Zs in a statistical model and adjust statistically. Modern thinking: we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail with the wrong model. The second is matching: this technique only performs comparisons between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, if you have the Zs. The third is structural methods, which build a structural causal model.<br />
<br />
===Week 5===<br />
*Q: From the example data set, Health as predicted by Weight and Height, we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?<br />
*A: The first type has typical values for the predictors but an atypical Y; it has little impact on beta-hat, increases the size of confidence intervals, and decreases power. The second has atypical values for the predictors but Y consistent with the other data; it has little impact on beta-hat, shrinks confidence intervals, and creates a false sense of power if the point is not valid. The third has atypical values for the predictors and Y not consistent with the other data; it has a large impact on beta-hat, could shrink or expand CIs, and makes a mess of everything.<br />
<br />
===Week 6===<br />
*Q: In the hierarchical data (high school) example, what is the difference between Robinson's Paradox and Simpson's Paradox?<br />
*A: Robinson's Paradox refers to the fact that Beta-w (the within-group slope) and Beta-B (the between-group slope) can have different signs. Simpson's Paradox refers to the fact that Beta-w and Beta-p (the pooled slope) can have different signs.<br />
<br />
===Week 7===<br />
*Q: A comparison of the Scheffé method/Confidence Interval and the Bonferroni method/Confidence Interval.<br />
*A: The Bonferroni adjustment applies very generally, but it suffers from the limitation that the number of simultaneous confidence intervals must be specified in advance in order to maintain the validity of the simultaneous coverage. The Scheffé method is an alternative that controls the simultaneous coverage over all contrasts in a subspace. If a large number of contrasts are of interest, or it is not known in advance which ones are of interest, the Scheffé method provides a way to “snoop” through all the possibilities while maintaining control on the coverage of confidence intervals and the false positive rates of tests. When the number of contrasts to be estimated is small, Bonferroni is better than Scheffé. In fact, unless the number of desired contrasts is at least twice the number of factors, Scheffé will always show wider confidence bands than Bonferroni.<br />
<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
Covering the 1993/1994 through 2008/2009 seasons, the research showed that the severity of infections with the Influenza A virus is related to the virus's novelty. In addition, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus' hemagglutinin protein. People think that this result can help to improve the ability to accurately predict influenza severity. Scientists could then use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. However, in reality there are many sources of variation. For example, people fall into different age ranges, live in different areas, and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime<br />
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can search for crimes by typing in specific dates, choosing types of crimes, choosing locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
The news report quotes users as saying, "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Obviously, crimes are more likely at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice. Crimes occur all over the city, and the criminals may come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120 Drinkers Down Under switching from beer to wine]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wine and spirits has increased. According to the statistics, at the start of the 1960s beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same period, wine consumption has increased threefold, to 36 percent, while the intake of spirits has nearly doubled, to 20 percent. Why have wine and spirits increased so much over the past forty years? The researchers noted that the increased consumption was likely affected by numerous factors, such as changing age patterns in the population, increasing affluence and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people have also come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and children, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results of the Canadian Health Measures Survey (CHMS) released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. In addition, the survey reports that young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. The survey also found that men are more physically active than women on average, and the same pattern appears among teenagers: more than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, the survey should take into account people whose jobs require heavy lifting or being out and about for most of the work day. This would be more than enough exercise for most people, and counting it would improve these numbers dramatically. Similarly, for teenagers, the survey should also ask whether they work part-time while studying as full-time students.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1 More private liquor stores, more alcohol deaths?]<br />
<br />
Based on a study of 89 local areas of British Columbia, researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales, a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. Over the same period, the number of government-run stores in the province declined slightly. The findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
===Week 6===<br />
* Video Games Are Good for Girls<br />
[http://www.sciencedaily.com/releases/2011/02/110201083341.htm Video Games Are Good for Girls]<br />
<br />
Researchers from Brigham Young University's School of Family Life conducted a study on video games and children between 11 and 16 years old. They found that girls who played video games with a parent enjoyed a number of advantages. A father who still hasn't given up video games now has some justification to keep on playing, if he has a daughter. The study involved 287 families with an adolescent child. The researchers measured several outcomes, such as positive behavior, aggression, family connection and mental health. The result is interesting: for boys, playing with a parent was not a statistically significant factor for any of the outcomes the researchers measured. Yet for girls, playing with a parent accounted for as much as 20 percent of the variation in those measured outcomes.<br />
<br />
===Week 7===<br />
[http://www.sciencedaily.com/releases/2011/03/110301091225.htm Who's the Best Tennis Player of All Time?]<br />
<br />
Ranking tennis players is a novel way to show how complex network analysis can reveal interesting facts hidden in statistical data. Male tennis players who played in at least one Association of Tennis Professionals match between 1968 and 2010 were evaluated through network analysis. The researchers ran an algorithm, similar to the one used by Google to rank Web pages, on digital data from hundreds of thousands of matches. The data was pulled from the Association of Tennis Professionals website. They quantified the importance of players and ranked them by a "tennis prestige" score. This score is determined by a player's competitiveness, the quality of his performance and number of victories. Fans may think of Jimmy Connors as an "old school" tennis player, but according to the new ranking system he ranks at the top. One of the reasons is that he played for more than 20 years and had the opportunity to win a lot of matches against other very good players. "The rankings are a snapshot of who is at the top at this time," Radicchi said. "Players who have yet to retire are penalized with respect to those who have ended their careers. Prestige scores strongly correlate with the number of victories, and active players haven't played all the matches of their careers yet." Researching and ranking sports stars gives a glimpse at the power of complex network analysis. <br />
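<br />
To illustrate the kind of algorithm described above, here is a minimal PageRank-style sketch in Python. It is not the researchers' actual method or data: the players "A" to "D", the match results, the function name and the damping value 0.85 are all assumptions made for the example. Each loser passes his prestige to the players who beat him.<br />

```python
# Toy PageRank-style "prestige" score on a match graph (illustrative only).
# Player names and results are hypothetical, not taken from the ATP data.

def prestige_scores(matches, damping=0.85, iterations=100):
    """matches: list of (winner, loser) pairs. Each loser passes prestige to
    the players who beat him, in proportion to how often they did."""
    players = sorted({p for match in matches for p in match})
    n = len(players)
    # losses[l][w] = number of times player l lost to player w
    losses = {p: {} for p in players}
    for winner, loser in matches:
        losses[loser][winner] = losses[loser].get(winner, 0) + 1

    score = {p: 1.0 / n for p in players}
    for _ in range(iterations):
        # Every player receives a small baseline share of prestige.
        new = {p: (1.0 - damping) / n for p in players}
        for loser in players:
            total = sum(losses[loser].values())
            if total == 0:
                # Undefeated player: spread his prestige evenly over everyone.
                for p in players:
                    new[p] += damping * score[loser] / n
            else:
                for winner, count in losses[loser].items():
                    new[winner] += damping * score[loser] * count / total
        score = new
    return score

matches = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("A", "D"), ("B", "D")]
scores = prestige_scores(matches)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy data the undefeated player comes out on top, and a win counts for more when it is against an opponent who himself has a high score.<br />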
<br />
Hi Jessica, I'm not sure why, but the link you have provided for the tennis article isn't working. I searched for the article in google and found it at the following link in the event somebody else is interested in viewing the article. --[[User:Lawarren|Lawarren]] 10:48, 9 March 2011 (EST)<br />
<br />
http://www.sciencedaily.com/releases/2011/03/110301091225.htm<br />
* thank you very much Lawarren.<br />
<br />
===Week 8===<br />
[http://www.traveldailynews.com/pages/show_page/42076-Hotel-capitalization-rates-fall-further Hotel capitalization rates fall further]<br />
<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities in one country instead of selecting different countries, in order to eliminate differences in conditions.<br />
<br />
===Week 2===<br />
* Before I took this course, I had no idea what consulting involves or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realize that it is not easy to be a good consultant. Usually the situation is very complicated, and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem and then pick the relationships among these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis and whether the apparent effect of x on y arose by chance, etc. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw the multiple regression model represented by a plane in data space; in beta space it is represented by a point showing the fitted slope with respect to Coffee and the fitted slope with respect to Stress. In that example we only need 2 dimensions, but for some complex models we may need 3 or more dimensions for the beta space. I just wonder how, in these situations, we can draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need a 3D or higher-dimensional ellipsoid. This ellipsoid in higher dimensions would be larger because it is trying to be correct for 3 or more parameters at the same time. -''Gurpreet''<br />
* Thank you very much!<br />
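<br />
The comment above can be written as a formula. This is a sketch under notation assumed here rather than taken from the lecture: <math>\hat\beta</math> is the vector of the p estimated coefficients of interest, <math>\hat V</math> is its estimated covariance matrix, and <math>\nu</math> is the residual degrees of freedom. The joint confidence region is then<br />

```latex
\left\{ \beta : (\hat\beta - \beta)^{\top}\, \hat V^{-1}\, (\hat\beta - \beta) \;\le\; p\, F_{1-\alpha;\,p,\,\nu} \right\}
```

For p = 2 this is an ellipse and for p = 3 an ellipsoid. The scaling factor <math>\sqrt{p\,F_{1-\alpha;\,p,\,\nu}}</math> increases with p, which is why the region gets larger when it has to cover more parameters at once.<br />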
<br />
===Week 4===<br />
* For controlling for all the Zs, the notes mention that we can use one of three techniques. The first one is statistical control. I am not sure about the meaning of propensity score, although it says "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]<br />
Thank you Constance!<br />
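<br />
As a small illustration of the definition above, here is a minimal Python sketch that estimates a propensity score when there is a single discrete covariate Z: the score is simply the proportion of treated subjects within each level of Z. The records and the function name are made up for the example. With several or continuous covariates, one would more typically fit a model such as a logistic regression of group membership on the Zs.<br />

```python
# Minimal propensity-score sketch for a single discrete covariate Z.
# The records below are made up for illustration.
from collections import defaultdict

def propensity_by_stratum(records):
    """records: list of (z, treated) with treated in {0, 1}.
    Returns {z: estimated P(treated = 1 | Z = z)}."""
    counts = defaultdict(lambda: [0, 0])  # z -> [n_treated, n_total]
    for z, treated in records:
        counts[z][0] += treated
        counts[z][1] += 1
    return {z: n_treated / n_total for z, (n_treated, n_total) in counts.items()}

records = [
    ("young", 1), ("young", 1), ("young", 1), ("young", 0),
    ("old", 1), ("old", 0), ("old", 0), ("old", 0),
]
propensity = propensity_by_stratum(records)
print(propensity)
```

In this invented data the two age groups have very different chances of being in the treated group, which is the kind of selection that matching or adjusting on the propensity score is meant to address.<br />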
<br />
===Week 5===<br />
* In a multiple regression model, it is a challenge to determine the type of a canonical outlier and how to deal with these outliers, i.e. whether to remove them or not. The choice can change our results substantially.<br />
<br />
===Week 6===<br />
* From the slides on Hierarchical Models, it is good to know the idea of the two-stage approach (derived variables approach): estimate slope and intercept, then use them as a multivariate sample and do a MANOVA test.<br />
<br />
===Week 7===<br />
*How does the Scheffe method connect with the F-test? <br />
*The F-test for the model is significant if and only if some contrast in the model space has a significant Scheffe confidence interval. If the F-test is not significant, then none of the Scheffe confidence intervals will be significant.</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-03-05T05:53:37Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master's student in Applied Statistics. I received my Bachelor's degree in Actuarial Science, and I have some background in Statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption across different countries. In reality, these countries could differ in all sorts of ways other than this factor. We know that the ideal solution, introduced by Fisher, is the experiment. What is Fisher's idea for an experiment? <br />
*A: 1) Make sure the groups being compared are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare the life expectancy of these groups to see whether the groups with lower cigarette consumption have higher life expectancy.<br />
<br />
===Week 2===<br />
*Q: From the example of studying the inheritance of height from father to son, and the graph of the ellipse, we find that the SD line is not the regression line of son's height on father's height: the change of the son's expected height from the mean is only a proportion of the change of the father's height from the mean. Why do we use ellipses?<br />
*A: Ellipses make it easy to derive formulas for statistical theory and models. In addition, drawing ellipses on scatterplots and other graphs helps with data analysis, such as studying the correlation between variables, judging how significant a variable is, and calculating confidence intervals. <br />
<br />
===Week 3===<br />
*Q: What is beta space, and why do we want to look at our data in beta space?<br />
*A: In data space, the axes are variables and the points are observations. To better understand hierarchical data, we want a more natural display for our models, in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients. <br />
<br />
===Week 4===<br />
*Q: Think about the strategy with observational data: what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the three below. The first is statistical control: include the Zs in a statistical model and adjust statistically. Modern thinking: we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail with the wrong model. The second is matching: this technique only performs comparisons between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, if you have the Zs. The third is structural methods, which build a structural causal model.<br />
<br />
===Week 5===<br />
*Q: In the example data set, Health as predicted by Weight and Height, we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?<br />
*A: The first type has typical values for the predictors but an atypical Y: it has little impact on beta-hat, but it increases the size of confidence intervals and decreases power. The second has atypical values for the predictors but a Y consistent with the other data: it has little impact on beta-hat, but it shrinks confidence intervals and creates a false sense of power if the point is not valid. The third has atypical values for the predictors and a Y not consistent with the other data: it has a large impact on beta-hat, could shrink or expand CIs, and makes a mess of everything.<br />
<br />
===Week 6===<br />
*Q: In the hierarchical data High School example, what is the difference between Robinson's Paradox and Simpson's Paradox?<br />
*A: Robinson's Paradox refers to the fact that Beta-w and Beta-B can have different signs. Simpson's Paradox refers to the fact that Beta-w and Beta-p can have different signs.<br />
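<br />
The sign differences can be seen in a small numerical sketch. It assumes, as a reading of the notation above, that Beta-w is the within-group slope, Beta-B the between-group slope (the regression of group means on group means) and Beta-p the pooled slope that ignores groups. The three "schools" and their data are invented for the example.<br />

```python
# Toy illustration of within, between and pooled slopes having different signs.
# Data are invented; "school" is a hypothetical grouping variable.

def slope(xs, ys):
    """Least-squares slope of y on x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

groups = {
    "school_1": ([1, 2, 3], [10, 9, 8]),
    "school_2": ([4, 5, 6], [14, 13, 12]),
    "school_3": ([7, 8, 9], [18, 17, 16]),
}

# Within-group slope: pool deviations from each group's own means.
dev_x, dev_y = [], []
for xs, ys in groups.values():
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    dev_x += [x - mx for x in xs]
    dev_y += [y - my for y in ys]
beta_w = slope(dev_x, dev_y)

# Between-group slope: regress group means of y on group means of x.
mean_x = [sum(xs) / len(xs) for xs, _ in groups.values()]
mean_y = [sum(ys) / len(ys) for _, ys in groups.values()]
beta_b = slope(mean_x, mean_y)

# Pooled slope: ignore the grouping entirely.
all_x = [x for xs, _ in groups.values() for x in xs]
all_y = [y for _, ys in groups.values() for y in ys]
beta_p = slope(all_x, all_y)

print(beta_w, beta_b, beta_p)
```

Here the slope is negative inside every school but positive both between schools and in the pooled data, so the same toy data show both kinds of sign reversal.<br />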
<br />
===Week 7===<br />
*Q: A comparison of the Scheffé method/Confidence Interval and the Bonferroni method/Confidence Interval.<br />
*A: The Bonferroni adjustment applies very generally, but the number of simultaneous confidence intervals must be specified in advance to maintain the validity of the simultaneous coverage. The Scheffé method is an alternative that controls the simultaneous coverage over all contrasts in a subspace. If a large number of contrasts are of interest, or it is not known in advance which ones are of interest, the Scheffé method provides a way to “snoop” through all the possibilities while maintaining control of the coverage of confidence intervals and the false positive rates of tests. When the number of contrasts to be estimated is small, Bonferroni gives narrower intervals than Scheffé. In fact, unless the number of desired contrasts is at least twice the number of factors, Scheffé will always show wider confidence bands than Bonferroni.<br />
<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
The study covered the 1993/1994 through 2008/2009 seasons and showed that the severity of infections with the Influenza A virus is related to its novelty. In addition, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus' hemagglutinin protein. This result is expected to improve the ability to predict influenza severity accurately, so that scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. However, in reality there are many other sources of variation. For example, people differ in age, live in different areas and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime<br />
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can search for crimes by typing in specific dates, choosing types of crimes, choosing locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
The news report quotes users as saying, "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Obviously, crimes are more likely at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice. Crimes occur all over the city, and the criminals may come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120 Drinkers Down Under switching from beer to wine]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wine and spirits has increased. According to the statistics, at the start of the 1960s beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same period, wine consumption has increased threefold, to 36 percent, while the intake of spirits has nearly doubled, to 20 percent. Why have wine and spirits increased so much over the past forty years? The researchers noted that the increased consumption was likely affected by numerous factors, such as changing age patterns in the population, increasing affluence and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people have also come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and children, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results of the Canadian Health Measures Survey (CHMS) released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. In addition, the survey reports that young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. The survey also found that men are more physically active than women on average, and the same pattern appears among teenagers: more than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, the survey should take into account people whose jobs require heavy lifting or being out and about for most of the work day. This would be more than enough exercise for most people, and counting it would improve these numbers dramatically. Similarly, for teenagers, the survey should also ask whether they work part-time while studying as full-time students.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1 More private liquor stores, more alcohol deaths?]<br />
<br />
Based on a study of 89 local areas of British Columbia, researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales, a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. Over the same period, the number of government-run stores in the province declined slightly. The findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
===Week 6===<br />
* Video Games Are Good for Girls<br />
[http://www.sciencedaily.com/releases/2011/02/110201083341.htm Video Games Are Good for Girls]<br />
<br />
Researchers from Brigham Young University's School of Family Life conducted a study on video games and children between 11 and 16 years old. They found that girls who played video games with a parent enjoyed a number of advantages. A father who still hasn't given up video games now has some justification to keep on playing, if he has a daughter. The study involved 287 families with an adolescent child. The researchers measured several outcomes, such as positive behavior, aggression, family connection and mental health. The result is interesting: for boys, playing with a parent was not a statistically significant factor for any of the outcomes the researchers measured. Yet for girls, playing with a parent accounted for as much as 20 percent of the variation in those measured outcomes.<br />
<br />
===Week 7===<br />
[http://www.sciencedaily.com/releases/2011/03/110301091225.htm Who's the Best Tennis Player of All Time?]<br />
<br />
Ranking tennis players is a novel way to show how complex network analysis can reveal interesting facts hidden in statistical data. Male tennis players who played in at least one Association of Tennis Professionals match between 1968 and 2010 were evaluated through network analysis. The researchers ran an algorithm, similar to the one used by Google to rank Web pages, on digital data from hundreds of thousands of matches. The data was pulled from the Association of Tennis Professionals website. They quantified the importance of players and ranked them by a "tennis prestige" score. This score is determined by a player's competitiveness, the quality of his performance and number of victories. Fans may think of Jimmy Connors as an "old school" tennis player, but according to the new ranking system he ranks at the top. One of the reasons is that he played for more than 20 years and had the opportunity to win a lot of matches against other very good players. "The rankings are a snapshot of who is at the top at this time," Radicchi said. "Players who have yet to retire are penalized with respect to those who have ended their careers. Prestige scores strongly correlate with the number of victories, and active players haven't played all the matches of their careers yet." Researching and ranking sports stars gives a glimpse at the power of complex network analysis. <br />
<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities in one country instead of selecting different countries, in order to eliminate differences in conditions.<br />
<br />
===Week 2===<br />
* Before I took this course, I had no idea what consulting involves or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realize that it is not easy to be a good consultant. Usually the situation is very complicated, and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem and then pick the relationships among these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis and whether the apparent effect of x on y arose by chance, etc. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw the multiple regression model represented by a plane in data space; in beta space it is represented by a point showing the fitted slope with respect to Coffee and the fitted slope with respect to Stress. In that example we only need 2 dimensions, but for some complex models we may need 3 or more dimensions for the beta space. I just wonder how, in these situations, we can draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need a 3D or higher-dimensional ellipsoid. This ellipsoid in higher dimensions would be larger because it is trying to be correct for 3 or more parameters at the same time. -''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===<br />
* For controlling for all the Zs, the notes mention that we can use one of three techniques. The first one is statistical control. I am not sure about the meaning of propensity score, although it says "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]<br />
Thank you Constance!<br />
<br />
===Week 5===<br />
* In a multiple regression model, it is a challenge to determine the type of a canonical outlier and how to deal with these outliers, i.e. whether to remove them or not. The choice can change our results substantially.<br />
<br />
===Week 6===<br />
* From the slides on Hierarchical Models, it is good to know the idea of the two-stage approach (derived variables approach): estimate slope and intercept, then use them as a multivariate sample and do a MANOVA test.<br />
<br />
===Week 7===<br />
*How does the Scheffe method connect with the F-test? <br />
*The F-test for the model is significant if and only if some contrast in the model space has a significant Scheffe confidence interval. If the F-test is not significant, then none of the Scheffe confidence intervals will be significant.</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-03-05T05:52:25Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master's student in Applied Statistics. I earned my Bachelor's degree in Actuarial Science, and I have some background in Statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption across different countries. In reality, these countries could differ in all sorts of ways other than this factor. We know that the ideal solution, introduced by Fisher, is the experiment. What is Fisher's idea of an experiment? <br />
*A: 1) Make sure all the countries are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare the life expectancy of these groups to see whether the groups with lower cigarette consumption have higher life expectancy.<br />
<br />
===Week 2===<br />
*Q: In the example of studying the inheritance of height from father to son, the graph of the ellipse shows that the SD line is not the regression line of son's height on father's height: the deviation of the son's expected height from the mean is only a proportion of the deviation of the father's height from the mean. Why do we use ellipses?<br />
*A: Ellipses make it easy to derive the formulas for statistical theory and models. In addition, ellipses drawn on scatterplots and other graphs help with data analysis, for example in studying the correlation between the variables, judging how significant a variable is, and calculating confidence intervals. <br />
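The difference between the SD line and the regression line can be checked numerically. The sketch below is only an illustration: the heights are made-up values, not data from the course, and the variable names are my own. It computes the slope of the SD line (SD of sons over SD of fathers) and the slope of the regression line (the correlation r times that ratio), which is flatter whenever |r| < 1.

```python
from statistics import mean, pstdev

# Hypothetical father/son heights in inches (illustrative values only).
fathers = [64.0, 66.0, 68.0, 70.0, 72.0, 74.0]
sons = [66.0, 66.5, 69.0, 69.5, 70.0, 73.0]


def correlation(x, y):
    """Pearson correlation using population (divide-by-n) moments."""
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))


r = correlation(fathers, sons)
sd_slope = pstdev(sons) / pstdev(fathers)  # slope of the SD line
reg_slope = r * sd_slope                   # slope of the regression line

# Expected height of a son whose father is 5 inches above the mean.
predicted = mean(sons) + reg_slope * 5.0

print(f"r = {r:.3f}, SD-line slope = {sd_slope:.3f}, regression slope = {reg_slope:.3f}")
print(f"father 5 in. above average -> son predicted {predicted - mean(sons):.2f} in. above average")
```

Because r is below 1 here, the predicted son is less than 5 inches above the average, which is the "proportion of the change" described in the question.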
<br />
===Week 3===<br />
*Q: What is the beta space, and why do we want to look at our data in beta space?<br />
*A: In data space, the axes are variables and the points are observations. To better understand hierarchical data, we want a more natural display for our models, in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients. <br />
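As a minimal sketch of the idea, assume three hypothetical groups (the names and numbers below are invented for illustration). Each group's data is a cloud of points in data space; fitting a straight line to each group turns it into a single point (intercept, slope) in beta space.

```python
def fit_line(x, y):
    """Least-squares line through (x, y); returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Data space: each hypothetical group has its own x and y observations.
groups = {
    "A": ([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]),   # lies on y = 1 + 2x
    "B": ([0.0, 1.0, 2.0, 3.0], [4.0, 4.5, 5.0, 5.5]),   # lies on y = 4 + 0.5x
    "C": ([0.0, 1.0, 2.0, 3.0], [2.0, 1.0, 0.0, -1.0]),  # lies on y = 2 - x
}

# Beta space: one point per fitted model, with the coefficients as coordinates.
beta_points = {name: fit_line(x, y) for name, (x, y) in groups.items()}

for name, (intercept, slope) in beta_points.items():
    print(f"group {name}: intercept = {intercept:.2f}, slope = {slope:.2f}")
```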
<br />
===Week 4===<br />
*Q: Thinking about the strategy for observational data, what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the three techniques below. The first is statistical control: include the Zs in a statistical model and adjust statistically. Modern thinking: we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail with the wrong model. The second is matching: this technique only performs comparisons between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score if you have the Zs. The third is structural methods, which build a structural causal model.<br />
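A toy sketch of the propensity-score idea, under strong simplifying assumptions: there is a single binary confounder Z, the data are invented so that the true effect of the treatment X on Y is exactly 2, and the propensity score is estimated simply as the observed proportion treated at each value of Z. In practice the score is usually estimated with a model such as logistic regression.

```python
from collections import defaultdict

# Hypothetical records (z, x, y): confounder, treatment indicator, outcome.
# Outcomes follow y = 10 + 2*x + 5*z, so the true treatment effect is 2.
records = (
    [(0, 1, 12.0)] * 2 + [(0, 0, 10.0)] * 6 +  # z = 0: 2 of 8 treated
    [(1, 1, 17.0)] * 6 + [(1, 0, 15.0)] * 2    # z = 1: 6 of 8 treated
)


def average(values):
    return sum(values) / len(values)


# Propensity score: estimated P(X = 1 | Z = z).
treated_by_z = defaultdict(list)
for z, x, _ in records:
    treated_by_z[z].append(x)
propensity = {z: average(xs) for z, xs in treated_by_z.items()}

# Naive comparison ignores Z and is confounded.
naive = (average([y for _, x, y in records if x == 1])
         - average([y for _, x, y in records if x == 0]))

# Adjusted comparison: compare treated and untreated units only within
# groups that share the same propensity score, then average over groups.
by_score = defaultdict(list)
for z, x, y in records:
    by_score[propensity[z]].append((x, y))

weighted_sum = 0.0
for rows in by_score.values():
    effect = (average([y for x, y in rows if x == 1])
              - average([y for x, y in rows if x == 0]))
    weighted_sum += effect * len(rows)
adjusted = weighted_sum / len(records)

print("propensity scores:", propensity)
print("naive difference:", naive)
print("adjusted difference:", adjusted)
```

The naive difference overstates the effect because units with Z = 1 are both more likely to be treated and have higher outcomes; comparing within propensity-score groups recovers the effect built into the toy data.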
<br />
===Week 5===<br />
*Q: In the example data set, Health as predicted by Weight and Height, we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?<br />
*A: The first type has typical values for the predictors but an atypical Y: it has little impact on beta-hat, but it increases the size of confidence intervals and decreases power. The second type has atypical values for the predictors but a Y consistent with the other data: it has little impact on beta-hat, it shrinks confidence intervals, and it creates a false sense of power if the point is not valid. The third type has atypical values for the predictors and a Y not consistent with the other data: it has a large impact on beta-hat, could shrink or expand CIs, and makes a mess of everything.<br />
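The effect of each kind of point on beta-hat can be illustrated with a one-predictor toy example. This is only a sketch with invented numbers: the base data lie exactly on the line y = x, and one extra point is added either at the centre of the predictor values or far away from them. It shows the impact on the fitted slope only, not on the confidence intervals.

```python
def slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return sxy / sxx


# Hypothetical base data lying exactly on y = x.
base = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0), (5.0, 5.0)]

# Extra point at the centre of the x values, with an unusual y.
central_odd_y = base + [(3.0, 10.0)]
# Extra point far out in x, but on the same line as the rest.
remote_consistent = base + [(20.0, 20.0)]
# Extra point far out in x and far off the line.
remote_inconsistent = base + [(20.0, 0.0)]

print("base slope:               ", slope(base))
print("central x, odd y:         ", slope(central_odd_y))
print("remote x, consistent y:   ", slope(remote_consistent))
print("remote x, inconsistent y: ", slope(remote_inconsistent))
```

Only the remote, inconsistent point changes the slope noticeably; here it even reverses its sign.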
<br />
===Week 6===<br />
*Q: In the hierarchical data High school example, what is the difference between Robinson's Paradox and Simpson's Paradox?<br />
*A: Robinson's Paradox refers to the fact that Beta-w and Beta-B can have different signs. Simpson's Paradox refers to the fact that Beta-w and Beta-p can have different signs.<br />
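A small numerical sketch of the three slopes, assuming that Beta-w is the within-group slope, Beta-B is the between-group slope (the slope through the group means) and Beta-p is the pooled slope that ignores groups. The three "schools" below are invented data in which the slope inside every school is -1 while the school means rise along a line of slope +1.

```python
def slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return sxy / sxx


def group_mean(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


# Hypothetical hierarchical data: (x, y) pairs for students within schools.
schools = {
    "school 1": [(9.0, 11.0), (10.0, 10.0), (11.0, 9.0)],
    "school 2": [(19.0, 21.0), (20.0, 20.0), (21.0, 19.0)],
    "school 3": [(29.0, 31.0), (30.0, 30.0), (31.0, 29.0)],
}

# Within-group slope: centre each school at its own mean, then fit one slope.
centred = []
for points in schools.values():
    mx, my = group_mean(points)
    centred.extend((x - mx, y - my) for x, y in points)
beta_w = slope(centred)

# Between-group slope: fit a line through the school means.
beta_b = slope([group_mean(points) for points in schools.values()])

# Pooled slope: ignore the grouping entirely.
beta_p = slope([pt for points in schools.values() for pt in points])

print(f"within = {beta_w:.3f}, between = {beta_b:.3f}, pooled = {beta_p:.3f}")
```

Here the within-group slope is negative while both the between-group and pooled slopes are positive, so the data show both sign reversals described in the answer above.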
<br />
===Week 7===<br />
*Q: Compare the Scheffé method/confidence interval with the Bonferroni method/confidence interval.<br />
*A: The Bonferroni adjustment applies very generally, but it suffers from the limitation that the number of simultaneous confidence intervals must be specified in advance in order to maintain the validity of the simultaneous coverage. The Scheffé method is an alternative that controls the simultaneous coverage over all contrasts in a subspace. If a large number of contrasts are of interest, or it is not known in advance which ones are of interest, the Scheffé method provides a way to “snoop” through all the possibilities while maintaining control on the coverage of confidence intervals and the false positive rates of tests. When the number of contrasts to be estimated is small, Bonferroni is better than Scheffé. In fact, unless the number of desired contrasts is at least twice the number of factors, Scheffé will always show wider confidence bands than Bonferroni.<br />
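The trade-off can be made concrete by comparing the multipliers applied to the standard error. The sketch below rests on assumptions that are mine, not the course's: a large-sample (known-variance) approximation, so that normal and chi-square quantiles replace t and F quantiles, and a contrast space of dimension 2, for which the chi-square quantile has the closed form -2 ln(alpha). It does not attempt to verify the "twice the number of factors" rule of thumb quoted above.

```python
from math import log, sqrt
from statistics import NormalDist

ALPHA = 0.05


def scheffe_multiplier_2d(alpha=ALPHA):
    """Large-sample Scheffe multiplier for a 2-dimensional contrast space.

    A chi-square variable with 2 df has CDF 1 - exp(-q / 2), so its
    (1 - alpha) quantile is -2 * ln(alpha).
    """
    return sqrt(-2.0 * log(alpha))


def bonferroni_multiplier(m, alpha=ALPHA):
    """Large-sample Bonferroni multiplier for m two-sided intervals."""
    return NormalDist().inv_cdf(1.0 - alpha / (2.0 * m))


scheffe = scheffe_multiplier_2d()
for m in (1, 2, 3, 4, 10):
    bonf = bonferroni_multiplier(m)
    narrower = "Bonferroni" if bonf < scheffe else "Scheffe"
    print(f"m = {m:2d}: Bonferroni {bonf:.3f} vs Scheffe {scheffe:.3f} -> {narrower} is narrower")
```

Under these assumptions Bonferroni gives narrower intervals for up to three pre-specified contrasts, and Scheffé takes over from four onwards; the Scheffé multiplier does not grow however many contrasts are examined.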
<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
Studying the 1993/1994 through 2008/2009 seasons, the researchers showed that the severity of infections with the Influenza A virus is related to its novelty. In addition, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus' hemagglutinin protein. This result may help to improve the ability to accurately predict influenza severity, so that scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. In reality, however, there are many other sources of variation. For example, people fall in different age ranges, live in different areas and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime<br />
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can look for crime activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
In the news report, people indicated that "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Obviously, crimes are more likely at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice. Crimes occur everywhere across the city. It is also possible that the criminals come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120 Weight Cited in Women's Heartburn]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wines and spirits has increased. From the statistical data, the researchers find that at the start of the 1960s beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same time period, wine consumption has increased threefold to 36 percent, while the intake of spirits has nearly doubled to 20 percent. Why have wine and spirits increased so much over the past forty years? The researchers noted that increased consumption was likely to have been affected by numerous factors such as different age patterns in the population, increasing affluence and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people have also come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and kids, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results of this study from the Canadian Health Measures Survey (CHMS), released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. In addition, the survey mentions that young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. In addition, the survey found that men are more physically active than women on average. This pattern also appears in the teenage group: more than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, the survey should take into account people whose jobs require them to perform heavy lifting and to be out and about for most of their work day. This would be more than enough exercise for most people and would improve these numbers dramatically. Similarly, for teenagers, the survey should also study whether they work part-time while studying as full-time students.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1 More private liquor stores, more alcohol deaths?]<br />
<br />
Based on a study of 89 local areas of British Columbia, the researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales, a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. Over the same period, the number of government-run stores in the province declined slightly. The findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
===Week 6===<br />
* Video Games Are Good for Girls<br />
[http://www.sciencedaily.com/releases/2011/02/110201083341.htm Video Games Are Good for Girls]<br />
<br />
Researchers from Brigham Young University's School of Family Life conducted a study on video games and children between 11 and 16 years old. They found that girls who played video games with a parent enjoyed a number of advantages. A father who still hasn't given up video games now has some justification to keep on playing, if he has a daughter. The study involved 287 families with an adolescent child. The researchers measured several outcomes, such as positive behavior, aggression, family connection and mental health. The result is interesting: for boys, playing with a parent was not a statistically significant factor for any of the outcomes the researchers measured. Yet for girls, playing with a parent accounted for as much as 20 percent of the variation on those measured outcomes.<br />
<br />
===Week 7===<br />
[http://www.sciencedaily.com/releases/2011/03/110301091225.htm Who's the Best Tennis Player of All Time?]<br />
<br />
Ranking tennis players is a novel way to show how complex network analysis can reveal interesting facts hidden in statistical data. Male tennis players who played in at least one Association of Tennis Professionals match between 1968 and 2010 were evaluated through network analysis. The researchers ran an algorithm, similar to the one used by Google to rank Web pages, on digital data from hundreds of thousands of matches. The data was pulled from the Association of Tennis Professionals website. They quantified the importance of players and ranked them by a "tennis prestige" score. This score is determined by a player's competitiveness, the quality of his performance and number of victories. Fans may think of Jimmy Connors as an "old school" tennis player, but he ranks on top according to the new ranking system; one of the reasons is that he played for more than 20 years and had the opportunity to win a lot of matches against other very good players. "The rankings are a snapshot of who is at the top at this time," Radicchi said. "Players who have yet to retire are penalized with respect to those who have ended their careers. Prestige scores strongly correlate with the number of victories, and active players haven't played all the matches of their careers yet." Researching and ranking sports stars gives a glimpse at the power of complex network analysis. <br />
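The article only says the algorithm is similar to the one Google uses to rank Web pages, so the following is a generic PageRank-style sketch, not the researchers' actual method. The players and match results are hypothetical, and the damping factor of 0.85 is the conventional PageRank default rather than a value taken from the study. Each loser passes a share of his current score to every player who beat him.

```python
# Hypothetical match results as (winner, loser) pairs.
matches = [
    ("A", "B"), ("A", "C"), ("A", "D"),
    ("B", "C"), ("B", "D"),
    ("C", "D"),
]


def prestige(matches, damping=0.85, iterations=100):
    """PageRank-style power iteration on the loser -> winner network."""
    players = sorted({p for match in matches for p in match})
    n = len(players)
    beaten_by = {p: [] for p in players}
    for winner, loser in matches:
        beaten_by[loser].append(winner)

    score = {p: 1.0 / n for p in players}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in players}
        for p in players:
            if beaten_by[p]:
                # Split this player's score among everyone who beat him.
                share = damping * score[p] / len(beaten_by[p])
                for winner in beaten_by[p]:
                    new[winner] += share
            else:
                # An unbeaten player spreads his score evenly over all players.
                for q in players:
                    new[q] += damping * score[p] / n
        score = new
    return score


scores = prestige(matches)
ranking = sorted(scores, key=scores.get, reverse=True)
for player in ranking:
    print(f"{player}: {scores[player]:.3f}")
```

Beating a highly ranked opponent transfers more score than beating a weak one, which is why such a score reflects the quality of the opponents as well as the number of victories.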
<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities in one country instead of selecting different countries, in order to eliminate the differences in conditions.<br />
<br />
===Week 2===<br />
* Before I took this course, I had no idea what consulting involves or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realize that it is not easy to be a good consultant. Usually the situation is very complicated, and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem, then pick the relationships among these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis, whether y appears to be caused by x only by chance, etc. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw the multiple regression model represented by a plane in data space; in beta space the same model is represented by a single point showing the fitted slope with respect to Coffee and the fitted slope with respect to Stress. That example needs only 2 dimensions, but some more complex models may need 3 or more dimensions for the beta space. In these situations, how can we draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need a 3D or higher-dimensional ellipsoid. This higher-dimensional ellipsoid would be larger because it is trying to be correct for 3 or more parameters at the same time. -''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===<br />
* For controlling for all the Zs, the notes mention that we can use one of three techniques. The first one is statistical control. I am not sure about the meaning of the propensity score, although it is described as the "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]<br />
Thank you Constance!<br />
<br />
===Week 5===<br />
* In a multiple regression model, it is a challenge to determine the type of a canonical outlier and to decide how to deal with it, that is, whether or not to remove it. The choice can change our results substantially.<br />
<br />
===Week 6===<br />
* From the slides on Hierarchical Models, it is good to know the idea of the two-stage approach (derived variables approach): estimate the slope and intercept for each group, then treat the estimates as a multivariate sample and do a MANOVA test.<br />
<br />
===Week 7===<br />
*How does the Scheffe method connect with the F-test?<br />
The F-test for the model is significant if and only if some contrast in the model space has a significant Scheffe confidence interval. If the F-test is not significant, then none of the Scheffe confidence intervals will be significant.</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-03-05T05:29:11Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master's student in Applied Statistics. I earned my Bachelor's degree in Actuarial Science, and I have some background in Statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption across different countries. In reality, these countries could differ in all sorts of ways other than this factor. We know that the ideal solution, introduced by Fisher, is the experiment. What is Fisher's idea of an experiment? <br />
*A: 1) Make sure all the countries are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare the life expectancy of these groups to see whether the groups with lower cigarette consumption have higher life expectancy.<br />
<br />
===Week 2===<br />
*Q: In the example of studying the inheritance of height from father to son, the graph of the ellipse shows that the SD line is not the regression line of son's height on father's height: the deviation of the son's expected height from the mean is only a proportion of the deviation of the father's height from the mean. Why do we use ellipses?<br />
*A: Ellipses make it easy to derive the formulas for statistical theory and models. In addition, ellipses drawn on scatterplots and other graphs help with data analysis, for example in studying the correlation between the variables, judging how significant a variable is, and calculating confidence intervals. <br />
<br />
===Week 3===<br />
*Q: What is the beta space, and why do we want to look at our data in beta space?<br />
*A: In data space, the axes are variables and the points are observations. To better understand hierarchical data, we want a more natural display for our models, in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients. <br />
<br />
===Week 4===<br />
*Q: Thinking about the strategy for observational data, what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the three techniques below. The first is statistical control: include the Zs in a statistical model and adjust statistically. Modern thinking: we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail with the wrong model. The second is matching: this technique only performs comparisons between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score if you have the Zs. The third is structural methods, which build a structural causal model.<br />
<br />
===Week 5===<br />
*Q: In the example data set, Health as predicted by Weight and Height, we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?<br />
*A: The first type has typical values for the predictors but an atypical Y: it has little impact on beta-hat, but it increases the size of confidence intervals and decreases power. The second type has atypical values for the predictors but a Y consistent with the other data: it has little impact on beta-hat, it shrinks confidence intervals, and it creates a false sense of power if the point is not valid. The third type has atypical values for the predictors and a Y not consistent with the other data: it has a large impact on beta-hat, could shrink or expand CIs, and makes a mess of everything.<br />
<br />
===Week 6===<br />
*Q: In the hierarchical data High school example, what is the difference between Robinson's Paradox and Simpson's Paradox?<br />
*A: Robinson's Paradox refers to the fact that Beta-w and Beta-B can have different signs. Simpson's Paradox refers to the fact that Beta-w and Beta-p can have different signs.<br />
<br />
===Week 7===<br />
*Q:<br />
<br />
*A:<br />
<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
Studying the 1993/1994 through 2008/2009 seasons, the researchers showed that the severity of infections with the Influenza A virus is related to its novelty. In addition, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus' hemagglutinin protein. This result may help to improve the ability to accurately predict influenza severity, so that scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. In reality, however, there are many other sources of variation. For example, people fall in different age ranges, live in different areas and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime<br />
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can look for crime activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
In the news report, people indicated that "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Obviously, crimes are more likely at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice. Crimes occur everywhere across the city. It is also possible that the criminals come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120 Weight Cited in Women's Heartburn]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wines and spirits has increased. From the statistical data, the researchers find that at the start of the 1960s beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same time period, wine consumption has increased threefold to 36 percent, while the intake of spirits has nearly doubled to 20 percent. Why have wine and spirits increased so much over the past forty years? The researchers noted that increased consumption was likely to have been affected by numerous factors such as different age patterns in the population, increasing affluence and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people have also come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and kids, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results of this study from the Canadian Health Measures Survey (CHMS), released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. In addition, the survey mentions that young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. In addition, the survey found that men are more physically active than women on average. This pattern also appears in the teenage group: more than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, the survey should take into account people whose jobs require them to perform heavy lifting and to be out and about for most of their work day. This would be more than enough exercise for most people and would improve these numbers dramatically. Similarly, for teenagers, the survey should also study whether they work part-time while studying as full-time students.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1 More private liquor stores, more alcohol deaths?]<br />
<br />
Based on a study of 89 local areas of British Columbia, the researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales, a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. Over the same period, the number of government-run stores in the province declined slightly. The findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
===Week 6===<br />
* Video Games Are Good for Girls<br />
[http://www.sciencedaily.com/releases/2011/02/110201083341.htm Video Games Are Good for Girls]<br />
<br />
Researchers from Brigham Young University's School of Family Life conducted a study on video games and children between 11 and 16 years old. They found that girls who played video games with a parent enjoyed a number of advantages. A father who still hasn't given up video games now has some justification to keep on playing, if he has a daughter. The study involved 287 families with an adolescent child. The researchers measured several outcomes, such as positive behavior, aggression, family connection and mental health. The result is interesting: for boys, playing with a parent was not a statistically significant factor for any of the outcomes the researchers measured. Yet for girls, playing with a parent accounted for as much as 20 percent of the variation on those measured outcomes.<br />
<br />
===Week 7===<br />
[http://www.sciencedaily.com/releases/2011/03/110301091225.htm Who's the Best Tennis Player of All Time?]<br />
<br />
Ranking tennis players is a novel way to show how complex network analysis can reveal interesting facts hidden in statistical data. Male tennis players who played in at least one Association of Tennis Professionals match between 1968 and 2010 were evaluated through network analysis. The researchers ran an algorithm, similar to the one used by Google to rank Web pages, on digital data from hundreds of thousands of matches. The data was pulled from the Association of Tennis Professionals website. They quantified the importance of players and ranked them by a "tennis prestige" score. This score is determined by a player's competitiveness, the quality of his performance and his number of victories. Fans may think of Jimmy Connors as an "old school" tennis player, but the new ranking system places him on top, in part because he played for more than 20 years and had the opportunity to win many matches against other very good players. "The rankings are a snapshot of who is at the top at this time," Radicchi said. "Players who have yet to retire are penalized with respect to those who have ended their careers. Prestige scores strongly correlate with the number of victories, and active players haven't played all the matches of their careers yet." Researching and ranking sports stars gives a glimpse of the power of complex network analysis.<br />
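The article says only that the algorithm is similar to the one Google uses to rank Web pages. As a rough illustration of that idea, and not the researchers' actual method, here is a minimal PageRank-style sketch in Python in which each loser passes a share of his score to the players who beat him. The function name `prestige_scores`, the damping factor of 0.85 and the four-player match list are all assumptions made up for this example.

```python
def prestige_scores(matches, damping=0.85, iterations=100):
    """PageRank-style scores: each loser passes prestige to the players who beat him."""
    players = sorted({p for match in matches for p in match})
    n = len(players)
    # losses[p] lists the winner of every match that p lost.
    losses = {p: [] for p in players}
    for winner, loser in matches:
        losses[loser].append(winner)

    score = {p: 1.0 / n for p in players}
    for _ in range(iterations):
        # Every player receives a small baseline share.
        new = {p: (1.0 - damping) / n for p in players}
        for p in players:
            if losses[p]:
                # Split p's score equally over the matches p lost.
                share = damping * score[p] / len(losses[p])
                for winner in losses[p]:
                    new[winner] += share
            else:
                # An undefeated player has nobody to pass score to,
                # so his score is spread evenly over all players.
                for q in players:
                    new[q] += damping * score[p] / n
        score = new
    return score


# Hypothetical (winner, loser) results, for illustration only.
matches = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("A", "D")]
scores = prestige_scores(matches)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy data player A never loses and ends up first, while D, who never wins, ends up last. A player with a longer career simply has more matches in which to collect score, which matches the remark about Connors.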
<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities within one country, instead of from different countries, in order to reduce the differences in conditions.<br />
<br />
===Week 2===<br />
* Before I took this course, I had no idea what consulting involves or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realized that it is not easy to be a good consultant. Usually the situation is very complicated, and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem, then pick the relationships among these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis and whether an apparent effect of x on y arose by chance. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw that a multiple regression model, represented by a plane in data space, is represented in beta space by a point showing the fitted slope with respect to Coffee and the fitted slope with respect to Stress. Here we only need 2 dimensions, but for some more complex models we may need 3 or more dimensions for the beta space. I just wonder, in these situations, how can we draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need a 3D or higher-dimensional ellipsoid. This ellipsoid in higher dimensions would be larger because it is trying to be correct for 3 or more parameters at the same time. -''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===<br />
* For controlling for all the Zs, the slides mention that we can use one of three techniques. The first one is statistical control. I am not sure about the meaning of the propensity score, although it says "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]<br />
Thank you Constance!<br />
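To make the definition in the comment above concrete, here is a small Python sketch under strong simplifying assumptions: there is a single categorical covariate, so the propensity score can be estimated as the fraction treated within each level. The data, the level names and the function `propensity_by_stratum` are hypothetical. With several covariates, or continuous ones, one would more commonly fit a model such as logistic regression instead.

```python
from collections import defaultdict


def propensity_by_stratum(records):
    """Estimate P(treated = 1 | z) as the treated fraction within each level of z."""
    counts = defaultdict(lambda: [0, 0])  # z -> [number treated, total]
    for z, treated in records:
        counts[z][0] += treated
        counts[z][1] += 1
    return {z: t / n for z, (t, n) in counts.items()}


# Hypothetical (covariate level, treated indicator) pairs, for illustration only.
records = (
    [("old", 1)] * 6 + [("old", 0)] * 2
    + [("young", 1)] * 2 + [("young", 0)] * 6
)
scores = propensity_by_stratum(records)
print(scores)
```

In a randomized study the estimated scores would be about the same in every level. Here they differ (0.75 for "old" and 0.25 for "young"), which is the kind of selection into groups that the comment describes.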
<br />
===Week 5===<br />
* In a multiple regression model, it is a challenge to determine the type of a canonical outlier and to decide how to deal with it, that is, whether to remove it or not. The decision can change our results a great deal.<br />
<br />
===Week 6===<br />
* From the slides on Hierarchical Models, it is good to know the idea of the two-stage approach (derived variables approach): estimate the slope and intercept for each cluster, then use them as a multivariate sample and do a MANOVA test.<br />
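A minimal Python sketch of the first stage of this approach, assuming made-up data for three schools: fit a separate least-squares line in each cluster and collect the (intercept, slope) pairs as derived variables. The school names, the numbers and the helper `fit_line` are assumptions for illustration. The second stage (the MANOVA on the derived variables) is not shown; only the mean of the derived variables is computed.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares (intercept, slope) for one cluster."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical clusters: each school has its own intercept and slope.
schools = {
    "S1": ([1, 2, 3, 4], [3, 5, 7, 9]),    # y = 1 + 2x
    "S2": ([1, 2, 3, 4], [2, 5, 8, 11]),   # y = -1 + 3x
    "S3": ([1, 2, 3, 4], [4, 5, 6, 7]),    # y = 3 + 1x
}

# Stage 1: one (intercept, slope) pair per school.
derived = {name: fit_line(x, y) for name, (x, y) in schools.items()}

# The derived pairs are then treated as a small multivariate sample.
mean_intercept = mean(a for a, _ in derived.values())
mean_slope = mean(b for _, b in derived.values())
print(derived, mean_intercept, mean_slope)
```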
<br />
===Week 7===<br />
*The comparison of the Scheffé Confidence Interval and the Bonferroni Confidence Interval: when the number of contrasts to be estimated is small, (about as many as there are factors) Bonferroni is better than Scheffé. Actually, unless the number of desired contrasts is at least twice the number of factors, Scheffé will always show wider confidence bands than Bonferroni.</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-03-05T05:08:24Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a master's student in Applied Statistics. I received my bachelor's degree in Actuarial Science, and I have some background in Statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption in different countries. In reality, these countries could differ in all sorts of ways other than this factor. We know that the ideal solution, introduced by Fisher, is the experiment. What is Fisher's idea of an experiment?<br />
*A: 1) Make sure all the groups being compared are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups assigned different amounts of cigarette consumption. 3) Compare the life expectancy of these groups to see whether the groups with lower cigarette consumption have higher life expectancy.<br />
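Step 2 of the answer, random assignment, can be sketched in a few lines of Python. This is only an illustration: the subject identifiers, the number of groups, the seed and the function name `randomize` are all assumptions.

```python
import random


def randomize(subjects, n_groups, seed=0):
    """Randomly split subjects into n_groups groups of (nearly) equal size."""
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    # Deal the shuffled subjects out to the groups in turn.
    return [shuffled[i::n_groups] for i in range(n_groups)]


# Twelve hypothetical subjects assigned to three consumption levels.
groups = randomize(range(12), 3)
print(groups)
```

Because chance alone decides who lands in which group, the groups differ only by chance before the treatment is applied, which is the point of step 1.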
<br />
===Week 2===<br />
*Q: From the example of studying the inheritance of height from father to son, and the graph of the ellipse, we find that the SD line is not the regression line of son's height on father's height: the deviation of the son's expected height from the mean is only a proportion of the deviation of the father's height from the mean. Why do we use ellipses?<br />
*A: By using ellipses, we can easily derive the formulas for statistical theory and models. In addition, ellipses drawn on scatterplots and other graphs help us with data analysis, for example in studying the correlation between the variables, judging how significant a variable is, and calculating confidence intervals.<br />
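The difference between the SD line and the regression line mentioned in the question can be checked numerically. The sketch below uses made-up father and son heights (an assumption, not the course data): the SD line has slope s_y / s_x, while the regression line of son on father has slope r * s_y / s_x, which is flatter whenever the correlation r is less than 1.

```python
from statistics import mean, stdev


def sd_and_regression_slopes(x, y):
    """Return (SD-line slope, regression slope of y on x, correlation r)."""
    mx, my = mean(x), mean(y)
    sx, sy = stdev(x), stdev(y)
    n = len(x)
    # Sample correlation coefficient.
    r = sum((a - mx) * (b - my) for a, b in zip(x, y)) / ((n - 1) * sx * sy)
    sd_slope = sy / sx          # SD line: one SD in x corresponds to one SD in y
    reg_slope = r * sy / sx     # regression line: shrunk by the factor r
    return sd_slope, reg_slope, r


# Hypothetical heights in inches, for illustration only.
fathers = [64, 66, 68, 70, 72, 74]
sons = [67, 66, 69, 68, 71, 70]

sd_slope, reg_slope, r = sd_and_regression_slopes(fathers, sons)
print(sd_slope, reg_slope, r)
```

With these numbers the SD line has slope 0.5 and r is about 0.83, so the regression slope is about 0.41: a father one inch above average is expected to have a son less than half an inch above average.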
<br />
===Week 3===<br />
*Q: What is beta space, and why do we want to look at our data in beta space?<br />
*A: In data space, the axes are variables and the points are observations. For a better understanding of hierarchical data, we want a more natural display of our models, in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients.<br />
<br />
===Week 4===<br />
*Q: Thinking about the strategy for observational data, what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the three below. The first is statistical control: include the Zs in a statistical model and adjust statistically. Modern thinking: we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail with the wrong model. The second is matching: this technique only performs comparisons between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, if you have the Zs. The third is structural methods, which build a structural causal model.<br />
<br />
===Week 5===<br />
*Q: In the example data set, Health as predicted by Weight and Height, we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?<br />
*A: The first has typical values for the predictors but an atypical Y: it has little impact on beta-hat, but it increases the size of confidence intervals and decreases power. The second has atypical values for the predictors but a Y consistent with the other data: it has little impact on beta-hat, it shrinks confidence intervals, and it creates a false sense of power if the point is not valid. The third has atypical values for the predictors and a Y not consistent with the other data: it has a large impact on beta-hat, could shrink or expand CIs, and makes a mess of everything.<br />
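The effect of each kind of outlier on beta-hat can be seen in a small simple-regression sketch in Python. The data are made up (ten points lying exactly on y = 2x), and three points are added one at a time: an unusual Y at a typical X, an extreme X with Y on the trend, and an extreme X with Y off the trend. Only the slope is examined here, not the confidence intervals; the names `ols_slope` and `slope_type1` to `slope_type3` are assumptions for this example.

```python
from statistics import mean


def ols_slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical clean data lying exactly on the line y = 2x.
base = [(x, 2 * x) for x in range(1, 11)]

slope_base = ols_slope(base)
slope_type1 = ols_slope(base + [(5.5, 30)])   # typical x, unusual y
slope_type2 = ols_slope(base + [(30, 60)])    # extreme x, y on the trend
slope_type3 = ols_slope(base + [(30, 0)])     # extreme x, y off the trend
print(slope_base, slope_type1, slope_type2, slope_type3)
```

The first two added points leave the slope at 2, while the third drags it below zero, which is why the high-leverage point with an inconsistent Y is the one that does the most damage to the estimate.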
<br />
===Week 6===<br />
*Q: In the hierarchical data (high school) example, what is the difference between Robinson's Paradox and Simpson's Paradox?<br />
*A: Robinson's Paradox refers to the fact that Beta-w (the within-group slope) and Beta-B (the between-group slope) can have different signs. Simpson's Paradox refers to the fact that Beta-w and Beta-p (the pooled slope) can have different signs.<br />
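Both sign reversals can be reproduced with a tiny made-up data set. In the Python sketch below, three hypothetical groups each have a within-group slope of -1, but the group means rise together, so the between-group slope and the pooled slope are both positive. The group labels, the numbers and the variable names `beta_w`, `beta_b` and `beta_p` are assumptions chosen to mirror the notation in the answer.

```python
from statistics import mean


def slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical groups: y falls with x inside each group,
# but groups with larger x also have larger y on average.
groups = {
    "A": ([1, 2, 3], [5, 4, 3]),
    "B": ([4, 5, 6], [8, 7, 6]),
    "C": ([7, 8, 9], [11, 10, 9]),
}

x_all, y_all = [], []
x_centered, y_centered = [], []
x_means, y_means = [], []
for x, y in groups.values():
    mx, my = mean(x), mean(y)
    x_all += x
    y_all += y
    # Centering within each group removes the between-group differences.
    x_centered += [a - mx for a in x]
    y_centered += [b - my for b in y]
    x_means.append(mx)
    y_means.append(my)

beta_w = slope(x_centered, y_centered)   # within-group slope
beta_b = slope(x_means, y_means)         # between-group slope (group means)
beta_p = slope(x_all, y_all)             # pooled slope ignoring groups
print(beta_w, beta_b, beta_p)
```

Here the within-group slope is -1, the between-group slope is 1 and the pooled slope is 0.8, so the within-group slope disagrees in sign with both of the others.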
<br />
===Week 7===<br />
*Q:<br />
<br />
*A:<br />
<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
Studying the 1993/1994 through 2008/2009 seasons, the researchers showed that the severity of infections with the Influenza A virus is related to its novelty. In particular, 90% of the variation in influenza severity over the period studied could be explained by the novelty of the virus' hemagglutinin protein. It is thought that this result can help improve the ability to accurately predict influenza severity, so that scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. However, in reality there are many sources of variation. For example, people fall in different age ranges, live in different areas and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime<br />
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online and spot trends that show what types of crimes happen most often, and where. Users can look up crime activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high-crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
In the news story, one user is quoted as saying, "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers." The story reports that this helps police officers because, if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future. Obviously, crimes are more likely at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice. Crimes occur all over the city, and it is also possible that the criminals come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120 Drinkers Down Under switching from beer to wine]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wine and spirits has increased. According to the data, at the start of the 1960s beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same period, wine's share has increased threefold to 36 percent, while the share of spirits has nearly doubled to 20 percent. Why have wine and spirits increased so much over the past forty years? The researchers noted that increased consumption was likely to have been affected by numerous factors such as different age patterns in the population, increasing affluence and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people have also come to think that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes with a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and children, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results from the Canadian Health Measures Survey (CHMS) released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. Young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. In addition, the survey found that men are more physically active than women on average. The same pattern appears among teenagers: more than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, the survey should take into account people whose jobs require them to do heavy lifting or to be out and about for most of their work day. This would be more than enough exercise for most people and would improve these numbers dramatically. Similarly, for teenagers, the survey should also examine whether they work part-time while studying as full-time students.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1 More private liquor stores, more alcohol deaths?]<br />
<br />
In a study of 89 local areas of British Columbia, researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. Over the same period, the number of government-run stores in the province declined slightly. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales, a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. The findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate, including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
===Week 6===<br />
* Video Games Are Good for Girls<br />
[http://www.sciencedaily.com/releases/2011/02/110201083341.htm Video Games Are Good for Girls]<br />
<br />
Researchers from Brigham Young University's School of Family Life conducted a study on video games and children between 11 and 16 years old. They found that girls who played video games with a parent enjoyed a number of advantages. A father who still hasn't given up video games now has some justification to keep playing, if he has a daughter. The study involved 287 families with an adolescent child. The researchers measured several outcomes, such as positive behavior, aggression, family connection and mental health. The result is interesting: for boys, playing with a parent was not a statistically significant factor for any of the outcomes the researchers measured, yet for girls, playing with a parent accounted for as much as 20 percent of the variation in those outcomes.<br />
<br />
===Week 7===<br />
[http://www.sciencedaily.com/releases/2011/03/110301091225.htm Who's the Best Tennis Player of All Time?]<br />
<br />
Ranking tennis players is a novel way to show how complex network analysis can reveal interesting facts hidden in statistical data. Male tennis players who played in at least one Association of Tennis Professionals match between 1968 and 2010 were evaluated through network analysis. The researchers ran an algorithm, similar to the one used by Google to rank Web pages, on digital data from hundreds of thousands of matches. The data was pulled from the Association of Tennis Professionals website. They quantified the importance of players and ranked them by a "tennis prestige" score. This score is determined by a player's competitiveness, the quality of his performance and his number of victories. Fans may think of Jimmy Connors as an "old school" tennis player, but the new ranking system places him on top, in part because he played for more than 20 years and had the opportunity to win many matches against other very good players. "The rankings are a snapshot of who is at the top at this time," Radicchi said. "Players who have yet to retire are penalized with respect to those who have ended their careers. Prestige scores strongly correlate with the number of victories, and active players haven't played all the matches of their careers yet." Researching and ranking sports stars gives a glimpse of the power of complex network analysis.<br />
<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities within one country, instead of from different countries, in order to reduce the differences in conditions.<br />
<br />
===Week 2===<br />
* Before I took this course, I had no idea what consulting involves or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realized that it is not easy to be a good consultant. Usually the situation is very complicated, and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem, then pick the relationships among these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis and whether an apparent effect of x on y arose by chance. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw that a multiple regression model, represented by a plane in data space, is represented in beta space by a point showing the fitted slope with respect to Coffee and the fitted slope with respect to Stress. Here we only need 2 dimensions, but for some more complex models we may need 3 or more dimensions for the beta space. I just wonder, in these situations, how can we draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need a 3D or higher-dimensional ellipsoid. This ellipsoid in higher dimensions would be larger because it is trying to be correct for 3 or more parameters at the same time. -''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===<br />
* For controlling for all the Zs, the slides mention that we can use one of three techniques. The first one is statistical control. I am not sure about the meaning of the propensity score, although it says "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]<br />
Thank you Constance!<br />
<br />
===Week 5===<br />
* In a multiple regression model, it is a challenge to determine the type of a canonical outlier and to decide how to deal with it, that is, whether to remove it or not. The decision can change our results a great deal.<br />
<br />
===Week 6===<br />
* From the slides on Hierarchical Models, it is good to know the idea of the two-stage approach (derived variables approach): estimate the slope and intercept for each cluster, then use them as a multivariate sample and do a MANOVA test.<br />
<br />
===Week 7===<br />
*</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-02-24T21:47:00Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a master's student in Applied Statistics. I received my bachelor's degree in Actuarial Science, and I have some background in Statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption in different countries. In reality, these countries could differ in all sorts of ways other than this factor. We know that the ideal solution, introduced by Fisher, is the experiment. What is Fisher's idea of an experiment?<br />
*A: 1) Make sure all the groups being compared are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups assigned different amounts of cigarette consumption. 3) Compare the life expectancy of these groups to see whether the groups with lower cigarette consumption have higher life expectancy.<br />
<br />
===Week 2===<br />
*Q: From the example of studying the inheritance of height from father to son, and the graph of the ellipse, we find that the SD line is not the regression line of son's height on father's height: the deviation of the son's expected height from the mean is only a proportion of the deviation of the father's height from the mean. Why do we use ellipses?<br />
*A: By using ellipses, we can easily derive the formulas for statistical theory and models. In addition, ellipses drawn on scatterplots and other graphs help us with data analysis, for example in studying the correlation between the variables, judging how significant a variable is, and calculating confidence intervals.<br />
<br />
===Week 3===<br />
*Q: What is beta space, and why do we want to look at our data in beta space?<br />
*A: In data space, the axes are variables and the points are observations. For a better understanding of hierarchical data, we want a more natural display of our models, in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients.<br />
<br />
===Week 4===<br />
*Q: Thinking about the strategy for observational data, what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the three below. The first is statistical control: include the Zs in a statistical model and adjust statistically. Modern thinking: we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail with the wrong model. The second is matching: this technique only performs comparisons between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, if you have the Zs. The third is structural methods, which build a structural causal model.<br />
<br />
===Week 5===<br />
*Q: In the example data set, Health as predicted by Weight and Height, we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?<br />
*A: The first has typical values for the predictors but an atypical Y: it has little impact on beta-hat, but it increases the size of confidence intervals and decreases power. The second has atypical values for the predictors but a Y consistent with the other data: it has little impact on beta-hat, it shrinks confidence intervals, and it creates a false sense of power if the point is not valid. The third has atypical values for the predictors and a Y not consistent with the other data: it has a large impact on beta-hat, could shrink or expand CIs, and makes a mess of everything.<br />
<br />
===Week 6===<br />
*Q: In the hierarchical data (high school) example, what is the difference between Robinson's Paradox and Simpson's Paradox?<br />
*A: Robinson's Paradox refers to the fact that Beta-w (the within-group slope) and Beta-B (the between-group slope) can have different signs. Simpson's Paradox refers to the fact that Beta-w and Beta-p (the pooled slope) can have different signs.<br />
<br />
<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
Studying the 1993/1994 through 2008/2009 seasons, the researchers showed that the severity of infections with the Influenza A virus is related to its novelty. In particular, 90% of the variation in influenza severity over the period studied could be explained by the novelty of the virus' hemagglutinin protein. It is thought that this result can help improve the ability to accurately predict influenza severity, so that scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. However, in reality there are many sources of variation. For example, people fall in different age ranges, live in different areas and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime<br />
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online and spot trends that show what types of crimes happen most often, and where. Users can look up crime activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high-crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
In the news story, one user is quoted as saying, "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers." The story reports that this helps police officers because, if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future. Obviously, crimes are more likely at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice. Crimes occur all over the city, and it is also possible that the criminals come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120 Drinkers Down Under switching from beer to wine]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wine and spirits has increased. According to the data, at the start of the 1960s beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same period, wine's share has increased threefold to 36 percent, while the share of spirits has nearly doubled to 20 percent. Why have wine and spirits increased so much over the past forty years? The researchers noted that increased consumption was likely to have been affected by numerous factors such as different age patterns in the population, increasing affluence and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people have also come to think that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes with a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and kids, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results from the Canadian Health Measures Survey (CHMS) released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. In addition, the survey mentions that young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. In addition, the survey found that men are more physically active than women on average. The same pattern appears among teenagers: more than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, the survey should take into account people whose jobs require them to perform heavy lifting or to be out and about for most of their work day. This would be more than enough exercise for most people and would improve these numbers dramatically. Similarly, for teenagers, the survey should also study whether they work part-time while studying as full-time students.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1|More private liquor stores, more alcohol deaths?]<br />
<br />
Based on a study of 89 local areas of British Columbia, the researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales, a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. Over the same period, the number of government-run stores in the province declined slightly. The findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
===Week 6===<br />
* Video Games Are Good for Girls<br />
[http://www.sciencedaily.com/releases/2011/02/110201083341.htm|Video Games Are Good for Girls]<br />
<br />
Researchers from Brigham Young University's School of Family Life conducted a study on video games and children between 11 and 16 years old. They found that girls who played video games with a parent enjoyed a number of advantages. A father who still hasn't given up video games now has some justification to keep on playing, if he has a daughter. The study involved 287 families with an adolescent child. The researchers measured several outcomes, such as positive behavior, aggression, family connection and mental health. The result is interesting: for boys, playing with a parent was not a statistically significant factor for any of the outcomes the researchers measured. Yet for girls, playing with a parent accounted for as much as 20 percent of the variation in those measured outcomes.<br />
<br />
<br />
<br />
<br />
<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities in one country instead of selecting different countries, in order to reduce the differences in conditions.<br />
<br />
===Week 2===<br />
* Before I took this course, I had no idea what consulting involves or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realize that it is not easy to be a good consultant. Usually, the situation is very complicated and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem, then pick the relationships among these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to be considered, such as how to select the sample for analysis, whether the apparent effect of x on y is due to chance, etc. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw the multiple regression model represented by a plane in data space, which is represented in beta space by a point showing the fitted slope with respect to Coffee and the fitted slope with respect to Stress. Here we only need 2 dimensions, but for some more complex models we may need 3 or more dimensions for the beta space. I just wonder, in these situations, how can we draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need a 3D or higher-dimensional ellipsoid. This ellipsoid in higher dimensions would be larger because it is trying to be correct for 3 or more parameters at the same time. -''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===<br />
* For controlling for all the Zs, the notes mention that we can use one of three techniques. The first one is statistical control. I am not sure about the meaning of the propensity score, although it says "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]<br />
Thank you Constance!<br />
<br />
===Week 5===<br />
* In a multiple regression model, it is a challenge to determine the type of a canonical outlier and how to deal with it, that is, whether to remove it or not. The choice can change our results considerably.<br />
<br />
===Week 6===<br />
* From the slides on Hierarchical Models, it is good to know the idea of the two-stage approach (derived variables approach): estimate slope and intercept, then use them as a multivariate sample and do a MANOVA test.</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-02-24T21:33:20Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master's student in Applied Statistics. I got my Bachelor's degree in Actuarial Science, and I have some background in Statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption in different countries. In reality, besides this factor, these countries could differ in all sorts of other ways. We know that the ideal solution, introduced by Fisher, is the experiment. What is Fisher's idea of an experiment? <br />
*A: 1) Make sure the groups being compared are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare the life expectancy of these groups to see whether the groups with lower cigarette consumption have higher life expectancy.<br />
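The random split in step 2 can be sketched in a few lines of Python. This is only an illustration: the subject names, the number of groups and the consumption labels are made up, not taken from the lecture.<br />

```python
import random

def randomize(subjects, n_groups, seed=0):
    """Randomly split subjects into n_groups groups of (nearly) equal size."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    # Deal the shuffled subjects out like cards, one group at a time.
    return [shuffled[i::n_groups] for i in range(n_groups)]

# Hypothetical subjects and treatment labels, for illustration only.
subjects = [f"subject_{i}" for i in range(12)]
groups = randomize(subjects, n_groups=3)
for label, members in zip(["none", "light", "heavy"], groups):
    print(label, members)
```

Because the assignment depends only on the shuffle, any other difference between the groups can arise only by chance.<br />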
<br />
===Week 2===<br />
*Q: From the example of studying the inheritance of height from father to son, and the graph of the ellipse, we find that the SD line is not the regression line of son's height on father's height: the deviation of the son's expected height from the mean is only a proportion of the deviation of the father's height from the mean. Why do we use ellipses?<br />
*A: By using ellipses, we can easily derive the formulas for statistical theory and models. In addition, ellipses drawn on scatterplots and other graphs help us with data analysis, such as studying the correlation between the variables, judging how significant a variable is, and calculating confidence intervals. <br />
<br />
===Week 3===<br />
*Q: What is beta space, and why do we want to look at our data in beta space?<br />
*A: In data space, the axes are variables and the points are observations. To understand hierarchical data better, we want a more natural display of our models, in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients. <br />
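A small sketch may make the move from data space to beta space concrete. The three "schools" and their observations below are invented for illustration; each school's fitted line becomes one point (intercept, slope) in beta space.<br />

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1

# Hypothetical data space: each school has its own (x, y) observations.
schools = {
    "A": ([1, 2, 3], [2, 4, 6]),   # lies on y = 2x
    "B": ([1, 2, 3], [5, 6, 7]),   # lies on y = 4 + x
    "C": ([1, 2, 3], [9, 8, 7]),   # lies on y = 10 - x
}

# Beta space: one point (intercept, slope) per school.
beta_space = {name: fit_line(xs, ys) for name, (xs, ys) in schools.items()}
for name, (b0, b1) in beta_space.items():
    print(name, "intercept =", b0, "slope =", b1)
```

Plotting these points with the intercept on one axis and the slope on the other gives the beta-space display described above.<br />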
<br />
===Week 4===<br />
*Q: Thinking about strategy with observational data, what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the three tools below. The first is statistical control: include the Zs in a statistical model and adjust statistically. Modern thinking is that we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail with the wrong model. The second is matching: this technique only performs comparisons between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, if you have the Zs. The third is structural methods, which build a structural causal model.<br />
<br />
===Week 5===<br />
*Q: From the example data set, Health as predicted by Weight and Height, we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?<br />
*A: The first type has typical values for the predictors but an atypical Y: it has little impact on beta-hat, but it increases the size of confidence intervals and decreases power. The second type has atypical values for the predictors but Y consistent with the other data: it has little impact on beta-hat, but it shrinks confidence intervals and creates a false sense of power if the point is not valid. The third type has atypical values for the predictors and Y not consistent with the other data: it has a large impact on beta-hat, could shrink or expand CIs, and makes a mess of everything.<br />
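The effect of each kind of outlier on beta-hat can be checked numerically. The sketch below uses a single predictor and made-up numbers (not the Health, Weight and Height data from class), and reads the three types as: typical x with unusual y, unusual x with y on the trend, and unusual x with y off the trend.<br />

```python
def slope(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical base data lying close to the line y = x.
xs = [1, 2, 3, 4, 5]
ys = [1.1, 1.9, 3.0, 4.1, 4.9]
base = slope(xs, ys)

# One extra point of each type (values chosen only for illustration).
type1 = slope(xs + [3], ys + [10])    # typical x, atypical y
type2 = slope(xs + [15], ys + [15])   # atypical x, y consistent with the trend
type3 = slope(xs + [15], ys + [0])    # atypical x, y inconsistent with the trend

print("base:", base)
print("type 1:", type1, "type 2:", type2, "type 3:", type3)
```

In this example only the third point moves the fitted slope by much, which matches the description of the third type as the one that makes a mess of everything.<br />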
<br />
===Week 6===<br />
*Q: In the hierarchical data example (high school), what is the difference between Robinson's Paradox and Simpson's Paradox?<br />
*A: Robinson's Paradox refers to the fact that Beta-w and Beta-B can have different signs. Simpson's Paradox refers to the fact that Beta-w and Beta-p can have different signs.<br />
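A toy example shows how the signs can differ. Here I read Beta-w as the within-group slope, Beta-B as the between-group slope (the slope through the group means) and Beta-p as the pooled slope that ignores groups; that reading, and the three made-up groups, are my own assumptions.<br />

```python
def mean(v):
    return sum(v) / len(v)

def slope(xs, ys):
    """Least-squares slope of y on x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical groups (e.g. schools): y falls with x inside each group,
# but groups with a higher mean x also have a higher mean y.
groups = [
    ([1, 2, 3], [3, 2, 1]),
    ([4, 5, 6], [6, 5, 4]),
    ([7, 8, 9], [9, 8, 7]),
]

# Within-group slope: centre each group at its own means, then pool.
wx, wy = [], []
for xs, ys in groups:
    wx += [x - mean(xs) for x in xs]
    wy += [y - mean(ys) for y in ys]
beta_w = slope(wx, wy)

# Between-group slope: regression through the group means.
beta_b = slope([mean(xs) for xs, _ in groups], [mean(ys) for _, ys in groups])

# Pooled slope: all observations together, ignoring group membership.
beta_p = slope([x for xs, _ in groups for x in xs],
               [y for _, ys in groups for y in ys])

print("within:", beta_w, "between:", beta_b, "pooled:", beta_p)
```

Here the within-group slope is negative while both the between-group and pooled slopes are positive, so the same data illustrate both sign reversals.<br />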
<br />
<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm|Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
Using data from the 1993/1994 through 2008/2009 seasons, the research has shown that the severity of infections with the Influenza A virus is related to its novelty. In addition, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus' hemagglutinin protein. People think that this result can help to improve the ability to accurately predict influenza severity. Therefore, scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. However, in reality there are many sources of variation. For example, people are in different age ranges, live in different areas and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime<br />
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm|High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can look for crime activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
In the news report, people indicated that "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Crimes are presumably more likely at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice, and crimes occur all over the city. It is also possible that the criminals come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120|Drinkers Down Under switching from beer to wine]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wine and spirits has increased. According to the data, at the start of the 1960s beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same period, wine's share has increased threefold to 36 percent, while the intake of spirits has nearly doubled to 20 percent. Why have wine and spirits increased so much over the past forty years? The researchers noted that the increased consumption was likely to have been affected by numerous factors, such as different age patterns in the population, increasing affluence and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people have also come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome|Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and kids, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results from the Canadian Health Measures Survey (CHMS) released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. In addition, the survey mentions that young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. In addition, the survey found that men are more physically active than women on average. The same pattern appears among teenagers: more than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, the survey should take into account people whose jobs require them to perform heavy lifting or to be out and about for most of their work day. This would be more than enough exercise for most people and would improve these numbers dramatically. Similarly, for teenagers, the survey should also study whether they work part-time while studying as full-time students.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1|More private liquor stores, more alcohol deaths?]<br />
<br />
Based on a study of 89 local areas of British Columbia, the researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales, a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. Over the same period, the number of government-run stores in the province declined slightly. The findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
===Week 6===<br />
* Video Games Are Good for Girls<br />
[http://www.sciencedaily.com/releases/2011/02/110201083341.htm|Video Games Are Good for Girls]<br />
<br />
Researchers from Brigham Young University's School of Family Life conducted a study on video games and children between 11 and 16 years old. They found that girls who played video games with a parent enjoyed a number of advantages. A father who still hasn't given up video games now has some justification to keep on playing, if he has a daughter. The study involved 287 families with an adolescent child. The researchers measured several outcomes, such as positive behavior, aggression, family connection and mental health. The result is interesting: for boys, playing with a parent was not a statistically significant factor for any of the outcomes the researchers measured. Yet for girls, playing with a parent accounted for as much as 20 percent of the variation in those measured outcomes.<br />
<br />
<br />
<br />
<br />
<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities in one country instead of selecting different countries, in order to reduce the differences in conditions.<br />
<br />
===Week 2===<br />
* Before I took this course, I had no idea what consulting involves or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realize that it is not easy to be a good consultant. Usually, the situation is very complicated and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem, then pick the relationships among these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to be considered, such as how to select the sample for analysis, whether the apparent effect of x on y is due to chance, etc. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw the multiple regression model represented by a plane in data space, which is represented in beta space by a point showing the fitted slope with respect to Coffee and the fitted slope with respect to Stress. Here we only need 2 dimensions, but for some more complex models we may need 3 or more dimensions for the beta space. I just wonder, in these situations, how can we draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need a 3D or higher-dimensional ellipsoid. This ellipsoid in higher dimensions would be larger because it is trying to be correct for 3 or more parameters at the same time. -''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===<br />
* For controlling for all the Zs, the notes mention that we can use one of three techniques. The first one is statistical control. I am not sure about the meaning of the propensity score, although it says "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]<br />
Thank you Constance!<br />
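To make "prediction of X from Zs" concrete, here is a minimal sketch with a single categorical covariate, where the propensity score is simply the proportion of people in the treated group within each level of Z. The data and the names are invented; with several or continuous Zs one would usually fit a model such as logistic regression instead of counting.<br />

```python
from collections import defaultdict

# Hypothetical observational data: (z, x) pairs, where z is a covariate
# (an age band here) and x = 1 if the person ended up in the treated group.
records = [
    ("young", 1), ("young", 0), ("young", 0), ("young", 0),
    ("old", 1), ("old", 1), ("old", 1), ("old", 0),
]

def propensity_by_stratum(records):
    """Estimate P(X = 1 | Z = z) as the treated proportion within each z."""
    counts = defaultdict(lambda: [0, 0])   # z -> [n_treated, n_total]
    for z, x in records:
        counts[z][0] += x
        counts[z][1] += 1
    return {z: treated / total for z, (treated, total) in counts.items()}

scores = propensity_by_stratum(records)
for z, score in scores.items():
    print(z, score)
```

If group allocation were unconfounded with Z, as in a randomized study, the estimated scores would be roughly equal across the levels of Z; here they differ, which signals selection related to the covariate.<br />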
<br />
===Week 5===<br />
<br />
===Week 6===<br />
* From the slides on Hierarchical Models, it is good to know the idea of the two-stage approach (derived variables approach): estimate slope and intercept, then use them as a multivariate sample and do a MANOVA test.</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-02-24T20:43:48Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master's student in Applied Statistics. I got my Bachelor's degree in Actuarial Science, and I have some background in Statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption in different countries. In reality, besides this factor, these countries could differ in all sorts of other ways. We know that the ideal solution, introduced by Fisher, is the experiment. What is Fisher's idea of an experiment? <br />
*A: 1) Make sure the groups being compared are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare the life expectancy of these groups to see whether the groups with lower cigarette consumption have higher life expectancy.<br />
<br />
===Week 2===<br />
*Q: From the example of studying the inheritance of height from father to son, and the graph of the ellipse, we find that the SD line is not the regression line of son's height on father's height: the deviation of the son's expected height from the mean is only a proportion of the deviation of the father's height from the mean. Why do we use ellipses?<br />
*A: By using ellipses, we can easily derive the formulas for statistical theory and models. In addition, ellipses drawn on scatterplots and other graphs help us with data analysis, such as studying the correlation between the variables, judging how significant a variable is, and calculating confidence intervals. <br />
<br />
===Week 3===<br />
*Q: What is beta space, and why do we want to look at our data in beta space?<br />
*A: In data space, the axes are variables and the points are observations. To understand hierarchical data better, we want a more natural display of our models, in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients. <br />
<br />
===Week 4===<br />
*Q: Thinking about strategy with observational data, what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the three tools below. The first is statistical control: include the Zs in a statistical model and adjust statistically. Modern thinking is that we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail with the wrong model. The second is matching: this technique only performs comparisons between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, if you have the Zs. The third is structural methods, which build a structural causal model.<br />
<br />
===Week 5===<br />
*Q: From the example data set, Health as predicted by Weight and Height, we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?<br />
*A: The first type has typical values for the predictors but an atypical Y: it has little impact on beta-hat, but it increases the size of confidence intervals and decreases power. The second type has atypical values for the predictors but Y consistent with the other data: it has little impact on beta-hat, but it shrinks confidence intervals and creates a false sense of power if the point is not valid. The third type has atypical values for the predictors and Y not consistent with the other data: it has a large impact on beta-hat, could shrink or expand CIs, and makes a mess of everything.<br />
<br />
===Week 6===<br />
<br />
<br />
<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm|Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
Using data from the 1993/1994 through 2008/2009 seasons, the research has shown that the severity of infections with the Influenza A virus is related to its novelty. In addition, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus' hemagglutinin protein. People think that this result can help to improve the ability to accurately predict influenza severity. Therefore, scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. However, in reality there are many sources of variation. For example, people are in different age ranges, live in different areas and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime<br />
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm|High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can look for crime activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
In the news report, people indicated that "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Crimes are presumably more likely at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice, and crimes occur all over the city. It is also possible that the criminals come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120|Drinkers Down Under switching from beer to wine]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wine and spirits has increased. According to the data, at the start of the 1960s beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same period, wine's share has increased threefold to 36 percent, while the intake of spirits has nearly doubled to 20 percent. Why have wine and spirits increased so much over the past forty years? The researchers noted that the increased consumption was likely to have been affected by numerous factors, such as different age patterns in the population, increasing affluence and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people have also come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome|Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and children, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results of the Canadian Health Measures Survey (CHMS), released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. Young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. In addition, the survey found that men are more physically active than women on average. The same pattern appears among teenagers: more than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, the survey should account for people whose jobs require them to perform heavy lifting or to be out and about for most of their work day. This would be more than enough exercise for most people and would improve these numbers dramatically. Similarly, for teenagers, the survey should also examine whether they work part-time while studying as full-time students.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1|More private liquor stores, more alcohol deaths?]<br />
<br />
Based on a study of 89 local areas of British Columbia, the researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales, a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. Over the same period, the number of government-run stores in the province declined slightly. The findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
===Week 6===<br />
* Video Games Are Good for Girls<br />
[http://www.sciencedaily.com/releases/2011/02/110201083341.htm|Video Games Are Good for Girls]<br />
<br />
Researchers from Brigham Young University's School of Family Life conducted a study on video games and children between 11 and 16 years old. They found that girls who played video games with a parent enjoyed a number of advantages. A father who still hasn't given up video games now has some justification to keep on playing, if he has a daughter. The study involved 287 families with an adolescent child. The researchers measured several outcomes, such as positive behavior, aggression, family connection, and mental health. The result is interesting: for boys, playing with a parent was not a statistically significant factor for any of the outcomes the researchers measured. Yet for girls, playing with a parent accounted for as much as 20 percent of the variation in those measured outcomes.<br />
<br />
<br />
<br />
<br />
<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities within one country instead of selecting different countries, in order to eliminate the differences in conditions.<br />
<br />
===Week 2===<br />
* Before I took this course, I had no idea what consulting involves or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realize that it is not easy to be a good consultant. Usually, the situation is very complicated, and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem, then pick the relationships among these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis, whether the variable y is caused by x or is related to it only by chance, etc. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw the multiple regression model represented by a plane in data space, which is represented in beta space by a point showing the fitted slope with respect to Coffee and the fitted slope with respect to Stress. In that example we only need 2 dimensions, but for some complex models we may need 3 or more dimensions for the beta space. I just wonder, in these situations, how can we draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need a 3D or higher-dimensional ellipsoid. This ellipsoid in higher dimensions would be larger because it is trying to be correct for 3 or more parameters at the same time. -''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===<br />
* For controlling for all the Zs, the notes mention that we can use one of three techniques. The first one is statistical control. I am not sure about the meaning of the propensity score, although it says "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]<br />
Thank you Constance!<br />
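The idea in the comment above, "prediction of X from Zs" by regressing group membership on a covariate, can be sketched in Python. This is a minimal illustration only: the data, the function names <code>sigmoid</code> and <code>fit_propensity</code>, the single covariate, and the learning rate are all assumptions made for the example, and a real analysis would use a fitted logistic regression from a statistics package.<br />

```python
from math import exp

def sigmoid(t):
    """Logistic function: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + exp(-t))

def fit_propensity(z, treated, lr=0.1, n_iter=5000):
    """Fit P(treated = 1 | z) = sigmoid(b0 + b1*z) by gradient ascent
    on the average log-likelihood (a plain logistic regression)."""
    b0 = b1 = 0.0
    n = len(z)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for zi, ti in zip(z, treated):
            err = ti - sigmoid(b0 + b1 * zi)   # observed minus predicted
            g0 += err
            g1 += err * zi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Invented observational data: z is one centred covariate,
# treated marks which group each person ended up in.
z = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, -1.0, 1.0, 0.0]
treated = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1]

b0, b1 = fit_propensity(z, treated)
scores = [sigmoid(b0 + b1 * zi) for zi in z]   # one propensity score per person
print("slope on z:", round(b1, 3))
```

In a randomized study the fitted slope would be near zero, because the covariate would not predict group membership. Here it is positive, which signals that people with larger z were more likely to end up in the treated group.<br />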
<br />
===Week 5===<br />
<br />
===Week 6===<br />
* From the slides on Hierarchical Models, it is good to know the idea of the two-stage approach (derived variables approach): estimate the slope and intercept, then use them as a multivariate sample and do a MANOVA test.</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-02-16T02:14:55Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master's student in Applied Statistics. I earned my Bachelor's degree in Actuarial Science, and I have some background in statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption in different countries. In reality, these countries could differ in all sorts of ways other than this factor. We know that the ideal solution introduced by Fisher is the experiment. What is Fisher's idea for an experiment?<br />
*A: 1) Make sure all the groups being compared are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare the life expectancy of these groups to see whether the groups with less cigarette consumption have higher life expectancy.<br />
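The random split in step 2 of the answer above can be sketched in Python. This is only an illustration: the function name <code>randomize</code>, the subject IDs, the number of groups, and the seed are assumptions made for the example, not part of the course notes.<br />

```python
import random

def randomize(subjects, n_groups, seed=0):
    """Shuffle the subjects and deal them into n_groups groups of near-equal size."""
    rng = random.Random(seed)      # fixed seed only so the example is reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    # Deal like cards: item i of the shuffled list goes to group i mod n_groups.
    return [shuffled[i::n_groups] for i in range(n_groups)]

subjects = list(range(1, 13))      # 12 hypothetical willing subjects
groups = randomize(subjects, n_groups=3)
for level, members in enumerate(groups):
    print("consumption level", level, "->", sorted(members))
```

Because chance alone decides who lands in which group, the groups differ only by chance in everything other than the assigned consumption level.<br />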
<br />
===Week 2===<br />
*Q: From the example of studying the inheritance of height from father to son, and the graph of the ellipse, we find that the SD line is not the regression line of son's height on father's height: the change of the son's expected height from the mean is only a proportion of the change of the father's height from the mean. Why do we use ellipses?<br />
*A: By using ellipses, we can easily derive the formulas for statistical theory and models. In addition, ellipses, together with scatterplots and other graphs, help us with data analysis, such as studying the correlation between the variables, assessing how significant a variable is, and calculating confidence intervals.<br />
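The difference between the SD line and the regression line mentioned in the question can be checked numerically. The sketch below uses invented father and son heights (in inches) and a hypothetical helper <code>line_slopes</code>; it relies only on the standard facts that the SD line has slope SD(y)/SD(x) and the regression line of y on x has slope r times that.<br />

```python
from math import sqrt

def line_slopes(x, y):
    """Return (r, SD-line slope, regression slope of y on x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sxy / sqrt(sxx * syy)
    sd_slope = sqrt(syy / sxx)     # SD line: one SD of y per SD of x
    reg_slope = r * sd_slope       # regression line: shrunk by the correlation
    return r, sd_slope, reg_slope

fathers = [65, 67, 68, 70, 72, 74]   # invented heights
sons = [67, 68, 67, 70, 71, 72]
r, sd_slope, reg_slope = line_slopes(fathers, sons)
print(round(r, 3), round(sd_slope, 3), round(reg_slope, 3))
```

Whenever the correlation is below 1, the regression slope is flatter than the SD-line slope, which is the "proportion of the change" described in the question.<br />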
<br />
===Week 3===<br />
*Q: What is beta space, and why do we want to look at our data in beta space?<br />
*A: In data space, the axes are variables and the points are observations. To better understand hierarchical data, we want a more natural display of our models in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients.<br />
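A small Python sketch of the move from data space to beta space, assuming a simple straight-line model per cluster. The cluster names, the data values, and the helper <code>fit_line</code> are invented for illustration.<br />

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1

# Data space: one cloud of (x, y) observations per cluster.
clusters = {
    "A": ([1, 2, 3], [3, 5, 7]),
    "B": ([1, 2, 3, 4], [2.0, 2.9, 4.1, 5.0]),
    "C": ([2, 4, 6], [9, 8, 7]),
}

# Beta space: each cluster's fitted model becomes one point (intercept, slope).
beta_points = {name: fit_line(x, y) for name, (x, y) in clusters.items()}
for name, (b0, b1) in beta_points.items():
    print(name, "intercept =", round(b0, 3), "slope =", round(b1, 3))
```

Plotting <code>beta_points</code> with intercept on one axis and slope on the other would give the beta-space display: three points instead of three scatterplots.<br />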
<br />
===Week 4===<br />
*Q: Thinking about the strategy for observational data, what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the three below. The first is statistical control, which includes the Zs in a statistical model and adjusts for them statistically. Modern thinking: we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail with the wrong model. The second is matching, which only performs comparisons between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score if you have the Zs. The third is structural methods, which build a structural causal model.<br />
<br />
===Week 5===<br />
*Q: From the example data set, Health as predicted by Weight and Height, we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?<br />
*A: The first type has typical values for the predictors but an atypical Y: it has little impact on beta-hat, but it increases the size of confidence intervals and decreases power. The second has atypical values for the predictors but a Y consistent with the other data: it has little impact on beta-hat, but it shrinks confidence intervals and creates a false sense of power if the point is not valid. The third has atypical values for the predictors and a Y not consistent with the other data: it has a large impact on beta-hat, could shrink or expand CIs, and makes a mess of everything.<br />
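The impact of each outlier type on beta-hat can be illustrated with a one-predictor regression. The sketch below is an assumption-laden toy, not the Health/Weight/Height data from the course: the base data are invented, the helpers <code>slope</code> and <code>leverage_of_last</code> are hypothetical, and the second and third outliers are assumed to sit at an unusual predictor value (high leverage).<br />

```python
def slope(x, y):
    """Least-squares slope of y on x (model with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

def leverage_of_last(x):
    """Hat value of the last point in a simple regression with intercept."""
    n = len(x)
    mx = sum(x) / n
    return 1 / n + (x[-1] - mx) ** 2 / sum((a - mx) ** 2 for a in x)

base_x = list(range(1, 11))
base_y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9, 15.2, 16.8, 19.1, 20.9]  # roughly y = 2x + 1
base_slope = slope(base_x, base_y)

added = {
    "type 1": (5.5, 25.0),   # typical x, unusual y
    "type 2": (20.0, 41.0),  # unusual x, y on the same trend
    "type 3": (20.0, 10.0),  # unusual x, y off the trend
}
results = {}
for name, (px, py) in added.items():
    xs, ys = base_x + [px], base_y + [py]
    # Store (leverage of the added point, change in fitted slope).
    results[name] = (leverage_of_last(xs), slope(xs, ys) - base_slope)
    print(name, "leverage =", round(results[name][0], 3),
          "slope change =", round(results[name][1], 3))
```

Only the third point moves the slope noticeably. The sketch covers the effect on beta-hat alone; the effect on confidence intervals is not computed here.<br />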
<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm|Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
Using data from the 1993/1994 through 2008/2009 seasons, the research showed that the severity of infections with the Influenza A virus is related to its novelty. In addition, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus' hemagglutinin protein. The hope is that this result can improve the ability to accurately predict influenza severity, so that scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. However, in reality there are many other sources of variation. For example, people are in different age ranges, live in different areas, and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime<br />
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm|High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can look for criminal activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out which weapons are used most. The system then produces graphs, reports, and maps of high-crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
The news report quotes: "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." It seems likely that crimes occur more often at night and around entertainment venues. However, criminals rarely choose the same place or the same person as a target twice, crimes occur throughout the city, and the criminals may come from other cities or countries. The situation is very complicated, so it is very hard to predict the time, location, and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120|Drinkers Down Under switching from beer to wine]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wine and spirits has increased. According to the data, beer made up 76 percent of all pure alcohol consumed in Australia at the start of the 1960s, but in recent years this has fallen to 44 percent. Over the same period, wine consumption has increased threefold, to 36 percent, while the intake of spirits has nearly doubled to 20 percent. Why have wine and spirits increased so much in the past forty years? The researchers noted that the increased consumption was likely affected by numerous factors, such as different age patterns in the population, increasing affluence, and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people may also have come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome|Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and children, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results of the Canadian Health Measures Survey (CHMS), released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. Young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. In addition, the survey found that men are more physically active than women on average. The same pattern appears among teenagers: more than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, the survey should account for people whose jobs require them to perform heavy lifting or to be out and about for most of their work day. This would be more than enough exercise for most people and would improve these numbers dramatically. Similarly, for teenagers, the survey should also examine whether they work part-time while studying as full-time students.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1|More private liquor stores, more alcohol deaths?]<br />
<br />
Based on a study of 89 local areas of British Columbia, the researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales, a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. Over the same period, the number of government-run stores in the province declined slightly. The findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities within one country instead of selecting different countries, in order to eliminate the differences in conditions.<br />
<br />
===Week 2===<br />
* Before I took this course, I had no idea what consulting involves or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realize that it is not easy to be a good consultant. Usually, the situation is very complicated, and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem, then pick the relationships among these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis, whether the variable y is caused by x or is related to it only by chance, etc. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw the multiple regression model represented by a plane in data space, which is represented in beta space by a point showing the fitted slope with respect to Coffee and the fitted slope with respect to Stress. In that example we only need 2 dimensions, but for some complex models we may need 3 or more dimensions for the beta space. I just wonder, in these situations, how can we draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need a 3D or higher-dimensional ellipsoid. This ellipsoid in higher dimensions would be larger because it is trying to be correct for 3 or more parameters at the same time. -''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===<br />
* For controlling for all the Zs, the notes mention that we can use one of three techniques. The first one is statistical control. I am not sure about the meaning of the propensity score, although it says "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]<br />
Thank you Constance!<br />
<br />
===Week 5===<br />
* From the slides on Hierarchical Models, it is good to know the idea of the two-stage approach (derived variables approach): estimate the slope and intercept, then use them as a multivariate sample and do a MANOVA test.</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-02-16T02:03:46Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master's student in Applied Statistics. I earned my Bachelor's degree in Actuarial Science, and I have some background in statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption in different countries. In reality, these countries could differ in all sorts of ways other than this factor. We know that the ideal solution introduced by Fisher is the experiment. What is Fisher's idea for an experiment?<br />
*A: 1) Make sure all the groups being compared are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare the life expectancy of these groups to see whether the groups with less cigarette consumption have higher life expectancy.<br />
<br />
===Week 2===<br />
*Q: From the example of studying the inheritance of height from father to son, and the graph of the ellipse, we find that the SD line is not the regression line of son's height on father's height: the change of the son's expected height from the mean is only a proportion of the change of the father's height from the mean. Why do we use ellipses?<br />
*A: By using ellipses, we can easily derive the formulas for statistical theory and models. In addition, ellipses, together with scatterplots and other graphs, help us with data analysis, such as studying the correlation between the variables, assessing how significant a variable is, and calculating confidence intervals.<br />
<br />
===Week 3===<br />
*Q: What is beta space, and why do we want to look at our data in beta space?<br />
*A: In data space, the axes are variables and the points are observations. To better understand hierarchical data, we want a more natural display of our models in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients.<br />
<br />
===Week 4===<br />
*Q: Thinking about the strategy for observational data, what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the three below. The first is statistical control, which includes the Zs in a statistical model and adjusts for them statistically. Modern thinking: we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail with the wrong model. The second is matching, which only performs comparisons between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score if you have the Zs. The third is structural methods, which build a structural causal model.<br />
<br />
===Week 5===<br />
*Q: From the example data set, Health as predicted by Weight and Height, we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?<br />
*A: The first type has typical values for the predictors but an atypical Y: it has little impact on beta-hat, but it increases the size of confidence intervals and decreases power. The second has atypical values for the predictors but a Y consistent with the other data: it has little impact on beta-hat, but it shrinks confidence intervals and creates a false sense of power if the point is not valid. The third has atypical values for the predictors and a Y not consistent with the other data: it has a large impact on beta-hat, could shrink or expand CIs, and makes a mess of everything.<br />
<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm|Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
Using data from the 1993/1994 through 2008/2009 seasons, the research showed that the severity of infections with the Influenza A virus is related to its novelty. In addition, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus' hemagglutinin protein. The hope is that this result can improve the ability to accurately predict influenza severity, so that scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. However, in reality there are many other sources of variation. For example, people are in different age ranges, live in different areas, and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime<br />
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm|High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can look for criminal activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out which weapons are used most. The system then produces graphs, reports, and maps of high-crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
The news report quotes: "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." It seems likely that crimes occur more often at night and around entertainment venues. However, criminals rarely choose the same place or the same person as a target twice, crimes occur throughout the city, and the criminals may come from other cities or countries. The situation is very complicated, so it is very hard to predict the time, location, and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120|Drinkers Down Under switching from beer to wine]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wine and spirits has increased. According to the data, beer made up 76 percent of all pure alcohol consumed in Australia at the start of the 1960s, but in recent years this has fallen to 44 percent. Over the same period, wine consumption has increased threefold, to 36 percent, while the intake of spirits has nearly doubled to 20 percent. Why have wine and spirits increased so much in the past forty years? The researchers noted that the increased consumption was likely affected by numerous factors, such as different age patterns in the population, increasing affluence, and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people may also have come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome|Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and children, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results of the Canadian Health Measures Survey (CHMS), released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. Young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. In addition, the survey found that men are more physically active than women on average. The same pattern appears among teenagers: more than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, the survey should account for people whose jobs require them to perform heavy lifting or to be out and about for most of their work day. This would be more than enough exercise for most people and would improve these numbers dramatically. Similarly, for teenagers, the survey should also examine whether they work part-time while studying as full-time students.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1|More private liquor stores, more alcohol deaths?]<br />
<br />
In a study of 89 local areas of British Columbia, researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales, a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. Over the same period, the number of government-run stores in the province declined slightly. The findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities within one country instead of selecting different countries, in order to reduce the differences in conditions.<br />
<br />
===Week 2===<br />
* Before I took this course, I had no idea what consulting involves or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realized that it is not easy to be a good consultant. Usually the situation is very complicated, and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem and then pick out the relationships among them that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis, whether an apparent effect of x on y arose by chance, etc. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw the multiple regression model represented by a plane in data space, which is represented in beta space by a point showing the fitted slope with respect to Coffee and the fitted slope with respect to stress. In that example we only need 2 dimensions, but for some complex models we may need 3 or more dimensions for the beta space. I just wonder, in these situations, how can we draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need 3D or higher dimensional ellipse. This ellipse in higher dimension would be larger because it is trying to be correct for 3 or more parameters at the same time.-''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===<br />
* For the control for all the Zs, it mentions that we can use one of three techniques. The first one is statistical control. I am not sure about the meaning of propensity score, although it says "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]<br />
Thank you Constance!<br />
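
The idea in the comment above ("prediction of X from Zs") can be sketched in code. This is only a minimal illustration under stated assumptions: the data are simulated, there is a single covariate Z, and the helper names <code>sigmoid</code> and <code>fit_propensity</code> are hypothetical, not taken from the course. It fits a logistic regression of group membership X on Z by plain gradient ascent and uses the fitted probabilities as propensity scores.

```python
import math
import random


def sigmoid(t):
    """Logistic function, written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_propensity(z, x, steps=2000, rate=0.5):
    """Fit P(X = 1 | Z = z) = sigmoid(a + b*z) by gradient ascent on the
    average log-likelihood. Returns the fitted (a, b)."""
    a, b = 0.0, 0.0
    n = len(z)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for zi, xi in zip(z, x):
            err = xi - sigmoid(a + b * zi)
            grad_a += err
            grad_b += err * zi
        a += rate * grad_a / n
        b += rate * grad_b / n
    return a, b


# Hypothetical observational data: people with larger Z are more likely
# to end up in the treated group (X = 1), so assignment is not random.
random.seed(1)
z = [random.gauss(0, 1) for _ in range(500)]
x = [1 if random.random() < sigmoid(-0.5 + 1.5 * zi) else 0 for zi in z]

a_hat, b_hat = fit_propensity(z, x)
scores = [sigmoid(a_hat + b_hat * zi) for zi in z]  # propensity scores

treated = [s for s, xi in zip(scores, x) if xi == 1]
control = [s for s, xi in zip(scores, x) if xi == 0]
print("slope on Z:", round(b_hat, 2))
print("mean score, treated:", round(sum(treated) / len(treated), 2))
print("mean score, control:", round(sum(control) / len(control), 2))
```

In a randomized study the fitted slope on Z would be close to zero and the two mean scores would be about equal; a clear gap between them, as in this simulated example, is a sign that group allocation is confounded with Z.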
<br />
===Week 5===<br />
*</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-02-16T01:37:34Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master's student in Applied Statistics. I received my Bachelor's degree in Actuarial Science, and I have some background in Statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption in different countries. In reality, the countries could differ in all sorts of ways other than this factor. We know that the ideal solution, introduced by Fisher, is the experiment. What is Fisher's idea for an experiment? <br />
*A: 1) Make sure all the groups being compared are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare the life expectancy of these groups to see whether the groups with less cigarette consumption have higher life expectancy.<br />
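
Step 2 of the answer above, the random split, can be sketched as follows. This is a toy illustration only: the subject numbers and the three consumption levels are hypothetical, chosen just to show that chance alone decides the group each subject joins.

```python
import random

random.seed(42)
# Hypothetical pool of 12 willing subjects, identified by number.
subjects = list(range(1, 13))
# Hypothetical treatment levels (cigarettes per day) to compare.
levels = [0, 10, 20]

random.shuffle(subjects)  # chance alone decides who goes where
group_size = len(subjects) // len(levels)
groups = {
    level: sorted(subjects[i * group_size:(i + 1) * group_size])
    for i, level in enumerate(levels)
}
for level, members in groups.items():
    print(level, "cigarettes/day:", members)
```

Because assignment depends only on the shuffle, any other characteristic of the subjects is balanced across the groups except for chance.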
<br />
===Week 2===<br />
*Q: From the example of studying the inheritance of height from father to son, and the graph of the ellipse, we find that the SD line is not the regression line of son's height on father's height: the change of the son's expected height from the mean is only a proportion of the change of the father's height from the mean. Why do we use ellipses?<br />
*A: By using ellipses, we can easily derive the formulas for statistical theory and models. In addition, ellipses drawn on scatterplots and other graphs help us with data analysis, such as studying the correlation between the variables, judging how significant a variable is, and calculating confidence intervals. <br />
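
The difference between the SD line and the regression line in the question above can be checked numerically. The sketch below uses simulated father and son heights (hypothetical values, not the data from class) and relies on the standard facts that the SD line has slope SD(son)/SD(father), while the regression line of son on father has slope r times that.

```python
import random
import statistics

# Hypothetical father/son heights in inches (simulated, not real data).
random.seed(0)
fathers = [random.gauss(68, 2.7) for _ in range(1000)]
sons = [69 + 0.5 * (f - 68) + random.gauss(0, 2.3) for f in fathers]

mf, ms = statistics.mean(fathers), statistics.mean(sons)
sf, ss = statistics.stdev(fathers), statistics.stdev(sons)
cov = sum((f - mf) * (s - ms) for f, s in zip(fathers, sons)) / (len(fathers) - 1)
r = cov / (sf * ss)

sd_line_slope = ss / sf            # SD line: one SD of son per SD of father
regression_slope = r * ss / sf     # regression of son's height on father's

print("correlation r:", round(r, 2))
print("SD line slope:", round(sd_line_slope, 2))
print("regression slope:", round(regression_slope, 2))
```

Since r is less than 1 here, the regression slope is flatter than the SD line, which is the "proportion of the change" described in the question.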
<br />
===Week 3===<br />
*Q: What is beta space, and why do we want to look at our data in beta space?<br />
*A: In data space, the axes are variables and the points are observations. To better understand hierarchical data, we want a more natural display for our models, in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients. <br />
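
To make the two spaces concrete, here is a small sketch with made-up hierarchical data: three hypothetical clusters, each with its own (x, y) observations in data space. Fitting a least-squares line within each cluster turns the cluster into a single point (intercept, slope) in beta space. The cluster names and values are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Least-squares (intercept, slope) for y = a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical hierarchical data: observations within three clusters.
clusters = {
    "A": ([1, 2, 3, 4], [2.1, 2.9, 4.2, 4.8]),
    "B": ([1, 2, 3, 4], [5.0, 5.4, 6.1, 6.5]),
    "C": ([1, 2, 3, 4], [1.0, 3.1, 4.9, 7.0]),
}

# Each cluster's fitted model is one point in beta space.
beta_points = {name: fit_line(xs, ys) for name, (xs, ys) in clusters.items()}
for name, (intercept, slope) in beta_points.items():
    print(name, "intercept:", round(intercept, 2), "slope:", round(slope, 2))
```

Plotting the three (intercept, slope) pairs would give the beta-space display; twelve points in data space become three points, one per model.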
<br />
===Week 4===<br />
*Q: Thinking about strategy with observational data, what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the following three. The first is statistical control: include the Zs in a statistical model and adjust statistically. Modern thinking is that we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail with the wrong model. The second is matching: this technique only performs comparisons between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, if you have the Zs. The third is structural methods, which build a structural causal model.<br />
<br />
===Week 5===<br />
*Q: From the example data set of Health as predicted by Weight and Height, we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?<br />
*A: The first has typical values for the predictors but an atypical Y; it has little impact on beta-hat, increases the size of confidence intervals and decreases power. The second has atypical values for the predictors but Y consistent with the other data; it has little impact on beta-hat and shrinks confidence intervals, which creates a false sense of power if the point is not valid. The third has atypical values for the predictors and Y not consistent with the other data; it has a large impact on beta-hat, could shrink or expand CIs, and makes a mess of everything.<br />
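
The effect of each kind of outlier on beta-hat can be illustrated with a simple regression on one predictor. The sketch below is an assumption-laden toy, not the Health/Weight/Height data from class: the base data, the three added points and the helper names <code>fit_line</code> and <code>leverage</code> are all hypothetical. It reports the leverage (hat value) of each added point and how much the fitted slope moves when the point is included.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def leverage(xs, x_new):
    """Hat value of a point at x_new once it is added to xs."""
    allx = xs + [x_new]
    n = len(allx)
    mx = sum(allx) / n
    sxx = sum((x - mx) ** 2 for x in allx)
    return 1.0 / n + (x_new - mx) ** 2 / sxx


# Hypothetical base data lying close to the line y = 2 + 0.5*x.
xs = [float(i) for i in range(1, 11)]
noise = [0.2, -0.1, 0.0, 0.3, -0.2, 0.1, -0.3, 0.2, 0.0, -0.2]
ys = [2 + 0.5 * x + e for x, e in zip(xs, noise)]
base_slope = fit_line(xs, ys)[1]

# One added point per outlier type (values chosen for illustration).
outliers = {
    "typical X, atypical Y": (5.5, 12.0),
    "atypical X, consistent Y": (30.0, 17.0),
    "atypical X, inconsistent Y": (30.0, 2.0),
}
slope_change = {}
for name, (x0, y0) in outliers.items():
    slope = fit_line(xs + [x0], ys + [y0])[1]
    slope_change[name] = slope - base_slope
    print(name, "leverage:", round(leverage(xs, x0), 2),
          "slope change:", round(slope_change[name], 3))
```

In this toy example only the third point, which combines high leverage with a Y value far from the trend, moves the slope substantially. The sketch does not compute confidence intervals, so the statements about their width are not checked here.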
<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm|Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
Based on data from the 1993/1994 through 2008/2009 seasons, the research has shown that the severity of infections with the Influenza A virus is related to its novelty. In addition, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus' hemagglutinin protein. This result is thought to help improve the ability to accurately predict influenza severity, so that scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. However, in reality there are many sources of variation. For example, people fall in different age ranges, live in different areas and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime<br />
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm|High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can look for crime activity by typing in specific dates, choosing types of crimes and locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
According to the news story, "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Obviously, crimes are more likely at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice. Crimes occur all over the city, and it is also possible that the criminals come from other cities or countries. The situation is very complicated, and it is very hard to predict future crimes, including their time, location and type.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120|Weight Cited in Women's Heartburn]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wines and spirits has increased. From the statistical data, the researchers find that at the start of the 1960s, beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same period, wine consumption has increased threefold, to 36 percent, while the intake of spirits has nearly doubled to 20 percent. Why have wine and spirits increased so much over the past forty years? The researchers noted that increased consumption was likely to have been affected by numerous factors such as different age patterns in the population, increasing affluence and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people have also come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome|Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and children, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results of this study, drawn from the Canadian Health Measures Survey (CHMS) released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. In addition, the survey mentions that young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. The survey also found that men are more physically active than women on average. This pattern also appears among teenagers. More than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, the survey should take into account people whose jobs require heavy lifting or being out and about for most of the work day. That would be more than enough exercise for most people and would improve these numbers dramatically. Similarly, for teenagers, the survey should also examine whether full-time students are working part-time.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1|More private liquor stores, more alcohol deaths?]<br />
<br />
In a study of 89 local areas of British Columbia, researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales, a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. Over the same period, the number of government-run stores in the province declined slightly. The findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities within one country instead of selecting different countries, in order to reduce the differences in conditions.<br />
<br />
===Week 2===<br />
* Before I took this course, I had no idea what consulting involves or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realized that it is not easy to be a good consultant. Usually the situation is very complicated, and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem and then pick out the relationships among them that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis, whether an apparent effect of x on y arose by chance, etc. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw the multiple regression model represented by a plane in data space, which is represented in beta space by a point showing the fitted slope with respect to Coffee and the fitted slope with respect to stress. In that example we only need 2 dimensions, but for some complex models we may need 3 or more dimensions for the beta space. I just wonder, in these situations, how can we draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need 3D or higher dimensional ellipse. This ellipse in higher dimension would be larger because it is trying to be correct for 3 or more parameters at the same time.-''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===<br />
* For the control for all the Zs, it mentions that we can use one of three techniques. The first one is statistical control. I am not sure about the meaning of propensity score, although it says "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-02-13T20:43:11Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master's student in Applied Statistics. I received my Bachelor's degree in Actuarial Science, and I have some background in Statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption in different countries. In reality, the countries could differ in all sorts of ways other than this factor. We know that the ideal solution, introduced by Fisher, is the experiment. What is Fisher's idea for an experiment? <br />
*A: 1) Make sure all the groups being compared are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare the life expectancy of these groups to see whether the groups with less cigarette consumption have higher life expectancy.<br />
<br />
===Week 2===<br />
*Q: From the example of studying the inheritance of height from father to son, and the graph of the ellipse, we find that the SD line is not the regression line of son's height on father's height: the change of the son's expected height from the mean is only a proportion of the change of the father's height from the mean. Why do we use ellipses?<br />
*A: By using ellipses, we can easily derive the formulas for statistical theory and models. In addition, ellipses drawn on scatterplots and other graphs help us with data analysis, such as studying the correlation between the variables, judging how significant a variable is, and calculating confidence intervals. <br />
<br />
===Week 3===<br />
*Q: What is beta space, and why do we want to look at our data in beta space?<br />
*A: In data space, the axes are variables and the points are observations. To better understand hierarchical data, we want a more natural display for our models, in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients. <br />
<br />
===Week 4===<br />
*Q: Thinking about strategy with observational data, what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the following three. The first is statistical control: include the Zs in a statistical model and adjust statistically. Modern thinking is that we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail with the wrong model. The second is matching: this technique only performs comparisons between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, if you have the Zs. The third is structural methods, which build a structural causal model.<br />
<br />
===Week 5===<br />
*Q: From the example data set of Health as predicted by Weight and Height, we have three canonical outliers. What are the types of these outliers, and how do we distinguish them?<br />
<br />
<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm|Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
Based on data from the 1993/1994 through 2008/2009 seasons, the research has shown that the severity of infections with the Influenza A virus is related to its novelty. In addition, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus' hemagglutinin protein. This result is thought to help improve the ability to accurately predict influenza severity, so that scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. However, in reality there are many sources of variation. For example, people fall in different age ranges, live in different areas and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime<br />
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm|High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can look for crime activity by typing in specific dates, choosing types of crimes and locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
According to the news story, "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Obviously, crimes are more likely at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice. Crimes occur all over the city, and it is also possible that the criminals come from other cities or countries. The situation is very complicated, and it is very hard to predict future crimes, including their time, location and type.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120|Weight Cited in Women's Heartburn]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wines and spirits has increased. From the statistical data, the researchers find that at the start of the 1960s, beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same period, wine consumption has increased threefold, to 36 percent, while the intake of spirits has nearly doubled to 20 percent. Why have wine and spirits increased so much over the past forty years? The researchers noted that increased consumption was likely to have been affected by numerous factors such as different age patterns in the population, increasing affluence and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people have also come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome|Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and children, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results of this study, drawn from the Canadian Health Measures Survey (CHMS) released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. In addition, the survey mentions that young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. The survey also found that men are more physically active than women on average. This pattern also appears among teenagers. More than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, the survey should take into account people whose jobs require heavy lifting or being out and about for most of the work day. That would be more than enough exercise for most people and would improve these numbers dramatically. Similarly, for teenagers, the survey should also examine whether full-time students are working part-time.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1|More private liquor stores, more alcohol deaths?]<br />
<br />
In a study of 89 local areas of British Columbia, researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales, a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. Over the same period, the number of government-run stores in the province declined slightly. The findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities within one country instead of selecting different countries, in order to reduce the differences in conditions.<br />
<br />
===Week 2===<br />
* Before I took this course, I had no idea what consulting involves or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realized that it is not easy to be a good consultant. Usually the situation is very complicated, and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem and then pick out the relationships among them that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis, whether an apparent effect of x on y arose by chance, etc. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw the multiple regression model represented by a plane in data space, which is represented in beta space by a point showing the fitted slope with respect to Coffee and the fitted slope with respect to stress. In that example we only need 2 dimensions, but for some complex models we may need 3 or more dimensions for the beta space. I just wonder, in these situations, how can we draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need 3D or higher dimensional ellipse. This ellipse in higher dimension would be larger because it is trying to be correct for 3 or more parameters at the same time.-''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===<br />
* For the control for all the Zs, it mentions that we can use one of three techniques. The first one is statistical control. I am not sure about the meaning of propensity score, although it says "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-02-13T20:38:34Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master's student in Applied Statistics. I received my Bachelor's degree in Actuarial Science, and I have some background in Statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption in different countries. In reality, apart from this factor, these countries could differ in all sorts of ways. We know that the ideal solution, introduced by Fisher, is the experiment. What is Fisher's idea of an experiment? <br />
*A: 1) Make sure the groups being compared are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare the life expectancy of these groups to see whether the groups with lower cigarette consumption have higher life expectancy.<br />
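The random split in step 2 can be sketched in a few lines of Python. This is only an illustration under assumed names: the subjects, their `age` covariate, and the `randomize` helper are all hypothetical, not part of the course material. The point is that after a random split, a background variable such as age should differ between groups only by chance.

```python
import random
from statistics import mean

def randomize(subjects, n_groups, seed=0):
    """Randomly split subjects into n_groups groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = subjects[:]          # copy so the input list is left untouched
    rng.shuffle(shuffled)
    # Deal the shuffled subjects out like cards, one group at a time
    return [shuffled[i::n_groups] for i in range(n_groups)]

# Hypothetical subjects: an id and one background covariate (age)
covariate_rng = random.Random(1)
subjects = [{"id": i, "age": covariate_rng.randint(20, 70)} for i in range(300)]

groups = randomize(subjects, n_groups=3)
for k, group in enumerate(groups):
    # Group sizes are equal; mean ages differ only through the random split
    print(k, len(group), round(mean(s["age"] for s in group), 1))
```

Each group would then be assigned a different level of the treatment, and the outcomes compared across groups.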
<br />
===Week 2===<br />
*Q: From the example of studying the inheritance of height from father to son, and the graph of the ellipse, we find that the SD line is not the regression line of son's height on father's height: the deviation of the son's expected height from the mean is only a proportion of the deviation of the father's height from the mean. Why do we use ellipses?<br />
*A: Using ellipses, we can easily obtain the formulas for statistical theory and models. In addition, ellipses drawn on scatterplots and other graphs help with data analysis, such as studying the correlation between the variables, judging how significant a variable is, and calculating confidence intervals. <br />
<br />
===Week 3===<br />
*Q: What is beta space, and why do we want to look at our data in beta space?<br />
*A: In data space, the axes are variables and the points are observations. To better understand hierarchical data, we want a more natural display for our models, in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients. <br />
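As a small sketch of this idea, the following Python fits a separate straight line to each group of a made-up hierarchical data set and records each fit as a point (intercept, slope). The group labels and values are invented for illustration, and `fit_line` is a hypothetical helper using ordinary least squares.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical hierarchical data: each group has its own (x, y) observations
data = {
    "A": ([0, 1, 2, 3], [1.0, 3.0, 5.0, 7.0]),   # lies exactly on y = 1 + 2x
    "B": ([0, 1, 2, 3], [4.0, 4.5, 5.0, 5.5]),   # lies exactly on y = 4 + 0.5x
    "C": ([0, 1, 2, 3], [2.0, 3.0, 4.0, 5.0]),   # lies exactly on y = 2 + x
}

# Data space: 12 observations. Beta space: 3 points, one per fitted model.
beta_space = {g: fit_line(xs, ys) for g, (xs, ys) in data.items()}
for g, (b0, b1) in beta_space.items():
    print(g, "intercept =", b0, "slope =", b1)
```

Plotting the pairs in `beta_space` with intercept on one axis and slope on the other gives the beta-space display: each group's whole fitted line collapses to a single point.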
<br />
===Week 4===<br />
*Q: Thinking about strategy with observational data, what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the following three. The first is statistical control: include the Zs in a statistical model and adjust statistically. Modern thinking: we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail with the wrong model. The second is matching: this technique only performs comparisons between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, if you have the Zs. The third is structural methods, which build a structural causal model.<br />
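A toy illustration of the propensity-score idea, assuming a single binary confounder Z so that the score P(X = 1 | Z) can be estimated by simple proportions rather than a fitted model. The records are invented: the outcome is built as y = 10*z + 2*x, so the true effect of X is 2.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records (z, x, y): confounder, treatment indicator, outcome.
# Treatment is much more common when z = 1, so Z confounds X and Y.
records = (
    [(0, 1, 2.0)] * 2 + [(0, 0, 0.0)] * 6 +
    [(1, 1, 12.0)] * 6 + [(1, 0, 10.0)] * 2
)

# Propensity score: estimated probability of treatment given Z
treated_by_z = defaultdict(list)
for z, x, y in records:
    treated_by_z[z].append(x)
propensity = {z: mean(xs) for z, xs in treated_by_z.items()}

# Naive comparison ignores Z entirely
naive = (mean(y for z, x, y in records if x == 1)
         - mean(y for z, x, y in records if x == 0))

# Compare treated and control only within strata of equal propensity score,
# then average the within-stratum differences weighted by stratum size
strata = defaultdict(lambda: {0: [], 1: []})
for z, x, y in records:
    strata[propensity[z]][x].append(y)
n = len(records)
adjusted = sum(
    (len(s[0]) + len(s[1])) / n * (mean(s[1]) - mean(s[0]))
    for s in strata.values()
)

print(propensity)        # {0: 0.25, 1: 0.75}
print(naive, adjusted)   # 7.0 2.0
```

The naive difference of 7 is badly biased, while comparing units with the same propensity score recovers the built-in effect of 2. With many Zs the score would normally come from a model such as logistic regression, which is where the "wrong model" risk mentioned above enters.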
<br />
<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
Studying the 1993/1994 through 2008/2009 seasons, the research has shown that the severity of infections with the Influenza A virus is related to its novelty. In addition, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus' hemagglutinin protein. It is thought that this result can help improve the ability to predict influenza severity accurately, so that scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. However, in reality there are many sources of variation. For example, people are in different age ranges, live in different areas and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime<br />
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can look for crime activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
In the news story, people indicated that "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Obviously, crimes are more likely to occur at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice. Crimes occur everywhere in the city. It is also possible that the criminals come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120 Drinkers Down Under switching from beer to wine]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wines and spirits has increased. From the statistical data, the researchers find that at the start of the 1960s, beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same time period, wine consumption has increased threefold, to 36 percent, while the intake of spirits has nearly doubled to 20 percent. Why have wine and spirits increased so much in the past forty years? The researchers noted that increased consumption was likely to have been affected by numerous factors such as different age patterns in the population, increasing affluence and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people have also come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and kids, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results of this study, from the Canadian Health Measures Survey (CHMS) released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. In addition, the survey mentions that young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. In addition, the survey found that men have more physical activity than women on average. This pattern also appears among teenagers: more than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, they should take into account people whose jobs require them to perform heavy lifting and to be out and about for most of their work day. This would be more than enough exercise for most people, and counting it would improve these numbers dramatically. Similarly for teenagers, the survey should also study whether they work part-time while studying as full-time students.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1 More private liquor stores, more alcohol deaths?]<br />
<br />
Based on a study of 89 local areas of British Columbia, the researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales. It is a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. Over the same time, the number of government-run stores in the province declined slightly. Moreover, the findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities in one country instead of selecting different countries, in order to eliminate the differences in conditions.<br />
<br />
===Week 2===<br />
* Before I took this course, I had no idea what consulting involves or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realized that it is not easy to be a good consultant. Usually the situation is very complicated, and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem and then pick the relationships among these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis and whether an apparent effect of x on y arose only by chance. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw the multiple regression model represented by a plane in data space, which is represented in beta space by a point showing the fitted slope with respect to Coffee and the fitted slope with respect to Stress. That example needs only 2 dimensions, but for more complex models we may need 3 or more dimensions for the beta space. I wonder how, in these situations, we can draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need a 3D or higher-dimensional ellipse (an ellipsoid). This ellipse in higher dimensions would be larger because it is trying to be correct for 3 or more parameters at the same time. -''Gurpreet''<br />
* Thank you very much!<br />
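One way to see why the region grows with the number of parameters, sketched under the assumption that the estimates are approximately normal so that a joint 95% region is scaled by the square root of a chi-square quantile with as many degrees of freedom as parameters. The helper names are hypothetical; the closed-form chi-square CDFs for 1, 2 and 3 degrees of freedom are standard.

```python
from math import erf, exp, pi, sqrt

def chi2_cdf(x, df):
    """Chi-square CDF in closed form for df = 1, 2 or 3."""
    if df == 1:
        return erf(sqrt(x / 2))
    if df == 2:
        return 1 - exp(-x / 2)
    if df == 3:
        return erf(sqrt(x / 2)) - sqrt(2 * x / pi) * exp(-x / 2)
    raise ValueError("only df = 1, 2, 3 are handled in this sketch")

def chi2_quantile(p, df, lo=0.0, hi=50.0):
    """Invert the CDF by bisection (the CDF is increasing in x)."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

quantiles = {df: chi2_quantile(0.95, df) for df in (1, 2, 3)}
radii = {df: sqrt(q) for df, q in quantiles.items()}
for df in (1, 2, 3):
    # Radius is in standard-error units along each axis of the ellipse
    print(df, round(quantiles[df], 3), round(radii[df], 3))
```

For one parameter the radius is the familiar 1.96 standard errors; for two parameters jointly it is about 2.45, and for three about 2.80, so the higher-dimensional ellipsoid is larger in every direction.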
<br />
===Week 4===<br />
* For controlling for all the Zs, it is mentioned that we can use one of three techniques. The first one is statistical control. I am not sure what a propensity score means, although it is described as the "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-02-10T04:36:17Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master's student in Applied Statistics. I received my Bachelor's degree in Actuarial Science, and I have some background in Statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption in different countries. In reality, apart from this factor, these countries could differ in all sorts of ways. We know that the ideal solution, introduced by Fisher, is the experiment. What is Fisher's idea of an experiment? <br />
*A: 1) Make sure the groups being compared are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare the life expectancy of these groups to see whether the groups with lower cigarette consumption have higher life expectancy.<br />
<br />
===Week 2===<br />
*Q: From the example of studying the inheritance of height from father to son, and the graph of the ellipse, we find that the SD line is not the regression line of son's height on father's height: the deviation of the son's expected height from the mean is only a proportion of the deviation of the father's height from the mean. Why do we use ellipses?<br />
*A: Using ellipses, we can easily obtain the formulas for statistical theory and models. In addition, ellipses drawn on scatterplots and other graphs help with data analysis, such as studying the correlation between the variables, judging how significant a variable is, and calculating confidence intervals. <br />
<br />
===Week 3===<br />
*Q: What is beta space, and why do we want to look at our data in beta space?<br />
*A: In data space, the axes are variables and the points are observations. To better understand hierarchical data, we want a more natural display for our models, in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients. <br />
<br />
===Week 4===<br />
*Q: Thinking about strategy with observational data, what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the following three. The first is statistical control: include the Zs in a statistical model and adjust statistically. Modern thinking: we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail with the wrong model. The second is matching: this technique only performs comparisons between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, if you have the Zs. The third is structural methods, which build a structural causal model.<br />
<br />
<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
Studying the 1993/1994 through 2008/2009 seasons, the research has shown that the severity of infections with the Influenza A virus is related to its novelty. In addition, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus' hemagglutinin protein. It is thought that this result can help improve the ability to predict influenza severity accurately, so that scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. However, in reality there are many sources of variation. For example, people are in different age ranges, live in different areas and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime<br />
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can look for crime activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
In the news story, people indicated that "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Obviously, crimes are more likely to occur at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice. Crimes occur everywhere in the city. It is also possible that the criminals come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120 Drinkers Down Under switching from beer to wine]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wines and spirits has increased. From the statistical data, the researchers find that at the start of the 1960s, beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same time period, wine consumption has increased threefold, to 36 percent, while the intake of spirits has nearly doubled to 20 percent. Why have wine and spirits increased so much in the past forty years? The researchers noted that increased consumption was likely to have been affected by numerous factors such as different age patterns in the population, increasing affluence and the growth of the Australian wine industry. Moreover, changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people have also come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and kids, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results of this study, from the Canadian Health Measures Survey (CHMS) released by Statistics Canada, only 15 per cent of adults achieve the minimum amount of daily recommended exercise. In addition, the survey mentions that young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. In addition, the survey found that men have more physical activity than women on average. This pattern also appears among teenagers: more than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, they should take into account people whose jobs require them to perform heavy lifting and to be out and about for most of their work day. This would be more than enough exercise for most people, and counting it would improve these numbers dramatically. Similarly for teenagers, the survey should also study whether they work part-time while studying as full-time students.<br />
<br />
===Week 5===<br />
* More private liquor stores, more alcohol deaths?<br />
[http://www.reuters.com/article/2011/01/31/us-liquor-stores-idUSTRE70U7HY20110131?pageNumber=1 More private liquor stores, more alcohol deaths?]<br />
<br />
Based on a study of 89 local areas of British Columbia, the researchers found that the number of private alcohol retailers rose by 40 percent between 2003 and 2008, while the liters of alcoholic beverages sold at those stores each year shot up 84 percent. The researchers say the findings raise concerns about the public health impact of privatizing alcohol sales. It is a move being considered by a number of Canadian and U.S. jurisdictions that currently restrict liquor sales to government-run stores. Over the same time, the number of government-run stores in the province declined slightly. Moreover, the findings are based on government data for 89 distinct local "health areas" in British Columbia. Across those communities, the study found, an average of eight out of every 10,000 people died of an alcohol-related cause each year between 2003 and 2008. Alcohol-related deaths included any death with "alcohol" listed on the death certificate -- including causes like alcohol poisoning and drunk driving as well as chronic conditions related to drinking, like liver cirrhosis. Province-wide, the annual alcohol-related death rate actually dipped somewhat in the latter years of the study period, but within local health areas, the researchers found a correlation between the concentration of private liquor stores and alcohol-related deaths.<br />
<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities in one country instead of selecting different countries, in order to eliminate the differences in conditions.<br />
<br />
===Week 2===<br />
* Before I took this course, I had no idea what consulting involves or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realized that it is not easy to be a good consultant. Usually the situation is very complicated, and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem and then pick the relationships among these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis and whether an apparent effect of x on y arose only by chance. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw the multiple regression model represented by a plane in data space, which is represented in beta space by a point showing the fitted slope with respect to Coffee and the fitted slope with respect to Stress. That example needs only 2 dimensions, but for more complex models we may need 3 or more dimensions for the beta space. I wonder how, in these situations, we can draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need a 3D or higher-dimensional ellipse (an ellipsoid). This ellipse in higher dimensions would be larger because it is trying to be correct for 3 or more parameters at the same time. -''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===<br />
* For controlling for all the Zs, it is mentioned that we can use one of three techniques. The first one is statistical control. I am not sure what a propensity score means, although it is described as the "prediction of X from Zs".<br />
<br />
Comment: A propensity score is the probability of a particular person being allocated to a specific group in a study given certain covariates. In a truly randomized study, we are usually safe to assume that participants are randomly assigned to groups, so that any particular covariate does '''not''' make a certain person more likely to be assigned to a certain group. In observational or "quasi-experimental" designs, we don't have random assignment, so we measure how much selection bias is present by regressing the odds of being in a particular group on the covariates measured in the study. We want group allocation to be "unconfounded" with any covariates (such as individual differences). - [[../Constance Mara|Constance]]</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-01-28T05:03:09Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master's student in Applied Statistics. I received my Bachelor's degree in Actuarial Science, and I have some background in Statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption in different countries. In reality, apart from this factor, these countries could differ in all sorts of ways. We know that the ideal solution, introduced by Fisher, is the experiment. What is Fisher's idea of an experiment? <br />
*A: 1) Make sure the groups being compared are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare the life expectancy of these groups to see whether the groups with lower cigarette consumption have higher life expectancy.<br />
<br />
===Week 2===<br />
*Q: From the example of studying the inheritance of height from father to son, and the graph of the ellipse, we find that the SD line is not the regression line of son's height on father's height: the deviation of the son's expected height from the mean is only a proportion of the deviation of the father's height from the mean. Why do we use ellipses?<br />
*A: Using ellipses, we can easily obtain the formulas for statistical theory and models. In addition, ellipses drawn on scatterplots and other graphs help with data analysis, such as studying the correlation between the variables, judging how significant a variable is, and calculating confidence intervals. <br />
<br />
===Week 3===<br />
*Q: What is beta space, and why do we want to look at our data in beta space?<br />
*A: In data space, the axes are variables and the points are observations. To better understand hierarchical data, we want a more natural display for our models, in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients. <br />
<br />
===Week 4===<br />
*Q: Thinking about strategy with observational data, what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the following three. The first is statistical control: include the Zs in a statistical model and adjust statistically. Modern thinking: we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail with the wrong model. The second is matching: this technique only performs comparisons between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, if you have the Zs. The third is structural methods, which build a structural causal model.<br />
<br />
<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
Studying the 1993/1994 through 2008/2009 seasons, the research has shown that the severity of infections with the Influenza A virus is related to its novelty. In addition, 90% of the variation in influenza severity over the periods studied could be explained by the novelty of the virus' hemagglutinin protein. It is thought that this result can help improve the ability to predict influenza severity accurately, so that scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. However, in reality there are many sources of variation. For example, people are in different age ranges, live in different areas and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime<br />
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can look for crime activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
In the news story, people indicated that "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Obviously, crimes are more likely to occur at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice. Crimes occur everywhere in the city. It is also possible that the criminals come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120 Drinkers Down Under switching from beer to wine]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wine and spirits has increased. According to the statistics, at the start of the 1960s beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same period, wine consumption has increased threefold, to 36 percent, while the intake of spirits has nearly doubled, to 20 percent. Why have wine and spirits increased so much in the past forty years? The researchers noted that increased consumption was likely to have been affected by numerous factors such as different age patterns in the population, increasing affluence and the growth of the Australian wine industry. Changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people have also come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and children, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results from the Canadian Health Measures Survey (CHMS) released by Statistics Canada, only 15 per cent of adults achieve the minimum recommended amount of daily exercise. Young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. In addition, the survey found that men are more physically active than women on average, and the same pattern appears among teenagers: more than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, the survey should take into account people whose jobs require heavy lifting or being out and about for most of the work day. This would be more than enough exercise for most people and would improve these numbers dramatically. Similarly, for teenagers, the survey should also ask whether they work part-time while studying as full-time students.<br />
<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities in one country instead of from different countries, in order to eliminate the differences in conditions.<br />
<br />
===Week 2===<br />
* Before I took this course, I had no idea what consulting involves or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realize that it is not easy to be a good consultant. Usually, the situation is very complicated and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem and then pick out the relationships among these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis and whether an apparent effect of x on y is only due to chance, etc. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw the multiple regression model represented by a plane in data space, which is represented in beta space by a point showing the fitted slope with respect to Coffee and the fitted slope with respect to Stress. There we needed only 2 dimensions, but for some complex models we may need 3 or more dimensions for the beta space. I just wonder how, in these situations, we can draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need a 3D or higher-dimensional ellipse. This ellipse in higher dimensions would be larger because it is trying to be correct for 3 or more parameters at the same time. -''Gurpreet''<br />
* Thank you very much!<br />
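
The comment above about joint regions being larger can be checked with a small Python sketch. It is only an illustration under a large-sample normal approximation (an assumption, not something stated in class): the half-width multiplier for a single parameter is a normal quantile, while the radius of a joint 95% ellipse for two parameters is the square root of a chi-square quantile with 2 degrees of freedom, which has the closed form -2 ln(1 - level). The variable names are made up for this example.<br />

```python
import math
from statistics import NormalDist

level = 0.95

# One parameter on its own: multiplier of the standard error
# for an ordinary 95% interval (about 1.96).
radius_1d = NormalDist().inv_cdf(1 - (1 - level) / 2)

# Two parameters jointly: the 95% ellipse has radius sqrt(q), where q is the
# chi-square quantile with 2 degrees of freedom, q = -2 * ln(1 - level).
radius_2d = math.sqrt(-2.0 * math.log(1 - level))

print(f"single-parameter multiplier: {radius_1d:.3f}")
print(f"joint two-parameter radius:  {radius_2d:.3f}")
```

The joint radius is larger than the single-parameter multiplier, and it keeps growing as more parameters are covered at the same time.<br />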
<br />
===Week 4===<br />
* For controlling for all the Zs, the lecture mentions that we can use one of three techniques. The first one is statistical control. I am not sure about the meaning of the propensity score, although it is described as "prediction of X from Zs".</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-01-28T04:58:18Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master's student in Applied Statistics. I received my Bachelor's degree in Actuarial Science, and I have some background in statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption in different countries. In reality, these countries could differ in all sorts of ways other than this factor. We know that the ideal solution introduced by Fisher is the experiment. What is Fisher's idea for an experiment? <br />
*A: 1) Make sure the groups being compared are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare the life expectancy of these groups to see whether the groups with lower cigarette consumption have higher life expectancy.<br />
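
The random split in step 2 of the answer above can be sketched in Python. This is a minimal illustration, not the course's own example: the background "health score" and the function name <code>randomize</code> are hypothetical, and the point is only that random assignment makes the groups similar in background characteristics except for chance.<br />

```python
import random
import statistics


def randomize(subjects, rng):
    """Randomly split a list of subjects into two groups of equal size."""
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


rng = random.Random(0)

# Hypothetical background characteristic (e.g. a health score) per subject.
health = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

group_a, group_b = randomize(health, rng)

# After randomization the groups should differ in background only by chance.
imbalance = statistics.mean(group_a) - statistics.mean(group_b)
print(f"difference in mean background score: {imbalance:.3f}")
```

Because the groups differ only by chance before treatment, a later difference in life expectancy can be attributed to the assigned cigarette consumption rather than to background differences.<br />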
<br />
===Week 2===<br />
*Q: From the example of studying the inheritance of height from father to son, and the graph of the ellipse, we find that the SD line is not the regression line of son's height on father's height: the change in the son's expected height from the mean is only a proportion of the change in the father's height from the mean. Why do we use ellipses?<br />
*A: By using ellipses, we can easily obtain the formulas for statistical theory and models. In addition, ellipses drawn on scatterplots and other graphs help with data analysis, for example in studying the correlation between the variables, judging how significant a variable is, and calculating confidence intervals. <br />
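
The difference between the SD line and the regression line in the question above can be seen numerically. The Python sketch below uses simulated father and son heights (the numbers are made up for illustration and are not the data used in class). The SD line has slope SD(son)/SD(father), while the regression line of son on father has slope r × SD(son)/SD(father), which is flatter whenever the correlation r is below 1.<br />

```python
import random
import statistics

rng = random.Random(1)

# Simulated heights in inches; illustrative values only.
fathers = [rng.gauss(68.0, 2.7) for _ in range(5000)]
sons = [69.0 + 0.5 * (f - 68.0) + rng.gauss(0.0, 2.3) for f in fathers]

mean_f, mean_s = statistics.mean(fathers), statistics.mean(sons)
sd_f, sd_s = statistics.stdev(fathers), statistics.stdev(sons)

# Sample covariance and correlation computed by hand.
cov = sum((f - mean_f) * (s - mean_s) for f, s in zip(fathers, sons)) / (len(fathers) - 1)
r = cov / (sd_f * sd_s)

sd_line_slope = sd_s / sd_f          # slope of the SD line
regression_slope = r * sd_s / sd_f   # slope of the regression of son on father

print(f"correlation r    = {r:.3f}")
print(f"SD line slope    = {sd_line_slope:.3f}")
print(f"regression slope = {regression_slope:.3f}")
```

A father one SD above the mean is therefore predicted to have a son only r SDs above the mean, which is the "proportion" mentioned in the question.<br />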
<br />
===Week 3===<br />
*Q: What is beta space, and why do we want to look at our data in beta space?<br />
*A: In data space, the axes are variables and the points are observations. To better understand hierarchical data, we want a more natural display of our models in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients. <br />
<br />
===Week 4===<br />
*Q: Thinking about strategy with observational data, what tools can we use if we want to control for all the Zs that we can?<br />
*A: We can use any one of the three below. The first is statistical control, which includes the Zs in a statistical model and adjusts for them statistically. Modern thinking: we don't need all the Zs in the model to avoid bias, only the “propensity score”. This can fail with the wrong model. The second is matching, which only performs comparisons between observations with similar Zs. Again, all that really matters to avoid bias is the propensity score, if you have the Zs. The third is structural methods, which build a structural causal model.<br />
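
To make the propensity score in the answer above more concrete, here is a minimal Python sketch. It assumes a binary X and a single Z, and it uses a logistic model fitted by plain gradient ascent; these choices, the simulated data and the function name <code>fit_propensity</code> are assumptions for illustration only, not the method from the lecture. The propensity score is then the predicted probability of X given Z.<br />

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_propensity(z, x, steps=500, lr=1.0):
    """Fit P(X=1 | Z) = sigmoid(a + b*Z) by gradient ascent on the mean log-likelihood."""
    a, b = 0.0, 0.0
    n = len(z)
    for _ in range(steps):
        residuals = [xi - sigmoid(a + b * zi) for zi, xi in zip(z, x)]
        a += lr * sum(residuals) / n
        b += lr * sum(res * zi for res, zi in zip(residuals, z)) / n
    return a, b


rng = random.Random(2)

# Simulated data: Z is a confounder, and X is more likely when Z is large.
z = [rng.gauss(0.0, 1.0) for _ in range(2000)]
x = [1 if rng.random() < sigmoid(zi) else 0 for zi in z]

a, b = fit_propensity(z, x)

# The propensity score of each observation: predicted probability of X from Z.
scores = [sigmoid(a + b * zi) for zi in z]
print(f"fitted intercept {a:.2f}, fitted slope {b:.2f}")
```

Observations with similar scores can then be compared or matched, instead of matching on every Z separately. As the answer notes, this can fail if the model for X given the Zs is wrong.<br />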
<br />
<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
The study covers the 1993/1994 through 2008/2009 seasons and shows that the severity of infections with the Influenza A virus is related to the novelty of the virus. In particular, 90% of the variation in influenza severity over the period studied could be explained by the novelty of the virus' hemagglutinin protein. The hope is that this result can improve the ability to predict influenza severity accurately, so that scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. In reality, however, there are many other sources of variation. For example, people differ in age, live in different areas and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can look for criminal activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
According to the news report, "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Obviously, crimes are more likely at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice. Crimes occur all over the city, and the criminals may also come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120 Drinkers Down Under switching from beer to wine]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wine and spirits has increased. According to the statistics, at the start of the 1960s beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same period, wine consumption has increased threefold, to 36 percent, while the intake of spirits has nearly doubled, to 20 percent. Why have wine and spirits increased so much in the past forty years? The researchers noted that increased consumption was likely to have been affected by numerous factors such as different age patterns in the population, increasing affluence and the growth of the Australian wine industry. Changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people have also come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and children, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results from the Canadian Health Measures Survey (CHMS) released by Statistics Canada, only 15 per cent of adults achieve the minimum recommended amount of daily exercise. Young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. In addition, the survey found that men are more physically active than women on average, and the same pattern appears among teenagers: more than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, the survey should take into account people whose jobs require heavy lifting or being out and about for most of the work day. This would be more than enough exercise for most people and would improve these numbers dramatically. Similarly, for teenagers, the survey should also ask whether they work part-time while studying as full-time students.<br />
<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities in one country instead of from different countries, in order to eliminate the differences in conditions.<br />
<br />
===Week 2===<br />
* Before I took this course, I had no idea what consulting involves or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realize that it is not easy to be a good consultant. Usually, the situation is very complicated and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem and then pick out the relationships among these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis and whether an apparent effect of x on y is only due to chance, etc. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw the multiple regression model represented by a plane in data space, which is represented in beta space by a point showing the fitted slope with respect to Coffee and the fitted slope with respect to Stress. There we needed only 2 dimensions, but for some complex models we may need 3 or more dimensions for the beta space. I just wonder how, in these situations, we can draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need a 3D or higher-dimensional ellipse. This ellipse in higher dimensions would be larger because it is trying to be correct for 3 or more parameters at the same time. -''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===</div>Jyjlihttp://scs.math.yorku.ca/index.php/MATH_6627_2010-11_Practicum_in_Statistical_Consulting/Students/Jessica_LiMATH 6627 2010-11 Practicum in Statistical Consulting/Students/Jessica Li2011-01-28T04:43:06Z<p>Jyjli: </p>
<hr />
<div>==About Me==<br />
<br />
I am a Master's student in Applied Statistics. I received my Bachelor's degree in Actuarial Science, and I have some background in statistics and R.<br />
==Sample Exam Questions==<br />
<br />
===Week 1===<br />
*Q: In the example of modelling cigarette consumption vs. life expectancy, we compare cigarette consumption in different countries. In reality, these countries could differ in all sorts of ways other than this factor. We know that the ideal solution introduced by Fisher is the experiment. What is Fisher's idea for an experiment? <br />
*A: 1) Make sure the groups being compared are similar except for chance. 2) Take one country of willing subjects and randomly split it into several groups with different amounts of cigarette consumption. 3) Compare the life expectancy of these groups to see whether the groups with lower cigarette consumption have higher life expectancy.<br />
<br />
===Week 2===<br />
*Q: From the example of studying the inheritance of height from father to son, and the graph of the ellipse, we find that the SD line is not the regression line of son's height on father's height: the change in the son's expected height from the mean is only a proportion of the change in the father's height from the mean. Why do we use ellipses?<br />
*A: By using ellipses, we can easily obtain the formulas for statistical theory and models. In addition, ellipses drawn on scatterplots and other graphs help with data analysis, for example in studying the correlation between the variables, judging how significant a variable is, and calculating confidence intervals. <br />
<br />
===Week 3===<br />
*Q: What is beta space, and why do we want to look at our data in beta space?<br />
*A: In data space, the axes are variables and the points are observations. To better understand hierarchical data, we want a more natural display of our models in a space called beta space. In beta space, the axes are coefficients and the points are models represented by their coefficients. <br />
<br />
<br />
<br />
==Statistics in the Media==<br />
<br />
===Week 1===<br />
* Computer Model for Projecting Severity of Flu Season: Researchers have developed a statistical model for projecting how many people will get sick from seasonal influenza based on analyses of flu viruses circulating that season. <br />
<br />
[http://www.sciencedaily.com/releases/2010/12/101208142253.htm Computer Model for Projecting Severity of Flu Season]<br />
<br />
The research, conducted by scientists at the National Institutes of Health, appears December 8 in the open-access publication PLoS Currents: Influenza.<br />
<br />
The study covers the 1993/1994 through 2008/2009 seasons and shows that the severity of infections with the Influenza A virus is related to the novelty of the virus. In particular, 90% of the variation in influenza severity over the period studied could be explained by the novelty of the virus' hemagglutinin protein. The hope is that this result can improve the ability to predict influenza severity accurately, so that scientists can use appropriate surveillance methods to make more informed decisions in planning for influenza, including the selection of vaccines. In reality, however, there are many other sources of variation. For example, people differ in age, live in different areas and have different health conditions.<br />
<br />
===Week 2===<br />
* High Tech Crime Fighting Tool: Computer Science Analyzes And Predicts Crime
[http://www.sciencedaily.com/videos/2007/0807-high_tech_crime_fighting_tool.htm High Tech Crime Fighting Tool]<br />
<br />
Engineers at the University of Virginia have developed a new program, called web-cat, that allows police to easily access crime data online -- and spot trends that show what types of crimes happen most often, and where. Users can look for criminal activity by typing in specific dates, choosing types of crimes, choosing locations, or finding out what weapons are used most. The system then produces graphs, reports, and maps of high crime areas. The crime-fighting tool is also being upgraded to predict locations of future crimes.<br />
According to the news report, "We found that a lot of our residential break-ins were occurring Mondays and Wednesdays -- then we were able to pass that on to the patrol officers, this has been shown to help police officers because if they can get an idea of what's going on, they can make better predictions as to where crimes are likely to happen in the future." Obviously, crimes are more likely at night and at entertainment venues. However, criminals rarely choose the same place or the same person as a target twice. Crimes occur all over the city, and the criminals may also come from other cities or countries. The situation is very complicated, and it is very hard to predict the time, location and type of future crimes.<br />
<br />
===Week 3===<br />
* Drinkers Down Under switching from beer to wine<br />
[http://www.reuters.com/article/idUSLNE70J02920110120 Drinkers Down Under switching from beer to wine]<br />
<br />
In a report titled "No Longer a Nation of Beer Drinkers," the Australian Bureau of Statistics said that beer consumption has fallen gradually but consistently since the 1960s, while consumption of wine and spirits has increased. According to the statistics, at the start of the 1960s beer made up 76 percent of all pure alcohol consumed in Australia, but in recent years this has fallen to 44 percent. Over the same period, wine consumption has increased threefold, to 36 percent, while the intake of spirits has nearly doubled, to 20 percent. Why have wine and spirits increased so much in the past forty years? The researchers noted that increased consumption was likely to have been affected by numerous factors such as different age patterns in the population, increasing affluence and the growth of the Australian wine industry. Changing taxes and the introduction of random breath testing are a few of the factors that could have cut consumption. Over time, people have also come to believe that wine is better for health than beer. On the other hand, the rise in alcohol consumption comes at a cost: alcohol abuse is costing Australians $36 billion a year.<br />
<br />
===Week 4===<br />
* Canadians spend most of waking life sedentary<br />
[http://calgary.ctv.ca/servlet/an/local/CTVNews/20110119/statscan-physical-activity-accelerometer-survey-110119/20110119/?hub=CalgaryHome Canadians spend most of waking life sedentary]<br />
<br />
StatsCan collected the data in a survey of the physical activity patterns of Canadian adults and children, and divided its findings into two reports: one addressing physical activity in Canadian adults between the ages of 20 and 79, and the other examining young people between 6 and 19 years old. According to the results from the Canadian Health Measures Survey (CHMS) released by Statistics Canada, only 15 per cent of adults achieve the minimum recommended amount of daily exercise. Young people fare even worse, with just 7 per cent of those aged 5 to 17 attaining the minimum level of physical activity each day. The data reveal that adults are sedentary for an average of 9.5 hours each day, while children and youth spend 8.6 hours engaged in sedentary activities such as watching television. In addition, the survey found that men are more physically active than women on average, and the same pattern appears among teenagers: more than 80 per cent of boys and 70 per cent of girls manage to squeeze in 30 minutes of activity three days a week. From my point of view, the survey should take into account people whose jobs require heavy lifting or being out and about for most of the work day. This would be more than enough exercise for most people and would improve these numbers dramatically. Similarly, for teenagers, the survey should also ask whether they work part-time while studying as full-time students.<br />
<br />
<br />
==Questions and Comments==<br />
<br />
===Week 1===<br />
* The Question posted on Bin's blog.<br />
<br />
I guess that one could select the sample from different cities in one country instead of from different countries, in order to eliminate the differences in conditions.<br />
<br />
===Week 2===<br />
* Before I took this course, I had no idea what consulting involves or how to work through a problem step by step to help others make the right decision. From this week's lecture, I realize that it is not easy to be a good consultant. Usually, the situation is very complicated and what we can manipulate is limited by many restrictions at the same time. During the consulting process, we need to identify the variables in the problem and then pick out the relationships among these variables that we are interested in. Moreover, we have to decide whether to use observational data or experimental data to analyze the problem. There are many things to consider, such as how to select the sample for analysis and whether an apparent effect of x on y is only due to chance, etc. I think it is a big challenge for me, and I find it very interesting as well.<br />
<br />
===Week 3===<br />
* In class we saw the multiple regression model represented by a plane in data space, which is represented in beta space by a point showing the fitted slope with respect to Coffee and the fitted slope with respect to Stress. There we needed only 2 dimensions, but for some complex models we may need 3 or more dimensions for the beta space. I just wonder how, in these situations, we can draw ellipses to illustrate the interactions between the variables?<br />
* Comment: If we have 3 or more parameters, then we need a 3D or higher-dimensional ellipse. This ellipse in higher dimensions would be larger because it is trying to be correct for 3 or more parameters at the same time. -''Gurpreet''<br />
* Thank you very much!<br />
<br />
===Week 4===</div>Jyjli