MATH 6627 2010-11 Practicum in Statistical Consulting/Students/Annie Wang
My name is Hangjing Wang; you can call me Annie instead of Hangjing. I am a master's student in applied statistics, and I am glad to join this course to learn more about statistics in practice.
Sample Exam Questions
Q: Is smoking responsible for higher life expectancies?
A: Smoking is only a correlated factor. Perhaps something else, a confounding variable, causes both higher life expectancy and a higher rate of smoking.
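A small simulation can make the confounding argument concrete. The Python sketch below assumes a hypothetical confounder `z` that drives both a smoking-like variable `x` and a life-expectancy-like outcome `y`, while `x` has no effect on `y` at all. The marginal regression of `y` on `x` still shows a clear positive slope, which disappears once `z` is included in the model.

```python
import numpy as np

def confounding_demo(n=10_000, seed=1):
    """Simulate z -> x and z -> y with no direct x -> y effect (hypothetical model)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)              # hypothetical confounder
    x = z + rng.normal(size=n)          # "smoking" depends only on z
    y = 2 * z + rng.normal(size=n)      # "life expectancy" depends only on z

    # Marginal regression: y ~ 1 + x
    X_marg = np.column_stack([np.ones(n), x])
    b_marg, *_ = np.linalg.lstsq(X_marg, y, rcond=None)

    # Adjusted regression: y ~ 1 + x + z
    X_adj = np.column_stack([np.ones(n), x, z])
    b_adj, *_ = np.linalg.lstsq(X_adj, y, rcond=None)
    return b_marg[1], b_adj[1]

marginal, adjusted = confounding_demo()
print(f"slope of x ignoring z:  {marginal:.3f}")   # expected to be near 1
print(f"slope of x adjusting z: {adjusted:.3f}")   # expected to be near 0
```

Under this model the marginal slope is cov(x, y)/var(x) = 2/2 = 1, even though the causal effect of `x` is zero.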
Q: Why did Galton and Pearson study the inheritance of height from father to son? If they had studied other inherited traits, would the conclusion be the same?
A: I think they may have studied the inheritance of height from father to son simply because it is easy to sample. If they had studied other inherited traits, the conclusion (regression toward the mean) would be the same.
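Q: What is beta space? Why do we use it?
A: In 'data space', the axes are variables and the points are observations. In 'beta space', the axes are the model parameters, so each point is a possible model (a vector of betas). We use beta space because it is a more natural place to view and compare models.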
Q: What is the difference between a confounding factor and a mediating factor?
A: A confounding factor affects both the predictor and the response variable, and it is not itself changed by the predictor. A mediating factor is changed by the predictor and in turn affects the response; it lies on the path from the predictor to the response rather than influencing the predictor.
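The practical difference shows up when we decide whether to adjust for the variable. The Python sketch below assumes a hypothetical chain `x -> m -> y` with made-up coefficients. Leaving the mediator `m` out recovers the total effect of `x`; adjusting for `m` removes the part of the effect that flows through it, which is only appropriate if the question is about the direct effect.

```python
import numpy as np

def mediator_demo(n=10_000, seed=2):
    """Simulate x -> m -> y, where m is a hypothetical mediator."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    m = x + rng.normal(size=n)          # mediator is changed by x
    y = 2 * m + rng.normal(size=n)      # response is changed by m only

    total, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)
    direct, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), x, m]), y, rcond=None)
    return total[1], direct[1]

total_effect, direct_effect = mediator_demo()
print(f"x slope, not adjusting for m: {total_effect:.3f}")   # near 2 (total effect)
print(f"x slope, adjusting for m:     {direct_effect:.3f}")  # near 0 (no direct path)
```

Compare this with a confounder: there, adjusting is what removes a spurious association; here, adjusting hides a real one.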
Q: How many types of outliers are there in multiple regression, and what are their characteristics?
A: There are three types of outliers:
(1) Typical values for the predictors, atypical Y: little impact on the regression coefficients; increases the width of confidence intervals; decreases power.
(2) Atypical values for the predictors, but Y consistent with the other data: little impact on the regression coefficients; shrinks confidence intervals; creates a false sense of power if the point is not valid.
(3) Atypical values for the predictors and Y not consistent with the other data: large impact on the regression coefficients; could shrink or expand confidence intervals; makes a mess of everything.
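The three cases can be checked numerically with a simple regression. The Python sketch below uses made-up data (a line with small alternating noise) and adds one point of each type; `fit_line` is a hypothetical helper that returns the slope, its standard error, and the leverages (diagonal of the hat matrix).

```python
import numpy as np

def fit_line(x, y):
    """OLS fit of y = a + b*x; returns slope, SE of slope, and leverages."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    s2 = resid @ resid / (n - p)                        # residual variance
    xtx_inv = np.linalg.inv(X.T @ X)
    se_slope = np.sqrt(s2 * xtx_inv[1, 1])
    leverage = np.einsum("ij,jk,ik->i", X, xtx_inv, X)  # diagonal of the hat matrix
    return beta[1], se_slope, leverage

# Hypothetical base data: a line with small alternating noise
x0 = np.arange(10.0)
y0 = 1 + 2 * x0 + np.where(np.arange(10) % 2 == 0, 0.5, -0.5)

cases = {
    "base": (x0, y0),
    "type 1": (np.append(x0, 4.5), np.append(y0, 30.0)),   # typical x, atypical y
    "type 2": (np.append(x0, 30.0), np.append(y0, 61.0)),  # atypical x, y on the trend
    "type 3": (np.append(x0, 30.0), np.append(y0, 0.0)),   # atypical x, y off the trend
}

results = {}
for name, (x, y) in cases.items():
    slope, se, lev = fit_line(x, y)
    results[name] = (slope, se, lev[-1])
    print(f"{name:7s} slope={slope:6.3f}  se={se:.3f}  leverage of last point={lev[-1]:.3f}")
```

In this example the type 1 point leaves the slope unchanged but inflates its standard error, the type 2 point has high leverage and shrinks the standard error, and the type 3 point drags the slope from about 2 to below 0.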
Q: What is the main characteristic of a hierarchical model?
A: Hierarchical linear modeling (HLM), also known as multilevel analysis, extends simple and multiple linear regression. Multilevel analysis allows the variance in outcome variables to be analysed at multiple hierarchical levels, whereas in simple and multiple linear regression all effects are modeled as occurring at a single level. Thus, HLM is appropriate for nested data.
Q: How do we choose between BLUEs and BLUPs?
A: They are best for different things. The BLUE is best if we imagine resampling from the same school over and over again. The BLUP is best on average if we imagine resampling from the population of schools and students.
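A minimal sketch of the difference, assuming a one-way random-effects model for school means with known grand mean `mu`, between-school variance `tau2`, and within-school variance `sigma2` (in practice these are estimated, e.g. by a mixed-model fit). The BLUE of a school's mean is its own sample mean; the BLUP shrinks that mean toward `mu`, more strongly for small schools. All numbers are hypothetical.

```python
def blup_means(school_means, school_sizes, mu, tau2, sigma2):
    """BLUP of each school's true mean in a one-way random-effects model,
    assuming mu, tau2 (between-school) and sigma2 (within-school) are known."""
    blups = []
    for ybar, n in zip(school_means, school_sizes):
        weight = tau2 / (tau2 + sigma2 / n)   # reliability of the school's own mean
        blups.append(mu + weight * (ybar - mu))
    return blups

# Hypothetical values: two schools with the same sample mean but different sizes
means = [60.0, 60.0]
sizes = [4, 100]
result = blup_means(means, sizes, mu=50.0, tau2=25.0, sigma2=100.0)
print(result)  # the 4-student school is pulled much further toward mu than the 100-student one
```

With these values the small school's weight is 25 / (25 + 100/4) = 0.5, so its BLUP is halfway between its sample mean and `mu`.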
Q: Mixed models for hierarchical data and for longitudinal data look almost the same. What is the difference between them?
A: In the mixed model for hierarchical data, the index j refers to the observations in the jth cluster, whereas in the mixed model for longitudinal data, j refers to the observations over time on the jth subject.
Q: When you get a positive or a negative autocorrelation, how can you interpret it?
A: Most natural processes would be expected to produce positive autocorrelation; occasional large measurement errors can create the appearance of negative autocorrelation.
However, strong positive autocorrelation can be a symptom of lack of fit, and in a well-fitted OLS model the residuals are expected to be negatively correlated, more so if there are few observations per subject.
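One mechanism behind the negative correlation can be simulated directly. The Python sketch below assumes the simplest case: truly independent errors and a model that fits a separate mean for each subject. Because each subject's residuals must then sum to zero, any two of them have correlation -1/(n-1), which is strongest when n (observations per subject) is small. This is an illustration of that special case, not a general result for every OLS model.

```python
import numpy as np

def within_subject_residual_corr(n_per_subject, n_subjects=20_000, seed=3):
    """Correlation between the first two residuals of each subject after
    fitting a separate mean for every subject to i.i.d. errors."""
    rng = np.random.default_rng(seed)
    errors = rng.normal(size=(n_subjects, n_per_subject))   # truly independent errors
    resid = errors - errors.mean(axis=1, keepdims=True)     # residuals from subject means
    return np.corrcoef(resid[:, 0], resid[:, 1])[0, 1]

for n in (2, 3, 5, 10):
    r = within_subject_residual_corr(n)
    print(f"n={n:2d}: observed {r:+.3f}, theory {-1 / (n - 1):+.3f}")
```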
Q: When you want to create a mixed model for longitudinal data, which functions of time come to mind, and how do you choose among them?
A: There are many possible functions of time, such as linear, quadratic and higher-order polynomials, splines, exponential growth and decay, asymptotic exponential growth, periodic functions, and so on.
Plot the data, sketch a curve that follows its main features, and make an educated guess. Then test the goodness of fit of the model you choose.
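Once a few candidate shapes are on the table, they can be compared with a goodness-of-fit criterion. The Python sketch below uses AIC as one possible criterion (an assumption; likelihood-ratio tests are another option) on hypothetical data with a curved trend, and for brevity ignores the subject structure a real mixed model would include.

```python
import numpy as np

def compare_polynomials(t, y, degrees=(1, 2, 3)):
    """Gaussian AIC (up to an additive constant) for polynomial fits in time."""
    n = len(y)
    aic = {}
    for d in degrees:
        coefs = np.polyfit(t, y, d)
        rss = np.sum((y - np.polyval(coefs, t)) ** 2)
        aic[d] = n * np.log(rss / n) + 2 * (d + 1)   # d + 1 mean parameters
    return aic

# Hypothetical data with a curved (quadratic) trend over time
rng = np.random.default_rng(4)
t = np.linspace(0, 10, 50)
y = 3 + 2 * t - 0.3 * t**2 + rng.normal(scale=1.0, size=t.size)

aic = compare_polynomials(t, y)
for d, value in aic.items():
    print(f"degree {d}: AIC = {value:.1f}")   # smaller is better
```

The residual-variance term is common to all fits, so leaving it out of the parameter count does not change the ranking.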
Q: Within the limits imposed by sample size, what are our aims when we construct a model? And what is the trick for modeling individual trajectories in longitudinal data analysis?
A: We want to construct a model that captures the main theoretical properties of the phenomenon and, preferably, has interpretable parameters.
A good strategy in longitudinal data analysis is to start by building a plausible model for individual trajectories even if there isn't enough data from any one individual to actually fit the model. If the data are unbalanced and you are willing to assume that the between-subject effect is close to the within-subject effect, then the estimation of individual trajectories 'borrows strength' from the between-subject model.
Statistics in the Media, Paradoxes and Fallacies, Consulting Reflections
US Steel sees markets improving after Q4 loss
Yahoo's Q4 earnings double as revenue falls
Yahoo Inc.'s fourth-quarter earnings more than doubled, but the Internet company's crumbling revenue showed that it's still struggling to cash in on the online advertising boom.
Second wave of housing bust hammers more cities
A second wave of falling home prices in the United States is battering some cities that had escaped the worst of the housing market bust.
Prices in Seattle, Charlotte, N.C., and Portland, Ore., have hit their lowest points since peaking in 2006 and 2007. Denver and Minneapolis are nearing new lows. High unemployment and rising foreclosures are taking a toll even on markets that never overheated during the boom years.
Canadian truck sales decline for Ford, GM, rise for Chrysler
Chrysler Canada Inc. managed to continue to entice consumers to purchase its trucks in February despite soaring fuel prices. But its gains came at the expense of its Detroit rivals, Ford Motor Company of Canada, Ltd. and General Motors of Canada Ltd., which both experienced double-digit declines in the important truck segment last month.
U.S. owes China a third more than thought
The U.S. government owes nearly a third more money to China than previously thought, the Treasury Department said on Monday as it revised Beijing's December holdings of U.S. Treasury debt sharply higher to US$1.160 trillion.
The US$268.4-billion increase over figures reported on Feb. 15 was contained in a survey of foreign portfolio holdings of U.S. securities that provided fresh evidence that China has been buying Treasuries through broker-dealers in Britain.
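A quick check of the "nearly a third more" claim, using only the two figures quoted in the article: the earlier estimate is the revised holding minus the increase, and the increase is then about 30% of that earlier figure.

```python
# Figures quoted in the article, in billions of US dollars
revised = 1160.0     # revised December holdings
increase = 268.4     # upward revision over the Feb. 15 figures
previous = revised - increase
print(f"previous estimate: {previous:.1f} billion")
print(f"relative increase: {increase / previous:.1%}")
```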
Pressure grows for rate hike
Pressure on the Bank of Canada to raise rates is likely to build after Statistics Canada reported Monday the economy ended 2010 with a bang as it grew 3.3% annualized in the fourth quarter – a full percentage point above the central bank’s expectations.
Leading the way were exporters, which posted their best quarterly performance in eight years. The data suggested the economy roared in December as real GDP advanced 0.5% on a month-over-month basis. This, coupled with an upward revision in growth for previous months, meant Canada grew 3.1% in 2010, matching the best annual performance since 2000.
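An "annualized" quarterly growth rate is not the growth in the quarter itself. Assuming the usual compounding convention (the quarterly rate compounded over four quarters), the 3.3% annualized figure corresponds to roughly 0.8% growth over the fourth quarter. The helper names below are hypothetical.

```python
def annualize(quarterly_rate):
    """Compound a quarter-over-quarter growth rate to an annual rate."""
    return (1 + quarterly_rate) ** 4 - 1

def deannualize(annual_rate):
    """Quarter-over-quarter growth rate implied by an annualized rate."""
    return (1 + annual_rate) ** 0.25 - 1

q = deannualize(0.033)
print(f"3.3% annualized is about {q:.3%} quarter over quarter")
```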
Canadian travel deficit widens in Q4
Canadian travellers were increasingly spreading their wealth abroad as 2010 came to a close, more so than foreign tourists to Canada were increasing their spending here, Statistics Canada said Friday.
Why can't we stop spending?
Imagine: A retailer takes your product off the shelf and reserves it for you until you’ve paid in full. Yes, children, it existed at one point. Today you get your goods immediately and worry about paying for it later.
If only our retirement savings could be funded like that. The notion of saving has taken a back seat to spending to the point that last quarter, consumer debt hit an all-time record in Canada, 148% of disposable income, and exceeded the U.S. rate for the first time.
Canadians' net worth hits record level
Household debt as measured against disposable income nudged lower from a record high in late 2010, data suggested Monday, a sign that Canadians are beginning to curb their borrowing behaviour.
How much do Canadians make?
One of the measures in this past week’s federal budget highlighted this fact by showing just how many seniors have very little income. The budget announced an enhancement to the Guaranteed Income Supplement such that seniors with little or no income other than Old Age Security and the GIS will receive additional annual benefits of up to $600 for single seniors and $840 for couples.
To qualify for the full supplement, single recipients would have to have an annual income (excluding Old Age Security and GIS) of $2,000 or less and couples would have to earn under $4,000 annually.
Above these income thresholds, the amount of the top-up will be gradually reduced and will be completely phased out at an income level of $4,400 for singles and $7,360 for couples. The government estimates that this measure, if ultimately passed, would benefit more than 680,000 seniors across Canada.
That so many Canadians have such a low income may come as a shock to you but if you take a close look at the statistics, you may uncover some eye-opening facts that could reframe your view of how much Canadians actually earn.
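The article only says the top-up is "gradually reduced" between the two thresholds. Assuming a linear phase-out (an assumption, not stated in the article), the quoted figures imply the same reduction rate for singles and couples: 25 cents of top-up lost per extra dollar of income. The function below is a hypothetical sketch of that rule.

```python
def gis_top_up(income, max_top_up, full_threshold, phase_out_end):
    """Top-up under an ASSUMED linear phase-out between the two thresholds."""
    if income <= full_threshold:
        return max_top_up
    if income >= phase_out_end:
        return 0.0
    share = (phase_out_end - income) / (phase_out_end - full_threshold)
    return max_top_up * share

single_rate = 600 / (4400 - 2000)    # top-up lost per extra dollar, singles
couple_rate = 840 / (7360 - 4000)    # top-up lost per extra dollar, couples
print(single_rate, couple_rate)
print(gis_top_up(3000, 600, 2000, 4400))   # single senior with $3,000 other income
print(gis_top_up(5000, 840, 4000, 7360))   # couple with $5,000 other income
```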
Younger Canadians intent on buying homes -- next year
Canadians younger than 35 are most intent on buying a home over the next two years, according to a survey released Thursday.
However, most of those in this age group, which included people 18 to 34, indicated in the Royal Bank of Canada’s annual home-ownership survey that it would be better to wait until next year to make a purchase.
Fifty-five per cent of respondents in this age category said it makes sense to wait until next year before buying a home, compared to 45% of respondents overall who felt this way.
Questions and Comments
Stopping smoking is worse than continuing to smoke? This is the first time I have heard this. Are the research data real or just an assumption?
I think the 3D plot is an excellent way to show the relations among different variables, and it lets us understand those relations directly and intuitively.
I found that the "spida" and "p3d" R packages were contributed by George! spida is about mixed models, and p3d is about visualizing models.
What is the difference in how we handle an intermediate variable and a confounding variable? Is there something we should pay attention to?
About Simpson's paradox: when the direction of the association in each partial table is different from that in the marginal table, Simpson's paradox occurs.
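The often-quoted kidney-stone treatment counts give a concrete partial-versus-marginal reversal. Treatment A has the higher success rate for both small and large stones, yet the lower rate overall, because A was given mostly to the harder large-stone cases. The sketch below just recomputes the rates.

```python
# Often-quoted kidney-stone treatment counts: (successes, patients)
data = {
    "A": {"small stones": (81, 87), "large stones": (192, 263)},
    "B": {"small stones": (234, 270), "large stones": (55, 80)},
}

def rate(successes, total):
    return successes / total

# Partial tables: compare treatments within each stratum
for stratum in ("small stones", "large stones"):
    a = rate(*data["A"][stratum])
    b = rate(*data["B"][stratum])
    print(f"{stratum}: A {a:.1%} vs B {b:.1%}")

# Marginal table: pool the strata
marginal = {}
for treatment, strata in data.items():
    s = sum(v[0] for v in strata.values())
    n = sum(v[1] for v in strata.values())
    marginal[treatment] = rate(s, n)
print(f"all patients: A {marginal['A']:.1%} vs B {marginal['B']:.1%}")
```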
Hierarchical models can be used with nested data. For example, in educational research, data often consist of pupils nested within classrooms nested within schools.
I am glad to have a chance to do consulting in practice. I think we should make a plan for how to get more useful information from the client, so that we can identify the main research questions the client wants us to address.
About our project: can we use gender, marital status, and education status as predictors? How complicated would the longitudinal model be?
Now I know the difference between as.factor and as.numeric.
In the longitudinal R script, why does the random effect have only an intercept (random=~1) in the lme function?
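In nlme's `lme`, a specification such as `random = ~ 1 | subject` (where `subject` stands in for the grouping variable) gives each subject its own intercept, drawn from a normal distribution, while the slope on time is shared. The Python sketch below simulates the expected (noise-free) trajectories under such a random-intercept model with hypothetical parameter values: the intercepts differ but every line is parallel.

```python
import numpy as np

def random_intercept_trajectories(n_subjects=5, times=None, beta0=10.0,
                                  beta1=2.0, tau=3.0, seed=5):
    """Expected trajectories under y_ij = beta0 + u_i + beta1 * t_j,
    with u_i ~ N(0, tau^2): a random-intercept-only model (hypothetical values)."""
    if times is None:
        times = np.arange(5.0)
    rng = np.random.default_rng(seed)
    u = rng.normal(scale=tau, size=n_subjects)          # one random intercept per subject
    return beta0 + u[:, None] + beta1 * times[None, :]  # rows = subjects, cols = times

traj = random_intercept_trajectories()
slopes = np.diff(traj, axis=1)      # change between consecutive time points
print("intercepts:", np.round(traj[:, 0], 2))
print("slope per subject:", np.round(slopes[:, 0], 2))
```

Adding time to the random part (a random slope) would let the lines have different slopes as well.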
During the last lecture, the professor suggested that we use a spline function to construct our model. Why should we use this function, and what kind of data is it suitable for?
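One common reason to use a spline is a trajectory whose slope changes at some point in time, for example before and after an event, where no single simple curve fits well. A linear spline joins straight lines at chosen knots and keeps interpretable coefficients. The Python sketch below builds a truncated power basis with one hypothetical knot at t = 5 and fits noise-free made-up data; the coefficients recover the intercept, the slope before the knot, and the change in slope after it.

```python
import numpy as np

def linear_spline_basis(t, knots):
    """Truncated power basis for a linear spline: 1, t, (t - k)+ for each knot."""
    cols = [np.ones_like(t), t] + [np.maximum(t - k, 0.0) for k in knots]
    return np.column_stack(cols)

# Hypothetical trajectory: slope 1 before t = 5, slope 3 afterwards
t = np.linspace(0, 10, 21)
y = np.where(t < 5, 2 + t, 7 + 3 * (t - 5))

X = linear_spline_basis(t, knots=[5.0])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print("intercept, slope before knot, change in slope:", np.round(coef, 3))
```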