# MATH 6627 2010-11 Practicum in Statistical Consulting/Students/Crystal Cao

*Revision as of 19:42, 2 April 2011*

## About Me

My name is Yurong, and I am also known as Crystal in our department. As a PhD student in Applied Mathematics, my research involves a great deal of statistical analysis. The project I am currently involved in is: The Climate and Environmental Impact on the Distribution Properties of Mosquito Abundance in Peel Region. Dynamic models are very popular for studying the transmission of vector-borne diseases. In those dynamical models, the parameters are determined by simulation or estimation. Such models have been successful in predicting the dynamics of vector populations under different circumstances, but they cannot reflect how the dynamics of the vector population change with climate and environmental factors. I am now using statistical analysis to establish the association between vector abundance and climate and environmental factors. By learning more statistics, I hope to gain a deeper understanding of statistical methods and to use a combination of mathematical modeling and statistical analysis in my research.

## Sample Exam Questions

### Week 1

* Q: List the possible methods of controlling for the effects of confounding factors while using observational data.
* A: 1) Randomization; 2) Matching; 3) Stratification: analyzing each stratum with similar values for the confounding factor(s); 4) Building a statistical model that includes the confounding factor(s) and using multiple regression. Since a confounding factor may be known but measured with error (so that it is not fully controlled), and some important confounding factors may not be known at all, there is no perfect solution, and judgment must be used in applying these methods and in assessing studies based on them.

### Week 2

* Q: Considering the coffee consumption problem, how should we choose the explanatory variables in the model?
* A: The choice of variables does not depend only on their significance; the cost of obtaining the data, the quality of the data, and so on also need to be considered. How to build the model depends on the question to be answered. If we need to answer how a variable affects heart damage, we must keep that variable in the model even if it is not significant. If we only need to answer whether the variable affects heart damage, we may drop it if it is not significant.

### Week 3

* Q: In the example of coffee consumption and heart damage in the lecture notes, we found that the marginal confidence interval for the estimated coefficient of coffee is smaller than the conditional one. How did that happen?
* A: Since the two predictors, coffee consumption and stress, are related to each other, the marginal confidence interval is not equal to the conditional one. If the two predictors were not related, the two intervals would be equal.

### Week 4

* Q: What is the difference between interaction between variables and correlation between variables?
* A: In statistical regression analyses, an interaction may arise when considering the relationship among three or more variables; it describes a situation in which the simultaneous influence of two variables on a third is not additive. Correlation measures the strength of the linear relationship between quantitative variables (the underlying relationship may not be linear). In regression analyses, interaction and correlation between variables are two different concepts with no necessary relationship to each other: two variables may be interactive but not correlated, interactive and correlated, neither interactive nor correlated, or correlated but not interactive.

### Week 5

* Q: What is the relationship between interaction and collinearity?
* A: Interaction is often confused with collinearity. Collinearity refers to associations among predictors, i.e. the extent to which the predictor data ellipse is tilted and eccentric.
It is not related to Y. Interaction refers to a situation in which the relationship between Y and the predictor variables is not additive, i.e. the effect of one variable depends on the level of another variable. It has nothing to do with the relationships among the Xs, only with the way they affect Y. Interaction and collinearity can each exist with or without the other; the presence of either does not even suggest the likely presence of the other.

### Week 6

* Q: What is the difference between a characteristic of the school and a 'derived' variable?
* A: A derived variable could take a different value with a different sample of students; a characteristic of the school would not.

### Week 7

* Q: Which confidence interval is better, Scheffé or Bonferroni, as the number of intervals goes to infinity?
* A: For models with one or two degrees of freedom for error, Scheffé is always superior to Bonferroni. For three or more degrees of freedom for error, Bonferroni is superior to Scheffé at standard confidence levels, but the reverse is true at nonstandard levels.

### Week 8

* Q: How does lack of fit contribute to autocorrelation?
* A: Lack of fit will generally contribute positively to autocorrelation. For example, if the trajectories are quadratic but you fit a linear trajectory, the residuals will be positively autocorrelated. Strong positive autocorrelation can therefore be a symptom of lack of fit. This is an example of poor identification between the FE (fixed-effects) model and the R (residual covariance) model, that is, between the deterministic and the stochastic aspects of the model.

### Week 9

* Q: According to the principle of marginality, we do not necessarily drop terms because they are not significant. This raises the question: how should we decide whether to drop a term when it is not significant?
* A: It depends on the question we need to answer.
When we are dealing with observational data and simply trying to predict the dependent variable from the independent variables, we may drop a term if its p-value is large. When we are trying to find causal relationships between Y and the Xs, we cannot drop a term on the basis of its p-value alone, since it may be a confounding factor or an intermediate factor.

### Week 10

* Q: Why does ordinary least-squares analysis fail on the pooled Orthodont data?
* A: Because the residuals within clusters are not independent; they tend to be highly correlated with each other. We can instead use repeated-measures methods (univariate or multivariate) or a two-stage approach.

### Week 11

* Q: In multiple linear regression we get the formula $\hat{y}=\hat{\beta_0}+\hat{\beta_1}x_1+\hat{\beta_2}x_2+\hat{\beta_3}x_3$. We can then use Fisher's Z transformation to transform $\hat{\beta_1}\cdots \hat{\beta_3}$ into $B_1 \cdots B_3$. If $B_1 \approx B_2$, why can the ANOVA table show a P-value of 0.01 for $B_1$ but 0.08 for $B_2$?
* A: $B_i=\frac{\hat{\beta_i}S_{x_i}}{S_y}$ and $SE(\hat{\beta_i})=\frac{S_e}{\sqrt{n}\,S_{x_i|\text{others}}}$. The t-score is therefore $t_i=k\times B_i\times C_i$, where $k=\frac{\sqrt{n}\,S_y}{S_e}$ and $C_i=\frac{S_{x_i|\text{others}}}{S_{x_i}}$, and the P-value is a function of the t-score. Although $B_1 \approx B_2$, $C_1$ is not necessarily equal to $C_2$, so the P-value for $B_1$ is not necessarily equal to that for $B_2$. Only when there are exactly two predictor variables does $B_1 \approx B_2$ imply $P_1 \approx P_2$, since in that case $C_1 = C_2$.
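The factorization used in the Week 11 answer can be verified in one line. This is a sketch using the notation above, with $S_e$ the residual SD, $S_{x_i}$ and $S_y$ the sample SDs of $x_i$ and $y$, and $S_{x_i|\text{others}}$ the conditional SD of $x_i$ given the other predictors:

```latex
\begin{aligned}
t_i = \frac{\hat{\beta_i}}{SE(\hat{\beta_i})}
    = \hat{\beta_i}\,\frac{\sqrt{n}\,S_{x_i|\text{others}}}{S_e}
    = \underbrace{\frac{\sqrt{n}\,S_y}{S_e}}_{k}
      \times \underbrace{\frac{\hat{\beta_i}\,S_{x_i}}{S_y}}_{B_i}
      \times \underbrace{\frac{S_{x_i|\text{others}}}{S_{x_i}}}_{C_i}.
\end{aligned}
```

With only two predictors, $S_{x_1|x_2}/S_{x_1}=\sqrt{1-r_{12}^2}=S_{x_2|x_1}/S_{x_2}$, so $C_1=C_2$ and approximately equal $B$'s give approximately equal t-scores.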
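The Week 11 phenomenon is easy to reproduce numerically. The sketch below uses hypothetical simulated data (all variable names and coefficient values are my own choices, not from the course): x1 is made nearly collinear with x3 while x2 is independent, so even though x1 and x2 have the same true coefficient, the conditional SD $S_{x_1|\text{others}}$ is much smaller than $S_{x_2|\text{others}}$, inflating the standard error of $\hat{\beta_1}$ and hence its P-value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200
x3 = rng.standard_normal(n)
x1 = 0.95 * x3 + 0.31 * rng.standard_normal(n)  # nearly collinear with x3
x2 = rng.standard_normal(n)                     # independent of the others
# Equal true coefficients (0.5) for x1 and x2, unit-variance noise.
y = 0.5 * x1 + 0.5 * x2 + 0.5 * x3 + rng.standard_normal(n)

# Ordinary least squares by hand: beta = (X'X)^{-1} X'y
X = np.column_stack([np.ones(n), x1, x2, x3])
beta_hat, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
dof = n - X.shape[1]
s2 = resid @ resid / dof                        # residual variance estimate
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
t = beta_hat / se
p = 2 * stats.t.sf(np.abs(t), dof)              # two-sided P-values

# Standardized coefficients B_i = beta_i * S_{x_i} / S_y
B = beta_hat[1:] * X[:, 1:].std(axis=0) / y.std()
print("SE:", se[1:])
print("P :", p[1:])
print("B :", B)
```

Despite similar standardized coefficients for x1 and x2, the collinear predictor x1 gets a much larger standard error and a larger P-value, which is exactly the situation described in the question.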