# MATH 6643 Summer 2012 Applications of Mixed Models/Students/smithce/A2

### Assignment 2

Due June 7 before class

You must write up your own work based on your own understanding but you can do anything you want to develop your understanding.

1. [10] A basis for $\mathbb{R}^p$ that is a conjugate basis with respect to a positive definite matrix $M$ is a sequence of vectors $x_1, x_2, \ldots, x_p$ in $\mathbb{R}^p$ such that $x_i' M x_i = 1$ and $x_i' M x_j = 0$ if $i \neq j$. Show that the columns of a non-singular matrix $A$ form a conjugate basis with respect to $\Sigma^{-1}$ if $\Sigma = AA'$. Note that a conjugate basis is merely an orthogonal basis with respect to the metric defined by $\|x\|^2 = x' \Sigma^{-1} x$.
2. [10] We will call a "square root" of a square matrix M any square matrix A such that M = AA'. Show that a square matrix has a square root if and only if it is a variance matrix.
3. [10] Write a function in R that computes a square root of a variance matrix M. Use the 'eigen' function. [Bonus: 2] Get your function to give an informative error message if M does not have a square root for some reason.
4. [10] Using the function in 3, write a multivariate normal random number generator. Write it to parallel the univariate 'rnorm'. The univariate 'rnorm' takes three arguments: n, mean and sd. Consider writing your 'rmvnorm' so the third argument, if given, must be named either 'var' or 'sd' (depending on whether the user is giving a variance or the square root of a variance as input) to avoid confusion with the univariate generator. The default could be the identity -- which doesn't need to be distinguished as 'var' or as 'sd'.
5. [10] Write a simple 'lmfit' function that calculates least squares regression coefficients using an algorithm based on the svd. Ideally, design the function so it takes a formula and a data frame as arguments, e.g. lmfit( y ~ x1 + x2, dd). You can generate the model matrix using the 'model.matrix' function and extract the response using the first column of the model.frame command.
6. [10] Consider a $2 \times 2$ variance matrix $\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22}\end{bmatrix}$ for a random vector $\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}$. Verify that the Cholesky matrix $C = \begin{bmatrix} \sigma_{11}^{1/2} & 0 \\ \sigma_{21}/ \sigma_{11}^{1/2}& \sqrt{\sigma_{22} - \sigma_{12}^2 / \sigma_{11}}\end{bmatrix}$ is a square root of Σ.
Show that the Cholesky matrix can be written as $\begin{bmatrix} \sigma_1 & 0 \\ \beta_{21} \sigma_1 & \sigma_{2 \cdot 1}\end{bmatrix}$ where $\beta_{21}$ is the regression coefficient of $Y_2$ on $Y_1$.
Draw a concentration (or data) ellipse and indicate the interpretation of the vectors defined by the columns of C relative to the ellipse.
7. [10] Show that a non-singular $2 \times 2$ variance matrix $\Sigma$ can be factored so that $\Sigma = AA'$ with $A$ an upper triangular matrix [in contrast with problem 6, where the matrix is lower triangular]. Explain the interpretation of the elements of this matrix as in question 6.
8. [20] Generate 100 observations for three variables Y, X and Z so that in the regression of Y on both X and Z neither regression coefficient is significant (at the 5% level) but a test of the hypothesis that both coefficients are 0 is rejected at the 1% level. Explain your strategy in generating the data. How should the data be generated to produce the required result? Show a data ellipse for X and Z and appropriate confidence ellipses for their two regression coefficients. What does this example illustrate about the appropriateness of scanning regression output for significant p-values and concluding that nothing is happening if none of the p-values achieve significance?
9. [20] Generate 100 observations for three variables Y, X and Z so that in the separate simple regressions of Y on each of X and Z neither regression coefficient is significant (at the 5% level) but a test of the hypothesis that both coefficients are 0 in a multiple regression of Y on both X and Z is rejected at the 5% level. Explain your strategy in generating the data. How should the data be generated to produce the required result? Show a data ellipse for X and Z and appropriate confidence ellipses for their two regression coefficients. Explain the relationship between the ellipses and the phenomenon exhibited in this problem. What does this example illustrate about the appropriateness of forward stepwise regression to identify a suitable model to predict Y using both X and Z?
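
The computational problems 3 to 5 can be sketched in R roughly as follows. This is one possible approach under stated assumptions, not a model answer: the function name `msqrt`, the `tol` arguments and the error messages are hypothetical choices made for illustration, while `rmvnorm` and `lmfit` follow the names and calling conventions suggested in the problem statements. Placing `var` and `sd` after `...` is one way to force the user to name them.

```r
# Problem 3 (sketch): square root of a variance matrix via eigen().
# If M = V D V', then A = V D^{1/2} satisfies A %*% t(A) = M.
# 'msqrt' and 'tol' are hypothetical names chosen for this example.
msqrt <- function(M, tol = 1e-8) {
  M <- as.matrix(M)
  if (nrow(M) != ncol(M)) stop("M must be a square matrix")
  if (max(abs(M - t(M))) > tol * max(1, max(abs(M))))
    stop("M is not symmetric, so it is not a variance matrix")
  e <- eigen(M, symmetric = TRUE)
  if (min(e$values) < -tol * max(1, max(abs(e$values))))
    stop("M has a negative eigenvalue, so it is not a variance matrix")
  # Tiny negative eigenvalues due to rounding are clamped to zero
  e$vectors %*% diag(sqrt(pmax(e$values, 0)), nrow = nrow(M))
}

# Problem 4 (sketch): multivariate analogue of rnorm().
# Arguments after '...' can only be matched by name, so 'var' and 'sd'
# cannot be supplied positionally.
rmvnorm <- function(n, mean = 0, ..., var = NULL, sd = NULL) {
  if (length(list(...)) > 0) stop("the third argument must be named 'var' or 'sd'")
  if (!is.null(var) && !is.null(sd)) stop("give only one of 'var' or 'sd'")
  if (!is.null(var)) sd <- msqrt(var)
  if (is.null(sd)) sd <- diag(length(mean))   # default: identity variance
  sd <- as.matrix(sd)
  mean <- rep(mean, length.out = nrow(sd))
  Z <- matrix(rnorm(n * ncol(sd)), nrow = ncol(sd))  # independent N(0, 1) draws
  t(mean + sd %*% Z)                                 # n x p, one draw per row
}

# Problem 5 (sketch): least squares through the SVD, X = U D V',
# so that the coefficients are V D^{-1} U' y.
lmfit <- function(formula, data, tol = 1e-10) {
  mf <- model.frame(formula, data)
  X <- model.matrix(attr(mf, "terms"), mf)
  y <- mf[[1]]                         # response is the first column
  s <- svd(X)
  keep <- s$d > tol * s$d[1]           # drop negligible singular values
  U <- s$u[, keep, drop = FALSE]
  V <- s$v[, keep, drop = FALSE]
  b <- V %*% (crossprod(U, y) / s$d[keep])
  setNames(drop(b), colnames(X))
}

# Usage sketch (no outputs shown because the draws are random)
S <- matrix(c(4, 2, 2, 3), nrow = 2)
Y <- rmvnorm(100, mean = c(1, -1), var = S)
dd <- data.frame(y = Y[, 1], x1 = Y[, 2])
lmfit(y ~ x1, dd)
```

The eigenvalue-based square root is symmetric in form but not triangular, so it differs from the Cholesky matrix of problem 6 even though both satisfy $\Sigma = AA'$. Note also that a function called `rmvnorm` exists in the `mvtnorm` package, so loading that package alongside this sketch would mask one of the two.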