# Simple Data Entry in R

## Contents

• data can come from many places:
  • objects created in R
  • data in packages, e.g. Smoking in p3d.beta, Prestige in car
  • data created in other programs/software:
    • Excel
    • SPSS
### Excel

• warning: pre-2007 versions of Excel had a maximum of 256 columns, so data sets with more than 256 variables were truncated
• this was the main barrier to the use of Excel
• with newer versions it isn't a problem
• enter data in Excel so that the top row contains the variable names, with rectangular data below
• save as a .csv file, for example 'filename.csv', then read into R with
```
dd <- read.csv('filename.csv')
```
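
A minimal, self-contained sketch of the round trip, assuming a small made-up data set written to a temporary file in place of a real Excel export (the column names `id`, `group`, `score` and the file path are illustrative only):

```r
# Hypothetical stand-in for a sheet exported from Excel as .csv
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,group,score",
             "1,a,10.5",
             "2,b,",
             "3,a,7"),
           csv_path)

dd <- read.csv(csv_path)   # the top row becomes the variable names
str(dd)                    # check the number of rows and the column types
dd$score                   # the empty cell in the second data row is read as NA
```

Running `str(dd)` right after reading is a cheap way to catch columns that were imported with an unexpected type.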

## Review of linear model interface

Note:

• different packages for fitting models use different interfaces
• One of the best is the formula interface for lm
• Also adopted in many other 'mature' packages and programs:
• glm, lmer, lme, etc.
• New packages often have cruder interfaces, because a polished formula interface is harder to develop, especially for programmers less familiar with R
• We review the interface:
• Basic:
```
fit <- lm(formula, data)
fit <- lm(formula, data,
          subset,
          na.action,
          ...)
```
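
To make the template concrete, here is a sketch using the built-in `mtcars` data set, which is not part of these notes; the formula and the subset condition are chosen only for illustration:

```r
# mtcars ships with base R (datasets package); used here only as an example
fit <- lm(mpg ~ wt + hp, data = mtcars)
coef(fit)

# Same model on a subset of rows; the subset expression is evaluated inside data
fit_sub <- lm(mpg ~ wt + hp,
              data = mtcars,
              subset = cyl > 4,
              na.action = na.omit)   # drop rows with missing values
nobs(fit_sub)                        # number of rows actually used in the fit
```

Because `subset` is evaluated in `data`, the condition can refer to `cyl` directly rather than `mtcars$cyl`.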

## Indexing and subsetting in R

• selecting rows or columns from a matrix or data frame
```
# index (row or column numbers)
# logical vector (select if TRUE)
# names: use column or row names
# matrix of indices
# subset function

# generating indices:
```
```
1:23
seq(2, 9, by = 2)
rep(1:3, c(4, 2, 6))
rep(1:3, each = 4)
rep(1:3, 4)
```
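
As a quick check of what these generators return, the sketch below stores a few of them and uses one as an index into the built-in `letters` vector (the variable names `idx_seq`, `idx_times`, `idx_each`, `idx_cycle` are illustrative):

```r
idx_seq   <- seq(2, 9, by = 2)      # 2 4 6 8
idx_times <- rep(1:3, c(4, 2, 6))   # 1 1 1 1 2 2 3 3 3 3 3 3
idx_each  <- rep(1:3, each = 4)     # 1 1 1 1 2 2 2 2 3 3 3 3
idx_cycle <- rep(1:3, 4)            # 1 2 3 1 2 3 1 2 3 1 2 3

letters[idx_seq]                    # "b" "d" "f" "h"
letters[-idx_seq][1:3]              # negative indices drop elements: "a" "c" "e"
```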

```
mat <- matrix(LETTERS[1:12], ncol = 3)
mat
mat[1:2, c(1, 3)]
mat[1:2, 3]                  # drops dimensions
mat[1:2, 3, drop = FALSE]    # prevent dimension drop
mat[-1, ]                    # all rows except the first
mat[rep(1:3, 4), ]           # rows may be repeated
mat[c(1, 2)]                 # single index: matrix treated as a vector (column-major)

# matrix of indices:
```
```
inds <- rbind(c(1, 2), c(2, 3), c(2, 1))
inds
mat
mat[inds]

rownames(mat) <- c('Peter','Paul','Mary', 'Jane')
colnames(mat) <- letters[1:3]
mat[c('Peter','Paul'), c('a','c')]

# combining: rbind cbind
# repeat: indexing
mat

df <- as.data.frame(mat)
df
class(df)
unclass(df)
df[,2]
df$b
df[2]
df[[2]]

df$x <- 1:nrow(df)

subset(df, x > 2)    # note that variables evaluated in df
```
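
The logical-vector method listed earlier is not demonstrated above. A short sketch, using a small made-up data frame `d` (hypothetical, not from the notes), compares it with `subset` and shows how the two differ when the condition evaluates to `NA`:

```r
# Hypothetical data frame with one missing value in x
d <- data.frame(name = c("Peter", "Paul", "Mary", "Jane"),
                x    = c(1, 4, NA, 3))

keep <- d$x > 2          # logical vector: FALSE TRUE NA TRUE
d[keep, ]                # an NA in the index yields a row of NAs
d[which(keep), ]         # which() keeps only the TRUE positions
subset(d, x > 2)         # subset() also drops rows where the condition is NA
```

When a column may contain missing values, `subset` or `which` is usually the safer choice for row selection.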