# Simple Data Entry in R

## Reading data

- data can come from many places:
  - objects created in R
  - data in packages, e.g. Smoking in p3d.beta, Prestige in car
  - data created in other programs/software:
    - Excel
    - SPSS

### Excel

- warning: pre-2007 versions of Excel had a maximum of 256 columns, so data sets with more than 256 variables get truncated
  - this was the main barrier to the use of Excel
  - with newer versions (up to 16,384 columns) it isn't a problem
- enter data in Excel so that the top row contains the variable names, with the rectangular data below
- save as a .csv file, for example 'filename.csv', then read it into R with

      dd <- read.csv('filename.csv')
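As a minimal sketch of the round trip, the example below writes a small data frame to a temporary .csv file and reads it back. The temporary file and the column names id and score are made up for illustration; in practice the file would be the one saved from Excel.

```r
# Hypothetical example data standing in for a sheet saved from Excel as .csv
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, score = c(2.5, NA, 4.1)), tmp, row.names = FALSE)

dd <- read.csv(tmp)   # the first row is used for the variable names
str(dd)               # check that each column has the expected type
head(dd)              # look at the first few rows
```

Checking the result with str() right after reading is a cheap way to catch columns that were read as text because of stray characters in the spreadsheet.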

## Review of linear model interface

Note:

- Different packages for fitting models use different interfaces.
- One of the best is the formula interface for lm.
- It has also been adopted in many other 'mature' packages and programs:
  - glm, lmer, lme, etc.
- New packages often have crude interfaces because a fancy interface is more difficult to develop, especially for programmers with less knowledge of R.

We review the interface, starting with the basic form:

    fit <- lm(formula, data)
    fit <- lm(formula, data, subset, na.action, ...)
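To make the basic form concrete, here is a small sketch using the mtcars data set that ships with R. The choice of data set and variables is an assumption made only for illustration.

```r
# mtcars comes with base R (datasets package)
fit <- lm(mpg ~ wt + hp, data = mtcars)
summary(fit)

# Same model on a subset of rows, dropping incomplete cases explicitly
fit4 <- lm(mpg ~ wt + hp, data = mtcars,
           subset = cyl == 4, na.action = na.omit)
coef(fit4)

# The same formula interface carries over to glm
gfit <- glm(am ~ wt, data = mtcars, family = binomial)
coef(gfit)
```

Note that the subset expression, like the formula, is evaluated using the variables in data, so cyl does not need to be written as mtcars$cyl.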

## Indexing and subsetting in R

- selecting rows or columns from a matrix or data frame:
  - index (row or column numbers)
  - logical vector (select if TRUE)
  - names: use column or row names
  - matrix of indices
  - subset function

Generating indices:

    1:23
    seq(2, 9, by = 2)
    rep(1:3, c(4, 2, 6))
    rep(1:3, each = 4)
    rep(1:3, 4)

    mat <- matrix(LETTERS[1:12], ncol = 3)
    mat
    mat[1:2, c(1, 3)]
    mat[1:2, 3]                 # drops dimensions
    mat[1:2, 3, drop = FALSE]   # prevent dimension drop
    mat[-1, ]
    mat[rep(1:3, 4), ]
    mat[c(1, 2)]                # insufficient indices with matrix?
    # matrix of indices:

    inds <- rbind(c(1, 2), c(2, 3), c(2, 1))
    inds
    mat
    mat[inds]
    rownames(mat) <- c('Peter', 'Paul', 'Mary', 'Jane')
    colnames(mat) <- letters[1:3]
    mat[c('Peter', 'Paul'), c('a', 'c')]
    # combining: rbind cbind
    # repeat: indexing mat as a data frame
    df <- as.data.frame(mat)
    df
    class(df)
    unclass(df)
    df[, 2]
    df$b
    df[2]
    df[[2]]
    df$x <- 1:nrow(df)
    subset(df, x > 2)   # note that variables are evaluated in df
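Selection by logical vector is listed above but not shown, so here is a minimal sketch of it next to the equivalent subset call. The data frame and its columns name and x are hypothetical, chosen only to keep the example self-contained.

```r
# Hypothetical data frame for illustration
df <- data.frame(name = c("Peter", "Paul", "Mary", "Jane"),
                 x = 1:4)

keep <- df$x > 2     # logical vector: TRUE for the rows to keep
keep
df[keep, ]           # rows where keep is TRUE
subset(df, x > 2)    # same rows; x is looked up inside df

df[df$x > 2, "name"]                 # a single column is returned as a vector
df[df$x > 2, "name", drop = FALSE]   # keep the result as a data frame
```

The bracket form needs df$x spelled out because the index is evaluated in the calling environment, whereas subset evaluates its condition inside the data frame.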