Simple Data Entry in R

Reading data

  • data can come from many places
    • objects created in R
    • data in packages, e.g. Smoking in p3d.beta, Prestige in car
    • data created in other programs/software:
      • Excel
      • SPSS

Excel

  • warning: versions of Excel before 2007 allowed at most 256 columns, so data sets with more than 256 variables were truncated
  • this was the main barrier to using Excel for data entry
  • with newer versions it isn't a problem
  • enter the data in Excel so that the top row contains the variable names, with the data in a rectangular block below
  • save as a .csv file, for example 'filename.csv', then read it into R with
 dd <- read.csv('filename.csv')
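
The round trip can be sketched without Excel at hand. In this minimal example, write.csv stands in for Excel's "Save as .csv" step; the data values, column names and temporary file are made up for illustration and are not part of the original notes.

```r
# Hypothetical data standing in for a sheet typed into Excel
sheet <- data.frame(id    = 1:3,
                    group = c("a", "b", "a"),
                    score = c(4.5, NA, 6.1))

# Stand-in for Excel's "Save as .csv": write to a temporary file
path <- tempfile(fileext = ".csv")
write.csv(sheet, path, row.names = FALSE)

# Read it back; the first row supplies the variable names
dd <- read.csv(path)
str(dd)    # check the type of each variable
dim(dd)    # number of rows and columns
```

Checking str(dd) right after reading is a cheap way to catch a numeric column that was read as text because of a stray character in one cell.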

Review of linear model interface

Note:

  • different packages for fitting models use different interfaces
  • One of the best is the formula interface for lm
  • Also adopted in many other 'mature' packages and programs:
    • glm, lmer, lme, etc.
  • New packages often have cruder interfaces, because a polished interface is harder to develop, especially for programmers with less knowledge of R
  • We review the interface:
  • Basic:
 fit <- lm(formula, data)
 # with optional arguments
 fit <- lm(formula, data,
           subset,
           na.action,
           ...)
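
A concrete call may make the template easier to read. This is only a sketch: it uses the mtcars data set that ships with R, and the choice of variables and of the subset condition is arbitrary and not taken from the notes.

```r
# mtcars is part of the datasets package that ships with R
fit <- lm(mpg ~ wt + hp,          # formula: response ~ predictors
          data = mtcars,          # variables are looked up here first
          subset = cyl > 4,       # condition evaluated inside data
          na.action = na.omit)    # drop rows with missing values

summary(fit)    # coefficient table and fit statistics
coef(fit)       # estimated coefficients only
```

Note that subset and na.action refer to variables by bare name, just as the formula does, because all three are evaluated in the data frame given as data.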

Indexing and subsetting in R

  • selecting rows or columns from a matrix or data frame
 # index (row or column numbers)
 # logical vector (select if TRUE)
 # names: use column or row names
 # matrix of indices
 # subset function
 
 
 # generating indices:
 1:23
 seq( 2,9, by = 2)
 rep( 1:3, c(4,2,6))
 rep( 1:3, each = 4)
 rep( 1:3, 4)
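
For reference, a short sketch of what the generators above return and how such a vector is then used as an index. The vector x is a made-up example.

```r
s  <- seq(2, 9, by = 2)       # 2 4 6 8
r1 <- rep(1:3, c(4, 2, 6))    # 1 1 1 1 2 2 3 3 3 3 3 3  (per-element counts)
r2 <- rep(1:3, each = 4)      # 1 1 1 1 2 2 2 2 3 3 3 3
r3 <- rep(1:3, 4)             # 1 2 3 1 2 3 1 2 3 1 2 3  (whole vector repeated)

# Using a generated index to pick elements of a vector
x <- letters[1:10]
picked <- x[s]                # "b" "d" "f" "h"
picked
```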


 mat <- matrix(LETTERS[1:12], ncol = 3)
 mat
 mat[ 1:2, c(1,3)]
 mat[ 1:2, 3]      # drops dimensions
 mat[ 1:2, 3, drop = FALSE]    # prevent dimension drop
 mat[ -1, ]
 mat[ rep(1:3,4),]
 mat[ c(1,2)]     # single index: matrix is treated as a vector in column-major order, gives "A" "B"
 
 
 # matrix of indices:
 inds <- rbind( c(1,2),c(2,3),c(2,1))
 inds
 mat
 mat[inds]
 
 rownames(mat) <- c('Peter','Paul','Mary', 'Jane')
 colnames(mat) <- letters[1:3]
 mat[c('Peter','Paul'), c('a','c')]
 
 # combining: rbind cbind
 # repeat: indexing
 mat
 
 df <- as.data.frame(mat)
 df
 class(df)
 unclass(df)
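 df[,2]      # column 2 itself
 df$b        # the same column, by name
 df[2]       # a data frame with one column
 df[[2]]     # column 2 itself, list-style extraction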
 
 df$x <- 1:nrow(df)
 
 subset(df, x > 2)    # note that variables evaluated in df
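
Selection by logical vector is listed above but not demonstrated. A minimal sketch follows, using a small made-up data frame so that it runs on its own; it also shows that subset() is a shorthand for the same row selection.

```r
# Self-contained toy data frame (hypothetical values)
df <- data.frame(a = c("A", "B", "C", "D"),
                 x = 1:4,
                 row.names = c("Peter", "Paul", "Mary", "Jane"))

keep <- df$x > 2                   # logical vector: TRUE for rows to keep
keep

by_logical <- df[keep, ]           # logical indexing, df named explicitly
by_subset  <- subset(df, x > 2)    # x is looked up inside df
all.equal(by_logical, by_subset)   # compare the two results

# subset() can also select columns by name
subset(df, x > 2, select = a)
```

The subset form is convenient interactively; the explicit df[keep, ] form is usually safer inside functions, where it is less obvious which x would be found.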