R: Local tutorial
Contributions to this tutorial are more than welcome.
Quick links
- Data manipulation and data mining: http://www.togaware.com/datamining/
Basic data types
mode(2)               # numeric
mode(pi)
mode(2+3i)            # complex
mode("the lazy dog")  # character
mode(FALSE)           # logical
Basic data structures
c(1,5,12)
c('a','large','boat')
c(FALSE,TRUE)
1:12   # sequence
seq( from = 1, to = 12, by = 1.5)
mat <- matrix( 1:12, ncol = 3)   # example of assignment; note: does not print result
mat                              # prints result
( mat2 <- matrix( 1:12, ncol = 3, byrow = T) )   # assignment and printing
rownames(mat) <- c('John','Qing','Tao','Ye')
mat
colnames(mat) <- LETTERS[1:3]
t(mat)
arr <- array( 1:24, dim = c(2,3,4))
arr
dimnames(arr)[[3]] <- letters[1:4]
arr
dimnames(arr)[c(1,2)] <- list( Row = c('row 1','row 2'), Column = paste( "Col", 1:3))
arr
names( dimnames( arr ) ) <- c("Row", "Column", "Panel")
dimnames(arr)
arr
aperm(arr, c(2,1,3))
- Elements can have different modes and different structures
list( 1, 'a', FALSE)
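As a small illustration of the point above, the following sketch builds a list whose elements differ in mode and in structure, then extracts them in several ways. The object name `lst` and its element names are made up for this example.

```r
# A list holding a numeric vector, a character string, and a matrix
lst <- list(nums = c(1, 5, 12), word = 'boat', m = matrix(1:4, ncol = 2))

sapply(lst, mode)   # mode of each element
lst[[1]]            # double brackets extract the element itself
lst$word            # elements can also be extracted by name
lst['m']            # single brackets return a list of length 1
```

Note the difference between `lst[[1]]`, which returns the vector, and `lst[1]`, which returns a list containing the vector.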
More advanced data types
- Objects can have 'post-it' notes attached to them. These are attributes. The most common attribute names the elements of an object.
x <- c( a = 11, b = 12, c = 13)
x
x <- 11:13
names(x) <- c('a','b','c')
x
mat <-
- How many occupations have education > 12?
- Let z <- list(a=1:4,b=c(10,13,34), c=4, d=c(NA,2))
- Find the maximum value in each element of the list
- Find the mean income for each type of occupation
- Write a program 'prime(n)' that will find all primes from 1 to n
- Plot means, standard deviations, and standard errors by group
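As a hint for two of the exercises above, here is one possible sketch; it is certainly not the only solution. It assumes `sapply` for the list exercise and a simple sieve for `prime(n)`. Note the use of `na.rm = TRUE`, without which the maximum of element `d` would be `NA`.

```r
z <- list(a = 1:4, b = c(10, 13, 34), c = 4, d = c(NA, 2))

# Maximum of each element; na.rm = TRUE ignores the missing value in 'd'
sapply(z, max, na.rm = TRUE)

# Sieve of Eratosthenes: returns all primes from 1 to n
prime <- function(n) {
    if (n < 2) return(integer(0))
    is.prime <- rep(TRUE, n)
    is.prime[1] <- FALSE                  # 1 is not a prime
    for (i in seq_len(floor(sqrt(n)))) {
        if (is.prime[i]) {
            # cross out the multiples of i, starting at i squared
            is.prime[seq(i * i, n, by = i)] <- FALSE
        }
    }
    which(is.prime)
}

prime(30)
```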
Even more advanced
In practice, most data generated by graduate students in Psychology are entered with Excel or with SPSS. This section describes how to convert Excel or SPSS files to R.
- Format the file so the top row contains variable names. It's a good idea to use "NA" (without quotes) as a missing value code. Obey R's rules for variable names: use only letters, numbers and periods, do not start a name with a number, and use no spaces (use '.' instead of a space). Avoid the octothorp (#) because it turns all that follows on the line into a comment.
- Save the file as a '.csv' (comma-separated values) file. Note that only the active worksheet will be saved. Be aware that versions of Excel before Excel 2007 allow no more than 256 columns in a worksheet, so a file saved from them can have no more than 256 columns. This is a significant limitation for some projects in which data is recorded for many subscales of psychological tests. Let's suppose that you save the file as 'c:\newdata.csv' in Windows.
- In R use the command
> newdata <- read.csv("c:/newdata.csv")
or, to pick the file from a dialog box instead of typing its path,
> newdata <- read.csv(file.choose())
(Note the use of a forward slash where the usual Windows syntax would use a backward slash.)
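To see how these conventions behave, the following sketch writes a tiny CSV file to a temporary location and reads it back. The file contents and variable names are invented for illustration. The argument `na.strings = "NA"` is already the default of `read.csv` and is spelled out only to make the missing value code explicit.

```r
# Write a small example file (top row = variable names, NA = missing)
tmp <- tempfile(fileext = ".csv")
writeLines(c("subject.id,test.score,group",
             "1,23,control",
             "2,NA,treatment",
             "3,31,control"), tmp)

newdata <- read.csv(tmp, na.strings = "NA")
str(newdata)                  # check the type of each variable
summary(newdata$test.score)   # the summary also counts the NA's
unlink(tmp)                   # remove the temporary file
```

Running `str` right after importing is a quick way to catch a numeric variable that was read as text because of a stray character in the spreadsheet.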
More up-to-date information is available at R:_Data_conversion_from_SPSS
SPSS files can be read directly with the 'read.spss' function in the 'foreign' package:
> library( foreign )
> newdata <- read.spss("c:/newdata.sav")
Files created with very recent versions of SPSS will produce a warning message but the problem seems innocuous. Missing data codes need to be processed further in the R file.
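As an illustration of that further processing, suppose some variables use a numeric code such as -99 for missing values. The data frame, the variable names and the code -99 below are all hypothetical; substitute the codes used in your own SPSS file.

```r
# Hypothetical data as it might look after import, with -99 meaning 'missing'
newdata <- data.frame(age = c(25, -99, 31), score = c(12, 15, -99))

# Replace the missing value code with NA in each affected variable
for (v in c("age", "score")) {
    # which() keeps this safe if the variable already contains NA's
    newdata[[v]][which(newdata[[v]] == -99)] <- NA
}

colSums(is.na(newdata))   # number of missing values per variable
```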
The plain use of the 'read.spss' command, above, produces a 'list' instead of a 'data.frame'. Also, value labels have extra spaces to stretch them to 256 characters. Generally it is better to use:
> library( foreign )
> newdata <- read.spss("c:/newdata.sav", trim.factor.names = TRUE, to.data.frame = TRUE)
The 'read.spss' function was written for older versions of SPSS and works best if variable names in the SPSS file have at most 8 characters. If your variable names are longer they will be turned into shorter but unique names. You can change the names in the R data.frame back to the original names if you wish.
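Restoring names is a matter of assigning to `names()`. In this sketch both the shortened names and the longer original names are invented for illustration; the actual shortened names depend on your file, so inspect `names(newdata)` first.

```r
# Hypothetical data frame whose names were shortened to 8 characters
newdata <- data.frame(selfeste = c(3, 4), selfest2 = c(2, 5))

names(newdata)   # look up the current names

# Replace selected names, matching on the shortened name
names(newdata)[names(newdata) == "selfeste"] <- "selfesteem.pre"
names(newdata)[names(newdata) == "selfest2"] <- "selfesteem.post"

names(newdata)
```

Matching on the old name rather than on a column position keeps the code correct if the order of the columns changes.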
For more information, see R:_Data_conversion_from_SPSS