Math achievement and socioeconomic status (SES) in a sample of 160 U.S. schools from the 1982 study “High School and Beyond”.
This is a classical data set from the field of education used to illustrate multilevel data and models. It is used in the first edition of Bryk and Raudenbush. hsfull is the complete data set with 160 high schools, hs is a random subset of 40 high schools, hs1 is a random subset of 80 schools and hs2 contains the complement of hs1. The two complementary subsets hs1 and hs2 can be used to illustrate split sample validation: develop a model on one half of the data and assess its performance on the other.
hsfull is a data frame with 7185 observations (students) from 160 schools on the following 9 variables. hs, hs1 and hs2 are subsets consisting of 40 randomly selected schools, 80 randomly selected schools, and the remaining 80 schools, respectively.
- school id
- measure of math achievement
- socio-economic status of family
- a factor with levels Female Male
- a factor with levels No Yes
- the size of the school
- a factor with levels Catholic Public
- a measure of the priority given by the school to academic subjects
- a measure of the disciplinary climate in the school
Each row consists of the data for one student. hsfull is the complete data set. hs1 and hs2 are complementary split halves of the schools in the data. hs is a selection of 40 schools, which seems to be a convenient number of clusters for classroom presentations.
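The split into complementary halves is made at the school level, not the student level, so that all students of a given school land in the same half. The following is a minimal sketch of that idea in Python, not the procedure actually used to build hs1 and hs2. The helper names split_schools and rows_for, the column names "school" and "score", and the synthetic rows are all hypothetical stand-ins for illustration.

```python
import random


def split_schools(school_ids, seed=0):
    """Split the distinct school ids into two complementary random halves."""
    schools = sorted(set(school_ids))   # one entry per school, stable order
    rng = random.Random(seed)           # seeded so the split is reproducible
    rng.shuffle(schools)
    half = len(schools) // 2
    return set(schools[:half]), set(schools[half:])


def rows_for(rows, schools, key="school"):
    """Keep the student rows whose school belongs to the given set."""
    return [r for r in rows if r[key] in schools]


# Synthetic stand-in data (NOT the real data set): 160 schools, 3 students each.
rows = [
    {"school": s, "score": float(s % 7 + i)}
    for s in range(1, 161)
    for i in range(3)
]

dev_ids, val_ids = split_schools(r["school"] for r in rows)
dev_rows = rows_for(rows, dev_ids)   # half used to develop a model
val_rows = rows_for(rows, val_ids)   # half used to assess its performance
```

Splitting by school rather than by row keeps each cluster intact, so a model developed on one half is assessed on schools it has never seen.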
Source and Reference
Raudenbush, Stephen and Bryk, Anthony (2002), Hierarchical Linear Models: Applications and Data Analysis Methods, Sage (chapter 4).