SCS 2014: Visualizing Regression
At their best, graphics are instruments for reasoning. – Edward Tufte
This is the home page for the SCS short course on Visualizing Regression, offered on Tuesday evenings from 6 pm to 9 pm, November 4 to November 25, 2014. The longer title is: The Concepts Behind Regression: A Visual Approach to Learning Almost All About Regression.
The workshop consists of a series of lectures and graphical demonstrations that illustrate a variety of paradoxes, surprises and incongruities that arise when regression is applied to research problems. Regression on just two predictors is enough to run into traps that even the 'experts' frequently fall into. A recent paper that discusses the geometry behind many of the visualizations in this workshop is:
- Friendly, Monette and Fox (2013) "Elliptical Insights: Understanding Statistical Methods through Elliptical Geometry".
Don't be daunted by the fact that the paper goes into the math behind the pictures. Our workshop will focus on using the pictures to visualize statistical concepts. You won't need the math unless you're interested in it for its own sake.
I will post PDF files, R scripts and screen-capture videos on this page as the workshop progresses.
If you wish to review the material covered in an introductory regression course, consider reading:
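To make the two-predictor claim concrete, here is a minimal sketch of one such trap. It is written in plain Python, although the course's own scripts are in R. The data are tiny and synthetic, built by hand so that y = -x1 + 2·x2 exactly, with x1 and x2 positively correlated (r = 0.8). The helper names (`simple_slope`, `two_predictor_slopes`) are hypothetical. Regressing y on x1 alone gives a positive slope, but once x2 is added the coefficient of x1 becomes its true partial value of -1.

```python
# Sketch: the coefficient of x1 can change sign when a correlated predictor x2
# is added. Toy data are synthetic and hypothetical, built so y = -x1 + 2*x2.


def mean(v):
    return sum(v) / len(v)


def centered(v):
    m = mean(v)
    return [vi - m for vi in v]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def simple_slope(x, y):
    """Slope from regressing y on x alone: S_xy / S_xx."""
    xc, yc = centered(x), centered(y)
    return dot(xc, yc) / dot(xc, xc)


def two_predictor_slopes(x1, x2, y):
    """Partial slopes (b1, b2) from the 2x2 normal equations on centered data."""
    a, b, c = centered(x1), centered(x2), centered(y)
    s11, s12, s22 = dot(a, a), dot(a, b), dot(b, b)
    s1y, s2y = dot(a, c), dot(b, c)
    det = s11 * s22 - s12 ** 2  # zero only if x1 and x2 are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]                       # positively correlated with x1
y = [-a + 2 * b for a, b in zip(x1, x2)]  # true partial effect of x1 is -1

print(simple_slope(x1, y))              # 0.6  (marginal slope is positive)
print(two_predictor_slopes(x1, x2, y))  # (-1.0, 2.0)  (partial slope is negative)
```

Centering first keeps the algebra to a 2 × 2 system, so the intercept never has to be carried along.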
- John Fox (2008) "Applied Regression Analysis and Generalized Linear Models", Second Edition, and
- John Fox and Sanford Weisberg (2011) "An R Companion to Applied Regression", Second Edition
Other books and materials discussed during the course:
- Richard J. Murnane and John B. Willett (2010) Methods Matter: Improving Causal Inference in Educational and Social Science Research, Oxford University Press, NY
- An excellent introduction to causal inference in the familiar context of educational research, but applicable to other fields as well.
- Frank E. Harrell (2002) Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis, Springer Series in Statistics
- Some online tutorials for R:
Approaches to Regression
- Using mathematical formulas
- Using matrices and linear algebra
- where Y is a vector ...
- Data: scatterplots in 'data' space.
- Computing: "How to": Commands to run regressions
- Geometry: "Variable Space" in which each variable is represented as a vector in n-dimensional space, one dimension per observation
- Beta Space: The space of coefficients and their estimates
- Interpretation: What do a regression and its coefficients mean in a real application?
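The formula, matrix and geometric approaches listed above all describe the same fitted line. As a sketch (plain Python, with hypothetical toy data and helper names, not taken from the course scripts), this block fits y = b0 + b1·x once with the textbook formulas b1 = Sxy/Sxx and b0 = ȳ - b1·x̄, and once by solving the normal equations (X'X)b = X'y, where X has the columns 1 and x. It then checks the variable-space picture: the residual vector is orthogonal to both the constant vector and x.

```python
# Sketch: one least-squares line computed "by formula" and "by matrices",
# then checked geometrically. Toy data and function names are hypothetical.


def formula_fit(x, y):
    """b1 = S_xy / S_xx and b0 = ybar - b1 * xbar."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def solve(a, rhs):
    """Solve the square system a z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [list(row) + [v] for row, v in zip(a, rhs)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def matrix_fit(x, y):
    """Solve the normal equations (X'X) b = X'y, where X has columns 1 and x."""
    X = [[1.0, xi] for xi in x]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = formula_fit(x, y)
print(round(b0, 6), round(b1, 6))               # 2.2 0.6
print([round(b, 6) for b in matrix_fit(x, y)])  # [2.2, 0.6]

# Geometry: in n-dimensional variable space the residual vector e is
# orthogonal to both columns of X (the constant vector 1 and x).
e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(abs(sum(e)) < 1e-9, abs(sum(ei * xi for ei, xi in zip(e, x))) < 1e-9)  # True True
```

The matrix route generalizes unchanged to more predictors by adding columns to X. The formula route does not, which is one reason the geometric view becomes more useful as models grow.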
What insights can we get from visualizing regression
Here's a sample of questions that may not have obvious answers when regression is approached through traditional formulas or matrix algebra.
- Fundamental 2 x 2 Table of Statistics: original with notes
- Smoking data in .csv format
- R script for smoking data
- Visualizing Simple Regression: original with notes
- Screen capture video
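One thing that pictures of simple regression make vivid is that the regression of y on x and the regression of x on y are different lines whenever |r| < 1. This sketch uses plain Python with hypothetical toy data and helper names, not the course's smoking data or scripts. The y-on-x slope is Sxy/Sxx = r·sy/sx. The x-on-y line, drawn in the same y-versus-x plot, has slope Syy/Sxy = sy/(r·sx). Both lines pass through the point of means, and the ratio of the two slopes is r².

```python
import math

# Sketch: y-on-x vs x-on-y regression lines for the same scatterplot.
# Toy data and function names are hypothetical, not from the course files.


def sums_of_squares(x, y):
    """Return S_xx, S_yy and S_xy about the means."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxx, syy, sxy


def two_regression_lines(x, y):
    """Return r, the slope of y on x, and the slope of the x-on-y line drawn in the y-vs-x plot."""
    sxx, syy, sxy = sums_of_squares(x, y)
    r = sxy / math.sqrt(sxx * syy)
    slope_y_on_x = sxy / sxx  # = r * sy / sx
    # x = c + (sxy / syy) * y, solved for y; equals sy / (r * sx), undefined if sxy == 0
    slope_x_on_y = syy / sxy
    return r, slope_y_on_x, slope_x_on_y


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r, b_yx, b_xy = two_regression_lines(x, y)
print(round(r, 4), b_yx, b_xy)                  # 0.7746 0.6 1.0
print(round(b_yx / b_xy, 6), round(r ** 2, 6))  # 0.6 0.6
```

Because the slope ratio is r², the y-on-x line is always the flatter of the two when |r| < 1. This is the visual signature of regression toward the mean.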
The Dylan effect (or paradox)
- Thanks to Bryn Greer-Wootten for hunting down the relevant lyrics from Bob Dylan's "Ballad of a Thin Man":
Because something is happening here
But you don’t know what it is
Do you, Mister Jones?
Another possible source for the phenomenon's name is Buffalo Springfield's "For What It's Worth":
There's something happening here
What it is ain't exactly clear ...
We were able to rescue the screen capture videos in two parts, before and after the projectors went off: