MATH 6627 2010-11 Practicum in Statistical Consulting/Students/Constance Mara
I am a PhD student in psychology in the Quantitative Methodology area. I received my MA from York in Quant Methods, and my BSc in psychology from Trent University. While my own research focuses on Quantitative Methods, I frequently collaborate with researchers whose interests are broadly encompassed within the areas of social and personality psychology. In addition to this experience with applied statistics within the behavioural sciences, I have taken many applied statistics courses, but I have no background in mathematics. In fact, my last math course was in high school! I think this is common among many of the statisticians produced by psychology - e.g., Jacob Cohen!
I have been attending SCS meetings and seminars since 2008, and this year was hired as one of their statistical consultants. I have been a TA for SPIDA as well as several of the short courses offered by SCS. I have some experience working with SAS, R, LISREL, SPSS, AMOS, STATISTICA and a very tiny bit of exposure to MATLAB.
Sample Exam Questions
Q: A client comes to a consulting session with a study looking at depression as an outcome. The depression measure is continuous, but the hypothesis that there was a difference between 2 groups on depression didn't pan out, because the t-test was not significant. Their supervisor has instructed them to score the depression items such that they have 3 levels - not depressed, somewhat depressed, depressed. The supervisor suggests that this method of scoring may eliminate some of the white noise in the scale. What would you say to the client?
A: There are a number of ways you could answer this question. I would personally tell the client that the "white noise" they are being told to remove is the meaningful variability of the scale. If you categorize a continuous variable, you discard information and statistical power, making it even less likely that you will find an effect.
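To make the point concrete, here is a small simulation sketch (in Python for illustration; the numbers are made up: 50 per group, a true difference of half a standard deviation, and cut points at ±0.5 for the three levels). It compares how often a t-test on the raw scores detects the effect versus a chi-square test on the trichotomized version.

```python
# Sketch: categorizing a continuous outcome costs power.
# Assumed (hypothetical) setup: n = 50 per group, true mean difference of
# 0.5 SD, cut points at -0.5 and +0.5 defining the 3 depression levels.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps, delta = 50, 500, 0.5
hits_cont = hits_cat = 0
for _ in range(reps):
    g1 = rng.normal(0.0, 1.0, n)
    g2 = rng.normal(delta, 1.0, n)
    # t-test on the raw continuous scores
    if stats.ttest_ind(g1, g2).pvalue < 0.05:
        hits_cont += 1
    # chi-square test after cutting into 3 levels (0/1/2 = not/somewhat/depressed)
    cut = lambda x: np.digitize(x, [-0.5, 0.5])
    table = np.array([np.bincount(cut(g1), minlength=3),
                      np.bincount(cut(g2), minlength=3)])
    if stats.chi2_contingency(table)[1] < 0.05:
        hits_cat += 1

print(f"power, continuous t-test:  {hits_cont / reps:.2f}")
print(f"power, 3-level chi-square: {hits_cat / reps:.2f}")
```

On runs like this, the continuous analysis detects the effect in a clearly larger share of replications than the categorized one.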
Q: A client comes to a consulting session with data on 15 pairs. They want to know what kind of statistics would be appropriate to run on such a small sample.
A: With only 15 pairs, no analysis will have very much power, so my best advice would be to collect a larger sample!
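For a sense of scale, here is a quick power sketch (Python/scipy, purely illustrative numbers): a paired t-test on 15 pairs at a "medium" effect of d = 0.5 with two-sided alpha = .05, computed from the noncentral t distribution.

```python
# Back-of-envelope power for a paired t-test with 15 pairs.
# Assumed (illustrative) values: Cohen's d = 0.5, two-sided alpha = .05.
from scipy import stats

n, d, alpha = 15, 0.5, 0.05
df = n - 1
ncp = d * n ** 0.5                        # noncentrality parameter
t_crit = stats.t.ppf(1 - alpha / 2, df)   # two-sided critical value
power = (1 - stats.nct.cdf(t_crit, df, ncp)) + stats.nct.cdf(-t_crit, df, ncp)
print(f"power with {n} pairs at d = {d}: {power:.2f}")
```

The result lands well below the conventional .80 target, which is the quantitative version of "get a larger N".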
Q: What is the difference between simple regression and multiple regression?
A: In simple regression, we are regressing an outcome onto a single predictor and can interpret the coefficient as the effect of the predictor on the outcome. In multiple regression, we are regressing an outcome variable on two or more predictors and the coefficients are interpreted as the effect of a particular predictor on the outcome controlling for or holding constant the effect of the other predictor(s).
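A toy numeric illustration of the "controlling for" part (Python, with made-up data and coefficients): when a second, correlated predictor enters the model, the coefficient for the first predictor changes because it is now the effect of x1 holding x2 constant.

```python
# Simple vs. multiple regression on simulated data.
# Assumed (made-up) generating model: y = 1*x1 + 2*x2 + noise, with x2
# correlated with x1.
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(scale=0.5, size=n)   # x2 correlated with x1
y = 1.0 * x1 + 2.0 * x2 + rng.normal(size=n)

X_simple = np.column_stack([np.ones(n), x1])        # intercept + x1
X_mult = np.column_stack([np.ones(n), x1, x2])      # intercept + x1 + x2
b_simple = np.linalg.lstsq(X_simple, y, rcond=None)[0]
b_mult = np.linalg.lstsq(X_mult, y, rcond=None)[0]
print(f"simple regression slope for x1:   {b_simple[1]:.2f}")  # absorbs x2's effect
print(f"multiple regression slope for x1: {b_mult[1]:.2f}")    # x1's effect holding x2 constant
```

The simple-regression slope is noticeably inflated because x1 is serving as a proxy for the omitted x2; the multiple-regression slope recovers something near the generating value of 1.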
Q: Using the "vcd" package in R, produce a plot with the "UCBAdmissions" dataset to demonstrate Simpson's paradox.
A: An example of this can be seen on the Diaconis page under Assignment 1.
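The R/vcd plot itself is on the Diaconis page; as a purely numeric sketch (here in Python), the UCBAdmissions counts, the same data shipped with R, already show the reversal: men look favoured in aggregate, yet within most departments women are admitted at an equal or higher rate.

```python
# Simpson's paradox in the Berkeley admissions data (the counts behind
# R's UCBAdmissions dataset).
# dept: (men admitted, men applied, women admitted, women applied)
counts = {
    "A": (512, 825, 89, 108), "B": (353, 560, 17, 25),
    "C": (120, 325, 202, 593), "D": (138, 417, 131, 375),
    "E": (53, 191, 94, 393),  "F": (22, 373, 24, 341),
}
m_adm = sum(v[0] for v in counts.values()); m_app = sum(v[1] for v in counts.values())
f_adm = sum(v[2] for v in counts.values()); f_app = sum(v[3] for v in counts.values())
print(f"overall: men {m_adm / m_app:.1%}, women {f_adm / f_app:.1%}")
for dept, (ma, mn, fa, fn) in counts.items():
    print(f"dept {dept}: men {ma / mn:.1%}, women {fa / fn:.1%}")
# The aggregate gap arises because women applied disproportionately to the
# more selective departments, not because departments favoured men.
```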
Q: What are the differences between Lord's Paradox, Simpson's Paradox and Suppression Effects? What are the similarities?
A: Similarities - they all involve situations where a third variable somehow influences the apparent effect of X on Y. In Simpson's paradox, the direction of the relation between X and Y changes when the third variable is included. In a suppression effect, the relation between X and Y is masked until the third variable (the suppressor) is included in the model. Lord's paradox is closely related to Simpson's, but the nature of the variables involved differs: it arises with a continuous covariate (e.g., a baseline score), where the conclusion about a group effect can reverse depending on how that covariate is handled.
Q: What is the difference between hierarchical models and a hierarchical model?
A: Hierarchical models are a sequence of nested models that include the same variables but with different parameters freed or fixed (e.g., a mediation model where the direct path is included in one model and then excluded in a second). A hierarchical model is a single model that contains several levels of variables (e.g., students nested within classes).
Q: Why do we need to use multi-level models for longitudinal data? Can't we simply use the classic repeated measures procedures?
A: There are many advantages to using multi-level models over the classic repeated measures procedures; however, the simplest reason is that repeated measures procedures can't handle unbalanced data. Longitudinal data are often quite messy, and multi-level models provide efficient and honest ways of dealing with unbalanced designs.
Q: You run a regression model with 2 predictors (x1 and x2) and an interaction between them. The coefficient for x1 is not significant, and the interaction is not significant. Can you drop both of these terms from the model?
A: No, not both at once. You should drop the non-significant interaction and re-run the model with just x1 and x2; the coefficient for x1 could change drastically once the interaction is removed. (Remember that with the interaction in the model, the coefficient for x1 is its effect when x2 = 0, so its non-significance there doesn't by itself justify removing it.)
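A quick simulated check (Python, made-up coefficients) of why the x1 coefficient can shift so much: in the model with the interaction, x1's coefficient is its effect at x2 = 0, which can be far from its average effect when x2 is centred well away from zero.

```python
# x1's coefficient with vs. without the interaction term.
# Assumed (made-up) generating model: y = 0*x1 + 1*x2 + 0.8*x1*x2 + noise,
# with x2 centred at 3 so "the effect of x1 at x2 = 0" is an extrapolation.
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(loc=3.0, size=n)          # x2 far from zero on purpose
y = 1.0 * x2 + 0.8 * x1 * x2 + rng.normal(size=n)

X_int = np.column_stack([np.ones(n), x1, x2, x1 * x2])
X_main = np.column_stack([np.ones(n), x1, x2])
b_int = np.linalg.lstsq(X_int, y, rcond=None)[0]
b_main = np.linalg.lstsq(X_main, y, rcond=None)[0]
print(f"x1 slope with interaction in model: {b_int[1]:.2f}")   # effect at x2 = 0
print(f"x1 slope, interaction dropped:      {b_main[1]:.2f}")  # averages over x2
```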
Q: What is the difference between the "R-side" of the model in a regular multi-level model versus a multi-level model for longitudinal data?
A: The R-side in a regular multi-level model is composed of sigma-squared times the identity matrix. For longitudinal data, the R-side contains sigma-squared times a correlation matrix that specifies how the occasions are related.
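As a sketch of the two structures (Python/numpy; the values of sigma-squared and rho are made up, and AR(1) is just one common choice of correlation structure for equally spaced occasions):

```python
# R-side covariance structures for 4 occasions.
# Assumed (illustrative) values: sigma^2 = 2, AR(1) correlation rho = 0.6.
import numpy as np

sigma2, rho, t = 2.0, 0.6, 4
R_regular = sigma2 * np.eye(t)                 # sigma^2 * I: independent occasions
lags = np.abs(np.subtract.outer(np.arange(t), np.arange(t)))
R_longitudinal = sigma2 * rho ** lags          # corr(e_i, e_j) = rho^|i - j|
print(R_regular)
print(R_longitudinal.round(2))
```

In the AR(1) version, occasions closer together in time are more strongly correlated, which is exactly the kind of dependence the identity matrix cannot represent.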
Q: What is the difference between a fixed effect and a random effect?
A: A random effect is an effect where only a sample of all possible levels/options from the population is used (e.g., ages 10-14). A fixed effect is an effect where all possible levels/options from the population have been accounted for (e.g., gender - male and female).
Statistics in the Media, Paradoxes and Fallacies, Consulting Reflections
Statistics in the News:
Recently, there has been a lot of interest in a set of experiments that supposedly lend strong evidence to the existence of ESP (extra-sensory perception). This is being called the "psi effect". The researcher, a professor emeritus from Cornell University, used well-established experimental designs from memory research, but reversed the chronological order (9 experiments in total). This research is to be published in one of the top-tier journals in psychology: Journal of Personality and Social Psychology (JPSP). I am posting links to 3 public presentations of the topic, as well as a link to the original paper.
- Pop Psychology Magazine, "Psychology Today" - includes a discussion of the effect sizes in the study
A couple things I noticed:
1) I am not sure I agree with the second article's statement "...small effect sizes are not that uncommon in psychology (and other sciences)...And as Cohen has pointed out, such small effect sizes are most likely to occur in the early stages of exploring a topic, when scientists are just starting to discover why the effect occurs and when it is most likely to occur." In the SCS meeting this week, we discussed an article released in the New Yorker, suggesting that the effect sizes in initial studies are quite large, and decrease over subsequent replications of the study. This seems to contradict the statement from Psychology Today.
2) In the actual article, the p-values are just significant (.01 - .03) and he used a one-tailed test for everything. Effect sizes are based on Cohen's d, and are all around .20 (considered a small effect).
Specifically, in experiments 3 and 4, you'll notice that there is a comparison between forward priming (a typical memory study) and retroactive priming (evidence for the psi effect). The forward priming tasks obtain p-values less than .001, with effect sizes around .40 to .45. In contrast, the retroactive priming tasks have p-values around .01 to .03 and effect sizes around .20.
The "better than chance" effects are just over the 50% mark, all under 55%. I guess one has to wonder whether these are true effects, or statistical "tweaking" to achieve something publishable. I am also wondering why a journal as prestigious as JPSP would be willing to publish a study with such small effects? Is it a publicity stunt or are they trying to force researchers to start thinking "outside the box"?
Comment: Thanks for bringing this one up Constance, it is an intriguing set of studies to say the least! I think it is interesting that the author chose to include the Stouffer's z value so prominently in the abstract. Of course this makes sense, as Stouffer's z=6.66, p=1.34×10^-11, which sounds far more impressive than the p-values for individual experiments that you mentioned above. Stouffer's z, which is a meta-analysis probability pooling technique, has been criticized as leading to vague conclusions and "that it is fatuous to claim that one is testing an average effect size in any important sense" (Darlington & Hayes, 2000). There is also the File Drawer Problem: experiments that yielded non-significant findings might languish in a drawer, never get published, and thus never be included in the meta-analysis. Bem confesses on page 47 that they conducted three such non-significant studies that were not included in the paper (for good reason, because they were flawed of course!), but does not supply a Stouffer's z value which includes these three additional studies!
At any rate, I should think that if he is using one-tailed tests he needs to provide strong evidence for the direction of the effect, which he doesn't appear to do. Yet in some experiments Bem considers scores above 50% as evidence of psi, and in other experiments scores below 50% as evidence of psi. I think his approach would hold if the directionality is always as predicted by the regular forward version of the tests, but in skimming the article I couldn't find references to support his 'post-hoc' hypothesized direction of effect. For example, in experiment 6, for negatively valenced pairs the target pictures were preferred more frequently (51.8%), and for erotic image pairs the target was preferred less frequently (48.2%). He never says (or cites) whether, in forward tests, the directionality for negative/erotic stimuli preference is in accordance with previous research. Then, of course, he takes the difference (yay, 3.76% of psi!) - Carrie
This article discusses research showing that university-educated immigrants are paid more in the US than in Canada. The researchers compare Canadian-born, Canadian-educated individuals to recent immigrants who were university-educated abroad, and make the same comparison within the US. They report, for example, that Canadian-born males earn 50% more than male immigrants to Canada, whereas in the US the pay gap between US-born males and immigrants is only 30%.
They acknowledge different rates of immigration, so I would have to assume they have controlled for this factor. I can't help thinking that there is something else that might explain the differential pay rates in Canada versus the US. There are so many factors that would play into this situation (e.g., job availability, language, type of university education, etc.), but they don't mention specifics about what else was measured and controlled for in the study. I think this demonstrates the problem with "pop" presentations of research and statistics. It doesn't even give enough information to allow people to think critically!
This article suggests that the more frequent use of lighter, low-top basketball shoes (rather than heavier "high-tops") has led to an increase in ankle injuries in professional basketball. The article does, however, suggest alternate explanations of causality and confounding/mediating variables (e.g., faster game, taller players, more games per season). Indeed, the article mentions previous studies showing that the choice of high- or low-top sneakers does not increase the risk of ankle injuries. So the experimental or quasi-experimental data suggest that sneakers alone don't increase the risk of ankle injuries, while the anecdotal and observational data presented in the article suggest that low-top sneakers do increase the frequency of ankle and foot injuries among players. They also provide a "why" for this supposed causal link: the low-top sneakers provide less support to the foot and ankle. I am not sure why they ignore the evidence from experimental studies in favour of the observational data in this article!!
Counterfeit money in Canada is at the lowest rates since the 1990's. They claim it is because Canadian money is hard to duplicate.
Looking at the graph they provide, however, the more interesting question for me is why counterfeit money circulation was so high during 2002-2006??
If the reason it is so low now is because of better security features on the money, why was it so low in the 1990's? And what happened during the early 2000's?
In honour of Valentine's day - an unromantic view of love:
- Copy of the original study: File:Marital Wealth.pdf
Consulting Reflection: Lately, I have been working with people on Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis (CFA). This seems to be a troublesome area for researchers (even if they don't know it) and consultants. The most common EFAs and CFAs analyze item-level data. There are a whole host of issues with this approach to item-level data, but there are also many alternatives. As a consultant, I often find it difficult to juggle explaining the proper way to handle the data, the researcher's capabilities and knowledge (most often the problem is that they are unfamiliar with the software that can properly deal with item-level EFA/CFA), and the preferences of the researcher's supervisor about how things are done (also influenced by the literature in their area and what is getting published). I also find that these student researchers are told to do measurement models with their item-level data, but then the scale is simply (and regularly) summed and used as an observed variable in a simple path analysis or multiple regression. Why are they trying to complicate the model??
A good example of observational data being used for prediction:
A really thoughtful article on the "interaction" between ethics (medical ethics in particular) and statistics. Focuses on the misuse of statistics - not using them properly, or not distinguishing when it is appropriate to use them and when it is not!
Daydreaming - even if you are thinking about something pleasurable - provides less satisfaction than focusing on the present and being in the moment. You are better off focusing on what you are doing in terms of happiness. However, moderate daydreaming during a task, while being aware of your wandering thoughts can promote creativity and productivity.
The researchers discovered a negative relationship between happiness and daydreaming by collecting information on the moods and activities of 2,250 adults via an application on their iPhones. When their phones beeped, they had to describe how they were feeling (on a scale of 1 to 100), what they were doing, and what they were thinking about.
Does anyone see any potential problems with the data collection and research methods used? What about potential confounds? This article implies a causal relationship; however, if all they collected is information about activities, feelings, and thoughts, I can only surmise that observational-type data were collected, which would allow only correlational conclusions. Perhaps the task at hand caused the unhappiness, and daydreaming was then used as a coping mechanism in order to increase pleasurable feelings. If you are interested in something and happy to be doing it, you obviously won't be daydreaming!
A really great popular article on statistics in the news. It covers a lot of the things we talked about in class. For instance, how it is difficult to account for extraneous variables with non-experimental designs, and how this can be misleading when statistics are applied to industry or sports data.
This article talks about the predictability of the earthquake/tsunami magnitude and impact in Japan as it relates to implementing precautionary measures at Fukushima's nuclear power plant. Two questions were of concern (prior to the recent disaster): 1) What was the chance an earthquake-generated wave would hit Fukushima? 2) More pressing, what were the odds it would be larger than the roughly six-metre wall of water the plant had been designed to handle?
Also to consider: there have been only 4 earthquakes in Japan with magnitude 8 or higher in the last 400 years. A research paper from 2007 concluded that there was a 10% chance, but the Fukushima plant did not implement additional safety measures to guard against a large-scale tsunami after this study. Assuming this study is correct in its estimates, was the nuclear plant in the wrong not to implement more rigorous safety measures against the slight chance that a massive-scale tsunami would hit? Can we really hold the individuals in charge of making decisions about safety measures somewhat responsible for the situation they find themselves in? I hardly think so. When the odds of something happening are so very slim, and the predictability of natural disasters is precarious at best, an a priori decision to strengthen Fukushima's defenses against highly unlikely massive-scale disasters seems quite unreasonable to expect.
Questions and Comments
In regards to Bin's question:
I think you make a great point. The researchers in the 50's didn't think that cigarettes were harmful, and if they had obtained the data we were presented in the lecture this week, would they have thought of other factors that might be confounds or mediators, or even considered reversing the cause/effect? I think the answer to your question puts the onus on the researcher to ensure they have considered all possible explanations and all possible results, and to ensure they have controlled for all relevant factors. I think as a researcher, visually exploring the data is important, otherwise you might never understand the pattern of results. Numbers can be informative, but sometimes they don't give any meaning to the effect of interest. You need to see it!
I think the exercises we worked through this week (i.e., identifying the type of data, the type of inferences you are trying to make, alternate explanations of results, etc.) were fantastic and a great way to get back to basics and really think about what kind of information your particular sample and your research design can really give you (or your client). So often when people are conducting research and formulating hypotheses, they forget to match the questions they want to answer to the types of statistics they will need to conduct, and then to ensure their research design will give them the data that can answer those questions! I have heard a common complaint in the halls of the psychology building: there is not enough work done prior to the data collection stage, unless you have to write a grant proposal! This is precisely the kind of forethought that is needed to do great research.
This week, I really had some good insight into how teaching statistics from a particular point of reference (psychology research) really can narrow your perspective on things! Prof. Monette was discussing Scheffe and Bonferroni confidence ellipses, and while I knew Scheffe and Bonferroni, I had never been exposed to confidence ellipses. In psychology, we learn about Scheffe and Bonferroni adjustments in an ANOVA framework in terms of pairwise follow-ups tests to the omnibus F-test. Simply put, we would apply an adjustment for multiplicity control for conducting k-pairwise tests. More specifically, if we were going to conduct 3 pairwise comparisons, we would adjust our alpha value accordingly. So for an alpha (Type I error rate) of .05, the Bonferroni adjustment for 3 tests would be .05/3 = ~.017. Therefore each follow-up test would be tested at the adjusted alpha level of .017, maintaining the overall familywise error rate at .05.
It makes sense that this would apply to confidence regions and ellipses as well, given that the confidence level = 1 - alpha. If alpha is reduced for each test, then the corresponding confidence level rises and each region widens. Similarly, I had never been exposed to these adjustments outside an ANOVA framework, but given that ANOVA is a special case of the GLM, it is not a stretch to discuss these adjustments within a regression framework.
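The arithmetic from the paragraphs above, spelled out (Python, same numbers as in the example: alpha = .05 and k = 3 pairwise tests):

```python
# Bonferroni adjustment for k pairwise comparisons, and the matching
# per-test confidence level (wider intervals / larger ellipses).
alpha, k = 0.05, 3
alpha_adj = alpha / k                 # per-test alpha
conf_level = 1 - alpha_adj            # per-test confidence level
print(f"per-test alpha: {alpha_adj:.4f}")             # .05/3 = ~.017
print(f"per-test confidence level: {conf_level:.1%}")
```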
Just a good reminder to broaden your understanding. Things that appear new at first glance are probably just an extension or different perspective of something you've seen before. A new way at looking at the same problems!
Comment on Jessica Li's page
Lately, I have been wondering what the differences are between Simpson's Paradox and Lord's Paradox. I googled my question earlier this week and found the superficial answer that Lord's, Simpson's, and suppression effects are all the same basic thing. So I am really pleased that Prof. Monette discussed these topics and pointed out that they aren't truly all the same thing, although it might seem so in a general sense. They all have distinct characteristics that make them distinguishable. I used this topic as my sample exam question for this week as well, as I think it is important to understand the nuances of each of them, but also to understand why some people may group them as the same phenomenon.
A couple of typos I came across in the HLM notes we are working through:
* Slide #29: characterisitc = characteristic
* Slide #86: For school i = For school j
Comment on Andy Li's page
Question: So far we have seen groups with only two levels used in multi-level models (e.g., gender, sector - Catholic versus Public, etc.). How would you model a grouping variable with 3 levels? Would you have to create 2 dummy variables?
General Question: when will our consulting projects be due and what are we required to submit for our class marks? Also, what are the expectations with regard to what we submit to the client?
I am still stuck on my question for "Week 8". I also noticed that someone else has asked this question as well. What do you do when your grouping variable has more than 2 levels? I tried to create 3 dummy variables - group 1 vs. 2, 1 vs. 3, and 2 vs. 3, but got this error:
Error in MEEM(object, conLin, control$niterEM) : Singularity in backsolve at level 0, block 1
So then I tried just having one dummy at a time, but then I got this error:
Error in lme.formula(SA ~ STAX_1vs2 * Condn * time, dat, random = ~1 + : nlminb problem, convergence error code = 1 message = iteration limit reached without convergence (9)
If you define your variable as a factor, then R (and the lme function) will know what to do (mainly choosing a reference group for parameter estimates, etc.). How do you define a variable as a factor? dd$myvar <- as.factor(dd$myvar) So no need to dummy code! ~MRE
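One likely explanation for the "Singularity in backsolve" error above: three group dummies plus the intercept are linearly dependent, so the design matrix is not full rank. A numpy sketch of the rank problem (the R fix, as noted above, is simply to declare the variable a factor so R builds the k - 1 dummies itself):

```python
# Why intercept + one dummy per group makes a singular design matrix.
# Toy data: 6 observations in 3 groups.
import numpy as np

group = np.array([1, 1, 2, 2, 3, 3])
d1 = (group == 1).astype(float)   # indicator for group 1
d2 = (group == 2).astype(float)   # indicator for group 2
d3 = (group == 3).astype(float)   # indicator for group 3

X_bad = np.column_stack([np.ones(6), d1, d2, d3])  # d1 + d2 + d3 = intercept column
X_ok = np.column_stack([np.ones(6), d2, d3])       # k - 1 dummies, group 1 as reference
print("rank of intercept + 3 dummies:", np.linalg.matrix_rank(X_bad))  # 3 < 4 columns
print("rank of intercept + 2 dummies:", np.linalg.matrix_rank(X_ok))   # 3 = full rank
```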
Is anyone aware of packages that deal with generalized linear mixed models in R aside from "glmm", "pql" and "hglm"?