Every now and then when I post about the latest in mindfulness research, I wonder if I am still properly objective. As a fan and (sometimes) practitioner, I may be in the same position as many of my students, where I just want to believe the cool results and don’t keep an eye out for flaws and inconsistencies. The good news today is that I have determined that there are still some mindfulness studies that I will toss over my shoulder; the bad news, of course, is that there are such studies out there, and bad studies have the potential to discredit the very ideas they are trying to support.
The claims put forth in the summary of the article were intriguing:
…MBSR participants with higher levels of pretreatment mindfulness showed a larger increase in mindfulness, subjective well-being, empathy, and hope, and larger declines in perceived stress up to 1 year after treatment. (p. 267)
In other words, mindfulness training was particularly helpful for people who entered the 8-week program with more mindful tendencies. I wanted to read the article to find out if the research had suggested a particular level of mindfulness that would benefit, and what theories they had developed to explain why people might need to be mindful already to get the most out of an MBSR course. Instead, I found myself having to set aside all of the claims they made for lack of clear evidence.
The design of the study was great: 30 Californian college students were randomly assigned to either an 8-week MBSR course, or a “waitlist” (basically, they were interested in MBSR but didn’t get to actually do it). Before the MBSR course began, everyone took a battery of online surveys designed to measure their mindfulness, tendency to ruminate on things, perceived stress, sense of well-being, self-compassion, hope, empathy, and willingness to forgive others. When the MBSR course was over, everyone (including the waitlist, who hadn’t been doing anything special in the intervening weeks) took the surveys again, and then again 2 months later, and then again 10 months after that (12 months after the MBSR course ended). If I were doing my own study of mindfulness training, I would aspire to a design just like this.
Sadly, there was no follow-through in the results.
First, the authors checked the familiar claim that the MBSR group would be better off than the waitlist control on a variety of measures. They should be more mindful, of course, as a result of training; and based on assorted previous research, that mindfulness should have led to additional benefits such as less stress and more empathy. This was borne out in a series of analyses that claimed the MBSR participants showed bigger improvements in mindfulness, sense of well-being, and empathy from the start of the study to the 2-month follow-up. But the claims were only backed up by p-values (whether the difference is statistically significant) and effect sizes (a measure of how big that difference is), so I went to the actual scores for each group reported in a handy table. And that’s where I ran into trouble, because those means just didn’t seem to measure up to the claims.
The data in the table showed each group’s average score at each time point. To get a feel for how much change there had been, I wanted to see how scores at the follow-ups differed from what they were before anyone had even begun the MBSR course or waiting period. The measures all used different scales (some from 0 to 6, some from 1 to 8, and so on), so for ease of comparison I calculated each change as a percentage: what percentage of the pre-test score had been gained or lost at each follow-up time point. This is the same way that we can look at our salaries from one year to the next and calculate what kind of raise we got. The chart below shows those percentage changes, for only the measures where the authors reported some statistically significant differences; the small black stars indicate changes the authors identified as being statistically significant.
The improvements in mindfulness and well-being seem quite striking; the difference in the height of the bars (the degree of change) is quite compelling. MBSR did in fact seem to improve levels of mindfulness and well-being, and who wouldn’t want to take a course that would improve their sense of well-being some 25% for at least a year afterward?
Turn to empathy and hope, however, and confidence in the way the findings are being reported plummets. The 2-month improvement in empathy is supposed to be significant (p < .02, when < .05 is the usual psychology standard) with a reasonable effect size. But on average, the improvement is exactly the same as for the waitlist control group. The MBSR students improved from a 2.78 to a 2.82, while the waitlist group improved from a 2.75 to a 2.79. Not only does that difference seem to be ridiculously minuscule – the difference between rating yourself a 3 out of 5 and a 4 out of 5 on one single question, maybe – and well within any margin of error for how people change their answers one day to the next, but it seems exactly identical for both groups.
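For anyone who wants to check my arithmetic, here is a minimal Python sketch of the percentage-change calculation applied to the empathy means quoted above. The function name `percent_change` is my own; the four scores are the only numbers taken from the article.

```python
def percent_change(pre: float, post: float) -> float:
    """Change from pre to post, expressed as a percentage of the pre score."""
    return (post - pre) / pre * 100.0


# Empathy group means quoted in the text: (pre-test, 2-month follow-up).
empathy_means = {
    "MBSR": (2.78, 2.82),
    "waitlist": (2.75, 2.79),
}

for group, (pre, post) in empathy_means.items():
    points = post - pre
    print(f"{group}: {points:+.2f} points, {percent_change(pre, post):+.2f}%")
```

Both groups gain 0.04 points, which works out to roughly 1.4% of the pre-test score in each case.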
The only explanation that comes to mind is that these numbers are just averages of each group at different times, which don’t take into account how each individual student’s scores line up over time. To continue the salary analogy, it’s like figuring out the average raise at the company by comparing the average salary one year to the average salary the next year, when the more accurate way to calculate it would be to figure out what each person’s raise actually was, and then average those. That is presumably where the researchers got their statistics… but it’s hard to be sure, and hard to trust, when we can only look at the averages that they report. I would certainly not take such a tiny prospective increase in hope or empathy as a reason to take an MBSR course, any more than I would start eating broccoli to gain 1 IQ point. It may be statistically significant, but as a practical matter no one would notice.
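To make the salary analogy concrete, here is a small Python sketch with made-up scores for five people (the numbers are mine, not the study’s). One thing it shows is that when the same people are measured both times, the raw change in the group average and the average of the individual changes come out the same; what the person-by-person view adds is how consistent the change was, and that consistency is what a paired test leans on. The paired t statistic below is the standard textbook formula, not necessarily the analysis the authors ran.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical scores for the same five people, before and after (not study data).
pre = [2.0, 2.5, 3.0, 3.5, 4.0]
post = [2.3, 2.9, 3.5, 3.8, 4.5]

# Group-level view: subtract one average from the other.
change_of_means = mean(post) - mean(pre)

# Person-level view: work out each person's change, then average those.
individual_changes = [after - before for before, after in zip(pre, post)]
mean_of_changes = mean(individual_changes)

# Spread between people at pre-test versus spread of the individual changes.
spread_between_people = stdev(pre)
spread_of_changes = stdev(individual_changes)

# Paired t statistic: mean change divided by its standard error.
paired_t = mean_of_changes / (spread_of_changes / sqrt(len(individual_changes)))

print(f"change of means:       {change_of_means:.2f}")
print(f"mean of changes:       {mean_of_changes:.2f}")
print(f"spread between people: {spread_between_people:.2f}")
print(f"spread of changes:     {spread_of_changes:.2f}")
print(f"paired t statistic:    {paired_t:.2f}")
```

In this toy example both averages come out at about 0.4 points, but the individual changes vary far less than the people themselves do. That is how a small average change can still reach statistical significance, and why a table of group means alone cannot show whether it did.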
But there was still some hope, because the focus of the study was actually on whether the MBSR course was most helpful for people who scored higher on mindfulness to begin with. That was what drew me to the article in the first place, and those individual differences could explain the lack of impressive results in other areas. MBSR might improve hope and empathy only for those who are naturally more mindful; the basic ideas of mindfulness might come easily to them, allowing them to quickly move on to more abstract applications that will help them be hopeful and empathetic. When looking at averages, though, these improvements might be masked by the lack of improvement in people who were not naturally mindful – for whom, perhaps, the very idea of mindfulness was new and strange and difficult enough to apply to their own lives, let alone to more complex notions like what other people were feeling.
To look for these individual differences, the researchers conducted some analyses using what’s called a general linear model – which is a regression, which is a more sophisticated version of the correlation that most people know as simply plotting points on the horizontal and vertical axes of a chart and seeing if you get a line with a slope to it. Again, we get a series of p-values and effect sizes in the text, with more specific information in a table (this time correlation values, corresponding to the slope of a line); and again, the statistics and the data in the table don’t seem to line up.
Consider the relationship at 2 months after the MBSR course, when the already-mindful students assigned to take MBSR were supposed to show the greatest improvements in mindfulness, well-being, empathy, and stress (for this one, improvement would be a decrease in how stressed people were feeling). A stronger correlation between the pre-intervention mindfulness and each score should show that the more mindful you were, the higher your eventual well-being, or lower your perceived stress, turned out to be. We expect the correlations to be stronger for students who took the MBSR course, showing that the combination of MBSR and pre-intervention mindfulness was needed for the improvements; there might still be correlations for the waitlist group, but not as dramatic because the mindful among them weren’t shown how to take advantage of those mindful tendencies.
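As a rough illustration of that expectation, the Python sketch below computes a plain Pearson correlation between baseline mindfulness and a later well-being score, separately for each group. All of the scores are invented to show the pattern the authors’ claim would predict, and `pearson_r` is my own helper; the study itself reported a general linear model, which this does not reproduce.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)


# Hypothetical data (not from the study): baseline mindfulness for five
# students per group, and their well-being score at follow-up.
baseline_mbsr = [2.0, 2.5, 3.0, 3.5, 4.0]
wellbeing_mbsr = [3.0, 3.6, 4.1, 4.4, 5.0]

baseline_waitlist = [2.0, 2.5, 3.0, 3.5, 4.0]
wellbeing_waitlist = [3.5, 3.2, 3.9, 3.4, 3.8]

r_mbsr = pearson_r(baseline_mbsr, wellbeing_mbsr)
r_waitlist = pearson_r(baseline_waitlist, wellbeing_waitlist)

print(f"MBSR:     r = {r_mbsr:.2f}")
print(f"waitlist: r = {r_waitlist:.2f}")
```

If the moderation claim holds for a given measure, the MBSR correlation should be clearly the larger of the two, as it is in this invented data. A measure where the waitlist correlation is as large or larger would not fit the claim.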
Here the correlations at the 2-month follow-up are plotted, again with black stars for data the researchers highlighted as being statistically significant.
The correlations support the main claims for improved mindfulness, well-being, and reduced stress, with very nice dramatic differences. But what’s going on with empathy? Sure, on average people who were more mindful to begin with seem also to be more empathetic at 2 months…but that’s true regardless of whether they did MBSR or sat on the waitlist. If anything, it’s the waitlist mindful who are getting the most out of their mindfulness, exactly the opposite of what is claimed in the text.
Head out to the 12-month follow-up, and the same pattern applies. Only now it looks worse, because it’s not just empathy but also hope that has been claimed to be statistically significant, in contradiction to what the correlations seem to show.
What should we make of these discrepancies?
In the best case scenario, the claims made are legitimate based on the actual statistical analyses run, but the authors didn’t provide the right information to back up their claims; they skimped on the actual reports of their regressions, and either didn’t notice or didn’t think anyone would look at the data they were actually providing and wonder why they didn’t line up. But even in that best case, questions are raised. If the authors are so unfamiliar with this kind of analysis that they didn’t have a better way to tell us the actual results (often, we provide tables just to show each step of the regression and what that tells us), should we trust that the analyses were run correctly in the first place? If the authors were so blinded by the statistical significance of their results that they didn’t think about how those tiny average improvements or backwards correlations would be perceived by a reader, does that reflect a bias toward showing only the great potential of mindfulness, without an awareness of whether those statistics mean anything in the real world?
In the end, I simply can’t trust the data; I can’t see for myself the connections between data and statistical analyses, and I’ve been trained to think too critically to just accept someone else’s claim that something is or is not statistically significant as the final word.
I wanted to talk about the implications of the conclusions the authors actually drew: whether this suggests caution in recommending MBSR, because not everyone will reap the same benefits, or whether one must cross a certain threshold of mindfulness before some benefits are seen. I still think there are important implications to this idea; for example, it emphasizes the point of only trusting studies that randomly assign people to MBSR, because those who are drawn to it might be those who are more mindful to begin with, and so are going to get the most out of it. Instead, I can only hope that someone sees this as inspiration to do their own study, with much more careful and clearly communicated statistical analysis. I do believe that different people will take different things out of mindfulness; but I cannot claim any decent support for that belief from this study.
Shapiro SL, Brown KW, Thoresen C, & Plante TG (2011). The moderation of mindfulness-based stress reduction effects by trait mindfulness: Results from a randomized controlled trial. Journal of Clinical Psychology, 67 (3), 267-277. PMID: 21254055