
Headlines blur the difference between statistically significant and clinically meaningful

Interpreting studies is a dicey thing. Often I find results that are statistically significant translated into headlines that don't really capture the nuance of the study or its findings.

Take these three for example:

Pine bark extract improves severe perimenopausal symptoms (Medscape medical news, February 14, 2013)

Two weeks of antibiotic therapy relieves IBS (irritable bowel syndrome) (WebMD)

Study: “Female Viagra” flibanserin works (CBS News). The first line of the post: “Need a boost to your sex life? The magic could be in a little pill.”

Let’s look at the studies referenced by these three headlines:

French maritime pine bark extract, at the dose studied, improved hot flashes for 35% of women who took it versus 29% of women who took placebo. Insomnia and sleep problems improved for 28% of women on the bark extract versus 21% on placebo. (In the 3 domains tested the bark extract “worked”, but it barely reached statistical significance.) Statistically, more women who took the drug did better, but is 6 or 7 women out of a hundred getting benefit for hot flashes and insomnia from a daily medication a clinically significant or meaningful benefit?
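For anyone who wants to check that arithmetic, here is a rough back-of-the-envelope sketch in Python. The only inputs are the response rates quoted above; the little absolute_benefit helper is mine, purely for illustration, not something from the study.

# Absolute benefit over placebo and number needed to treat (NNT),
# using the pine bark response rates quoted above.
def absolute_benefit(drug_rate, placebo_rate):
    arr = drug_rate - placebo_rate   # absolute risk reduction over placebo
    nnt = 1 / arr                    # women treated per one extra responder
    return arr, nnt

for symptom, drug, placebo in [("hot flashes", 0.35, 0.29),
                               ("sleep problems", 0.28, 0.21)]:
    arr, nnt = absolute_benefit(drug, placebo)
    print(f"{symptom}: {arr:.0%} absolute benefit, about {nnt:.0f} women treated per extra responder")

That works out to a 6-7% absolute benefit, or roughly 14-17 women taking the extract every day so that one of them gets relief she would not have gotten from placebo.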

The same goes for the IBS drug, an antibiotic called rifaximin. In the article a physician tells WebMD that many participants “say they are 80% improved, 90% improved, that kind of results…” Looking at the studies, TARGET 1 and TARGET 2, we see that 41% in the drug group were responders versus 32% in the placebo group (which is pretty poor overall if you ask me, as a known placebo improves the symptoms of IBS for 59% of patients!). Might a few of the responders have felt dramatically better? Is that “many”? I guess it depends on your perspective, but understanding that only 9% of people were true drug responders puts the findings in a different light.

And flibanserin? Again, the same type of numbers. The initial studies quoted by CBS as “magic” actually showed an improvement for 30-40% of women who took flibanserin versus 15-30% for placebo. Overall, there was an increase of 1 to 1.8 satisfying sexual encounters a month. In my very unscientific study (a poll from a couple of weeks ago) it appears that 78% of respondents didn’t find those numbers clinically meaningful either.

There is no doubt that each one of these studies reached statistical significance. When studying any therapy that is obviously the first step. However, statistical significance is often parlayed into miracle headlines, and the promise that a drug is truly helpful looks far more tempered when you compare its response rate with the placebo response rate.

It behooves everyone not to get caught up in the hype and to put the results in perspective. When a medication helps 65% of people versus 25% for placebo, the medical decisions tend to be easier (and by the way, that is more along the lines of what I’d call a miracle response rate, not 40% versus 32%). If no previous therapy has worked for a condition, then 7 responders out of 100 might truly be a miracle and well worth trying. If the medication is extremely low-cost and has minimal side effects, then 7 responders out of 100 might seem reasonable to many people. However, if the medication is expensive and/or has side effects, then obviously the enthusiasm needs to be tempered. The response rate to a medication is part of the risk-benefit ratio, and that ratio will be different for every patient, shaped by variables such as severity of illness, response to previous treatments, impact of the condition on their life, previous experience with side effects, and cost.
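The same simple arithmetic makes the gap between those two scenarios concrete (the rates are the ones just quoted; this is only an illustration, not a formal analysis):

# Number needed to treat = 1 / (drug response rate - placebo response rate)
print(1 / (0.65 - 0.25))   # "miracle" drug: about 2.5 patients treated per extra responder
print(1 / (0.40 - 0.32))   # the headline drugs: about 12.5 patients treated per extra responder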

Statistical significance doesn’t mean something works for everyone and it doesn’t mean a therapy is a miracle, so reporters and health care providers need to stop intimating that it does. Statistical significance simply means the observed effect was unlikely to have occurred by chance alone. The next and far more important step is interpreting those results and applying them in a clinically meaningful way, but those kinds of discussions probably don’t generate sexy, amazing headlines.
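For readers who want to see that distinction in numbers, here is a minimal sketch of a two-proportion z-test in Python (standard library only). The 41% versus 32% response rates are the ones quoted above for rifaximin, but the sample sizes are purely illustrative; they are not the actual trial enrollment.

from math import sqrt, erfc

def two_proportion_p(p1, p2, n1, n2):
    # Two-sided p-value for the difference between two observed proportions.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = abs(p1 - p2) / se
    return erfc(z / sqrt(2))

# 41% drug responders vs 32% placebo responders, 600 per arm (illustrative n):
print(two_proportion_p(0.41, 0.32, 600, 600))                 # about 0.001 -- "significant"

# A clinically trivial 0.1 percentage-point difference also becomes "significant"
# if you simply enroll enough people:
print(two_proportion_p(0.301, 0.300, 2_000_000, 2_000_000))   # about 0.03 -- also "significant"

The p-value only says how surprising the observed difference would be if the drug did nothing; it says nothing about whether the difference is large enough to matter to a patient.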

 

 


Discussion

4 thoughts on “Headlines blur the difference between statistically significant and clinically meaningful”

  1. Penultimate sentence might be clearer with a ‘not’ included.

    Posted by korhomme | March 20, 2014, 9:12 am
  2. Amen to that Dr. Jen Gunter!!! Why have we become a nation of people who take all this information as true?? Have we become so lazy and complacent as to believe everything we hear? Tigers! Lions! and Bears! OH MY!!!!

    Posted by Antonia | March 20, 2014, 11:50 am
  3. Something you’ve talked about before, but should be repeated here:

    Often, nominal statistical significance _does not mean that something works at all_. There are severe publication biases and misunderstanding of what statistical significance actually means. Most things that have “p < 0.05” are _wrong_ (e.g. Ioannidis 2005, “Why Most Published Research Findings Are False” – http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.0020124 ).

    So, for example, it may not actually be that Rifaximin actually helps 9% of patients (although the statistics on that aren't so bad).

    And, as you've said here, the side effects and cost-benefit analysis are what matters. Rifaximin is quite expensive, right?

    Posted by Michael W Busch | March 21, 2014, 5:00 pm
  4. As always, right to the point that “tests of significance” against a hypothesis of 0 difference are not very powerful statements about anything unless you know the power of the test, sample size, sample characteristics, the reliability and validity of the measurements, and the minimal difference considered to be clinically significant. Ideally one would also know external information like the costs and benefits of treatment a and treatment b in terms of side effects, cost, drug interactions, etc.

    The problem is that the health “headlines” (or teasers) in the newspapers and on the web sites primarily refer to the inane “significance test” and not to the real clinical research issues. Data journalists virtually never report confidence intervals (of parameters or of differences in parameters). I have never seen power discussed as such, although many data journalists and medical researchers seem to confuse sample size with experimental power.

    Possibly part of this problem could be mitigated by better statistical training for data journalists; some might be minimized by having the research authors offer a couple of “headlines” for popular media reports (right in the paper); and researchers should be quick to point out the conflicts of interests inherent in good science reporting versus attention grabbing headlines to increase advertising sales when their own articles are incorrectly cited.

    Thanks for bringing this important issue to the attention of your loyal readers.

    George

    Posted by George Huba | March 24, 2014, 4:27 pm
