How To Spot Bad Science
Blog: FS - Smart decisions
In a digital world that clamors for clicks, news is sensationalized and “facts” change all the time. Here’s how to discern what is trustworthy and what is hogwash.
Unless we’ve studied it, most of us are never taught how to evaluate science or how to parse the good from the bad. Yet it is something that dictates every area of our lives. It is vital for helping us understand how the world works. Appraising research for yourself, however, might take too much time and effort. Often, it can be enough to consult an expert or read a trustworthy source.
But some decisions require us to understand the underlying science. There is no way around it. Many of us hear about scientific developments from news articles and blog posts. Some sources put the work into presenting useful information. Others manipulate or misinterpret results to get more clicks. So we need the thinking tools necessary to know what to listen to and what to ignore. When it comes to important decisions, like knowing what individual action to take to minimize your contribution to climate change or whether to believe the friend who cautions against vaccinating your kids, being able to assess the evidence is vital.
Much of the growing (and concerning) mistrust of scientific authority is based on a misunderstanding of how it works and a lack of awareness of how to evaluate its quality. Science is not some big immovable mass. It is not infallible. It does not pretend to be able to explain everything or to know everything. Furthermore, there is no such thing as “alternative” science. Science does involve mistakes. But we have yet to find a system of inquiry capable of achieving what it does: move us closer and closer to truths that improve our lives and understanding of the universe.
“Rather than love, than money, than fame, give me truth.”
— Henry David Thoreau
There is a difference between bad science and pseudoscience. Bad science is a flawed version of good science, with the potential for improvement. It follows the scientific method, only with errors or biases. Often, it’s produced with the best of intentions, just by researchers who are responding to skewed incentives.
Pseudoscience has no basis in the scientific method. It does not attempt to follow standard procedures for gathering evidence, and its claims may be impossible to disprove. Pseudoscience focuses on finding evidence to confirm its claims while disregarding disconfirmation. Practitioners invent narratives to preemptively dismiss any actual science contradicting their views, and they may adopt the appearance of actual science to look more persuasive.
While the tools and pointers in this post are geared towards identifying bad science, they will also make pseudoscience easy to spot.
Good science is science that adheres to the scientific method, a systematic method of inquiry involving making a hypothesis based on existing knowledge, gathering evidence to test if it is correct, then either disproving or building support for the hypothesis. It takes many repetitions of applying this method to build reasonable support for a hypothesis.
In order for a hypothesis to count as such, there must be evidence that, if collected, would disprove it.
In this post, we’ll talk you through two examples of bad science to point out some of the common red flags. Then we’ll look at some of the hallmarks of good science you can use to sort the signal from the noise. We’ll focus on the type of research you’re likely to encounter on a regular basis, including medicine and psychology, rather than areas less likely to be relevant to your everyday life.
[Note: we will use the terms “research” and “science” and “researcher” and “scientist” interchangeably here.]
“The saddest aspect of life right now is that science gathers knowledge faster than society gathers wisdom.” ―Isaac Asimov
First, here’s an example of flawed science from psychology: power posing. A 2010 study by Dana Carney, Andy J. Yap, and Amy Cuddy entitled “Power Posing: Brief Nonverbal Displays Affect Neuroendocrine Levels and Risk Tolerance” claimed “open, expansive” poses caused participants to experience elevated testosterone levels, reduced cortisol levels, and greater risk tolerance. These are all excellent things in a high-pressure situation, like a job interview. The abstract concluded that “a person can, via a simple two-minute pose, embody power and instantly become more powerful.” The idea took off. It spawned hundreds of articles, videos, and tweets espousing the benefits of including a two-minute power pose in your day.
Yet at least eleven follow-up studies, many led by Joseph Cesario of Michigan State University, including “‘Power Poses’ Don’t Work, Eleven New Studies Suggest,” failed to replicate the results. None found that power posing has a measurable impact on people’s performance in tasks or on their physiology. While subjects did report a subjective feeling of increased powerfulness, their performance did not differ from that of subjects who did not strike a power pose.
One of the researchers behind the original study, Carney, has since changed her mind about the effect, stating that she no longer believes the results. Unfortunately, this isn’t always how researchers respond when confronted with evidence discrediting their prior work. We all know how uncomfortable changing our minds is.
The notion of power posing is exactly the kind of nugget that spreads fast online. It’s simple, free, promises dramatic benefits with minimal effort, and is intuitive. We all know posture is important. It has a catchy, memorable name. Yet examining the details of the original study reveals a whole parade of red flags. The study had 42 participants. That might be reasonable for a preliminary or pilot study, but it is in no way sufficient to “prove” anything. It was not blinded. Feedback from participants was self-reported, which is notorious for being biased and inaccurate.
There is also a clear correlation/causation issue. Powerful, dominant animals tend to use expansive body language that exaggerates their size. Humans often do the same. But that doesn’t mean it’s the pose making them powerful. Being powerful could make them pose that way.
A TED Talk in which Amy Cuddy, the study’s co-author, claimed power posing could “significantly change the way your life unfolds” is one of the most popular to date, with tens of millions of views. The presentation of the science in the talk is also suspect. Cuddy makes strong claims with a single, small study as justification. She portrays power posing as a panacea. Likewise, the original study’s claim that a power pose makes someone “instantly become more powerful” is suspiciously strong.
This is just one example of a psychological study about small tweaks to our behavior that has not stood up to scrutiny. We’re not singling out the power pose study as being unusually flawed or in any way fraudulent. The researchers had clear good intentions and a sincere belief in their work. It’s a strong example of why we should go straight to the source if we want to understand research. Coverage elsewhere is unlikely to even mention methodological details or acknowledge any shortcomings. It would ruin the story. We even covered power posing on Farnam Street in 2016—we’re all susceptible to taking these ‘scientific’ results seriously without checking the validity of the underlying science.
It is a good idea to be skeptical of research promising anything too dramatic or extreme with minimal effort, especially without substantial evidence. If it seems too good to be true, it most likely is.
Green Coffee Beans
“An expert is a person who has made all the mistakes that can be made in a very narrow field.” ―Niels Bohr
The world of weight-loss science is one where bad science is rampant. We all know, deep down, that we cannot get around the need for healthy eating and exercise. Yet the search for a magic bullet, offering results without effort or risks, continues. Let’s take a look at one study that is a masterclass in bad science.
Entitled “Randomized, Double-Blind, Placebo-Controlled, Linear Dose, Crossover Study to Evaluate the Efficacy and Safety of a Green Coffee Bean Extract in Overweight Subjects,” it was published in 2012 in the journal Diabetes, Metabolic Syndrome and Obesity: Targets and Therapy. On the face of it, and to the untrained eye, the study may appear legitimate, but it is rife with serious problems, as Scott Gavura explained in the article “Dr. Oz and Green Coffee Beans – More Weight Loss Pseudoscience” in the publication Science-Based Medicine. The original paper was later retracted by its authors. The Federal Trade Commission (FTC) ordered the supplement manufacturer who funded the study to pay a $3.5 million fine for using it in their marketing materials, describing it as “botched.”
The Food and Drug Administration (FDA) recommends that weight-loss studies include at least 3,000 participants receiving the active medication and at least 1,500 receiving a placebo, all for a minimum period of 12 months. This study used a mere 16 subjects, with no clear selection criteria or explanation. None of the researchers involved had medical experience or had published related research. They did not disclose the conflict of interest inherent in the funding source. The paper did not describe efforts to avoid confounding factors and was vague and inconsistent about whether subjects changed their diet and exercise. The study was not double-blinded, despite claiming to be. It has not been replicated.
The FTC reported that the study’s lead investigator “repeatedly altered the weights and other key measurements of the subjects, changed the length of the trial, and misstated which subjects were taking the placebo or GCA during the trial.” A meta-analysis by Rachel Buchanan and Robert D. Beckett, “Green Coffee for Pharmacological Weight Loss” published in the Journal of Evidence-Based Complementary & Alternative Medicine, failed to find evidence for green coffee beans being safe or effective; all the available studies had serious methodological flaws, and most did not comply with FDA guidelines.
Signs of Good Science
“That which can be asserted without evidence can be dismissed without evidence.” ―Christopher Hitchens
We’ve inverted the problem and considered some of the signs of bad science. Now let’s look at some of the indicators a study is likely to be trustworthy. Unfortunately, there is no single sign a piece of research is good science. None of the signs mentioned here are, alone, in any way conclusive. There are caveats and exceptions to all. These are simply factors to evaluate.
It’s Published by a Reputable Journal
“The discovery of instances which confirm a theory means very little if we have not tried, and failed, to discover refutations.” —Karl Popper
A journal, any journal, publishing a study says little about its quality. Some will publish any research they receive in return for a fee. A few so-called “vanity publishers” claim to have a peer-review process, yet they typically have a short gap between receiving a paper and publishing it. We’re talking days or weeks, not the expected months or years. Many predatory publishers do not even make any attempt to verify quality.
No journal is perfect. Even the most respected journals make mistakes and publish low-quality work sometimes. However, anything that is not published research or based on published research in a journal is not worth consideration. Not as science. A blog post saying green smoothies cured someone’s eczema is not comparable to a published study. The barrier is too low. If someone cared enough about using a hypothesis or “finding” to improve the world and educate others, they would make the effort to get it published. The system may be imperfect, but reputable researchers will generally make the effort to play within it to get their work noticed and respected.
It’s Peer Reviewed
Peer review is a standard process in academic publishing. It’s intended as an objective means of assessing the quality and accuracy of new research. Uninvolved researchers with relevant experience evaluate papers before publication. They consider factors like how well it builds upon pre-existing research or if the results are statistically significant. Peer review should be double-blinded. This means the researcher doesn’t know who is reviewing their work and the reviewer doesn’t know who the researcher is.
Publishers perform only a cursory “desk check” before moving on to peer review. This is to check for major errors, nothing more. They cannot have the expertise necessary to vet the quality of every paper they handle—hence the need for external experts. The number of reviewers and the strictness of the process depend on the journal. Reviewers either declare a paper unpublishable or suggest improvements. It is rare for them to suggest publishing without modifications.
Sometimes several rounds of modifications prove necessary. It can take years for a paper to see the light of day, which is no doubt frustrating for the researcher. But the process helps eliminate mistakes and shore up weak areas.
Pseudoscientific practitioners will often claim they cannot get their work published because peer reviewers suppress anything contradicting prevailing doctrines. Good researchers know having their work challenged and argued against is positive. It makes them stronger. They don’t shy away from it.
Peer review is not a perfect system. Seeing as it involves humans, there is always room for bias and manipulation. In a small field, it may be easy for a reviewer to get past the double-blinding. However, as it stands, peer review seems to be the best available system. In isolation, it’s not a guarantee that research is perfect, but it’s one factor to consider.
The Researchers Have Relevant Experience and Qualifications
One of the red flags in the green coffee bean study was that the researchers involved had no medical background or experience publishing obesity-related research.
While outsiders can sometimes make important advances, researchers should have relevant qualifications and a history of working in that field. It is too difficult to make scientific advancements without the necessary background knowledge and expertise. If someone cares enough about advancing a given field, they will study it. If it’s important, verify their backgrounds.
It’s Part of a Larger Body of Work
“Science, my lad, is made up of mistakes, but they are mistakes which it is useful to make, because they lead little by little to the truth.” ―Jules Verne
We all like to stand behind the maverick. But we should be cautious of doing so when it comes to evaluating the quality of science. On the whole, science does not progress in great leaps. It moves along millimeter by millimeter, gaining evidence in increments. Even if a piece of research is presented as groundbreaking, it has years of work behind it.
Researchers do not work in isolation. Good science is rarely, if ever, the result of one person or even one organization. It comes from a monumental collective effort. So when evaluating research, it is important to see if other studies point to similar results and if it is an established field of work. For this reason, meta-analyses, which analyze the combined results of many studies on the same topic, are often far more useful to the public than individual studies. Scientists are humans and they all make mistakes. Looking at a collective body of work helps smooth out any problems. Individual studies are valuable in that they further the field as a whole, allowing for the creation of meta-studies.
Science is about evidence, not reputation. Sometimes well-respected researchers, for whatever reason, produce bad science. Sometimes outsiders produce amazing science. What matters is the evidence they have to support it. While an established researcher may have an easier time getting support for their work, the overall community accepts work on merit. When we look to examples of unknowns who made extraordinary discoveries out of the blue, they always had extraordinary evidence for it.
Questioning the existing body of research is not inherently bad science or pseudoscience. Doing so without a remarkable amount of evidence is.
It Doesn’t Promise a Panacea or Miraculous Cure
Studies that promise anything a bit too amazing can be suspect. This is more common in media reporting of science or in research used for advertising.
In medicine, a panacea is something that can supposedly solve all, or many, health problems. These claims are rarely substantiated by anything even resembling evidence. The more outlandish the claim, the less likely it is to be true. Occam’s razor teaches us that the simplest explanation with the fewest inherent assumptions is most likely to be true. This is a useful heuristic for evaluating potential magic bullets.
It Avoids or at Least Discloses Potential Conflicts of Interest
A conflict of interest is anything that incentivizes producing a particular result. It distorts the pursuit of truth. A government study into the health risks of recreational drug use will be biased towards finding evidence of negative risks. A study of the benefits of breakfast cereal funded by a cereal company will be biased towards finding plenty of benefits. Researchers do have to get funding from somewhere, so this does not automatically make a study bad science. But research without conflicts of interest is more likely to be good science.
High-quality journals require researchers to disclose any potential conflicts of interest. But not all journals do. Media coverage of research may not mention this (another reason to go straight to the source). And people do sometimes lie. We don’t always know how unconscious biases influence us.
It Doesn’t Claim to Prove Anything Based on a Single Study
In the vast majority of cases, a single study is a starting point, not proof of anything. The results could be random chance, or the result of bias, or even outright fraud. Only once other researchers replicate the results can we consider a study persuasive. The more replications, the more reliable the results are. If attempts at replication fail, this can be a sign the original research was biased or incorrect.
A note on anecdotes: they’re not science. Anecdotes, especially from people close to us or those with a lot of letters behind their name, carry disproportionate clout. But hearing something from one person, no matter how persuasive, should not be enough to discredit published research.
Science is about evidence, not proof. And evidence can always be discredited.
It Uses a Reasonable, Representative Sample Size
A representative sample reflects the make-up of the wider population rather than just one segment of it. If it does not, the results may only apply to the demographic that was sampled, not to everyone. Bad science will often also use very small sample sizes.
There is no set target for what makes a large enough sample size; it all depends on the nature of the research. In general, the larger, the better. The exception is in studies that may put subjects at risk, which use the smallest possible sample to achieve usable results.
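To see why sample size matters, here is a quick simulation. It is a hypothetical sketch, not drawn from any of the studies above: we pretend the true population value is known, run many simulated “studies” at different sample sizes, and watch how much the answers from small samples bounce around compared to large ones.

```python
import random
import statistics

rng = random.Random(1)
TRUE_MEAN, TRUE_SD = 100, 15  # made-up population parameters for illustration

def run_study(n):
    """One simulated study: estimate the population mean from a sample of size n."""
    return statistics.mean(rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n))

# Repeat each study design 200 times and measure how much the answers vary.
spreads = {}
for n in (16, 100, 1000):
    estimates = [run_study(n) for _ in range(200)]
    spreads[n] = statistics.stdev(estimates)
    print(f"n={n:4d}: estimates scatter by about ±{spreads[n]:.2f}")
```

The small studies all “measure” the same population, yet their answers scatter widely; the large ones cluster tightly around the truth. A single 16-person study can easily land far from the real value by chance alone.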
In areas like nutrition and medicine, it’s also important for a study to last a long time. A study looking at the impact of a supplement on blood pressure over a week is far less useful than one over a decade. Long-term data smooths out fluctuations and offers a more comprehensive picture.
The Results Are Statistically Significant
Statistical significance is usually expressed as a p-value: the probability of obtaining results at least as extreme as those observed if pure random chance were the only factor at work. The threshold for statistical significance varies between fields, but a common convention is p < 0.05. Check whether the reported p-value or confidence interval meets the accepted standard in that field. If it doesn’t, the results are not worth paying attention to.
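To make the idea concrete, here is a small simulation of a p-value using a permutation test: we ask how often a pure-chance shuffling of the data would produce a difference between two groups at least as large as the one observed. The measurements are invented for illustration.

```python
import random
import statistics

def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Estimate a p-value: the fraction of chance splits of the pooled
    data that produce a difference in means at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = a + b
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):]))
        if diff >= observed:
            count += 1
    return count / n_permutations

# Hypothetical measurements from a treatment group and a control group.
treatment = [5.1, 5.8, 6.2, 5.9, 6.5, 6.1, 5.7, 6.3]
control = [5.0, 5.2, 4.9, 5.4, 5.1, 5.3, 5.0, 5.5]
p = permutation_test(treatment, control)
print(f"p ≈ {p:.4f}")
```

Here the two groups barely overlap, so almost no chance shuffling reproduces the observed gap and the p-value comes out far below 0.05. A small p-value doesn’t prove the effect is real, but it does mean chance alone is an unlikely explanation.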
It Is Well Presented and Formatted
“When my information changes, I alter my conclusions. What do you do, sir?” ―John Maynard Keynes
As basic as it sounds, we can expect good science to be well presented and carefully formatted, without prominent typos or sloppy graphics.
It’s not that bad presentation makes something bad science. It’s more the case that researchers producing good science have an incentive to make it look good. As Michael J. I. Brown of Monash University explains in “How to Quickly Spot Dodgy Science,” this is far more than a matter of aesthetics. The way a paper looks can be a useful heuristic for assessing its quality. Researchers who are dedicated to producing good science can spend years on a study, fretting over its results and investing in gaining support from the scientific community. This means they are less likely to present work looking bad. Brown gives an example of looking at an astrophysics paper and seeing blurry graphs and misplaced image captions—then finding more serious methodological issues upon closer examination. In addition to other factors, sloppy formatting can sometimes be a red flag. At the minimum, a thorough peer-review process should eliminate glaring errors.
It Uses Control Groups and Double-Blinding
A control group serves as a point of comparison in a study. It should consist of people as similar as possible to the experimental group, except that they are not subject to whatever is being tested. The control group may also receive a placebo to see how the outcome compares.
Blinding refers to the practice of obscuring which group participants are in. For a single-blind experiment, the participants do not know if they are in the control or the experimental group. In a double-blind experiment, neither the participants nor the researchers know. This is the gold standard and is essential for trustworthy results in many types of research. If people know which group they are in, the results are not trustworthy. If researchers know, they may (unintentionally or not) nudge participants towards the outcomes they want or expect. So a double-blind study with a control group is far more likely to be good science than one without.
It Doesn’t Confuse Correlation and Causation
In the simplest terms, two things are correlated if they happen at the same time. Causation is when one thing causes another thing to happen. For example, one large-scale study entitled “Are Non-Smokers Smarter than Smokers?” found that people who smoke tobacco tend to have lower IQs than those who don’t. Does this mean smoking lowers your IQ? It might, but there is also a strong link between socio-economic status and smoking. People of low income are, on average, likely to have lower IQ than those with higher incomes due to factors like worse nutrition, less access to education, and sleep deprivation. According to a study by the Centers for Disease Control and Prevention entitled “Cigarette Smoking and Tobacco Use Among People of Low Socioeconomic Status,” people of low socio-economic status are also more likely to smoke and to do so from a young age. There might be a correlation between smoking and IQ, but that doesn’t mean causation.
Disentangling correlation and causation can be difficult, but good science will take this into account and may detail potential confounding factors or the efforts made to avoid them.
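The confounding described above can be seen in a toy simulation (the variable names and effect sizes are entirely made up for illustration): a hidden factor drives two variables, producing a clear correlation between them even though neither causes the other.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

rng = random.Random(42)

# Hidden confounder: a made-up "socio-economic status" score.
ses = [rng.gauss(0, 1) for _ in range(5000)]
# Both variables depend on the confounder, but not on each other.
smoking = [-0.5 * s + rng.gauss(0, 1) for s in ses]
iq = [0.5 * s + rng.gauss(0, 1) for s in ses]

r = pearson(smoking, iq)
print(round(r, 3))  # clearly negative, despite no causal link
```

The two variables are built from independent noise plus the shared confounder, so the negative correlation the code reports is real, yet intervening on one would do nothing to the other. This is exactly why a correlation alone can never establish causation.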
“The scientist is not a person who gives the right answers, he’s one who asks the right questions.” ―Claude Lévi-Strauss
The points raised in this article are all aimed at the linchpin of the scientific method—we cannot necessarily prove anything; we must consider the most likely outcome given the information we have. Bad science is generated by those who are willfully ignorant or are so focused on trying to “prove” their hypotheses that they fudge results and cherry-pick to shape their data to their biases. The problem with this approach is that it transforms what could be empirical and scientific into something subjective and ideological.
When we look to disprove what we know, we are able to approach the world with a more flexible way of thinking. If we are unable to defend what we know with reproducible evidence, we may need to reconsider our ideas and adjust our worldviews accordingly. Only then can we properly learn and begin to make forward steps. Through this lens, bad science and pseudoscience are simply the intellectual equivalent of treading water, or even sinking.
- Most of us are never taught how to evaluate science or how to parse the good from the bad. Yet it is something that dictates every area of our lives.
- Bad science is a flawed version of good science, with the potential for improvement. It follows the scientific method, only with errors or biases.
- Pseudoscience has no basis in the scientific method. It does not attempt to follow standard procedures for gathering evidence. The claims involved may be impossible to disprove.
- Good science is science that adheres to the scientific method, a systematic method of inquiry involving making a hypothesis based on existing knowledge, gathering evidence to test if it is correct, then either disproving or building support for the hypothesis.
- Science is about evidence, not proof. And evidence can always be discredited.
- In science, if it seems too good to be true, it most likely is.
Signs of good science include:
- It’s Published by a Reputable Journal
- It’s Peer Reviewed
- The Researchers Have Relevant Experience and Qualifications
- It’s Part of a Larger Body of Work
- It Doesn’t Promise a Panacea or Miraculous Cure
- It Avoids or at Least Discloses Potential Conflicts of Interest
- It Doesn’t Claim to Prove Anything Based on a Single Study
- It Uses a Reasonable, Representative Sample Size
- The Results Are Statistically Significant
- It Is Well Presented and Formatted
- It Uses Control Groups and Double-Blinding
- It Doesn’t Confuse Correlation and Causation