Evaluating Scientific Studies

As skeptics, one thing we all have to do is look at what other people are claiming or studying and evaluate whether it stands up. No one person can do all the research needed in even a single subject; science depends on many people contributing in many different ways. That is why studies are peer reviewed. Because science involves lots of people it is not perfect, but it is self-correcting, and it is by far the best way we have of understanding the universe.

Never rely on a single scientist. That would be an appeal to authority. This is not to say you can't look to a scientist and value their work, but that work should be peer reviewed and replicated by others as well. Scientists really do need to stand on the shoulders of giants, and to have other scientists standing next to them.

The other day I was listening to an older episode of The Skeptics' Guide to the Universe (Episode 123), in which Dr. Steven Novella talked about this subject in relation to fringe science. It was interesting enough that I transcribed part of the episode to share with you. You can apply these points when reading about studies, especially those in paranormal or fringe areas.

First he talked about how scientists have to study the scientific literature. As skeptics, we also need at least a grasp of how things work. He also explains where skeptics come in. Here's what he said:

39:35 You have to develop the ability and the skill to interpret the literature, even if you're not doing research in that area. What scientific skeptics are trying to do is provide the kind of peer review and critical analysis that typically happens in mainstream science, and apply that to more of these fringe areas because mainstream scientists are ignoring it, out of hand, usually.

A few minutes later, Dr. Novella was talking about science that claimed to verify the supernatural (psychic dogs and other phenomena; listen to the whole episode for the complete story).

44:47 In order for science to be compelling enough to establish a new phenomenon in science, we need to see a few things, all at the same time:

  1. Science that has good methodology, where any artifacts are weeded out.

  2. Results that are statistically significant.

  3. Replication, so we know it's not just one lab or one scientist.

  4. An effect size that is above noise.

That doesn't even include a mechanism which would be the icing on the cake.
So let's say we don’t understand enough to know what the mechanism would be. Let's just decide if it exists as a phenomenon, then we can worry later about the mechanism.
The fact that we lack these things in those areas for which there is no plausible mechanism, I think is not a coincidence. But let's just look at those four criteria.
So what we'll get from people like Alex* is, "well look, this study has good statistical significance, and this study has good methodology", and maybe you have two or even three of those things, but never all four at the same time.
You have studies which show a large effect size but then as you improve the methods the effect sizes shrink. Or you have small studies that have no statistical significance.
(explanation of Dean Radin's meta-analysis)

46:56 So what Alex doesn’t understand is that we're not compelled by tiny effect sizes because to be compelled by it, the unstated major premise is that perfect methodology should produce an effect size of zero. Meaning that there's no noise in the system. Meaning that we're able to conduct trials with people and get everything perfect. And that's just not the case, that's not what we see in science.
When you get down to single digit effect sizes we assume that's the noise in the system. That that's a null effect. Negative.
So I look at that data and I see as the methodology improved, the effect size shrank, and when you pool everything together, you get down to an effect size of about 4 or 5% (more on Dean Radin's analysis). That's negative.

48:08 What would the number have to be in order for the light to go on?
It depends on what you're doing research in, but if you're dealing with people anywhere in the system, we like to see effect sizes of, to say that this is compelling, you like to see effect sizes of 30% or so. To say that it's interesting, this is something that we need to look at, I would like to see at least a 20% effect size. And then if you have some really objective outcome measures and minimal potential for unforeseen biases in the way the data is being collected, maybe like 10-20%, but that's borderline. That's kind of small, I'm not really convinced by it, but maybe there's something there. Less than 10% doesn't even deserve a pass. Single digit is just noise.
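
A short aside from me, not from the episode: the rule of thumb above can be written down as a small lookup. This is only a sketch of how I read the quote. The function name `rate_effect_size`, the `objective_outcomes` flag, and the label for the 10-20% range without objective measures are my own assumptions, and the cutoffs are approximate ("30% or so"), not hard limits.

```python
def rate_effect_size(effect_pct, objective_outcomes=False):
    """Label an effect size (in percent) using the rough thresholds quoted above.

    objective_outcomes is a hypothetical flag meaning "really objective outcome
    measures and minimal potential for unforeseen biases".
    """
    if effect_pct >= 30:
        return "compelling"
    if effect_pct >= 20:
        return "interesting"
    if effect_pct >= 10:
        # The quote only grants "borderline" when the measures are objective.
        # Calling everything else in this range "unconvincing" is my assumption.
        return "borderline" if objective_outcomes else "unconvincing"
    return "noise"


if __name__ == "__main__":
    for pct, objective in [(35, False), (22, False), (15, True), (15, False), (5, True)]:
        print(pct, objective, rate_effect_size(pct, objective))
```

The pooled effect size of about 4 or 5% mentioned earlier falls in the "noise" bucket under this reading.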

49:52 I do research, and I realize how easy it is to throw a little bias into the numbers. Just to give you an example, I'm not saying that any of this actually happened, but just to give you an example of a really easy way in which subtle bias can creep into this kind of study.
Let's say you run a series and the results are not looking good and then you think to yourself, "huh, did I calibrate my equipment properly? Let's start over and calibrate the equipment and go forward and just not count this trial." How do I know that kind of stuff is not happening in any of these studies? That's not necessarily even conscious fraud, it may be completely legitimate to think you've got to, whatever, run some controls before. But you might be more likely to think to do that if you just happen to be on a negative streak than if you had a few positive hits in there, maybe you would not think or do anything to sacrifice those. This is just a hypothetical example, but there's a hundred things like that where really subtle bias can creep in to studies like this. So you just can't believe effect sizes that are that tiny.
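
To make that hypothetical concrete, here is a minimal simulation I put together. It is not from the episode and does not describe any real study. It assumes a guessing task where the true hit rate is exactly 50%, sessions of 100 trials, and an experimenter who "recalibrates" and starts over whenever the first 20 trials come in below chance. All of those numbers, and the names `run_session` and `mean_hit_rate`, are assumptions chosen for illustration.

```python
import random


def run_session(rng, n_trials=100, check_after=20, biased=False):
    """Return the hit rate of one session of pure-chance guessing (true rate 50%)."""
    while True:
        early = [rng.random() < 0.5 for _ in range(check_after)]
        if biased and sum(early) < check_after / 2:
            # Negative streak: "recalibrate the equipment" and do not count
            # these trials. Nothing is faked, the session is just restarted.
            continue
        rest = [rng.random() < 0.5 for _ in range(n_trials - check_after)]
        return (sum(early) + sum(rest)) / n_trials


def mean_hit_rate(n_sessions, biased, seed=0):
    """Average hit rate over many sessions, with a fixed seed for repeatability."""
    rng = random.Random(seed)
    rates = [run_session(rng, biased=biased) for _ in range(n_sessions)]
    return sum(rates) / len(rates)


if __name__ == "__main__":
    honest = mean_hit_rate(2000, biased=False)
    skewed = mean_hit_rate(2000, biased=True)
    print(f"honest protocol:      {honest:.3f}")
    print(f"restart on bad start: {skewed:.3f}")
```

Under these assumptions there is no real effect at all, yet the restart habit alone pushes the average hit rate above chance by something on the order of a percentage point or two. That is the same scale as the small effect sizes being argued over, which is why they are hard to take seriously.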

*Alex Tsakiris hosts the Skeptiko podcast (he's pro-pseudoscience, even though he calls himself a skeptic).


  1. That is the reason one or two stats courses are included in university degree programs. It probably also accounts, to some extent, for why there are more nontheists among those with more education: they are very much aware of how easy it is for bias to enter into any one study, especially if it is an area where it is difficult to do double-blind studies, i.e., paranormal & supernatural phenomena.

  2. Thanks Angie, excellent points. :)