NEWS FLASH!!!
“Vitamin D significantly increases testosterone in men!”
“Vitamin D significantly increases testosterone in healthy overweight men!”
“Vitamin D significantly increases testosterone in healthy overweight men whose testosterone values were at the lower end of the reference range.”
“Vitamin D promotes a small yet statistically significant increase of testosterone in healthy overweight men whose initial testosterone values were at the lower end of the reference range…”
“Vitamin D promotes a small yet statistically significant increase of testosterone, which may or may not be biologically significant, in healthy overweight men whose initial testosterone values were at the lower end of the reference range and whose initial vitamin D levels were in the deficiency range, after a year of high-dose supplementation…”
All of these statements describe the conclusions of the same study regarding vitamin D and testosterone.
They are all true.
Which one do you think you’d see as a headline?
Probably the first one.
But I think the last one is most accurate.
In the study of 54 men, 31 were randomized to receive 3,332 IU of vitamin D daily and 23 a placebo. The final result was that in the vitamin D group total testosterone levels increased from 3.09 ± 1.12 ng/ml to 3.87 ± 1.36 ng/ml; p<0.001. There was no significant increase in the placebo group’s testosterone. (Note that I have converted the units from the original study to match those of the chart below)
So, an increase of 0.78 ng/ml. Is that a lot? The abstract of the study said that it was ‘significant,’ so it is… right?
Before we decide, let’s take note of a few points:
1) The reference range for testosterone for an adult male is about: 2.50 – 9.50 ng/mL
The study’s participants were on the low end of this range.
2) The participants initially had ‘deficient’ levels of vitamin D.
If your vitamin D levels are not deficient, would it increase testosterone?
3) Testosterone levels are constantly fluctuating.
Take a look at this chart (not from the same study):
It shows the hourly mean (average) of total testosterone in young and old men (from Bremer 1983). The 17 young men studied were 23-28 years old and the 12 older men were 58-82. All were ‘normal’ men. What is interesting is that the hour-to-hour change in testosterone shown in the chart can be as large as the ‘significant’ change produced in the vitamin D study after a whole year.
Taking the above points into consideration, are you still convinced that the results of the vitamin D study are ‘significant?’
I am not convinced. But neither am I convinced otherwise.
So the study is no good?
I am not saying that this is a poorly designed study.
I am not saying that it actually shows vitamin D is useless in raising testosterone.
I am just using it as a demonstration of the concept that statistical significance does not equal practical or biological significance.
And to demonstrate how news articles can sensationalize the results of scientific studies.
In fact, the study’s abstract ends with the statement, “Our results suggest that vitamin D supplementation might increase testosterone levels. Further randomized controlled trials are warranted to confirm this hypothesis.”
You won’t see that in the news headlines.
But the study says that vitamin D produced a ‘significant increase in total testosterone levels!’
Describing a difference in means as ‘significant,’ for example, indicates only that the means differ statistically, not that they differ practically, biologically, or in any other way. It means there is strong evidence against the null hypothesis that the means are equal. Statistically. Not biologically.
In other words, when we use a statistical test to test the null hypothesis that means are equal, it is a test based on a statistical measure, not a biological or practical measure.
If we reject the null hypothesis, and conclude that the evidence supports the alternative hypothesis that the means are different, we conclude they are statistically different, not biologically different. They still may be biologically or practically different, but that is not determined by this test.
So when the study says there was a ‘significant increase in total testosterone levels’ that is a statistically significant increase, but not necessarily a biologically important increase.
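To make that concrete, here is a minimal sketch (in Python, using made-up numbers, not the study’s data) of one such statistical test: a permutation test for a difference in means.

```python
import random
import statistics

def permutation_p(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    The p-value is the fraction of random relabelings of the pooled data
    whose mean difference is at least as extreme as the observed one --
    a purely statistical measure, saying nothing about biological importance.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        d = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if d >= observed:
            hits += 1
    return hits / n_perm

# Invented groups with a small but consistent difference in means:
group_1 = [3.1, 2.8, 3.4, 3.0, 2.9, 3.2, 3.3, 2.7]
group_2 = [3.9, 3.6, 4.1, 3.8, 3.5, 4.0, 3.7, 3.4]
print(permutation_p(group_1, group_2))  # small p: the means differ *statistically*
```

Whatever p-value comes out, it only measures how surprising the observed difference would be if the two groups were really the same. It says nothing about whether a difference of that size matters biologically.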
“A Small yet Significant Difference”
Remember the previous post about increasing sample sizes and statistically significant results? That also has a part to play here. As the sample size increases (everything else being held constant) it will become easier to detect differences in means. This can lead to finding a statistically significant result which may be of less practical or biological significance.
What? So I don’t want a large sample size?
No, that’s not what I’m saying. Larger sample sizes are usually preferred when possible. All that is needed is more analysis. As I said in the previous post, p-values are (should be) just a part of the analysis.
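To see the sample-size effect concretely, here is a small simulation sketch (Python, with invented numbers: a true difference of 0.6 kg between group means, standard deviation 3 kg). The true effect never changes; only the sample size does.

```python
import math
import random
import statistics

def two_sample_z(a, b):
    """Large-sample two-sample test of equal means.

    Returns (difference in means, two-sided p-value) using the normal
    approximation, which is reasonable for groups of this size.
    """
    diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(statistics.variance(a) / len(a) +
                   statistics.variance(b) / len(b))
    z = diff / se
    # two-sided p-value from the standard normal distribution
    return diff, 1 - math.erf(abs(z) / math.sqrt(2))

random.seed(1)
# True means differ by only 0.6 kg (sd 3 kg) -- a small effect.
for n in (25, 250, 2500):
    low_carb = [random.gauss(10.5, 3.0) for _ in range(n)]
    low_fat = [random.gauss(9.9, 3.0) for _ in range(n)]
    diff, p = two_sample_z(low_carb, low_fat)
    print(f"n = {n:4d} per group: diff = {diff:+.2f} kg, p = {p:.4f}")
```

The effect size stays roughly the same at every n; only the p-value tends to shrink as n grows. That is exactly how a huge study can turn a practically trivial difference into a ‘highly significant’ headline.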
“A Large yet Insignificant Difference?”
Yes, that could also occur. In this situation the sample size would probably be too small to detect a statistically significant difference, yet the difference may still be biologically meaningful. More studies with larger sample sizes would have to be performed to determine whether this is the case.
Statistical versus Biological and Practical Significance
If you see that the treatment results are statistically significant based upon a p-value, and you want to decide whether or not to take the treatment, then you must still find out whether the result has practical and/or biological significance.
To determine whether or not something is of practical significance you must do some more analysis because the same result may be practically significant to you and not to someone else. Usually a cost/benefit analysis would help.
For biological significance, an expert in the subject area may be necessary.
Regarding the vitamin D study, I am not a biologist or medical doctor so I do not have the expertise to determine whether or not an increase of 0.78 ng/ml is biologically significant. Maybe it is, maybe it isn’t. Or maybe it is only in men with very low levels in the first place.
Another example: Which Diet to Choose?
To make these concepts clearer, let’s look at a simpler example where the issue of statistical versus practical significance is easy to interpret. This example is made up, but reflects some studies I’ve read.
Imagine we are interested in studying the best diet for fat loss, a low carbohydrate diet or a low fat diet.
We take 1000 obese people and have 500 of them eat a low carbohydrate diet for one year and 500 of them eat a low fat diet for one year. (Don’t worry about all the details of randomization, compliance, and blinding… just assume it’s a proper study)
After one year we look at the mean difference of fat loss, measured as compared to each group’s initial fat mass.
Low carbohydrate group mean fat loss: 10.5 kilograms
Low fat group mean fat loss: 9.9 kilograms
Difference between the groups is 0.6 kg. Let’s assume that, since we have so many people, the difference is significant at a p-value of < 0.0001. Highly significant!
Of course all the headlines will exclaim: Low carb diets significantly better for fat loss!
But look, after one year there’s a difference of 0.6 kg (1.3 pounds). Is that of practical significance?
Which diet would you choose?
Since the difference after one year is not so large, I would choose whichever diet type I preferred to eat. If after a year I’ve lost 1 pound less, who cares! It was much easier than forcing myself to eat in a manner I do not like just to lose 1 more pound in a year.
Now let’s imagine a different, yet similar study:
We take 100 obese people and make 50 of them eat a low carbohydrate diet for one year and we make 50 of them eat a low fat diet for one year.
Low fat group mean fat loss: 12.5 kilograms
Low carb group mean fat loss: 4.9 kilograms
Difference between the groups is 7.6 kg. Let’s assume that, even though we have fewer people, the difference is significant at a p-value of < 0.0001 because the difference is much greater than in the previous imagined study. Highly significant! Just like in the other study.
Same type of headlines: Choose Low Fat Diets for Fat Loss! Significantly better than Low Carb!
Again, it is statistically significant. But I think it is also practically significant. A 7.6 kg or 16.8 pound difference is definitely something to consider when choosing a diet for fat loss.
Which diet would you choose? Is it practically significant to you?
Is the Treatment Worth It?
In the vitamin D study discussed earlier, we saw that vitamin D increased testosterone levels in men with initially low levels of vitamin D and testosterone. But what did they have to do to get that increase of 0.78 ng/ml?
They had to take a high-dose vitamin D pill every day for one year.
Is that worth it for the resulting increase in testosterone?
Same idea for the imagined diet studies.
When determining practical significance, always consider the nature of the treatment which achieves the results. Is it worth it to go through the treatment to get the results? Maybe it is to you and not to someone else.
In general, I like to think of different kinds of significance. Statistical significance is just one kind of significance.
In summary:
Biological Significance: Does the treatment produce the desired physiological changes in the body? Probably an expert in the subject area would have to determine this.
Practical Significance: Does the treatment produce the results we desire, and do the benefits of the treatment outweigh the costs? This can be a personal measure of significance, i.e. Is it practically significant to you?
Statistical Significance: Based on the p-value, the probability of obtaining results at least as extreme as those observed, given that the null hypothesis is true. Leave that to the statistician.
There are, of course, other kinds of significance like public health significance.
Just remember that a statistically significant result does not necessarily imply that the result is significant in any other way.
Conclusion
A statistical test of the null hypothesis (for example, that means are equal) is a statistical test not a biological or practical test.
Statistical significance does not indicate practical or biological significance or any other kind of significance.
Practical significance should consider the nature of the treatment (costs/benefits) required to achieve the results. It can be dependent on the individual.
News articles can easily sensationalize the results of studies, and still be…not untruthful…