Introduction

Bar graph = dynamite plot: the bar is the detonator, the error bar the firing cable.

Bar graphs are common in scientific publications for depicting continuous outcomes. This is problematic, as many different data distributions can lead to the same bar chart, and depicting the full distribution may suggest different conclusions than the summary statistics alone. Bar graphs have even been listed among the top ten worst graphs used in science.

Illustration & Alternatives

Today I illustrate this issue using the well-known Anscombe quartet, which contains four datasets with nearly identical simple descriptive statistics (n, mean, SD) that nonetheless look very different when plotted. It was created by the statistician Francis Anscombe in 1973 to illustrate the importance of graphing data before statistical analysis and the effect of outliers on summary statistics.
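As a quick sanity check, here is a minimal Python sketch (assuming seaborn is installed; it bundles a copy of the quartet) showing how close the per-dataset summaries are:

```python
import seaborn as sns

# seaborn ships Anscombe's quartet as a tidy frame
# with columns: dataset, x, y.
anscombe = sns.load_dataset("anscombe")
summary = anscombe.groupby("dataset")["y"].agg(["count", "mean", "std"])
print(summary.round(2))  # near-identical n, mean, and SD for all four datasets
```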

[Figure: "Barplots must die" — Inadequate vs. Adequate Data Visualisation]


Bar graphs are designed for categorical variables, yet they are commonly used to present continuous data in laboratory research, animal studies, and human studies with small sample sizes. Bar charts of continuous data are "visual tables" that typically show the mean plus error bars: the standard deviation (SD), standard error (SE), or confidence interval (CI), although for inference it is the CI that should be reported.

Why Are Dynamite Plots Problematic?

They Hide the Data Distribution

Many different data distributions can lead to the same bar graph. See the plots I created in the figure above: the raincloud plots clearly reveal four different data distributions even though the corresponding bar charts look alike. Do all samples cluster closely? Do they form subgroups? Are there outliers? A bar graph invites us to assume a normal distribution of the data around the mean where there might not be one. (In my own survey counting dynamite plots per journal, the counts were, as it happens, more or less normally distributed.)
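Raincloud plots layer three views of the same sample: a density curve (the "cloud"), the raw data points (the "rain"), and a boxplot. Below is a minimal matplotlib approximation on made-up data (dedicated packages such as ptitprince produce more polished versions):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Four made-up samples with similar means but very different shapes
# (illustrative only; not the data behind the figure above).
data = {
    "normal":   rng.normal(10, 2, 40),
    "bimodal":  np.concatenate([rng.normal(7, 1, 20), rng.normal(13, 1, 20)]),
    "skewed":   rng.lognormal(mean=2.25, sigma=0.3, size=40),
    "outliers": np.concatenate([rng.normal(9.5, 1.0, 38), [19.0, 21.0]]),
}
samples = list(data.values())

fig, ax = plt.subplots(figsize=(7, 4))
ax.violinplot(samples, showextrema=False)           # the "cloud": density
ax.boxplot(samples, widths=0.12, showfliers=False)  # median and IQR
for i, values in enumerate(samples, start=1):       # the "rain": raw points
    jitter = rng.uniform(-0.08, 0.08, len(values))
    ax.scatter(np.full(len(values), i) + jitter, values,
               s=10, color="black", alpha=0.6)
ax.set_xticks(range(1, len(data) + 1), labels=list(data))
ax.set_ylabel("outcome")
plt.show()
```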

They are Unsuited for Paired Data

Additional problems arise when bar graphs are used to show paired or otherwise non-independent data: a bar of group means cannot show which measurements belong to the same subject, so within-subject changes are lost. Connecting each subject's values with a line, as sketched below, preserves that information.
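A minimal sketch with simulated paired data: one grey line per subject makes the within-subject change visible, with the group means overlaid in black.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
before = rng.normal(10, 2, 12)             # made-up paired measurements
after = before + rng.normal(1.5, 1.0, 12)  # same 12 subjects, post-treatment

fig, ax = plt.subplots(figsize=(4, 4))
for b, a in zip(before, after):
    ax.plot([0, 1], [b, a], color="grey", alpha=0.6, marker="o", ms=3)
# Overlay the group means to keep the summary without hiding the pairing.
ax.plot([0, 1], [before.mean(), after.mean()], color="black", lw=2.5, marker="o")
ax.set_xticks([0, 1], labels=["before", "after"])
ax.set_xlim(-0.3, 1.3)
ax.set_ylabel("outcome")
plt.show()
```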

They are Not Intended to Depict Summary Statistics

Summarising the data as mean and SD or SE often causes readers to wrongly infer that the data are normally distributed and free of outliers. The problem is exacerbated in small-sample studies, such as many preclinical studies.
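A quick numeric illustration of why mean and SD alone mislead: a symmetric normal sample and a right-skewed lognormal sample can be tuned to share both, so their dynamite plots would be indistinguishable. A sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
normal = rng.normal(10, 3, 1000)

# A right-skewed lognormal tuned to the same mean (10) and SD (3):
# for a lognormal, mean = exp(mu + s**2 / 2) and CV**2 = exp(s**2) - 1.
s = np.sqrt(np.log(1 + (3 / 10) ** 2))
mu = np.log(10) - s ** 2 / 2
skewed = rng.lognormal(mu, s, 1000)

for name, x in [("symmetric", normal), ("right-skewed", skewed)]:
    print(f"{name}: mean={x.mean():.2f}, SD={x.std(ddof=1):.2f}")
# Both report roughly mean 10 and SD 3, so their mean +/- SD bars
# would look identical even though the shapes differ markedly.
```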

They Hide the Sample Size

Unfortunately, far too often we have to search the axis labelling, the figure legend, the results, or the methods section to finally find the n; in some cases it is omitted entirely. A clear statement of sample size is critical for peer review and should be part of the standard information reported with scientific findings.
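One simple remedy, sketched below with made-up data: write n directly into the axis tick labels (or the legend), so the sample size travels with the figure and cannot get lost.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
groups = {"control": rng.normal(10, 2, 8), "treated": rng.normal(12, 2, 11)}

fig, ax = plt.subplots(figsize=(4, 4))
ax.boxplot(list(groups.values()), showfliers=False)
for i, values in enumerate(groups.values(), start=1):
    ax.scatter(np.full(len(values), i), values, s=12, color="black", alpha=0.7)
# Put n into the tick labels so it is always visible with the plot.
ax.set_xticks(range(1, len(groups) + 1),
              labels=[f"{name}\n(n={len(v)})" for name, v in groups.items()])
ax.set_ylabel("outcome")
plt.show()
```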

Conclusions

Investigators need more training in adequate reporting and data presentation, and journal guidelines and policies should be adjusted to require it.