Maxim Vovchenko

Avoiding common problems with statistical analysis of biological experiments using a simple nested data simulator

Despite an extensive literature on statistical methods and their proper application to biological data, incorrect analyses remain a critical and widespread problem in research papers. The inherently hierarchical (nested, clustered) structure of biological measurements is often erroneously neglected, leading to pseudo-replication and false positive results. This, in turn, complicates the correct assessment of statistical power and impairs optimal planning of experiments. To draw attention to this problem and to illustrate the importance of explicitly accounting for the nested structure of biological data, in this article we present a simple open-source simulator of two-level normally distributed stochastic data. By defining the ‘true’ mean values and the ‘true’ intra- and inter-cluster variances of the simulated data, users of the simulator can test various scenarios and appreciate both the importance of correct multi-level analysis and the danger of neglecting information about the data structure. Here we apply our nested data simulator to highlight some common mistakes in data analysis and propose a workflow in which the simulator can be used to correctly compare two nested groups of experimental data and to plan new experiments so as to increase statistical power when necessary.
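To make the idea of two-level normally distributed data concrete, the following is a minimal Python sketch and not the simulator presented in the article. The function names (`simulate_group`, `false_positive_rates`), the parameter values, and the use of NumPy and SciPy are all assumptions made for illustration. It draws two groups with identical ‘true’ means and compares a naive t-test that pools all measurements with a t-test on cluster means. The latter is a simple summary-statistic stand-in for a full multi-level model and is reasonable only for balanced designs.

```python
import numpy as np
from scipy import stats


def simulate_group(rng, true_mean, sd_between, sd_within, n_clusters, n_per_cluster):
    """Return an (n_clusters, n_per_cluster) array of two-level normal data."""
    # Each cluster (e.g. an animal or a culture dish) gets its own 'true' mean.
    cluster_means = rng.normal(true_mean, sd_between, size=n_clusters)
    # Individual measurements scatter around the mean of their cluster.
    return rng.normal(cluster_means[:, None], sd_within,
                      size=(n_clusters, n_per_cluster))


def false_positive_rates(n_sim=2000, alpha=0.05, seed=0):
    """Estimate false positive rates when both groups share the same true mean."""
    rng = np.random.default_rng(seed)
    naive_hits = 0
    nested_hits = 0
    for _ in range(n_sim):
        # Hypothetical design: 3 clusters per group, 20 measurements per cluster,
        # equal inter- and intra-cluster standard deviations.
        a = simulate_group(rng, 0.0, 1.0, 1.0, 3, 20)
        b = simulate_group(rng, 0.0, 1.0, 1.0, 3, 20)
        # Naive analysis: treat all 60 measurements per group as independent.
        naive_hits += int(stats.ttest_ind(a.ravel(), b.ravel()).pvalue < alpha)
        # Cluster-level analysis: one summary value (the mean) per cluster.
        nested_hits += int(stats.ttest_ind(a.mean(axis=1), b.mean(axis=1)).pvalue < alpha)
    return naive_hits / n_sim, nested_hits / n_sim


if __name__ == "__main__":
    naive, nested = false_positive_rates()
    print(f"naive pooled t-test false positive rate: {naive:.3f}")
    print(f"cluster-mean t-test false positive rate: {nested:.3f}")
```

Under these assumed settings, the pooled test should reject the null hypothesis far more often than the nominal 5%, while the cluster-mean test should stay close to it. This is the pseudo-replication effect described above.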
Schematic of a typical biological experiment design, generating nested data
Keywords: nested data, statistical analysis, p-value, false positive, false negative, statistical power, simulated data, intra-cluster correlation