Avoiding common problems with statistical analysis of biological experiments using a simple nested data simulator
1 Center for Theoretical Problems of Physico-Chemical Pharmacology RAS, 109029, Srednyaya Kalitnikovskaya 30, Moscow, Russia
4 Dmitriy Rogachev National Medical Research Center of Pediatric Hematology, Oncology and Immunology, Ministry of Healthcare of the Russian Federation, 117997, Samory Machela 1, Moscow, Russia
Despite an extensive literature on statistical methods and their proper application to biological data, incorrect analyses remain a critical and widespread problem in research papers. The inherently hierarchical (nested, clustered) structure of biological measurements is often erroneously neglected, leading to pseudo-replication and false-positive results. This, in turn, complicates the correct assessment of statistical power and impairs the optimal planning of experiments. To draw more attention to this problem and to illustrate the importance of explicitly accounting for the nested structure of biological data, we present a simple open-source simulator of two-level, normally distributed stochastic data. By defining the ‘true’ mean values and the ‘true’ intra- and inter-cluster variances of the simulated data, users can test various scenarios and appreciate both the importance of correct multi-level analysis and the danger of neglecting information about the data structure. Here we apply our nested data simulator to highlight some common mistakes in data analysis and propose a workflow in which the simulator is used to correctly compare two nested groups of experimental data and to plan new experiments optimally, increasing statistical power when necessary.
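The pseudo-replication problem described above can be illustrated with a minimal sketch. It is not the authors' simulator: the function names (`simulate_group`, `pooled_t`, `false_positive_rates`), the design (5 clusters of 20 measurements per group) and the unit standard deviations are all assumptions chosen for illustration. Both groups are drawn with the same 'true' mean, so every "significant" result is a false positive. The naive analysis treats all measurements as independent, while the nested-aware analysis compares cluster means only.

```python
import random
import statistics


def simulate_group(rng, true_mean, sd_between, sd_within, n_clusters, n_per_cluster):
    """Return one group of two-level normal data as a list of clusters."""
    group = []
    for _ in range(n_clusters):
        # Level 2: each cluster (e.g. donor, animal) gets its own mean.
        cluster_mean = rng.gauss(true_mean, sd_between)
        # Level 1: repeated measurements scatter around the cluster mean.
        group.append([rng.gauss(cluster_mean, sd_within) for _ in range(n_per_cluster)])
    return group


def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / (pooled_var * (1 / na + 1 / nb)) ** 0.5


def false_positive_rates(n_trials=2000, seed=1):
    """Estimate false-positive rates of the naive and cluster-mean t-tests."""
    rng = random.Random(seed)
    n_clusters, n_per_cluster = 5, 20   # assumed design, per group
    t_crit_naive = 1.972    # two-sided 5% critical value, df = 2*5*20 - 2 = 198
    t_crit_nested = 2.306   # two-sided 5% critical value, df = 2*5 - 2 = 8
    naive_hits = nested_hits = 0
    for _ in range(n_trials):
        # Identical 'true' parameters in both groups: the null hypothesis holds.
        g1 = simulate_group(rng, 0.0, 1.0, 1.0, n_clusters, n_per_cluster)
        g2 = simulate_group(rng, 0.0, 1.0, 1.0, n_clusters, n_per_cluster)
        # Naive analysis: pool all measurements and ignore the clusters.
        flat1 = [x for cluster in g1 for x in cluster]
        flat2 = [x for cluster in g2 for x in cluster]
        naive_hits += int(abs(pooled_t(flat1, flat2)) > t_crit_naive)
        # Nested-aware analysis: one summary value (the mean) per cluster.
        means1 = [statistics.fmean(cluster) for cluster in g1]
        means2 = [statistics.fmean(cluster) for cluster in g2]
        nested_hits += int(abs(pooled_t(means1, means2)) > t_crit_nested)
    return naive_hits / n_trials, nested_hits / n_trials


if __name__ == "__main__":
    naive_rate, nested_rate = false_positive_rates()
    print(f"naive t-test false-positive rate:        {naive_rate:.3f}")
    print(f"cluster-mean t-test false-positive rate: {nested_rate:.3f}")
```

With equal inter- and intra-cluster variances, the naive rate should come out far above the nominal 5%, while the cluster-mean rate should stay close to it. Comparing cluster means is only one simple way to respect the nesting and works here because the design is balanced; a mixed-effects model would be a more general alternative.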