# Avoiding common problems with statistical analysis of biological experiments using a simple nested data simulator

## Introduction

##### Pseudoreplication and the Design of Ecological Field Experiments

S. H. Hurlbert

*Ecological Monographs*. **1984**, 54 (2), 187-211

##### The problem of pseudoreplication in neuroscientific studies: is it affecting your analysis?

S. E. Lazic

*BMC Neuroscience*. **2010**, 11, 5

##### What exactly is “N” in cell culture and animal experiments?

S. E. Lazic, C. J. Clarke-Williams, and M. R. Munafò

*PLoS Biology*. **2018**, 16 (4)

##### The effect of clustering on statistical tests: an illustration using classroom environment data

J. P. Dorman

*Educational Psychology*. **2008**, 28 (5), 583-595

##### SuperPlots: Communicating reproducibility and variability in cell biology.

S. J. Lord, K. B. Velle, R. D. Mullins, and L. K. Fritz-Laylin

*Journal of Cell Biology*. **2020**, 219 (6)

##### Multilevel analysis quantifies variation in the experimental effect while optimizing power and preventing false positives

E. Aarts, C. V. Dolan, M. Verhage, and S. van der Sluis

*BMC Neuroscience*. **2015**, 16

##### Using InVivoStat to perform the statistical analysis of experiments

S. T. Bate, R. A. Clark, and S. C. Stanford

*Journal of Psychopharmacology*. **2017**, 31 (6), 644-652

## Methods

### Generation of stochastic nested data

Two datasets are generated, ‘control’ (*x*) and ‘experiment’ (*y*), each containing *N* data clusters with *n* observations per cluster. Random variables representing ‘control’ and ‘experimental’ values are generated for every observation, where *i* is the cluster number from 1 to *N*; *j* is the number of the observation within the cluster from 1 to *n*; \(x_{mean}^{intra}(i)\) is the mean value of ‘control’ observations in the *i*-th cluster; \(y_{mean}^{intra}(i)\) is the mean value of ‘experimental’ observations in the *i*-th cluster; \(\sigma_{intra}^2\) is the intra-cluster variance; and \(R(0,1)\) is a random normally distributed number with a mean of 0 and a standard deviation of 1.
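The generation step can be illustrated with a minimal Python sketch. It is not the authors' simulator: the function name `simulate_nested`, the parameter names and the numerical values are hypothetical. It also assumes that each observation equals its cluster mean plus \(\sigma_{intra} R(0,1)\), and that cluster means are themselves drawn from a normal distribution around a grand mean with an inter-cluster standard deviation. The text above does not spell out either form.

```python
import numpy as np

def simulate_nested(n_clusters, n_obs, grand_mean, sigma_inter, sigma_intra, rng):
    """Return an (n_clusters, n_obs) array of simulated observations."""
    # Assumption: cluster means scatter normally around the grand mean.
    cluster_means = grand_mean + sigma_inter * rng.standard_normal(n_clusters)
    # Assumption: observation (i, j) = mean of cluster i + sigma_intra * R(0, 1).
    noise = sigma_intra * rng.standard_normal((n_clusters, n_obs))
    return cluster_means[:, None] + noise

rng = np.random.default_rng(seed=1)
# Illustrative values only: N = 3 clusters, n = 10 observations per cluster.
control = simulate_nested(3, 10, grand_mean=1.0, sigma_inter=0.2, sigma_intra=0.1, rng=rng)
experiment = simulate_nested(3, 10, grand_mean=1.3, sigma_inter=0.2, sigma_intra=0.1, rng=rng)
print(control.shape, experiment.shape)
```

Each row of the returned array is one cluster, so per-cluster means are available as `control.mean(axis=1)`.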

### Statistical comparison of ‘Control’ and ‘Experimental’ groups

*Method 1*: *n* observations from all *N* clusters are combined together to form two aggregated pools of values: ‘control’ and ‘experiment’. The aggregated pools are then compared using a standard unpaired two-tailed t-test, based on the score \(t_1\).

*Method 2*: observations in each cluster are averaged. Mean values from each cluster are used as inputs for a standard unpaired t-test, based on the score \(t_2\).

*Method 3*: the nested structure of the datasets is taken into account by computing an adjusted unpaired two-tailed t-test statistic

##### Correcting a Significance Test for Clustering

L. V. Hedges

*Journal of Educational and Behavioral Statistics*. **2007**, 32 (2), 151-179
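The three comparisons can be sketched in Python as follows. All function names are hypothetical, and the code is not the authors' implementation. Methods 1 and 2 use `scipy.stats.ttest_ind` with its default equal-variance, two-sided settings. For Method 3, the correction factor and degrees of freedom follow the equal-cluster-size form of the clustering adjustment in Hedges (2007) as we understand it, and should be checked against that paper before use. The intra-cluster correlation is estimated here with a one-way ANOVA estimator, which is an additional assumption.

```python
import numpy as np
from scipy import stats

def pooled_ttest(x, y):
    """Method 1: ignore clusters and compare all observations."""
    return stats.ttest_ind(x.ravel(), y.ravel())

def cluster_mean_ttest(x, y):
    """Method 2: compare the per-cluster means."""
    return stats.ttest_ind(x.mean(axis=1), y.mean(axis=1))

def estimate_icc(x, y):
    """ANOVA-type estimate of the intra-cluster correlation (assumed estimator)."""
    n = x.shape[1]
    between, within, n_clusters = 0.0, 0.0, 0
    for group in (x, y):
        means = group.mean(axis=1)
        between += n * np.sum((means - means.mean()) ** 2)
        within += np.sum((group - means[:, None]) ** 2)
        n_clusters += group.shape[0]
    msb = between / (n_clusters - 2)        # between-cluster mean square
    msw = within / (n_clusters * (n - 1))   # within-cluster mean square
    rho = (msb - msw) / (msb + (n - 1) * msw)
    return float(np.clip(rho, 0.0, 1.0))

def adjusted_ttest(x, y, rho=None):
    """Method 3: pooled t rescaled for clustering (equal cluster sizes assumed)."""
    n = x.shape[1]
    total = x.size + y.size
    if rho is None:
        rho = estimate_icc(x, y)
    t = stats.ttest_ind(x.ravel(), y.ravel()).statistic
    shrink = (total - 2) - 2 * (n - 1) * rho
    c = np.sqrt(shrink / ((total - 2) * (1 + (n - 1) * rho)))
    dof = shrink ** 2 / (
        (total - 2) * (1 - rho) ** 2
        + n * (total - 2 * n) * rho ** 2
        + 2 * (total - 2 * n) * rho * (1 - rho)
    )
    t_adj = c * t
    return t_adj, 2 * stats.t.sf(abs(t_adj), dof)

# Illustrative data: 3 clusters of 10 observations per group.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 1)) + rng.normal(size=(3, 10))
y = 0.5 + rng.normal(size=(3, 1)) + rng.normal(size=(3, 10))
print(pooled_ttest(x, y).pvalue, cluster_mean_ttest(x, y).pvalue, adjusted_ttest(x, y)[1])
```

As a sanity check on the formulas used here, setting `rho=0` reproduces the pooled test of Method 1, and `rho=1` gives degrees of freedom equal to the total number of clusters minus two, as in Method 2.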

### Statistical power analysis

### Code availability

## Results

**Do not pool your data if they are intra-cluster-correlated**

**How to properly plan your experiment and adjust data processing for detecting a true difference between compared groups?**

##### Using Effect Size–or Why the P Value Is Not Enough

G. M. Sullivan and R. Feinn

*Journal of Graduate Medical Education*. **2012**, 4 (3), 279-282

**How to process nested data properly?**

*N* = 3 clusters, *n* = 10-20 observations per cluster). With these data in hand, one can estimate the mean values, the variances in each cluster and the variances of the means between clusters.
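A short Python sketch of this estimation step, assuming the pilot data are stored as one sequence of observations per cluster. The function name `summarize_clusters` and the toy numbers are hypothetical. Cluster sizes may differ, as in the 10-20 observations mentioned above.

```python
import numpy as np

def summarize_clusters(clusters):
    """Estimate means and variances from a list of per-cluster observations."""
    means = np.array([np.mean(c) for c in clusters])
    intra_vars = np.array([np.var(c, ddof=1) for c in clusters])
    return {
        "cluster_means": means,
        "grand_mean": float(means.mean()),     # mean of the cluster means
        "intra_vars": intra_vars,              # sample variance within each cluster
        "inter_var": float(means.var(ddof=1)), # sample variance of the cluster means
    }

# Toy pilot data: two clusters of unequal size (illustrative values only).
pilot = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0, 5.0]]
summary = summarize_clusters(pilot)
print(summary["grand_mean"], summary["inter_var"])
```

Note that the variance of the cluster means also contains a contribution from within-cluster scatter (roughly the intra-cluster variance divided by the number of observations per cluster), so it tends to overestimate the true inter-cluster variance when clusters are small.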

*U*-test, can be applied

##### Use of the Mann–Whitney U-test for clustered data

B. Rosner and D. Grove

*Statistics in Medicine*. **1999**, 18 (11), 1387-1400

https://doi.org/10.1002/(SICI)1097-0258(19990615)18:11<1387::AID-SIM126>3.0.CO;2-V

*α*, the ‘experiment’ is statistically different from the ‘control’ at the significance level *α* (confidence level 1 − *α*).

*N*) (i.e. animals/cells/patients/experimental days) or how many observations per cluster (*n*) one should add to increase the power to the required level. This can be done by running the nested data simulator with the means and variances estimated from the pilot experiment while increasing the number of clusters and observations within clusters until the outcome is satisfactory. Once a suitable combination is found, the additional measurements should be carried out and the data added to the datasets. The new means and variances should then be estimated in the ‘control’ and ‘experimental’ datasets and the workflow cycle repeated (Fig. 4). This workflow is expected to maximize efficiency, provide high statistical power and ensure a proper estimate of the significance of the difference between the means in ‘control’ and ‘experimental’ conditions.
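The search over *N* and *n* described above can be sketched as a small simulation loop. This is a hedged illustration rather than the authors' simulator. The function names, the parameter values and the choice of a t-test on cluster means as the test are all assumptions, as is the normal model for cluster means. Power is estimated as the fraction of simulated experiments with a p-value below *α*.

```python
import numpy as np
from scipy import stats

def simulate(n_clusters, n_obs, mean, sd_inter, sd_intra, rng):
    """Nested data: normal cluster means plus normal within-cluster noise (assumed model)."""
    centers = mean + sd_inter * rng.standard_normal(n_clusters)
    return centers[:, None] + sd_intra * rng.standard_normal((n_clusters, n_obs))

def estimate_power(n_clusters, n_obs, mean_c, mean_e, sd_inter, sd_intra,
                   alpha=0.05, n_sim=500, seed=0):
    """Fraction of simulated experiments in which the difference is detected."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sim):
        x = simulate(n_clusters, n_obs, mean_c, sd_inter, sd_intra, rng)
        y = simulate(n_clusters, n_obs, mean_e, sd_inter, sd_intra, rng)
        # Test on cluster means, so the cluster is the unit of analysis.
        p = stats.ttest_ind(x.mean(axis=1), y.mean(axis=1)).pvalue
        hits += int(p < alpha)
    return hits / n_sim

# Scan candidate designs using (hypothetical) pilot estimates.
for n_clusters in (3, 5, 8, 12):
    for n_obs in (10, 20):
        power = estimate_power(n_clusters, n_obs, mean_c=1.0, mean_e=1.3,
                               sd_inter=0.2, sd_intra=0.1)
        print(f"N={n_clusters:2d}  n={n_obs:2d}  power={power:.2f}")
```

In this sketch the means and standard deviations would be replaced by the values estimated from the pilot experiment, and the scan stopped at the smallest design that reaches the required power.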

## Conclusion

## Acknowledgements

**Author contributions**

**Conflict of interest**

## References

##### Pseudoreplication and the Design of Ecological Field Experiments

S. H. Hurlbert

*Ecological Monographs*. **1984**, 54 (2), 187-211

##### The problem of pseudoreplication in neuroscientific studies: is it affecting your analysis?

S. E. Lazic

*BMC Neuroscience*. **2010**, 11, 5

##### What exactly is “N” in cell culture and animal experiments?

S. E. Lazic, C. J. Clarke-Williams, and M. R. Munafò

*PLoS Biology*. **2018**, 16 (4)

##### The effect of clustering on statistical tests: an illustration using classroom environment data

J. P. Dorman

*Educational Psychology*. **2008**, 28 (5), 583-595

##### SuperPlots: Communicating reproducibility and variability in cell biology.

S. J. Lord, K. B. Velle, R. D. Mullins, and L. K. Fritz-Laylin

*Journal of Cell Biology*. **2020**, 219 (6)

##### Multilevel analysis quantifies variation in the experimental effect while optimizing power and preventing false positives

E. Aarts, C. V. Dolan, M. Verhage, and S. van der Sluis

*BMC Neuroscience*. **2015**, 16

https://www.graphpad.com/scientific-software/prism/

##### Using InVivoStat to perform the statistical analysis of experiments

S. T. Bate, R. A. Clark, and S. C. Stanford

*Journal of Psychopharmacology*. **2017**, 31 (6), 644-652

##### Selection of the experimental unit in teratology studies

J. K. Haseman and M. D. Hogan

*Teratology*. **1975**, 12 (2), 165-171

##### Correcting a Significance Test for Clustering

L. V. Hedges

*Journal of Educational and Behavioral Statistics*. **2007**, 32 (2), 151-179

##### Using Effect Size–or Why the P Value Is Not Enough

G. M. Sullivan and R. Feinn

*Journal of Graduate Medical Education*. **2012**, 4 (3), 279-282

##### Use of the Mann–Whitney U-test for clustered data

B. Rosner and D. Grove

*Statistics in Medicine*. **1999**, 18 (11), 1387-1400

https://doi.org/10.1002/(SICI)1097-0258(19990615)18:11<1387::AID-SIM126>3.0.CO;2-V

##### Handbook of clinical psychology

J. Cohen and B. B. Wolman

*McGraw-Hill New York*. **1965**, 95-121