Pattern analysis, dimensionality reduction and hypothesis testing in high-dimensional data from animal studies with small sample sizes
Abstract
Experimental animal studies are typically associated with small sample sizes due to
ethical and practical limitations. However, such research projects often generate
high-dimensional data sets where the number of response variables is much greater than
the number of observations. This leads to several challenges with respect to the choice
of an appropriate statistical method.
The current research project focused on exploratory and inferential analysis of
multidimensional data sets from animal experiments with small group sizes. A
systematic comparison of univariate and multivariate hypothesis testing methods using
Monte Carlo simulations revealed that multivariate techniques offer no real benefit in
terms of power compared to univariate statistics. Principal component analysis (PCA),
a well-known dimensionality reduction technique, successfully captured the dominant
patterns in transcriptomic data. However, in simulated data, PCA was less sensitive in
detecting treatment effects than ordination methods that take group assignment into
account. On the other hand, multicollinearity combined with small sample sizes was
associated with a high false positive rate when the multivariate statistical method did
not handle it correctly. Additionally, microbiome studies based
on amplicon sequencing of the 16S rRNA gene were presented as a special case requiring
more flexible ordination and hypothesis testing techniques.
Taken together, this thesis demonstrates that harnessing the full potential of
multidimensional data is a challenging task which requires applying appropriate
statistical methods. A profound understanding of the strengths and limitations of the
alternative strategies is necessary in order to model the complex nature of multivariate
data and in turn draw correct inferences.