Please use this identifier to cite or link to this item: http://doi.org/10.25358/openscience-5150
Authors: Todorov, Hristo
Title: Pattern analysis, dimensionality reduction and hypothesis testing in high-dimensional data from animal studies with small sample sizes
Online publication date: 19-Oct-2020
Language: English
Abstract: Experimental animal studies are typically associated with small sample sizes due to ethical and practical limitations. However, such research projects often generate high-dimensional data sets in which the number of response variables is much greater than the number of observations. This poses several challenges for the choice of an appropriate statistical method. The current research project focused on exploratory and inferential analysis of multidimensional data sets from animal experiments with small group sizes. A systematic comparison of univariate and multivariate hypothesis testing methods using Monte Carlo simulations revealed that multivariate techniques offer no real benefit in terms of power compared to univariate statistics. The well-known dimensionality reduction technique principal component analysis (PCA) was shown to capture dominant patterns in transcriptomic data successfully. However, in simulated data, PCA was outperformed in sensitivity to detect treatment effects by ordination methods that take group assignment into account. In contrast, multicollinearity combined with small sample sizes was associated with a high false positive rate when not handled correctly by the multivariate statistical method. Additionally, microbiome studies based on amplicon sequencing of the 16S rRNA gene were presented as a special case requiring more flexible ordination and hypothesis testing techniques. Taken together, this thesis demonstrates that harnessing the full potential of multidimensional data is a challenging task that requires applying appropriate statistical methods. A profound understanding of the strengths and limitations of the alternative strategies is necessary in order to model the complex nature of multivariate data and, in turn, draw correct inferences.
DDC: 570 Life sciences
Institution: Johannes Gutenberg-Universität Mainz
Department: FB 10 Biologie
Place: Mainz
DOI: http://doi.org/10.25358/openscience-5150
URN: urn:nbn:de:hebis:77-openscience-0977132b-0007-4cb0-8c73-6557862489795
Version: Original work
Publication type: Dissertation
License: CC BY
Information on rights of use: http://creativecommons.org/licenses/by/4.0/
Extent: 113 pages
Appears in collections: JGU-Publikationen

Files in This Item:
  File: todorov_hristo-pattern_analys-20201009103226129.pdf
  Size: 15.91 MB
  Format: Adobe PDF