CPL - Chalmers Publication Library

Statistical analysis of metagenomic data

Viktor Jonsson (Department of Mathematical Sciences)
Göteborg : University of Gothenburg, 2014.
[Licentiate thesis]

Metagenomics is the study of microbial communities at the genome level by direct sequencing of environmental and clinical samples. Recently developed DNA sequencing technologies have made metagenomics widely applicable, and the field is growing rapidly. Statistical analysis is challenging, however, because of the high variability in the data, which stems from the underlying biological diversity and complexity of microbial communities. Metagenomic data is also high-dimensional, and the number of replicates is typically small. Many standard methods are therefore unsuitable, and new statistical procedures are needed.

This thesis contains two papers. In the first paper we perform an evaluation of statistical methods for comparative metagenomics. The ability to detect differentially abundant genes and control error rates is evaluated for eleven methods previously used in metagenomics. Resampled data from a large metagenomic data set is used to provide an unbiased basis for comparisons between methods. The number of replicates, the effect size and the gene abundance are all shown to have a large impact on the performance. The statistical characteristics of the evaluated methods can serve as a guide for the statistical analysis in future metagenomic studies. The second paper describes a new statistical method for the analysis of metagenomic data. The underlying model is formulated within the framework of a hierarchical Bayesian generalized linear model. A joint prior is placed on the variance parameters and shared between all genes. We evaluate the model and show that it improves the ability to detect differentially abundant genes.

This thesis underlines the importance of sound statistical analysis when the data is noisy and high-dimensional. It also demonstrates the potential of statistical modeling within metagenomics.
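To illustrate what controlling error rates means when thousands of genes are tested at once, here is a minimal sketch of the Benjamini-Hochberg step-up procedure, one common way to control the false discovery rate. The abstract does not state which procedure the thesis uses, so this choice is an assumption, as are the function name `benjamini_hochberg` and the example p-values.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Flag hypotheses rejected by the Benjamini-Hochberg step-up procedure.

    Illustrative sketch only; not taken from the thesis.
    Returns a list of booleans in the same order as p_values.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank

    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical per-gene p-values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
example_calls = benjamini_hochberg(example_p, alpha=0.05)
```

With these made-up values only the two smallest p-values are flagged, although five of them fall below 0.05 individually. That gap is the price of controlling the false discovery rate across many simultaneous tests.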

Keywords: Metagenomics, Statistical methods, Hierarchical Bayesian models, Statistical power, False discovery rate, Environmental genomics, Generalized linear models, Count data


ISSN 1652-9715



This record was created 2014-11-24. Last modified 2014-12-05.
CPL Pubid: 206369

 



Departments (Chalmers)

Department of Mathematical Sciences
Department of Mathematical Sciences (GU)

Subject areas

Statistics
Mathematical statistics
Bioinformatics and systems biology


Examination

Date: 2014-12-17
Time: 13:15
Room: Pascal, Mathematical Sciences, Chalmers tvärgata 3
Opponent: Dr. Ingrid Lönnstedt, Statistikon AB / Walter and Eliza Hall Institute, Melbourne, Australia

Part of series

Preprint / Department of Mathematical Sciences, Chalmers University of Technology and Göteborg University 2014:22