Metagenomic data have specific features that call for adapted statistical methods. This workshop introduces the most commonly used statistical tools in the field. It first reviews the mechanism of statistical hypothesis testing, then focuses on understanding the data structure and the specific issues in omics data. Classical ANOVA and its alternatives in genomics are covered next. Finally, advanced methods and data visualisation tools are discussed.
Course Outline
Session 1: Statistical Hypothesis Testing
- Accounting for sampling variation in the decision-making process
- Mechanism underlying hypothesis testing
- Null and alternative hypotheses
- One- and two-sided tests
- Test statistic
- Decision rule: critical value and p-value
- Statistical vs. practical significance, effect size to detect
- Risk involved in hypothesis testing: confidence level and power
- Controlling risks
- General idea underlying sample size and power determination
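To make the last point of Session 1 concrete, here is a minimal sketch of a sample size calculation for comparing two group means. It assumes a two-sided test, equal group sizes, a known common standard deviation and the normal approximation; the function name and the example values are illustrative and are not taken from the course material.

```python
import math
from statistics import NormalDist


def sample_size_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 * (sigma / delta)^2
    delta: smallest difference in means worth detecting (effect size)
    sigma: assumed common standard deviation (an assumption, not an estimate)
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the chosen risk alpha
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    n = 2 * (z_alpha + z_power) ** 2 * (sigma / delta) ** 2
    return math.ceil(n)  # round up to a whole number of samples


# Halving the effect size to detect roughly quadruples the required sample size.
n_large_effect = sample_size_per_group(delta=1.0, sigma=1.0)
n_small_effect = sample_size_per_group(delta=0.5, sigma=1.0)
print(n_large_effect, n_small_effect)
```

The sketch shows how the two risks (alpha and 1 - power) and the effect size to detect jointly drive the sample size. Exact calculations based on the t distribution give slightly larger values.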
Session 2: Understanding the Data Structure, Exploratory Data Analysis and Specific Issues in Metagenomic Data
- Specific features of metagenomic data
- Specific issues
- Differences with classical data
- Summarising data efficiently
- Multiplicity in statistical tests
- What is multiplicity?
- Identifying situations leading to multiplicity
- Dealing with multiplicity – Bonferroni, Tukey, Benjamini-Hochberg, etc.
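As an illustration of the multiplicity corrections named in Session 2, the following sketch compares Bonferroni and Benjamini-Hochberg adjusted p-values. The p-values are made up for the example, and the function names are illustrative only.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from five tests (e.g. five taxa).
raw = [0.001, 0.008, 0.012, 0.041, 0.20]
alpha = 0.05

bonf = bonferroni(raw)
bh = benjamini_hochberg(raw)

n_bonf = sum(p < alpha for p in bonf)  # tests declared significant by Bonferroni
n_bh = sum(p < alpha for p in bh)      # tests declared significant by BH
print(n_bonf, n_bh)
```

Bonferroni controls the family-wise error rate and is the more conservative of the two; Benjamini-Hochberg controls the false discovery rate, which is why it typically flags more tests when many hypotheses are examined at once.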
Session 3: Classical ANOVA and its Alternatives in Genomics
- Refresher on basic principles underlying ANOVA
- Interpretation of an ANOVA table
- Significance of factors and interactions
- Interpretation of factor effects and interactions
- Principle underlying MANOVA
- Specifics of metagenomic data and alternative methods
- Principle of ANOSIM and PERMANOVA
- Comparison of centroids
- Heterogeneity of dispersion across groups
- Multiple comparisons
- Results visualisation
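The principle of PERMANOVA covered in Session 3 can be sketched in a few lines: a pseudo-F statistic is computed from a distance matrix, then group labels are shuffled to build its null distribution. This is a simplified one-factor version written for illustration, with toy data and hypothetical function names; in practice a dedicated package would be used.

```python
import random


def pseudo_f(dist, labels):
    """One-factor pseudo-F statistic computed from a square distance matrix."""
    n = len(labels)
    groups = sorted(set(labels))
    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares: same idea, restricted to pairs inside a group.
    ss_within = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        pairs = sum(dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])
        ss_within += pairs / len(idx)
    ss_among = ss_total - ss_within
    a = len(groups)
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(dist, labels, n_perm=999, seed=0):
    """Return the observed pseudo-F and its permutation p-value."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between labels and distances
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: one measurement per sample, two well-separated groups of four.
values = [0, 1, 2, 3, 10, 11, 12, 13]
labels = ["A"] * 4 + ["B"] * 4
dist = [[abs(x - y) for y in values] for x in values]

f_obs, p_value = permanova(dist, labels)
print(f_obs, p_value)
```

Because the test only needs a distance matrix, any of the ecological distances can be plugged in. A significant result can reflect a difference in centroids, in dispersion, or both, which is why heterogeneity of dispersion is examined alongside it.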
Session 4: Advanced Methods and Data Visualisation
- Various distance measures in metagenomics
- Principle & selection of an appropriate metric
- An Overview of Visualisation Tools – Ordination Methods
- Principal Component Analysis (PCA)
- Correspondence Analysis (CA)
- Multidimensional Scaling (MDS)
- Principal Coordinates Analysis (PCoA)
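To illustrate why the choice of distance measure in Session 4 matters, here is a small sketch of the Bray-Curtis dissimilarity, a measure commonly used on abundance tables. The count vectors are invented for the example, and the helper names are illustrative.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    if denominator == 0:
        raise ValueError("both samples are empty")
    return numerator / denominator


def relative_abundance(counts):
    """Convert raw counts to proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical counts for four taxa in three samples.
sample_a = [10, 0, 5, 5]
sample_b = [0, 10, 5, 5]     # same depth as A, different composition
sample_c = [20, 0, 10, 10]   # same composition as A, twice the depth

d_ab = bray_curtis(sample_a, sample_b)
d_ac_counts = bray_curtis(sample_a, sample_c)
d_ac_props = bray_curtis(relative_abundance(sample_a), relative_abundance(sample_c))
print(d_ab, d_ac_counts, d_ac_props)
```

On raw counts, samples A and C look different only because of sequencing depth; on relative abundances their dissimilarity is zero. A matrix of such pairwise dissimilarities is the usual input to ordination methods such as MDS or PCoA.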
Course Duration
The recommended course duration is 4 online sessions.
Target Audience
This course is aimed at engineers, researchers, bioinformaticians and biologists. It is adapted to the target audience, on the one hand by using examples specific to the field of application and on the other hand by presenting specific tools or applications, if applicable.
We just finished a series of training sessions on the Statistical Analysis of Metagenomics data, and I was impressed by Natalie’s delivery of the content. Despite its complexity, Natalie was able to explain it clearly to us, providing many well-made examples. I highly recommend her for your statistical training needs. We are eagerly looking forward to the next course!
The training we attended on biostatistical analysis of metagenomic data was greatly appreciated by all attendees. We demystified which type of statistical models to apply and how to interpret the data. Our instructor Natalie has a remarkable ability to communicate complex concepts in this field of application. I recommend her without any hesitation.