In a fair part of practical research designs including repeated sampling of the same individual occur. This is statistically a powerfull way to do research, as the effect of some pertubation (diet, medication, etc) is contrasted in the same individual. Usually, a measure at baseline is available, and then one or several after an intervention. Sometimes cross over designs is used, such that each individual is undergoing all treatments in the experiment.

Modelling these data is in statistics refered to as mixed models. Mixed, due to the model having several terms modelling the random variation. For single univariate models, the nlme or lme4 package has the functionallity. For multivariate analysis such as for the microbiome, there are dedicated methods which, as for the metagenomeSeq and DESeq2 methods, has parameter shrinkage to robustify the results. Here we use the dream framework (from Bioconductor).

Example data

The sausage intervention data is an example of repeated measures with baseline and after intervention data.

library(tidyverse)
library(phyloseq) 
library(variancePartition)
library(edgeR)
library(BiocParallel)

load('./data/Rats_inulin.RData')
phyX

## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 480 taxa and 59 samples ]
## sample_data() Sample Data:       [ 59 samples by 15 sample variables ]
## tax_table()   Taxonomy Table:    [ 480 taxa by 7 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 480 tips and 477 internal nodes ]

metadata <- phyX %>% sample_data()
table(metadata$time,metadata$ID)

##        
##         F1 F10 F11 F12 F13 F14 F15 F16 F17 F18 F19 F2 F20 F21 F22 F23 F24 F25
##   Slut   0   1   1   1   1   1   1   1   1   1   1  1   1   1   1   1   1   1
##   Start  1   1   1   1   1   1   1   1   1   1   1  1   1   1   1   1   1   1
##        
##         F26 F27 F28 F29 F3 F30 F4 F5 F6 F7 F8 F9
##   Slut    1   1   1   1  1   1  1  1  1  1  1  1
##   Start   1   1   1   1  1   1  1  1  1  1  1  1

table(paste(metadata$Description,metadata$ID),metadata$time)

##                         
##                          Slut Start
##   Chow F1                   0     1
##   Chow F13                  1     1
##   Chow F15                  1     1
##   Chow F20                  1     1
##   Chow F21                  1     1
##   Chow F22                  1     1
##   Frankfurter F10           1     1
##   Frankfurter F11           1     1
##   Frankfurter F12           1     1
##   Frankfurter F17           1     1
##   Frankfurter F2            1     1
##   Frankfurter F24           1     1
##   Frankfurter F25           1     1
##   Frankfurter F29           1     1
##   Frankfurter F3            1     1
##   Frankfurter F4            1     1
##   Frankfurter F6            1     1
##   Frankfurter F8            1     1
##   Frankfurter+inulin F14    1     1
##   Frankfurter+inulin F16    1     1
##   Frankfurter+inulin F18    1     1
##   Frankfurter+inulin F19    1     1
##   Frankfurter+inulin F23    1     1
##   Frankfurter+inulin F26    1     1
##   Frankfurter+inulin F27    1     1
##   Frankfurter+inulin F28    1     1
##   Frankfurter+inulin F30    1     1
##   Frankfurter+inulin F5     1     1
##   Frankfurter+inulin F7     1     1
##   Frankfurter+inulin F9     1     1

Here, the time==Start indicates pre-diet intervention, and hence here the rats should be similar regardless of diet.

The model for the responses (the abundance of the microbiome) is:

\[ y_i = a(time_i) + b(time_i,diet_i) +\kappa(ID_i) + e_i\] Here \(a\) and \(b\) describes the effect of time and time and diet in combination, while \(\kappa\) reflects the difference between individual Rats, and \(e\) reflects the residual uncertainty.

Setup the data

phyX <- phyX %>% subset_samples(ID!='F1') # remove the F1 rat with no slut data. 

otutb <- as.data.frame(phyX@otu_table )
metadata <- phyX %>% sample_data() %>% data.frame() %>% 
  mutate(Diet = gsub('[[:punct:]]','',Description)) # we need to remove the + from the Description coloumn. 

GM = DGEList(otutb)
GM = calcNormFactors( GM)

# Specify parallel processing parameters
# this is used implicitly by dream() to run in parallel
param = SnowParam(4, "SOCK", progressbar=TRUE)

Specify the model

# The variable to be tested must be a fixed effect
# form <- ~ time + time:Diet  + (1|ID)
form <- ~ 0 + time:Diet  + (1|ID)

Run the model

# estimate weights using linear mixed model of dream
vobjDream = voomWithDreamWeights( GM, form, metadata, BPPARAM=param )

# Fit the dream model on each gene
# By default, uses the Satterthwaite approximation for the hypothesis test
fitmm = dream( vobjDream, form, metadata )
fitmm = eBayes(fitmm)

Look at the results

topTable( fitmm) %>% data.frame() %>% rownames_to_column('asv')

##        asv timeSlut.DietChow timeStart.DietChow timeSlut.DietFrankfurter
## 1  OTU_372         13.825272           14.44325                16.350681
## 2    OTU_3         14.277123           14.86154                16.717883
## 3   OTU_14          9.730454           10.65863                14.601279
## 4   OTU_51         10.745896           11.46514                10.236039
## 5   OTU_19         14.408681           13.52770                 8.909757
## 6  OTU_431         13.114981           12.64798                 7.995959
## 7   OTU_73          9.833801           10.66760                 9.274168
## 8   OTU_11         14.922619           14.65879                 7.675523
## 9  OTU_306         13.636541           12.95483                12.045826
## 10   OTU_4         13.206092           15.21415                13.989912
##    timeStart.DietFrankfurter timeSlut.DietFrankfurterinulin
## 1                   14.20794                      15.115094
## 2                   14.68724                      15.501351
## 3                   10.07538                      13.750277
## 4                   11.29011                      10.225030
## 5                   14.36788                       5.167834
## 6                   13.43585                       5.947968
## 7                   10.41911                       9.392297
## 8                   14.38631                       4.287265
## 9                   12.76416                      12.901823
## 10                  15.91445                      13.369206
##    timeStart.DietFrankfurterinulin   AveExpr         F      P.Value
## 1                         14.39189 14.860674 1372.7772 1.869704e-62
## 2                         14.82856 15.280190 1313.5472 6.983454e-62
## 3                         10.26513 11.827566  998.9851 2.461828e-58
## 4                         11.27411 10.820879  961.2841 7.744316e-58
## 5                         14.31967 11.276058  891.6191 7.272915e-57
## 6                         13.37876 10.669767  841.9919 3.995825e-56
## 7                         10.40387  9.935601  838.5149 4.519067e-56
## 8                         14.92897 11.094185  821.5593 8.295366e-56
## 9                         12.89817 12.739143  821.3184 8.368001e-56
## 10                        15.48022 14.603814  816.5737 9.940814e-56
##       adj.P.Val    F.std
## 1  8.974579e-60 50.49855
## 2  1.676029e-59 50.05344
## 3  3.938925e-56 47.29331
## 4  9.293179e-56 46.90585
## 5  6.981998e-55 46.14848
## 6  3.098789e-54 45.57227
## 7  3.098789e-54 45.53065
## 8  4.462934e-54 45.32519
## 9  4.462934e-54 45.32224
## 10 4.771591e-54 45.26398

This is overall effects. Obviously, diet changes the microbiome. However, what we are interested in is the contrasts.

Contrasts

This is refered to as contrasts. I.e. compare the effect from Start to Slut for Frankfurter with the same for chow; A change versus a change or also known as Differences in Differences

This is specified using the coloumn names from the output above. Here the effect of Frankfurter is compare to the control diet (Chow).

In the specification of the model above, we include 0 to remove the intercept, and : to avoid main effects in the interactionmodel. That produce an output from the the 2 x 3 design as just the flat means for each of the 6 groups (more or less). Establishing a difference from start to end of trial is hence a contrast of two terms, while a difference of differences is made up by four terms.

L = makeContrastsDream( form, metadata, 
                        contrasts = c("timeSlut.DietFrankfurter - timeStart.DietFrankfurter - 
                                      timeSlut.DietChow + timeStart.DietChow", 
                                      
                                      "timeSlut.DietFrankfurterinulin - 
                                      timeStart.DietFrankfurterinulin - 
                                      timeSlut.DietChow + timeStart.DietChow"))

dimnames(L)[2]$Contrasts <-  c('C_FF','C_Inulin') # the labels are tooo long, so here shorter versions
plotContrasts(L)

0.1 Fit the model

fit = dream( vobjDream, form, metadata, L)
fit = eBayes(fit)

0.2 Extract results and plot it

res_Frankfurter <- topTable( fit, coef="C_FF", number=1000) # we just export all
res_Inulin <- topTable( fit, coef="C_Inulin", number=1000) # we just export all

TXtab <- phyX %>% tax_table() %>% data.frame() %>% # export TAxtable to glue onto results
  rownames_to_column('otu') 

res <- bind_rows(res_Frankfurter %>% mutate(contrast = 'Frankfurter vs Chow') %>% rownames_to_column('otu'), 
                 res_Inulin %>% mutate(contrast = 'FFinulin vs FF') %>% rownames_to_column('otu')) %>%
  left_join(TXtab, by = 'otu') # add Taxinfo


ggplot(data = res, aes(logFC, -log10(P.Value), label = otu, color = Rank2)) + 
  geom_point() + 
  geom_text(data = res[res$P.Value<0.000001,]) + 
  facet_wrap(~contrast)

… Lets plot one example OTU_6

data.frame(metadata,otusel = as.numeric(otutb['OTU_6',])) %>% 
  ggplot(data =., aes(time,otusel + 1)) + geom_boxplot() + 
  geom_line(aes(group = ID)) + 
  scale_y_log10() + 
  facet_wrap(~Diet)

1 Another Example

The study Wastyk, H. C., Fragiadakis, G. K., Perelman, D., Dahan, D., Merrill, B. D., Feiqiao, B. Y., … & Sonnenburg, J. L. (2021). Gut-microbiota-targeted diets modulate human immune status. Cell, 184(16), 4137-4153. has put all their data out on github

try downloading the phyloseq_obj_PilotStudy_log.rds from the data/16S folder and import it into R

phyX <- readRDS('../MicrobiomeDataAnalysis/_site/data/phyloseq_obj_PilotStudy_log.rds')
phyX

## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 2204 taxa and 311 samples ]
## sample_data() Sample Data:       [ 311 samples by 15 sample variables ]
## tax_table()   Taxonomy Table:    [ 2204 taxa by 7 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 2204 tips and 2202 internal nodes ]

… Take a look at the design

table(paste(phyX@sam_data$Participant,phyX@sam_data$Group), phyX@sam_data$Timepoint)

##                 
##                  1 2 3 4 5 6 7 8 9
##   8001 Fiber     1 1 1 1 1 1 1 1 1
##   8002 Fiber     1 1 1 1 1 1 1 1 1
##   8003 Fiber     1 1 1 1 1 1 1 0 1
##   8004 Fermented 1 1 1 1 1 1 1 1 1
##   8006 Fiber     1 1 1 1 1 1 1 1 1
##   8007 Fiber     1 1 1 1 1 1 1 1 1
##   8008 Fermented 1 1 1 1 1 1 1 1 1
##   8009 Fiber     1 1 1 1 1 1 1 1 1
##   8010 Fermented 1 1 1 1 1 1 1 0 0
##   8011 Fermented 1 1 1 1 1 1 1 0 1
##   8013 Fiber     1 1 1 1 1 1 1 1 1
##   8014 Fermented 1 1 1 1 1 1 1 1 1
##   8016 Fermented 1 1 1 1 1 1 1 1 0
##   8017 Fiber     1 1 1 1 1 1 1 1 1
##   8018 Fiber     1 1 1 1 1 1 1 1 0
##   8020 Fermented 1 1 1 1 1 1 1 1 1
##   8021 Fermented 1 1 1 1 1 1 1 1 1
##   8022 Fiber     1 1 0 1 1 1 1 1 1
##   8023 Fiber     1 1 1 1 1 1 1 1 1
##   8024 Fermented 1 1 1 1 1 1 0 1 1
##   8025 Fermented 1 1 1 1 1 1 1 1 1
##   8026 Fermented 1 1 1 1 0 1 1 1 1
##   8027 Fermented 1 1 1 1 1 1 1 1 1
##   8028 Fermented 1 1 1 1 1 1 1 1 1
##   8029 Fiber     1 1 1 1 1 1 1 1 1
##   8030 Fermented 1 1 1 1 1 1 0 1 1
##   8031 Fermented 1 1 1 1 1 1 1 1 1
##   8032 Fermented 1 1 1 0 0 1 1 1 1
##   8033 Fermented 1 1 1 1 1 1 1 1 1
##   8034 Fermented 1 1 1 1 1 1 1 1 1
##   8035 Fiber     1 1 1 1 1 1 0 1 1
##   8036 Fiber     1 1 1 1 1 1 1 1 1
##   8037 Fiber     1 1 1 1 1 1 1 1 1
##   8038 Fiber     1 1 1 1 1 1 1 1 1
##   8039 Fiber     1 1 1 1 1 1 1 1 1
##   8041 Fiber     1 1 1 1 1 1 1 1 1

It is a two armed longitudinal study with up to 9 timepoints.

1.1 Mixed model on Alpha-div

Before doing bug-per-bug analysis, the microbiome is condensed into one vector and used as response in an univariate mixed model. For the sake of the example, we use Faith phylogenetic diversity index as alpha diversity measure.

library(picante)
FaithPD = picante::pd(samp = otu_table(phyX), tree = phy_tree(phyX), include.root = F)$PD
# combine with meta data
samdata <- data.frame(phyX@sam_data, FaithPD)

# plot it connecting the individuals with lines
ggplot(data = samdata, aes(Timepoint,FaithPD, group = Participant, color =Group)) + geom_point() + geom_line()

library(lme4)
library(lmerTest)
m <- lmer(data = samdata, FaithPD~factor(Timepoint)*Group + (1|Participant))
anova(m)

## Type III Analysis of Variance Table with Satterthwaite's method
##                         Sum Sq Mean Sq NumDF  DenDF F value  Pr(>F)  
## factor(Timepoint)       393.69  49.211     8 259.16  1.7071 0.09692 .
## Group                     0.01   0.011     1  33.96  0.0004 0.98464  
## factor(Timepoint):Group 340.87  42.608     8 259.16  1.4780 0.16533  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In line with the plot, there is not much of differences between groups and neither over time.

summary(m)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: FaithPD ~ factor(Timepoint) * Group + (1 | Participant)
##    Data: samdata
## 
## REML criterion at convergence: 1960.4
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.4518 -0.5428 -0.0720  0.4869  3.2960 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev.
##  Participant (Intercept) 48.04    6.931   
##  Residual                28.83    5.369   
## Number of obs: 311, groups:  Participant, 36
## 
## Fixed effects:
##                               Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)                    21.9456     2.0665  73.5945  10.620   <2e-16 ***
## factor(Timepoint)2             -1.1903     1.7897 258.9633  -0.665   0.5066    
## factor(Timepoint)3              0.8624     1.7897 258.9633   0.482   0.6303    
## factor(Timepoint)4              0.4622     1.8193 259.1284   0.254   0.7996    
## factor(Timepoint)5              2.8129     1.8515 259.2646   1.519   0.1299    
## factor(Timepoint)6              3.9653     1.7897 258.9633   2.216   0.0276 *  
## factor(Timepoint)7              3.5815     1.8512 259.2301   1.935   0.0541 .  
## factor(Timepoint)8              3.6992     1.8515 259.2640   1.998   0.0468 *  
## factor(Timepoint)9              4.7000     1.8515 259.2640   2.538   0.0117 *  
## GroupFiber                      1.0736     2.9225  73.5945   0.367   0.7144    
## factor(Timepoint)2:GroupFiber   3.6852     2.5311 258.9633   1.456   0.1466    
## factor(Timepoint)3:GroupFiber  -0.2041     2.5517 259.0286  -0.080   0.9363    
## factor(Timepoint)4:GroupFiber  -0.4851     2.5520 259.0472  -0.190   0.8494    
## factor(Timepoint)5:GroupFiber  -1.0513     2.5751 259.1191  -0.408   0.6834    
## factor(Timepoint)6:GroupFiber  -3.5729     2.5311 258.9633  -1.412   0.1593    
## factor(Timepoint)7:GroupFiber  -1.7441     2.5952 259.1621  -0.672   0.5022    
## factor(Timepoint)8:GroupFiber  -3.2237     2.5955 259.1795  -1.242   0.2153    
## factor(Timepoint)9:GroupFiber  -2.6493     2.5955 259.1795  -1.021   0.3083    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Looking at the random effect, we see that the variance related to individual is larger than the residual variation, meaning that there indeed is a lot of week to week variation conserved in the microbiome.

Lets try DA on this dataset. Here, all code in one go.

phyXsel <- phyX %>% filter_taxa(function(x) sum(x > 0)>20, TRUE)

otutb <- as.data.frame(phyXsel@otu_table)
metadata <- phyXsel %>% sample_data() %>% data.frame() %>% 
  mutate(time = Timepoint %>% factor())

GM = otutb %>% t() %>% DGEList()
GM = calcNormFactors(GM)

# Specify parallel processing parameters
# this is used implicitly by dream() to run in parallel
param = SnowParam(4, "SOCK", progressbar=TRUE)


form <- ~ 0 + time:Group + (1|Participant)

# estimate weights using linear mixed model of dream
vobjDream = voomWithDreamWeights( GM, form, metadata, BPPARAM=param )

# Fit the dream model on each gene
# By default, uses the Satterthwaite approximation for the hypothesis test
fitmm = dream( vobjDream, form, metadata )
fitmm = eBayes(fitmm)
# topTable( fitmm) %>% data.frame() %>% rownames_to_column('asv')

L = makeContrastsDream( form, metadata, 
                        contrasts = c("time9.GroupFermented - time1.GroupFermented",
                                      "time9.GroupFiber - time1.GroupFiber",
                                      "time9.GroupFermented - time1.GroupFermented -
                                      time9.GroupFiber + time1.GroupFiber"))

dimnames(L)[2]$Contrasts <-  c('Eff_ferm','Eff_fiber','DDferm_fiber') # the labels are tooo long, so here shorter versions
# plotContrasts(L)

fit = dream( vobjDream, form, metadata, L)
fit = eBayes(fit)

Collect results and plot

res_ferm <- topTable( fit, coef="Eff_ferm" ,number=1000) # we just export all
res_fiber <- topTable( fit, coef="Eff_fiber" ,number=1000) 
res_dd <- topTable( fit, coef="DDferm_fiber" ,number=1000) 

TXtab <- phyXsel %>% tax_table() %>% data.frame() %>% # export TAxtable to glue onto results
  rownames_to_column('otu') 

res <- bind_rows(res_ferm %>% mutate(contrast = 'Fermented') %>% rownames_to_column('otu'), 
                 res_fiber %>% mutate(contrast = 'Fiber') %>% rownames_to_column('otu'),
                 res_dd %>% mutate(contrast = 'Ferm - Fiber') %>% rownames_to_column('otu')) %>%
  left_join(TXtab, by = 'otu') # add Taxinfo


res %>% 
  mutate(Family2 = Family %>% factor() %>% fct_lump_n(5)) %>% 
  ggplot(data = ., aes(logFC, -log10(P.Value), label = otu, color = Family2)) + 
  geom_point() + 
  facet_wrap(~contrast)

res %>% arrange(adj.P.Val) %>% 
  select(logFC,P.Value,adj.P.Val, B, contrast, Class:Species) %>% 
  head(10)

##         logFC      P.Value  adj.P.Val          B     contrast
## 1   1.2932479 0.0001037790 0.03860580  0.3005397 Ferm - Fiber
## 2   1.2681159 0.0001409278 0.04517349  0.8282160    Fermented
## 3   0.9087402 0.0002428682 0.04517349  0.3544897    Fermented
## 4   1.0778592 0.0023135990 0.28688628 -1.5618816    Fermented
## 5   0.4486806 0.0012295024 0.32268288 -1.3514304        Fiber
## 6   0.6832015 0.0018989680 0.32268288 -1.6626141        Fiber
## 7   1.1965432 0.0026022813 0.32268288 -1.7994697        Fiber
## 8   1.0237319 0.0040683468 0.37491912 -2.1836006        Fiber
## 9  -0.4963649 0.0067053782 0.37491912 -2.5013149        Fiber
## 10  0.8612708 0.0080586011 0.37491912 -2.6994468        Fiber
##                     Class             Order                Family
## 1           c__Clostridia  o__Clostridiales    f__Ruminococcaceae
## 2           c__Clostridia  o__Clostridiales    f__Ruminococcaceae
## 3           c__Clostridia  o__Clostridiales    f__Ruminococcaceae
## 4           c__Clostridia  o__Clostridiales    f__Lachnospiraceae
## 5           c__Clostridia  o__Clostridiales    f__Ruminococcaceae
## 6           c__Clostridia  o__Clostridiales                  <NA>
## 7           c__Clostridia  o__Clostridiales    f__Lachnospiraceae
## 8  c__Gammaproteobacteria o__Pasteurellales    f__Pasteurellaceae
## 9                  c__RF3      o__ML615J-28                   f__
## 10          c__Clostridia  o__Clostridiales f__[Mogibacteriaceae]
##                  Genus           Species
## 1  g__Faecalibacterium    s__prausnitzii
## 2      g__Ruminococcus               s__
## 3  g__Faecalibacterium    s__prausnitzii
## 4       g__Lachnospira               s__
## 5  g__Faecalibacterium    s__prausnitzii
## 6                 <NA>              <NA>
## 7         g__Roseburia               s__
## 8       g__Haemophilus s__parainfluenzae
## 9                  g__               s__
## 10                 g__               s__

Plot the top flyers

sel <- res %>% arrange(adj.P.Val) %>% head(3) # dig the strongest results out

# combine with meta data
xx <- otutb[,colnames(otutb) %in% sel$otu] %>% 
  data.frame() %>% 
  cbind(metadata)

# plot it
xx %>% 
  gather(otu,abu,-c(SampleID:time)) %>% 
  ggplot(data = ., aes(time,abu, fill = Group)) + 
  geom_boxplot() + scale_y_sqrt() + 
  facet_wrap(~otu)

Differential Abundance for Repeated measures