BiomeHorizon is an R package for visualizing longitudinal microbiome data in the form of a horizon plot. A horizon plot provides a compact way to visualize multiple time series in parallel by overlying the values at different ranges of magnitude. Though this package is designed for microbiome data, it can be used to visualize other types of longitudinal data as well.

In this tutorial, we will create a basic horizon plot from sample data, and then add modifications to demonstrate the versatility of the package. We provide two sample data sets; one is a human dietary supplement study (Johnson et al. 2019; data downloaded from https://github.com/knights-lab/dietstudy_analyses/tree/master/data/maps). The diet data set includes fecal shotgun metagenomes from 34 human subjects collected daily over 17 days. The second study uses opportunistically collected fecal amplicons (16S) from a wild baboon population (Grieneisen et al. accepted; data downloaded from doi:10.5281/zenodo.4662081). The baboon data set includes a subset of 276 samples from 6 baboon subjects (27 - 69 samples per subject) collected over 2-11 years per subject. Each data set contains an OTU (baboon) or microbial taxon ID (diet) table, a corresponding table of microbial taxonomy information, and a metadata table with collection dates.

What is a horizon plot?

A horizon plot is an effective method of visualizing change in values over time, in a fraction of the vertical space used by a line or area graph. The figure below illustrates step-by-step how a horizon plot is constructed from an area graph.

  1. Values are plotted as a relative abundance vs. time area graph for each OTU time series.
  2. Values are centered to a ‘zero’, in this case the median relative abundance. This centered value is referred to as the ‘horizon’ or ‘origin’.
  3. The plotting area is divided into quartile ‘bands’ above and below the origin, with darker blue bands indicating values incrementally above the origin and darker red bands below the origin; negative bands are mirrored upward.
  4. Bands are overlaid to compress vertical space.

This increased data density enables easier comparison between time series, as we will see below.

The graphs above depict the same diet sample data provided earlier. Note that the usage of a unique scale for each subplot captures the proportional decrease in abundance of taxon 36 and taxon 38 around timepoint 2 which follows the trend of other microbes, but is dwarfed by more highly abundant microbes in the line graph.

In the rest of the tutorial, we will learn how to use the package to make a horizon plot.

Loading in the package

The sample data sets and dependencies are loaded automatically with the package.

## Install devtools
install.packages("devtools")

devtools::install_github("blekhmanlab/biomehorizon")
library(biomehorizon)  

Preview of Diet Sample Data

## OTU table format. The first column contains microbial taxon IDs (or OTUs for 16S data), and
## all other columns are samples. Values represent sample reads per microbe within a given sample.
## Though in this case values are integer sample reads, they can also be represented as
## proportions or percentages of the total sample. Columns do not need defined names.
library(dplyr)

otusample_diet %>%
	arrange(desc(MCT.f.0002)) %>%
	select(1:7) %>%
	head()
  taxon_id  MCT.f.0002  MCT.f.0003  MCT.f.0004  MCT.f.0005  MCT.f.0006  MCT.f.0007
1	taxon 1	  220949	    25921	      112186	    48113	      52260	      78360
2	taxon 2	  63019	      4537	      31306	      16093	      14077	      24294
3	taxon 4	  51407	      436	        20574	      10770	      11810	      22105
4	taxon 6	  50088	      5124	      25748	      12910	      11716	      12623
5	taxon 30  42797	      2867	      21296	      12927	      9489	      16386
6	taxon 3	  37879	      2868	      16377	      7226	      7554	      12240

## Metadata format. Must include sample IDs that match the column names of otusample,
## subject IDs, and collection dates in either date or numeric format. 
## The columns with sample IDs, collection dates and subject names must be named 
## "sample", "collection_date" and "subject" respectively. 
head(metadatasample_diet)
   subject      sample collection_date supplement
1	 MCTs01	 MCT.f.0002	              2	      EVOO
2	 MCTs01	 MCT.f.0003	              3	      EVOO
3	 MCTs01	 MCT.f.0004	              4	      EVOO
4	 MCTs01	 MCT.f.0005	              5	      EVOO
5	 MCTs01	 MCT.f.0006	              6	      EVOO
6	 MCTs01	 MCT.f.0007	              7	      EVOO
## Taxonomydata format. Describes taxonomy of each microbe (or OTU, for 16S data) from Kingdom
## through Genus.
## Levels without classification have NA values.
## You can supply a vector of strings each with the entire taxonomy of a microbe,
## with levels separated by semicolons, or a table with columns for each taxonomic level where the first
## column is the OTU ID. Columns do not need defined names.
## Supports classification up to Subspecies (8 levels)
head(taxonomysample_diet)
  taxon_id                                                                                                                     taxonomy
1  taxon 1                                                  Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Bacteroidaceae;Bacteroides
2  taxon 2                            Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Bacteroidaceae;Bacteroides;Bacteroides uniformis
3  taxon 3                                                                             Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales
4  taxon 4 Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Rikenellaceae;Alistipes;Alistipes putredinis;Alistipes putredinis DSM 17216
5  taxon 5       Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Prevotellaceae;Prevotella;Prevotella copri;Prevotella copri DSM 18205
6  taxon 6                                Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales;Bacteroidaceae;Bacteroides;Bacteroides dorei

Data Refining and OTU Selection

Before we visualize the data using the main function, we must first prepare the data sets and other variables for analysis using the prepanel function. This will do several things, most importantly:

  1. Filter OTUs
  2. Filter samples to just one subject
  3. Convert values to percentages, if they are not already in that format
  4. Check format of data sets
  5. Check for and catch common user errors

Then, it will output the refined arguments in a list that can be supplied to the main function.

You can use this function with just an OTU table, but this will assume that all samples come from the same subject. Since otusample_diet has 34 subjects, we will need to provide additional metadata matching samples to their subjects. Our metadata also contains collection time points for each sample, which will allow us to ensure samples are ordered chronologically.

Let’s select the subject “MCTs01”. How many samples are from MCTs01?

length(metadatasample_diet$subject[metadatasample_diet$subject == "MCTs01"])

paramList <- prepanel(otudata = otusample_diet, metadata = metadatasample_diet, subj = "MCTs01")
[1] 15

Constructed an OTU table and other variables with the following settings:
thresh_prevalence: 80
thresh_abundance: 0.5
thresh_NA: 5
subj: MCTs01

35 OTUs met the filtering requirements, with the following stats:
     OTU_ID Average_abundance Prevalence Num_missing_samples
1   taxon 1        23.4808510        100                   0
2   taxon 2         7.0220155        100                   0
3   taxon 3         3.6456202        100                   0
4   taxon 4         5.6399479        100                   0
5   taxon 6         3.4120092        100                   0
6   taxon 8         1.7292719        100                   0
7   taxon 9         2.9895162        100                   0
8  taxon 10         1.9505486        100                   0
9  taxon 11         1.3130744        100                   0
10 taxon 12         0.8252183        100                   0
11 taxon 13         2.8683227        100                   0
12 taxon 14         4.4062186        100                   0
13 taxon 15         1.3139743        100                   0
14 taxon 17         0.8876090        100                   0
15 taxon 20         1.4390210        100                   0
16 taxon 21         1.0271952        100                   0
17 taxon 23         1.6986533        100                   0
18 taxon 29         1.0616267        100                   0
19 taxon 30         4.7897075        100                   0
20 taxon 31         1.1059012        100                   0
21 taxon 36         1.0114477        100                   0
22 taxon 38         0.6557752        100                   0
23 taxon 39         0.8448858        100                   0
24 taxon 44         0.7525526        100                   0
25 taxon 45         0.6716049        100                   0
26 taxon 49         0.8346752        100                   0
27 taxon 51         0.6398042        100                   0
28 taxon 53         0.5779908        100                   0
29 taxon 59         1.9856799        100                   0
30 taxon 60         0.5668495        100                   0
31 taxon 63         3.0629905        100                   0
32 taxon 67         0.9129272        100                   0
33 taxon 71         0.5442007        100                   0
34 taxon 82         0.5669060        100                   0
35 taxon 85         0.5926795        100                   0
`biomehorizonpkg_otu_stats` was outputted to the environment.

Great. Now you should see in the console, the function selected several OTUs considered the most “important”. By default, this is done using four filtering standards.

  • thresh_prevalence: How many samples in this OTU contain at least one sample read that maps to this OTU?
  • thresh_abundance: Out of the samples that contain reads for this OTU, what was the average abundance of this OTU? i.e., what percentage of total reads in the sample are from this OTU?
  • thresh_abundance_override: The same measurement as thresh_abundance, but if this higher threshold is reached, it overrides all other standards and the OTU is included.
  • thresh_NA: This sets the maximum allowed percentage of samples with missing data (NA) for this OTU.

As an example, taxon 2 appears in 15/15 samples, giving it a “prevalence score” of 100%. Out of the 15 samples with at least one read, the average proportion of total reads is 0.0702, giving it an “abundance score” of 7.02%. This meets the default standards of 80% prevalence and 0.5% abundance, so the microbe is included.

We can view the sample reads that were aggregated to produce these prevalence and abundance scores for taxon 2 below. Note that since there are no zero values, prevalence is 100%, and thus abundance is equal to the mean of all values.

library(dplyr)
## Retrieve samples from MCTs01
otusample_subj1 <- otusample_diet %>%
	select(taxon_id, as.character((metadatasample_diet %>% filter(subject=="MCTs01"))$sample))

## Samples reads for taxon 2
otusample_subj1 %>%
	filter(taxon_id == "taxon 2") %>%
	select(-taxon_id) %>%
	as.numeric()
[1] 63019  4537 31306 16093 14077 24294 31317 35236 18594 52592 16422 39671 13605 32534  3230

These 35 OTUs were selected using the default filtering thresholds, but maybe we want stricter standards.

paramList <- prepanel(otudata = otusample_diet, metadata = metadatasample_diet, subj = "MCTs01",
thresh_prevalence = 90, thresh_abundance = 1.5)
Constructed an OTU table and other variables with the following settings:
thresh_prevalence: 90
thresh_abundance: 1.5
thresh_NA: 5
subj: MCTs01

14 OTUs met the filtering requirements, with the following stats:
     OTU_ID Average_abundance Prevalence Num_missing_samples
1   taxon 1         23.480851        100                   0
2   taxon 2          7.022015        100                   0
3   taxon 3          3.645620        100                   0
4   taxon 4          5.639948        100                   0
5   taxon 6          3.412009        100                   0
6   taxon 8          1.729272        100                   0
7   taxon 9          2.989516        100                   0
8  taxon 10          1.950549        100                   0
9  taxon 13          2.868323        100                   0
10 taxon 14          4.406219        100                   0
11 taxon 23          1.698653        100                   0
12 taxon 30          4.789708        100                   0
13 taxon 59          1.985680        100                   0
14 taxon 63          3.062990        100                   0
`biomehorizonpkg_otu_stats` was outputted to the environment.

Alternatively, we can manually select OTUs.

paramList <- prepanel(otudata = otusample_diet, metadata = metadatasample_diet, subj = "MCTs01",
otulist = c("taxon 1", "taxon 2", "taxon 10", "taxon 14"))

Constructing the Horizon Plot

After refining the data with prepanel, we supply the parameter list to horizonplot to construct the horizon plot.

## Basic plot using default filtering thresholds
paramList <- prepanel(otudata = otusample_diet, metadata = metadatasample_diet, subj = "MCTs01")

horizonplot(paramList)

## Select microbes manually
paramList <- prepanel(otudata = otusample_diet, metadata = metadatasample_diet, subj = "MCTs01",
otulist = c("taxon 1", "taxon 2", "taxon 10", "taxon 14"))

horizonplot(paramList)

Note that in the plot with manual selection, microbes are arranged according to their order in otulist. If you want to arrange microbe panels in a specific order, you should use this vector.

NOTE: If you are experiencing issues running the horizonplot() command with your own datasets, make sure you are importing your data correctly. Ideally, you should import data as a csv using the read.csv command (e.g., otudata <- read.csv('</path/to/otutable>/<filename>.csv')).

Plot a Single Microbe Across Multiple Subjects

Rather than plotting multiple microbes in one subject, we can also plot one microbe to compare the same timepoint across multiple subjects. To use this setting, however, subjects must have the same number of samples collected on the same days. We will subset the diet data set to 6 subjects who were sampled all 17 days of the study.

## Subset the data set to the subjects who were sampled on all 17 days, and arrange by date
metadata_17 <- metadatasample_diet %>%
  filter(subject %in% c("MCTs08","MCTs18","MCTs23","MCTs26","MCTs33","MCTs36")) %>%
  arrange(subject, collection_date)

otu_17 <- otusample_diet %>%
  select(taxon_id, as.character((metadatasample_diet %>% filter(subject %in% 
  c("MCTs08","MCTs18","MCTs23","MCTs26","MCTs33","MCTs36")))$sample))

## Single variable analysis with "Taxon 1"
paramList <- prepanel(otudata = otu_17, metadata = metadata_17, singleVarOTU = "taxon 1", subj = 		
	c("MCTs08","MCTs18","MCTs23","MCTs26","MCTs36","MCTs33"))

horizonplot(paramList)

Based on this graph we might then infer that subjects MCTs18, MCTs23, and MCTs26 all have a decreased abundance of Taxon 1 at sampling timepoint 12. Notice that subject facets are arranged according to their order in the vector supplied to subj.

Adjust bands to highlight common or rare OTUs

To emphasize different properties of the data, the bands on a horizon plot can be adjusted. There are three metrics which define the range of values on the horizon plot: origin, band thickness, and nbands.

Default values

The base of the first positive band for an OTU, where the Y-axis value=0, is the origin. The Y-scale height of each band is the band thickness. By default, the origin for each OTU is calculated as the median value of that OTU across all samples, and band widths represent 4 quartiles above (blue bands) and 4 quartiles below (red bands) the origin relative to the absolute extreme value for that OTU.

In other words, if OTU A has relative abundance values ranging from 0% to 30% with a median of 10%, then each band represents an abundance range of (30-10)/4 = 5%. Thus, a band colorscale value of +2 for OTU A at timepoint 1 indicates that OTU A had a relative abundance between (min = 10 + 5, max = 10 + 2*5) = 15-20% at timepoint 1, while a band colorscale value of -2 for OTU A at timepoint 2 indicates that OTU A had a relative abundance ranging from 0-5% at timepoint 2. Because the distance between the maximum and the origin (30-10% = 20%) is greater than the distance between the minimum and the origin (10-0% = 10%) for OTU A, there are no timepoints with a band colorscale of -3 or -4.

By scaling within each OTU, the dynamics of multiple OTUs that may vary in median abundance by orders of magnitude can be visualized on the same graph.

However, we can add several modifications to the horizon plot to emphasize different aspects of our longitudinal data.

Custom values

First, we change the number of positive bands into which data is segmented.

paramList <- prepanel(otudata = otusample_diet, metadata = metadatasample_diet, subj = "MCTs01", nbands = 3)

horizonplot(paramList)

Including more horizon bands will more precisely distinguish values and emphasize those at the highest ranges of magnitude. Using fewer bands will de-emphasize values at the extreme ends of the data.

We can also change the origin, the value at which all sample values will be centered. We can supply this either as a constant, to set a fixed origin value for all OTUs, or as a function that operates on sample values, to evaluate a unique origin for each panel.

## Origin as the mean absolute deviation of sample values
paramList <- prepanel(otudata = otusample_diet, metadata = metadatasample_diet, subj = "MCTs01", origin = function(y) { mad(y, na.rm = TRUE) })

horizonplot(paramList)

## Set a fixed origin of 1% for all OTU subpanels
paramList <- prepanel(otudata = otusample_diet, metadata = metadatasample_diet, subj = "MCTs01", origin = 1)
horizonplot(paramList)

Similarly, we can modify the band thickness, the height of each horizontal band denoted by a unique color, which determines the scale of a horizon subplot.

## Set band thickness to 1/6 the distance between the origin and maximum value
paramList <- prepanel(otudata = otusample_diet, metadata = metadatasample_diet, subj = "MCTs01", band.thickness = function(y) {max((abs(y - origin(y))), na.rm=TRUE) / 6})

horizonplot(paramList)

Here, because the top of the highest band is only 4/6 of the maximum value, this becomes the new maximum and all higher values are rounded down. The same is true for negative bands. This can be valuable for de-emphasizing outliers at the extrema of the data.

## Fixed band thickness of 10%
paramList <- prepanel(otudata = otusample_diet, metadata = metadatasample_diet, subj = "MCTs01", band.thickness = 10)

horizonplot(paramList)

## Fixed band thickness of 2%
paramList <- prepanel(otudata = otusample_diet, metadata = metadatasample_diet, subj = "MCTs01", band.thickness = 2)

horizonplot(paramList)

## Fixed band thickness of 0.2%
paramList <- prepanel(otudata = otusample_diet, metadata = metadatasample_diet, subj = "MCTs01", band.thickness = 0.2)

horizonplot(paramList)

Notice that at smaller values of band.thickness, an increasing number of values are above the new maximum or below the new minimum, resulting in more extreme bands (at +4 or -4). This accentuates changes in microbes with low abundances but compresses change in microbes with larger abundances, making small increases ambiguous from large increases.

## Fixed origin AND fixed band thickness
paramList <- prepanel(otudata = otusample_diet, metadata = metadatasample_diet, subj = "MCTs01", origin = 1, band.thickness = 1)
horizonplot(paramList)

Setting a fixed origin and band thickness lets us compare values between facets. For example, in nearly all samples, taxon 2 is more abundant than taxon 17. We can’t say this about a plot with a variable origin, as values are not centered to the same zero. Similarly, a variable band thickness means the distance of a positive value from the origin is not consistent between subplots.

Dealing with Missing and/or Irregularly Spaced Data

Missing data points are common in microbiome data sets. To deal with the occasional missing data point, the package offers tools to transform the data into a regularly spaced time series. To do this, we specify an interval of time at which to interpolate new data. Because the diet study was conducted daily over 17 days, we specify an interval of 1 day. As an example, we use subject MCTs16, who was sampled on 14 / 17 days.

## Adjust data to a regular time interval of 1 day
paramList <- prepanel(otudata = otusample_diet, metadata = metadatasample_diet, subj = "MCTs16", regularInterval = 1)

We can see the sample collection days starting from day 1 by viewing the timestamps variable from the output list of prepanel. This is the third element of the list (you can view components of the list in the horizonplot function documentation).

paramList[[3]]
  [1]  1  3  4  5  6  8  9 10 11 12 14 15 16 17

Each new value will be linear interpolated from the previous and subsequent timepoints. We can then plot the interpolated points with the observed data points.

horizonplot(paramList)

Note that because data have been interpolated to regular intervals, the x-axis label has changed to ‘Day’ rather than ‘Sample’.

Interpolation may not be appropriate for data sets with large gaps between collection days. For irregularly spaced microbiome data, such as those collected as part of observational sampling in wild animal microbiome studies or collected during certain times of the year over multiple years (e.g. summer field seasons), regularizing can introduce inaccuracy by interpolating across large timespans. Further, plotting data without accounting for temporal differences in sampling can potentially create misleading plots. For example, subject Baboon_388 had 59 samples collected over 3 1/2 years (1316 days).

## Plot samples from subject Baboon_388
paramList <- prepanel(otudata = otusample_baboon, metadata = metadatasample_baboon, subj = "Baboon_388")

horizonplot(paramList)

Because otusample_baboon is irregularly spaced, i.e. the distance of time between samples is not consistent throughout the time series, the timescale on the plot is misleading, even if we regularize the data by interpolating between points. We can reduce this inaccuracy by specifying the maximum amount of time without samples allowed (i.e. a gap) to create an interpolated timepoint. For example, if we interpolate a point every 50 days, but set the maximum gap between samples at 100 days, two samples that are >100 days apart will not have an interpolated timepoint. Instead, it will produce a break in the time axis, and data will be regularized separately on both sides of the break. This break is simulated by splitting the plot into two facets.

## Set maxGap to 75
paramList <- prepanel(otudata = otusample_baboon, metadata = metadatasample_baboon, subj = "Baboon_388", regularInterval = 25, maxGap = 75)

# Append custom set of axis ticks to avoid overlapping labels
horizonplot(paramList) +
	ggplot2::scale_x_continuous(expand = c(0,0),
	breaks = seq(from = 0, by = 200, to = 1200))

We can see the sample collection days starting from day 1.

paramList[[3]]
 [1]    1  157  162  172  183  313  415  424  427  430  432  436  443  455  477  479  485
[18]  492  508  511  514  519  569  635  655  691  697  760  773  821  847  877  988 1047
[35] 1068 1071 1071 1074 1076 1087 1093 1096 1096 1099 1102 1107 1118 1121 1121 1136 1150
[52] 1157 1178 1202 1214 1219 1235 1274 1316
horizonplot(paramList)

If many breaks in the time axis are created, this could result in facets with very few samples. For example, if maxGap = 150 and the first two samples are at days 1 and 157, the first facet would contain only one sample (day 1). You can set the minimum number of samples required to include a facet. When a breaks result in facets with fewer than the specified minimum, those facets are not shown. The default = 2 samples, so this graph does not start at day 1.

## Remove facets with <5 samples
paramList <- prepanel(otudata = otusample_baboon, metadata = metadatasample_baboon, subj = "Baboon_388", regularInterval = 25, maxGap = 75, minSamplesPerFacet = 5)

horizonplot(paramList) +
	ggplot2::scale_x_continuous(expand = c(0,0),
	breaks = seq(from = 0, by = 100, to = 1300))

Customizing Plot Aesthetics

In addition to the default OTU ID labels for each microbe subplot, we can also label facets by their taxonomy. This method will display the most narrow level of classification available for each microbe as a facet label. To do this, we need to supply a third data set with taxonomy information, taxonomysample.

## Supply taxonomysample and set facetLabelsByTaxonomy to TRUE
paramList <- prepanel(otudata = otusample_diet, metadata = metadatasample_diet, taxonomydata = taxonomysample_diet$taxonomy, subj = "MCTs01", facetLabelsByTaxonomy = TRUE)

horizonplot(paramList)

Alternatively, we can supply custom facet labels. These will be applied to facet subplots from top to bottom, in the order they are specified to the vector.

## Apply custom alphabetical facet labels
paramList <- prepanel(otudata = otusample_diet, metadata = metadatasample_diet, taxonomydata = taxonomysample_diet, subj = "MCTs01", otulist = c("taxon 1", "taxon 2", "taxon 10", "taxon 14"), customFacetLabels = LETTERS[1:4])

horizonplot(paramList)

We can further customize the horizon plot by supplementing a list of aesthetics to the horizonplot() function. We obtain this list by calling the horizonaes() function with the custom aesthetics to override default values. If no custom aesthetics are specified to horizonplot(), default aesthetics will be retrieved by calling horizonaes() with no arguments.

paramList <- prepanel(otudata = otusample_diet, metadata = metadatasample_diet, taxonomydata = taxonomysample_diet, subj = "MCTs01")

## Add a title; override x-label, y-label, and legend title defaults; adjust legend position
horizonplot(paramList, aesthetics = horizonaes(title = "Microbiome Horizon Plot", xlabel = "Samples from Subject MCTs01", ylabel = "Taxa found in >80% of samples", legendTitle = "Quartiles Relative to Taxon Median", legendPosition	= "bottom"))

## Remove a default aesthetic
horizonplot(paramList, aesthetics = horizonaes(xlabel = NULL))

We can supply a new color scale for horizon bands as a vector of hexadecimal color codes ordered from the most negative to the most positive band. The length of the vector is equal to 2 * the number of positive bands as specified in prepanel.

## How many positive bands?
paramList[[14]]
[1] 4
## Supply custom color scale of length 8
library(RColorBrewer)
horizonplot(paramList, aesthetics = horizonaes(col.bands = brewer.pal(8, "PiYG")))

Most commonly relevant aesthetics are returned by horizonaes(), but if we want to add other aesthetics not included in this function, we can manually append them to the horizon plot object using ggplot. For example, we can label the x-axis by date.

## Label the x-axis by date

paramList <- prepanel(otudata = otusample_baboon, metadata = metadatasample_baboon, subj = "Baboon_388", regularInterval = 25, maxGap = 75)

dateVec = as.Date(c(500, 600, 700, 800, 1100, 1200, 1300), origin =min(subset(metadatasample_baboon, subject == "Baboon_388")$collection_date)-1)

horizonplot(paramList, aesthetics = horizonaes(col.bands = brewer.pal(8, "PiYG"))) +
	ggplot2::scale_x_continuous(expand = c(0,0),
	breaks = c(500, 600, 700, 800, 1100, 1200, 1300),
	labels = dateVec) +
  ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 45, hjust = 1))

Reproducing Plots from the Paper

The following code can be used to reproduce all plots shown in the paper.

library(biomehorizon)
library(dplyr)
library(cowplot)

#### horizonplot component of Fig 1A ####

## Subset the data set to the subjects who were sampled on all 17 days, and arrange by date
metadata_17 <- metadatasample_diet %>%
  filter(subject %in% c("MCTs08","MCTs18","MCTs23","MCTs26","MCTs33","MCTs36")) %>%
  arrange(subject, collection_date)

otu_17 <- otusample_diet %>%
  select(taxon_id, as.character((metadatasample_diet %>% filter(subject %in% c("MCTs08","MCTs18","MCTs23","MCTs26","MCTs33","MCTs36")))$sample))

## Single variable analysis with "Taxon 1"
paramList <- prepanel(otudata = otu_17, metadata = metadata_17, singleVarOTU = "taxon 1")

png("plot_by_subject.png", width = 200, height = 125, units = 'mm', res = 300)

horizonplot(paramList)

dev.off()

#### Fig 1B ####

paramList <- prepanel(otudata = otusample_diet, metadata = metadatasample_diet, taxonomydata = taxonomysample_diet$taxonomy, subj = "MCTs01", facetLabelsByTaxonomy = TRUE, thresh_abundance = 0.75)

png("plot_taxonomy_labels.png", width = 200, height = 150, units = 'mm', res = 300)

horizonplot(paramList)

dev.off()

#### Fig 1C ####

paramList <- prepanel(otudata = otusample_diet, metadata = metadatasample_diet, taxonomydata = taxonomysample_diet$taxonomy, subj = "MCTs01", facetLabelsByTaxonomy = TRUE, origin = 1, band.thickness = 10, thresh_abundance = 0.75)

png("plot_origin1_bandthick10.png", width = 200, height = 150, units = 'mm', res = 300)

horizonplot(paramList)

dev.off()

#### Fig 1D ####

library(RColorBrewer)
paramList <- prepanel(otudata = otusample_baboon, metadata = metadatasample_baboon, subj = "Baboon_388", regularInterval = 25, maxGap = 75)

dateVec = as.Date(c(500, 600, 700, 800, 1100, 1200, 1300), origin =min(subset(metadatasample_baboon, subject == "Baboon_388")$collection_date)-1)

png("plot_customaes.png", width = 400, height = 150, units = 'mm', res = 300)

horizonplot(paramList, aesthetics = horizonaes(col.bands = brewer.pal(8, "PiYG"), title = "Microbiome Horizon Plot for Subject Baboon_388", xlabel = "Collection date", ylabel = "Taxa found in >80% of samples", legendTitle = "Quartiles Relative to Taxon Median", legendPosition	= "bottom")) +
  ggplot2::scale_x_continuous(expand = c(0,0),
                              breaks = c(500, 600, 700, 800, 1100, 1200, 1300),
                              labels = dateVec) +
  ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 45, hjust = 1))

dev.off()