Companion website for:

"Structure and function of the global ocean microbiome"

Sunagawa, Coelho, Chaffron, et al., Science, 2015

North Pacific Ocean, South Pacific Ocean, North Atlantic Ocean, South Atlantic Ocean, Southern Ocean, Indian Ocean, Mediterranean Sea, Red Sea

Interactive map of 68 representative (out of 210) Tara Oceans sampling stations

Companion Website Tables

Tables W1-W8

Download: Companion Website Tables W1-W8

This spreadsheet file contains the following tables:

Table W1: Tara Oceans sample description

Table W2: Sequencing statistics

Table W3: Reference genomes

Table W4: Source and statistics of genes used to generate the OM-RGC

Table W5: Descriptive statistics for miTAG-based analyses

Table W6: List of functional marker genes

Table W7: List of clusters of orthologous groups

Table W8: Associated metadata used for analysis

Companion Website Data

Ocean Microbial Reference Catalog

Abundance profiling of samples originating from spatially distant locations using a taxonomically and functionally annotated reference gene catalog provides the framework for answering the simple, yet central question that quantitative metagenomics attempts to address: "Who is doing what and where?" We assembled billions of DNA shotgun sequencing reads to reconstruct microbial community genomes. Genes were predicted on these "contigs" and, together with genes from publicly available genomic and metagenomic data sets, clustered to generate a non-redundant set of reference genes. Public data sets include:

  • Global Ocean Sampling (GOS) expedition
  • Pacific Ocean Virome (POV) project
  • NCBI reference genomes (relevant to the marine environment)
  • Moore Microbial Genome Sequencing Project: phage/viral genomes (MPVG)

The complete catalog can be downloaded here:

OM-RGC as single compressed TSV file [ Size: 7.8 GB | MD5 sum: 74cb7a263ba785df918210b2ec4560fe ]
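Because the catalog is a multi-gigabyte download, it may be worth checking the file against the MD5 sum listed above before using it. The following is a minimal sketch in Python; the function names are illustrative and not part of any Tara Oceans tooling, and the local file name is whatever you saved the download as.

```python
import hashlib

# Checksum published above for the compressed OM-RGC TSV file.
EXPECTED_MD5 = "74cb7a263ba785df918210b2ec4560fe"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that a multi-gigabyte file never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_download` with the path to the downloaded file returns `True` only if the transfer was complete and uncorrupted. The same helper can be reused for the other files on this page by passing their listed MD5 sums as `expected`.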

For additional information on the file-scheme, please refer to the following document:


To generate a FASTA file using Perl, execute:
zcat OM-RGC_seq.release.tsv.gz | perl -lane 'print ">$F[0]\n$F[-1]" unless $.==1' > OM-RGC_seq.release.fna
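For readers without Perl, the same conversion can be sketched in Python. Like the one-liner above, this assumes that the first line is a header, that the first column holds the gene identifier and that the last column holds the nucleotide sequence. One deliberate difference: it splits on tabs only, whereas `perl -a` splits on any whitespace. The function name is illustrative.

```python
import gzip


def tsv_to_fasta(tsv_gz_path, fasta_path):
    """Convert a gzipped TSV to FASTA, one record per data row.

    Assumes (as the Perl one-liner does) that row 1 is a header,
    column 1 is the identifier and the last column is the sequence.
    Returns the number of records written.
    """
    written = 0
    with gzip.open(tsv_gz_path, "rt") as src, open(fasta_path, "w") as dst:
        next(src, None)  # skip the header line
        for line in src:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue  # skip blank or malformed rows
            dst.write(f">{fields[0]}\n{fields[-1]}\n")
            written += 1
    return written
```

For example, `tsv_to_fasta("OM-RGC_seq.release.tsv.gz", "OM-RGC_seq.release.fna")` streams the file row by row, so memory use stays small regardless of catalog size.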

For more information on the taxonomic composition of the OM-RGC, we generated an interactive pie-chart.

Taxonomic profiles

For each prokaryote-enriched sample (N=139), we extracted metagenomic merged Illumina reads (miTAGs) that contained signatures of the 16S rRNA gene (Logares et al. 2013):

16SrRNA miTAG sequences [ Size: 509 MB | MD5 sum: 0f56e3caef6cfd501013db9c9b7d4e6f ]

These fragments were mapped to a set of 16S reference sequences that were downloaded from the SILVA database (Release 115: Quast et al. 2013) and clustered into 97% operational taxonomic units. The OTU count table was summarized at different taxonomic levels and can be downloaded here:

16S count tables:

annotated 16S OTU count table [ Size: 1.5 MB | MD5 sum: 6d3356650ce26f13ebb500b12c9c541f ]
SILVA 16S OTU reference sequences [ Size: 11 MB | MD5 sum: e544ef4c007510ce470860cdf2da1a7d ]
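Summarizing an OTU count table at a given taxonomic level amounts to summing the counts of all OTUs that share the same taxonomy prefix. The sketch below illustrates the idea under stated assumptions: the in-memory layout (a taxonomy string plus per-sample counts), the semicolon-delimited taxonomy path and the toy values are all hypothetical and do not reflect the actual layout of the downloadable table.

```python
def summarize_by_rank(otu_table, depth):
    """Sum per-sample OTU counts over the first `depth` taxonomy ranks.

    otu_table: iterable of (taxonomy, counts) pairs, where taxonomy is a
    semicolon-delimited path (assumed layout) and counts is a list with
    one value per sample. OTUs annotated to fewer than `depth` ranks are
    grouped under their full, shorter path.
    """
    summary = {}
    for taxonomy, counts in otu_table:
        ranks = [r.strip() for r in taxonomy.split(";") if r.strip()]
        key = ";".join(ranks[:depth])
        previous = summary.get(key, [0] * len(counts))
        summary[key] = [a + b for a, b in zip(previous, counts)]
    return summary


# Toy data for illustration only (two samples, made-up counts).
toy_table = [
    ("Bacteria;Proteobacteria;Alphaproteobacteria", [10, 0]),
    ("Bacteria;Proteobacteria;Gammaproteobacteria", [5, 2]),
    ("Bacteria;Cyanobacteria;Cyanobacteria", [3, 7]),
]
print(summarize_by_rank(toy_table, 2))
# {'Bacteria;Proteobacteria': [15, 2], 'Bacteria;Cyanobacteria': [3, 7]}
```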

In addition, we identified and clustered universal, protein-coding, single-copy phylogenetic marker genes into metagenomic operational taxonomic units (mOTUs) that group organisms into species-level clusters [Mende et al. 2013, Sunagawa et al. 2013].

The relative mOTU abundance table can be downloaded here:

mOTU relative abundances [ Size: 968 KB | MD5 sum: ad00bb7f257925569ce1567c9c099ff0 ]

Functional profiles

After generating the reference gene catalog, reads from each sample were mapped to the catalog to estimate functional and taxonomic abundances. For each prokaryote-enriched sample (N=139), the abundance of each gene in the OM-RGC was determined using MOCAT (Kultima et al. 2012). Based on the functional annotations of the OM-RGC, these gene abundances were summarized at the level of: (i) eggNOG gene families (genes annotated to the eggNOG version 3 database: Powell et al. 2011), (ii) KEGG orthologous groups and (iii) KEGG modules. We provide profiles based on the full catalog and based on the subset of genes that were annotated as bacterial or archaeal.

Full profiles:

eggNOG profile [ Size: 49 MB | MD5 sum: 3aa5ff97943bffd8545ece90d23836d7 ]
KEGG orthologous group [ Size: 15 MB | MD5 sum: 8f6c68d231f06d184e06a571ad83f493 ]
KEGG modules [ Size: 900 KB | MD5 sum: 197229c41caa2131620f5abeaa0c6289 ]

Prokaryotic OG profile:

eggNOG OG profile (prokaryotic only) [ Size: 27 MB | MD5 sum: 266272c26c2f273f3caa8c2ed07a689a ]
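The step from gene abundances to functional profiles described in this section can be pictured as summing, for one sample, the abundances of all genes annotated to the same functional group. The sketch below is an illustration and not the MOCAT implementation: the data structures, the gene and group identifiers and the toy values are hypothetical. How genes with several annotations were handled is not stated here, so the sketch assumes each annotated group receives the gene's full abundance.

```python
from collections import defaultdict


def summarize_by_function(gene_abundance, gene_to_groups):
    """Aggregate per-gene abundances of one sample into functional groups.

    gene_abundance: dict mapping gene id -> abundance in one sample.
    gene_to_groups: dict mapping gene id -> list of functional group ids
    (for example eggNOG families or KEGG orthologous groups).
    Genes without an annotation are skipped. A gene with several
    annotations contributes its full abundance to each group (assumption).
    """
    profile = defaultdict(float)
    for gene, abundance in gene_abundance.items():
        for group in gene_to_groups.get(gene, ()):
            profile[group] += abundance
    return dict(profile)


# Toy data for illustration only.
toy_abundance = {"gene1": 2.0, "gene2": 3.0, "gene3": 1.5}
toy_annotation = {"gene1": ["K00001"], "gene2": ["K00001", "K00002"]}
print(summarize_by_function(toy_abundance, toy_annotation))
# {'K00001': 5.0, 'K00002': 3.0}
```

Restricting `gene_abundance` to genes annotated as bacterial or archaeal before calling the function would give the prokaryote-only variant of a profile.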

Companion Website Information

Bioinformatic Methods

Metagenomic Assembly and Gene Prediction

Raw paired-end Illumina reads were trimmed from the 3’ and 5’ ends, and reads shorter than 45 bp were removed. To minimize artificial contamination, high-quality reads matching any adapter sequence used in the sequencing step were discarded. The remaining reads were assembled into scaftigs (contigs that were extended and linked using the paired-end information of sequencing reads). Only scaftigs longer than 500 bp were kept. After assembly, gene-coding sequences were predicted on the assembled scaftigs. The data can be downloaded from the EBI-ENA:

  • Tara Oceans shotgun sequencing and barcoding data - EBI-ENA: PRJEB402
  • Predicted genes - http://www.ebi.ac.uk/ena/data/view/ERZ096909-ERZ097151
  • Metagenomic assemblies - http://www.ebi.ac.uk/ena/about/tara-oceans-assemblies

    Summary statistics of initial data processing steps are listed below in an interactive table.

    Gene prediction on external data (GOS, POV, MPVG)

    For each of the three datasets, contigs shorter than 500 bp were discarded prior to gene prediction. The number of predicted genes for each dataset was: GOS: 22,607,701; POV: 1,777,775; and MPVG: 14,790 (see also Data sources of the OM-RGC).
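A length cut-off such as the 500 bp threshold above is simple to apply to any FASTA file. The following sketch uses a small hand-written FASTA parser so that it has no dependencies; the function names are illustrative, and keeping sequences of exactly `min_length` is an assumption that follows the wording "shorter than 500 bp were discarded".

```python
def parse_fasta(handle):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.
    Multi-line sequences are joined; blank lines are ignored."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_min_length(records, min_length=500):
    """Yield only records whose sequence has at least `min_length` bases."""
    for header, sequence in records:
        if len(sequence) >= min_length:
            yield header, sequence
```

Used together, `filter_min_length(parse_fasta(open(path)), 500)` streams through a contig file and yields only the contigs that would pass the cut-off.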

    Clustering of predicted genes into the OM-RGC

    137,523,700 sequences from the datasets were clustered at 95% identity into a set of 40,154,822 non-redundant sequences (sequences shorter than 100 bp were discarded) using CD-HIT (Li et al. 2006):
    $ cd-hit-para.pl -c 0.95 -T 16 -M 0 -G 0 -aS 0.9 -g 1 -r 1 -d 0 --P cd-hit-est --S 32 --Q 8 --T SGE

    Functional annotation of the OM-RGC

    The set of non-redundant genes was functionally annotated by BLAST searches of the encoded protein sequences against eggNOG v3 and KEGG v62 using SMASH v1.6 (Arumugam et al. 2010).
    SMASH v1.6 settings for the eggNOG and KEGG BLAST searches:

    -e 0.01 -b 100 -W 3 -K 1 -F T -m 8
    Tara Oceans Coordinators

    Silvia G. Acinas, Peer Bork, Emmanuel Boss, Chris Bowler, Colomban de Vargas, Michael Follows, Gabriel Gorsky, Nigel Grimsley, Pascal Hingamp, Daniele Iudicone, Olivier Jaillon, Stefanie Kandels-Lewis, Lee Karp-Boss, Eric Karsenti, Uros Krzic, Fabrice Not, Hiroyuki Ogata, Stephane Pesant, Jeroen Raes, Emmanuel G. Reynaud, Christian Sardet, Mike Sieracki, Sabrina Speich, Lars Stemmann, Matthew B. Sullivan, Shinichi Sunagawa, Didier Velayoudon, Jean Weissenbach, Patrick Wincker.

    Tara Oceans Contributors

    Ameer Abdulla, Chantal Abergel, Denis Allemand, Aldine Amiel, Leif Anderson, David Antoine, Detlev Arendt, Roberto Arrigoni, Defne Arslan, Francois Artiguenave, Stephane Audic, Jean-Marc Aury, Marcel Babin, Celine Bachelier, Xavier Bailly, Andrew Baker, Cecilia Balestra, Benedetto Barone, Daniela Basso, Daniel Bayley, Gregory Beaugrand, Laurent Beguery, Elia Benito-Guttierez, Francesca Benzoni, Eric Beraud, Lionel Bigot, Lucie Bittner, Martine Boccara, Roxane Boonstra, Peer Bork, Emmanuel Boss, Christophe Boutte, Chris Bowler, Annick Bricaud, Jennifer Brum, Jeremie Capoulade, Luigi Caputi, Annalisa Caragnano, Margaux Carmichael, Raffaela Casotti, Ivona Cetinic, Samuel Chaffron, Aurelie Chambouvet, Patrick Chang, Ali Chase, Claudia Chica, Herve Claustre, Jean-Michel Claverie, Camille Clerissi, Sebastien Colin, Montse Coll-Llado, Steeve Comeau, Christian Conrad, Laurent Coppola, Miguel Francisco Cornejo, Marcella Cornejo, Daniel Cossa, Maryam Cousin, Corinne Cruaud, Corrine Cuck, Marcela Ottone, Corinne Da Silva, Denis Dausse, Denis de la Broise, Silvia De Monte, Colomban de Vargas, Johan Decelle, Alan Deidun, Javier del Campo, Evelyne Derelle, Yves Desdevises, Elodie Desgranges, Valerie Desplanches, Floriane Despres, Nicolas Desreumaux, Rosanna di Mauro, Celine Dimier, John Dolan, Fabrizio D'Ortenzio, Francesco d'Ovidio, Anne Doye, Melissa Duhaime, Emilie Duperche, Xavier Durrieu de Madron, Stephanie Dutkiewicz, Karoline Faust, Janine Felden, Beatriz Fernandez, Isabel Ferrera, Regis Ferriere, Christine Ferrier-Pages, Mick Follows, Rainer Friedrich, Francoise Gaill, Alexandre Ganachaud, Laurence Garczarek, Josep M Gasol, Stephane Gasparini, Jean-Pierre Gattuso, Gabriella Gilkes, Jennifer Gillette, Silvia G. Acinas, Gabriel Gorsky, Brett Grant, Nigel Grimsley, Jean-Michel Grisoni, Michel Groc, Lionel Guidi, Cedric Guigand, Luis Gutierrez-Herredia, Roland Hellig, Pascal Hingamp, Danwei Huang, Julio Ignacio-Espinoza, Daniele Iudicone, Olivier Jaillon, Jean-Louis Jamet, Stefanie Kandels-Lewis, Lee Karp-Boss, Eric Karsenti, Michael Katinka, Yuko Kitano, Zbigniew Kolber, Philippe Koubbi, Uros Krzic, Hironobu Kukami, Karine Labadie, Pamela Labbe-Ibanez, Tomas Larsson, Alban Lazar, Herve Le Goff, Corinne Le Quere, Brian Leander, Philippe Lebaron, Noan LeBescot, Thomas Lefort, Louis Legendre, Cristophe Lejeusne, Cyrille Lepoivre, Magali Lescot, Mangan Lewis, Edouard Leymarie, Gipsi Lima-Mendez, Ramiro Logares, Frederic Mahe, Cornelia Maier, Shruti Malviya, Catarina Marcolin, Claudie Marec, Sophie Marinesque, Ramon Massana, Lydiane Mattio, Maria Grazia Mazzochi, Raphael Morard, Herve Moreau, Pascal Morin, Simon Morisset, David Mountain, Paul Muir, Harry Nelson, Sophie Nicaud, Paul Nival, Benjamin Noel, Fabrice Not, Grigor Obolensky, David Obura, Hiroyuki Ogata, Philippe Pages, Claude Payri, Javier Paz Yepes, Carlos Pedros-Alio, Eric Pelletier, Rainer Pepperkok, Fabien Perault, Yvan Perez, Stephane Pesant, Marc Picheral, Michel Pichon, Gwenael Piganeau, Ruby Pillay, Olivier Poirot, Julie Poulain, Nicole Poulton, Franck Prejger, Judit Prihoda, Ian Probert, Gabriele Procaccini, Jeroen Raes, Jeannine Rampal, Josephine Ras, Gilles Reverdin, Emmanuel G. Reynaud, Stephanie Reynaud, Francois Ribalet, Maurizio Ribera d'Alcala, Eric Roettinger, Sarah Romac, Jean-Baptiste Romagnan, Cecile Rottier, Francois Roullier, Christian Rouviere, Anne Royer, Marta Royo Llonch, Martina Sailerova, Guillem Salazar, Gaelle Samson, Sebastien Santini, Christian Sardet, Hugo Sarmento, Eleonora Scalco, Claude Scarpelli, Antoine Sciandra, Sarah Searson, Raffaele Siano, Mike Sieracki, Bianca Silva, Oleg Simakov, Sergei Solonenko, Sabrina Speich, Silvia Spezzaferri, Fabio Stalder, Fabrizio Stefani, Halldor Stefansson, Ernst Stelzer, Lars Stemmann, Lucie Subirana, Matt Sullivan, Shinichi Sunagawa, Jarred Swalwell, Vincent Taillandier, Eric Tambutte, Sylvie Tambutte, Atsuko Tanaka, Isabelle Taupier-Letage, Pierre Testor, Anne Thompson, Doris Thuillier, Virgine Tichanne-Seltzer, Leila Tirichine, Eve Toulza, Sasha Tozzi, Jean-Eric Tremblay, Aline Tribollet, Antoine Triller, Didier Velayoudon, Alaguraj Veluchamy, Emilie Villar, Flora Vincent, Carden Wallace, Markus Weinbauer, Jean Weissenbach, Maureen Williams, Patrick Wincker, Sheree Yau, Alexis Yelton, Adriana Zingone, Didier Zoccola.

    TO LEARN MORE ABOUT TARA OCEANS VISIT: http://embl.de/tara-oceans/