CLubbeR is an automated cluster load balancing system designed to facilitate and accelerate common computational biology experimental workflows. It is used in conjunction with existing methods or scripts to efficiently process large-scale datasets. The method was developed by Maximillian Miller (BrombergLab @ Rutgers University and Rostlab @ Technical University of Munich).

GIT repository containing all sources

Docker container for clubber in the Docker Store

The fastest and easiest way to use clubber is to run the bromberglab/clubber:latest docker image, which is retrieved automatically from the docker cloud if it is not available locally:

docker run -d -p 80:80 bromberglab/clubber:latest

CLubbeR's plug-and-play framework encourages re-use of existing solutions for bioinformatics problems. CLubbeR’s goals are to reduce computation times and to facilitate use of cluster computing. The first goal is achieved by automating the balance of parallel submissions across available high performance computing (HPC) resources. Notably, these resources can be added on demand, and they may include cloud-based resources and heterogeneous environments. The second goal, making HPCs user-friendly, is facilitated by an interactive web interface and a RESTful API that allow for job monitoring and result retrieval. We used CLubbeR to speed up our pipeline for annotating molecular functionality of metagenomes. We analyzed the Deepwater Horizon oil-spill study data to quantitatively show that the beach sands have not yet entirely recovered. Further, our analysis of the CAMI-challenge data revealed that microbiome taxonomic shifts do not necessarily correlate with functional shifts. These examples (21 metagenomes processed in 172 minutes) illustrate the importance of clubber in the everyday computational biology environment.
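The idea of balancing parallel submissions across several HPC resources can be illustrated with a small sketch. This is not CLubbeR's actual scheduling logic, which is not described here; it is a generic greedy heuristic that always hands the next job to the least-loaded cluster. The function name `balance_jobs`, the job names, the cost estimates, and the cluster names are all hypothetical.

```python
import heapq

def balance_jobs(job_costs, cluster_names):
    """Greedy sketch: give each job to the currently least-loaded cluster.

    job_costs maps a job id to an estimated cost (hypothetical units).
    This is an illustrative heuristic, not CLubbeR's own algorithm.
    """
    # Heap entries are (current_load, cluster_name); the smallest load pops first.
    heap = [(0.0, name) for name in cluster_names]
    heapq.heapify(heap)
    assignment = {name: [] for name in cluster_names}
    # Placing the largest jobs first tends to give a more even split.
    for job_id, cost in sorted(job_costs.items(), key=lambda kv: -kv[1]):
        load, name = heapq.heappop(heap)
        assignment[name].append(job_id)
        heapq.heappush(heap, (load + cost, name))
    return assignment

# Hypothetical example: four metagenome samples, two clusters.
jobs = {"sample_a": 8, "sample_b": 5, "sample_c": 4, "sample_d": 3}
plan = balance_jobs(jobs, ["cluster_1", "cluster_2"])
print(plan)
```

In this toy example the two clusters end up with estimated loads of 11 and 9. A real system would also have to account for queue wait times and for resources being added on demand.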



Access Online

mi-faser is a microbiome (metagenome/metatranscriptome) analysis tool developed by Chengsheng Zhu in the BrombergLab @ Rutgers University. The method was significantly sped up and made publicly available by Maximillian Miller (BrombergLab @ Rutgers University and Rostlab @ Technical University of Munich).

mi-faser combines faser (functional annotation of sequencing reads), an algorithm that maps reads to molecular functions encoded by the read-correspondent genes, with a manually curated reference database of protein functions. As our method is optimized for short reads, no pre-assembly is required: just submit your raw (but quality-controlled) data as is.

As output, mi-faser produces high precision sets of molecular functions identified in the microbiome sequence data. mi-faser's minutes-per-microbiome processing speed is significantly faster than that of other annotation methods (less than 20min/10GB of reads), allowing for large scale comparisons. For instance, microbiome function vectors can be compared between different conditions or time-steps to highlight environment-specific and/or time-dependent changes in functionality.
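A comparison of function vectors between two conditions could look like the sketch below. It assumes that each microbiome is summarized as a mapping from a function identifier to a read count; the actual mi-faser output format is not specified here, and the function names and counts are made up for illustration.

```python
def normalize(counts):
    """Convert raw per-function read counts to relative abundances."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("function vector has no reads")
    return {func: count / total for func, count in counts.items()}

def function_shift(before, after):
    """Per-function change in relative abundance between two conditions."""
    a, b = normalize(before), normalize(after)
    # Functions missing from one condition count as zero abundance there.
    return {func: b.get(func, 0.0) - a.get(func, 0.0)
            for func in sorted(set(a) | set(b))}

# Hypothetical read counts for two time points of the same environment.
before = {"func_A": 50, "func_B": 50}
after = {"func_A": 25, "func_B": 50, "func_C": 25}
shift = function_shift(before, after)
print(shift)
```

Positive values mark functions that gained relative abundance in the second condition, negative values mark those that lost it.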

Note that although mi-faser is specifically targeted to microbiome analysis, it could also potentially be used for the analysis of unassembled bacterial genome data.



Access Online

Fusion is a method for classifying microorganisms based on their functional similarities. It was developed by Chengsheng Zhu at the Bromberg Lab, at Rutgers University, New Jersey and made publicly available by Yannick Malich.

Correctly identifying nearest “neighbors” of a given microorganism is important in industrial and clinical applications, where close relationships imply similar treatment. Today, prokaryotic taxonomy relies heavily on phylogenetics. However, evolutionary relatedness, inferred from phylogenetic markers, does not guarantee functional identity between members of the same taxon or lack of similarity between different taxa. Via comparison of all the molecular functions encoded in genomes of different microbes, we built Fusion, a novel microorganism classification network, in which two organisms (nodes) are connected by an edge whose weight is their functional similarity. Fusion uses phenetic comparisons, providing a consistent and quantitative metric for classification. It is independent of the arbitrary pairwise organism similarity cutoffs traditionally applied to establish taxonomic identity. It is also more robust in dealing with data availability biases. Fusion-defined organism clusters can be adjusted in size via resolution controls to meet specific research purposes. This dynamic feature of Fusion makes it capable of accommodating newly sequenced organisms. In addition, Fusion highlights the environmental factors behind observed microorganism diversification, along with the corresponding key functions. We believe Fusion will be a more practical choice for biomedical, industrial, and ecological applications, as many of these rely on understanding the functional capabilities of the microbes in their environment.
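The network structure described above (organisms as nodes, edges weighted by functional similarity) can be sketched as follows. The similarity measure used here, the Jaccard index over sets of functions, is an assumption chosen for illustration; the metric Fusion actually uses is not given in this description. Organism and function identifiers are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Shared functions divided by all functions found in either organism."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def build_network(functions_by_organism):
    """Return weighted edges {(org1, org2): similarity} for organism pairs
    that share at least one function (illustrative, not Fusion's own code)."""
    edges = {}
    for org1, org2 in combinations(sorted(functions_by_organism), 2):
        weight = jaccard(functions_by_organism[org1],
                         functions_by_organism[org2])
        if weight > 0:
            edges[(org1, org2)] = weight
    return edges

# Hypothetical organisms with made-up function identifiers.
organisms = {
    "org_x": {"f1", "f2", "f3"},
    "org_y": {"f2", "f3", "f4"},
    "org_z": {"f9"},
}
network = build_network(organisms)
print(network)
```

An edge list of this kind could then be loaded into a graph tool for clustering at different resolutions.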

fusionDB is a novel database that uses our functional data to represent 1,374 taxonomically distinct bacteria annotated with available metadata: habitat/niche, preferred temperature, and oxygen use. Users can search fusionDB via combinations of organism names and metadata. Moreover, the web interface allows mapping new microbial genomes to the functional spectrum of reference bacteria, rendering interactive similarity networks that highlight shared functionality.

Fusion online data1 (organism similarity)

Fusion online data2 (module assignments)

The data and scripts below are provided for review purposes and are referenced in the submitted manuscript. They are sufficient to reproduce the Fusion network as described in the manuscript. See ReadMe.txt below for more detailed information. Contact us with any questions.

Protein sequences from 1374 bacteria genomes

Gephi input to produce Fusion

Mapping from the function clusters to protein GI IDs

Python2 scripts for data processing


fusionDB example (Synechococcus bacterium, freshwater)


fusionDB ISMB2017 Supporting Online Material

fusionDB ISMB2017 Supporting Online Material - Table 3

fusionDB ISMB2017 Supporting Online Material - Table 4

fusionDB ISMB2017 Supporting Online Material - Table 5



Access Online

The type III secretion system is one of the causes of a wide range of bacterial infections in humans, animals and plants. This system comprises a hollow needle-like structure localized on the surface of bacterial cells that injects specific bacterial proteins, the so-called effectors, directly into the cytoplasm of a host cell. During infection, effectors convert host resources to their advantage and promote pathogenicity.

We (Tatyana Goldberg, Burkhard Rost and Yana Bromberg) at BrombergLab and RostLab developed a novel method, pEffect, that predicts bacterial type III effector proteins. Our method combines sequence-based homology searches (through PSI-BLAST) with advanced machine learning (Profile Kernel Support Vector Machines) to accurately predict effector proteins. We use information encoded in the entire protein sequence for our predictions.

Proteome predictions, data sets used and the standalone version of pEffect



Access Online

SNPdbe is a database of single amino acid substitutions and their predicted and experimentally derived functional effects. It was developed by Christian Schaefer at the Rost Lab at the Technical University of Munich (TUM) under the supervision of Dr. Burkhard Rost and Dr. Yana Bromberg. Most single amino acid substitutions (SAASs) lack experimental annotation of their functional impact. SNPdbe is a database and a web interface designed to fill the annotation gap left by the high cost of experimental testing for functional significance of protein variants. It joins related bits of knowledge, currently distributed throughout various databases, into a consistent, easily accessible, and updatable resource. We currently cover over 155,000 protein sequences from more than 2,600 organisms. Overall we reference more than one million SAASs consisting of natural variants (1000 Genomes data, dbSNP, etc.) and SAASs from mutagenesis experiments (SwissProt, PMD, etc.). We also report sequence mismatches resulting from differing sequence reports for the same gene. For each SNPdbe entry, all available experimental and prediction data are reported and available for download with a click of a button. Read more about SNPdbe here.



Access Online

SNAP is a method for evaluating effects of single amino acid substitutions on protein function. It was developed by Yana Bromberg at the Rost Lab, at Columbia University, New York. Single Nucleotide Polymorphisms (SNPs) represent a very large portion of all genetic variations. SNPs found in the coding regions of genes are often non-synonymous, changing a single amino acid in the encoded protein sequence. SNPs are either "neutral" in the sense that the resulting point-mutated protein is not functionally discernible from the wild-type, or they are "non-neutral" in that the mutant and wild-type differ in function. The ability to identify non-neutral substitutions in an ocean of SNPs could significantly aid targeting disease-causing detrimental mutations, as well as SNPs that increase the fitness of particular phenotypes. SNAP is a neural network-based method that uses in silico derived protein information (e.g. secondary structure, conservation, solvent accessibility, etc.) to make predictions regarding functionality of mutated proteins. The network takes protein sequences and lists of mutants as input, returning a score for each substitution. These scores can then be translated into binary predictions of effect (present/absent) and reliability indices (RI). SNAP returns its results to the user via e-mail.
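The step from a raw score to a binary prediction and a reliability index can be sketched as below. The score range, the decision threshold of 0, and the 0 to 9 reliability scale are assumptions made for illustration only; the actual SNAP scoring scheme is not described here, and `interpret_score` is a hypothetical helper.

```python
def interpret_score(score, threshold=0.0, max_abs=1.0):
    """Map a raw score to (effect_present, reliability_index).

    Assumptions (not taken from SNAP itself): scores lie in
    [-max_abs, max_abs], scores above `threshold` mean "effect present",
    and reliability grows with the distance from the threshold (0 to 9).
    """
    effect_present = score > threshold
    # Fraction of the maximum possible distance from the threshold, capped at 1.
    distance = min(abs(score - threshold) / max_abs, 1.0)
    reliability_index = int(distance * 9)
    return effect_present, reliability_index

# Hypothetical scores for three substitutions.
strong_effect = interpret_score(1.0)
weak_effect = interpret_score(0.5)
borderline = interpret_score(0.0)
print(strong_effect, weak_effect, borderline)
```

Under these assumptions, a score far from the threshold yields a confident call, while a score at the threshold yields a "neutral" call with the lowest reliability.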

Read more about SNAP

Install SNAP locally