Links
For a description of the approach, go to description.
For a step-by-step description to detect the regulons, go to quick start.
The iRegulon plugin can be used to answer several questions; see the Application examples presented in our tutorial.
For additional questions, please consult the FAQ.
General Description
A regulon consists of a transcription factor (TF) and its direct transcriptional targets, which
contain common TF binding sites in their cis-regulatory control elements. The iRegulon plugin
allows you to identify regulons using motif discovery in a set of co-regulated genes.
The Cytoscape plugin works as a Java client connected to a server-side daemon over the
internet. The iRegulon server-side daemon is implemented in Python and uses MySQL to store and
query the motif-based whole-genome rankings (see below). After a gene set or network is submitted
to the service, the results are computed on the fly and returned to the client, which takes about
one minute. The user can then browse the motif discovery results, select a TF from the
prioritized list of TFs, and add upstream regulators and direct regulator-target 'edges' to the
input gene set or network under study.
The prediction of regulons consists of the following steps.
- (A) Motif detection, consisting of two parts. The first part is the offline
scoring of the sequence search space (up to 20 kb around the transcription start site, TSS) of every gene in the
human genome using a Hidden Markov Model that detects homotypic motif clusters (Cluster-Buster).
This is done for nearly ten thousand candidate motifs (position weight matrices or PWMs), resulting in a
gene ranking for each PWM. This process is repeated for the orthologous sequences in nine
other vertebrate genomes, followed by the integration of the cross-species rankings using
rank aggregation. The second part of motif detection is the on-the-fly
identification of those motifs for which the input genes are enriched at the top of the
ranking, using the Area Under the Curve (AUC) of the cumulative recovery curve. Enriched
motifs are those with a high AUC compared to the average AUC of all motifs, and enrichment
is measured by a normalized enrichment score (NES).
- (B) Track discovery, added in version 1.2: the detection of TFs using gene rankings based on the highest ChIP peak within the regulatory space of each gene, across more than one thousand ChIP-Seq tracks.
- (C) Motif2TF mapping: the prioritization of candidate TFs that could bind
to the enriched motifs. This is achieved by finding the optimal path from a motif to a TF,
in a motif-TF network where the edges consist of motif2motif similarity, TF2TF orthology,
and motif2TF annotation. Note that the predicted upstream TF may or may not be part of
the co-expressed gene set, while the downstream targets are all part of the input gene set.
- (D) Target detection: the determination of the optimal subset of direct
target genes, namely the genes that are ranked significantly high compared to both the
genomic background and the entire motif collection.
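The on-the-fly enrichment scoring in step (A) can be illustrated with a minimal Python sketch. It is not the iRegulon server code. The function names, the `rank_threshold` parameter, the normalization of the AUC, and the toy rankings are all assumptions made for illustration. The sketch also assumes that the NES is the z-score of a motif's AUC relative to the AUCs of all motifs, which is one common reading of "a high AUC compared to the average AUC of all motifs".

```python
import numpy as np


def recovery_auc(ranking, gene_set, rank_threshold):
    """Normalized area under the cumulative recovery curve.

    ranking        -- gene identifiers ordered from best to worst for one motif
    gene_set       -- the input (co-regulated) genes
    rank_threshold -- only the top `rank_threshold` ranks are considered
                      (hypothetical parameter, chosen for illustration)
    """
    top = ranking[:rank_threshold]
    # True where the gene at that rank belongs to the input gene set.
    hits = np.isin(top, list(gene_set))
    # Cumulative recovery curve: input genes recovered up to each rank.
    recovery = np.cumsum(hits)
    # Simple normalization (an assumption): divide by the area obtained
    # if every input gene were already recovered at rank 1.
    return recovery.sum() / (rank_threshold * len(gene_set))


def normalized_enrichment_scores(aucs):
    """Assumed NES definition: z-score of each AUC over all motifs."""
    aucs = np.asarray(aucs, dtype=float)
    return (aucs - aucs.mean()) / aucs.std()


# Toy data: three hypothetical motif rankings over six genes.
rankings = {
    "motif_a": ["g1", "g2", "g3", "g4", "g5", "g6"],
    "motif_b": ["g3", "g4", "g5", "g6", "g1", "g2"],
    "motif_c": ["g3", "g1", "g4", "g2", "g5", "g6"],
}
input_genes = {"g1", "g2"}

motif_names = list(rankings)
aucs = [recovery_auc(rankings[m], input_genes, rank_threshold=4) for m in motif_names]
nes = normalized_enrichment_scores(aucs)

for name, auc, score in zip(motif_names, aucs, nes):
    print(f"{name}: AUC={auc:.3f} NES={score:.2f}")
```

In this toy example, `motif_a` ranks both input genes at the top and therefore gets the highest AUC and NES. With only three motifs the z-score is not meaningful statistically; the real collection of nearly ten thousand motifs provides a far more stable background.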
The final output is thus a list of enriched motifs/tracks, along with candidate transcription factors and, for each motif/track, a set of direct target genes. New networks can be generated automatically from the predicted TF-target interactions.
Supported organisms are human, mouse and fly.
The iRegulon plugin can also be used to query a TF-target database of high-confidence target genes predicted from the systematic analysis of thousands of cancer gene signatures.
For a more detailed description of the approach, see the paper.