Help


General Description

A regulon consists of a transcription factor (TF) and its direct transcriptional targets, which contain common TF binding sites in their cis-regulatory control elements. The iRegulon plugin allows you to identify regulons using motif discovery in a set of co-regulated genes.

The Cytoscape plugin is a Java client that connects over the internet to a server-side daemon. The iRegulon daemon is implemented in Python and uses MySQL to store and query the motif-based whole-genome rankings (see below). After a gene set or network is submitted to the service, the analysis runs on the fly and the results are returned to the client in about one minute. The user can then browse the motif discovery results, select a TF from the prioritized list of TFs, and add upstream regulators and direct regulator-target 'edges' to the input gene set or network under study.

The prediction of regulons consists of the following steps.

  • (A) Motif detection, consisting of two parts. The first part is the offline scoring of a sequence search space (up to 20 kb around the TSS) around every gene in the human genome using Cluster-Buster, a Hidden Markov Model that detects homotypic motif clusters. This is done for nearly ten thousand candidate motifs (position weight matrices, or PWMs), resulting in a gene ranking for each PWM. The process is repeated for the orthologous sequences in nine other vertebrate genomes, and the cross-species rankings are then integrated using rank aggregation. The second part is the on-the-fly identification of those motifs for which the input genes are enriched at the top of the ranking, using the Area Under the Curve (AUC) of the cumulative recovery curve. Enriched motifs are those with a high AUC compared to the average AUC of all motifs; the enrichment is measured by a normalized enrichment score (NES).
  • (B) Track discovery, added in version 1.2: the detection of TFs from more than one thousand ChIP-Seq tracks, where the genes are ranked for each track by the highest ChIP peak within their regulatory space.
  • (C) Motif2TF mapping: the prioritization of candidate TFs that could bind to the enriched motifs. This is achieved by finding the optimal path from a motif to a TF in a motif-TF network whose edges consist of motif2motif similarity, TF2TF orthology, and motif2TF annotation. Note that the predicted upstream TF may or may not be part of the co-expressed gene set, whereas the downstream targets are all part of the input gene set.
  • (D) Target detection: the determination of the optimal subset of direct target genes, namely the genes that are ranked significantly high compared to the genomic background and to the entire motif collection as background.
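
The on-the-fly enrichment step in (A) can be illustrated with a minimal Python sketch. It is not the iRegulon server code. It assumes that the NES of a motif is the z-score of its AUC relative to the AUCs of all motifs, and that the AUC is computed over the top `top_n` genes of each ranking and scaled by `top_n * len(gene_set)`. The function names, the scaling, and the toy rankings are hypothetical.

```python
from statistics import mean, pstdev


def recovery_auc(ranking, gene_set, top_n):
    """Area under the cumulative recovery curve over the top_n ranked genes.

    The area is divided by top_n * len(gene_set) so that it lies in [0, 1]
    (an assumed scaling, chosen only to keep the toy numbers readable).
    """
    if not gene_set or top_n <= 0:
        return 0.0
    hits = 0
    area = 0
    for gene in ranking[:top_n]:
        if gene in gene_set:
            hits += 1      # one more input gene recovered at this rank
        area += hits       # add the current height of the recovery curve
    return area / (top_n * len(gene_set))


def normalized_enrichment_scores(rankings, gene_set, top_n):
    """Assumed NES: (AUC - mean AUC of all motifs) / std of all AUCs."""
    aucs = {motif: recovery_auc(r, gene_set, top_n) for motif, r in rankings.items()}
    values = list(aucs.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All motifs recover the input genes equally well: nothing is enriched.
        return {motif: 0.0 for motif in aucs}
    return {motif: (auc - mu) / sigma for motif, auc in aucs.items()}


# Hypothetical toy data: one gene ranking per motif, best-scoring gene first.
toy_rankings = {
    "motif_a": ["g1", "g2", "g3", "g4", "g5", "g6"],
    "motif_b": ["g3", "g4", "g5", "g6", "g1", "g2"],
    "motif_c": ["g3", "g1", "g4", "g2", "g5", "g6"],
}
toy_scores = normalized_enrichment_scores(toy_rankings, {"g1", "g2"}, top_n=3)
for motif, nes in sorted(toy_scores.items(), key=lambda item: -item[1]):
    print(motif, round(nes, 2))
```

In this toy example `motif_a` places both input genes at the top of its ranking and therefore receives the highest score, while `motif_b` recovers neither gene within the top three and receives the lowest.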

The final output is thus a list of enriched motifs/tracks, along with candidate transcription factors and, for each motif/track, a set of direct target genes. New networks can be generated automatically from the predicted TF-target interactions.
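
How such an output could be turned into TF-target edges is sketched below in Python. The record layout (`feature`, `tfs`, `targets`) and all identifiers are hypothetical placeholders, not the plugin's actual data model. Each edge keeps the list of motifs/tracks that support it.

```python
from collections import defaultdict


def tf_target_edges(results, selected_tf):
    """Collect (TF, target) edges for one selected TF.

    `results` is an assumed layout: one dict per enriched motif/track with
    its candidate TFs and its predicted direct target genes.
    """
    support = defaultdict(set)
    for result in results:
        if selected_tf in result["tfs"]:
            for target in result["targets"]:
                support[(selected_tf, target)].add(result["feature"])
    # Map each edge to the sorted list of motifs/tracks that predict it.
    return {edge: sorted(features) for edge, features in support.items()}


# Hypothetical results for two enriched features.
toy_results = [
    {"feature": "motif_1", "tfs": ["TF_A", "TF_B"], "targets": ["gene_1", "gene_2"]},
    {"feature": "track_1", "tfs": ["TF_A"], "targets": ["gene_2", "gene_3"]},
]
toy_edges = tf_target_edges(toy_results, "TF_A")
for (tf, target), features in sorted(toy_edges.items()):
    print(tf, "->", target, "supported by", ", ".join(features))
```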

Supported organisms are human, mouse and fly.

The iRegulon plugin can also be used to query a TF-targets database made of high-confidence target genes predicted from the systematic analysis of thousands of cancer gene signatures.

For a more detailed description of the approach, see the paper.