Description of the input parameters

The iRegulon plugin provides two queries. This manual describes the parameters of each.

 
Warning! If a parameter is invalid, the background of its text field turns red and the Submit button is disabled.
 

General Parameters

Job Name

Enter a name for your analysis. It is used to name your results and is suggested as the name when you save them.

Species and Gene Nomenclature

This is the species and the gene nomenclature of the input genes. All supported species and their nomenclatures are listed in this field. Currently these are:

  • Homo sapiens, HGNC Symbols
  • Mus musculus, MGI Symbols
  • Drosophila melanogaster, FlyBase names

Query Transcription Factor (TF)

Select a TF from a list of 886 TFs with precalculated TF-targets (metaregulons).

Node Selection Parameters

Attribute Name

Choose the node attribute that holds the gene name. The selected attribute must be in a supported format (see Species and Gene Nomenclature).

Number of Valid Genes (Nodes)

This is the number of selected nodes with valid gene names. If it is zero, the nodes were most likely not selected before the query panel was used.

Ranking Parameters

Search Space Type

This parameter defines the type of regulatory search space:

  • Gene-Based: the putative regulatory regions are defined according to the boundaries of the genes (TSS or TTS)
  • Region-Based: the putative regulatory regions are defined by regulatory features such as promoter regions, DHS regions, and other annotated non-coding regions

Motif and Track Collections

If several motif/track collections are available, use this parameter to select one motif/track collection. More details about our collections here.

Putative Regulatory Region

This parameter delineates the search space, i.e. the putative regulatory region, in more detail.

Motif/Track Ranking Database

Depending on the type of search space and the putative regions, several databases can be queried:

  • in the gene-based search space, Motif ranking databases can use conservation between 7 or 10 species; this option is not available for Track rankings,
  • in the region-based search space (available for Drosophila), the database covers 136K regulatory non-coding regions, using conservation between 11 species (as described in our i-cisTarget paper).

TF-target Database (Metaregulon)

This parameter lets the user query the metaregulons obtained by running iRegulon on several databases of known signatures:

  • GeneSigDB: 3447 signatures annotated in GeneSigDB (version 4)
  • MSigDB: 6753 signatures annotated in MSigDB (version 3 collection 2)
  • Ganesh clusters: 12972 bi-clusters obtained in-house. Bi-clustering was performed by applying the Ganesh clustering algorithm with default settings to 91 microarray datasets. Normalized (fRMA) microarray data were obtained from InsilicoDB.

You can select one or multiple databases.

Heads Up! iRegulon was run with the following parameters: gene-based approach, database of 10 kb centered around the TSS (10 species), motif collection v3 (6383 PWMs), NES threshold of 3.18, ROC threshold of 3%, and rank threshold of 3000.

Region-based Parameters

Heads Up! To convert your input genes into input regions, the putative regulatory regions from our database must be overlapped with the putative regions of your input genes.

The following three parameters are specific to region-based databases (i.e., they are not available for the gene-based search space).

Overlap Fraction

This is the fraction of the putative regulatory region associated with a gene that must overlap with the predefined regions. This parameter must be between 0 and 1.
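As a minimal sketch of how such a fraction could be computed (this is not iRegulon's actual implementation), the Python functions below measure how much of a gene's putative regulatory region is covered by one predefined region. The function names, the half-open coordinate convention, and the example coordinates are assumptions made for illustration.

```python
def overlap_fraction(gene_region, predefined_region):
    """Fraction of gene_region covered by predefined_region.

    Both regions are (start, end) tuples on the same chromosome,
    using half-open coordinates [start, end) (an assumption).
    """
    g_start, g_end = gene_region
    p_start, p_end = predefined_region
    if g_end <= g_start:
        raise ValueError("gene_region must have a positive length")
    # Length of the intersection, or 0 if the regions do not overlap.
    overlap = max(0, min(g_end, p_end) - max(g_start, p_start))
    return overlap / (g_end - g_start)


def passes_overlap(gene_region, predefined_region, min_fraction):
    """True if the overlap reaches the user-defined Overlap Fraction."""
    if not 0.0 <= min_fraction <= 1.0:
        raise ValueError("Overlap Fraction must be between 0 and 1")
    return overlap_fraction(gene_region, predefined_region) >= min_fraction
```

For example, a gene region of (1000, 2000) and a predefined region of (1500, 3000) share 500 bp, so the fraction is 0.5 and the pair passes an Overlap Fraction of 0.4.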

Upstream region

You can select a predefined regulatory search space or specify here the size of the region (in bp) upstream of the TSS to use in the mapping to predefined regions.

Downstream region

You can select a predefined regulatory search space or specify here the size of the region (in bp) downstream of the TSS to use in the mapping to predefined regions.
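To illustrate how the upstream and downstream sizes could translate into genomic coordinates, here is a small Python sketch. It is an assumption-based illustration, not the plugin's code: the function name, the strand handling, and the clamping at position 0 are all hypothetical choices.

```python
def putative_region(tss, strand, upstream_bp, downstream_bp):
    """Return (start, end) of the region around a TSS.

    "Upstream" is taken relative to the direction of transcription,
    so the two sizes are swapped for genes on the minus strand.
    """
    if upstream_bp < 0 or downstream_bp < 0:
        raise ValueError("region sizes must be non-negative")
    if strand == "+":
        start, end = tss - upstream_bp, tss + downstream_bp
    elif strand == "-":
        start, end = tss - downstream_bp, tss + upstream_bp
    else:
        raise ValueError("strand must be '+' or '-'")
    # Do not run past the beginning of the chromosome.
    return (max(0, start), end)
```

With a TSS at position 10000, 5000 bp upstream and 1000 bp downstream, a plus-strand gene yields (5000, 11000) and a minus-strand gene yields (9000, 15000).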

Recovery Prediction Parameters

Enrichment Score threshold

This is the minimum normalized enrichment score (NES) for a motif to be considered relevant. The default threshold is 3. This score corresponds to an FDR on the TF recovery between 3% and 9% (when validated on ENCODE ChIP-seq datasets using different regulatory search spaces).
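The sketch below shows one way such a threshold could be applied. It assumes that the NES of a motif is a z-score of its AUC against the AUCs of all motifs in the collection (AUC minus the mean, divided by the standard deviation); this definition is not stated in this manual, and the function names and toy AUC values are hypothetical.

```python
from statistics import mean, pstdev


def normalized_enrichment_scores(auc_by_motif):
    """Map each motif to (AUC - mean of all AUCs) / std of all AUCs."""
    values = list(auc_by_motif.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all AUCs are identical; the NES is undefined")
    return {motif: (auc - mu) / sigma for motif, auc in auc_by_motif.items()}


def relevant_motifs(auc_by_motif, nes_threshold=3.0):
    """Motifs whose NES reaches the Enrichment Score threshold."""
    nes = normalized_enrichment_scores(auc_by_motif)
    return sorted(m for m, score in nes.items() if score >= nes_threshold)
```

With only a handful of toy motifs the scores stay small, so a lower threshold is needed to see any motif pass; in a real collection of thousands of motifs a threshold of 3 selects only the strongest outliers.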

Threshold for the AUC calculation

The Area Under the Curve (AUC) is calculated for every motif over the beginning of the cumulative gene recovery plot (also known as the ROC curve), which plots the recovery of the input genes along the whole-genome ranking. This threshold sets the percentage of top-ranked genes/regions considered for the AUC calculation (3% corresponds to 420 genes in the gene-based ranking).
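A rough Python sketch of this calculation follows. It is an illustration under stated assumptions, not iRegulon's implementation: the total of 14000 ranked genes is only inferred from the "3% = 420 genes" figure above, and the function names and the normalisation of the area are hypothetical.

```python
def rank_cutoff(total_ranked, auc_threshold):
    """Number of top-ranked genes/regions used for the AUC."""
    if not 0.0 < auc_threshold <= 1.0:
        raise ValueError("threshold must be a fraction in (0, 1]")
    return int(round(total_ranked * auc_threshold))


def recovery_auc(input_gene_ranks, cutoff):
    """Area under the cumulative recovery curve up to `cutoff`.

    input_gene_ranks: 1-based ranks of the input genes for one motif.
    The area is divided by cutoff * number of input genes, an upper
    bound on the area, so the result lies between 0 and 1.
    """
    if cutoff <= 0 or not input_gene_ranks:
        return 0.0
    # A gene recovered at rank r raises the curve by one step
    # for every position from r to the cutoff.
    area = sum(cutoff - r + 1 for r in input_gene_ranks if 1 <= r <= cutoff)
    return area / (cutoff * len(input_gene_ranks))
```

Under these assumptions, `rank_cutoff(14000, 0.03)` gives 420, and input genes ranked below the cutoff contribute nothing to the AUC.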

Threshold for visualisation

This is the x-axis cutoff for the visualization of the ROC curve (see above). This value corresponds to the number of top-ranked genes shown in the results.

TF Prediction Parameters (motif2TF)

Minimum Orthologous Identity

This is a threshold on the minimal identity score used to define gene orthology. This percentage identity was calculated in EnsemblCompara gene trees based on whole amino acid sequence alignments (tf2tf associations). The closer the threshold is to zero, the more homologous genes can be associated with an annotated TF. When the threshold is set to one, no orthology information is used. The threshold must be between 0 and 1.

Maximum Motif Similarity FDR

This is a threshold on the maximal FDR, derived from the TOMTOM p-value, for the similarity between motifs (motif2motif associations). The closer the threshold is to zero, the more similar a motif must be to be associated with an enriched motif. When the threshold is set to zero, no motif similarity information is used. The threshold must be between 0 and 1.
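The two motif2TF thresholds can be pictured as simple filters, as in the Python sketch below. This is only an illustration: the function names are hypothetical, and whether the comparisons are inclusive is an assumption. The special cases (identity threshold of one, FDR threshold of zero) follow the descriptions above.

```python
def keep_ortholog(identity, min_identity):
    """Decide whether a tf2tf (orthology) association is used."""
    if not 0.0 <= min_identity <= 1.0:
        raise ValueError("Minimum Orthologous Identity must be in [0, 1]")
    if min_identity == 1.0:
        return False  # threshold of one: no orthology information is used
    return identity >= min_identity  # inclusive comparison is an assumption


def keep_similar_motif(fdr, max_fdr):
    """Decide whether a motif2motif (similarity) association is used."""
    if not 0.0 <= max_fdr <= 1.0:
        raise ValueError("Maximum Motif Similarity FDR must be in [0, 1]")
    if max_fdr == 0.0:
        return False  # threshold of zero: no similarity information is used
    return fdr <= max_fdr  # inclusive comparison is an assumption
```

For instance, an ortholog with 45% identity is used when the minimum identity is 0.3 but not when it is 0.6, and a similar motif with an FDR of 0.0005 is used when the maximum FDR is 0.001.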

Metatargetome Prediction Parameters

Occurrence Count threshold

This is a threshold on the minimum occurrence count of a target gene. In a metatargetome, each target gene is annotated with the number of gene sets in which the TF is found enriched and the gene is among the optimal subset of direct targets.

Maximum Number Nodes threshold

This is a threshold on the approximate number of targets to display for a calculated metatargetome. Targets with the same score as the target at the threshold are also displayed.
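The interplay of the two metatargetome thresholds, including the handling of ties, can be sketched in Python as follows. This is a hedged illustration rather than the plugin's code: the function name is hypothetical, and it assumes that the "score" used for ranking is the occurrence count.

```python
def select_targets(occurrence_counts, min_occurrence, max_nodes):
    """Return (gene, count) pairs to display, highest counts first.

    occurrence_counts: dict mapping a target gene to its occurrence count.
    Genes tied with the last gene inside max_nodes are kept as well,
    which is why the maximum is only approximate.
    """
    kept = sorted(
        ((gene, count) for gene, count in occurrence_counts.items()
         if count >= min_occurrence),
        key=lambda pair: (-pair[1], pair[0]),
    )
    if max_nodes <= 0:
        return []
    if len(kept) <= max_nodes:
        return kept
    # Score of the target sitting exactly at the threshold position.
    cutoff_score = kept[max_nodes - 1][1]
    return [(gene, count) for gene, count in kept if count >= cutoff_score]
```

With counts of A=10, B=7, C=7, D=3, E=1, a minimum occurrence of 2 and a maximum of 2 nodes, three targets are returned (A, B and C), because C has the same count as B, the target at the threshold.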

Create Network

If this option is ticked, a new network is created in Cytoscape. Otherwise, the network is added to the current window.