TF Target Gene Regulatory Network

AIPOCH

Use when analyzing transcription factor (TF) regulatory networks with the Dorothea database. Takes a gene list as input, identifies the transcription factors that regulate those genes, and generates a TF-target network visualization. Intended for transcription factor enrichment analysis and gene regulatory network research.

FILES
tf-target-gene-regulatory-network/
  skill.md
  scripts/
    functions.R
    main.R
    run_analysis.R
    utils.R
    visualization.R
  references/
    algorithm.md
    cli-guide.md
    troubleshooting.md
    visualization-parameters.md
Total Score: 94 / 100
Core Capability: 96 / 100
Functional Suitability: 12 / 12
Reliability: 12 / 12
Performance & Context: 8 / 8
Agent Usability: 15 / 16
Human Usability: 8 / 8
Security: 12 / 12
Maintainability: 11 / 12
Agent-Specific: 18 / 20
Medical Task: 33 / 33 Passed
| Score | Scenario | Passed |
| --- | --- | --- |
| 100 | Human gene list TF network analysis with default parameters | 5/5 |
| 95 | Mouse gene list with first-letter-uppercase convention | 5/5 |
| 95 | Gene list with no matching TF-target relationships in Dorothea | 5/5 |
| 95 | Gene file input mode (--gene_file instead of --gene) | 5/5 |
| 92 | Large gene list (>500 genes) with local Dorothea RDS database | 5/5 |
| 85 | Pathway enrichment analysis request | 4/4 |
| 85 | Mixed-species gene list (human and mouse symbols together) | 4/4 |

SKILL.md

Transcription Factor (TF) Regulatory Network Analysis

When to Use

  • Use this skill when you have a human or mouse gene list and want to identify upstream TFs from the Dorothea database.
  • Use it when you need a ready-to-export TF-target network table plus a publication-ready PDF network plot.
  • Use it for reproducible CLI execution with saved session information and optional local Dorothea .rds databases.

When Not to Use

  • Do not use this skill for differential expression, pathway enrichment, cell type annotation, or survival analysis.
  • Do not use it to infer causal direction beyond curated Dorothea TF-target relationships.
  • Do not use it when your input genes are aliases or mixed-species symbols that have not been normalized first.

Input Validation

This skill accepts a human or mouse gene list (HGNC symbols for human; first-letter-uppercase symbols for mouse) for TF regulatory network analysis using Dorothea.

If the user's request does not involve identifying upstream transcription factors from a gene list — for example, asking to run differential expression, pathway enrichment, cell type annotation, or multi-omics integration — do not proceed with the workflow. Instead respond:

"tf-target-gene-regulatory-network is designed to identify upstream transcription factors from a gene list using the Dorothea database and generate a TF-target network visualization. Your request appears to be outside this scope. Please provide a gene list for TF regulatory analysis, or use a more appropriate tool for your task."

Entry Point

  • Primary CLI entry point: scripts/main.R
  • Canonical visualization values: English tokens fr, curve, diamond, triangle, square

When to Read External Files

| Situation | File to Read | Purpose |
| --- | --- | --- |
| Need algorithm details | references/algorithm.md | Statistical methods, Dorothea database, network analysis algorithms |
| Need to run analysis | scripts/main.R | Execute: Rscript scripts/main.R --gene ... --species ... |
| Encounter errors | references/troubleshooting.md | Common errors and solutions |
| Need CLI examples | references/cli-guide.md | Detailed CLI usage examples |
| Need test data | tests/data/ | Sample gene lists for testing |

Installation

R Package Dependencies

# CRAN packages
install.packages(c("optparse", "dplyr", "openxlsx", "tidyverse", "tidygraph", "ggraph", "showtext"))

# Bioconductor packages (optional if using local database files)
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("dorothea")

For faster analysis and offline use, generate local database files:

Rscript database/database-get.R

This creates database/dorothea_hs.rds (human) and database/dorothea_mm.rds (mouse) in the skill root directory.

Verification

Rscript scripts/main.R --help

Usage

Rscript scripts/main.R \
  --gene "TP53,MYC,EGFR" \
  --species human \
  --output_dir ./TF_Result \
  --seed 42

Arguments

Note: At least one of --gene or --gene_file must be provided.

| Short | Long | Type | Default | Description |
| --- | --- | --- | --- | --- |
| -g | --gene | character | NULL | Comma-separated gene list (e.g., "TP53,MYC,EGFR"); required if --gene_file is not provided |
| -f | --gene_file | character | NULL | File with gene names (txt or csv, one per line or comma-separated); required if --gene is not provided |
| -s | --species | character | human | Species: human or mouse |
| -o | --output_dir | character | TF_Result | Output directory name |
|  | --db_path | character | NULL | Local .rds database file path. If not specified, auto-searches default paths |
| -d | --dir | character | NULL | Working root directory (advanced) |
|  | --seed | integer | 42 | Random seed for reproducibility |
|  | --title | character | "" | Main plot title |

Visualization Parameters: For the complete list (plot dimensions, colors, labels, layout, edge styles), see references/visualization-parameters.md. Canonical values are English tokens: fr (force-directed layout), curve (curved edges), diamond / triangle / square (node shapes).


Input Format

Gene List Input

Two ways to provide input genes:

  1. Command line: --gene "TP53,MYC,EGFR" (comma-separated)
  2. File input: --gene_file genes.txt (one gene per line or comma-separated)

Gene Naming Convention

  • Human genes: All uppercase symbols (e.g., TP53, MYC, EGFR)
  • Mouse genes: First letter uppercase only (e.g., Tp53, Myc, Egfr)
  • Use official gene symbols, not aliases
  • Case-sensitive for species matching
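
The convention above can be pre-checked before a run. The following is a minimal sketch and not part of the skill: flag_case_mismatches is a hypothetical helper that reads one symbol per line and prints those whose letter case differs from the convention for the given species. It is only a case heuristic. Some official symbols legitimately break the pattern (for example, human open-reading-frame symbols that contain a lowercase "orf"), so flagged symbols are candidates for review rather than confirmed errors, and the check does not detect aliases.

```shell
# Hypothetical helper: print symbols whose case does not match the convention.
# Usage: flag_case_mismatches human < genes.txt
flag_case_mismatches() {
  species=$1   # "human" (all uppercase) or "mouse" (first letter uppercase only)
  awk -v species="$species" '
    {
      gsub(/^[ \t]+|[ \t]+$/, "")          # trim surrounding whitespace
      if ($0 == "") next                    # skip blank lines
      if (species == "human")
        expected = toupper($0)
      else
        expected = toupper(substr($0, 1, 1)) tolower(substr($0, 2))
      if ($0 != expected) print $0          # report symbols that deviate
    }
  '
}
```

For instance, piping TP53, Myc, and EGFR through the helper with species human would flag only Myc.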

Species Support

  • human: Human genes (Homo sapiens)
  • mouse: Mouse genes (Mus musculus)

Performance Guidance

For large gene lists (> 500 genes), Dorothea database queries may take several minutes. Use a local .rds database (--db_path) for substantially faster lookups. There is currently no --timeout_seconds parameter; monitor progress with verbose logging if available.
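
To decide whether the 500-gene guidance applies, the entries in a gene file can be counted up front. This is a rough sketch under stated assumptions: count_gene_entries and needs_local_db_hint are hypothetical helpers, not part of the skill, and they simply count non-empty entries in a file that uses one gene per line or comma-separated values (duplicates are counted as-is).

```shell
# Hypothetical helper: count non-empty gene entries in a txt/csv gene file.
count_gene_entries() {
  tr ',' '\n' < "$1" |
    awk '{ gsub(/^[ \t]+|[ \t]+$/, "") } $0 != "" { n++ } END { print n + 0 }'
}

# Hypothetical helper: succeed when the list exceeds the 500-gene threshold
# mentioned in the guidance above.
needs_local_db_hint() {
  [ "$(count_gene_entries "$1")" -gt 500 ]
}

# Example use (commented out so the sketch has no side effects):
#   if needs_local_db_hint genes.txt; then
#     echo "Large list: consider --db_path database/dorothea_hs.rds"
#   fi
```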


Output Files

| File | Description |
| --- | --- |
| TF_Network_Plot.pdf | TF-target network visualization |
| tf_network.xlsx | Network data (edges and nodes worksheets) |
| TF_Target_Filtered_Core_<species>.xlsx | Complete TF-target relationships table |
| session_info.txt | R session and package version info |
| tf.Rdata | R environment data |

Outputs are organized under the output directory (TF_Result/ by default) in data/, plot/, and table/ subdirectories.


Workflow

Step 1: Load Database

  • Search for a local database file (.rds format) in priority order: --db_path, getwd()/database/, script_dir/database/, dirname(script_dir)/database/
  • If no local file found, load from Dorothea R package
  • Filter for high-confidence interactions (confidence levels A, B, C)
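
The search order in Step 1 can be mirrored from the shell to see which local file a run would likely pick up. This is only a sketch of the described priority, not the skill's R implementation: find_local_db is a hypothetical helper, and it assumes the searched directories contain the file names produced by the database generation step (dorothea_hs.rds for human, dorothea_mm.rds for mouse).

```shell
# Hypothetical helper: print the first existing local database file.
# Arguments: species ("human" or "mouse"), explicit db path (may be empty),
# and the directory that contains main.R.
find_local_db() {
  species=$1
  db_path=$2
  script_dir=$3
  case "$species" in
    human) name=dorothea_hs.rds ;;   # assumed file name for human
    mouse) name=dorothea_mm.rds ;;   # assumed file name for mouse
    *) echo "unsupported species: $species" >&2; return 1 ;;
  esac
  # Candidates in the documented priority order.
  for candidate in \
    "$db_path" \
    "$PWD/database/$name" \
    "$script_dir/database/$name" \
    "$(dirname "$script_dir")/database/$name"
  do
    if [ -n "$candidate" ] && [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1   # nothing found: the skill would fall back to the R package
}
```

A non-zero return corresponds to the fallback case, where the Dorothea R package must be installed.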

Step 2: Identify Regulating TFs

  • Match input genes against target genes in Dorothea database
  • Extract transcription factors regulating these targets
  • Compute TF frequency (number of targets regulated)
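
The matching and counting in Steps 1 and 2 can be illustrated on a plain-text edge list. The sketch below is an assumption-laden stand-in for the skill's R code: tf_frequency is a hypothetical helper, and it expects a hypothetical tab-separated export with three columns (tf, confidence, target) rather than the .rds or package data the skill actually reads. It keeps edges with confidence A, B, or C whose target is in the input list, then counts targets per TF.

```shell
# Hypothetical helper: count how many input genes each TF regulates.
# $1: gene list, one symbol per line
# $2: tab-separated edges with columns tf, confidence, target (assumed layout)
tf_frequency() {
  awk -F '\t' '
    NR == FNR { wanted[$1] = 1; next }                    # first file: input genes
    ($3 in wanted) && $2 ~ /^[ABC]$/ { count[$1]++ }      # keep A/B/C edges on input genes
    END { for (tf in count) print count[tf] "\t" tf }
  ' "$1" "$2" | LC_ALL=C sort -k1,1nr -k2,2               # most frequent TFs first
}
```

Each output line holds a count and a TF name, so the top rows are the TFs that regulate the most genes in the input list.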

Step 3: Generate Network Data

  • Create edge list (TF → Target relationships)
  • Create node list with types (TF or Target)
  • Save to Excel format for downstream analysis
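
As a rough illustration of Step 3, a node list can be derived from an edge list by collecting every name that appears and tagging it by role. node_list is a hypothetical helper operating on a hypothetical two-column, tab-separated edge file (tf, target); the rule that a gene appearing in both columns is labeled TF is an assumption, since the skill's handling of that case is not documented here.

```shell
# Hypothetical helper: build "name<TAB>type" rows from a tf<TAB>target edge file.
node_list() {
  awk -F '\t' '
    { is_tf[$1] = 1; seen[$1] = 1; seen[$2] = 1 }   # record roles and all names
    END {
      for (n in seen)
        print n "\t" ((n in is_tf) ? "TF" : "Target")  # assumed rule: TF role wins
    }
  ' "$1" | LC_ALL=C sort
}
```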

Step 4: Visualize Network

  • Generate network graph using tidygraph and ggraph
  • Apply visual customization (layout, colors, shapes)
  • Save as a publication-ready PDF figure

Methods

Dorothea Database

  • Source of curated TF-target relationships for human and mouse, loaded from a local .rds file or the Dorothea R package
  • Only interactions with confidence levels A, B, and C are retained (see Step 1 above)

Network Analysis

  • Graph construction: TF-target relationships as directed edges
  • Layout algorithms: Multiple options including fr (force-directed), circle, grid, sphere
  • Visual customization: Full control over colors, shapes, sizes

Local Database Feature

  • Option to use pre-saved .rds files for faster analysis and offline use
  • Priority search paths for local database files (see Step 1 above)
  • Fallback to Dorothea R package if no local file found

Examples

Basic Usage (Human Genes)

Rscript scripts/main.R \
  -g "TP53,MYC,EGFR" \
  -s human \
  -o ./TF_Result

File Input

Rscript scripts/main.R \
  -f gene_list.txt \
  -s human \
  -o ./TF_Result

Mouse Genes

Rscript scripts/main.R \
  -g "Tp53,Myc,Egfr" \
  -s mouse \
  -o ./Mouse_TF_Result

Custom Styling

Rscript scripts/main.R \
  -g "PTPRC,FOXP3,CD4" \
  -s human \
  --style_layout "fr" \
  --style_line "curve" \
  --point_shape "diamond,triangle" \
  --line_color "#E64B35" \
  --title "Immune TF Network" \
  -o ./Custom_Plot

Using Local Database

Rscript scripts/main.R \
  -g "TP53,MYC,EGFR" \
  -s human \
  --db_path database/dorothea_hs.rds \
  -o ./LocalDB_Result

Error Handling

Common Errors

| Error Code | Cause | Solution |
| --- | --- | --- |
| SKILL_FILE_NOT_FOUND | Input gene file does not exist | Check file path and permissions |
| SKILL_NO_INPUT_GENES | Empty gene list or file | Provide genes using --gene or --gene_file |
| SKILL_INVALID_SPECIES | Species not human or mouse | Use human or mouse only |
| SKILL_INVALID_PARAMETER | Invalid layout, shape, line, or legend value | Use supported values shown by Rscript scripts/main.R --help |
| SKILL_EMPTY_RESULTS | No TF-target relationships found for input genes | Check gene symbols and species; try broadening confidence levels |
| SKILL_DEPENDENCY_MISSING | Missing dplyr, dorothea, tidygraph, etc. | Install missing packages (see Installation section) |

Exit Status Codes

| Code | Meaning |
| --- | --- |
| 0 | Success |
| 1 | Execution error (see error code for details) |
| 2 | SKILL_EMPTY_RESULTS: no TF-target matches found for the input genes |
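
The exit statuses listed above can be handled in a wrapper script so that an empty result is distinguished from a real failure. The following is a minimal sketch; describe_exit is a hypothetical helper, and the messages are paraphrases of the meanings given above rather than output produced by the skill.

```shell
# Hypothetical helper: translate a documented exit status into a short message.
describe_exit() {
  case "$1" in
    0) echo "success" ;;
    1) echo "execution error: check the reported error code" ;;
    2) echo "SKILL_EMPTY_RESULTS: no TF-target matches for the input genes" ;;
    *) echo "unexpected exit status: $1" ;;
  esac
}

# Typical use after a run (commented out so the sketch has no side effects):
#   Rscript scripts/main.R -g "TP53,MYC,EGFR" -s human -o ./TF_Result
#   describe_exit "$?"
```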

If the error persists, read references/troubleshooting.md.


Testing

Test with Sample Data

# Check help
Rscript scripts/main.R --help

# Run with sample human genes
Rscript scripts/main.R \
  -g "TP53,MYC,EGFR" \
  -s human \
  -o tests/output_human/

# Run with sample mouse genes
Rscript scripts/main.R \
  -g "Tp53,Myc,Egfr" \
  -s mouse \
  -o tests/output_mouse/

After running, verify tests/output_human/plot/TF_Network_Plot.pdf and tests/output_human/table/tf_network.xlsx exist and are non-empty.
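
The check described above can be scripted. This is a small sketch, assuming only the two paths named in the preceding sentence; verify_outputs is a hypothetical helper and does not inspect file contents beyond confirming that each file exists and is non-empty.

```shell
# Hypothetical helper: confirm the expected plot and table exist and are non-empty.
# Usage: verify_outputs tests/output_human
verify_outputs() {
  out_dir=$1
  rc=0
  for rel in plot/TF_Network_Plot.pdf table/tf_network.xlsx; do
    if [ -s "$out_dir/$rel" ]; then        # -s: file exists and has size > 0
      echo "ok: $rel"
    else
      echo "missing or empty: $rel" >&2
      rc=1
    fi
  done
  return "$rc"
}
```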

Reference Files

| File | Purpose |
| --- | --- |
| references/algorithm.md | Statistical methods and Dorothea database details |
| references/troubleshooting.md | Common errors and solutions |
| references/cli-guide.md | CLI usage examples |
| references/visualization-parameters.md | Complete visualization parameter list |