Agent Skills

Estimate Immune Score Analysis

AIPOCH

Use this skill to compute ESTIMATE immune-related microenvironment scores from a bulk expression matrix, generate an ESTIMATE score heatmap, and optionally generate group-wise ESTIMATE score boxplots plus significance tables when a sample group file is supplied. Trigger keywords: ESTIMATE, immune score, stromal score, tumor microenvironment score. NOT for: immune cell deconvolution, single-cell analysis, differential expression, clinical diagnosis.

FILES
estimate-immune-score-analysis/
  skill.md
  scripts/
    cli_options.R
    functions.R
    io.R
    main.R
    recording.R
    run_analysis.R
    utils.R
    visualization.R
  references/
    algorithm.md
    cli-guide.md
    troubleshooting.md
Total Score: 97 / 100

Core Capability: 99 / 100

  • Functional Suitability: 12 / 12
  • Reliability: 12 / 12
  • Performance & Context: 8 / 8
  • Agent Usability: 16 / 16
  • Human Usability: 8 / 8
  • Security: 11 / 12
  • Maintainability: 12 / 12
  • Agent-Specific: 20 / 20

Medical Task: 22 / 22 Passed

  • 97: Basic ESTIMATE scoring with heatmap on GeneSymbol CSV matrix (5/5)
  • 94: Grouped comparison with boxplot and significance table, Tumor vs Healthy (5/5)
  • 95: Group file with 2 samples total (1 per group), below minimum threshold (4/4)
  • 96: TSV input with Entrez ID gene identifiers and custom output directory (4/4)
  • 94: Large matrix with 3-group file and 300-second timeout (4/4)

SKILL.md

ESTIMATE Immune Score Analysis

When to Use

Use this skill when the user wants to:

  • compute ESTIMATE-derived immune and stromal scores from a bulk expression matrix
  • transform an expression matrix into estimate package input files and score outputs
  • generate an ESTIMATE score heatmap across samples
  • compare ESTIMATE scores across sample groups when a sample group file is available
  • create a reproducible CLI-backed ESTIMATE workflow with structured output records

Typical request patterns:

  • "Run ESTIMATE immune score analysis on this expression matrix"
  • "Calculate ImmuneScore and StromalScore from my bulk RNA-seq data"
  • "Generate ESTIMATE scores and save a sample-level result table"

Execution Model

This is a CLI-backed analysis skill.

  1. Use SKILL.md to confirm that the task is ESTIMATE score generation from bulk expression data.
  2. Use scripts/main.R for the real execution.
  3. Provide one expression matrix file with genes in the first column and samples in the remaining columns.
  4. Optionally provide a sample group file to generate ESTIMATE score boxplots and a significance summary table.
  5. The workflow always generates an ESTIMATE score heatmap from the computed score table.
  6. Read reference files only when you need algorithm details, troubleshooting, or baseline execution notes.
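
Steps 2 to 4 can be wrapped in a small shell helper so that the optional group file is passed only when one exists. The sketch below is illustrative: the function name run_estimate and the DRY_RUN variable are hypothetical and not part of the skill, and only flags documented in this file are used.

```shell
# Hypothetical wrapper (not shipped with the skill): build the scripts/main.R
# call and add --group_file only when a group file path is given.
# With DRY_RUN=1 the command is printed instead of executed.
run_estimate() (
  input=$1
  out_dir=$2
  group=${3:-}
  set -- Rscript scripts/main.R --input_file "$input" --output_dir "$out_dir"
  if [ -n "$group" ]; then
    set -- "$@" --group_file "$group"
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
)

# Print the command that would run for a grouped analysis.
DRY_RUN=1 run_estimate ./expression_matrix.csv ./output ./group_info.csv
```

Dropping DRY_RUN=1 runs the command for real, which requires R and the skill's scripts to be available in the working directory.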

When to Read External Files

| Situation | File to Read | Purpose |
| --- | --- | --- |
| Need algorithm details | references/algorithm.md | Understand the ESTIMATE scoring workflow and result interpretation |
| Need to run the skill | scripts/main.R | Execute the CLI entry point |
| Encounter errors | references/troubleshooting.md | Find standard error codes and fixes |
| Need more CLI examples or the real-data baseline record | references/cli-guide.md | Copy commands and review the recorded execution template |
| Need sample input files | tests/data/ | Use the bundled demo expression matrix |

When Not to Use

  • Immune cell fraction estimation: use a CIBERSORT-like deconvolution workflow instead
  • Differential expression testing between biological groups: use a differential analysis skill instead
  • Single-cell analysis: use a single-cell-specific workflow
  • Clinical diagnosis or treatment decision support: do not use this skill

If the request is outside ESTIMATE score generation for bulk expression matrices, stop and explain that this skill only covers ESTIMATE-based score computation.

Input Validation

This skill accepts:

  • one bulk expression matrix in CSV or TSV format with genes in the first column and samples in the remaining columns
  • an optional sample group file in CSV or TSV format for grouped boxplots and significance testing
  • requests to compute ESTIMATE-derived StromalScore, ImmuneScore, ESTIMATEScore, TumorPurity, and related visualizations from bulk transcriptomic data

Do not use this workflow for:

  • single-cell RNA-seq or spatial transcriptomics
  • immune cell deconvolution requests
  • direct clinical diagnosis, treatment recommendation, or patient-level medical decision making
  • unrelated tasks such as literature writing, web scraping, or generic plotting without ESTIMATE score generation

If the user's request is outside this scope, do not proceed with the workflow. Instead respond:

estimate-immune-score-analysis is designed to compute ESTIMATE-based tumor microenvironment scores from a bulk expression matrix. Your request appears to be outside this scope. Please provide a valid bulk expression matrix and, if needed, a matching sample group file, or use a more appropriate skill for your task.

Usage

Rscript scripts/main.R \
  --input_file ./expression_matrix.csv \
  --group_file ./group_info.csv \
  --output_dir ./output \
  --gene_id_type GeneSymbol \
  --platform affymetrix \
  --seed 42

Arguments

| Short | Long | Type | Default | Description |
| --- | --- | --- | --- | --- |
| -i | --input_file | character | required | Expression matrix file in CSV or TSV format |
| -o | --output_dir | character | ./output | Output directory |
| | --group_file | character | optional | Sample group file used for ESTIMATE score boxplots and significance testing |
| -g | --gene_id_type | character | GeneSymbol | Gene identifier type: GeneSymbol or EntrezID |
| -p | --platform | character | affymetrix | ESTIMATE platform: affymetrix, agilent, or illumina |
| -s | --seed | integer | 42 | Random seed |
| -t | --timeout_seconds | integer | 0 | Optional timeout in seconds; 0 disables timeout |
| | --input_delimiter | character | auto | Input delimiter hint: auto, csv, or tsv |
| | --group_delimiter | character | auto | Group file delimiter hint: auto, csv, or tsv |
| | --sample_column | character | sample | Sample column name in the group file |
| | --group_column | character | group | Group column name in the group file |
| | --plot_file | character | estimate_scores_boxplot.pdf | Boxplot file name written under plot/ |
| | --heatmap_file | character | estimate_scores_heatmap.pdf | Heatmap file name written under plot/ |

Input Format

  • CSV or TSV file
  • First column contains gene identifiers
  • Each remaining column holds one sample, with the sample name as the column header
  • Expression values must be numeric and non-missing
  • Sample column names must be unique; duplicate sample column names raise SKILL_INVALID_PARAMETER

Example:

gene,S1,S2,S3
TP53,8.1,7.9,6.5
EGFR,5.2,5.0,4.2

The bundled tests/data/expression_matrix.csv was copied from cibersort-immune-infiltration-analysis/tests/data/expression_matrix.csv for demo and validation use.
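
The unique-sample-name rule can be checked before launching R. The following is a minimal pre-flight sketch, assuming a simple delimited file with no quoted fields; the helper name find_duplicate_samples is hypothetical and the skill performs its own validation regardless.

```shell
# Hypothetical pre-flight helper (not part of the skill): print every sample
# column name that appears more than once in the header row.
# Assumes a simple delimited file without quoted fields.
find_duplicate_samples() (
  file=$1
  delim=${2:-,}
  head -n 1 "$file" | awk -F"$delim" '
    {
      for (i = 2; i <= NF; i++) {       # column 1 is the gene identifier
        sub(/\r$/, "", $i)              # tolerate Windows line endings
        if (seen[$i]++ == 1) print $i   # report each duplicate name once
      }
    }'
)

# Demo on a throwaway file whose header repeats a sample name.
demo=$(mktemp)
printf 'gene,S1,S2,S1\nTP53,8.1,7.9,6.5\n' > "$demo"
find_duplicate_samples "$demo"    # prints: S1
rm -f "$demo"
```

Empty output means no duplicates were found. For a TSV file, pass a literal tab character as the second argument.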

Optional Group File

  • CSV or TSV file
  • Must contain one sample column and one group column
  • Sample names must match the ESTIMATE score table sample IDs
  • Exactly two group levels are supported for boxplot comparison. If more than two groups are present in the group file, SKILL_INVALID_PARAMETER is raised.
  • Each group must contain at least 3 samples for valid statistical testing. Groups with fewer samples trigger SKILL_INVALID_PARAMETER.
  • If the group file is provided but grouped comparison fails after core scoring, the command exits with a SKILL_* error after preserving the core ESTIMATE outputs and failure records

Example:

sample,group
S1,Tumor
S2,Tumor
S3,Tumor
S4,Healthy
S5,Healthy
S6,Healthy
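
The two group-file rules stated above (exactly two group levels, at least 3 samples per group) can be verified up front. This is a sketch under stated assumptions: a comma-delimited file with a header row and no quoted fields, and a hypothetical helper name check_group_file. It mirrors the documented rules, not the skill's actual implementation.

```shell
# Hypothetical pre-flight helper: check the documented group-file rules.
# Arguments: file, sample column name (default "sample"), group column name
# (default "group"). Assumes comma-delimited input without quoted fields.
check_group_file() (
  file=$1
  sample_col=${2:-sample}
  group_col=${3:-group}
  awk -F',' -v scol="$sample_col" -v gcol="$group_col" '
    { sub(/\r$/, "") }                       # tolerate Windows line endings
    NR == 1 {
      for (i = 1; i <= NF; i++) { if ($i == scol) s = i; if ($i == gcol) g = i }
      if (!s || !g) { print "missing sample or group column"; bad = 1; exit }
      next
    }
    NF { count[$g]++ }                       # count samples per group level
    END {
      if (bad) exit 1
      n = 0
      for (k in count) {
        n++
        if (count[k] < 3) { printf "group %s has only %d sample(s)\n", k, count[k]; bad = 1 }
      }
      if (n != 2) { printf "expected 2 group levels, found %d\n", n; bad = 1 }
      exit (bad ? 1 : 0)
    }' "$file"
)

# Demo: a file with three samples in each of two groups passes the check.
demo=$(mktemp)
printf 'sample,group\nS1,Tumor\nS2,Tumor\nS3,Tumor\nS4,Healthy\nS5,Healthy\nS6,Healthy\n' > "$demo"
check_group_file "$demo" && echo "group file OK"
rm -f "$demo"
```

A non-zero exit status together with the printed messages indicates a file that would be expected to trigger SKILL_INVALID_PARAMETER.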

Output Files

| File | Description |
| --- | --- |
| data/expression_input.tsv | Tab-delimited expression matrix prepared for ESTIMATE |
| data/estimate_input.gct | GCT file created by estimate::filterCommonGenes() |
| data/estimate_score.gct | Raw ESTIMATE score output from estimate::estimateScore() |
| table/estimate_scores.tsv | Reformatted sample-by-score table |
| plot/estimate_scores_heatmap.pdf | Sample-level ESTIMATE score heatmap |
| table/estimate_score_group_stats.csv | Per-score p-values and the group with the higher median score when --group_file is provided |
| plot/estimate_scores_boxplot.pdf | ESTIMATE score boxplot when --group_file is provided |
| session_info.txt | R session and package version information |
| output_manifest.txt | Append-only output file manifest with descriptions |
| run_record.txt | Append-only run record with parameters, runtime, and output summary |

Workflow

Step 1: Validate Input

  • Confirm the input file exists
  • Confirm the matrix contains a gene identifier column followed by at least one sample column
  • Confirm all expression columns are numeric and sample names are unique

Step 2: Run ESTIMATE

  • Convert the matrix to a tab-delimited file with the selected gene identifier header
  • Run estimate::filterCommonGenes()
  • Run estimate::estimateScore()
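
The first bullet of this step, converting the matrix to a tab-delimited file with the selected identifier header, can be pictured with a short awk sketch. It assumes a simple comma-delimited input without quoted fields, and the helper name csv_to_estimate_tsv is hypothetical; the skill does this conversion in R and may handle more cases.

```shell
# Sketch only: convert a simple comma-delimited matrix to a tab-delimited
# file and rename the first header cell to the chosen identifier type.
# Assumes no quoted fields or embedded commas.
csv_to_estimate_tsv() (
  src=$1
  dst=$2
  id_type=${3:-GeneSymbol}
  awk -F',' -v OFS='\t' -v id="$id_type" '
    { sub(/\r$/, "") }        # tolerate Windows line endings
    NR == 1 { $1 = id }       # header cell for the gene identifier column
    { $1 = $1; print }        # reassigning a field rebuilds the record with OFS
  ' "$src" > "$dst"
)

# Demo on a throwaway directory.
demo_dir=$(mktemp -d)
printf 'gene,S1,S2\nTP53,8.1,7.9\n' > "$demo_dir/matrix.csv"
csv_to_estimate_tsv "$demo_dir/matrix.csv" "$demo_dir/expression_input.tsv" GeneSymbol
cat "$demo_dir/expression_input.tsv"
rm -rf "$demo_dir"
```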

Step 3: Export Results

  • Save the raw GCT outputs under data/
  • Reformat the score matrix into table/estimate_scores.tsv
  • Create plot/estimate_scores_heatmap.pdf
  • If --group_file is supplied, create plot/estimate_scores_boxplot.pdf
  • If --group_file is supplied, create table/estimate_score_group_stats.csv
  • If grouped comparison fails after core scoring, keep the core outputs, append failure details to output_manifest.txt and run_record.txt, and exit with a SKILL_* message
  • Save session_info.txt
  • Append a run section to output_manifest.txt and run_record.txt
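
Reformatting the raw score GCT into a sample-by-score table is essentially a transposition. The sketch below assumes the GCT v1.2 layout (a version line, a dimensions line, a header row starting with NAME and Description, then one row per score). The helper name gct_to_sample_table and the "sample" header label are hypothetical, so the real table/estimate_scores.tsv may use different column names.

```shell
# Sketch only: transpose the score rows of a GCT file into one row per sample.
# Assumes the GCT v1.2 layout described in the lead-in.
gct_to_sample_table() (
  awk -F'\t' -v OFS='\t' '
    NR < 3 { next }                          # skip the version and dimensions lines
    NR == 3 {                                # header: NAME, Description, sample IDs
      n = NF
      for (i = 3; i <= NF; i++) sample[i] = $i
      next
    }
    NF >= 3 {                                # one row per score
      score[++r] = $1
      for (i = 3; i <= NF; i++) val[r, i] = $i
    }
    END {
      line = "sample"
      for (j = 1; j <= r; j++) line = line OFS score[j]
      print line
      for (i = 3; i <= n; i++) {
        line = sample[i]
        for (j = 1; j <= r; j++) line = line OFS val[j, i]
        print line
      }
    }' "$1"
)

# Demo with made-up score values.
demo=$(mktemp)
printf '#1.2\n2\t2\nNAME\tDescription\tS1\tS2\nStromalScore\tStromalScore\t10\t20\nImmuneScore\tImmuneScore\t30\t40\n' > "$demo"
gct_to_sample_table "$demo"
rm -f "$demo"
```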

Examples

Basic Usage

Rscript scripts/main.R \
  --input_file ./expression_matrix.csv \
  --output_dir ./output

Grouped Comparison

Rscript scripts/main.R \
  --input_file ./expression_matrix.csv \
  --group_file ./group_info.csv \
  --output_dir ./grouped_output

TSV Input

Rscript scripts/main.R \
  --input_file ./expression_matrix.tsv \
  --input_delimiter tsv \
  --output_dir ./tsv_output \
  --gene_id_type GeneSymbol

Alternate Platform

Rscript scripts/main.R \
  --input_file ./expression_matrix.csv \
  --output_dir ./illumina_output \
  --platform illumina \
  --seed 123

For the real-data baseline execution record, READ: references/cli-guide.md

Error Handling

| Error Code | Meaning | Solution |
| --- | --- | --- |
| SKILL_FILE_NOT_FOUND | Input file is missing or an expected intermediate file was not created | Check file paths and rerun |
| SKILL_MISSING_COLUMNS | The gene identifier column contains missing values | Repair the first column and rerun |
| SKILL_EMPTY_DATA | The matrix or ESTIMATE output is empty | Verify input content and identifier compatibility |
| SKILL_INVALID_PARAMETER | A CLI argument is unsupported; the matrix contains invalid values; duplicate sample column names detected; more than two group levels provided; or a group contains fewer than 3 samples | Review arguments and input values |
| SKILL_SAMPLE_MISMATCH | Sample names in the group file do not overlap the ESTIMATE score table | Align sample IDs before rerunning |
| SKILL_PACKAGE_NOT_FOUND | Required R packages are not installed | Install missing packages listed in references/cli-guide.md |

If the error persists, READ: references/troubleshooting.md

For optional group comparison failures such as SKILL_SAMPLE_MISMATCH, inspect the preserved core outputs together with output_manifest.txt and run_record.txt to see what completed before the grouped step failed.
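
When the command is driven from a script, the error code can be pulled out of the captured output and mapped to the remedy from the table above. This sketch assumes the SKILL_* code appears verbatim in the message text. The helper names and the sample log line are hypothetical, and the exact message format is not specified in this file.

```shell
# Hypothetical helper: extract the first SKILL_* code from text on stdin.
extract_skill_code() {
  grep -Eo 'SKILL_[A-Z_]+' | head -n 1
}

# Map a code to the remedy listed in the error table.
suggest_fix() {
  case "$1" in
    SKILL_FILE_NOT_FOUND)    echo "Check file paths and rerun" ;;
    SKILL_MISSING_COLUMNS)   echo "Repair the first column and rerun" ;;
    SKILL_EMPTY_DATA)        echo "Verify input content and identifier compatibility" ;;
    SKILL_INVALID_PARAMETER) echo "Review arguments and input values" ;;
    SKILL_SAMPLE_MISMATCH)   echo "Align sample IDs before rerunning" ;;
    SKILL_PACKAGE_NOT_FOUND) echo "Install missing packages listed in references/cli-guide.md" ;;
    *)                       echo "See references/troubleshooting.md" ;;
  esac
}

# Example with a made-up log line (the message format is an assumption).
log='Error: SKILL_SAMPLE_MISMATCH: no overlapping sample IDs'
code=$(printf '%s\n' "$log" | extract_skill_code)
echo "$code -> $(suggest_fix "$code")"
```

In practice the input would be the combined stdout and stderr captured from the Rscript call.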

Testing

Rscript scripts/main.R --help

Rscript tests/run_tests.R

Rscript scripts/main.R \
  --input_file tests/data/expression_matrix.csv \
  --group_file tests/data/group_info.csv \
  --output_dir tests/output \
  --gene_id_type GeneSymbol \
  --platform affymetrix \
  --seed 42

Expected outputs:

  • tests/output/data/expression_input.tsv
  • tests/output/data/estimate_input.gct
  • tests/output/data/estimate_score.gct
  • tests/output/table/estimate_scores.tsv
  • tests/output/plot/estimate_scores_heatmap.pdf
  • tests/output/table/estimate_score_group_stats.csv
  • tests/output/plot/estimate_scores_boxplot.pdf
  • tests/output/session_info.txt
  • tests/output/output_manifest.txt
  • tests/output/run_record.txt

Optional post-check:

Rscript tests/test_skill.R tests/output
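
A quick file-existence check can complement the R post-check. The sketch below lists the expected outputs under their default names. It is a hypothetical helper, not a replacement for tests/test_skill.R, and it will report false misses if --plot_file or --heatmap_file were overridden.

```shell
# Hypothetical post-run check: report documented outputs that are missing or
# empty. Pass "grouped" as the second argument when --group_file was supplied.
# File names are the documented defaults.
missing_outputs() (
  out_dir=$1
  mode=${2:-basic}
  files="data/expression_input.tsv data/estimate_input.gct data/estimate_score.gct
table/estimate_scores.tsv plot/estimate_scores_heatmap.pdf
session_info.txt output_manifest.txt run_record.txt"
  if [ "$mode" = "grouped" ]; then
    files="$files table/estimate_score_group_stats.csv plot/estimate_scores_boxplot.pdf"
  fi
  status=0
  for f in $files; do
    if [ ! -s "$out_dir/$f" ]; then
      echo "missing or empty: $f"
      status=1
    fi
  done
  exit "$status"
)

# Check the output directory used by the test command in this section.
missing_outputs tests/output grouped || echo "run incomplete"
```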

References

  1. Yoshihara K, Shahmoradgoli M, Martinez E, et al. (2013) Inferring tumour purity and stromal and immune cell admixture from expression data. Nature Communications. doi:10.1038/ncomms3612

For detailed algorithm notes, READ: references/algorithm.md
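
As a quick orientation to how the four reported values relate, the reference above defines ESTIMATEScore as the sum of StromalScore and ImmuneScore and derives TumorPurity from it with a cosine fit. The sketch below only restates those published relations. The helper name tumor_purity is hypothetical, and the skill itself takes all values from estimate::estimateScore(). In the upstream estimate package, TumorPurity is reported only for the affymetrix platform.

```shell
# Sketch of the published ESTIMATE relations (Yoshihara et al. 2013):
#   ESTIMATEScore = StromalScore + ImmuneScore
#   TumorPurity   = cos(0.6049872018 + 0.0001467884 * ESTIMATEScore)
# Hypothetical helper for illustration; arguments are stromal and immune scores.
tumor_purity() {
  awk -v s="$1" -v i="$2" 'BEGIN {
    est = s + i
    printf "%.4f\n", cos(0.6049872018 + 0.0001467884 * est)
  }'
}

# Example with made-up scores: stromal = 500, immune = 1000.
tumor_purity 500 1000
```

Higher combined scores give lower estimated purity, which matches the intuition that more stromal and immune signal means a smaller tumor cell fraction.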

Implementation Checklist

  • CLI parsing with optparse
  • set.seed() for reproducibility
  • Only public CRAN/Bioconductor packages used
  • Script parameters documented in SKILL.md
  • get_script_dir() defined before any call to it
  • File reading instructions in SKILL.md
  • Test data provided in tests/data/
  • Error handling implemented with SKILL_* messages
  • Baseline record completed in references/cli-guide.md
  • skill-auditor outputs generated after container execution