Data Analysis

estimate-immune-score-analysis

Use this skill to compute ESTIMATE immune-related microenvironment scores from a bulk expression matrix, generate an ESTIMATE score heatmap, and optionally generate group-wise ESTIMATE score boxplots plus significance tables when a sample group file is supplied.

97 / 100 Total Score

Core Capability: 99 / 100

Category	Score
Functional Suitability	12 / 12
Reliability	12 / 12
Performance & Context	8 / 8
Agent Usability	16 / 16
Human Usability	8 / 8
Security	11 / 12
Maintainability	12 / 12
Agent-Specific	20 / 20

Medical Task: 22 / 22 Passed

Scenario	Score	Assertions
Basic ESTIMATE scoring with heatmap on GeneSymbol CSV matrix	97	5/5
Grouped comparison with boxplot and significance table (Tumor vs Healthy)	94	5/5
Group file with 2 samples total (1 per group), below minimum threshold	95	4/4
TSV input with Entrez ID gene identifiers and custom output directory	96	4/4
Large matrix with 3-group file and 300-second timeout	94	4/4

Veto Gates: required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed

Gate	Result	Criterion
Operational Stability	PASS	System remains stable across varied inputs and edge cases
Structural Consistency	PASS	Output structure conforms to expected skill contract format
Result Determinism	PASS	Equivalent inputs produce semantically equivalent outputs
System Security	PASS	No prompt injection, data leakage, or unsafe tool use detected

Research Veto: ✅ PASS (Applicable)

Dimension	Result	Detail
Scientific Integrity	PASS	No fabricated DOI, PMID, p-values, or clinical data across all 5 inputs; all values derived from actual ESTIMATE pipeline computation.
Practice Boundaries	PASS	Explicit disclaimer that skill is not for clinical diagnosis or treatment decisions; When Not to Use section enforces this boundary.
Methodological Ground	PASS	ESTIMATE algorithm correctly applied for TME scoring; no methodological fallacies detected; hard stops for statistically invalid configurations (< 3 samples per group) are correct defensive design.
Code Usability	PASS	All R scripts syntactically valid; dependencies (estimate, pheatmap, ggplot2, ggpubr) are standard, publicly available R packages (pheatmap, ggplot2, and ggpubr on CRAN; estimate via R-Forge); modular sourcing pattern is correct.

Core Capability: 99 / 100 (8 Categories)

Category	Score	Percent	Notes
Functional Suitability	12 / 12	100%	All promised use cases covered: ESTIMATE scoring, heatmap, grouped boxplot, significance table, TSV/CSV/EntrezID support.
Reliability	12 / 12	100%	Hard stops on invalid input; error captured at all levels; partial outputs preserved on grouped comparison failure; Scene Override criteria correctly applied.
Performance & Context	8 / 8	100%	Progressive disclosure via when-to-read table; conditional package loading deferred until group file is confirmed; no context bloat.
Agent Usability	16 / 16	100%	Clear execution model, step-by-step workflow, explicit when-to-read table, output manifest + run record, and preventive guidance for duplicate sample names.
Human Usability	8 / 8	100%	Trigger keywords listed in description; natural request patterns provided; strict validation correct per Scene Override; error messages clearly identify issues.
Security	11 / 12	92%	No hardcoded secrets; file path validation present; missing explicit privacy advisory for user expression data (unlike GSVA sibling skill).
Maintainability	12 / 12	100%	Clean 7-module R architecture; run_tests.R and test_skill.R provide full test coverage; expected outputs documented.
Agent-Specific	20 / 20	100%	Trigger precision excellent with NOT-for exclusions; progressive disclosure well-layered; SKILL_* codes parseable for agent retry; append-only manifests ensure idempotency; clear escape hatches.

Core Capability Total: 99 / 100
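The total can be cross-checked against the eight category scores. The following is a minimal Python sketch that uses only the figures reported in this section; the variable names are illustrative, and rounding percentages to the nearest whole number is an assumption that happens to reproduce the reported 92% for Security.

```python
# Category scores as (earned, maximum), copied from the scorecard.
categories = {
    "Functional Suitability": (12, 12),
    "Reliability": (12, 12),
    "Performance & Context": (8, 8),
    "Agent Usability": (16, 16),
    "Human Usability": (8, 8),
    "Security": (11, 12),
    "Maintainability": (12, 12),
    "Agent-Specific": (20, 20),
}

earned = sum(e for e, _ in categories.values())
maximum = sum(m for _, m in categories.values())

# Per-category percentage, rounded to the nearest whole percent (assumed rule).
percentages = {name: round(100 * e / m) for name, (e, m) in categories.items()}

print(f"Core Capability Total: {earned} / {maximum}")
print(f"Security: {percentages['Security']}%")
```

Security is the only category below its maximum, so it alone accounts for the single missing point.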

Medical Task: Execution Average 95.2 / 100; Assertions 22/22 Passed

Canonical: ✅ Pass

Basic ESTIMATE scoring with heatmap on GeneSymbol CSV matrix

Full pipeline: validate -> filterCommonGenes -> estimateScore -> heatmap + score table. All expected outputs generated.

Basic 40/40 | Specialized 57/60 | Total 97/100

✅ A1: Output contains all documented files, including score table, heatmap PDF, session_info.txt, and manifests
✅ A2: Structured SKILL_* error codes emitted on failure with non-zero exit status
✅ A3: set.seed() called before any stochastic operations
✅ A4: Scope is limited to ESTIMATE-based TME scoring; no clinical conclusions made
✅ A5: No hardcoded credentials or PHI exposure in scripts or SKILL.md

Pass rate: 5 / 5

Variant A: ✅ Pass

Grouped comparison with boxplot and significance table (Tumor vs Healthy)

Group file triggers conditional loading of ggplot2/ggpubr/tidyr/dplyr; boxplot PDF and stats CSV generated.

Basic 39/40 | Specialized 55/60 | Total 94/100

✅ A1: Boxplot and stats CSV generated when valid 2-group file is provided
✅ A2: SKILL_INVALID_PARAMETER raised if fewer than 3 samples per group
✅ A3: Stats CSV contains per-score p-values and median-direction annotations
✅ A4: Core ESTIMATE outputs preserved even if grouped step fails after core scoring
✅ A5: No fabricated statistical values in output

Pass rate: 5 / 5

Edge: ✅ Pass

Group file with 2 samples total (1 per group), below minimum threshold

Core ESTIMATE scoring completes; group comparison correctly rejected with SKILL_INVALID_PARAMETER (fewer than 3 samples per group). Correct defensive behavior per Scene Override.

Basic 39/40 | Specialized 56/60 | Total 95/100

✅ A1: SKILL_INVALID_PARAMETER raised when a group contains fewer than 3 samples
✅ A2: Core ESTIMATE score table and heatmap preserved before grouped step fires
✅ A3: Failure details appended to output_manifest.txt and run_record.txt
✅ A4: Error message clearly identifies which condition failed

Pass rate: 4 / 4
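The behavior checked by A2 and A3 (core outputs kept, failure appended to the manifest) can be illustrated with a short Python sketch. The skill itself is implemented in R and its code is not shown in this report, so everything except the `output_manifest.txt` name and the `SKILL_INVALID_PARAMETER` code is hypothetical: the `SkillError` class, the `run` helper, the `estimate_scores.csv` file name, and the manifest line format.

```python
import tempfile
from pathlib import Path


class SkillError(Exception):
    """Hypothetical error type that carries a SKILL_* code."""

    def __init__(self, code: str, message: str):
        super().__init__(message)
        self.code = code


def append_line(path: Path, line: str) -> None:
    # Append mode never rewrites earlier entries.
    with path.open("a", encoding="utf-8") as handle:
        handle.write(line + "\n")


def run(out_dir: Path, grouped_step) -> int:
    """Run the core step, then the optional grouped step; return an exit status."""
    manifest = out_dir / "output_manifest.txt"

    # Core step: a stand-in for writing the ESTIMATE score table.
    scores = out_dir / "estimate_scores.csv"  # hypothetical file name
    scores.write_text("sample,ImmuneScore\n", encoding="utf-8")
    append_line(manifest, f"OK\t{scores.name}")

    try:
        grouped_step()
    except SkillError as err:
        # Record the failure; core outputs written above are left in place.
        append_line(manifest, f"FAILED\t{err.code}\t{err}")
        return 1
    append_line(manifest, "OK\tgrouped_comparison")
    return 0


def rejected_grouped_step() -> None:
    # Mimics the Edge scenario: the grouped comparison is refused.
    raise SkillError("SKILL_INVALID_PARAMETER", "fewer than 3 samples per group")


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    status = run(out, rejected_grouped_step)
    core_preserved = (out / "estimate_scores.csv").exists()
    manifest_lines = (out / "output_manifest.txt").read_text(encoding="utf-8").splitlines()

print(status, core_preserved, manifest_lines)
```

Because the core entry is written before the grouped step is attempted, a later failure only adds a line to the manifest and returns a non-zero status; nothing already on disk is removed.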

Variant B: ✅ Pass

TSV input with Entrez ID gene identifiers and custom output directory

TSV delimiter and EntrezID type are documented CLI options; all standard outputs generated in custom directory.

Basic 40/40 | Specialized 56/60 | Total 96/100

✅ A1: TSV delimiter accepted and matrix parsed correctly via --input_delimiter tsv
✅ A2: EntrezID gene type correctly passed to ESTIMATE pipeline via --gene_id_type EntrezID
✅ A3: All standard output files generated in specified custom output directory
✅ A4: Scope not exceeded; no analysis beyond ESTIMATE score computation attempted

Pass rate: 4 / 4
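To make the two options in this scenario concrete, here is a hedged Python sketch of how a delimiter keyword and a gene identifier type might be validated before a matrix is parsed. It is not the skill's R code. The `read_matrix` helper, the `csv` and `GeneSymbol` option values, and the layout (gene identifiers in the first column, sample names in the first row) are assumptions; only `tsv` and `EntrezID` appear as option values in the assertions above.

```python
import csv
import io

# Assumed mapping from option value to delimiter character.
DELIMITERS = {"csv": ",", "tsv": "\t"}
# Assumed set of accepted gene identifier types.
GENE_ID_TYPES = {"GeneSymbol", "EntrezID"}


def read_matrix(text: str, input_delimiter: str, gene_id_type: str):
    """Parse a gene-by-sample matrix; return (sample names, {gene id: values})."""
    if input_delimiter not in DELIMITERS:
        raise ValueError(f"unsupported input_delimiter: {input_delimiter}")
    if gene_id_type not in GENE_ID_TYPES:
        raise ValueError(f"unsupported gene_id_type: {gene_id_type}")

    reader = csv.reader(io.StringIO(text), delimiter=DELIMITERS[input_delimiter])
    rows = [row for row in reader if row]
    header, body = rows[0], rows[1:]
    samples = header[1:]  # first header cell labels the gene column
    matrix = {row[0]: [float(value) for value in row[1:]] for row in body}
    return samples, matrix


# Toy TSV with Entrez-style numeric identifiers kept as strings.
tsv_text = "gene\tS1\tS2\n7157\t1.5\t2.0\n5788\t0.25\t4.0\n"
samples, matrix = read_matrix(tsv_text, "tsv", "EntrezID")
print(samples, matrix)
```

Keeping identifiers as strings avoids treating numeric Entrez IDs as expression values.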

Stress: ✅ Pass

Large matrix with 3-group file and 300-second timeout

Core ESTIMATE scoring completes for large matrix; 3-group file correctly rejected with SKILL_INVALID_PARAMETER ('Exactly two group levels are supported'). Timeout parameter accepted.

Basic 39/40 | Specialized 55/60 | Total 94/100

✅ A1: SKILL_INVALID_PARAMETER raised when more than 2 group levels are present in group file
✅ A2: Core ESTIMATE scoring completes before group level validation fires
✅ A3: Failure state recorded in output_manifest.txt and run_record.txt
✅ A4: Timeout parameter accepted and enforced via enable_timeout()

Pass rate: 4 / 4
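The two grouped-comparison rules exercised by the Edge and Stress scenarios (exactly two group levels, at least 3 samples per group) can be sketched as follows. This is an illustrative Python version written under assumptions, not the skill's R code: the `validate_groups` function, the exception class, and the order in which the two checks run are hypothetical, and only the threshold and the quoted error wording come from the report.

```python
from collections import Counter

MIN_SAMPLES_PER_GROUP = 3  # threshold stated in the report


class SkillInvalidParameter(ValueError):
    """Hypothetical stand-in for the SKILL_INVALID_PARAMETER code."""


def validate_groups(sample_to_group: dict) -> dict:
    """Return per-group sample counts, or raise if the design is unsupported."""
    counts = Counter(sample_to_group.values())
    if len(counts) != 2:
        raise SkillInvalidParameter(
            f"Exactly two group levels are supported (found {len(counts)})"
        )
    too_small = sorted(g for g, n in counts.items() if n < MIN_SAMPLES_PER_GROUP)
    if too_small:
        raise SkillInvalidParameter(
            "Fewer than 3 samples in group(s): " + ", ".join(too_small)
        )
    return dict(counts)


def outcome(sample_to_group: dict) -> str:
    try:
        validate_groups(sample_to_group)
        return "accepted"
    except SkillInvalidParameter as err:
        return f"rejected: {err}"


tumor = {f"T{i}": "Tumor" for i in range(3)}
healthy = {f"H{i}": "Healthy" for i in range(3)}
other = {f"X{i}": "Other" for i in range(3)}

valid_result = outcome({**tumor, **healthy})
edge_result = outcome({"T0": "Tumor", "H0": "Healthy"})
stress_result = outcome({**tumor, **healthy, **other})

print(valid_result)
print(edge_result)
print(stress_result)
```

Rejecting these designs up front means no significance test is ever run on a group too small to support one.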

Medical Task Total: 95.2 / 100
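As an arithmetic cross-check, the 95.2 figure matches an unweighted mean of the five scenario totals, and each total equals its Basic plus Specialized parts. The Python sketch below assumes equal weighting, which the report does not state explicitly; the variable names are illustrative.

```python
# Per scenario: (basic, specialized, reported total, assertions passed).
scenarios = {
    "Canonical": (40, 57, 97, 5),
    "Variant A": (39, 55, 94, 5),
    "Edge": (39, 56, 95, 4),
    "Variant B": (40, 56, 96, 4),
    "Stress": (39, 55, 94, 4),
}

# Each reported total should be the sum of its two components.
totals_consistent = all(b + s == t for b, s, t, _ in scenarios.values())

# Unweighted mean of the scenario totals (assumed aggregation rule).
average = sum(t for _, _, t, _ in scenarios.values()) / len(scenarios)
assertions_passed = sum(n for _, _, _, n in scenarios.values())

print(f"Totals consistent: {totals_consistent}")
print(f"Execution average: {average:.1f} / 100")
print(f"Assertions passed: {assertions_passed}")
```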

Key Strengths

Exceptional modular R architecture with 7 clearly scoped modules enabling clean maintenance and extension.
Comprehensive structured error handling with SKILL_* codes, append-only manifests, and partial output preservation on grouped comparison failure.
Correct application of Scene Override rules: hard stops on invalid group sizes (< 3 samples) prevent invalid statistical testing rather than attempting to continue.
Progressive disclosure well-implemented: SKILL.md is concise with all algorithm detail deferred to references/.
Full test infrastructure with bundled public demo data, automated smoke tests (run_tests.R + test_skill.R), and documented expected outputs.