Data Analysis

estimate-immune-score-analysis

Use this skill to compute ESTIMATE immune-related microenvironment scores from a bulk expression matrix, generate an ESTIMATE score heatmap, and optionally generate group-wise ESTIMATE score boxplots plus significance tables when a sample group file is supplied.

Total Score: 97 / 100
Core Capability: 99 / 100
Functional Suitability: 12 / 12
Reliability: 12 / 12
Performance & Context: 8 / 8
Agent Usability: 16 / 16
Human Usability: 8 / 8
Security: 11 / 12
Maintainability: 12 / 12
Agent-Specific: 20 / 20
Medical Task: 22 / 22 Passed
97 | Basic ESTIMATE scoring with heatmap on GeneSymbol CSV matrix | 5/5
94 | Grouped comparison with boxplot and significance table (Tumor vs Healthy) | 5/5
95 | Group file with 2 samples total (1 per group), below minimum threshold | 4/4
96 | TSV input with Entrez ID gene identifiers and custom output directory | 4/4
94 | Large matrix with 3-group file and 300-second timeout | 4/4

Veto Gates: required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed
Operational Stability
System remains stable across varied inputs and edge cases
PASS
Structural Consistency
Output structure conforms to expected skill contract format
PASS
Result Determinism
Equivalent inputs produce semantically equivalent outputs
PASS
System Security
No prompt injection, data leakage, or unsafe tool use detected
PASS
Research Veto: ✅ PASS (Applicable)
Dimension | Result | Detail
Scientific Integrity | PASS | No fabricated DOI, PMID, p-values, or clinical data across all 5 inputs; all values derived from actual ESTIMATE pipeline computation.
Practice Boundaries | PASS | Explicit disclaimer that the skill is not for clinical diagnosis or treatment decisions; the When Not to Use section enforces this boundary.
Methodological Ground | PASS | ESTIMATE algorithm correctly applied for TME scoring; no methodological fallacies detected; hard stops for statistically invalid configurations (< 3 samples per group) are correct defensive design.
Code Usability | PASS | All R scripts syntactically valid; pheatmap, ggplot2, and ggpubr are standard CRAN packages, while estimate is installed from R-Forge; modular sourcing pattern is correct.

Core Capability: 99 / 100 (8 Categories)

Functional Suitability
All promised use cases covered: ESTIMATE scoring, heatmap, grouped boxplot, significance table, TSV/CSV/EntrezID support.
12 / 12
100%
Reliability
Hard stops on invalid input; error captured at all levels; partial outputs preserved on grouped comparison failure; Scene Override criteria correctly applied.
12 / 12
100%
Performance & Context
Progressive disclosure via when-to-read table; conditional package loading deferred until group file is confirmed; no context bloat.
8 / 8
100%
Agent Usability
Clear execution model, step-by-step workflow, explicit when-to-read table, output manifest + run record, and preventive guidance for duplicate sample names.
16 / 16
100%
Human Usability
Trigger keywords listed in description; natural request patterns provided; strict validation correct per Scene Override; error messages clearly identify issues.
8 / 8
100%
Security
No hardcoded secrets; file path validation present; missing explicit privacy advisory for user expression data (unlike GSVA sibling skill).
11 / 12
92%
Maintainability
Clean 7-module R architecture; run_tests.R and test_skill.R provide full test coverage; expected outputs documented.
12 / 12
100%
Agent-Specific
Trigger precision excellent with NOT-for exclusions; progressive disclosure well-layered; SKILL_* codes parseable for agent retry; append-only manifests ensure idempotency; clear escape hatches.
20 / 20
100%
Core Capability Total: 99 / 100
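
The "SKILL_* codes parseable for agent retry" point under Agent-Specific can be illustrated with a small parser. This is a Python sketch for illustration only (the skill itself is implemented in R). It assumes the codes appear in stderr as upper-case tokens prefixed with `SKILL_`; the function names `extract_skill_codes` and `needs_input_fix` are hypothetical, and `SKILL_INVALID_PARAMETER` is the only code named in this report.

```python
import re

# Assumption: error codes are upper-case tokens that start with "SKILL_".
SKILL_CODE = re.compile(r"\bSKILL_[A-Z][A-Z_]*\b")


def extract_skill_codes(stderr_text):
    """Return the distinct SKILL_* codes found, in order of first appearance."""
    seen = []
    for code in SKILL_CODE.findall(stderr_text):
        if code not in seen:
            seen.append(code)
    return seen


def needs_input_fix(exit_status, stderr_text):
    """Hypothetical retry policy: a non-zero exit with SKILL_INVALID_PARAMETER
    means the inputs must change before retrying; rerunning as-is will not help."""
    codes = extract_skill_codes(stderr_text)
    return exit_status != 0 and "SKILL_INVALID_PARAMETER" in codes
```

An agent could call `needs_input_fix(status, stderr)` after a failed run and, when it returns true, ask the user for a corrected group file instead of retrying blindly.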

Medical Task: Execution Average 95.2 / 100, Assertions 22/22 Passed

Canonical: ✅ Pass (97 / 100)
Basic ESTIMATE scoring with heatmap on GeneSymbol CSV matrix

Full pipeline: validate -> filterCommonGenes -> estimateScore -> heatmap + score table. All expected outputs generated.

Basic 40/40 | Specialized 57/60 | Total 97/100
A1: Output contains all documented files including score table, heatmap PDF, session_info.txt, and manifests
A2: Structured SKILL_* error codes emitted on failure with non-zero exit status
A3: set.seed() called before any stochastic operations
A4: Scope is limited to ESTIMATE-based TME scoring; no clinical conclusions made
A5: No hardcoded credentials or PHI exposure in scripts or SKILL.md
Pass rate: 5 / 5
Variant A: ✅ Pass (94 / 100)
Grouped comparison with boxplot and significance table (Tumor vs Healthy)

Group file triggers conditional loading of ggplot2/ggpubr/tidyr/dplyr; boxplot PDF and stats CSV generated.

Basic 39/40 | Specialized 55/60 | Total 94/100
A1: Boxplot and stats CSV generated when valid 2-group file is provided
A2: SKILL_INVALID_PARAMETER raised if fewer than 3 samples per group
A3: Stats CSV contains per-score p-values and median-direction annotations
A4: Core ESTIMATE outputs preserved even if grouped step fails after core scoring
A5: No fabricated statistical values in output
Pass rate: 5 / 5
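
As an illustration of the "median-direction annotations" in assertion A3, the following Python sketch labels which of two groups has the higher median for one score. It is not the skill's R code: the function name `median_direction` and the wording of the labels are hypothetical, and the p-value column is deliberately left out because this report does not state which statistical test the skill uses.

```python
from statistics import median


def median_direction(values_a, values_b, name_a, name_b):
    """Describe which group has the higher median for a single score.

    The label text is a hypothetical format, not the skill's actual output.
    """
    med_a = median(values_a)
    med_b = median(values_b)
    if med_a > med_b:
        return f"higher in {name_a}"
    if med_b > med_a:
        return f"higher in {name_b}"
    return "no difference in medians"
```

Called once per score column (for example once each for the stromal, immune, and combined ESTIMATE scores), this would yield one annotation per row of a stats table.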
Edge: ✅ Pass (95 / 100)
Group file with 2 samples total (1 per group), below minimum threshold

Core ESTIMATE scoring completes; group comparison correctly rejected with SKILL_INVALID_PARAMETER (fewer than 3 samples per group). Correct defensive behavior per Scene Override.

Basic 39/40 | Specialized 56/60 | Total 95/100
A1: SKILL_INVALID_PARAMETER raised when a group contains fewer than 3 samples
A2: Core ESTIMATE score table and heatmap preserved before grouped step fires
A3: Failure details appended to output_manifest.txt and run_record.txt
A4: Error message clearly identifies which condition failed
Pass rate: 4 / 4
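
The group-file hard stops exercised by this scenario (and by the three-group Stress scenario) can be sketched as follows. This is an illustrative Python rendering, not the skill's R implementation. The `SkillError` class, the `validate_groups` function, and the order of the two checks are assumptions; the threshold of 3 samples per group, the `SKILL_INVALID_PARAMETER` code, and the message "Exactly two group levels are supported" come from this report.

```python
from collections import Counter

MIN_SAMPLES_PER_GROUP = 3  # minimum per-group sample count stated in the report


class SkillError(Exception):
    """Hypothetical exception that carries a SKILL_* code."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def validate_groups(sample_to_group):
    """Check a {sample: group} mapping before any grouped comparison.

    Returns the per-group sample counts, or raises SkillError.
    """
    counts = Counter(sample_to_group.values())
    if len(counts) != 2:
        raise SkillError("SKILL_INVALID_PARAMETER",
                         "Exactly two group levels are supported")
    too_small = sorted(g for g, n in counts.items() if n < MIN_SAMPLES_PER_GROUP)
    if too_small:
        raise SkillError(
            "SKILL_INVALID_PARAMETER",
            f"Group(s) with fewer than {MIN_SAMPLES_PER_GROUP} samples: "
            + ", ".join(too_small),
        )
    return dict(counts)
```

Naming the offending groups in the message mirrors assertion A4, which expects the error to identify the condition that failed.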
Variant B: ✅ Pass (96 / 100)
TSV input with Entrez ID gene identifiers and custom output directory

TSV delimiter and EntrezID type are documented CLI options; all standard outputs generated in custom directory.

Basic 40/40 | Specialized 56/60 | Total 96/100
A1: TSV delimiter accepted and matrix parsed correctly via --input_delimiter tsv
A2: EntrezID gene type correctly passed to ESTIMATE pipeline via --gene_id_type EntrezID
A3: All standard output files generated in specified custom output directory
A4: Scope not exceeded; no analysis beyond ESTIMATE score computation attempted
Pass rate: 4 / 4
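
The two CLI options named in A1 and A2 can be mirrored in a short Python `argparse` sketch. This only shows how such options might be parsed and is not the skill's R entry point. The flag names `--input_delimiter` and `--gene_id_type` and the values `tsv` and `EntrezID` come from this report; the `csv` and `GeneSymbol` choices are inferred from the Canonical scenario, and the defaults and function names are assumptions.

```python
import argparse

# Map the documented delimiter keywords to the actual separator characters.
DELIMITERS = {"csv": ",", "tsv": "\t"}


def build_parser():
    parser = argparse.ArgumentParser(
        description="Illustrative option parser (not the skill's real CLI)")
    # Default values below are assumptions, not documented behavior.
    parser.add_argument("--input_delimiter", choices=sorted(DELIMITERS),
                        default="csv")
    parser.add_argument("--gene_id_type", choices=["GeneSymbol", "EntrezID"],
                        default="GeneSymbol")
    return parser


def parse_options(argv):
    """Return (separator character, gene identifier type) for the given argv."""
    args = build_parser().parse_args(argv)
    return DELIMITERS[args.input_delimiter], args.gene_id_type
```

Restricting both options with `choices` makes an unsupported value fail at parse time, before any matrix is read.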
Stress: ✅ Pass (94 / 100)
Large matrix with 3-group file and 300-second timeout

Core ESTIMATE scoring completes for large matrix; 3-group file correctly rejected with SKILL_INVALID_PARAMETER ('Exactly two group levels are supported'). Timeout parameter accepted.

Basic 39/40 | Specialized 55/60 | Total 94/100
A1: SKILL_INVALID_PARAMETER raised when more than 2 group levels are present in group file
A2: Core ESTIMATE scoring completes before group level validation fires
A3: Failure state recorded in output_manifest.txt and run_record.txt
A4: Timeout parameter accepted and enforced via enable_timeout()
Pass rate: 4 / 4
Medical Task Total: 95.2 / 100
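
The Edge and Stress scenarios both rely on core outputs surviving a failed grouped step, with the failure appended to `output_manifest.txt` and `run_record.txt`. The Python sketch below shows one way such a flow could be arranged; it is not the skill's R code. The two file names come from this report, while `run_pipeline`, `append_line`, the line formats, and the use of `ValueError` as a stand-in for a validation failure are all hypothetical.

```python
from pathlib import Path


def append_line(path, line):
    """Append one line; opening in 'a' mode never truncates earlier entries."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")


def run_pipeline(out_dir, core_step, grouped_step=None):
    """Run the core step, then the optional grouped step.

    Each step is a callable that takes the output directory and returns the
    names of the files it produced. Returns 0 on success, 1 if the grouped
    step failed after the core outputs were already recorded.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    manifest = out / "output_manifest.txt"
    record = out / "run_record.txt"

    for produced in core_step(out):
        append_line(manifest, f"OK {produced}")
    append_line(record, "core: completed")

    if grouped_step is None:
        return 0
    try:
        for produced in grouped_step(out):
            append_line(manifest, f"OK {produced}")
        append_line(record, "grouped: completed")
        return 0
    except ValueError as err:  # stand-in for a SKILL_* validation failure
        append_line(manifest, f"FAILED grouped comparison: {err}")
        append_line(record, f"grouped: failed: {err}")
        return 1
```

Because the core entries are written before the grouped step starts, a rejection such as the three-group case leaves the earlier manifest lines untouched and returns a non-zero status.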

Key Strengths

  • Exceptional modular R architecture with 7 clearly scoped modules enabling clean maintenance and extension.
  • Comprehensive structured error handling with SKILL_* codes, append-only manifests, and partial output preservation on grouped comparison failure.
  • Correct application of Scene Override rules: hard stops on invalid group sizes (< 3 samples) prevent invalid statistical testing rather than attempting to continue.
  • Progressive disclosure well-implemented: SKILL.md is concise with all algorithm detail deferred to references/.
  • Full test infrastructure with bundled public demo data, automated smoke tests (run_tests.R + test_skill.R), and documented expected outputs.