estimate-immune-score-analysis
Use this skill to compute ESTIMATE immune-related microenvironment scores from a bulk expression matrix, generate an ESTIMATE score heatmap, and optionally generate group-wise ESTIMATE score boxplots plus significance tables when a sample group file is supplied.
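The core scoring flow can be sketched roughly as follows. This is a minimal illustration, not the skill's actual module code. It assumes the `estimate` package is installed and that the input is a tab-delimited matrix with genes in rows. The function name `run_estimate_sketch` and both output file names are hypothetical.

```r
# Minimal sketch of the core scoring step; NOT the skill's real implementation.
# run_estimate_sketch and the output file names below are hypothetical.
run_estimate_sketch <- function(expr_file, out_dir,
                                id_type = "GeneSymbol",
                                platform = "affymetrix") {
  if (!requireNamespace("estimate", quietly = TRUE)) {
    stop("Package 'estimate' is required for this sketch")
  }
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  filtered_gct <- file.path(out_dir, "common_genes.gct")
  score_gct <- file.path(out_dir, "estimate_scores.gct")

  # Keep only the genes shared with the ESTIMATE signature gene set.
  estimate::filterCommonGenes(input.f = expr_file, output.f = filtered_gct,
                              id = id_type)
  # Compute stromal, immune, and combined ESTIMATE scores.
  estimate::estimateScore(input.ds = filtered_gct, output.ds = score_gct,
                          platform = platform)

  # GCT files carry two header lines before the table itself.
  gct <- read.table(score_gct, skip = 2, header = TRUE, sep = "\t",
                    check.names = FALSE)
  scores <- t(as.matrix(gct[, -(1:2), drop = FALSE]))
  colnames(scores) <- gct$NAME
  scores  # samples in rows, score types in columns
}
```

The returned matrix could then be passed to a heatmap function such as `pheatmap::pheatmap()`.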
Veto Gates
Required pass for any deployment consideration
| Dimension | Result | Detail |
|---|---|---|
| Scientific Integrity | PASS | No fabricated DOI, PMID, p-values, or clinical data across all 5 inputs; all values derived from actual ESTIMATE pipeline computation. |
| Practice Boundaries | PASS | Explicit disclaimer that the skill is not for clinical diagnosis or treatment decisions; the "When Not to Use" section enforces this boundary. |
| Methodological Ground | PASS | ESTIMATE algorithm correctly applied for TME scoring; no methodological fallacies detected; hard stops for statistically invalid configurations (< 3 samples per group) are correct defensive design. |
| Code Usability | PASS | All R scripts syntactically valid; pheatmap, ggplot2, and ggpubr are standard CRAN packages, while estimate is installed from R-Forge rather than CRAN or Bioconductor; modular sourcing pattern is correct. |
Core Capability
99 / 100 (8 categories)
Medical Task
Execution Average: 95.2 / 100; Assertions: 22/22 Passed
- Full pipeline: validate -> filterCommonGenes -> estimateScore -> heatmap + score table. All expected outputs generated.
- Group file triggers conditional loading of ggplot2/ggpubr/tidyr/dplyr; boxplot PDF and stats CSV generated.
- Core ESTIMATE scoring completes; group comparison correctly rejected with SKILL_INVALID_PARAMETER (fewer than 3 samples per group). Correct defensive behavior per Scene Override.
- TSV delimiter and EntrezID type are documented CLI options; all standard outputs generated in custom directory.
- Core ESTIMATE scoring completes for large matrix; 3-group file correctly rejected with SKILL_INVALID_PARAMETER ('Exactly two group levels are supported'). Timeout parameter accepted.
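The two group-file rejections above can be illustrated with a small base-R sketch. It assumes the checks are raised as classed R conditions. The helpers `skill_error` and `validate_groups` are hypothetical names, and the wording of the minimum-size message is invented for illustration. Only the code `SKILL_INVALID_PARAMETER` and the two-level message come from the results above.

```r
# Hypothetical helper: builds a classed condition so callers can match on the code.
skill_error <- function(code, message) {
  structure(
    class = c(code, "skill_error", "error", "condition"),
    list(message = message, call = NULL)
  )
}

# Hypothetical validator mirroring the two hard stops described in the report.
validate_groups <- function(groups, min_per_group = 3L) {
  counts <- table(as.character(groups))
  if (length(counts) != 2L) {
    stop(skill_error("SKILL_INVALID_PARAMETER",
                     "Exactly two group levels are supported"))
  }
  if (any(counts < min_per_group)) {
    # Message wording is an assumption, not taken from the skill.
    stop(skill_error("SKILL_INVALID_PARAMETER",
                     sprintf("Each group needs at least %d samples",
                             min_per_group)))
  }
  invisible(counts)
}

# Usage: a caller can catch the specific code without parsing message text.
result <- tryCatch(
  validate_groups(c("tumor", "tumor", "tumor", "normal", "normal")),
  SKILL_INVALID_PARAMETER = function(e) conditionMessage(e)
)
```

Matching on the condition class rather than on the message string keeps the caller stable if the message text changes.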
Key Strengths
- Exceptional modular R architecture with 7 clearly scoped modules enabling clean maintenance and extension.
- Comprehensive structured error handling with SKILL_* codes, append-only manifests, and partial output preservation on grouped comparison failure.
- Correct application of Scene Override rules: hard stops on invalid group sizes (< 3 samples) prevent invalid statistical testing rather than attempting to continue.
- Progressive disclosure is well implemented: SKILL.md is concise, with all algorithm detail deferred to references/.
- Full test infrastructure with bundled public demo data, automated smoke tests (run_tests.R + test_skill.R), and documented expected outputs.
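One way the append-only manifest and partial output preservation noted above might fit together is sketched below. It assumes a tab-separated manifest format. `append_manifest` and `run_with_partial_outputs` are hypothetical names, not functions from the skill.

```r
# Hypothetical manifest writer: adds one line per event and never rewrites
# earlier entries (append = TRUE).
append_manifest <- function(manifest_path, status, detail) {
  line <- paste(format(Sys.time(), "%Y-%m-%dT%H:%M:%S"), status, detail,
                sep = "\t")
  cat(line, "\n", sep = "", file = manifest_path, append = TRUE)
}

# Hypothetical driver: core results are produced and recorded first, so a
# failure in the grouped comparison cannot discard them.
run_with_partial_outputs <- function(core_step, group_step, manifest_path) {
  core <- core_step()
  append_manifest(manifest_path, "OK", "core scoring")
  grouped <- tryCatch(
    {
      res <- group_step(core)
      append_manifest(manifest_path, "OK", "group comparison")
      res
    },
    error = function(e) {
      # Record the failure and continue with the core outputs intact.
      append_manifest(manifest_path, "FAILED", conditionMessage(e))
      NULL
    }
  )
  list(core = core, grouped = grouped)
}
```

In this sketch a failed comparison leaves `grouped` as `NULL` while `core` and the manifest entries written so far remain available.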