Data Analysis

gsva-analysis-and-visualization

Use this skill to run GSVA or ssGSEA pathway-level differential analysis from a bulk expression matrix and a sample group file, then generate a heatmap from the saved GSVA result object.

Total Score: 95 / 100
Core Capability: 100 / 100
  Functional Suitability: 12 / 12
  Reliability: 12 / 12
  Performance & Context: 8 / 8
  Agent Usability: 16 / 16
  Human Usability: 8 / 8
  Security: 12 / 12
  Maintainability: 12 / 12
  Agent-Specific: 20 / 20
Medical Task: 28 / 29 assertions passed
  97  Full GSVA pipeline: bulk matrix vs Tumor/Healthy with KEGG gene sets (5/5 assertions)
  94  ssGSEA with MSigDB Hallmarks (H category), top 30 pathways (4/4 assertions)
  90  GSVA with 3 samples per group, below recommended minimum (3/4 assertions)
  96  Visualization reuse from saved GSVA_list.rda with custom heatmap parameters (4/4 assertions)
  94  Full GSVA with C2 REACTOME collection, FDR=0.01, top 50 pathways (4/4 assertions)
  83  Single-cell RNA-seq pathway analysis request, out of scope (4/4 assertions)
  91  Sample names in group file do not match expression matrix columns (4/4 assertions)

Veto Gates: required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed
Operational Stability
System remains stable across varied inputs and edge cases
PASS
Structural Consistency
Output structure conforms to expected skill contract format
PASS
Result Determinism
Equivalent inputs produce semantically equivalent outputs
PASS
System Security
No prompt injection, data leakage, or unsafe tool use detected
PASS
Research Veto: ✅ PASS (applicable)
Scientific Integrity: PASS
No fabricated DOI, PMID, p-values, or pathway enrichment scores across all 7 inputs; all values derived from actual GSVA/limma computation on provided data.
Practice Boundaries: PASS
Clinical diagnosis explicitly excluded in When Not to Use; privacy note for patient-linked data present; no medical recommendations made.
Methodological Ground: PASS
GSVA + limma is the canonical bulk pathway enrichment pipeline; method selection guide correctly advises on GSVA vs ssGSEA for different sample sizes and data characteristics.
Code Usability: PASS
All 9 R modules syntactically valid; dependencies (msigdbr, GSVA, limma, pheatmap) are standard Bioconductor/CRAN packages; three-mode branching logic is correct.

Core Capability: 100 / 100 (8 categories)

Functional Suitability
Full coverage of GSVA, ssGSEA, limma differential analysis, heatmap generation, and result object reuse across three execution modes.
12 / 12
100%
Reliability
Hard stops on missing files; visualize mode guards against missing rda; append-only manifests preserve provenance across multiple runs in the same output_dir.
12 / 12
100%
Performance & Context
Progressive disclosure via when-to-read table; references and algorithm details deferred; conditional mode execution avoids loading unnecessary analysis dependencies in visualize-only mode.
8 / 8
100%
Agent Usability
Mode selection guide provides genuine scientific guidance; execution model is explicit; method selection guide warns about sample size requirements; log messages at each step.
16 / 16
100%
Human Usability
Strong trigger keywords (GSVA, ssGSEA, KEGG, MSigDB); privacy note for patient-linked data is a best-practice addition; strict input validation is correct per Scene Override.
8 / 8
100%
Security
No hardcoded secrets; explicit privacy advisory for patient-linked matrices; file path validation; no prompt injection vectors.
12 / 12
100%
Maintainability
9-module R architecture with clean separation; run_tests.R validates both analyze mode and visualize reuse in one pass; test data sourced from public GEO series.
12 / 12
100%
Agent-Specific
Trigger precision is excellent, with specific NOT-for exclusions and named alternative skills; the rda object enables clean downstream composition; append-only manifests support repeatable multi-run workflows in the same output_dir.
20 / 20
100%
Core Capability Total: 100 / 100
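The Reliability and Agent-Specific notes credit append-only manifests with preserving provenance across runs in one output_dir. The skill itself is written in R; the Python sketch below only illustrates the pattern. The file name output_manifest.txt comes from the report, while the tab-separated line layout and the helper name append_manifest are assumptions.

```python
import os
import tempfile
from datetime import datetime, timezone


def append_manifest(output_dir: str, mode: str, outputs: list) -> str:
    """Append one tab-separated line per output file (hypothetical layout).

    Opening the file in "a" mode never truncates it, so entries written by
    earlier runs in the same output_dir are preserved.
    """
    os.makedirs(output_dir, exist_ok=True)
    manifest = os.path.join(output_dir, "output_manifest.txt")
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with open(manifest, "a", encoding="utf-8") as fh:
        for path in outputs:
            fh.write(f"{stamp}\t{mode}\t{path}\n")
    return manifest


if __name__ == "__main__":
    out = tempfile.mkdtemp()
    append_manifest(out, "analyze", ["data/GSVA_list.rda", "GSVA_diff.csv"])
    path = append_manifest(out, "visualize", ["heatmap.pdf"])
    with open(path, encoding="utf-8") as fh:
        print(len(fh.readlines()))  # prints 3: lines from both runs are kept
```

Appending rather than rewriting means a later visualize run adds its heatmap entry without erasing the record of the analysis that produced the rda it reused.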

Medical Task Execution Average: 92.1 / 100; Assertions: 28 / 29 passed

Canonical: ✅ Pass
Full GSVA pipeline: bulk matrix vs Tumor/Healthy with KEGG gene sets

Full mode: load MSigDB KEGG -> GSVA scores -> limma diff analysis -> diff table + score matrices + rda + heatmap. All 8 expected outputs generated.

Basic 40/40 | Specialized 57/60 | Total 97/100
A1: Output contains GSVA_diff.csv, GSVA_enrichment_results.csv, GSVA_list.rda, and heatmap PDF
A2: limma logFC, P.Value, and adj.P.Val columns present in GSVA_diff.csv
A3: set.seed() applied before GSVA score computation
A4: No clinical diagnosis or treatment recommendation made
A5: Privacy note for patient-linked data present in skill documentation
Pass rate: 5 / 5
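Full mode chains analysis and visualization. The Python sketch below shows one way a three-mode dispatcher with the rda guard could be organized; it assumes hypothetical caller-supplied analyze and visualize callables in place of the skill's R modules. Only the mode names, the output_dir/data/GSVA_list.rda location, and the SKILL_FILE_NOT_FOUND code come from the report.

```python
import os


class SkillError(Exception):
    """Exception carrying a SKILL_* code such as those named in the report."""

    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code


def rda_path(output_dir: str) -> str:
    # Location of the saved result object, as described in the report.
    return os.path.join(output_dir, "data", "GSVA_list.rda")


def run(mode: str, output_dir: str, analyze, visualize) -> list:
    """Run analyze, visualize, or full (analyze then visualize).

    `analyze` and `visualize` are hypothetical callables standing in for the
    skill's R modules; `analyze` is expected to write GSVA_list.rda.
    Returns the names of the steps that ran.
    """
    if mode not in ("analyze", "visualize", "full"):
        raise ValueError(f"unknown mode: {mode!r}")
    steps = []
    if mode in ("analyze", "full"):
        analyze(output_dir)
        steps.append("analyze")
    if mode in ("visualize", "full"):
        # Guard: a visualize run must find a previously saved result object.
        if not os.path.isfile(rda_path(output_dir)):
            raise SkillError("SKILL_FILE_NOT_FOUND", rda_path(output_dir))
        visualize(output_dir)
        steps.append("visualize")
    return steps
```

Because visualize mode only checks for the saved object and never calls analyze, iterating on heatmap parameters avoids recomputing GSVA scores.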
Variant A: ✅ Pass
ssGSEA with MSigDB Hallmarks (H category), top 30 pathways

ssGSEA method with Hallmark gene sets; --top_n 30 applied to top pathway subset; GSVA_list.rda saved for downstream reuse.

Basic 39/40 | Specialized 55/60 | Total 94/100
A1: ssGSEA method accepted and passed to GSVA computation function
A2: MSigDB Hallmark category (H) correctly retrieved from msigdbr
A3: top_n=30 applied to GSVA_enrichment_results_topN.csv pathway subset
A4: GSVA_list.rda saved for downstream visualization reuse
Pass rate: 4 / 4
Edge: ✅ Pass
GSVA with 3 samples per group, below the recommended minimum

The analysis executes, because GSVA mode has no hard programmatic block on sample size. The method selection guide recommends at least 10 samples per group for the gsva method, but no runtime warning is emitted. Minor gap identified.

Basic 38/40 | Specialized 52/60 | Total 90/100
A1: Analysis runs without a hard block for insufficient sample count in GSVA mode
A2: Method selection guide warns users about minimum sample sizes for the gsva method
A3 (failed): Runtime log_warn emitted when sample count falls below the recommended minimum
A4: Heatmap generated from top pathways when analysis completes
Pass rate: 3 / 4
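The missing runtime warning is a small check to add. A minimal Python sketch, assuming the group file has been read into a sample-to-group mapping: the 10-per-group threshold for gsva is the one the report attributes to the method selection guide, applying no threshold to other methods is an assumption, and Python's logging module stands in for the skill's R log_warn helper. The function name is hypothetical.

```python
import logging
from collections import Counter

logger = logging.getLogger("gsva_skill")

# Recommended minimum per group; only the gsva entry comes from the report.
MIN_PER_GROUP = {"gsva": 10}


def check_group_sizes(groups: dict, method: str) -> list:
    """Warn, without blocking, for each group below the recommended minimum.

    `groups` maps sample name -> group label. Returns the warning messages so
    a caller can also write them to the run record.
    """
    minimum = MIN_PER_GROUP.get(method)
    if minimum is None:
        return []
    warnings = []
    for label, n in sorted(Counter(groups.values()).items()):
        if n < minimum:
            msg = (f"group {label!r} has {n} samples; method {method!r} "
                   f"is recommended with at least {minimum} per group")
            logger.warning(msg)
            warnings.append(msg)
    return warnings
```

With 3 Tumor and 3 Healthy samples this emits two warnings and lets the analysis continue, which matches the non-blocking behavior assertion A1 expects.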
Variant B: ✅ Pass
Visualization reuse from saved GSVA_list.rda with custom heatmap parameters

Visualize mode loads rda without re-running analysis; top_up=10, top_down=10, top_mode=both applied; output appended to existing manifests.

Basic 40/40 | Specialized 56/60 | Total 96/100
A1: Visualize mode loads GSVA_list.rda without re-running analysis
A2: top_up, top_down, top_mode parameters applied to heatmap pathway subset
A3: SKILL_FILE_NOT_FOUND raised if GSVA_list.rda is missing from output_dir/data/
A4: Output appended to existing output_manifest.txt and run_record.txt
Pass rate: 4 / 4
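The report does not spell out how top_up, top_down, and top_mode choose pathways. One plausible reading, sketched in Python: keep GSVA_diff.csv rows whose adj.P.Val passes an FDR cutoff, then take the top_up largest positive logFC values and the top_down most negative ones. The column names come from the report; the ranking rule, the default fdr=0.05, and the function name are hypothetical.

```python
def select_heatmap_pathways(diff, top_up=10, top_down=10, top_mode="both", fdr=0.05):
    """Pick pathway names for the heatmap from limma-style result rows.

    `diff` is a list of dicts with keys "pathway", "logFC", and "adj.P.Val".
    Ranking by logFC among FDR-passing rows is an assumption.
    """
    if top_mode not in ("up", "down", "both"):
        raise ValueError(f"top_mode must be 'up', 'down' or 'both', got {top_mode!r}")
    sig = [r for r in diff if r["adj.P.Val"] < fdr]
    # Upregulated: largest positive logFC first; downregulated: most negative first.
    up = sorted((r for r in sig if r["logFC"] > 0), key=lambda r: r["logFC"], reverse=True)
    down = sorted((r for r in sig if r["logFC"] < 0), key=lambda r: r["logFC"])
    chosen = []
    if top_mode in ("up", "both"):
        chosen += [r["pathway"] for r in up[:top_up]]
    if top_mode in ("down", "both"):
        chosen += [r["pathway"] for r in down[:top_down]]
    return chosen
```

Keeping up- and downregulated pathways in separate quotas prevents a few strongly upregulated sets from crowding every downregulated one out of the heatmap.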
Stress: ✅ Pass
Full GSVA with C2 REACTOME collection, FDR=0.01, top 50 pathways

Large REACTOME collection loaded from msigdbr; FDR=0.01 and top_n=50 applied; timeout parameter available for long runs.

Basic 39/40 | Specialized 55/60 | Total 94/100
A1: C2/REACTOME gene sets loaded correctly from msigdbr using --subcategory CP:REACTOME
A2: FDR threshold 0.01 applied to top pathway selection
A3: top_n=50 applied to exported pathway score matrix
A4: Timeout parameter available to bound long computation on large gene set collections
Pass rate: 4 / 4
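The FDR threshold acts on limma's adj.P.Val column, which topTable computes with the Benjamini-Hochberg procedure by default. The Python sketch below reproduces that adjustment and then applies an FDR cutoff and a top_n limit. Ranking by adjusted p-value (ties broken by raw p-value), the strict < comparison, and the function names are assumptions about the skill's selection rule.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity (p * m / rank, capped at 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def top_significant(pathways, pvals, fdr=0.01, top_n=50):
    """Return up to top_n (pathway, adjusted p) pairs with adjusted p below fdr."""
    adj = bh_adjust(pvals)
    hits = [(adj[i], pvals[i], name) for i, name in enumerate(pathways) if adj[i] < fdr]
    hits.sort()
    return [(name, a) for a, _, name in hits[:top_n]]
```

For raw p-values 0.01, 0.04, 0.03, and 0.2 this yields adjusted values of about 0.04, 0.0533, 0.0533, and 0.2, the same numbers R's p.adjust(method = "BH") gives for that input.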
Scope Boundary: ✅ Pass
Single-cell RNA-seq pathway analysis request, out of scope

Out-of-scope response triggered; skill correctly declines and names the appropriate alternative workflow. No GSVA execution attempted.

Basic 35/40 | Specialized 48/60 | Total 83/100
A1: Single-cell RNA-seq analysis explicitly identified as out of scope in When Not to Use
A2: Out-of-scope response pattern provided with named alternative skill
A3: GSVA workflow not executed for single-cell request
A4: Named alternative skill (sc-clustering) referenced to help user proceed
Pass rate: 4 / 4
Adversarial: ✅ Pass
Sample names in group file do not match expression matrix columns

SKILL_SAMPLE_MISMATCH raised with actionable message identifying mismatched samples; no analysis proceeds with mismatched data.

Basic 38/40 | Specialized 53/60 | Total 91/100
A1: SKILL_SAMPLE_MISMATCH raised when group file samples do not match matrix columns
A2: Error message identifies the mismatch clearly for agent-driven retry
A3: No partial GSVA analysis proceeds with mismatched sample data
A4: SKILL_* error code format enables automated retry with corrected inputs
Pass rate: 4 / 4
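A minimal Python sketch of the sample check this scenario exercises. Only the SKILL_SAMPLE_MISMATCH code comes from the report; requiring an exact set match (rather than allowing extra matrix columns), the message wording, and the function name are assumptions. Listing the offending names on both sides is what makes the error actionable for an automated retry.

```python
class SkillError(Exception):
    """Exception carrying a SKILL_* code such as those named in the report."""

    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code


def validate_samples(matrix_columns, group_samples):
    """Raise SKILL_SAMPLE_MISMATCH unless both sample sets match exactly.

    On success, returns the sample names in matrix column order so group
    labels can be aligned to the expression matrix before any analysis runs.
    """
    cols, grp = set(matrix_columns), set(group_samples)
    only_in_groups = sorted(grp - cols)
    only_in_matrix = sorted(cols - grp)
    if only_in_groups or only_in_matrix:
        raise SkillError(
            "SKILL_SAMPLE_MISMATCH",
            f"in group file but not matrix: {only_in_groups}; "
            f"in matrix but not group file: {only_in_matrix}",
        )
    return [c for c in matrix_columns if c in grp]
```

Running this before any GSVA computation guarantees that no partial analysis starts on mismatched data, which is what assertion A3 checks.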
Medical Task Total: 92.1 / 100

Key Strengths

  • Perfect static score: exemplary SKILL.md with complete documentation of all parameters, outputs, modes, and scientific method selection guidance.
  • Three-mode execution (analyze/visualize/full) with rda result object reuse is excellent workflow design enabling iterative visualization without expensive re-computation.
  • Explicit privacy note for patient-linked matrices is a best-practice addition that sets a standard for the skill collection.
  • Method selection guide (GSVA vs ssGSEA) provides genuine, actionable scientific guidance — not just parameter documentation.