Data Analysis

sample-group-sankey-plot

Generates Sankey or alluvial plots from sample annotation tables where rows are samples and selected columns are categorical stages such as risk group, response status, subtype, or cohort labels. Exports annotations, lodes-format table, plot PDF, and session metadata.
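To make the "lodes-format table" concrete, here is a minimal Python sketch of the wide-to-long reshaping that an alluvial plot needs. It is only an illustration of the data shape, not the skill's R code (the report indicates the skill uses ggalluvial for this step). The function name `to_lodes`, the column names `alluvium`, `x`, and `stratum`, and the sample records are assumptions made for the example.

```python
from typing import Dict, List


def to_lodes(rows: List[Dict[str, str]], stages: List[str]) -> List[Dict[str, object]]:
    """Reshape one-row-per-sample records into one row per (sample, stage).

    Hypothetical helper: column names are illustrative, not taken from the skill.
    """
    lodes = []
    for alluvium, row in enumerate(rows, start=1):
        for stage in stages:
            # Each lode records which sample (alluvium) sits in which
            # category (stratum) at which stage (x).
            lodes.append({"alluvium": alluvium, "x": stage, "stratum": row[stage]})
    return lodes


# Made-up annotation table: rows are samples, columns are categorical stages.
samples = [
    {"sample": "S1", "risk": "high", "Responder": "yes"},
    {"sample": "S2", "risk": "low", "Responder": "no"},
    {"sample": "S3", "risk": "high", "Responder": "no"},
]
lodes = to_lodes(samples, ["risk", "Responder"])
```

Each sample contributes one row per selected stage, so three samples and two stages give six lodes rows.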

96 / 100 Total Score
Core Capability
95 / 100
Functional Suitability
11 / 12
Reliability
12 / 12
Performance & Context
7 / 8
Agent Usability
15 / 16
Human Usability
8 / 8
Security
11 / 12
Maintainability
11 / 12
Agent-Specific
20 / 20
Medical Task
25 / 25 Passed
98 Two-stage Sankey on risk and Responder columns
5/5
98 Three-stage Sankey with custom title
5/5
96 All columns auto-included when --columns is omitted (7 columns)
5/5
99 Custom output prefix, alpha, and label_size parameters
5/5
95 Six-stage Sankey with 10+ unique values per stage
5/5

Veto Gates: Required pass for any deployment consideration

Skill Veto ✓ All 4 gates passed
Operational Stability
System remains stable across varied inputs and edge cases
PASS
Structural Consistency
Output structure conforms to expected skill contract format
PASS
Result Determinism
Equivalent inputs produce semantically equivalent outputs
PASS
System Security
No prompt injection, data leakage, or unsafe tool use detected
PASS
Research Veto ✅ PASS (Applicable)
Dimension | Result | Detail
Scientific Integrity | PASS | No fabricated values; skill only visualizes existing categorical annotations; no statistical inference or data invention performed across all 5 inputs
Practice Boundaries | PASS | Skill produces visualization outputs only; no diagnostic claims or medical conclusions; no disclaimer required for a visualization tool
Methodological Ground | PASS | ggalluvial alluvial transformation is the correct approach for categorical flow visualization; no methodological fallacies detected; real citations (Brunson 2020) included in SKILL.md; readability advisories emitted for over-parameterized inputs
Code Usability | PASS | scripts/main.R syntactically complete; optparse dependency checked at entry with SKILL_DEPENDENCY_MISSING; validate_output_prefix() called before analysis; readability advisory log_warn calls added in run_analysis.R; no infinite loops

Core Capability 95 / 100 (8 Categories)

Functional Suitability
All use cases covered including Agent Response Contract and readability advisories; minor deduction for session_info overhead in a simple visualization tool
11 / 12
92%
Reliability
SKILL_* error codes for all failure modes; Scene Override applied: hard stops at failure are correct defensive design for scientific computing; fully structured error interface
12 / 12
100%
Performance & Context
Reference files now bundled in references/; SKILL.md at 256 lines within range; minor: cli-guide.md rarely needed but loaded in When-to-Read table
7 / 8
88%
Agent Usability
Agent Response Contract added defining exact output format; readability advisories documented in workflow; all step patterns consistent; minor gap: no advisory about NA count during normalization
15 / 16
94%
Human Usability
Natural trigger language with NOT-for exclusions; real DOI citations; Scene Override applied: strict input validation correct for scientific computing; error messages clearly identify issues
8 / 8
100%
Security
No credentials; validate_output_prefix() prevents path traversal; no eval/exec; minor: input file path written to session_info.txt (local path disclosure only, acceptable)
11 / 12
92%
Maintainability
Five scripts with clean separation; three reference files now bundled; clear extension points; minor: test scripts referenced but not in visible bundle
11 / 12
92%
Agent-Specific
Trigger precisely targets categorical annotation visualization; progressive disclosure complete with bundled references; Agent Response Contract defines composability interface; set.seed wired for idempotency; escape hatches with refusal template and SKILL_* codes
20 / 20
100%
Core Capability Total 95 / 100

Medical Task: Execution Average 97.2 / 100; Assertions 25/25 Passed

98
Canonical
Two-stage Sankey on risk and Responder columns
5/5
98
Variant A
Three-stage Sankey with custom title
5/5
96
Edge
All columns auto-included when --columns is omitted (7 columns)
5/5
99
Variant B
Custom output prefix, alpha, and label_size parameters
5/5
95
Stress
Six-stage Sankey with 10+ unique values per stage
5/5
98
Canonical ✅ Pass
Two-stage Sankey on risk and Responder columns

All 4 output files present; lodes format correct; sankey_plot.pdf generated; no readability advisories for 2-stage run; Agent Response Contract output produced

Basic 39/40 | Specialized 59/60 | Total 98/100
A1: Output contains Agent Response Contract summary with all required fields
A2: set.seed(42) is called before analysis in main.R
A3: All 4 output files are produced (annotations, lodes, PDF, session_info)
A4: No readability advisory emitted for 2-stage run
A5: No credentials or sensitive data in any output file
Pass rate: 5 / 5
98
Variant A ✅ Pass
Three-stage Sankey with custom title

Three-stage plot generated correctly; title applied via nzchar() check; no readability advisories for 3 stages; all output files consistent

Basic 39/40 | Specialized 59/60 | Total 98/100
A1: Three x-levels in lodes table matching three stage columns
A2: Title applied via ggtitle() conditional on nzchar()
A3: selected_annotations.csv contains exactly the three specified columns
A4: No stage-count advisory for 3 stages
A5: Agent Response Contract summary includes readability warnings field showing none
Pass rate: 5 / 5
96
Edge ✅ Pass
All columns auto-included when --columns is omitted (7 columns)

All 7 columns used automatically; readability advisory correctly emitted for >5 stages; advisory surfaces in Agent Response Contract; plot generated

Basic 39/40 | Specialized 57/60 | Total 96/100
A1: parse_selected_columns returns all available columns when --columns omitted
A2: Readability advisory emitted for >5 stages (7 stages detected)
A3: SKILL_EMPTY_DATA would be emitted if fewer than 2 columns available
A4: Agent Response Contract includes readability warning in output summary
A5: Plot is produced (advisory is non-blocking)
Pass rate: 5 / 5
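The column-selection behavior checked in this case can be sketched as follows. This is a hedged Python illustration, not the skill's `parse_selected_columns` from the R scripts: the function name `select_columns`, the `SkillError` class, the comma-separated format of `--columns`, and the example column names are all assumptions. Only the fallback to all columns and the `SKILL_EMPTY_DATA` code for fewer than 2 columns come from the report.

```python
from typing import List, Optional


class SkillError(Exception):
    """Hypothetical error type carrying a SKILL_* code."""

    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code


def select_columns(available: List[str], requested: Optional[str]) -> List[str]:
    """Return the stage columns to plot.

    Assumes --columns is a comma-separated string; handling of unknown
    column names is deliberately left out of this sketch.
    """
    if requested is None or not requested.strip():
        # --columns omitted: fall back to every available column.
        selected = list(available)
    else:
        selected = [c.strip() for c in requested.split(",") if c.strip()]
    if len(selected) < 2:
        # A flow diagram needs at least two stages to connect.
        raise SkillError("SKILL_EMPTY_DATA", "at least 2 stage columns are required")
    return selected


all_cols = ["risk", "Responder", "subtype", "cohort"]  # made-up column names
auto = select_columns(all_cols, None)
picked = select_columns(all_cols, "risk, Responder")
```

Raising a coded error rather than silently plotting a single column matches the hard-stop behavior the report describes for failure modes.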
99
Variant B ✅ Pass
Custom output prefix, alpha, and label_size parameters

Custom prefix my_cohort produces my_cohort.pdf; alpha and label_size validated and applied; security validation confirmed; perfect basic score

Basic 40/40 | Specialized 59/60 | Total 99/100
A1: --output_prefix validated for safe characters (alphanumeric, dot, underscore, hyphen only)
A2: Custom prefix my_cohort produces my_cohort.pdf in plot/ directory
A3: --alpha 0.3 correctly accepted as valid numeric in [0,1]
A4: --label_size 5 correctly accepted as positive numeric
A5: SKILL_INVALID_PARAMETER emitted for output_prefix containing invalid special characters
Pass rate: 5 / 5
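The three parameter checks asserted above can be illustrated with a short Python sketch. It mirrors the stated rules (safe characters for the prefix, alpha in [0,1], positive label size) but is not the skill's R implementation. The use of `ValueError` is an assumption, and the report only confirms `SKILL_INVALID_PARAMETER` for the prefix check, so the messages for `alpha` and `label_size` carry no code here.

```python
import re

# Allowed characters per the report: alphanumeric, dot, underscore, hyphen.
SAFE_PREFIX = re.compile(r"[A-Za-z0-9._-]+")


def validate_output_prefix(prefix: str) -> str:
    """Reject prefixes containing path separators or other special characters."""
    if not SAFE_PREFIX.fullmatch(prefix):
        raise ValueError(f"SKILL_INVALID_PARAMETER: unsafe output prefix {prefix!r}")
    return prefix


def validate_alpha(alpha: float) -> float:
    """Accept a transparency value in the closed interval [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError(f"alpha must be in [0, 1], got {alpha}")
    return alpha


def validate_label_size(size: float) -> float:
    """Accept any strictly positive label size."""
    if size <= 0:
        raise ValueError(f"label_size must be positive, got {size}")
    return size


prefix = validate_output_prefix("my_cohort")
alpha = validate_alpha(0.3)
label_size = validate_label_size(5)
```

Because `/` and `\` are outside the allowed set, a prefix such as `../etc` fails the whole-string match, which is how a character whitelist blocks path separators.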
95
Stress ✅ Pass
Six-stage Sankey with 10+ unique values per stage

Both readability advisories triggered correctly (>5 stages, >8 unique values); plot generated; advisory text surfaces in Agent Response Contract

Basic 39/40 | Specialized 56/60 | Total 95/100
A1: Stage-count advisory emitted when >5 stages selected (6 stages)
A2: Unique-value advisory emitted for stage with 10 unique values
A3: Plot generated successfully despite readability boundary inputs (advisories non-blocking)
A4: Agent Response Contract readability warnings field populated with both advisory messages
A5: session_info.txt written correctly despite high-complexity input
Pass rate: 5 / 5
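The two non-blocking readability checks exercised here can be sketched in Python as below. The thresholds (more than 5 stages, more than 8 unique values in a stage) come from the report. The function name `readability_advisories`, the message wording, and the synthetic data are assumptions, and the skill itself is reported to emit these through `log_warn` in run_analysis.R rather than returning a list.

```python
from typing import Dict, List

MAX_STAGES = 5   # advisory when more stages than this are selected
MAX_UNIQUE = 8   # advisory when a stage has more unique values than this


def readability_advisories(rows: List[Dict[str, str]], stages: List[str]) -> List[str]:
    """Collect warnings without stopping the run (advisories are non-blocking)."""
    warnings = []
    if len(stages) > MAX_STAGES:
        warnings.append(
            f"{len(stages)} stages selected (more than {MAX_STAGES}); plot may be hard to read"
        )
    for stage in stages:
        n_unique = len({row[stage] for row in rows})
        if n_unique > MAX_UNIQUE:
            warnings.append(
                f"stage '{stage}' has {n_unique} unique values (more than {MAX_UNIQUE})"
            )
    return warnings


# Synthetic stress input: 6 stages, each with 10 distinct values.
stress_stages = [f"stage{i}" for i in range(6)]
stress_rows = [{s: f"{s}_value{j}" for s in stress_stages} for j in range(10)]
stress_warnings = readability_advisories(stress_rows, stress_stages)
```

Returning the messages instead of raising keeps plot generation going, so a caller can still surface them in its output summary.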
Medical Task Total 97.2 / 100

Key Strengths

  • Agent Response Contract added — callers now receive a structured summary with stage count, sample count, output paths, and readability warnings in a single parseable block
  • Runtime readability advisories now enforced in run_analysis.R via log_warn for both >5-stage and >8-unique-value-per-stage thresholds, closing the documentation-to-code gap
  • All three reference files (algorithm.md, troubleshooting.md, cli-guide.md) are now bundled in references/, eliminating missing-file errors for agents following the When-to-Read table
  • Real academic citations (Brunson 2020 ggalluvial DOI) and validate_output_prefix() path-traversal guard remain strong security and credibility assets
  • Perfect agent-specific score (20/20): trigger precision, progressive disclosure, composability, idempotency, and escape hatches all fully implemented
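
As an illustration of the "single parseable block" mentioned in the first strength, the sketch below assembles a structured summary from the four fields the report names: stage count, sample count, output paths, and readability warnings. The report does not show the actual Agent Response Contract format, so the JSON encoding, the key names, the function name `build_summary`, and the sample counts are all hypothetical.

```python
import json
from typing import List


def build_summary(n_stages: int, n_samples: int,
                  output_paths: List[str], warnings: List[str]) -> str:
    """Serialize a run summary as JSON (hypothetical format and key names)."""
    summary = {
        "stage_count": n_stages,
        "sample_count": n_samples,
        "output_paths": output_paths,
        # An empty list signals that no readability advisory was raised.
        "readability_warnings": warnings,
    }
    return json.dumps(summary, indent=2)


# Illustrative values only; the path follows the my_cohort example in the report.
block = build_summary(2, 120, ["plot/my_cohort.pdf"], [])
parsed = json.loads(block)
```

A caller can then branch on `readability_warnings` being empty instead of scraping log text.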