Data Analysis

sample-group-sankey-plot

Generates Sankey or alluvial plots from sample annotation tables where rows are samples and selected columns are categorical stages such as risk group, response status, subtype, or cohort labels. Exports annotations, lodes-format table, plot PDF, and session metadata.
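To make the "lodes-format table" concrete, here is a minimal Python sketch of the wide-to-long reshaping that an alluvial plot needs. It is an illustration only: the skill itself is written in R, the helper name to_lodes and the sample data are hypothetical, and the column names alluvium, x, and stratum are assumed to follow ggalluvial's defaults rather than taken from the skill's actual output.

```python
from collections import Counter


def to_lodes(rows, stage_columns):
    """Reshape wide sample annotations into long (lodes-style) records.

    Each sample contributes one record per stage column. The field names
    "alluvium", "x", and "stratum" are an assumption borrowed from
    ggalluvial's defaults; the skill's real table may be laid out differently.
    """
    lodes = []
    for sample_index, row in enumerate(rows, start=1):
        for stage in stage_columns:
            lodes.append(
                {"alluvium": sample_index, "x": stage, "stratum": row[stage]}
            )
    return lodes


# Hypothetical annotation table: one dict per sample (row).
samples = [
    {"sample": "S1", "risk": "high", "Responder": "yes"},
    {"sample": "S2", "risk": "high", "Responder": "no"},
    {"sample": "S3", "risk": "low", "Responder": "yes"},
]

lodes = to_lodes(samples, ["risk", "Responder"])

# Flow widths in a two-stage plot correspond to counts of category pairs.
flow_counts = Counter((row["risk"], row["Responder"]) for row in samples)
```

Three samples and two stages yield six lode records, and each flow between a risk category and a Responder category is as wide as the number of samples sharing that pair.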

96 / 100 Total Score

Core Capability

95 / 100

Functional Suitability

11 / 12

Reliability

12 / 12

Performance & Context

7 / 8

Agent Usability

15 / 16

Human Usability

8 / 8

Security

11 / 12

Maintainability

11 / 12

Agent-Specific

20 / 20

Medical Task

25 / 25 Passed

98 Two-stage Sankey on risk and Responder columns

5/5

98 Three-stage Sankey with custom title

5/5

96 All columns auto-included when --columns is omitted (7 columns)

5/5

99 Custom output prefix, alpha, and label_size parameters

5/5

95 Six-stage Sankey with 10+ unique values per stage

5/5

Veto Gates: Required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed

Gate	Result	Detail
Operational Stability	PASS	System remains stable across varied inputs and edge cases
Structural Consistency	PASS	Output structure conforms to expected skill contract format
Result Determinism	PASS	Equivalent inputs produce semantically equivalent outputs
System Security	PASS	No prompt injection, data leakage, or unsafe tool use detected

Research Veto: ✅ PASS (Applicable)

Dimension	Result	Detail
Scientific Integrity	PASS	No fabricated values; skill only visualizes existing categorical annotations; no statistical inference or data invention performed across all 5 inputs
Practice Boundaries	PASS	Skill produces visualization outputs only; no diagnostic claims or medical conclusions; no disclaimer required for a visualization tool
Methodological Ground	PASS	ggalluvial alluvial transformation is the correct approach for categorical flow visualization; no methodological fallacies detected; real citations (Brunson 2020) included in SKILL.md; readability advisories emitted for over-parameterized inputs
Code Usability	PASS	scripts/main.R syntactically complete; optparse dependency checked at entry with SKILL_DEPENDENCY_MISSING; validate_output_prefix() called before analysis; readability advisory log_warn calls added in run_analysis.R; no infinite loops

Core Capability: 95 / 100 (8 Categories)

Functional Suitability

All use cases covered including Agent Response Contract and readability advisories; minor deduction for session_info overhead in a simple visualization tool

11 / 12

92%

Reliability

SKILL_* error codes for all failure modes; Scene Override applied: hard stops at failure are correct defensive design for scientific computing; fully structured error interface

12 / 12

100%

Performance & Context

Reference files now bundled in references/; SKILL.md at 256 lines within range; minor: cli-guide.md rarely needed but loaded in When-to-Read table

7 / 8

88%

Agent Usability

Agent Response Contract added defining exact output format; readability advisories documented in workflow; all step patterns consistent; minor gap: no advisory about NA count during normalization

15 / 16

94%

Human Usability

Natural trigger language with NOT-for exclusions; real DOI citations; Scene Override applied: strict input validation correct for scientific computing; error messages clearly identify issues

8 / 8

100%

Security

No credentials; validate_output_prefix() prevents path traversal; no eval/exec; minor: input file path written to session_info.txt (local path disclosure only, acceptable)

11 / 12

92%

Maintainability

Five scripts with clean separation; three reference files now bundled; clear extension points; minor: test scripts referenced but not in visible bundle

11 / 12

92%

Agent-Specific

Trigger precisely targets categorical annotation visualization; progressive disclosure complete with bundled references; Agent Response Contract defines composability interface; set.seed wired for idempotency; escape hatches with refusal template and SKILL_* codes

20 / 20

100%

Core Capability Total: 95 / 100
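The category scores above can be cross-checked with a few lines of arithmetic. The sketch below simply re-adds the eight reported scores and recomputes each percentage; rounding half up is an assumption about how the report derived its percentages.

```python
# (score, maximum) per category, copied from the report above.
categories = {
    "Functional Suitability": (11, 12),
    "Reliability": (12, 12),
    "Performance & Context": (7, 8),
    "Agent Usability": (15, 16),
    "Human Usability": (8, 8),
    "Security": (11, 12),
    "Maintainability": (11, 12),
    "Agent-Specific": (20, 20),
}

total = sum(score for score, _ in categories.values())
maximum = sum(limit for _, limit in categories.values())

# Round half up (assumed), so 87.5 becomes 88 as shown in the report.
percentages = {
    name: int(100 * score / limit + 0.5)
    for name, (score, limit) in categories.items()
}
```

Under that rounding assumption the recomputed values agree with the report: the scores sum to 95 out of 100, and the per-category percentages are 92, 100, 88, 94, 100, 92, 92, and 100.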

Medical Task: Execution Average 97.2 / 100; Assertions 25/25 Passed


Canonical: ✅ Pass

Two-stage Sankey on risk and Responder columns

All 4 output files present; lodes format correct; sankey_plot.pdf generated; no readability advisories for 2-stage run; Agent Response Contract output produced

Basic 39/40 | Specialized 59/60 | Total 98/100

✅ A1: Output contains Agent Response Contract summary with all required fields

✅ A2: set.seed(42) is called before analysis in main.R

✅ A3: All 4 output files are produced (annotations, lodes, PDF, session_info)

✅ A4: No readability advisory emitted for 2-stage run

✅ A5: No credentials or sensitive data in any output file

Pass rate: 5 / 5

Variant A: ✅ Pass

Three-stage Sankey with custom title

Three-stage plot generated correctly; title applied via nzchar() check; no readability advisories for 3 stages; all output files consistent

Basic 39/40 | Specialized 59/60 | Total 98/100

✅ A1: Three x-levels in lodes table matching three stage columns

✅ A2: Title applied via ggtitle() conditional on nzchar()

✅ A3: selected_annotations.csv contains exactly the three specified columns

✅ A4: No stage-count advisory for 3 stages

✅ A5: Agent Response Contract summary includes readability warnings field showing none

Pass rate: 5 / 5

Edge: ✅ Pass

All columns auto-included when --columns is omitted (7 columns)

All 7 columns used automatically; readability advisory correctly emitted for >5 stages; advisory surfaces in Agent Response Contract; plot generated

Basic 39/40 | Specialized 57/60 | Total 96/100

✅ A1: parse_selected_columns returns all available columns when --columns omitted

✅ A2: Readability advisory emitted for >5 stages (7 stages detected)

✅ A3: SKILL_EMPTY_DATA would be emitted if fewer than 2 columns available

✅ A4: Agent Response Contract includes readability warning in output summary

✅ A5: Plot is produced (advisory is non-blocking)

Pass rate: 5 / 5
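The column-selection behavior exercised by this edge case can be sketched as follows. This is a Python illustration of the described logic, not the skill's R function: the name select_stage_columns is hypothetical, and the comma-separated form of the --columns value is an assumption, since the report does not state its syntax.

```python
def select_stage_columns(available, requested=None):
    """Pick the stage columns for the plot.

    available: column names present in the annotation table.
    requested: raw --columns value (assumed comma-separated), or None
               when the option is omitted.
    """
    if requested is None or not requested.strip():
        # Option omitted: fall back to every available column.
        selected = list(available)
    else:
        selected = [name.strip() for name in requested.split(",") if name.strip()]

    # A flow needs at least two stages; the report names SKILL_EMPTY_DATA
    # as the error code for this condition.
    if len(selected) < 2:
        raise ValueError("SKILL_EMPTY_DATA: at least two stage columns are required")
    return selected


# Hypothetical table with seven annotation columns and no --columns value.
all_columns = ["c1", "c2", "c3", "c4", "c5", "c6", "c7"]
auto_selected = select_stage_columns(all_columns)
```

With the option omitted, all seven columns are selected, which is the situation that then triggers the stage-count readability advisory.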

Variant B: ✅ Pass

Custom output prefix, alpha, and label_size parameters

Custom prefix my_cohort produces my_cohort.pdf; alpha and label_size validated and applied; security validation confirmed; perfect basic score

Basic 40/40 | Specialized 59/60 | Total 99/100

✅ A1: --output_prefix validated for safe characters (alphanumeric, dot, underscore, hyphen only)

✅ A2: Custom prefix my_cohort produces my_cohort.pdf in plot/ directory

✅ A3: --alpha 0.3 correctly accepted as valid numeric in [0,1]

✅ A4: --label_size 5 correctly accepted as positive numeric

✅ A5: SKILL_INVALID_PARAMETER emitted for output_prefix containing invalid special characters

Pass rate: 5 / 5
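The parameter checks asserted above can be illustrated with a short Python sketch. It is not the skill's validate_output_prefix() (which is R and not shown in this report); the class SkillError and the three check_* helpers are hypothetical names, and only the rules themselves (allowed characters, alpha in [0,1], positive label size, the SKILL_INVALID_PARAMETER code) come from the assertions.

```python
import re

# Allowed prefix characters per the report: alphanumeric, dot, underscore, hyphen.
SAFE_PREFIX = re.compile(r"[A-Za-z0-9._-]+")


class SkillError(ValueError):
    """Hypothetical error type carrying a SKILL_* code."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def check_output_prefix(prefix):
    # Rejecting "/" and "\" means the prefix cannot name another directory.
    if not SAFE_PREFIX.fullmatch(prefix):
        raise SkillError("SKILL_INVALID_PARAMETER", f"unsafe output prefix: {prefix!r}")
    return prefix


def check_alpha(alpha):
    if not 0 <= alpha <= 1:
        raise SkillError("SKILL_INVALID_PARAMETER", "alpha must lie in [0, 1]")
    return alpha


def check_label_size(size):
    if not size > 0:
        raise SkillError("SKILL_INVALID_PARAMETER", "label_size must be positive")
    return size


# Values taken from the Variant B run described above.
pdf_name = check_output_prefix("my_cohort") + ".pdf"
alpha = check_alpha(0.3)
label_size = check_label_size(5)
```

Matching the whole string against an allow-list, rather than searching for known-bad substrings, is the usual way to keep a user-supplied file prefix from escaping the output directory.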

Stress: ✅ Pass

Six-stage Sankey with 10+ unique values per stage

Both readability advisories triggered correctly (>5 stages, >8 unique values); plot generated; advisory text surfaces in Agent Response Contract

Basic 39/40 | Specialized 56/60 | Total 95/100

✅ A1: Stage-count advisory emitted when >5 stages selected (6 stages)

✅ A2: Unique-value advisory emitted for stage with 10 unique values

✅ A3: Plot generated successfully despite readability boundary inputs (advisories non-blocking)

✅ A4: Agent Response Contract readability warnings field populated with both advisory messages

✅ A5: session_info.txt written correctly despite high-complexity input

Pass rate: 5 / 5
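The two readability thresholds tested here (more than 5 stages, more than 8 unique values in a stage) can be sketched as a non-blocking check. This is a Python illustration under stated assumptions: the function name readability_advisories, the message wording, and the synthetic data are hypothetical, and the skill's R code reportedly emits these through log_warn instead of returning a list.

```python
MAX_STAGES = 5          # advisory when more stages than this are selected
MAX_UNIQUE_VALUES = 8   # advisory when a stage has more categories than this


def readability_advisories(rows, stage_columns):
    """Return advisory messages; never raises, so plotting can continue."""
    advisories = []
    if len(stage_columns) > MAX_STAGES:
        advisories.append(
            f"{len(stage_columns)} stages selected (more than {MAX_STAGES}); "
            "the plot may be hard to read"
        )
    for stage in stage_columns:
        n_unique = len({row[stage] for row in rows})
        if n_unique > MAX_UNIQUE_VALUES:
            advisories.append(
                f"stage '{stage}' has {n_unique} unique values "
                f"(more than {MAX_UNIQUE_VALUES})"
            )
    return advisories


# Synthetic input: six stages, ten samples, three categories per stage...
stages = [f"stage{i}" for i in range(1, 7)]
rows = [{stage: f"group{j % 3}" for stage in stages} for j in range(10)]
# ...except stage1, which gets ten distinct values.
for j, row in enumerate(rows):
    row["stage1"] = f"value{j}"

warnings = readability_advisories(rows, stages)
```

Returning messages instead of raising keeps the advisories non-blocking, which matches the assertion that the plot is still produced for inputs at the readability boundary.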

Medical Task Total: 97.2 / 100
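The 97.2 figure is consistent with a plain mean of the five per-input totals, each being the sum of its Basic and Specialized scores. The sketch below re-derives it; an unweighted mean is an assumption, since the report does not state how the average is computed.

```python
from statistics import mean

# (Basic out of 40, Specialized out of 60) per test input, from the report.
scores = {
    "Canonical": (39, 59),
    "Variant A": (39, 59),
    "Edge": (39, 57),
    "Variant B": (40, 59),
    "Stress": (39, 56),
}

totals = {name: basic + specialized for name, (basic, specialized) in scores.items()}

# Unweighted mean across the five inputs (assumed), rounded to one decimal.
execution_average = round(mean(list(totals.values())), 1)
```

The totals come out as 98, 98, 96, 99, and 95, and their mean is 97.2, matching the reported execution average.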

Key Strengths

Agent Response Contract added — callers now receive a structured summary with stage count, sample count, output paths, and readability warnings in a single parseable block
Runtime readability advisories now enforced in run_analysis.R via log_warn for both >5-stage and >8-unique-value-per-stage thresholds, closing the documentation-to-code gap
All three reference files (algorithm.md, troubleshooting.md, cli-guide.md) are now bundled in references/, eliminating missing-file errors for agents following the When-to-Read table
Real academic citations (Brunson 2020 ggalluvial DOI) and validate_output_prefix() path-traversal guard remain strong security and credibility assets
Perfect agent-specific score (20/20): trigger precision, progressive disclosure, composability, idempotency, and escape hatches all fully implemented