model-calibration-curve
Use when assessing how well a survival model's predicted probabilities agree with observed outcomes by fitting a Cox model and generating bootstrap calibration curves at one or more prediction horizons from a clinical CSV file. NOT for: nomogram construction, univariate Cox screening, ROC analysis, or decision-curve analysis.
Veto GatesRequired pass for any deployment consideration
| Dimension | Result | Detail |
|---|---|---|
| Scientific Integrity | PASS | No fabricated calibration statistics, C-index values, or survival data; all values computed from actual data via rms::calibrate() with bootstrap resampling. |
| Practice Boundaries | PASS | Explicitly not for clinical diagnosis; scoped to model validation only; named alternative skills provided for nomogram, ROC, and DCA tasks. |
| Methodological Ground | PASS | rms::calibrate() with bootstrap resampling is the standard approach for Cox model calibration assessment; bias-corrected calibration statistics are methodologically correct. |
| Code Usability | PASS | All 4 R modules syntactically valid; withCallingHandlers/tryCatch pattern correct; bootstrap_log_error fallback for pre-optparse error handling is a good defensive pattern. |
Core Capability96 / 100 — 8 Categories
Medical TaskExecution Average: 94.4 / 100 — Assertions: 18/18 Passed
Full pipeline: validate CSV -> complete-case filter -> Cox fit -> rms::calibrate() x3 horizons -> save .qs + .xlsx + PDF + session_info.txt.
Custom bootstrap count and horizon years accepted; 3 calibration curves generated with colors from default 5-color palette.
SKILL_INVALID_PARAMETER raised: requires >= 30 complete samples and >= 10 events. No partial model saved.
Custom plot dimensions, colors, and title applied; analysis logic unchanged by plot customization parameters.
8 predictors accepted; 5000 bootstrap reps applied; timeout enforced at 180s. All outputs generated within time limit.
Key Strengths
- rms::calibrate() with bootstrap resampling is the methodologically correct and standard approach for Cox model calibration; the implementation is faithful to the statistical method.
- Two-worksheet Excel output (Time_Point_Stats + Model_Summary) provides comprehensive model assessment in a single, well-organized file.
- NOT-for list with named alternative skills (nomogram-construction, roc-diagnostic-performance, decision-curve-analysis) is exemplary escape hatch design that helps users navigate the broader skill collection.
- bootstrap_log_error fallback for pre-optparse error handling is a good defensive pattern that prevents silent failures when the package check fires before argument parsing.