knn-imputation
Filters genes with high missingness (>=50%) from bulk expression matrices and imputes the remaining missing values with group-aware KNN via DMwR2. The donor pool is restricted by a single annotation column, so donors come only from the same stratum; strata with 10 or fewer samples fall back to row-wise mean/median filling.
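The gene-level filter described above can be sketched in a few lines. This is a minimal Python illustration, not the skill's R code. It assumes a genes-by-samples matrix held as nested lists, with `None` marking a missing value in place of R's `NA`, and it applies the cutoff across all samples. `filter_high_missingness` is a hypothetical name.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical layout: rows are genes, columns are samples, None marks a missing value.
Matrix = List[List[Optional[float]]]


def filter_high_missingness(expr: Matrix, gene_ids: Sequence[str],
                            threshold: float = 0.5) -> Tuple[Matrix, List[str]]:
    """Drop genes whose fraction of missing values across all samples is >= threshold."""
    kept_rows: Matrix = []
    kept_ids: List[str] = []
    for gene_id, row in zip(gene_ids, expr):
        missing_frac = sum(v is None for v in row) / len(row)
        if missing_frac < threshold:  # genes at or above the cutoff are removed
            kept_rows.append(list(row))
            kept_ids.append(gene_id)
    return kept_rows, kept_ids


expr = [
    [1.0, None, 3.0, 4.0],   # 25% missing: kept
    [None, None, 5.0, 6.0],  # 50% missing: removed (the cutoff is inclusive)
    [7.0, 8.0, 9.0, 10.0],   # complete: kept
]
filtered, kept = filter_high_missingness(expr, ["g1", "g2", "g3"])
print(kept)  # ['g1', 'g3']
```

The cutoff is inclusive, so a gene missing in exactly half of the samples is dropped. This matches the ">=50%" wording.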
Veto Gates (required pass for any deployment consideration)
| Dimension | Result | Detail |
|---|---|---|
| Scientific Integrity | PASS | No values are fabricated beyond the standard DMwR2 KNN computation; all outputs are derived from observed data within the defined strata. |
| Practice Boundaries | PASS | Draws no clinical or diagnostic conclusions; the tool is a preprocessing utility for bulk expression matrices. |
| Methodological Ground | PASS | Group-stratified KNN imputation with a 50% missingness filter is a valid, established preprocessing approach; the row-wise mean/median fallback for small strata is documented and methodologically correct. |
| Code Usability | PASS | main.R is syntactically valid; the DMwR2 dependency check runs before analysis; a timeout mechanism is present; there are no infinite loops; on.exit cleanup is handled correctly. |
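The Code Usability row credits main.R with a timeout and `on.exit` cleanup. A short Python sketch can illustrate that pattern, but it is an analogy, not the R implementation. It swaps in a cooperative deadline that is checked only between pipeline steps. An interrupt-based limit, such as one built on R's `setTimeLimit()`, could stop work partway through a step; this sketch cannot. It reuses the `SKILL_TIMEOUT` code and the "timeout of 0 disables the limit" convention from the report, and it uses `try`/`finally` where R would use `on.exit()`. `SkillError`, `Deadline`, and `run_with_cleanup` are hypothetical names.

```python
import time
from typing import Callable, Iterable


class SkillError(RuntimeError):
    """Error carrying a SKILL_* code, loosely modelled on the report's error table."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


class Deadline:
    """Cooperative time limit; a timeout of 0 disables it."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.start = time.monotonic()

    def check(self) -> None:
        elapsed = time.monotonic() - self.start
        if self.timeout_s > 0 and elapsed > self.timeout_s:
            raise SkillError("SKILL_TIMEOUT", f"exceeded {self.timeout_s} s")


def run_with_cleanup(steps: Iterable[Callable[[], None]], timeout_s: float,
                     cleanup: Callable[[], None]) -> None:
    """Run steps in order, checking the deadline before each one."""
    deadline = Deadline(timeout_s)
    try:
        for step in steps:
            deadline.check()
            step()
    finally:
        cleanup()  # runs on success, on error, and on timeout (cf. R's on.exit)
```

Because the deadline is checked only between steps, the sketch stays portable and easy to test. The trade-off is that it cannot stop a single long-running step.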
Core Capability: 98 / 100 across 8 categories
Medical Task Execution: average 92.7 / 100; 33/33 assertions passed
- The 50% missingness filter is applied, group-stratified KNN runs for strata with 11 or more samples, and imputed_expression_matrix.csv and session_info.txt are produced.
- A stratum with 10 or fewer samples triggers a row-wise mean fill, as documented; KNN is not attempted for small strata.
- A gene with >=50% missingness within a stratum is skipped and its values remain NA in that stratum; this documented behavior is correct for data integrity.
- SKILL_OUTPUT_EXISTS is raised when output files already exist and --overwrite is not provided; this is the correct protective behavior.
- set_timeout_limit() is active: SKILL_TIMEOUT is raised if the limit is exceeded, timeout=0 disables the limit, and the run exits cleanly, with partial cleanup if needed.
- The input-validation guard fires for multi-column stratification, which is explicitly excluded in both SKILL.md and the skill description.
- The input-validation guard fires for single-cell data, which is explicitly excluded in the When to Use section.
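The stratum logic exercised by the first three assertions can be sketched in plain Python. This is an illustration under stated assumptions, not the DMwR2 algorithm:

- Samples are treated as the cases to impute, so donors are other samples from the same stratum.
- Distance is the root-mean-square difference over genes observed in both samples.
- A missing value is filled with the unweighted mean of up to `k` nearest donors that observe that gene.

DMwR2's own `knnImputation()` differs in details; for example, it scales the data and by default weights donors by distance. The 11-sample KNN threshold and the >=50% within-stratum skip come from the report. The following are assumptions:

- the skip also applies to the small-stratum fallback;
- the fallback statistic is computed within the stratum;
- the default value of `k`.

`stratified_impute` is a hypothetical name.

```python
import math
import statistics
from typing import Dict, List, Optional, Sequence

# Hypothetical layout: rows are genes, columns are samples, None marks a missing value.
Matrix = List[List[Optional[float]]]


def _distance(expr: Matrix, a: int, b: int) -> float:
    """Root-mean-square difference between samples a and b over co-observed genes."""
    diffs = [row[a] - row[b] for row in expr
             if row[a] is not None and row[b] is not None]
    if not diffs:
        return math.inf  # no shared genes: sample b cannot act as a donor
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def stratified_impute(expr: Matrix, groups: Sequence[str], k: int = 10,
                      min_knn_size: int = 11, gene_skip: float = 0.5,
                      fallback: str = "mean") -> Matrix:
    """Impute within each stratum defined by one annotation value per sample."""
    out = [list(row) for row in expr]  # leave the caller's matrix untouched
    strata: Dict[str, List[int]] = {}
    for col, label in enumerate(groups):
        strata.setdefault(label, []).append(col)

    for cols in strata.values():
        for g, row in enumerate(expr):
            values = [row[c] for c in cols]
            n_missing = sum(v is None for v in values)
            # Nothing to fill, or too sparse within this stratum: values stay missing.
            if n_missing == 0 or n_missing / len(cols) >= gene_skip:
                continue
            observed = [v for v in values if v is not None]
            if len(cols) < min_knn_size:
                # Small stratum (10 or fewer samples): per-gene mean or median, no KNN.
                agg = statistics.mean if fallback == "mean" else statistics.median
                fill = agg(observed)
                for c in cols:
                    if row[c] is None:
                        out[g][c] = fill
                continue
            # Large stratum: donors are restricted to samples in the same stratum.
            for c in cols:
                if row[c] is not None:
                    continue
                dist = {d: _distance(expr, c, d) for d in cols
                        if d != c and row[d] is not None}
                donors = sorted((d for d in dist if math.isfinite(dist[d])),
                                key=dist.__getitem__)[:k]
                if donors:
                    out[g][c] = statistics.mean(row[d] for d in donors)
    return out
```

Distances are computed on the original matrix rather than on partially imputed values. As a result, the order in which strata or genes are processed does not change the result.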
Key Strengths
- The SKILL_OUTPUT_EXISTS error code protects against accidental file overwrites; because --overwrite must be passed explicitly, the design is production-safe by default
- Nine structured SKILL_* error codes and the most comprehensive error table of all five audited skills
- A prominent warning in Prerequisites that DMwR2 is not on CRAN, together with the exact GitHub install command, prevents the most common deployment failure
- The stratum-level missingness skip (genes with >=50% missingness within a stratum remain NA) is methodologically correct and explicitly documented in both the Workflow and Methods sections
- The clean two-file output (imputed matrix plus session_info.txt) makes the tool easy to compose with downstream analysis pipelines
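The overwrite protection behind SKILL_OUTPUT_EXISTS comes down to a pre-flight check before anything is written. A minimal Python sketch follows. It is not the skill's R code, and `SkillError` and `check_outputs` are hypothetical names. A caller would pass the two documented outputs, imputed_expression_matrix.csv and session_info.txt, joined to whatever output directory the skill uses, along with the value of the --overwrite flag.

```python
import os
from typing import List, Sequence


class SkillError(RuntimeError):
    """Error carrying a SKILL_* code."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


def check_outputs(paths: Sequence[str], overwrite: bool) -> List[str]:
    """Refuse to clobber existing outputs unless overwrite is set.

    Returns the existing paths that will be replaced (empty if none).
    """
    existing = [p for p in paths if os.path.exists(p)]
    if existing and not overwrite:
        raise SkillError("SKILL_OUTPUT_EXISTS",
                         "refusing to overwrite " + ", ".join(existing)
                         + "; pass --overwrite to replace them")
    return existing
```

If the check runs before any computation, a refused run fails fast and leaves earlier results untouched.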