MedSkillAudit: How AIPOCH Evaluates Medical Research Agent Skills
Explore MedSkillAudit, a domain-specific evaluation framework for medical research agent skills.
What is MedSkillAudit?
MedSkillAudit is a layered, domain-specific evaluation framework developed by AIPOCH to assess whether medical research agent skills are ready for release before they are deployed.
How Does the Medical Skill Auditor Work?
Veto Gates
To enforce strict quality control, MedSkillAudit applies two layers of veto checks: the Skill Veto and the Research Veto. Failing any of these checks can lead to immediate rejection of the skill.
Skill Veto
Example: Skill Veto results for the agent skill “medical-research-literature-reader-pro” (figure not reproduced).

- Operational Stability
- Structural Consistency
- Result Determinism
- System Security
Research Veto
Example: Research Veto results for the agent skill “medical-research-literature-reader-pro” (figure not reproduced).

- Scientific Integrity
- Practice Boundaries
- Methodological Ground
- Code Usability
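The two veto layers behave as sequential pass/fail gates. The following is a minimal Python sketch of that gating logic, assuming each check has already been reduced to a pass/fail boolean and that the Skill Veto is applied before the Research Veto. The check names come from the lists above; `run_veto_gates` and `VetoResult` are hypothetical names, not part of any published MedSkillAudit API.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Check names per veto layer, as listed in the article. How each check is
# scored is not described, so callers supply pass/fail booleans.
SKILL_VETO = ("Operational Stability", "Structural Consistency",
              "Result Determinism", "System Security")
RESEARCH_VETO = ("Scientific Integrity", "Practice Boundaries",
                 "Methodological Ground", "Code Usability")


@dataclass
class VetoResult:
    passed: bool
    failed_gate: Optional[str] = None   # which layer rejected the skill
    failed_check: Optional[str] = None  # first failing check in that layer


def run_veto_gates(results: Dict[str, bool]) -> VetoResult:
    """Apply the Skill Veto, then the Research Veto; reject on the first failure.

    Assumptions (hypothetical): gate order is Skill Veto then Research Veto,
    and a check missing from `results` counts as a failure.
    """
    gates = (("Skill Veto", SKILL_VETO), ("Research Veto", RESEARCH_VETO))
    for gate_name, checks in gates:
        for check in checks:
            if not results.get(check, False):
                return VetoResult(passed=False, failed_gate=gate_name,
                                  failed_check=check)
    return VetoResult(passed=True)


if __name__ == "__main__":
    all_pass = {check: True for check in SKILL_VETO + RESEARCH_VETO}
    print(run_veto_gates(all_pass))
    # VetoResult(passed=True, failed_gate=None, failed_check=None)
    print(run_veto_gates({**all_pass, "Scientific Integrity": False}))
    # VetoResult(passed=False, failed_gate='Research Veto', failed_check='Scientific Integrity')
```

Stopping at the first failure mirrors the "immediate rejection" described above: once any check fails, later checks cannot rescue the skill.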
Core Capability
Example: Core Capability evaluation of the agent skill “medical-research-literature-reader-pro” (figure not reproduced).

This layer evaluates a skill’s design and contract against key dimensions such as Functional Suitability, Reliability, Performance & Context, Agent Usability, Human Usability, Security, Agent-Specific, and Maintainability.
Medical Task
Example: Medical Task evaluation of the agent skill “medical-research-literature-reader-pro” (figure not reproduced).

This layer assesses the actual outputs a skill produces, using layered criteria.
For testing, the inputs are generated automatically by AI. The number of inputs in each category scales up or down with the skill’s complexity; the seven inputs below represent the most comprehensive set.
- Canonical
- Variant A
- Edge
- Variant B
- Stress
- Scope Boundary
- Adversarial
Skill Complexity Classification
| Label | Code | Definition | Test inputs |
|---|---|---|---|
| Simple | S | Narrow task scope | 3 |
| Moderate | M | Moderate branching or multiple task types | 5 |
| Complex | C | Broad or multi-step specialized skill | 7 |
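The complexity classes map directly to the number of generated test inputs. The sketch below encodes that mapping in Python; `Complexity`, `INPUT_COUNTS`, and `input_count_for` are hypothetical names. The article does not say which categories are dropped for Simple or Moderate skills, so only the counts are modeled, and the full seven-category list corresponds to the Complex case.

```python
from enum import Enum


class Complexity(Enum):
    SIMPLE = "S"    # narrow task scope
    MODERATE = "M"  # moderate branching or multiple task types
    COMPLEX = "C"   # broad or multi-step specialized skill


# Number of generated test inputs per complexity class, per the article.
INPUT_COUNTS = {
    Complexity.SIMPLE: 3,
    Complexity.MODERATE: 5,
    Complexity.COMPLEX: 7,
}

# The most comprehensive input set. Which categories are dropped for
# Simple and Moderate skills is not specified, so no subset is modeled.
ALL_CATEGORIES = ("Canonical", "Variant A", "Edge", "Variant B",
                  "Stress", "Scope Boundary", "Adversarial")


def input_count_for(code: str) -> int:
    """Return the test-input count for a complexity code ("S", "M", or "C").

    Raises ValueError for an unknown code (Enum lookup by value).
    """
    return INPUT_COUNTS[Complexity(code)]


# Sanity check: a Complex skill uses the full category set.
assert input_count_for("C") == len(ALL_CATEGORIES)
```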
Final Score
Example: final score for the agent skill “medical-research-literature-reader-pro” (figure not reproduced).

Skills that pass both veto gates receive a final quality score. MedSkillAudit uses a two-stage scoring system: static evaluation (design quality, weighted at 40%) and dynamic evaluation (runtime performance, weighted at 60%). The final overall score is the weighted sum of the two.
- Static (40%)
- Dynamic (60%)
Final Score = Static Score × 40% + Dynamic Score × 60%
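As a worked example, a skill with a static score of 80 and a dynamic score of 90 would receive 80 × 40% + 90 × 60% = 32 + 54 = 86. The Python sketch below implements the formula, assuming both scores share one scale (for example 0 to 100); `final_score` and its `passed_vetoes` flag are hypothetical names used only for illustration.

```python
from typing import Optional

# Weights in percent, as stated in the article.
STATIC_WEIGHT = 40   # static evaluation: design quality
DYNAMIC_WEIGHT = 60  # dynamic evaluation: runtime performance


def final_score(static_score: float, dynamic_score: float,
                passed_vetoes: bool = True) -> Optional[float]:
    """Combine static and dynamic scores as Static x 40% + Dynamic x 60%.

    Returns None when the skill failed a veto gate, since only skills that
    pass both gates receive a final score. Scores are assumed to share one
    scale; the weighting itself does not depend on which scale is used.
    """
    if not passed_vetoes:
        return None
    # Integer percent weights keep the arithmetic exact for integer scores.
    return (STATIC_WEIGHT * static_score + DYNAMIC_WEIGHT * dynamic_score) / 100


print(final_score(80, 90))  # 86.0
```

With these weights, a skill with a perfect static score but a dynamic score of 0 would finish at 40 on a 0 to 100 scale, so runtime behavior dominates the result.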
You can view evaluation results for selected AIPOCH skills here.
Explore more AIPOCH Agent Skills
You can explore a growing collection of Medical Research Agent Skills on
If you find it useful, consider giving it a ⭐ to support the project!