arXiv preflight: How Can AI Agents Check a Manuscript Before arXiv Submission?
arXiv now bans authors for unreviewed LLM output. The arXiv Preflight agent skill from AIPOCH helps researchers scan for LLM artifacts, hallucinated references, and placeholder data before submission — and outputs a graded fix list. Free on GitHub.

Introduction
On May 14, 2026, Thomas G. Dietterich — chair of arXiv's computer science section — posted a thread on X (@tdietterich) that spread quickly among researchers: every listed author bears full responsibility for all paper contents, regardless of how those contents were generated. Submissions containing clear evidence of unreviewed LLM output now carry a one-year ban from the platform; after that, any new submission must first clear peer review before normal posting resumes.
This is not a ban on AI tools — it is a named penalty attached to a responsibility gap that already existed. For researchers who use LLMs at any stage of writing, it creates a concrete preflight obligation before every arXiv submission: confirm that no artifacts survived into the final draft, that every citation resolves to a real paper, and that AI-use disclosures are present where required.
The arXiv Preflight agent skill from AIPOCH is designed to assist with exactly this workflow step — providing a structured, reproducible check that can help surface submission risks before a manuscript reaches the upload queue. The skill is available on GitHub and the AIPOCH skill page.
What This Agent Skill Does
The arXiv Preflight skill helps researchers run a structured submission-readiness check on a manuscript before arXiv upload. The goal is not to judge whether a paper was AI-written from style. The goal is to find concrete, locatable, reviewable evidence of unchecked LLM output, hallucinated references, placeholder data, and arXiv-policy risks — and to hand the author a fix list.
The skill is not a peer reviewer, a writing style evaluator, or an automated acceptance predictor. Its scope is specifically limited to:
- Detecting hard-coded AI artifact patterns in manuscript text
- Verifying citation metadata against external reference databases (Crossref, arXiv, OpenAlex, Semantic Scholar)
- Extracting numbers, figures, tables, and citation keys for consistency review
- Checking arXiv policy compliance signals (author listings, disclosure language)
- Generating a structured Markdown preflight report with graded risk findings
Every finding in the output is tied to a file location, a line number, and a verbatim excerpt. The skill's design principle is that a finding without a location is not a finding — it is a suggestion, and the two carry different weights in a preflight context.
Outputs are structured at three decision levels: PASS, PASS_WITH_FIXES, and HOLD. Any BLOCKER-level finding results in a HOLD recommendation; the report then hands the author a fix list with specific locations.
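The decision rule can be pictured with a minimal sketch. This is not the skill's code: the `Finding` class, the `Risk` enum, and the `decide` function are hypothetical names, and the rule that any non-BLOCKER finding yields PASS_WITH_FIXES is an assumption. The article only states that a BLOCKER results in HOLD.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    BLOCKER = "BLOCKER"
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"


@dataclass(frozen=True)
class Finding:
    """One located finding: a risk grade plus where it was seen."""
    risk: Risk
    file: str
    line: int
    excerpt: str


def decide(findings: list[Finding]) -> str:
    """Map a list of findings to an overall workflow signal."""
    if any(f.risk is Risk.BLOCKER for f in findings):
        return "HOLD"  # stated in the article: any BLOCKER means HOLD
    if findings:
        return "PASS_WITH_FIXES"  # assumption: any remaining finding needs a fix
    return "PASS"
```

Keeping the location fields mandatory on `Finding` mirrors the design principle above: a finding cannot be constructed without a file, a line, and an excerpt.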
👉 AIPOCH Skill Page — arxiv-preflight
Workflow Demo
The video below shows the full preflight workflow running inside OpenClaw — from manuscript upload to final report output.
Workflow Execution Example
Step 1 — Input
The researcher uploads main.pdf directly to OpenClaw with a plain-language prompt: "Check this LaTeX manuscript before I submit to arXiv. I think I left some assistant chatter in there." The skill accepts LaTeX project directories, standalone PDFs, or .bib files. LaTeX source is preferred — as noted in the skill's documentation on GitHub — because PDF text extraction loses document structure and produces noisier reference checks. When only a PDF is available, any lookups that fail due to extraction quality are flagged as INCOMPLETE rather than silently passed.

OpenClaw receives the manuscript and confirms the arxiv-preflight skill is loaded. Extraction runs immediately: 4 sections, 1 citation, 1,092 characters scanned.
Step 2 — AI Workflow Execution
Once input is received, the skill works through four areas. It first extracts and merges the full manuscript text, resolving \input/\include chains and flagging anything it could not reach. It then scans the extracted text against a pattern library for LLM meta-comments, prompt residue, placeholder content, and AI-listed-as-author entries; any match is treated as a BLOCKER by default. In parallel with that scan, it looks up every citation in external databases (Crossref, arXiv, OpenAlex, Semantic Scholar) and grades mismatches by severity: a full title/author/year mismatch is a BLOCKER; a single-field disagreement is MEDIUM. If network access fails, the reference section is marked INCOMPLETE rather than silently passed.
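To make the artifact scan concrete, here is a minimal sketch of a line-by-line regex pass. The three rules below are hypothetical examples written for this illustration; they are not taken from the skill's actual pattern library, and the dictionary layout of each finding is likewise an assumption.

```python
import re

# Hypothetical rules for illustration only; the skill's real library is not reproduced here.
RULES = {
    "llm_meta_comment": re.compile(
        r"as an ai language model|certainly[!,] here (?:is|are)", re.IGNORECASE
    ),
    "placeholder_percent": re.compile(r"\bXX+(?:\.X+)?\s*\\?%"),
    "todo_marker": re.compile(r"\b(?:TODO|TBD)\b"),
}


def scan(text: str, filename: str) -> list[dict]:
    """Return one BLOCKER finding per (line, rule) match, tied to file and line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule_name, pattern in RULES.items():
            if pattern.search(line):
                findings.append({
                    "risk": "BLOCKER",  # matches are BLOCKER by default
                    "rule": rule_name,
                    "file": filename,
                    "line": lineno,
                    "excerpt": line.strip(),
                })
    return findings


sample = (
    "Our method improves accuracy by XX%.\n"
    "Certainly! Here is the revised abstract.\n"
    "We thank the reviewers."
)
for hit in scan(sample, "main.tex"):
    print(hit["file"], hit["line"], hit["rule"])
```

Scanning per line keeps every hit traceable to a line number, at the cost of missing artifacts that span a line break.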
Finally, the skill surfaces extracted numbers, figure labels, and citation keys as candidate locations for the author to review, and runs a limited heuristic against the acknowledgments section for AI-use disclosure signals. Neither the consistency check nor the disclosure check produces automatic conclusions — they provide structured locations for the researcher to examine.
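The citation grading rule from this step (a full title/author/year mismatch is a BLOCKER, a single-field disagreement is MEDIUM) could be sketched as follows. The function and field names are hypothetical, the `"OK"` label is an assumption, and so is the HIGH grade for a two-field mismatch, which the description above does not cover. The sketch compares already-fetched metadata and makes no network calls.

```python
from typing import Optional

FIELDS = ("title", "authors", "year")


def normalize(value: object) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(str(value).lower().split())


def grade_citation(cited: dict, record: Optional[dict], lookup_ok: bool = True) -> str:
    """Grade one citation against the best database record found for it."""
    if not lookup_ok:
        return "INCOMPLETE"  # lookup failed: never silently pass
    if record is None:
        return "BLOCKER"  # no match in any database queried
    mismatched = [
        f for f in FIELDS
        if normalize(cited.get(f, "")) != normalize(record.get(f, ""))
    ]
    if len(mismatched) == len(FIELDS):
        return "BLOCKER"  # full title/author/year mismatch
    if len(mismatched) == 1:
        return "MEDIUM"  # single-field disagreement
    if mismatched:
        return "HIGH"  # assumption: two-field case is not graded in the article
    return "OK"
```

Exact string comparison after normalization is deliberately strict; a real check would probably need fuzzy title matching and author-name handling, which are left out here.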
Step 3 — Structured Outputs

The full execution trace: artifact scan returns 10 BLOCKER findings across 26 regex rules (LLM meta-comments and XX% placeholders detected); reference check verifies 1 entry via OpenAlex with 0 issues; final decision is HOLD. The preflight_report.md is delivered directly in the chat.
The report is organized by risk level:
| Risk Level | Description | Example |
|---|---|---|
| BLOCKER | Should stop submission | LLM meta-comment in body text; reference unmatched in all four databases |
| HIGH | Likely integrity or policy issue | DOI/title mismatch; strong claim with no supporting number in text |
| MEDIUM | Inconsistency or missing metadata | Broken \ref; unclear disclosure; single-field citation mismatch |
| LOW | Polish and formatting | Capitalization; whitespace; minor formatting |
Each finding entry includes a risk classification, file name and line number, a verbatim excerpt (≤ 25 words), a plain-language description, and a suggested action for the author. The overall decision (PASS, PASS_WITH_FIXES, or HOLD) is a workflow signal — rewriting and final submission decisions remain entirely with the author.
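As an illustration of what one report entry might look like, the sketch below clips an excerpt to 25 words and formats the fields listed above as a Markdown list item. The layout and the function names are hypothetical; the actual format of preflight_report.md is not specified in this article.

```python
def clip_excerpt(text: str, max_words: int = 25) -> str:
    """Trim an excerpt to at most max_words words, marking any cut with '...'.

    Runs of whitespace are collapsed to single spaces as a side effect.
    """
    words = text.split()
    if len(words) <= max_words:
        return " ".join(words)
    return " ".join(words[:max_words]) + " ..."


def render_finding(risk: str, file: str, line: int,
                   excerpt: str, description: str, action: str) -> str:
    """Format one finding as a Markdown list entry (layout is an assumption)."""
    return (
        f"- **{risk}** `{file}:{line}`\n"
        f'  - Excerpt: "{clip_excerpt(excerpt)}"\n'
        f"  - Issue: {description}\n"
        f"  - Suggested action: {action}"
    )


print(render_finding(
    "BLOCKER", "main.tex", 12,
    "As an AI language model, I cannot access the dataset.",
    "LLM meta-comment in body text",
    "Delete the sentence and re-check the paragraph.",
))
```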
Example Research Use Cases
1. Pre-submission manuscript QC for a computational biology paper: A researcher who used an LLM to assist with drafting a methods section runs the preflight workflow to confirm that no meta-comments, prompt fragments, or unreferenced claims survived into the submitted version.
2. Reference integrity check after literature synthesis: A systematic review team that compiled 80+ references using multiple tools runs the reference verification stage to flag any citations that cannot be matched in external databases before submission.
3. Multi-author manuscript audit: A corresponding author receives a near-final draft from collaborators and runs the full preflight to identify placeholder text or policy risks that may have been introduced in sections they did not personally write.
4. AI-disclosure calibration for a methods-heavy paper: A team that used generative AI tools for code generation and figure preparation runs the disclosure heuristic to surface candidate AI-assistance signals in the manuscript. The skill flags relevant locations; the authors then review whether their acknowledgments section adequately reflects any significant use.
5. LaTeX project consistency review: A researcher with a large multi-file LaTeX project runs the extraction and label-check stages to surface broken \ref calls, missing figure labels, and cite-key mismatches before final compilation.
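For the LaTeX consistency case, a broken-reference check can be approximated in a few lines. The sketch below is an assumption about how such a check might work, not the skill's label-check stage: it handles only \ref, \eqref, and \autoref with a single key, and it does not skip commented-out lines.

```python
import re

LABEL_RE = re.compile(r"\\label\{([^}]+)\}")
REF_RE = re.compile(r"\\(?:ref|eqref|autoref)\{([^}]+)\}")


def broken_refs(tex: str, filename: str) -> list[tuple[str, int, str]]:
    """Return (file, line, key) for every reference whose label is never defined."""
    defined = set(LABEL_RE.findall(tex))
    problems = []
    for lineno, line in enumerate(tex.splitlines(), start=1):
        for key in REF_RE.findall(line):
            if key not in defined:
                problems.append((filename, lineno, key))
    return problems


sample = r"""\begin{figure}\label{fig:arch}\end{figure}
See Figure~\ref{fig:arch} and Table~\ref{tab:results}."""

print(broken_refs(sample, "main.tex"))
```

In a multi-file project the check would need to run on the merged text, since a label and its reference often live in different files.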
Manual Workflow vs AI Agent Workflow
| Task | Manual Workflow | AI Agent Workflow |
|---|---|---|
| Scan full text for LLM artifacts | Researcher reads entire manuscript; relies on memory for known patterns | Pattern library scanned against extracted text; every hit flagged with file + line location |
| Verify references | Researcher checks each citation individually in browser or database | Automated lookup across Crossref, arXiv, OpenAlex, Semantic Scholar; structured match/mismatch output |
| Identify placeholder content | Ctrl+F for "TODO", "TBD"; easy to miss in nested .tex includes | Merged multi-file extraction scanned against placeholder pattern list; missed includes flagged as risks |
| Check AI authorship compliance | Manual re-read of author list and acknowledgments | Artifact scanner flags AI-listed-as-author patterns; disclosure heuristic surfaces candidate locations — final judgment remains with the author |
| Produce a reviewable fix list | Researcher takes notes during review; format varies | Structured Markdown report with risk grades, locations, excerpts, and overall decision |
| Handle multi-file LaTeX projects | Researcher must manually trace \input/\include chains | extract_manuscript_text.py merges includes automatically; unresolved paths flagged |
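The include-merging row can be illustrated with a small recursive sketch. This is not the code of extract_manuscript_text.py, whose internals are not shown in this article; the function name and its parameters are hypothetical. It assumes all paths are relative to the project root and does not handle commented-out commands.

```python
import re
from pathlib import Path

INCLUDE_RE = re.compile(r"\\(?:input|include)\{([^}]+)\}")


def merge_includes(root: Path, rel_name: str, unresolved: list, stack: tuple = ()) -> str:
    """Inline input/include targets recursively; record names that cannot be resolved."""
    text = (root / rel_name).read_text(encoding="utf-8")
    stack = stack + (rel_name,)  # files currently being expanded, to catch cycles

    def replace(match: re.Match) -> str:
        name = match.group(1).strip()
        if not name.endswith(".tex"):
            name += ".tex"  # LaTeX allows the extension to be omitted
        if name in stack or not (root / name).is_file():
            unresolved.append(name)  # missing or circular: flag it
            return match.group(0)    # and leave the command in place
        return merge_includes(root, name, unresolved, stack)

    return INCLUDE_RE.sub(replace, text)
```

A caller would pass an empty list for `unresolved` and inspect it afterwards, for example `merge_includes(Path("paper"), "main.tex", missing)`. Anything left in that list is text the scanner never saw, which is why it is reported as a risk rather than ignored.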
Who Can Benefit From This Skill
- Researchers preparing arXiv preprints, particularly those who used AI tools at any stage of writing or editing
- Graduate students submitting their first preprints, who may be less familiar with arXiv's moderation policies or common submission pitfalls
- Corresponding authors responsible for coordinating multi-author manuscripts, where sections were drafted independently
- Computational biology and bioinformatics teams working with large, multi-file LaTeX projects and extensive reference lists
- Systematic review and meta-analysis teams managing high reference volumes across multiple literature sources
Conclusion
The arXiv Preflight skill addresses a specific, repetitive, and high-stakes step in the academic submission workflow: confirming that a manuscript is free of LLM artifacts, hallucinated references, placeholder data, and policy conflicts before it reaches arXiv's moderation queue. By running a structured check and generating a graded Markdown report with precise finding locations, the skill can help researchers approach submission with greater consistency and less manual review burden.
The outputs are designed to support researcher judgment, not replace it. Every finding points to a specific location and excerpt. The overall decision — PASS, PASS_WITH_FIXES, or HOLD — is a workflow signal, not an editorial verdict. Rewriting and final submission decisions remain entirely with the author.
Explore More AIPOCH Agent Skills
AIPOCH is a collection of Medical Research Agent Skills created to support AI-assisted biomedical research workflows across literature review, evidence organization, bioinformatics preprocessing, data analysis support, and research writing tasks.
Download the arXiv Preflight skill:
👉 AIPOCH Skill Page — arxiv-preflight
To explore other research workflow skills, visit the AIPOCH skill library.
Disclaimer: This blog is for informational purposes only. The arXiv Preflight skill is a research workflow assistance tool, not a peer review system, editorial decision tool, or guarantee of submission acceptance. All outputs are workflow signals for researcher review; final submission decisions remain the sole responsibility of the authors. Policy details referenced in this article are based on publicly available reporting at the time of writing. Readers should consult arXiv's official policies for current guidance.
