arXiv preflight: How Can AI Agents Check a Manuscript Before arXiv Submission?

arXiv now bans authors for unreviewed LLM output. The arXiv Preflight agent skill from AIPOCH helps researchers scan for LLM artifacts, hallucinated references, and placeholder data before submission — and outputs a graded fix list. Free on GitHub.

AIPOCH, May 27, 2026

Introduction

On May 14, 2026, Thomas G. Dietterich — chair of arXiv's computer science section — posted a thread on X (@tdietterich) that spread quickly among researchers: every listed author bears full responsibility for all paper contents, regardless of how those contents were generated. Submissions containing clear evidence of unreviewed LLM output now carry a one-year ban from the platform; after that, any new submission must first clear peer review before normal posting resumes.

This is not a ban on AI tools — it is a named penalty attached to a responsibility gap that already existed. For researchers who use LLMs at any stage of writing, it creates a concrete preflight obligation before every arXiv submission: confirm that no artifacts survived into the final draft, that every citation resolves to a real paper, and that AI-use disclosures are present where required.

The arXiv Preflight agent skill from AIPOCH is designed to assist with exactly this workflow step — providing a structured, reproducible check that can help surface submission risks before a manuscript reaches the upload queue. The skill is available on GitHub and the AIPOCH skill page.


What This Agent Skill Does

The arXiv Preflight skill helps researchers run a structured submission-readiness check on a manuscript before arXiv upload. The goal is not to judge whether a paper was AI-written from style. The goal is to find concrete, locatable, reviewable evidence of unchecked LLM output, hallucinated references, placeholder data, and arXiv-policy risks — and to hand the author a fix list.

The skill is not a peer reviewer, a writing style evaluator, or an automated acceptance predictor. Its scope is specifically limited to:

  • Detecting hard-coded AI artifact patterns in manuscript text
  • Verifying citation metadata against external reference databases (Crossref, arXiv, OpenAlex, Semantic Scholar)
  • Extracting numbers, figures, tables, and citation keys for consistency review
  • Checking arXiv policy compliance signals (author listings, disclosure language)
  • Generating a structured Markdown preflight report with graded risk findings

Every finding in the output is tied to a file location, a line number, and a verbatim excerpt. The skill's design principle is that a finding without a location is not a finding — it is a suggestion, and the two carry different weights in a preflight context.
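
To make the "no location, no finding" idea concrete, here is a minimal sketch of a line-level artifact scan. The two patterns, the `Finding` fields, and the `scan_text` name are illustrative assumptions; they are not the skill's actual rule set or code.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical patterns for illustration only; the skill's real pattern
# library is not reproduced here.
ARTIFACT_PATTERNS = {
    "llm_meta_comment": re.compile(
        r"\bas an ai language model\b|\bhere is the revised\b", re.IGNORECASE
    ),
    "placeholder": re.compile(r"\bXX%|\bTODO\b|\bTBD\b"),
}


@dataclass(frozen=True)
class Finding:
    rule: str      # which pattern matched
    risk: str      # risk grade assigned to the match
    file: str      # file the text came from
    line: int      # 1-based line number
    excerpt: str   # verbatim text of the matching line


def scan_text(text: str, file: str) -> list[Finding]:
    """Return one Finding per (line, rule) match, always with a location."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in ARTIFACT_PATTERNS.items():
            if pattern.search(line):
                # The post says artifact matches default to BLOCKER.
                findings.append(Finding(rule, "BLOCKER", file, lineno, line.strip()))
    return findings
```

Because every `Finding` carries a file, a line number, and the matching text, an author can jump straight to the spot instead of rereading the whole draft.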

Outputs are structured at three decision levels: PASS, PASS_WITH_FIXES, and HOLD. Any BLOCKER-level finding results in a HOLD recommendation; the report then hands the author a fix list with specific locations.
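
The mapping from findings to an overall decision can be sketched in a few lines. The BLOCKER-to-HOLD rule comes from the description above; treating any other finding as PASS_WITH_FIXES is an assumption made for this sketch, and `overall_decision` is a hypothetical name.

```python
def overall_decision(risk_levels):
    """Collapse a list of risk grades into PASS, PASS_WITH_FIXES, or HOLD."""
    levels = list(risk_levels)
    if "BLOCKER" in levels:
        # Stated in the post: any BLOCKER finding means HOLD.
        return "HOLD"
    if levels:
        # Assumption: any remaining finding downgrades a clean PASS.
        return "PASS_WITH_FIXES"
    return "PASS"
```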

👉 GitHub — arxiv-preflight

👉 AIPOCH Skill Page — arxiv-preflight


Workflow Demo

The video below shows the full preflight workflow running inside OpenClaw — from manuscript upload to final report output.

[Video: arXiv Preflight Workflow Demo]


Workflow Execution Example

Step 1 — Input

The researcher uploads main.pdf directly to OpenClaw with a plain-language prompt: "Check this LaTeX manuscript before I submit to arXiv. I think I left some assistant chatter in there." The skill accepts LaTeX project directories, standalone PDFs, or .bib files. LaTeX source is preferred — as noted in the skill's documentation on GitHub — because PDF text extraction loses document structure and produces noisier reference checks. When only a PDF is available, any lookups that fail due to extraction quality are flagged as INCOMPLETE rather than silently passed.

[Screenshot: researcher uploads main.pdf to OpenClaw]

OpenClaw receives the manuscript and confirms the arxiv-preflight skill is loaded. Extraction runs immediately: 4 sections, 1 citation, 1,092 characters scanned.


Step 2 — AI Workflow Execution

Once input is received, the skill works through four areas in sequence. It first extracts and merges the full manuscript text, resolving \input/\include chains and flagging anything it could not reach. It then scans the extracted text against a pattern library for LLM meta-comments, prompt residue, placeholder content, and AI-listed-as-author entries; any match is treated as a BLOCKER by default. Next, it looks up every citation in external databases (Crossref, arXiv, OpenAlex, Semantic Scholar) and grades mismatches by severity: a full title/author/year mismatch is a BLOCKER, and a single-field disagreement is MEDIUM. If network access fails, the reference section is marked INCOMPLETE rather than silently passed.
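
The reference-grading rule can be illustrated with a small comparison function. This is a sketch under stated assumptions: the database lookup is assumed to have happened elsewhere and to return a plain dict, the field names are hypothetical, and grading a two-field mismatch as HIGH is a guess that the post does not confirm.

```python
FIELDS = ("title", "author", "year")


def _norm(value):
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(str(value).lower().split())


def grade_reference(cited, found, lookup_ok=True):
    """Grade one citation against the record returned by an external lookup."""
    if not lookup_ok:
        # Network failure: mark as unchecked rather than silently passing.
        return "INCOMPLETE"
    if found is None:
        # No database returned a candidate record at all.
        return "BLOCKER"
    mismatches = [f for f in FIELDS if _norm(cited.get(f, "")) != _norm(found.get(f, ""))]
    if len(mismatches) == len(FIELDS):
        return "BLOCKER"   # title, author, and year all disagree
    if len(mismatches) == 1:
        return "MEDIUM"    # single-field disagreement
    if mismatches:
        return "HIGH"      # assumption: two fields disagree
    return "OK"
```

For example, with the made-up entry `{"title": "An Example Title", "author": "Doe", "year": 2020}`, a lookup that returns the same title and author but the year 2021 would be graded MEDIUM.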

Finally, the skill surfaces extracted numbers, figure labels, and citation keys as candidate locations for the author to review, and runs a limited heuristic against the acknowledgments section for AI-use disclosure signals. Neither the consistency check nor the disclosure check produces automatic conclusions — they provide structured locations for the researcher to examine.
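
A rough sketch of the candidate-extraction step follows. It only collects locations and draws no conclusions, which matches the behavior described above. The regular expressions and the `extract_candidates` name are illustrative assumptions and will miss some LaTeX constructs.

```python
import re

LABEL_RE = re.compile(r"\\label\{([^}]+)\}")
# Matches \cite, \citep, \citet, ... with optional [..] arguments.
CITE_RE = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]+)\}")
# Standalone numbers such as 92.5% or 3; digits inside words or keys are skipped.
NUMBER_RE = re.compile(r"(?<![\w.])\d+(?:\.\d+)?%?")


def extract_candidates(text):
    """Collect (line, value) pairs for labels, citation keys, and numbers."""
    labels, cite_keys, numbers = [], [], []
    for lineno, line in enumerate(text.splitlines(), start=1):
        labels += [(lineno, name) for name in LABEL_RE.findall(line)]
        for group in CITE_RE.findall(line):
            # One \cite command may hold several comma-separated keys.
            cite_keys += [(lineno, key.strip()) for key in group.split(",")]
        numbers += [(lineno, num) for num in NUMBER_RE.findall(line)]
    return {"labels": labels, "cite_keys": cite_keys, "numbers": numbers}
```

The output is a list of places to look; deciding whether a number in the abstract agrees with the one in a table is left to the author.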


Step 3 — Structured Outputs

Output: preflight_report.md

The full execution trace: artifact scan returns 10 BLOCKER findings across 26 regex rules (LLM meta-comments and XX% placeholders detected); reference check verifies 1 entry via OpenAlex with 0 issues; final decision is HOLD. The preflight_report.md is delivered directly in the chat.

The report is organized by risk level:

| Risk Level | Description | Example |
|---|---|---|
| BLOCKER | Should stop submission | LLM meta-comment in body text; reference unmatched in all four databases |
| HIGH | Likely integrity or policy issue | DOI/title mismatch; strong claim with no supporting number in text |
| MEDIUM | Inconsistency or missing metadata | Broken \ref; unclear disclosure; single-field citation mismatch |
| LOW | Polish and formatting | Capitalization; whitespace; minor formatting |

Each finding entry includes a risk classification, file name and line number, a verbatim excerpt (≤ 25 words), a plain-language description, and a suggested action for the author. The overall decision (PASS, PASS_WITH_FIXES, or HOLD) is a workflow signal — rewriting and final submission decisions remain entirely with the author.
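
As an illustration of how such entries might be rendered, the sketch below trims excerpts to 25 words and groups findings by risk level. The Markdown layout, the dict keys, and the function names are assumptions made for this example; the skill's actual report format may differ.

```python
RISK_ORDER = ("BLOCKER", "HIGH", "MEDIUM", "LOW")


def trim_excerpt(text, max_words=25):
    """Keep at most max_words words of a verbatim excerpt."""
    words = text.split()
    if len(words) <= max_words:
        return " ".join(words)
    return " ".join(words[:max_words]) + " ..."


def render_report(findings):
    """Render findings as Markdown, most severe risk level first."""
    lines = []
    for risk in RISK_ORDER:
        group = [f for f in findings if f["risk"] == risk]
        if not group:
            continue
        lines.append(f"## {risk}")
        for f in group:
            excerpt = trim_excerpt(f["excerpt"])
            lines.append(f'- {f["file"]}:{f["line"]} "{excerpt}"')
            lines.append(f'  Action: {f["action"]}')
    return "\n".join(lines)
```

Sorting by severity puts anything that should stop a submission at the top of the report.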


Example Research Use Cases

1. Pre-submission manuscript QC for a computational biology paper
A researcher who used an LLM to assist with drafting a methods section runs the preflight workflow to confirm that no meta-comments, prompt fragments, or unreferenced claims survived into the submitted version.

2. Reference integrity check after literature synthesis
A systematic review team that compiled 80+ references using multiple tools runs the reference verification stage to flag any citations that cannot be matched in external databases before submission.

3. Multi-author manuscript audit
A corresponding author receives a near-final draft from collaborators and runs the full preflight to identify placeholder text or policy risks that may have been introduced in sections they did not personally write.

4. AI-disclosure calibration for a methods-heavy paper
A team that used generative AI tools for code generation and figure preparation runs the disclosure heuristic to surface candidate AI-assistance signals in the manuscript. The skill flags relevant locations; the authors then review whether their acknowledgments section adequately reflects any significant use.

5. LaTeX project consistency review
A researcher with a large multi-file LaTeX project runs the extraction and label-check stages to surface broken \ref calls, missing figure labels, and cite-key mismatches before final compilation.
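
For the last use case, a broken-reference check can be sketched as a comparison between \ref targets and defined \label names. This is a simplified illustration, not the skill's label-check stage: `find_broken_refs` is a hypothetical name, and comma-separated \cref targets and commented-out lines are not handled.

```python
import re

LABEL_RE = re.compile(r"\\label\{([^}]+)\}")
# Covers \ref, \eqref, \autoref, \cref, and \Cref with a single target.
REF_RE = re.compile(r"\\(?:eq|auto|c|C)?ref\{([^}]+)\}")


def find_broken_refs(text):
    """Return (line, target) for every reference that has no matching \\label."""
    defined = set(LABEL_RE.findall(text))
    broken = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for target in REF_RE.findall(line):
            if target not in defined:
                broken.append((lineno, target))
    return broken
```

The check should run on the merged text of the whole project, since a label is often defined in a different file from the one that references it.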


Manual Workflow vs AI Agent Workflow

| Task | Manual Workflow | AI Agent Workflow |
|---|---|---|
| Scan full text for LLM artifacts | Researcher reads entire manuscript; relies on memory for known patterns | Pattern library scanned against extracted text; every hit flagged with file + line location |
| Verify references | Researcher checks each citation individually in browser or database | Automated lookup across Crossref, arXiv, OpenAlex, Semantic Scholar; structured match/mismatch output |
| Identify placeholder content | Ctrl+F for "TODO", "TBD"; easy to miss in nested .tex includes | Merged multi-file extraction scanned against placeholder pattern list; missed includes flagged as risks |
| Check AI authorship compliance | Manual re-read of author list and acknowledgments | Artifact scanner flags AI-listed-as-author patterns; disclosure heuristic surfaces candidate locations; final judgment remains with the author |
| Produce a reviewable fix list | Researcher takes notes during review; format varies | Structured Markdown report with risk grades, locations, excerpts, and overall decision |
| Handle multi-file LaTeX projects | Researcher must manually trace \input/\include chains | extract_manuscript_text.py merges includes automatically; unresolved paths flagged |
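
To show what merging includes involves, here is a minimal recursive sketch. It is not the contents of extract_manuscript_text.py; the `merge_tex` function and its parameters are hypothetical, paths are resolved against the project root, and commented-out \input lines are not filtered.

```python
import re
from pathlib import Path

INCLUDE_RE = re.compile(r"\\(?:input|include)\{([^}]+)\}")


def merge_tex(path, root, unresolved, seen):
    """Inline \\input/\\include targets; record any path that cannot be found."""
    path = Path(path)
    key = path.resolve()
    if key in seen:
        # Guard against files that include each other in a loop.
        return ""
    seen.add(key)
    text = path.read_text(encoding="utf-8")

    def expand(match):
        target = Path(root) / match.group(1)
        if target.suffix != ".tex":
            # LaTeX lets authors omit the .tex extension.
            target = target.with_name(target.name + ".tex")
        if not target.is_file():
            unresolved.append(match.group(1))
            return match.group(0)  # leave the command visible in the output
        return merge_tex(target, root, unresolved, seen)

    return INCLUDE_RE.sub(expand, text)
```

A call such as `merge_tex(root / "main.tex", root, unresolved, set())` returns the merged text and fills `unresolved` with every path that should be flagged rather than silently skipped.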

Who Can Benefit From This Skill

  • Researchers preparing arXiv preprints, particularly those who used AI tools at any stage of writing or editing
  • Graduate students submitting their first preprints, who may be less familiar with arXiv's moderation policies or common submission pitfalls
  • Corresponding authors responsible for coordinating multi-author manuscripts, where sections were drafted independently
  • Computational biology and bioinformatics teams working with large, multi-file LaTeX projects and extensive reference lists
  • Systematic review and meta-analysis teams managing high reference volumes across multiple literature sources

Conclusion

The arXiv Preflight skill addresses a specific, repetitive, and high-stakes step in the academic submission workflow: confirming that a manuscript is free of LLM artifacts, hallucinated references, placeholder data, and policy conflicts before it reaches arXiv's moderation queue. By running a structured check and generating a graded Markdown report with precise finding locations, the skill can help researchers approach submission with greater consistency and less manual review burden.

The outputs are designed to support researcher judgment, not replace it. Every finding points to a specific location and excerpt. The overall decision — PASS, PASS_WITH_FIXES, or HOLD — is a workflow signal, not an editorial verdict. Rewriting and final submission decisions remain entirely with the author.


Explore More AIPOCH Agent Skills

AIPOCH is a collection of Medical Research Agent Skills created to support AI-assisted biomedical research workflows across literature review, evidence organization, bioinformatics preprocessing, data analysis support, and research writing tasks.

Download the arXiv Preflight skill:

👉 GitHub — arxiv-preflight

👉 AIPOCH Skill Page — arxiv-preflight

To explore other research workflow skills, visit the AIPOCH skill library.

Disclaimer: This blog is for informational purposes only. The arXiv Preflight skill is a research workflow assistance tool, not a peer review system, editorial decision tool, or guarantee of submission acceptance. All outputs are workflow signals for researcher review; final submission decisions remain the sole responsibility of the authors. Policy details referenced in this article are based on publicly available reporting at the time of writing. Readers should consult arXiv's official policies for current guidance.