Editorial oversight of AI-generated content: a practical checklist

What human review actually catches in AI drafts, with specific examples from a few months of editing Claude-drafted anime recaps. Not a manifesto — just the checks that survive contact with real drafts.

editorial · AI content · writing

Written by Hong-Bin Yoon · Founder, zzinDev LLC

Published April 18, 2026

“AI-assisted, human-edited” is a phrase you see on a lot of sites that are, in practice, AI-generated and lightly skimmed. The phrase has gotten cheap. If you publish content produced this way, you owe readers a concrete description of what the human actually does.

This post is mine. It’s based on four months of reviewing Claude-drafted season recaps for AnimeRecap (about one draft a day, sometimes more when I’m catching up on a franchise). It’s not a theory of editing; it’s the list of things I keep catching, roughly in order of how often they happen.

If you’re building a similar AI-assisted content operation and want to know what review actually has to cover, this is the honest answer.

Factual hallucinations of the “sounds right” kind

The most common failure mode isn’t a wild hallucination. It’s a sentence that reads correctly but is subtly wrong.

Claude is very good at writing plausible anime recaps. It’s seen enough of them in training to know the shape of one: “In the season’s penultimate episode, X confronts Y in the ruins of Z, and…” Drop a real show’s name and characters into that template and you get something that parses as true even if the specific scene never happened.

The fix is cross-referencing. My research pipeline pulls AniList and MyAnimeList metadata and includes it in the writer’s context. That catches a lot of the surface-level stuff — episode counts, studios, airing years. It doesn’t catch “X confronts Y in Z in the penultimate episode” because metadata doesn’t include plot beats.
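The surface-level part of that comparison is mechanical enough to script. Here’s a minimal Python sketch of what such a check could look like. `cross_check`, the `CHECKED_FIELDS` tuple, and the flat dict shape of the research metadata are illustrative assumptions, not the pipeline’s actual code.

```python
from typing import Any

# Hypothetical fields that the AniList/MAL metadata is trusted for.
CHECKED_FIELDS = ("episodes", "studio", "year")


def cross_check(draft: dict[str, Any], metadata: dict[str, Any]) -> list[str]:
    """Return human-readable mismatches between draft fields and research metadata."""
    problems = []
    for field in CHECKED_FIELDS:
        if field not in metadata:
            continue  # nothing authoritative to compare against
        expected = metadata[field]
        actual = draft.get(field)
        if actual != expected:
            problems.append(f"{field}: draft has {actual!r}, metadata has {expected!r}")
    return problems


# Example with made-up values:
# cross_check({"episodes": 12, "studio": "Studio A", "year": 2023},
#             {"episodes": 13, "studio": "Studio A", "year": 2023})
# returns ["episodes: draft has 12, metadata has 13"]
```

Anything this returns is a hard error. An empty result only means the metadata agrees; it says nothing about plot beats.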

So in review, for any claim specific enough to be wrong, I spot-check against the show’s wiki. Fan wikis are imperfect but dense with specifics; if the draft says someone dies in episode 19 and the wiki says episode 21, the draft is wrong and I fix it.
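A crude first filter for claims “specific enough to be wrong” is any sentence that names an episode number. The hypothetical helper below turns a plain-text or Markdown draft into a spot-check list of (episode, sentence) pairs; the sentence splitting is deliberately naive.

```python
import re

# "episode 19", "Episodes 20", etc.; captures the first number after the word.
EPISODE_REF = re.compile(r"\bepisodes?\s+(\d+)", re.IGNORECASE)
# Naive sentence boundary: whitespace after ., ! or ?
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def spot_check_list(draft_text: str) -> list[tuple[int, str]]:
    """Return (episode_number, sentence) pairs for every sentence that names an episode."""
    items = []
    for sentence in SENTENCE_SPLIT.split(draft_text):
        for match in EPISODE_REF.finditer(sentence):
            items.append((int(match.group(1)), sentence.strip()))
    return items
```

It won’t catch phrasing like “the penultimate episode,” so it narrows the wiki pass rather than replacing it.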

Most hallucinations I catch are this kind. Not dramatic, not obvious, just the kind of thing a fan would notice and get quietly angry about.

Spoiler tag discipline

Every recap has a spoiler warning at the top and spoiler tags around late-season reveals. The writer model gets this right most of the time — the structured prompt tells it to use <details> tags for major reveals, and it does.

What it gets wrong:

Spoiler content that appears outside the tag, usually because an earlier paragraph references it implicitly. Example: a draft that says “the season ends on a shocking revelation” later tags the revelation, but the phrase “the season ends on” is itself a mild spoiler about where in the season the reveal lives.
Tags placed around material that isn’t actually a spoiler, usually content from episodes 1-2 that the TL;DR has already implied.
Tags missing entirely for the kind of reveal that’s been memed to death. The model assumes “everyone knows X dies” and skips the tag. That logic fails readers who came to this site to catch up precisely because they don’t know, and it fails even harder for readers who arrived from search and landed on a recap they didn’t intend to read.
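The first failure mode, known spoiler phrases leaking outside a tag, can be partly linted. A minimal sketch, assuming recaps are Markdown with raw <details> blocks that aren’t nested, plus a hand-maintained list of spoiler phrases per show. The `untagged_spoilers` name and the phrase list are hypothetical.

```python
import re

# One <details> block with its contents (including <summary>). Non-greedy, so adjacent
# blocks are removed separately; assumes blocks are not nested.
DETAILS_BLOCK = re.compile(r"<details\b.*?</details>", re.IGNORECASE | re.DOTALL)


def untagged_spoilers(markdown: str, spoiler_terms: list[str]) -> list[str]:
    """Return the spoiler terms that still appear after all <details> blocks are stripped."""
    visible = DETAILS_BLOCK.sub("", markdown).lower()
    return [term for term in spoiler_terms if term.lower() in visible]
```

A lint like this only knows the phrases you give it (including framing phrases like “the season ends on”). Implicit spoilers and “everyone knows” omissions still need a human read.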

The fix is to be paranoid about it. If a reveal would spoil a friend of mine watching the show six months late, it gets a tag. Zero tolerance for “well-known spoilers” as an excuse.

Character name consistency

Anime titles have a naming mess: romaji vs English vs subtitle translation. “Eren Yeager” vs “Eren Jaeger” vs “Yeager.” Scouts vs Scout Regiment vs Survey Corps. The model usually picks one form and sticks with it within a draft, but “usually” means about one draft in four has an inconsistency somewhere in the middle, where it slips past readers because by then they’re too deep in the post to double-check.

I now do a mental find-and-replace pass on every draft: pick the canonical spelling (usually what the show’s subtitles use, with my judgment for ambiguous cases) and standardize to it. It takes two minutes and catches half a dozen inconsistencies in most drafts.

The fix upstream is a per-show “canonical terms” dictionary in the research output that the writer’s prompt references. I haven’t built this yet. I should.
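A sketch of what that dictionary and the standardizing pass might look like. The `CANONICAL_TERMS` entries are illustrative picks for one show, and `standardize` is a hypothetical helper: it matches whole words case-sensitively and returns per-variant counts so the pass is reviewable.

```python
import re

# Hypothetical per-show dictionary: variant spelling -> canonical spelling.
CANONICAL_TERMS = {
    "Eren Jaeger": "Eren Yeager",
    "Scout Regiment": "Survey Corps",
}


def standardize(text: str, terms: dict[str, str]) -> tuple[str, dict[str, int]]:
    """Replace each variant with its canonical form; return new text and replacement counts."""
    counts: dict[str, int] = {}
    # Longer variants go first, so "Eren Jaeger" is handled before a bare "Jaeger" entry would be.
    for variant in sorted(terms, key=len, reverse=True):
        pattern = re.compile(rf"\b{re.escape(variant)}\b")
        text, n = pattern.subn(terms[variant], text)
        if n:
            counts[variant] = n
    return text, counts
```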

Confidence about unseen content

If I’m drafting a recap for a show whose final season just finished airing, the writer model may or may not have training data that covers the ending. It always writes confidently either way.

A draft will say “the series ends with X reconciling with Y, leaving the question of Z open-ended for a potential sequel” — and if you check, those events are from the manga, not the anime. Or they’re from a completely different show the model conflated. Or they’re invented whole.

The fix is airing-date arithmetic: if the show’s final episode aired within the model’s likely training window, I trust the draft conditionally. If it aired outside, I treat every concrete claim about the ending as suspect and manually verify. If a show is too new to verify confidently, I either delay the recap or explicitly note in the editorial commentary that the recap is based on the aired episodes only.
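The arithmetic itself is one comparison. A minimal sketch: `training_cutoff` is whatever date the model provider documents (an input you supply, not a guarantee), and `margin_days` is an added hedge for “likely,” treating shows that ended just before the cutoff as outside the window on the assumption that very recent events are thinly covered in training data. `ending_review_policy` is a hypothetical name.

```python
from datetime import date, timedelta


def ending_review_policy(final_episode_aired: date, training_cutoff: date,
                         margin_days: int = 90) -> str:
    """Classify how far to trust a draft's claims about how a show ends."""
    # Only endings comfortably before the cutoff get conditional trust.
    if final_episode_aired <= training_cutoff - timedelta(days=margin_days):
        return "trust-conditionally"
    return "verify-every-ending-claim"
```

The third case, a show too new to verify at all, stays a human call: delay the recap or note that it covers aired episodes only.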

Prose quality

This is the most subjective category and the one where I override the model most aggressively.

Claude defaults to a register that’s slightly purple for my taste. “A breathtaking tour de force of animation.” “Jaw-dropping revelations.” “The emotional weight of the final confrontation lands with the force of a gut-punch.” These are fine in moderation. The model uses three per paragraph.

My heuristic is: if a sentence could be removed without the reader losing information, it probably should be. Recaps are for people who want to remember or understand a show; they aren’t hype posts. Cut the adjectives, keep the plot beats.
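Stock phrases are easy to flag mechanically, even if deciding what to cut isn’t. A rough sketch that counts hits per paragraph; the phrase list is seeded from the examples above and would need to grow from whatever you actually keep cutting. `purple_density` is a hypothetical name.

```python
import re

# Illustrative stock phrases (lowercase), seeded from the examples in this section.
PURPLE_PHRASES = ("breathtaking", "tour de force", "jaw-dropping", "gut-punch", "emotional weight")


def purple_density(text: str, phrases: tuple[str, ...] = PURPLE_PHRASES) -> list[tuple[int, int]]:
    """Return (paragraph_index, hit_count) for each paragraph containing a stock phrase."""
    # Paragraphs are separated by one or more blank lines; empty chunks are dropped.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    report = []
    for index, paragraph in enumerate(paragraphs):
        lowered = paragraph.lower()
        hits = sum(lowered.count(phrase) for phrase in phrases)
        if hits:
            report.append((index, hits))
    return report
```

It only points at candidates; whether a sentence can go without the reader losing information is still a read-through.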

The other prose issue is narrative framing. Drafts sometimes write as if the reader is watching the show right now, using “we” and “us” (“as we meet the new antagonist”). I rewrite those in third-person past tense. Recaps are about shows that already aired; narrating them in a watch-along present tense is a kind of cosplay.

Ratings

Every recap has a numeric rating from 0 to 10. The pipeline seeds it with AniList’s community score at time of writing, and the writer is prompted to re-evaluate with editorial judgment.

In practice the model anchors hard on the seeded score. If AniList says 8.4, the output rating is 8.4 with no reasoning about why. That defeats the point of having editorial judgment — a rating the site publishes should reflect our read, not AniList’s crowd average.

When I override: any rating where my actual reaction to the show differs from the seed by more than half a point. If a season I genuinely think was mid-tier comes back with a 7.9, I drop it to 7.3 and add a line in the editorial commentary explaining why. Conversely, I bump up seasons I think are underrated.

This isn’t about being a contrarian. It’s about the rating being ours. If we’re going to publish it under our byline, it has to mean something.
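The override rule is simple enough to write down. A minimal sketch; `decide_rating` and `RatingDecision` are hypothetical names, the 0.5 threshold comes from the rule above, and `editor_reaction` is the one input that has to come from a human.

```python
from dataclasses import dataclass


@dataclass
class RatingDecision:
    published: float
    needs_commentary: bool  # True when the rating departs from the seed and must be explained


def decide_rating(seed: float, editor_reaction: float, threshold: float = 0.5) -> RatingDecision:
    """Override the seeded community score only when the editor's read differs by more than `threshold`."""
    if abs(editor_reaction - seed) > threshold:
        return RatingDecision(published=round(editor_reaction, 1), needs_commentary=True)
    # Within the threshold, the seeded score stands and no explanation is required.
    return RatingDecision(published=seed, needs_commentary=False)
```

For the example in this section, `decide_rating(7.9, 7.3)` yields a published 7.3 with `needs_commentary=True`.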

Metadata and structured data

Every recap has a YAML frontmatter block with sixteen fields: title, anime, animeSlug, season, episodes, studio, genres, aired, rating, coverImage, scrollScenes, anilistId, malId, publishedAt, tags, description. The writer fills these in; the site’s content schema validates them at build time.

I still spot-check two things manually:

The genres list. The model sometimes adds genres the show doesn’t actually belong to because they sound adjacent. A military sci-fi show is not “Fantasy” just because it has uniforms.
The tags. These become URL slugs elsewhere on the site; bad tags create weirdness. I keep them lowercase-hyphenated, limited to ~6-8 per post, and drawn from a consistent vocabulary.

Both of these would be fixable with a stricter schema and a vocabulary file. I keep meaning to build that.
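A sketch of that stricter check, run on frontmatter already parsed into a dict (for example with a YAML library). It’s a standalone Python check, not the site’s build-time schema. The allowed-genre set and tag vocabulary are hypothetical inputs; one option is to pass the show’s own AniList genres as `allowed_genres`, so an adjacent genre the model added gets flagged rather than validated.

```python
import re
from typing import Any

# Lowercase words joined by single hyphens, e.g. "found-family".
TAG_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def check_taxonomy(frontmatter: dict[str, Any], allowed_genres: set[str], tag_vocabulary: set[str],
                   min_tags: int = 6, max_tags: int = 8) -> list[str]:
    """Flag genres outside the allowed set and tags that break the slug or vocabulary rules."""
    problems = []
    for genre in frontmatter.get("genres", []):
        if genre not in allowed_genres:
            problems.append(f"genre not in allowed list: {genre!r}")
    tags = frontmatter.get("tags", [])
    if not min_tags <= len(tags) <= max_tags:
        problems.append(f"expected {min_tags}-{max_tags} tags, found {len(tags)}")
    for tag in tags:
        if not TAG_PATTERN.fullmatch(tag):
            problems.append(f"tag is not lowercase-hyphenated: {tag!r}")
        elif tag not in tag_vocabulary:
            problems.append(f"tag not in vocabulary: {tag!r}")
    return problems
```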

What I don’t check

Two things I’ve stopped reviewing line-by-line, because the model is reliably good at them:

Episode ranges in section headers (“Episodes 1-4”, “Episodes 5-12”). The model pulls these from the research JSON; they’re almost always correct. I still verify episode totals match what the sidebar shows.
Arc structure. Anime seasons tend to have well-known arcs with well-known names; the model knows them. I’ll occasionally rename an arc to match what the show itself calls it instead of fan naming conventions, but the structure is fine.
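The episode-total check can be scripted against the section headers. A minimal sketch, assuming headers use the “Episodes 1-4” form with a plain hyphen and that the season total comes from the research JSON; `check_episode_ranges` is a hypothetical name. Single-episode headers like “Episode 13” don’t match this pattern, so seasons that use them would need it extended.

```python
import re

# "Episodes 1-4", "Episode 5 - 12"; captures start and end of the range.
RANGE_HEADER = re.compile(r"Episodes?\s+(\d+)\s*-\s*(\d+)")


def check_episode_ranges(headers: list[str], total_episodes: int) -> list[str]:
    """Check that range headers run contiguously from episode 1 to the season's total."""
    problems = []
    expected_start = 1
    for header in headers:
        match = RANGE_HEADER.search(header)
        if not match:
            continue  # not an episode-range header (e.g. "Highlights")
        start, end = int(match.group(1)), int(match.group(2))
        if start != expected_start:
            problems.append(f"{header!r} starts at {start}, expected {expected_start}")
        if end < start:
            problems.append(f"{header!r} ends before it starts")
        expected_start = end + 1
    if expected_start - 1 != total_episodes:
        problems.append(f"ranges end at episode {expected_start - 1}, but the season has {total_episodes}")
    return problems
```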

What shows up in the commit history

I track editorial changes as git commits on the branch where the PR lives. A typical review produces three to six commits per draft:

edit: tighten TL;DR prose
edit: fix spoiler tag around <details>
edit: standardize to "Eren Yeager" throughout
edit: rewrite purple prose in "Highlights" section
edit: drop rating from 8.2 to 7.8, add reasoning
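Pulling that record for a branch takes one git command. A small Python wrapper, assuming the base branch is called `main` and that editorial commits use the `edit:` prefix shown above; both helper names are hypothetical.

```python
import subprocess


def branch_subjects(base: str = "main") -> list[str]:
    """Commit subjects on the current branch that are not on `base` (runs `git log`)."""
    out = subprocess.run(
        ["git", "log", "--format=%s", f"{base}..HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def editorial_commits(subjects: list[str], prefix: str = "edit:") -> list[str]:
    """Keep only the subjects that follow the editorial commit convention."""
    return [s for s in subjects if s.startswith(prefix)]


# Usage, from inside the repository with the review branch checked out:
# print(len(editorial_commits(branch_subjects())), "editorial commits")
```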

This isn’t ceremony for ceremony’s sake. Every edit in the history is evidence that human review actually touched the content. If there’s ever a dispute about whether a specific claim was AI-generated or editorially added, the commits are the record.

The overall time budget

An average draft takes about 35-45 minutes of review. Breakdown (the steps below add up to the 45-minute end of that range):

First read: 10 minutes. Catch obvious issues.
Fact-check pass: 15 minutes. Cross-reference specific claims against the wiki and metadata.
Prose pass: 10 minutes. Tighten, standardize, cut purple language.
Rating and commentary pass: 5 minutes. Override the rating if I disagree, and check the editorial commentary reflects my actual opinion.
Final spot-check: 5 minutes. Re-read the TL;DR, preview the post in dev mode, check sidebar metadata.

At that rate I can review one recap per sit-down, not ten. That’s the actual throughput ceiling of the site — not the pipeline’s draft rate, which is much higher.

Is this “AI-assisted” enough to disclose honestly?

Yes, I think so. The model does the heavy lifting of transforming research data and plot knowledge into reader-friendly prose. The human does fact-checking, tone control, and editorial judgment. Both are real work; neither alone would produce the site.

What would not be honest is calling it AI-assisted if my review were skimming instead of editing. The difference between the two is whether the published version differs meaningfully from the draft. In every recap I’ve published, the answer is yes — usually by dozens of edits across hundreds of words.

If you’re building a similar operation, the honest disclosure depends on your actual review behavior, not the word “AI-assisted.” If your reviewer is reading for 5 minutes and clicking approve, call it AI-generated and live with the consequences. If they’re doing the work I just described, “AI-assisted, human-edited” is accurate.

Either way, be specific on your editorial-standards page about what you do. Vague claims read as evasion. Specific claims read as ownership.
