Podcast transcripts, polished for reading

Methodology

This page explains the podProse pipeline transparently: where transcripts come from, how they are processed, what is changed and what is not, how attribution works, and what quality controls are applied.

How transcripts are sourced

podProse sources transcripts from public YouTube videos via the Supadata API, which retrieves the auto-generated captions produced by YouTube’s captioning system. No audio is processed directly, so podProse can only work with videos for which YouTube has caption data available.

How transcripts are polished

Raw auto-captions are processed using Claude Sonnet 4.6 (Anthropic). The AI rewrites the raw caption text into structured, readable prose. The output is organised into three sections: a concise Summary, Key Takeaways, and the full polished transcript with topic headings and speaker attribution.
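To make the three-part output concrete, here is a minimal sketch of how such an article could be represented and rendered. The class names, field names, and rendering layout are assumptions made for illustration; they are not podProse's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptSection:
    """One topic-headed section of the polished transcript (hypothetical)."""
    heading: str
    # Each turn is a (speaker, text) pair, which keeps speaker attribution.
    turns: list[tuple[str, str]] = field(default_factory=list)


@dataclass
class PolishedArticle:
    """Hypothetical container for the three-part output described above."""
    summary: str
    key_takeaways: list[str]
    transcript: list[TranscriptSection]

    def render(self) -> str:
        """Render the article as plain text in the three-part order."""
        lines = ["Summary", "", self.summary, "", "Key Takeaways", ""]
        lines += [f"- {item}" for item in self.key_takeaways]
        lines += ["", "Full Transcript"]
        for section in self.transcript:
            lines += ["", section.heading, ""]
            lines += [f"{speaker}: {text}" for speaker, text in section.turns]
        return "\n".join(lines)
```

Keeping the transcript as a list of sections, each holding speaker turns, means topic headings and attribution travel with the text rather than being re-derived at display time.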

What is preserved

Every substantive speaker statement is preserved. The original meaning and intent of what speakers said are retained in full. Attribution to the original creator and channel is displayed on every article. The original YouTube publication date is shown alongside the podProse publication date.

What is edited

Filler words and verbal tics are removed (for example: “you know”, “like”, repeated words). Sponsor reads and advertisement breaks are removed. Speaker names are corrected where the auto-captions have them wrong. Light grammatical polish is applied to improve readability. The raw transcript is restructured into the Summary + Key Takeaways + Full Transcript format.

What is NOT changed

podProse does not add editorial commentary. It does not inject interpretation or opinion. It does not reframe or summarise away substantive content. No facts are added that were not in the original transcript. The polished text reflects what the speakers actually said, not what an editor thinks they meant.

Attribution policy

Every article links to the original YouTube video. The original creator and channel are identified on every post. The original publication date from YouTube is displayed. Users who publish a transcript are identified by their podProse handle. All articles carry the disclaimer: “The views expressed are those of the original speakers.”

Quality controls

After rewriting, podProse runs an automated, AI-based quality check. The check scans for potential name spelling errors, missing key topics, fabricated content, and structural issues, and flags any problems in a quality report. A fabrication filter checks that the polished text contains no proper nouns or factual claims that were absent from the original transcript. Name verification uses per-channel memory to keep the spelling of recurring names consistent.
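The proper-noun part of the fabrication filter can be illustrated with a simple rule-based sketch. This is not podProse's implementation, which is AI-based; it is a simplified stand-in that assumes a proper noun can be approximated by a capitalised word, and the function name is hypothetical. The comparison is case-insensitive because auto-captions are often lower-case.

```python
import re

# A "word" here is any run of Unicode letters (no digits or underscores).
WORD = re.compile(r"[^\W\d_]+")


def unsupported_capitalised_words(original: str, polished: str) -> list[str]:
    """Return capitalised words in `polished` that never occur in `original`.

    Matching ignores case, so a sentence-initial "The" in the polished text
    is not flagged as long as "the" appears somewhere in the original.
    """
    original_words = {word.lower() for word in WORD.findall(original)}
    flagged: list[str] = []
    for word in WORD.findall(polished):
        is_new = word.lower() not in original_words
        if word[0].isupper() and is_new and word not in flagged:
            flagged.append(word)
    return flagged


raw = "so i was talking to ada lovelace about the engine you know"
polished = "I was talking to Ada Lovelace and Charles Babbage about the engine."
print(unsupported_capitalised_words(raw, polished))  # ['Charles', 'Babbage']
```

A check like this only catches names that appear from nowhere. It would miss fabricated factual claims built from words already in the transcript, and it would also flag a name that was legitimately corrected from a misspelled caption, so flagged words call for review rather than automatic rejection.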

Update policy

The lastmod date on each article in the sitemap reflects the most recent podProse activity on that post (publication or edit), not the original YouTube video publication date. The date therefore shows when the podProse content itself was last updated.
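The rule above amounts to taking the later of the podProse publication and edit timestamps. A minimal sketch follows, assuming timezone-aware timestamps and a date-only lastmod value; the function name, parameters, and example dates are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def sitemap_lastmod(published_at: datetime,
                    edited_at: Optional[datetime] = None) -> str:
    """Return the lastmod value for a post as a YYYY-MM-DD date in UTC.

    Only podProse activity is considered. The original YouTube publication
    date is deliberately not an input to this function.
    """
    latest = published_at if edited_at is None else max(published_at, edited_at)
    return latest.astimezone(timezone.utc).strftime("%Y-%m-%d")


published = datetime(2025, 1, 10, 9, 30, tzinfo=timezone.utc)
edited = datetime(2025, 3, 2, 14, 0, tzinfo=timezone.utc)
print(sitemap_lastmod(published))          # 2025-01-10
print(sitemap_lastmod(published, edited))  # 2025-03-02
```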

Summary