Nate B Jones demonstrates a structured AI prompt system for filtering low-quality AI-generated content
A solo presentation by Nate B Jones on using AI prompts as quality-control filters for AI-generated business content.
Summary
Nate B Jones addresses what he calls the "AI slop" problem — the explosion of low-quality AI-generated content across business functions — and argues that the solution is not AI detection tools but structured AI-powered quality filtering. He presents a detailed prompt he built specifically for evaluating Product Requirements Documents (PRDs), walking through its components including scoring rubrics, testability criteria, scope clarity checks, and JSON output schemas. His central argument is that organizations now have nearly unlimited AI-generated output but insufficient human attention to assess it, and that the answer is to use AI itself as the quality gate. He frames this as the beginning of a broader prompt pack covering marketing, sales, customer success, product, and engineering use cases.
Full Transcript
The AI Slop Problem Across Business Functions
Nate B Jones: We have a supply explosion of AI content, and it is an AI slop issue across every business I talk to. Every single one. They're like, "What do we do with AI slop?" We have product managers producing PRDs that are not good. We have marketers who are producing marketing copy and it's not good. Managers — or frankly any individual who wants to know they're actually writing well with AI — don't know what to do. How do we know that what we're shipping is good quality?
It's not about did the AI help. We're past that point. The AI is helping. It is about: is the quality output good? And we humans don't have the time to assess all of it, because AI is so good at producing it. So we are in a world now where marketers have gone from "can I write a blog post today?" to "I could stamp out 50." And now you have to ask yourself: can I assess the 50 that have been produced? Are they actually good? Are they brand-aligned? Is the content right?
And what are people doing about that? They're using their eyes, or they're skipping it, because there hasn't really been a good quality gate. But that's fixable. And what I want to tell you about is how I'm fixing it.
The Case for AI-Powered Quality Filtering
Nate B Jones: It is really simple. What you need to do is adopt the mindset that LLM attention is going to be the predominant attention mode in your business and everybody else's business going forward. You want to make sure you use that to your advantage. If you think about a world where in your organization you have only so many human eyeballs but you have nearly infinite AI eyeballs — well, use the AI eyeballs to your advantage and get them to check the work.
But it's not as simple as saying "please check this, is it good?" I've seen people do that and you get widely varying results. It depends on the AI you're using — is it ChatGPT, is it Claude, etc. It also depends on what the LLM already thinks of as good based on its training data and maybe its past conversations with you. It's not predictable. And so we need to get to a point where we have much more robust prompting for quality.
Think of it this way. A lot of what I talk about is robust prompting for what we would call the supply side — robust prompting to make the work that you produce better. I've talked about Excel in the past, written up guides for that, PowerPoint, Claude Code. But what about the other side? What about the filter side? The side where you need to test work and see: is it good? Maybe it's your own work. Maybe it's someone else's work. Maybe you're a manager and you're testing your team's work, and you could get a hundred blog posts in today from two or three team members and you don't have time to go through it — you'd be up all night.
Well, that's where the prompt comes in. That's where you can get help. Or you're an engineer and you're looking at this explosion of Lovable vibe-coded prototypes or PRDs that are coming from PMs and you don't know whether it's good. Again, a prompt can help fix it.
Walking Through a Real Quality-Filter Prompt
Nate B Jones: I think specifics are really useful here. So in a moment I'm actually going to go through a real prompt with you that I wrote for this, and I'm going to explain how it works. I'm going to do several of these — this is very much a case where the attention filter, the quality filter, needs to be per use case and per job family, or else it's not any good. You need to have one that is PRD-specific, that is blog-post-specific, because otherwise you don't have enough context to effectively ask the AI to help filter for you.
Your goal should be to have the AI do most of the reading — to do what Andrej Karpathy has suggested, which is to have LLMs be 98% of our attention and the human eyes have the 2% attention that matters. I want to give you the right 2%, the highest-quality pieces, and effectively filter out the slop. That's my goal. If you are in a world where you have a hundred blog posts, what are the two that matter? What are the two that are super high quality that you should surface now? That's super valuable information. Where is the PRD that is really well thought through? That is promotable information. If a PM is able to do that with AI, they should be on a promotion track — and you want to know that quickly.
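A minimal sketch of that "surface the right 2%" step, assuming each draft has already been given a numeric score by a quality-filter prompt. The function name `top_fraction`, the `score` field, and the sample data are hypothetical and are not taken from the talk.

```python
import heapq

def top_fraction(scored_drafts, percent=2):
    """Return the highest-scoring drafts (at least one), best first."""
    if not scored_drafts:
        return []
    # Integer arithmetic avoids floating-point surprises: 100 drafts at 2% -> 2.
    keep = max(1, len(scored_drafts) * percent // 100)
    return heapq.nlargest(keep, scored_drafts, key=lambda d: d["score"])

# Illustrative data only: 100 drafts with made-up scores between 0.0 and 4.9.
drafts = [{"id": f"post-{i}", "score": (i * 37) % 50 / 10} for i in range(100)]
shortlist = top_fraction(drafts)
print([d["id"] for d in shortlist])
```

Only the shortlist goes to a human reader; everything else waits for revision or is dropped.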
So that's what I'm setting out to do. How do I do it? Let me show you a sample prompt and we'll work through it together.
Demo: The PRD Quality-Filter Prompt
Nate B Jones: Okay, here we are, right on the prompt. We set the role first: "You're evaluating a product requirements document and your job is to determine if an engineering team can build this without needing three clarifying meetings." I love that specificity. You could flip it around and change it a bit, but what I want to be frank about in this prompt is how painful the current version is. Whether it's three clarifying meetings for PRDs, or "I can't tell which blog post to post," or "I don't know which of these customer email drafts," or "I can't tell which of these sales outreach drafts," or "I don't know about the webinar invite and the webinar event schedule," in all of them you were having meetings. You were having human conversations. You were putting human eyeballs on them when they didn't work.
So let's set the stakes, and then I define the axes. This gets very specific to the PRD, and that is the point. We ask about the completeness of the document. And by the way, this is somewhat scalable — you can ask about completeness of other artifacts too. In this case, we can ask about acceptance criteria, whether edge cases are present, whether non-goals are explicit or implicit. And we can extend this. If I wanted to add to this, I could say "do we have at least seven requirements clearly enumerated?" — that's a bit of a random number, I chose not to include it here, but if you have a particular format for a particular artifact, it's easy to modify and add that in.
Then I give, right in the prompt, a score rubric. A five score looks like a measurable edge case documented, non-goals prevention — it's a good score. A zero is untestable, it has a vague statement. And I have seen this as a PM. I have seen other people write really vague goals and it's just terrible, and we all know it's bad. And by the way, I think the AI slop conversation is somewhat overhyped, because I remember seeing phrases like this long before AI. In a sense, what we complain about as AI slop is the latest version at volume of a larger issue — which is that we have always had bad work problems and sloppy work problems. And now maybe we have the tools to address it regardless of who wrote it. I don't care if AI wrote it. I care if it's good.
Scoring Dimensions: Testability, Scope Clarity, and Decision Framework
Nate B Jones: So then we go down to the next one: is it testable? This is a very PRD-specific thing, but you can do something similar with other types of documents. For example, you could ask if it's readable for a sales follow-up email — "does it read at an eighth-grade level?" would be a reasonable test to have. And you can go through and define that. This is not the only prompt I have. I have a bunch of prompts for different goals. My goal is to give you a complete pack that gives you a filter you can apply to various business functions.
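The eighth-grade test can also be approximated without an LLM. The sketch below uses the Flesch-Kincaid grade-level formula, which is a swapped-in technique: the talk does not say how reading level would be measured. The syllable counter is a rough vowel-run heuristic, so treat the result as an estimate. All function names are hypothetical.

```python
import re

def count_syllables(word):
    """Rough heuristic: one syllable per run of vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    """Approximate U.S. school grade level of the text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

def reads_at_or_below(text, max_grade=8.0):
    """True when the estimated grade level is at or under the threshold."""
    return flesch_kincaid_grade(text) <= max_grade

print(reads_at_or_below("Thanks for your time today. I will send the quote by noon."))
```

A cheap deterministic check like this can run before the LLM filter, so the model's attention goes to judgments a formula cannot make.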
We have the scoring, we have test cases, success rates, failure states, example inputs and outputs — this ensures that it's actually testable.
Scope clarity is another example of something that is PRD-specific. We have our scoring. You can imagine how this can be somewhat different. If it's a customer announcement email that you're sending for a new product, you can ask if the product is clearly explained. The point here — and the reason I'm pulling in these examples from across the business — is because this is how many different places the AI slop problem touches. Everywhere I look, there's a slop problem. And we need to have a filter. That's why I said: you know what, we're just going to write some prompts and make it easy.
Decision framework — key choices, is the rationale explained, are the trade-offs acknowledged, etc. Very PRD-specific. And then we get into scoring. We go through these five: we have dependency mapping at the end there, and we have an elements check — is everything here?
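Pulling the pieces together, a filter prompt of this shape can be assembled from a role statement plus the named dimensions. This is a sketch under assumptions: the role sentence is quoted from the talk, but the one-line question for each dimension is a paraphrase or a guess (the scope and dependency questions in particular), and `build_filter_prompt` is a hypothetical helper.

```python
ROLE = (
    "You're evaluating a product requirements document and your job is to "
    "determine if an engineering team can build this without needing three "
    "clarifying meetings."
)

# Dimension names follow the walkthrough; the questions are illustrative only.
DIMENSIONS = {
    "completeness": "Are acceptance criteria, edge cases, and non-goals present?",
    "testability": "Are there test cases, success rates, failure states, "
                   "and example inputs and outputs?",
    "scope_clarity": "Is it clear what is in scope and what is not?",
    "decision_framework": "Are key choices, rationale, and trade-offs explained?",
    "dependency_mapping": "Are dependencies identified?",
}

def build_filter_prompt(document):
    """Assemble the role, the scoring dimensions, and the document into one prompt."""
    parts = [ROLE, "", "Score each dimension from 0 to 5:"]
    for name, question in DIMENSIONS.items():
        parts.append(f"- {name}: {question}")
    parts += ["", "Document to evaluate:", document]
    return "\n".join(parts)

print(build_filter_prompt("PRD: add a CSV export button to the reports page."))
```

A blog-post or sales-email filter would swap in its own role and dimensions while keeping the same assembly step.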
JSON Output Schema and the Feedback Loop
Nate B Jones: And then we give it an output. I am not someone who's going to tell you JSON is magic — JSON is not magic. But it is certainly useful in cases where you want the LLM to understand a particular rubric and output schema and follow it. In this case, what we really want is to understand a grading score. We want to be able to say: based on these scores, how do we write this in plain English? And then how do we write a key sentence that someone can use as feedback?
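Asking for JSON pays off most when the reply is checked before anything downstream trusts it. The sketch below validates a reply against an assumed schema: a plain-English `summary` plus a `score` and `feedback` sentence per dimension. The exact schema is not shown in the talk, so every field name here is hypothetical.

```python
import json

# Hypothetical field and dimension names; the talk does not show the schema.
DIMENSIONS = ("completeness", "testability", "scope_clarity",
              "decision_framework", "dependency_mapping")

def parse_evaluation(raw):
    """Parse the model's JSON reply and reject anything off-schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    dims = data.get("dimensions")
    if not isinstance(dims, dict):
        raise ValueError("missing 'dimensions' object")
    for name in DIMENSIONS:
        entry = dims.get(name)
        if not isinstance(entry, dict):
            raise ValueError(f"missing dimension: {name}")
        score = entry.get("score")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 5:
            raise ValueError(f"{name}: score must be an integer from 0 to 5")
        feedback = entry.get("feedback")
        if not isinstance(feedback, str) or not feedback.strip():
            raise ValueError(f"{name}: feedback must be a non-empty string")
    if not isinstance(data.get("summary"), str):
        raise ValueError("missing plain-English 'summary'")
    return data
```

A reply that fails validation can be retried or flagged for a human instead of silently producing a bad grade.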
This is where the magic lies. It's no good if you just go through the elements at the top and say "oh, this is bad" — it scored zero or it scored one. What you want to do is go through and say: what is the actionable feedback someone can take? And sophisticated writers will be able to pull this JSON and feed it into an LLM along with the draft and get very actionable feedback for how to improve — and use AI to actually power better writing, higher-quality writing, and come back. I think this is the beginning of a really interesting feedback loop to actually build an anti-slop machine at work.
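The feedback loop described here amounts to pairing the evaluation JSON with the original draft in a second prompt. A minimal sketch follows, assuming the evaluation is already a Python dictionary. The instruction wording and the name `build_revision_prompt` are hypothetical, and the call to an actual model is deliberately left out.

```python
import json

def build_revision_prompt(draft, evaluation):
    """Combine the evaluator's JSON and the draft into one revision request."""
    return "\n".join([
        "Revise the draft below so it addresses every piece of feedback "
        "in the evaluation.",
        "Keep what already scores well; change only what the feedback calls out.",
        "",
        "Evaluation (JSON):",
        json.dumps(evaluation, indent=2),
        "",
        "Draft:",
        draft,
    ])

# Illustrative evaluation with a single dimension.
example = {"dimensions": {"completeness": {"score": 1,
                                           "feedback": "List the edge cases."}}}
print(build_revision_prompt("Draft PRD text goes here.", example))
```

The revised draft can then go back through the same filter, which closes the loop.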
For each of these dimensions we have specific actionable feedback that we're specifying as an example. And then we have thresholds — you can be specific about those. You can say "I accept a three overall or more," or a 4.8, or whatever. But you have a score, you have a revise/accept/reject decision, and you have feedback. And really, isn't that all we need?
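The threshold step reduces to a small function over the per-dimension scores. In the sketch below the accept threshold of 3 comes from the example in the talk, while the reject cutoff of 2 and the use of a simple average are assumptions. `decide` is a hypothetical name.

```python
from statistics import mean

def decide(scores, accept_at=3.0, reject_below=2.0):
    """Map per-dimension scores (0-5) to an accept / revise / reject decision."""
    if not scores:
        raise ValueError("at least one dimension score is required")
    overall = mean(scores.values())
    if overall >= accept_at:
        return "accept"
    if overall < reject_below:
        return "reject"
    return "revise"

print(decide({"completeness": 4, "testability": 3, "scope_clarity": 2}))
```

Raising `accept_at` to 4.8, as mentioned above, tightens the gate without touching the prompt itself.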
We need someone who can say, "You've got to specify the Stripe API, because if you don't specify the version, engineering is going to have to come back." And it's not that hard — specify the most recent API version, just be clear about it. That is the kind of feedback that LLMs are very good at providing if properly prompted, but that humans are stuck with providing now because we haven't had prompts to fix it. That's why I built this.
Closing: Building an Anti-Slop System
Nate B Jones: So there you go. I think slop is a fixable problem. I don't think it can be fixed with one magic bullet — there's not one magic prompt for this. Nor do I think just instituting this as a filter is the only way to fix it. You have to write better too. But let's assume — because every organization I've met has this problem — let's assume you have an AI slop problem. Great. Safe assumption. How do you filter? That's what this is about. How do you make sure that you can use AI as a weapon in your favor so that you focus your attention where it matters and you scale useful feedback?
Walking through this prompt structure should be helpful for you to understand how we think about grading a piece of work in context. And you'll notice I don't need to know a ton about your organization to do this. I can just assess the gates and say, "Okay, well, I think that from a good best-practice PRD perspective, this is probably weaker." And the nice thing about a prompt is you can tweak it. You can say, "Okay, well, we're not an API business — maybe that's not what we care about, we care about front end." You can adjust that pretty quickly. You can have the same degree of specificity and it works.
So my challenge to you is this: assume you live in a world where 2% of human attention matters and you need to put it in the right place. Find the mechanisms in your business, find the mechanisms in your workflow, that enable you to put that 2% attention where it matters. This is one of those. This is one of those that helps you weed out AI slop.
This is so much more useful than the AI detector stuff, because the AI detector pretends that if you can detect AI — which they can't — then you will stop slop. But slop has always been a problem. We humans have produced poor-quality work. I've seen bad-quality PRDs for a long time before AI. This is really about raising the quality bar. We don't care how you wrote it. We care that you're accountable to raising the quality bar. And then this becomes a really useful feedback tool — for yourself, for your manager, for whoever — to improve the quality of the outputs.
I'm putting together a complete prompt pack for different business purposes, so you get a sense of how this works for marketing, for customer success, for sales, for product, for engineering. That's the goal — to give you a starter on a filter gate so you can put your attention where it matters and you stop drowning in the slop.