
How I Improved AI Output Quality 10X With One Prompting Shift | AI News & Strategy Daily | Nate B Jones Transcript

Polished transcript · AI News & Strategy Daily | Nate B Jones · 13 Nov 2025 · 12m · @maverick

Nate B Jones explains "Goldilocks prompting" — finding the optimal level of detail in AI prompts

A solo presentation by Nate B Jones on how to calibrate AI prompt length and specificity for better output quality.

Summary

Nate B Jones introduces the concept of "Goldilocks prompting" — the idea that prompts can be too long and over-specified, too short and vague, or just right. He argues that roughly 80% of AI prompting tasks benefit from a moderate level of detail that gives the model enough context and direction without exhaustively dictating every element, while the remaining 20% of tasks genuinely warrant highly specific, detailed prompts. To illustrate the difference, he demonstrates a Thanksgiving family newsletter built with a vague prompt versus one built with stacked Goldilocks-style context snippets, showing a marked improvement in layout, color, and readability. He also shows how the same principle applies to software engineering prompts, where guiding an LLM away from over-engineered patterns toward pragmatic solutions produces better results.

Key Takeaways

  • Prompting has an optimal altitude. Over-specifying a prompt burns more tokens, risks hitting memory limits, and suppresses the model's creative problem-solving — while under-specifying leaves the model making assumptions that may not match your needs.
  • The 80/20 rule for prompt length. About 80% of tasks benefit from a concise, well-framed Goldilocks prompt that allows the model some creative latitude. Only around 20% of tasks genuinely require exhaustive, step-by-step specification.
  • Stacking short context "slugs" is a powerful technique. Rather than writing one long monolithic prompt, Jones demonstrates combining several short, focused prompts — covering layout, color, and fonts separately — which together produce significantly better output than a single vague or over-detailed prompt.
  • Token limits as a self-discipline tool. Jones recommends setting a personal token ceiling (he targets under 500 tokens) when writing Goldilocks prompts, as a practical way to stay at the right level of abstraction and avoid over-engineering the prompt.
  • The principle transfers across domains. The same Goldilocks approach that improves a newsletter design also improves software architecture prompts — for example, steering an LLM away from defaulting to microservices or premature abstraction toward simpler, more pragmatic solutions.
  • Goldilocks prompting is a learnable and shareable skill. Jones frames these prompts as reusable, modifiable templates — tools that can be copied into any LLM (Claude, ChatGPT, Gemini, Grok, Qwen) and adapted, giving practitioners a reliable, flexible toolkit rather than brittle one-off instructions.
Full Transcript

    What Goldilocks Prompting Is and Why It Matters

    Nate B Jones: We're going to talk about Goldilocks prompting. Goldilocks prompting is the idea that you can prompt too much and you can prompt too little. I know that might sound funny to some of you, because I'm the guy who does the prompts and people think I'm known for these long prompts. I am here to help you prompt more effectively. And I want to remind you that there is an optimal level of clarity for the goals that you set out to accomplish with the model. You can be over-clear, you can be over-long. So we're going to talk about Goldilocks prompting and why it makes such a difference.

    I'm actually going to show you an example of the incredible improvement in model output quality that you can get when you use Goldilocks prompting.

    What is Goldilocks prompting? Very simply, it is giving the model enough context so it doesn't assume stuff about you that isn't true, and enough about the problem that it knows the direction to go in. It also includes giving it clarity on what tools it can or should use. It is not exhaustively listing every single thing you want done.

    So if you're making a PowerPoint, for example, you could say: "I want you to make the font exactly this. I want you to make every single slide exactly in this way, with this headline size, with this layout, with this particular bullet style, with each of these individual bullets in exactly this text. Here's the pie chart that goes on slide seven." You get the idea. Or you could say, "Make me a PowerPoint. Make it good. The board's going to be looking at it." We would probably not do that one — but we might be tempted to go to the other extreme and make it hyper-specific.

    The Trade-Off Between Specificity and Creativity

    One of the things I've been thinking about a lot as I've wrestled with prompting and context engineering over the past couple of months (really longer than that) is that there is an optimal level of detail, and there's a trade-off involved. If you want to give the model as much clarity as I described, where you're specifying every minute detail, models will go there, especially the newer ones. That level of detail increases the token burn, so you're more likely to run into memory issues. It also reduces creativity, because you're not engaging the creative circuits, for lack of a better term, of your model.

    So it's a trade-off. You have to decide: do you want to be so specific that the model burns through a lot of context and follows your exact design — and maybe that's what you want, or maybe it isn't? Or do you want to back off and give a more general-purpose ask that allows the model to be a little more creative and might be more token-efficient?

    In my experience, about 20% of the time you do want that level of specificity. You're saying, "This is going to be a lot, but I need it to be exactly like this. Don't mess it up." And about 80% of the time, you want to prompt at the right altitude. You want a Goldilocks prompt.

    That is actually tougher than it looks. It's really, really tough to prompt at the right altitude, because it's so tempting to either over-describe or under-describe. I want to give you some tools to help with that.

    Anthropic's Visual Example of Good, Bad, and Ugly Prompts

    One of the things I can do to help is give you a visual example of what good looks like in terms of a prompt. This is directly from Anthropic — I didn't make this one up. They put it out publicly on their context engineering blog, and I thought it was really helpful.

    Here we are. This is the system prompt example that Claude is showing us for good, bad, and ugly. So good is here, bad is here, and ugly is here.

    If you have the right level of detail, Claude understands the role it has. It understands the tools it can call. It understands how it can respond, and it understands the guidelines. And that's it. This is all for a made-up example called Claude's Bakery — a nice, simple illustration.

    This one is really bad. It's a very short prompt and it doesn't give Claude anything it can use to actually be effective in its role. There's no shared context that Claude can invoke.

    And this one — I'm going to call it ugly — because it's so specific. Here's an exhaustive list of cases. Here's the user intent. This prompt is trying to do everything. This prompt might actually be six or eight prompts in a trench coat, and it just keeps going. It doesn't want to stop.

    So that's a visual example of how much of a difference it makes to have a prompt that works well.
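The "good" example described above has four parts: the role, the tools the model can call, how it should respond, and the guidelines. A minimal sketch of that skeleton follows. The `SystemPrompt` class, its field names, and all of the bakery details are hypothetical illustrations, not the contents of Anthropic's actual example.

```python
from dataclasses import dataclass, field


@dataclass
class SystemPrompt:
    """Four-part prompt skeleton: role, tools, response format, guidelines."""

    role: str
    tools: list[str] = field(default_factory=list)
    response_format: str = ""
    guidelines: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Each non-empty part becomes one short section; nothing else is added.
        sections = [f"Role: {self.role}"]
        if self.tools:
            sections.append("Tools:\n" + "\n".join(f"- {t}" for t in self.tools))
        if self.response_format:
            sections.append(f"Response format: {self.response_format}")
        if self.guidelines:
            sections.append(
                "Guidelines:\n" + "\n".join(f"- {g}" for g in self.guidelines)
            )
        return "\n\n".join(sections)


# Hypothetical content, invented for illustration only.
bakery_prompt = SystemPrompt(
    role="You are a customer-service assistant for a small bakery.",
    tools=["lookup_order(order_id)", "check_stock(item)"],
    response_format="Short, friendly replies in plain text.",
    guidelines=["Escalate refund requests to a human.", "Never guess at allergens."],
)
print(bakery_prompt.render())
```

Keeping each part to a line or two is what keeps this at the "good" altitude. Growing any one section into an exhaustive case list pushes it toward the "ugly" version.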

    Setting a Token Limit as a Prompting Discipline

    I am finding that one of the things I can do to make prompting easier — when you're trying to prompt at the right altitude — is to set myself a token limit. I set myself a rough number of tokens that I want to stay under, in order to ensure that I think at the right altitude for the prompt.

    Now, again, this is for the 80% of cases where you want to allow some creativity. There will be those 20% prompts — and I've written them — that are super long and very detailed, and we can go there. They sometimes consume more model resources. They can be very precise. They're a tool in the toolbox.

    These 80% prompts can be shorter, easier to understand, easier to iterate on. That's why I call them Goldilocks prompts — they feel the right size for a lot of things. I tend to keep these under 500 tokens.
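One way to apply the token ceiling in practice is a quick budget check before sending a prompt. The sketch below assumes roughly four characters per token. That ratio is a common rule of thumb for English text, not an exact count, and real tokenizers vary by model. The function names and the default budget of 500 are illustrative.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; assumes ~4 characters per token (varies by model)."""
    return math.ceil(len(text) / chars_per_token)


def within_budget(prompt: str, budget: int = 500) -> bool:
    """Return True if the estimated token count stays at or under the budget."""
    return estimate_tokens(prompt) <= budget


draft = "Use a clean single-column layout with generous spacing. Avoid clutter."
print(estimate_tokens(draft), within_budget(draft))
```

For an exact count, swap the heuristic for the tokenizer that matches the model you are using.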

    Live Demo: Vanilla Prompt vs. Goldilocks Prompt

    I want to show you how much of a difference it makes. We're just going to have fun with this. I'm going to show you a vanilla prompt where I just say "make it" — super vague, no context — and I'm going to say "make a Thanksgiving newsletter." You're going to see what the model does. And then I'm going to show you the difference it makes when I add an extra set of Goldilocks prompts so that Claude knows what it's doing.

    Okay, here we are. That prompt on the left: "Can you create a family newsletter?" That's all I gave the model. And this is what Claude comes back with. It's super basic. You can see it has a few visual elements, some annoying orange highlights, and a spot to add family photos — but of course that's not clickable or usable. The copy is pretty generic. It is a family newsletter I would not want to send to my family.

    But what if we add a more effective prompt?

    Here we are looking at the exact same prompt, except I've added some Goldilocks prompting. In fact, this is a slightly advanced technique — I have stacked up some Goldilocks prompts. The advantage of having several shorter ones is that you can be more targeted and effective. I have a layout prompt here that focuses on non-annoying layouts and specifies those. I have a color prompt that talks about the kinds of colors that would convey trust or feel modern. I also have a font prompt. All of these are important for getting the output into better shape.

    If we move over here, we see the font has been chosen carefully. We see the impact of the layout. We see the colors are chosen much more carefully, and the overall impression is readable. I'm not going to say this is the most beautiful family newsletter you've ever seen, but I have seen much worse formatted family newsletters. It even includes a handy sidebar with a quote that's not horrific, and a nice little footer.
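The stacking technique from this demo can be sketched as a small helper that appends short, focused snippets to a base task. The snippet wording below is hypothetical and is not the speaker's actual prompts. Only the three topics (layout, color, fonts) and the base request come from the demo.

```python
# Hypothetical snippet text; only the three topics come from the demo.
SNIPPETS = {
    "layout": "Use a clean layout with generous spacing. Avoid clutter.",
    "color": "Choose a restrained palette that feels modern and conveys trust.",
    "fonts": "Pick one readable body font and one distinctive heading font.",
}


def stack_prompt(task: str, snippet_names: list[str],
                 snippets: dict[str, str] = SNIPPETS) -> str:
    """Join a base task with the selected short context snippets."""
    parts = [task]
    for name in snippet_names:
        # Label each snippet so it stays easy to swap or drop later.
        parts.append(f"[{name}]\n{snippets[name]}")
    return "\n\n".join(parts)


prompt = stack_prompt("Can you create a family newsletter?",
                      ["layout", "color", "fonts"])
print(prompt)
```

Because each snippet is independent, one can be revised or replaced without touching the others, which is harder to do with a single monolithic prompt.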

    The Same Prompt Tested on ChatGPT

    So what's the point of showing newsletters about Thanksgiving when you're trying to learn prompting? You want to start to get a sense of whether these prompts make a difference. What I'm trying to convey is that adding these slugs — these context snippets — can help Claude or ChatGPT know the difference and actually build a better newsletter.

    I did try this on ChatGPT as well. ChatGPT was also able to code up a nice-looking newsletter. You can decide whether it's more or less nice (it takes a more font-heavy approach), but it followed the Goldilocks prompt, and it's useful.

    Here it is. This is ChatGPT's effort to respond to the exact same prompt. You'll notice less of a visual element, but there's a clear investment in the fonts — some really fun fonts here. You see the use of the layout piece, the ability to bring pop-outs in. It's not perfect; some of the layout work isn't as strong. But I found it very readable and easy to understand, and it was certainly better than the vanilla version.

    Sometimes people see my demos and say, "Nate, I could make a better newsletter. Why are you trying to show this?" And the answer is: you probably could make a better newsletter. The point is to give you tools to do that effectively. Go make a better one. My goal here is to show you that you can take these slugs of context and use them to make useful work.

    Applying Goldilocks Prompting to Software Engineering

    You can do this same thing with business writing, with documentation standards, with engineering standards. Basically, you can take anything that you need Claude, ChatGPT, or Gemini to have an opinion on — at the right altitude — and apply this set of principles.

    Here we have Claude working on a specific skill. And yes, if you're wondering whether these can be skills, they can be skills. It's a bit difficult to read, but essentially all you're doing is telling the LLM what really matters. System design should solve real problems, not patterns. Avoid premature abstraction. Never use microservices as a default. Don't use a repository pattern before you have multiple data sources. And so on.

    In other words, we are taking things that might be tempting for LLMs to do — because they converge toward commonly seen patterns on the web — and we're saying, "Not on my watch." We're giving the LLM examples of pragmatic architectural choices that could be better: maybe it's a small enough codebase that you should use a monolith; maybe you should just build the straightforward solution and add patterns only when you feel pain.
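The engineering guidance above can be written down as a short list of "avoid this, prefer that" pairs and rendered into a compact prompt. This is a sketch only. The `Guideline` type is hypothetical, the rules are paraphrased from the talk, and the "direct data access" alternative is an assumption added for illustration.

```python
from typing import NamedTuple


class Guideline(NamedTuple):
    avoid: str
    prefer: str


# Rules paraphrased from the talk; "direct data access" is an assumed alternative.
GUIDELINES = [
    Guideline("microservices as a default",
              "a monolith while the codebase is small"),
    Guideline("a repository pattern before there are multiple data sources",
              "direct data access"),
    Guideline("premature abstraction",
              "the straightforward solution, adding patterns only when you feel pain"),
]


def render_guidelines(guidelines: list[Guideline]) -> str:
    """Render a one-line principle followed by one bullet per guideline."""
    header = "System design should solve real problems, not patterns."
    lines = [f"- Avoid {g.avoid}; prefer {g.prefer}." for g in guidelines]
    return "\n".join([header, *lines])


print(render_guidelines(GUIDELINES))
```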

    Goldilocks Prompting as a Learnable, Shareable Skill

    What I'm showing you deliberately covers both code and design. I'm trying to show you the span of the tool I'm giving you. And yes, I'm putting all of these up as both skills and prompts. They're small enough that you can literally copy and paste them as prompts into any LLM — into Qwen, into Grok, into ChatGPT, into Gemini. But they're super powerful because they focus you on the right level of abstraction. They give you a feel in your fingertips for what the right level of abstraction is. That's what I mean by Goldilocks prompting.

    You should start to get a feel for what is the right-sized prompt for that 80% of use cases where you want to allow the LLM some creativity and judgment to solve the problem. I think that space is underprompted. We often just tell people to use their best judgment. I want to give you a sense that Goldilocks prompting is a learnable skill.

    It's also a shareable skill. I'm going to share a bunch of these prompts with you so you can start to build them and modify them. If you don't like my examples, use different examples. But if you keep to roughly the same length and use a similar structure, you're going to end up with guidelines, useful context — a scaffold, if you will — for large language models to use, but not so much that it becomes a brittle prompt that often fails.

    That's what I want you to have. I want you to have a toolkit that feels like a well-worn chisel in the woodshed and a well-worn hammer — something you can use every day for a wide variety of tasks and not get lost in.

    So there you go. That is my plea to start thinking in terms of the altitude you're at. Think in terms of Goldilocks prompting. Ask yourself: am I prompting at the right level for the task I'm asking for? Good luck with Goldilocks prompting.


    Polished transcript of AI News & Strategy Daily | Nate B Jones. All views are those of the original speakers.