This is THE Way to Build Custom AI Agents in 2025—Full Demo + Prompt Tips | AI News & Strategy Daily | Nate B Jones Transcript

Polished transcript · AI News & Strategy Daily | Nate B Jones · 24 Sept 2025 · 22m · @maverick

Nate B Jones demonstrates Notion AI as a custom agent builder with live examples and prompting tips

A solo tutorial by Nate B Jones of AI News & Strategy Daily, walking through Notion AI's latest release as a practical agentic automation platform.

Summary

Nate B Jones argues that Notion AI's latest release is, in effect, a custom AI agent builder — a framing Notion itself has not used — because it combines databases, AI, and external connectors into a system capable of autonomous multi-step work. He presents eight prompting principles developed through hands-on testing, noting that the system is more prompt-dependent than ideal and that casual one-line prompts produce poor results. He then demonstrates three live use cases: a meeting-notes-to-PRD-to-backlog pipeline, an interview coaching scorecard, and a prompt evaluation harness — each built from a single structured prompt in a matter of minutes. His central argument is that Notion is enabling an "agent-powered work factory" accessible to individuals at roughly $20 a month, and that most users are underestimating what the platform can do.

Key Takeaways

  • Notion AI is being underframed by Notion itself. Jones argues the platform should be understood as a custom AI agent builder, not merely an AI assistant layered onto a wiki — because the combination of databases, page-level permissions, and external connectors enables genuinely autonomous multi-step workflows.
  • The system is heavily prompt-dependent. Casual or vague prompts produce weak results. Jones found that structured, strictly worded prompts — developed with the help of ChatGPT in thinking mode — were necessary to get reliable output, which is a meaningful usability limitation.
  • Eight prompting principles govern reliable Notion agent behavior. These include: scoping the agent to specific pages, defining what "done" looks like and requesting a receipt, thinking in tables rather than text, using explicit quality checks, preventing duplicates, maintaining a run log, writing in plain strict language, and prohibiting the model from inventing data it cannot find in the source material.
  • Tables-first is the right artifact format for agentic work. Jones argues that database rows are easier for agents to operate on, sort, fix, and update than narrative text — and that this represents a broader shift away from the traditional long-form document culture toward agent-readable structured artifacts.
  • The meeting-notes-to-PRD pipeline compresses days of product work into minutes. By feeding meeting transcript data into a structured Notion prompt, Jones generated a full PRD with acceptance tests, goals, problem statements, and team-specific to-do lists automatically — work he says previously took days as a product manager.
  • Notion's interview coach example shows multi-database systems can be spun up from a single prompt. The scorecard rubric covers clarity, impact, specificity, and structure, and can score practice interview answers from a transcript — illustrating how flexible the agent environment is beyond professional workflows.
  • A prompt evaluation harness inside Notion lets users track and score their own prompts. Jones demonstrates a database that logs prompt experiments, inputs, versioning, pass/fail results against a rubric, and run history — making Notion a viable prompt management tool for people who want to improve their AI outputs systematically.
  • MCP connectors are central to Notion's scaling strategy. Notion is building out Model Context Protocol server integrations with Google Drive, Gmail, Linear, and GitHub, and Jones expects a future release to allow businesses to add custom MCP connectors — making the platform progressively stickier as more data centralizes there.

Full Transcript

    What Notion AI Actually Released

    Nate B Jones: I'm here to tell you about the very easiest custom AI agent automation software out there today. Not a lot of people are talking about it as if that's what it is, but that's what it really is. We're going to talk about it. It's Notion AI. They just released it this week.

    The reason I'm calling it custom AI agents is because of the way Notion has been able to marry together databases and custom connectors to other tools that you use throughout your daily life, and also the power of AI. We're going to do this in a couple of sections in this video. Number one, I want to tell you what Notion released. Number two, I want to give you my live actual notes using it, including things that don't work well and things that do work well, tips for prompting, all that stuff. And number three, I want to actually show you what Notion can do with specific examples.

    We're going to get into Notion as an interview coach. We're going to get into Notion taking your meeting notes into a product requirement document into a backlog. We're going to get into Notion helping you with your prompts as a prompt evaluation harness. There's a lot of really cool stuff in here, and it underscores how flexible this tool is, which is why I'm calling it a custom AI agent builder — even though, spoiler alert, Notion did not call it that.

    So what's in the box? What did Notion claim that they released? What Notion called this is really an AI-powered agentic future. They talked about it as AI agents across your Notion portfolio rather than AI agents powering your whole workflow. And I think there's a really big difference there.

    What they want you to see is that Notion's AI agents can perform autonomous work across multiple steps. They claim up to 20 minutes. When I was testing it, I got five or ten minutes pretty easily. They are adding tools to make that more useful, so it's not just Notion — they're trying to add other connectors as well, including Google Drive, Gmail, Linear, the GitHub tool stack, and a bunch of others. It's like they're trying to add as many tools as they can.

    They also say that very shortly they're going to give you the ability to have customizable agents that act like teammates and take specific workflows for specific projects across departments. So imagine you want to always take a contract from sales and move that contract into technical requirements for your engineering team. They're trying to build custom agents to solve for that use case. And of course Notion benefits because you're pulling more of your data into Notion.

    This is a really interesting value proposition because when you hit their landing page, what they say is that Notion saves you money. You want to spend your money here because Notion saves you money on a bunch of other things. That has been one of their larger value props in the age of AI. I think it's going to resonate because everybody knows that you're not going to pay $100 here and $100 there and $200 the other place just for AI. You want a single home, and Notion is trying to be that home by making your data at home in Notion with AI. We shall see. But I want to show you some use cases that make it pretty tempting.

    One of the things that they have enabled AI to do that I think we don't really easily see other places is granular database row permissions. Notion now has page-level permissions for databases. You can actually have Notion AI make granular database controls and database changes per row. So, for example, if you are trying to do cold outreach to a contact for a B2B business, you can have Notion look at the response in Gmail, look at meeting notes in a transcript, and then come back and update a database row in Notion in your CRM. That helps you understand where that prospect is in their journey, and maybe you move them along in the sales pipeline. That's the kind of thing they're envisioning, and it does work well for that.

    They are also strongly advocating a universe of Model Context Protocol connectors. Remember how I've talked in the past about MCP as something Anthropic seeded into the ecosystem, which engineers have now picked up and used across all of AI? That is true at Notion as well. Notion is using MCP servers and bragging about it, and they are implying quick scale as a result. They want to add more MCP connectors. I would strongly expect a 3.1 or 3.2 release to allow businesses to add custom connectors with their own MCPs to pull in yet more data, because that's very much what Notion wants to do. If you centralize more data here, you'll be stickier. You'll use their AI. You'll pay. You'll stay.

    Live Testing Notes and Eight Prompting Principles

    Let's get into the actual experience I had. What did it actually look like? Did it actually work? Spoiler alert: it worked, but it is more prompt-dependent than I think you or I would want.

    I tried multiple ways of prompting the system. I tried the more casual, just-make-it approach — a one- or two-line pass where I said something like "just make a database for this." It did not work as well. The ability of the system to understand what I wanted seems to be somewhat dependent on really strongly typed prompting or really strongly structured prompting.

    I came up with eight prompting rules as I ran through these experiments that I want to share with you here. I think they are highly correlated to successful Notion prompting. And I'm going to go a little bit further — even if you're not in Notion, these are going to be useful prompting tips for a future where we are building digital artifacts with AI.

    I talked in a previous post about this idea that work is changing. We are going from a world of work where we handwrote our artifacts like docs, to a world where artifacts are more interactive — where we can produce something and interact with it like an applet, or ask people to contribute, or maybe automatic actions are taken. Notion is really at the forefront of this trend, especially with the idea that agents can take action against a page and update database rows as they go. But to do that, you have to apply these prompting principles.

    Principle number one: Be really clear about where you want this tool to work. You need to say "work only on this page and its subpages" or something like that, because you don't want Notion to be broadly scoped and making changes elsewhere. In many people's Notion wikis, that is equivalent to an unwanted code change. You don't want that. So specify where it works.

    Principle number two: Tell it what done looks like. Ask for a receipt at the end. It will listen. Say: "When finished, please add a line at the bottom of the page — either 'okay,' or if you're blocked, add 'blocked' and a reason why." This will let you know right away in the page text itself what it did and why. Getting receipts helps you be more auditable and track what actually happened. I've actually developed a prompt that shows audit logs of previous runs right on the page itself, which I think is important if you're starting to make serious changes.

    Principle number three: Think in terms of tables and databases rather than text. If you're creating things in databases in Notion, you're leaning into Notion's strengths. Tables are much easier to sort, operate against, review, fix later, and adjust, whereas raw text can be difficult to format and engage with. I tried both with Notion, and I really felt like a tables-first approach was much more useful for the kind of tasks I was doing.

    I think this is something that's going to change the way artifacts are formatted. I'm used to a world out of Amazon with product requirement documents that were narratives — yes, you have some tables, but you also have a lot of narrative about the customer experience. We are moving to a world that is video-heavy and that also has tables. It's a really interesting change for someone who came up before all of that happened, in the traditional six-pager era. But here we are. Make tables first, text second.

    Principle number four: Use quality checks. One of the things you will really thank yourself for is being explicit with the agent about the conditions under which it can mark a task accomplished or done. You can have it check the length of a particular piece of text: is it within 180 characters? You can have it check whether it includes every piece of data that's relevant for the task. So if you're writing a cover letter and you tell Notion to do that (which, by the way, it can), have it include the company and the role. That seems like a reasonable requirement. If you are writing a justification for why you should work somewhere, make sure it includes at least one number from your resume. You can give it the resume and it will do that.

    The specificity you can use around quality checks is something that people forget about, but it's a critical part of learning to work with agents. They need that degree of clarity in order to know they did a good job for you. Otherwise, they'll just guess, and they may hallucinate, or they may not do anything at all, or they may default to token efficiency and do less than expected. So use quality checks, and also be really clear about what the model should do if it doesn't have a required item. You can write: "If info is missing, please insert TK/confirm" — which is a traditional editorial notation. Or you can say "insert in brackets: please check this." This helps the model not just depend on vibes but actually get to a pass/fail mindset.
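
    Editor's sketch (not from the episode): the checks described above are simple enough to express as code, which can help when deciding how to word them in a prompt. The function name, the 180-character default, and the `CheckResult` type are illustrative assumptions. Notion's agent runs checks from the prompt text, not from code like this.

```python
import re
from dataclasses import dataclass, field

PLACEHOLDER = "TK/confirm"  # editorial marker for missing info, as quoted in the transcript


@dataclass
class CheckResult:
    passed: bool
    failures: list[str] = field(default_factory=list)


def run_quality_checks(text: str, required_terms: list[str], max_chars: int = 180) -> CheckResult:
    """Return pass/fail plus the reasons, so the verdict is never a guess."""
    failures = []
    if len(text) > max_chars:
        failures.append(f"too long: {len(text)} > {max_chars} characters")
    for term in required_terms:
        # Case-insensitive substring match; a naive stand-in for "includes company and role".
        if term.lower() not in text.lower():
            failures.append(f"missing required term: {term}")
    if not re.search(r"\d", text):
        failures.append("no number included")
    if PLACEHOLDER in text:
        failures.append("contains unresolved placeholder")
    return CheckResult(passed=not failures, failures=failures)


draft = "I cut onboarding time by 30% and want to bring that to the PM role at Acme."
result = run_quality_checks(draft, required_terms=["Acme", "PM"])
print(result.passed, result.failures)
```

    Returning the list of failures alongside the verdict mirrors the "blocked, plus a reason" receipt from principle two.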

    Principle number five: Don't create duplicates. If something similar already exists, you want the model to update it instead of creating a new copy and dirtying up your context window. We're starting to think of Notion, and really agentic tools generally, as context windows themselves. Even if you can't absorb all of it in the context window, Notion itself is becoming a place where you always have that in mind because of the way AI operates on it. And if it's a context window, it needs to be clean, which means you don't want duplicates.

    You can actually specify: "When you touch a page, if you updated it, please include a little table with a version number and last-run date to describe your edit and describe the last time you touched it." This enables you to see what happened and see how pages changed. It's one of the things that's going to be increasingly important as humans and agents work together in wiki-like environments.

    Principle number six: Create a run log. Think about each change you're making as if it's something that may need to be undone. One of the hallmarks of good agent architectures is the ability to hit undo. I appreciate that Notion has put a literal undo button in the chat interface — I haven't seen that a lot, and I think people are going to appreciate it. But if you print a tiny run log, it's going to help you go further. You can stick that into the prompt. It extends the idea of a page update note into a full run log that actually has links to what happened, warnings for when things go wrong, and so on. The more you invest on the validation and audit side, the more you can keep your context window happy.
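
    Editor's sketch (not from the episode): principles five and six together amount to "update in place, bump a version, and log every change." The minimal in-memory model below shows that behavior under stated assumptions. The `Row` and `Workspace` classes, the name-based matching rule, and the log fields are hypothetical and are not Notion's data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Row:
    name: str
    notes: str
    version: int = 1
    last_run: str = ""


@dataclass
class Workspace:
    rows: dict[str, Row] = field(default_factory=dict)
    run_log: list[dict] = field(default_factory=list)

    def upsert(self, name: str, notes: str) -> Row:
        """Update the matching row if one exists; otherwise create it. Always log."""
        now = datetime.now(timezone.utc).isoformat(timespec="seconds")
        key = name.strip().lower()  # normalise so near-identical names count as the same row
        if key in self.rows:
            row = self.rows[key]
            row.notes = notes
            row.version += 1
            action = "updated"
        else:
            row = Row(name=name, notes=notes)
            self.rows[key] = row
            action = "created"
        row.last_run = now
        warnings = ["notes are empty"] if not notes.strip() else []
        self.run_log.append({
            "time": now,
            "item": row.name,
            "action": action,
            "version": row.version,
            "warnings": warnings,
        })
        return row


ws = Workspace()
ws.upsert("Onboarding PRD", "first draft")
ws.upsert("onboarding prd ", "second draft")  # matches the existing row, so no duplicate
for entry in ws.run_log:
    print(entry["action"], entry["item"], "v" + str(entry["version"]))
```

    The run log keeps enough detail (what changed, which version, any warnings) to decide later whether a change needs to be undone.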

    And by the way, if you think this is overkill — this is just not that hard if you have the right prompt. I did not actually have to suffer that hard creating the pages I made, because I was able to work with ChatGPT in thinking mode to create the prompts. I'm going to go through some of those conversations, the prompts, what I got. You'll get the idea of how I was able to use these eight principles without too much blood, sweat, and tears on my part.

    Principle number seven: Write in really plain, strict language. Say "create six questions" instead of "create a few questions." Say "use one metric" instead of "use a metric." Wherever you can, avoid open-ended phrases that will encourage the model to hallucinate — unless you are happy with hallucination. As an example, "be inspiring" is not a helpful frame for a cover letter. Asking it for a specific metric, asking it to include the name and the company, asking it to include a specific reason from your resume — this ensures the agent stays consistent. Write as plainly and strictly as you can. ChatGPT is actually very helpful in this, working with its default language preferences.

    Principle number eight: Don't let it make things up. I know I said "unless you want hallucination," but really, wherever you can, you want to underline to the model: "If you cannot find a claim in the input data that I give you, please use 'check this' and do not mark it as done — come back to me." You don't want to get into a situation where you're generating dirty data and then the model is basing future actions on that dirty data.
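
    Editor's sketch (not from the episode): one narrow, mechanical version of "don't let it make things up" is to confirm that every number in a draft also appears in the source material, and to return "check this" instead of "done" when one does not. The function names and the regular expression are illustrative assumptions. Real claims are far broader than numbers, so treat this as a sketch of the pass/fail idea only.

```python
import re

# Matches integers, decimals, and percentages such as 3, 4.5, or 40%.
NUMBER = re.compile(r"\d+(?:\.\d+)?%?")


def unsupported_numbers(draft: str, source: str) -> list[str]:
    """Return numbers that appear in the draft but nowhere in the source."""
    source_numbers = set(NUMBER.findall(source))
    return [n for n in NUMBER.findall(draft) if n not in source_numbers]


def review_status(draft: str, source: str) -> tuple[str, list[str]]:
    """Mark the draft 'done' only if every number can be traced to the source."""
    missing = unsupported_numbers(draft, source)
    if missing:
        return "check this", missing
    return "done", []


resume = "Grew revenue 40% across 3 regions."
draft = "I grew revenue 45% in 3 regions."
print(review_status(draft, resume))
```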

    Sample Prompt Structure in Practice

    With that in mind, let's put it all together and look at a sample prompt in ChatGPT that helps us understand how Notion works, and then look at some Notion pages I was able to create with that kind of approach.

    Here we are. We have the role — "you are a Notion agent" — and you want to specify and limit the page and the subpages. You want to make it clear what done looks like. This is where I include showing receipts, where I include showing what "blocked" is and why. You also want to make sure you define the scope. I do not want it touching things that were created or edited a long time ago. Again, I'm trying to keep this context window as clean as possible. Please do not overwrite unless you are updating a newer version — so it's very precise about when overwrites happen and why.

    There's a table, and I give it the choice to create or not. I could have this table already present or not. If it's not there, I tell it to create it. I'm literally giving it the columns and showing it what I want and what the format is of each column. It's very specific: name, which is a title; notes, which are text; a version, which is a number; and so on.

    Then I get to the tasks the model should do. Find up to five items that need work (you can see we're starting to build a to-do list here). For each item: draft the content in the table fields and run the quality checks. The prompt says "see below" at that step, which tells the agent to look further down for the checks. If all the checks pass, set the status to "ready." If the checks fail, set the status to "needs fixed."

    Quality checks then include: length is within limits, company and role if relevant, at least one number, avoids banned words, and if info is missing, insert the placeholder. By the way, "avoids banned words" will help you. If your writing is going to be run through an AI-detection tool (there are some tools now that companies use that claim to detect AI wording), you can pretty easily work around them if you come up with a list of words that AI tends to use, like "delve," and make sure it doesn't use them. There are ways you can start to shape the writing style here.

    Then it gets into duplicates and versions, how you handle updates, what the requirement is — it's the last ten minutes in this case. It gives me a version number, and then finally: "Please add a row in the run log with the time, the items changed, and any warnings."

    As complicated as those eight principles sounded, I fit all of that into about a 20-line prompt, and it's relatively easy to run. Let's see what it looks like in practice on a few actual pages.
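
    Editor's sketch (not from the episode): the prompt structure described in this section can be assembled from a handful of parameters. The exact wording below is an illustrative reconstruction, not the prompt Jones used, and `build_agent_prompt` and its arguments are hypothetical. It leaves out the rule about not touching older content, because the transcript does not spell out that rule's wording.

```python
def build_agent_prompt(
    page: str,
    columns: dict[str, str],
    max_items: int = 5,
    max_chars: int = 180,
    banned_words: tuple[str, ...] = ("delve",),
) -> str:
    """Assemble a structured agent prompt covering scope, done, table, tasks, and checks."""
    column_lines = "\n".join(f"  - {name}: {kind}" for name, kind in columns.items())
    banned = ", ".join(banned_words)
    return f"""Role: You are a Notion agent.
Scope: Work only on the page "{page}" and its subpages. Do not edit anything else.
Done: When finished, add one line at the bottom of the page: "OK", or "BLOCKED: <reason>".
Table: If the table does not exist, create it with exactly these columns:
{column_lines}
Tasks:
  1. Find up to {max_items} items that need work.
  2. For each item, draft the content in the table fields.
  3. Run the quality checks below.
  4. If all checks pass, set Status to "ready". If any check fails, set Status to "needs fixed".
Quality checks:
  - Length is at most {max_chars} characters.
  - Includes the company and role, if relevant.
  - Includes at least one number.
  - Avoids these banned words: {banned}.
  - If info is missing, insert "TK/confirm". Do not invent data.
Duplicates: If a similar row already exists, update it and increase Version by 1. Do not create a copy.
Run log: Add a row to the run log with the time, the items changed, and any warnings."""


prompt = build_agent_prompt(
    page="Job Search",
    columns={"Name": "title", "Notes": "text", "Version": "number", "Status": "status"},
)
print(prompt)
```

    Keeping the counts and limits as parameters follows principle seven: every number in the prompt is explicit, never "a few."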

    Demo: Meeting Notes to PRD to Backlog

    All right, here we are. We are looking at the meeting notes to PRD backlog. I constructed this in just a couple of minutes. As long as you have the data, you can do that too.

    You might be wondering how I did this — it looks really complicated. There are multiple tables here. They scroll along. You can see that these tables have statuses that have now changed. You have PRDs, and if I click a PRD, I'm actually going to see a real page. Let me just click that. I can click it and look in and see where the PRD is. It gives me an acceptance test, a goal, a problem statement. It's actually writing the PRD as a table, which is really cool, because then it can do operations against individual components of that PRD.

    This is a great example of an artifact that is created to be agent-readable first and human-readable second. It's very easy for me to look and say, "What is the TL;DR of Notion agents' reliability?" and I can get a nice summary from Notion. If I go here and copy and paste this — let me just pull up the actual Notion — I can say "please give me a 20-word summary of this PRD," and it will come back and work on doing that as we chat. It's looking at it, thinking about it, and there you go. That's what's in the box.

    Then I can ask: "What is the highest-risk element of this build?" I can actually start to inquire into how it works, and so that's one of the powerful things here. You can start to actually ask it to exercise judgment, ask it to think through — it's talking about schema drift here, and you may or may not agree — but the point is you can have that conversation with it very easily.

    Now if we go down, it can actually go and automatically fill out to-dos associated with these PRDs. These are all associated with particular PRDs — to-dos for the data team, for the email team, the backend team — all automatically created. All I did to add to this was input some data from meetings. And you can actually even automate that, because Notion now has meeting notes that you can take by audio.

    I'll share the original prompt that I used for this, and you can see how you can make it your own. But it illustrates to me that it's increasingly possible to move from a world where you consider these artifacts as static to one where they are truly dynamic. You can actually evaluate how the overall projects break out in a matter of a couple of minutes rather than a couple of days or even longer. I remember when I was doing PRD work as a product person — what I'm showing you here would have taken days. It took about two minutes, and I think that's really compelling.

    Demo: Notion Interview Coach

    Let me show you another cool example. This is the Notion interview coach. It may not look like a lot, but it gives you a rubric and everything you need to actually run your own Notion interview scorecard.

    This is simulated data, but what you see here is an entire database where it can take a notes transcript — let's say you interview yourself and you're practicing your answers. It can take a notes transcript with questions, feedback, and interview responses, put it into a database like this, and run it against a rubric for clarity, impact, specificity, structure, whatever you want. It scores it and delivers an overall scorecard to you of how you did.
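
    Editor's sketch (not from the episode): the scorecard logic reduces to scoring each answer on the rubric dimensions and rolling those up. The four dimensions come from the demo. The 1-to-5 scale, the unweighted average, and the function names are assumptions made for illustration.

```python
from statistics import mean

RUBRIC = ("clarity", "impact", "specificity", "structure")


def overall_score(scores: dict[str, int]) -> float:
    """Average the rubric dimensions for one answer (assumed 1-5 scale)."""
    missing = [d for d in RUBRIC if d not in scores]
    if missing:
        raise ValueError(f"missing rubric scores: {missing}")
    for dimension in RUBRIC:
        if not 1 <= scores[dimension] <= 5:
            raise ValueError(f"{dimension} must be between 1 and 5")
    return round(float(mean(scores[d] for d in RUBRIC)), 2)


def scorecard(answers: dict[str, dict[str, int]]) -> dict[str, float]:
    """Map each interview question to its overall score, plus a session average."""
    per_question = {q: overall_score(s) for q, s in answers.items()}
    per_question["OVERALL"] = round(mean(per_question.values()), 2)
    return per_question


practice = {
    "Tell me about a launch you led": {"clarity": 4, "impact": 3, "specificity": 5, "structure": 4},
    "Describe a conflict with engineering": {"clarity": 3, "impact": 3, "specificity": 2, "structure": 4},
}
print(scorecard(practice))
```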

    I think that's really cool. It was relatively easy to spin up. There's a lot more we could do, but it shows you that you can actually build an entire system with multiple databases off of a single prompt and start to populate it with real data and get it going from there.

    Demo: Prompt Evaluation Harness

    Let me show you one more, and I think you'll find it gives an overall picture of what Notion can do. My goal here is not to give you the complete picture of Notion because I don't think I can do that. I want to give you a sense of how I think Notion undersold this. This is actually an agentic artifacts factory. There's a lot more to this, and I think that with proper prompting you can go a long way.

    So this is a prompt and eval harness. This is going to be more technical. You can look at particular experiments that were run, a date for those experiments, inputs, versioning essentially, and then results that were scored pass or fail based on a rubric. You can get into eval rules: what must the prompt include, did the prompt work, did it not work, a run log on updates.
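
    Editor's sketch (not from the episode): a prompt eval harness of this kind is essentially a table of experiments plus rules that yield pass or fail. The record fields below loosely follow the columns described in the demo. The `Experiment` class, the `evaluate` function, and the substring-based rule check are hypothetical simplifications.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Experiment:
    prompt_name: str
    version: int
    run_date: date
    prompt_text: str


def evaluate(exp: Experiment, must_include: list[str]) -> dict:
    """Score one prompt version against 'the prompt must include' rules."""
    text = exp.prompt_text.lower()
    missing = [rule for rule in must_include if rule.lower() not in text]
    return {
        "prompt": exp.prompt_name,
        "version": exp.version,
        "date": exp.run_date.isoformat(),
        "result": "fail" if missing else "pass",
        "missing": missing,
    }


rules = ["work only on", "when finished", "do not invent"]
v1 = Experiment("cover-letter", 1, date(2025, 9, 24), "Write a cover letter. Be inspiring.")
v2 = Experiment(
    "cover-letter", 2, date(2025, 9, 24),
    "Work only on this page. When finished, add OK or BLOCKED. Do not invent data.",
)
run_history = [evaluate(v1, rules), evaluate(v2, rules)]
for run in run_history:
    print(run["prompt"], "v" + str(run["version"]), run["result"], run["missing"])
```

    Storing the missing rules with each result makes the run history useful for improving the next version, not just for grading the current one.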

    At the end of the day, I think what you should be taking away from this is that you can do things as nerdy as self-improving your Notion prompts by using Notion AI. You can do things as detailed as getting into particular prompt structures for different tools — say your Perplexity prompts, your OpenAI prompts, your Claude prompts — and start to track them in a thoughtful way in a database.

    I've been told by a lot of people that they are looking for great prompt tooling. There are lots of answers to that question depending on your workflow. But one of the answers is Notion. One of the answers is actually building out a prompt database in Notion and starting to track and score how your prompts do. I'll share this prompt in the post.

    Now, if you're one of those people who says "I don't care about prompts" — that's fine. But you're probably going to be getting better results if you take your prompts this seriously and actually start to score them. And of course you can adjust them to score however you want. This is just a sample score, and you can see how the sample score works.

    The Bigger Picture: An Agent-Powered Work Factory

    This is one of those things that I think is getting slept on. I'm sharing about Notion because I think we need to get past an assumption that work is a series of individual things that we create with the help of AI, and move to an idea of an agent-powered work factory where agents are processing through these artifacts, often autonomously, and it's our job to prepare the environment and to shape the direction for these agents.

    That sounds super fancy and it sounds like a big-company thing, but Notion is making that possible for anybody. Notion is making that possible at a price of around $20 a month. It's really very affordable to have this kind of capability. I think that's really cool.

    I hope you've enjoyed this breakdown of Notion. I hope you see why I think it's really interesting. We are headed to a future where agents are powering artifacts. I hope these prompts that I'm going to share are helpful to you as well.


    Polished transcript of AI News & Strategy Daily | Nate B Jones. All views are those of the original speakers.
    Published by @maverick