Claude Code 2.0 Has Arrived (It’s Insane) | Simon Scrapes Transcript

Polished transcript · Simon Scrapes · 12 Mar 2026 · 15m · @martymcfly

Four Critical Updates That Transform Claude Code Capabilities

A technical walkthrough of four Claude Code updates released simultaneously: loops, scheduled tasks, Google Workspace integration, and skills 2.0 with built-in testing.

Summary

The host walks through four recently released Claude Code updates that fundamentally expand the platform's capabilities. The updates include /loop commands for recurring prompts within a session, scheduled tasks for long-term automation, a Google Workspace command-line interface that provides access to the entire Google ecosystem, and Skills 2.0 with built-in evaluation and testing frameworks. Each feature receives a practical demonstration showing implementation, use cases, and current limitations. The host emphasizes that these updates shift Claude Code from a tool requiring constant manual prompting to one capable of autonomous operation across short-term and long-term timeframes.

Key Takeaways

  • Loops enable short-term recurring automation within a single session by creating cron jobs that run prompts at specified intervals — like checking an inbox every 10 minutes or running content repurposing workflows daily — but they expire after 3 days and only work while the session remains active, making them ideal for project-based monitoring rather than permanent automation.
  • Scheduled tasks provide permanent automation that runs independently of active sessions, creating fresh instances at specified intervals (daily, weekly, hourly) to execute prompts with full access to project files and skills. They are currently limited to the desktop app and require the computer to remain on, but unlike loops they catch up on missed runs when the app is reopened.
  • Google Workspace CLI integration fills a critical gap by providing Claude Code with comprehensive access to the entire Google ecosystem (Drive, Docs, Sheets, Gmail, Calendar) through an open-source command-line interface with over 100 built-in recipes, producing properly formatted documents with headers and images rather than raw markdown requiring API calls.
  • Skills 2.0 introduces formal evaluation frameworks that replace manual trial-and-error iteration with automated testing against specific criteria, scoring skills on defined metrics (like adherence to reference files or use of persuasive techniques) across multiple runs, enabling data-driven skill improvement and A/B testing to optimize performance before production deployment.
Full Transcript

    Short: If you're using Claude Code, these four brand new updates completely change what you can build: loops, scheduled tasks, Google Workspace access, and built-in skill testing or Skills 2.0. Instead of watching four separate videos to figure these out, here's the shortcut. In the next 10 or so minutes, I'll show you exactly what each feature does, when you'd actually use it, and a quick demo of all four. Let's get straight into it.

    Loops: Short-Term Recurring Automation

    Short: Up until now, if you wanted Claude Code to check on something repeatedly, you had to keep coming into the interface, prompting it, come back, ask again, come back, ask again. But the new /loop feature changes that completely. It says here: "Run a prompt or slash command on a recurring interval."

    Loop lets you schedule recurring prompts inside your current session. You can say something like "/loop every 10 minutes check my inbox for important emails." When we fire that, Claude creates a cron job that fires automatically, and the prompt gets put into Claude Code every 10 minutes. It just runs. We don't need to touch it. You can see the cron create call being made there.

    You can literally write anything in natural language, and it will set up and use the skills that you want it to use too. So "/loop every day check my YouTube for new videos and then run my content repurposing skill." Inside my skills library, we've got the marketing content repurposing skill to create a newsletter and a tweet, and it's gone and created that loop too. You can see that it's put all the scheduled tasks in this scheduled tasks document.

    Or you can say something like "in one minute remind me to talk about one-off reminders." What that's going to do is go and set up a single occurrence reminder. And in a minute, it's going to give us that notification. After the longest minute of my life, we've got the reminder up here: "Claude Code reminder. Talk about one-off reminders."

    The point is reminders are one-off. Loops are recurring tasks. Both get created the same way under the hood using cron jobs. If we go to the Claude Code docs, we can see that there are three tools that it uses: cron create, cron list, and cron delete. They're exactly as they sound.

    Basically, a cron job is a command set to run at a given time. The name isn't an acronym; it's usually traced to chronos, the Greek word for time. Just for quick reference, it's done using cron expressions, where each field stands for a different time value: minute, hour, day of month, month, and day of week. That's it.
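
    To make the cron expression idea concrete, here is a minimal Python sketch. It is not from the video. It assumes the standard five-field layout, and the helper names describe_cron and minute_matches are made up for illustration. It only labels the fields and checks the simplest minute patterns.

```python
# Standard five-field cron layout: minute, hour, day of month, month, day of week.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression: str) -> dict[str, str]:
    """Split a five-field cron expression into named fields."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}: {expression!r}")
    return dict(zip(CRON_FIELDS, parts))


def minute_matches(field: str, minute: int) -> bool:
    """Check one minute value against a minute field.

    Handles only "*", "*/N" and a plain number. Real cron also
    supports lists and ranges, which this sketch leaves out.
    """
    if field == "*":
        return True
    if field.startswith("*/"):
        return minute % int(field[2:]) == 0
    return minute == int(field)


every_ten = describe_cron("*/10 * * * *")  # every 10 minutes
daily_nine = describe_cron("0 9 * * *")    # every day at 09:00

print(every_ten)
print([m for m in range(60) if minute_matches(every_ten["minute"], m)])
```

    Read this way, "every 10 minutes check my inbox" corresponds to a step value in the minute field, with every other field left as a wildcard.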

    Now let's talk about the limitations. Loops expire after 3 days. That is probably the biggest limitation so far, and that's a safety thing so you don't accidentally have 20 loops running forever. They also only live in your current session. If you close the terminal, they're completely gone. And they don't even catch up. If the session was closed when a loop was supposed to fire, it just disappears and never runs again.

    Think of loops as "help me right now on this project" — like watch my inbox for important emails, track changes across a sprint, and anything where you need Claude checking in for the next few hours or days. It's not long-term.

    Scheduled Tasks: Long-Term Automation

    Short: What if you do need something that runs every week or every morning? That's where scheduled tasks, the second update, come in. Loops are great for short bursts, but if you want Claude Code to do something every single morning or every Monday, for example, you need something more permanent. Many of you will think of this as workflows, much like you build in a tool like n8n.

    Scheduled tasks are exactly that. You set up a task with a prompt, choose the model, choose the schedule (daily, weekly, hourly, whatever), and it just runs. Every time it fires, it starts a fresh instance, unlike /loop, which sits in one instance. It reads your project files, runs through the skills it needs, and then runs the command. It stops the session once it's completed.

    Something I would use, for example, is "/schedule every day, check my YouTube for new videos, and then run my content repurposing skill to create a newsletter and tweet." I have to run this in the desktop app, either in Code using /schedule, or directly in CoWork by hitting new task under scheduled tasks.

    Here you can see we add a name, we add a description, we add a prompt — and here we don't need the slash schedule. We say "repurpose videos," "repurposes new YouTube videos," and then tell it what to do. We can obviously then add the daily frequency of 9:00 a.m., choose the model, select the folder to work in if we want, and hit save. That's going to create a scheduled task that's going to run every single day.
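
    As a rough picture of what that form captures, here is a small Python sketch. It is not the app's storage format: the ScheduledTask class, its field names, and the "example-model" label are all hypothetical, chosen only to mirror the fields filled in during the demo and to show how a daily 9:00 a.m. schedule resolves to the next run time.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import Optional


@dataclass
class ScheduledTask:
    """Hypothetical record mirroring the form fields shown in the demo."""
    name: str
    description: str
    prompt: str
    run_at: time                  # daily run time
    model: str                    # placeholder label, not a real model ID
    folder: Optional[str] = None  # working folder, if one was selected

    def next_run(self, now: datetime) -> datetime:
        """Return the next daily run strictly after `now`."""
        candidate = datetime.combine(now.date(), self.run_at)
        if candidate <= now:
            candidate += timedelta(days=1)
        return candidate


task = ScheduledTask(
    name="repurpose videos",
    description="repurposes new YouTube videos",
    prompt=("Check my YouTube for new videos, then run my content "
            "repurposing skill to create a newsletter and tweet."),
    run_at=time(9, 0),
    model="example-model",
)

print(task.next_run(datetime(2026, 3, 12, 10, 30)))
```

    Note the prompt carries no /schedule prefix, matching what the form expects.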

    If we go to customize, then to our skills, we can see what it's going to leverage. Because of the descriptions we've used in our skills, it'll use the YouTube tool to pull information from my YouTube, and then the marketing content repurposing skill, with all of its references to different copywriting techniques and its per-platform split.

    Now all of this sounds really good, right? But the key limitation right now is, as you've seen, that you can only use this inside the desktop app, inside the Claude Code or Claude CoWork interface, not in the terminal or in VS Code extensions. But knowing how fast Anthropic is shipping at the moment, I'd expect that to change very soon.

    The other thing: your computer has to be on and the app has to be open. But unlike loops, if you do miss a run, it does actually catch up and run the missed task when you reopen it. So it doesn't just disappear. So that's quite nice.
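
    The difference in catch-up behavior can be sketched in a few lines of Python. This is an illustration of the behavior described here, not Claude Code's implementation. The function names are hypothetical, and the sketch assumes that runs missed during downtime collapse into a single catch-up run at reopen.

```python
from datetime import datetime, timedelta

LOOP_LIFETIME = timedelta(days=3)  # expiry described in the video


def loop_fire_times(start, interval, session_closed_at):
    """Fire times for a session-bound loop.

    The loop stops at whichever comes first: the session closing
    or the three-day expiry. Nothing is replayed afterwards.
    """
    end = min(session_closed_at, start + LOOP_LIFETIME)
    times = []
    t = start + interval
    while t <= end:
        times.append(t)
        t += interval
    return times


def scheduled_task_runs(due_times, closed_from, reopened_at):
    """Run times for a scheduled task across one period of downtime.

    Runs due while the app was closed are not lost. Assumption:
    they collapse into one catch-up run at the moment of reopening.
    """
    runs = [t for t in due_times if t < closed_from or t >= reopened_at]
    missed = [t for t in due_times if closed_from <= t < reopened_at]
    if missed:
        runs.append(reopened_at)
    return sorted(runs)


start = datetime(2026, 3, 9, 9, 0)
# A daily loop left open for 10 days still stops after 3 days.
fires = loop_fire_times(start, timedelta(days=1), start + timedelta(days=10))

# A daily 09:00 task with the app closed from 10 Mar 18:00 to 12 Mar 12:00.
due = [datetime(2026, 3, day, 9, 0) for day in range(9, 14)]
runs = scheduled_task_runs(
    due, datetime(2026, 3, 10, 18, 0), datetime(2026, 3, 12, 12, 0)
)

print(len(fires))
for run in runs:
    print(run)
```

    In this example the loop fires three times and then expires, while the scheduled task replaces its two missed mornings with one run at the time the app is reopened.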

    Loops for right now, scheduled tasks for every single day. Between the two, Claude Code will now never stop working.

    Google Workspace Integration: Full Ecosystem Access

    Short: Claude Code can run on a schedule and check in on things automatically. But there's one area that's been a massive, massive gap in Claude Code skills up until now, and that is getting it to actually work with your Google Workspace.

    If you have ever tried to get Claude Code to create a Google Doc or manage your files through the built-in skills, you'll know this pain. For some reason, you can only manage emails and your calendar. But so many of us work in Google Workspace that it needed a critical update to interact with Google Drive. You can see the complaint here: Claude Code has no good MCP server for Google Drive, and nothing that works with the Claude Code desktop app. Rich here just wanted an MCP server that could list, read, write, and modify Google Docs directly, with minimal setup and without having to go through the APIs.

    Now, this one's not strictly a Claude Code update, but it's still really important. Good news: Google just released an open-source Workspace command-line interface or CLI that changes everything. It's one tool that gives Claude Code access to your entire Google ecosystem — thinking Drive, Gmail, Calendar, Docs, Sheets, Slides, absolutely everything. And it comes with over 100 built-in recipes that Google have already set up. It makes it super simple to do anything like create a document.

    I know this says "this is not an officially supported Google product," but this is just because it's in beta phase. The setup is super simple. You can either install it directly in your terminal or literally just take it directly to Claude Code and ask it to set up this Google command-line interface. It's going to guide you through the whole setup process.

    The way it creates documents is going to be completely different. If you've ever created documents through a tool like n8n, you'll have seen that the documents come out with raw markdown formatting and you actually have to make API calls to make it look good. This in contrast is running bash commands that talk directly to Google. So you get properly formatted docs with headers, images, links, everything. It's the full package that is being delivered here.
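
    For a sense of the manual formatting work being contrasted here, the Python sketch below converts simple markdown headings into Google Docs API batchUpdate-style request dictionaries. It is not what the Workspace CLI does internally and it makes no network calls. The function name is hypothetical, and the index math assumes ASCII text going into an empty document whose body starts at index 1.

```python
def markdown_to_docs_requests(markdown: str) -> list[dict]:
    """Turn simple markdown into Docs API batchUpdate-style requests.

    Only "#", "##" and "###" headings are handled; everything else is
    inserted as normal text. Offsets assume ASCII text and an empty
    document whose body starts at index 1.
    """
    plain_lines = []
    styles = []  # (start_index, end_index, named_style)
    index = 1
    for line in markdown.splitlines():
        hashes = len(line) - len(line.lstrip("#"))
        if 1 <= hashes <= 3 and line[hashes:hashes + 1] == " ":
            text = line[hashes + 1:]
            styles.append((index, index + len(text) + 1, f"HEADING_{hashes}"))
        else:
            text = line
        plain_lines.append(text)
        index += len(text) + 1  # +1 for the newline that ends the paragraph

    # One insert for the stripped text, then one style update per heading.
    requests = [{
        "insertText": {
            "location": {"index": 1},
            "text": "\n".join(plain_lines) + "\n",
        }
    }]
    for start, end, style in styles:
        requests.append({
            "updateParagraphStyle": {
                "range": {"startIndex": start, "endIndex": end},
                "paragraphStyle": {"namedStyleType": style},
                "fields": "namedStyleType",
            }
        })
    return requests


reqs = markdown_to_docs_requests("# Title\nBody text\n## Section")
for req in reqs:
    print(req)
```

    Even this toy version needs one extra request per heading, which is the overhead the host is describing when documents arrive as raw markdown.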

    Let's use our content repurposing system to produce a markdown formatted Google Doc now to show you. And there you go. You can see that we've got a properly formatted markdown document that's just been created with one install: the Google command-line interface.

    Skills 2.0: Built-In Testing and Evaluation

    Short: We've got loops, we've done scheduled tasks, and now we have Google Workspace access. But the update I'm most excited about is the one that makes every skill you build dramatically better, much, much quicker.

    Right, let's move on to Skills 2.0. If you've been building skills inside Claude Code so far, you would build a skill, you run it, the output's okay but not brilliant. So you tweak it, you run it again, and this continues until you get something passable. You're always iterating on it, and you don't really know what's working and what isn't.

    But Skills 2.0 is designed to fix that with built-in evaluation and testing. This is a much-needed update. Anthropic actually went and updated the skill creator skill — their own skill creator skill — to include proper evals. What that means in plain English is you can now automatically test your skills against specific criteria and get scored results back. Not just simple "did it work or not" — you get actual grades on things that matter to you.

    Here's how I'd recommend using it, because if you just say "run some tests," you'll get generic results that aren't very useful. The key is being specific about what you're actually trying to optimize the test for.

    Let's walk through a process. First, you're going to build your skill with a solid framework. You can actually use the skill creator skill, as it sounds, to do that. You're going to give it a clear name, a trigger description that's really descriptive, and define the goal of it. You're going to specify which tools or connectors it needs. You're going to list your reference files — your brand voice, your ICP — and actually connect those inside the skill.md description. Then you're going to lay out step by step the process you want it to follow. That is ultimately the skill.md.

    You're going to include things like where you want the human-in-the-loop checkpoints and where the output should actually be saved. That will get you to a good first version. But as you know, they're never finished on the first try. Most good skills go through many iterations before they really start to solve your problems. That's exactly what evals are designed for: to speed up that learning cycle.
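
    The checklist above can be sketched as a small Python template function. The name and description frontmatter follow the skill file format; everything else, including the function name, the body headings, and the example file paths, is a hypothetical layout rather than a required structure.

```python
def render_skill_md(name, description, goal, tools, references, steps,
                    checkpoints, output_path):
    """Render a skill.md-style file as a string.

    The name/description frontmatter follows the skill file format.
    The body section headings are this sketch's own choice.
    Keep the description free of colons so the frontmatter stays
    valid without quoting.
    """
    lines = ["---", f"name: {name}", f"description: {description}", "---", "",
             "## Goal", goal, "", "## Tools"]
    lines += [f"- {tool}" for tool in tools]
    lines += ["", "## Reference files"]
    lines += [f"- {ref}" for ref in references]
    lines += ["", "## Process"]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    lines += ["", "## Human-in-the-loop checkpoints"]
    lines += [f"- {check}" for check in checkpoints]
    lines += ["", "## Output", f"Save to {output_path}"]
    return "\n".join(lines) + "\n"


skill_text = render_skill_md(
    name="marketing-copywriting",
    description=("Writes landing page and email copy in my brand voice. "
                 "Use when asked for marketing copy."),
    goal="Produce persuasive copy that follows the persuasion toolkit.",
    tools=["web search"],
    references=["references/brand-voice.md", "references/icp.md",
                "references/persuasion-toolkit.md"],
    steps=["Read the reference files.", "Draft the copy.",
           "Check the draft against the persuasion toolkit."],
    checkpoints=["Ask for approval of the outline before drafting."],
    output_path="outputs/copy/",
)

print(skill_text)
```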

    Let's use my marketing copywriting skill to demo this. Instead of saying "run some tests on my copywriting skill," you say something like this: "Run a new test optimized for making sure my copy follows the persuasive techniques listed in my persuasion toolkit reference file."

    You can see in the marketing copywriting skill, we've got persuasion toolkit.md as a reference file. Then you spell out the criteria. So it might be: first, does it always use the reference file? Does it use curiosity and open loops, which are listed inside the persuasion toolkit? And how often is it using proof or founder-led stories, which is another thing listed in that toolkit? Basically, we're asking it to evaluate whether it's using the toolkit properly.

    Then we pick what to test it on: writing landing page copy for my Skool community. We're going to get it to do that five times and test it against those exact criteria. What it's going to do is go out and actually run the test using the skill and come back with a proper evaluation framework for how it performed.

    It's successfully loaded that meta skill creator skill, which is just a renamed and improved version of the skill creator skill from Anthropic, and it's starting to run the evaluation test now. You'll notice we're not trying to optimize for like six things at once, because that would be way too many moving parts. What we're doing is picking one to three things. We test, we improve, and then we move on to the next one.

    Let's have a look at what they come back with. By the way, the eval runs multiple variations in parallel using sub-agents. So it happens pretty quickly, and it's going to score each one against your criteria and give you a structured report — an HTML report that we can actually go through.

    You can see five agents launch: grade copywriting run 1, 2, 3, 4, and 5. It's going to come back with a really nice click-through where we can see the criteria and actually improve our skill.

    This is where the brilliance comes in, because the skill creator's evaluation spins this up into a web page where we can look at all of the landing page outputs. We have the prompt that was originally put in, and we can also test it with and without skills. We'll come to that in a moment.

    It's got the outputs, and we can flick between the different outputs and even go down to this formal grades section here. It's saying the copy hasn't used curiosity gaps, defined as at least two instances where a result or discovery is teased without immediately revealing it, creating an information gap the reader needs to close. It goes on to say the copy lacks genuine curiosity gaps that sustain across multiple sentences.

    What this means is, if we want curiosity gaps to actually be followed, then we need to improve the skill.md file that references that information in our persuasion toolkit, or place more emphasis on it. You can see that this is pretty poor: 50% on this run, six passed and six failed out of 12. But it gives us a really good idea, with actual examples, of how to improve our skills.
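
    The scoring idea can be sketched in Python as a pass rate per criterion across runs. This is not the skill creator's report format. The criterion names and the pass/fail values below are made-up illustration data, and the function names are hypothetical.

```python
def pass_rates(runs):
    """Return the share of runs that passed each criterion.

    `runs` is a list of dicts mapping criterion name -> bool.
    """
    totals = {}
    for run in runs:
        for criterion, passed in run.items():
            hit, seen = totals.get(criterion, (0, 0))
            totals[criterion] = (hit + int(passed), seen + 1)
    return {c: hit / seen for c, (hit, seen) in totals.items()}


def weakest_first(rates):
    """Sort criteria so the ones that fail most often come first."""
    return sorted(rates.items(), key=lambda item: item[1])


# Made-up results for five runs of a copywriting skill.
founder_story_by_run = [True, False, True, False, True]
runs = [
    {
        "uses_reference_file": True,
        "concrete_pain": True,
        "founder_story": has_story,
        "curiosity_gaps": False,
    }
    for has_story in founder_story_by_run
]

rates = pass_rates(runs)
overall = sum(sum(run.values()) for run in runs) / sum(len(run) for run in runs)

for criterion, rate in weakest_first(rates):
    print(f"{criterion}: {rate:.0%}")
print(f"overall: {overall:.0%}")
```

    Sorting weakest first points at the one to three criteria worth fixing next, which matches the advice to avoid optimizing for everything at once.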

    This is a skill that I whacked up quickly yesterday, and you can see that it needs improvement if those curiosity gaps and open loops are really important for my copywriting.

    We can then also go to the benchmark. We can see how long each run took and how many tokens each run used, with the plus or minus. We've run five here with the skill and none without the skill to compare against. For each run, we can see the evaluation breakdown of what passed and what failed.

    Thankfully, it's also assessed it against other criteria. These were always passed. So in five out of five times, it makes the pain concrete, it digs to the emotional benefit. But sometimes we don't have that founder story section. So maybe we're not giving it enough founder context and story context in our initial brand context, or maybe it's just not inferring it correctly.

    What we would do is actually provide specific feedback for each of the outputs here and then copy that back into Claude Code like "more founder-led stories." Obviously that's pretty shallow feedback right now. We take that back to Claude Code and we would say that, and it would now start to evaluate and improve the skill and tell us what has changed from that skill.

    It says it's "applying the fixes now with extra emphasis on founder stories, then rerunning. Let me edit the skill.md." So it's going to go and edit the skill.md file, based on the information it's found so far.

    Now, what we saw was that we can A/B test things with or without the skill. Let's split the terminal here. Let's reopen Claude, and we can run a simple A/B test where we want to check, for example: is the skill actually improving the output or not? Can we create a leaner version of the skill? Can we strip out certain reference files that aren't needed? And we effectively get a side-by-side comparison of the results of one versus the results of the other.

    But it's also important here to specify the criteria that we're marking against. We want to know, you know, which one takes longer, which one uses fewer tokens, and which one ticks the most criteria in the persuasion toolkit or in a classic persuasion framework.

    We would effectively ask it something like "create landing page copy as an A/B test with and without the copywriting skill." That would run the same set of evaluations. We give it a bunch of criteria to mark against, and we'd then be able to see: is the skill actually adding to the quality or reducing or taking away from the quality and just costing us tokens?
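
    A side-by-side comparison like this can be sketched in Python as follows. The numbers are invented for illustration, and RunResult, summarize_variant, and compare are hypothetical names rather than anything the eval tooling exposes.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunResult:
    passed: int     # criteria passed in this run
    total: int      # criteria checked in this run
    tokens: int     # tokens used by this run
    seconds: float  # wall-clock duration of this run


def summarize_variant(runs):
    """Aggregate pass rate, average tokens and average duration."""
    return {
        "pass_rate": sum(r.passed for r in runs) / sum(r.total for r in runs),
        "avg_tokens": mean(r.tokens for r in runs),
        "avg_seconds": mean(r.seconds for r in runs),
    }


def compare(with_skill, without_skill):
    """Difference per metric: with-skill value minus without-skill value."""
    a = summarize_variant(with_skill)
    b = summarize_variant(without_skill)
    return {key: a[key] - b[key] for key in a}


# Invented numbers: the skill passes more criteria but costs more.
with_skill = [RunResult(9, 12, 42000, 95.0), RunResult(10, 12, 40000, 90.0)]
without_skill = [RunResult(6, 12, 15000, 40.0), RunResult(5, 12, 17000, 44.0)]

delta = compare(with_skill, without_skill)
print(delta)
```

    A positive pass_rate difference alongside positive token and time differences is the trade-off in question: the skill adds quality, and the remaining decision is whether that gain is worth the extra cost.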

    Ultimately this is about improving our skills more quickly than by running them in production and trying to work out what is and isn't working. The eval function is going to do that for us and let us learn in quicker loops. For example, does the marketing copywriting skill need these three reference files, or can it work just as well with one of them?

    You can see it's now rewriting that copywriting skill so that we can actually have an improved result, and it's running the evaluations again so we can see the result of the new test, which is super powerful.

    Basically stop guessing whether your skills actually work. Test them with the Anthropic skill creator skill. Score them, improve them, and that's how you go from a skill that kind of works to one that's going to nail it most of the time — or 9 out of 10. When you've got that 9 out of 10, you can stop iterating on it.

    Conclusion and System Integration

    Short: There you have it: four updates that genuinely change what's possible with Claude Code. Loops for short-term automation, scheduled tasks for daily and weekly routines, Google Workspace for further access to your Google ecosystem, and Skills 2.0 for building skills that actually are going to get better over time.


    Polished transcript of Simon Scrapes. All views are those of the original speakers.
    Published by @martymcfly