Podcast transcripts, polished for reading

NEW ChatGPT 5.2 Complete Breakdown: Tested on Excel, PowerPoint, Massive Data Sets, and More | AI News & Strategy Daily | Nate B Jones Transcript

Polished transcript · AI News & Strategy Daily | Nate B Jones · 12 Dec 2025 · 14m · @maverick

Nate B Jones breaks down ChatGPT 5.2's agentic capabilities and what they mean for how we work

A solo analysis of ChatGPT 5.2 by Nate B Jones of AI News & Strategy Daily, covering its agentic capabilities, comparisons with competing models, and the new skills required to use it effectively.

Summary

Nate B Jones argues that ChatGPT 5.2 is not the incremental upgrade it has been positioned as, but rather a meaningful leap forward — primarily because it is agentic by default and capable of running complex, long-duration tasks against massive datasets. He demonstrates that the model can ingest 10,000-row datasets, produce PowerPoints, Word documents, and Excel spreadsheets, and do so with significantly fewer hallucinations than previous versions. He compares it directly against Gemini 3 and Claude Opus 4.5, finding that while all three have strong reasoning, ChatGPT 5.2's ergonomics and data ingestion capacity give it a practical edge. His central argument is that the defining skill for 2026 is no longer execution with AI models but delegation to them — and that most people and teams are not yet ready for that shift.

Key Takeaways

  • ChatGPT 5.2 is agentic by default, meaning it can run for 20–40 minutes on a task without interruption, processing large and varied datasets and returning finished outputs like PowerPoints, Word docs, and Excel spreadsheets — a capability that sets it apart from earlier models.
  • The model reportedly reduces hallucinations by approximately 38%, and Nate argues this improvement is visible in practice: outputs are more coherent, more narratively structured, and more trustworthy than those from 5.1 or 5.0.
  • Gemini 3 has a significant ergonomics problem — despite strong underlying intelligence, it cannot accept PowerPoint, Excel, or CSV uploads in its consumer-facing interfaces, making it impractical for the kind of complex, data-heavy work that ChatGPT 5.2 handles easily.
  • Claude Opus 4.5 is a genuine competitor with solid ergonomics and good artifact output, but it uses tools rather than reasoning to accomplish long-running tasks — a fundamentally different architecture. Nate gives Opus a slight aesthetic edge on PowerPoint design but rates ChatGPT 5.2 higher overall due to its data capacity.
  • The critical new skill is delegation, not execution — knowing how to frame a problem, specify the desired output, explain the input data, and hand the task off to a long-running agent. This is described as the defining professional skill for 2026.
  • Problem framing now has higher stakes — because the model may run for 30–40 minutes on a task, a poorly scoped prompt wastes significant time. Getting the directions right upfront is now more important than ever.
  • ChatGPT 5.2 thinking mode is distinct from instant mode and should be understood as a different tool — closer to a broad Swiss Army knife than the focused scalpel of Deep Research, offering more control over output type and analysis style.
  • Narrative emerges as a property of coherence — when a model hallucinates less and processes data more accurately, it becomes capable of identifying and articulating a story within data that the user may not have seen themselves, which Nate identifies as a practically valuable emergent capability.

Full Transcript

    ChatGPT 5.2 as a signal of where AI is headed in 2026

    Nate B Jones: ChatGPT 5.2 has time-traveled back to see us here. I am convinced that this is a model that shows us what the future looks like for 2026. It's not an incremental upgrade. I know it's positioned that way, but it's actually got some capabilities that I haven't seen in other models, and I want to lay out what they are so you can figure out for yourself whether the model is right for you.

    First and foremost, this model is agentic by default. If you think about models on a range of how long they can run and execute tasks, this is the first generally available model where it's very easy to get it to do a tremendous amount of work on a huge bucket of inputs — like a dataset with thousands of rows. I tried it with a dataset with 10,000 rows. It can do all of that: compute against it, develop insights, come back with a PowerPoint, come back with a doc, come back with an Excel spreadsheet — and it actually works. That means it's accurate. It's coherent. It's cogent. It's thoughtful. It's able to craft an executive narrative. The PowerPoint is not nearly as problematic as it was in 5.1 and 5.0. The PowerPoint artifacts actually work now. It's wonderful.

    The new skill: learning to delegate to a long-running agent

    But this creates a skill problem for us, doesn't it? What we have to figure out is how do we define work that is ready to be delegated for that period of time. That's a new skill for a lot of us. For many of us, we have been trying to figure out how to make these models help us do our work faster all year long, and that's been most of the conversation. The models keep getting better and we have to keep scaling up. In this situation, the skill we need to learn — whether we're technical or non-technical — is how do we define a piece of work correctly so that we can assign it to a long-running agent. That is what feels like 2026 about ChatGPT 5.2. That's what feels novel, new, and super interesting.

    If you can't define that work, you are going to be behind people who can define it well and come out with a fully-fledged analysis from a deep dataset or a deep problem in the code — and then get an answer they can use and run with, one that would normally have taken them hours. When I take — and I'm not kidding — 20, 30, 40 minutes on a ChatGPT 5.2 task, which I did today, it's really good. It's better than work that would have taken me four or five hours to do. So it's not just about whether it can save me 20 minutes. It's understanding that the model can do in 20 minutes what would have taken someone six or eight hours to do — and knowing how to understand that block of work and give it to the model.

    Now, you might think: if it can do six or eight hours of work, can it just do my job? The answer is no. It needs clear scope. When I talk about the skill to delegate to the model, the first thing is being able to define what output you want — a scoped output that matters. If you want a PowerPoint deck, it can do that, but you have to define what you want. If you want a Word doc, it can do that. If you want an Excel, it can do that. Specify. Be clear about what you want.

    You also need to be really clear about what you need from the inputs — especially if you're going to use that large context window and put a bunch of material in. Please explain to the model what is in the box and what you want the model to do with it. Because if you don't, the model is going to fill in its best guess and try to make it intelligible as best it can, and you may or may not get what you want. And that has higher stakes now. One of the big things that shifted in the last six months is that we are no longer in a world where instant responses are the best a model can do. The best a model can do is often longer-running. So if you're in a world where the model can take a while to come back with a response, you'd better get it right. You'd better be correct in your problem framing.
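
    The delegation advice above (name the deliverable, explain the inputs, state the analysis) can be sketched as a small checklist in code. This is only an illustration: `DelegationBrief`, `InputFile`, and every field name are hypothetical and are not part of any ChatGPT or OpenAI interface. The sketch simply builds plain text to paste into a chat window, and refuses to render a brief with blank sections so that gaps surface before a long run rather than after it.

```python
from dataclasses import dataclass, field


@dataclass
class InputFile:
    """One item being handed to the model, with a plain-language description."""
    name: str
    description: str


@dataclass
class DelegationBrief:
    """Hypothetical structure for a long-running task brief (not a real API)."""
    goal: str
    output_format: str  # e.g. "PowerPoint deck", "Word doc", "Excel workbook"
    analysis: str       # the kind of analysis the model should run
    inputs: list[InputFile] = field(default_factory=list)

    def missing_parts(self) -> list[str]:
        """Return the names of any sections that were left blank."""
        missing = [name for name in ("goal", "output_format", "analysis")
                   if not getattr(self, name).strip()]
        if not self.inputs:
            missing.append("inputs")
        return missing

    def render(self) -> str:
        """Render the brief as plain text; fail early if anything is blank."""
        missing = self.missing_parts()
        if missing:
            raise ValueError(f"brief is incomplete: {', '.join(missing)}")
        lines = [
            f"Goal: {self.goal}",
            f"Deliverable: {self.output_format}",
            f"Analysis to run: {self.analysis}",
            "What is in the box:",
        ]
        lines += [f"- {item.name}: {item.description}" for item in self.inputs]
        return "\n".join(lines)


# Made-up example values, for illustration only.
brief = DelegationBrief(
    goal="Find the main drivers of support volume last quarter",
    output_format="PowerPoint deck with an executive narrative",
    analysis="Group tickets by theme and rank themes by volume",
    inputs=[InputFile("tickets.csv", "one row per customer support ticket")],
)
print(brief.render())
```

    The design choice here is the early failure: when a run can take 20 to 40 minutes, a missing deliverable or an undescribed input is cheaper to catch before the task starts.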

    That's not just an executive skill set anymore. That's everybody's skill set. All of us need to learn more about framing problems and chunking problems into scopes of work that can fit with a model that is truly agentic. And the reason I'm emphasizing that here is because ChatGPT 5.2 is so widely distributed. Everybody's going to get it because everybody has ChatGPT. So we all need to learn this.

    Comparing ChatGPT 5.2 to Gemini 3 and Claude Opus 4.5

    Now, you might be wondering how this compares to some of the other models out there. I want to give you some very specific comparison notes from early testing, because I did a cross-analysis where I gave the same assignment to different models to see what the quality would look like. I tested against Gemini 3. I tested against Claude Opus 4.5. I tested on ChatGPT 5.1 as well, just to get a sense of the difference versus 5.2. I think I'm getting a real clear picture of where these different models stack up.

    One of the things that is standing out to me is that the ergonomics of the model matter a lot. By ergonomics, I mean how does the full environment around the model feel — comfortable, like a good ergonomic chair — so you can use it for useful work. That's not just comfort. That's actually value.

    Specifically, Gemini 3 has really poor user ergonomics right now. They have embedded Gemini 3 inside Google products, and you can access Gemini 3 in the developer studio and in the mobile app. But in none of those places is it easy to throw a bunch of data, throw a bunch of docs into the model, and say: please come out with a fully finished output. That is just not the product that Google has built. So even if the brain power is there to do meaningful work against these artifacts — to analyze them and come back with a fully featured output — you can't get to it. I could not upload a PowerPoint to Gemini 3. I could not upload an Excel or a CSV. It's just not good. You have to have the ability to put a lot of data in if you want to do complex work, and it's a problem if you can't do that.

    I love Gemini 3. I did a great review on it. I still use it. I love their image generator. It's a smart model. I use it for thinking quite a bit. But the ergonomics are a real issue, and they really pop out when you compare it to ChatGPT 5.2, because ChatGPT 5.2 will take anything. You can throw anything in there — a screenshot, a CSV, a doc, a PowerPoint — and it will just chew on it all and process it and come out with something useful. That's really, really helpful.

    One of the things that really stood out as a difference in my testing is that the ability to intelligently and coherently process this data — with fewer hallucinations — is way up. That showed up in their benchmarks too. They saw something like 38% fewer hallucinations, and it just pops. You can see the coherence.

    Comparing it to Opus 4.5 is interesting because the ergonomics in Opus 4.5 are also quite solid. You can throw in a wide variety of input documents. I like the way Opus 4.5 is able to craft effective output artifacts, just like ChatGPT 5.2. So if I were to look for a difference between the two, the first thing I want to call out is that the way the models are architected is very different. ChatGPT 5.2 — especially in thinking mode, which is a very different mode from instant mode — is a long-running, thoughtful, intentional model. It takes a while to respond. It does very thorough work, and these days it now does artifacts well too. There's not really that gap on PowerPoint functionality anymore.

    Opus uses tools instead of reasoning. Opus will work for a while, but it's using tools as a non-reasoning model to get that work done. It's a very different approach. I like the aesthetics of the PowerPoint that Opus 4.5 produces slightly better. The functionality is about the same — from a functional PowerPoint narrative perspective, it's about the same. And critically, the thing that gives ChatGPT 5.2 an edge is that it can take so much data to solve your problems.

    Why data volume is the real differentiator

    That's why I started this conversation by saying: pay attention to how long these agents can work. Because if you're going to give an agent a meaningful task, that only really works if you trust it with a ton of data — if you give it a lot of data to work with and ask it to handle a complex task. Otherwise, even in thinking mode, it won't take that long, and you won't have solved that meaningful a problem.

    The thing we need to shift toward is a world where we recognize that increasingly the models have a better understanding across larger swaths of data than we do. Maybe it's a set of customer service tickets that we need to analyze. Maybe it's hundreds of Twitter responses to a question we had. Maybe it's Stripe transaction data. Maybe it's a big Excel spreadsheet of customer issues. You get the idea: anything that has that sort of very large, varied data all in one place. You could have customer tickets on one side and transcripts and recordings on the other. With a big enough context window, you can throw it all in there and ask it to make sense of it. And it does. It's able to translate it into something useful.
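
    When the inputs are mixed like this, one way to tell the model what it is looking at is a short manifest that lists each file with its row count and column names. The sketch below is a minimal illustration using only Python's standard library. The function name `describe_csv`, the file names, and the sample rows are all made up for the example; real exports would be read from disk instead of from inline strings.

```python
import csv
import io


def describe_csv(name: str, text: str) -> str:
    """Summarize a CSV's row count and column names in one line."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return f"{name}: empty file"
    header, data = rows[0], rows[1:]
    return f"{name}: {len(data)} rows, columns = {', '.join(header)}"


# Made-up sample data standing in for real exports.
tickets = "ticket_id,opened_at,category\n1,2025-11-02,billing\n2,2025-11-03,login\n"
payments = "charge_id,amount_usd,status\nch_1,49.00,succeeded\n"

manifest = "\n".join([
    describe_csv("tickets.csv", tickets),
    describe_csv("payments.csv", payments),
])
print(manifest)
```

    A manifest like this can be pasted at the top of the prompt alongside the uploaded files, so the model does not have to guess what each file contains.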

    Narrative as an emergent property of coherence

    This is a little bit of an intangible, but one thing that emerges when a model is strong at coherence, reduces hallucinations, and has the tools to build something like a PowerPoint well is the ability to build narrative. What I noticed is that it's able to take data that I don't necessarily have a clear story for and pull it in and say: there's a story here, here's the overall story, and here's why I know that. And you can check it and prove it, because of course you should. You have to go in and verify that the story actually holds up.

    The defining skill for 2026: delegation over execution

    So if you're looking at 2026 and asking yourself what skills you need to thrive and how to build this into your teams, I would say the number one skill you're going to need is delegation. That holds for ChatGPT 5.2 and for the models that follow, not just from ChatGPT but from Gemini, from xAI, from Anthropic. You're going to see more agentic models, and your number one goal needs to be: grow my ability to delegate.

    We are moving from a world where execution with models was the story for 2025. Delegation to models is going to be the story of 2026. We're not ready. We're not ready with the data side. We're not ready with the skill side. We don't know how to frame problems.

    The first thing I did when I started getting into 5.2 and seeing what it could do was go over to 5.2 and ask it to help me think through prompting this model differently — because we have to think about prompting not as "give me a response now," but as "let me give you a lot of stuff and then go away and think about it." Eventually we're going to get to a world where we have more interaction patterns with running agents and you can interrupt the agent. We're starting to see hints of that. We'll see more of that in 2026. But the skill for now is to really intentionally aim the model in the direction you want it to go, make sure you have the right material, and then give it time to work. Let it work for a while.

    It is not unusual to see a model like 5.2 work for 20, 30, 40 minutes. And it's not like Deep Research, because Deep Research comes back and just gives you a web report — very well written, can be 50 pages. ChatGPT 5.2 thinking will come back in a similar amount of time, but it will give you much more control over what you get. You can define the output type you want. You can define the kind of analysis you want. It's like a much broader Swiss Army knife versus the scalpel that is Deep Research.

    Where to fit ChatGPT 5.2 thinking mode into your workflow

    So if you're wondering where to put this into your workflow, I would say ChatGPT 5.2 thinking is an agentic workflow executor that is almost more powerful than we're ready for. If you know how to delegate well, it is going to eat work for you. You want to analyze a P&L — let it take the first pass. You want to analyze an acquisition — let it do that. You want to analyze your investments or your personal savings and budget — let it do that. This thing loves to solve problems.

    The rate limiter for us — the real question — is: do we have the taste to find the right problems to solve? Can we locate the data for it? Can we throw the data in and then give it clear enough directions about the output and the kind of analysis it needs to run in order to get successful outcomes?

    Because the stakes are higher now. If you're running ChatGPT 5.2 for 20, 30, 40 minutes and you didn't give it the right directions, your feedback loop is slow. You're going to be thinking: now I have to redo it, and it's going to be another hour out of my day. So our prompting skills are now higher leverage, because every poorly framed run costs that much more time.

    The key thing I want to leave you with — beyond prompting — is that we need the soft skills to delegate better, to understand those problem frames. I believe that is the key skill for 2026. And I think that is what 5.2 shows us in a way that no other model does. It will eat entire workflows because it is so good at correct, coherent, long-running agentic execution.

    I think they kind of undersold it as a 0.1 upgrade. I think it's bigger than that. But you tell me — test it out and let me know what you think.


    Polished transcript of AI News & Strategy Daily | Nate B Jones. All views are those of the original speakers.
    Published by @maverick