Episode 002 · May 21, 2026 · 01:07:42

Claude-maxxing: Burning $10K in tokens for only $50 with a custom software factory

Dan Gerlanc

Podcast Host

Ian Stokes-Rees

Founder & CEO, PNI AI Studio

Learn how Ian runs six to ten Claude Code agents as a disciplined Scrum team, with up to 100 PRs merged in a single day and billions of tokens burned in a week. Topics include hitting Claude Max 20x limits, token-management tricks, and what to trust (and not trust) from agentic engineering.

Show notes

Dan and Ian Stokes-Rees, founder and CEO of PNI AI Studio, open by discussing the thesis of Ian’s company: an opinionated stack of open-source tools wrapped in agentic AI so business analysts, managers, and finance teams can get the capabilities of a senior data scientist without learning Python, SQL, or R. Ian’s primary target is financial services, where an estimated 200+ million weekly Excel users still run human-driven, tacit-knowledge processes. He frames a second opportunity, capturing the “AI exhaust” coming out of those workflows, as the seed for a follow-on product.

The conversation turns to how Ian actually builds his product. Ian walks through a three-phase evolution: Cursor as a coding assistant, prompt-based Claude Code generation, and finally a full agentic team modeled on Steve Yegge’s “Gas Town” post. Today he runs six to ten Claude Code agents in named roles: Xavier and Yasmin are Agile Process Managers and Anne is the Principal Architect, alongside software engineers, QA engineers, a test engineer, and a release manager.

The agents’ operating manual is a roughly 5,000-line AGENTS.md tree spread across about 45 markdown files and served via MkDocs. The Kanban lives in GitHub Projects, milestones serve as sprints, story points and labels drive the workflow, and a “kaizen accumulator” task captures learnings each sprint that get translated into process changes at the start of the next one.

Next, the conversation dives into token-maxxing. Ian explains why he keeps hitting Claude Max 20x weekly limits on day three of a sprint (five software engineering agents plus two QA agents burning tokens in parallel) and the management tricks he’s adopted: Caveman to enforce terse prompts, templated processes, a catalog of deterministic scripts behind self-documenting skills, pre-commit hooks, and roughly a dozen CI gates that run Claude and Codex reviews against PR templates.

Still, not everything is perfect in agent-land. Ian describes his agents as “solid second quartile” engineers. They’re fast, pleasant, and (currently) inexpensive, but wrong in meaningful ways on one PR in five. Vibe coding works for prototypes and small reports, but serious systems still need human-driven design thinking, separation of concerns, and testing discipline.

Perhaps the current moment is an “interregnum” between 25 years of established software practice and an agent-native future. Could this one day be a software factory with human “forepersons” running follow-the-sun shifts over agents that never sleep? The episode closes with a warning about “AI brain fry” that comes from work products arriving ten times faster than humans produce them.

From this episode — Ian Stokes-Rees

I never trust that they've done the thing that they've said they've done.

My AI agents are very solid, second quartile engineers, designers, managers. They're not top decile. I don't even think they're top quartile. But they're extremely fast, very pleasant to work with, very professional, and very inexpensive.

Their work products come out about ten times as fast. So they need your attention about ten times as frequently.

I had a hundred PR day. One day, one hundred PRs merged. That was a sixteen hour work day, and I looked at every one of those and pressed the merge button only when it was green.

For prototypes, no problem. For serious systems, you've got to work with your agents — design thinking, separation of concerns, testing, documentation, process and procedures.

I don't view my agents as an agent swarm. An agent swarm is like a startup where you've got a charismatic founder who says, 'You're all smart, go do smart stuff.' I haven't seen that work.

Mentioned

PNI AI Studio: Ian's current company building enterprise AI data analytics
Anaconda: Python distribution company, formerly Continuum Analytics, where Ian worked
Steve Yegge's Gas Town: Blog post Ian credits as inspiration for his agentic engineering setup
Paperclip: Agent management framework Ian is migrating his team onto
GitHub Projects: Kanban and milestone tracking Ian uses for sprint management
MkDocs: Static site generator Ian uses to organize the AGENTS.md document tree
Caveman: Claude Code skill that cuts tokens by making agents talk telegraphically
When Using AI Leads to "Brain Fry" (HBR): HBR article from BCG researchers on cognitive load of supervising AI agents
Jensen Huang GTC 2026 keynote: Referenced for the AI factory framing Ian applies to software
Kaizen: Continuous-improvement concept Ian operationalizes via a per-sprint accumulator task
Extreme Programming (XP): Pre-agile methodology Ian draws on for pair programming and Kanban practices

Transcript

00:12 Dan

I’m Dan Gerlanc and welcome to Agents and Engineers, the podcast about agentic AI and software engineering. Today we’re joined by Ian Stokes-Rees. He’s currently the founder and CEO at PNI AI Studio. Prior to that, he was at BCG for eight years and at Anaconda, formerly Continuum Analytics, for five years, and he has a PhD in computational particle physics from Oxford University. All easy stuff there. So thanks for joining the show today, Ian. Great to have you.

00:54 Ian Stokes-Rees

Thanks, Dan. I’m happy to be here. It’s great to talk about this. Pretty exciting time.

00:59 Dan

Sure. So tell me a little bit about your new company, what you’re working on, what’s the goal?

01:09 Ian Stokes-Rees

Yeah, so in a sentence, the focus of what I’m building at PNI AI Studio is a product, a platform, and a business focused on enterprise AI data analytics. It’s a space I’ve been working in for my whole career, and that’s what I’m really focused on right now.

01:22 Dan

And what is the distinguishing feature or challenge you’re addressing with this that you don’t think is currently addressed in the market?

01:47 Ian Stokes-Rees

Yeah, great question. So having spent eight years at BCG and before that at Anaconda,

01:55 Dan

Whoa!

01:56 Ian Stokes-Rees

There’s such a significant stack of tooling and process and capability that needs to be in place to be able to do good data engineering, data analytics, data reporting. There are proprietary kind of walled garden tools, platform systems that let you do that, or you got to build it yourself. I have a belief.

02:09 Dan

Whoa!

02:24 Ian Stokes-Rees

That in our new AI era, there’s the possibility of pulling together an opinionated stack of open source tools, integrating them together, providing agentic AI wrappers for orchestration, and then exposing those to business analysts, data analysts, and managers to empower them to have the capabilities of an advanced, experienced data scientist in a directly accessible AI tool. And so that’s the key vision and the key dream: to really pull together those capabilities in that opinionated way.

What we’d say at BCG would be an 80-20 approach of effort to performance, making those decisions for the average user, for the average operator, in a way that just makes it easier to go beyond Excel and pivot tables and being an Excel jockey, without landing you in the space of having to learn a lot of Python or SQL or R. It up-levels it, as we’ve all been learning in the past year, to more natural-language, chat-based interfaces with work that’s delegated to AI agents, but that can still address explainability, repeatability, provenance, these aspects. And so it’s a lot of those pieces built into what I’m trying to do today.

03:55 Dan

Is there a specific industry you’re envisioning this for, or is it more general, any kind of data intensive place that needs this kind of process?

04:10 Ian Stokes-Rees

Yeah, so from my experience, one of the big domains where this is very common is financial services. You’d be amazed, at banks and financial services companies, and also within organizations’ accounting and finance departments, that sort of thing, how much work is still done by people in Excel. They end up being human-based processes with tacit knowledge, with messy semi-structured data in Excel spreadsheets. And so that’s a domain where it’s estimated that there are over 200 million people around the world who use Excel in an average week. And today there’s the possibility to empower those people with AI augmentation beyond the level of just Claude in Excel, to be able to bring to them capabilities around data interpretation, data merging, data cleaning, data analysis, and reporting.

And so this is really the starting point for what I’m focused on. Now, we’re probably gonna go in some different directions here for the overall conversation. But I think beyond that, as I’m working with my small early-stage team, we’re also discovering that a new aspect of data in the space is the AI exhaust that’s coming out of

05:18 Dan

Yeah. Yep.

05:35 Ian Stokes-Rees

individuals and organizations, which becomes part of how organizations work. In this more narrowly enterprise data space, how do we actually begin to capture some of that tacit knowledge and those human-driven processes? Once you bring an AI agent into partnership with an individual, you’re able to start to learn how things are done, how we can automate, how we can accelerate, and how we can establish some best practice.

So there are a lot of concepts in there, but again, at the end of the day for me, it comes down to enterprise data and AI capabilities to empower individuals and organizations.

06:17 Dan

Makes sense, and I started my career actually in quant finance, so I definitely understand porting things from Excel to the R data stack at the time. But that is the world of a lot of data analysis out there: it’s Excel, and you can do a lot, but at some point it can also get a little dicey.

06:48 Ian Stokes-Rees

Yep, absolutely. And I’d actually say that with Claude in Excel, we then bring in some of the challenges of what happens when you start to effectively vibe code your Excel analytics with an AI partner. I use Claude in Excel. I know there are a number of other tools that give you that kind of intersection of data plus AI tooling. But what I continue to see is this gap around reproducibility and explainability, as well as the ability to tap into, in a repeatable way, some infrastructure around knowledge management, around context management for particular tasks, around not just repeatability of a particular analysis that’s just been done, but, if it’s something that happens, say, on a monthly or a quarterly basis, how do you begin to systematize and capture some of those things?

So this is an aspect of the space that I’m trying to work in. And I definitely also have to say that while I’m building AI agents for use in the enterprise data space, I am building those tools with AI agents. And so it’s given me a lot of experience of understanding the personalities, the character, and the capability of AI agents on both the build side of my product and the capability side that I’m exposing through the product and platform that is being created.

08:26 Dan

Yeah, so speaking of your building with the AI agents, I’ve seen you post about maxing out your Claude Max 20x subscription. I don’t know too many people who are hitting those limits. I mean, this isn’t a world we were in a year ago. So what

08:35 Ian Stokes-Rees

Yes. Yeah.

Yeah.

That’s right.

08:53 Dan

does that workflow look like for you? How are you utilizing it so efficiently, I’d say?

09:03 Ian Stokes-Rees

So the key thing I would say is that my first phase in this space was leveraging AI tools for engineering as a partner to assist me while I was the primary hands-on-keyboard engineer and developer. The last year, kind of thing, right?

09:27 Dan

And what timeline would that have been?

09:30 Ian Stokes-Rees

Using Cursor. I still use Cursor; I think Cursor’s great. But that was at a point where there was some fraction of the code in place that I had typed myself. That then moved into increasing levels of prompt-based code generation. But I’m prompting, I’m watching code being generated, and I’m then reviewing that work. The shift that really occurred was as Claude Code got to a point.

I use Codex as well, but predominantly I use Claude Code, so I can maybe come back to that later if the conversation takes us there. As I got to the point of trying to step up the level of engagement that I had with my coding agents: the past 10 years of my life has not been mostly hands-on keyboard. The last 10 years of my life has been managing engineering teams, or managing a product team with multiple engineering teams.

10:04 Dan

Yeah.

10:28 Ian Stokes-Rees

And my responsibility was at a high level: the objectives that we’re trying to aim for. What’s the value delivered to the client? What are the key objectives for the next release that we’re trying to produce? And so I began thinking a lot more seriously about how that would look with Claude Code. In early January, I read Steve Yegge’s Gas Town, which probably a lot of people listening to this will have heard of. If you haven’t heard of it, you should Google it. If you’ve heard of it and haven’t read it, you should read it. And I read that and I was like, well, this is incredible.

Because this is fully agentic. It’s called Gas Town because there are all these different things happening in Gas Town that are building the product. And at that time I thought, well, that’s incredible, but that’s so far beyond anything that I could achieve. And then within about two months, what I found is that I’d built my own version of Gas Town. As I tried to up-level what I was doing, I took

20 years of engineering experience. It’s even more than 20 years of engineering experience. I was doing agile programming when we still called it XP, extreme programming, with pair programming and coding teams and a Kanban and having a sprint process and sizing. And I said, you know, I know how that pre-AI engineering world works, and I’m pretty experienced, pretty good at it, and I know how to manage a team that way. If I set up my AI agents in that system, I think I could make that work. And so that was, for me, the big transition.

So I said there were three phases. The first phase was Cursor and using an AI coding assistant. The second phase was me moving more to Claude Code, just prompt-based: hey, generate code, and then I’ll review and help manage and make commits. That would have been about six months ago. And then the third phase was going from

12:13 Dan

Was that maybe like six months ago? Yep.

12:21 Ian Stokes-Rees

Gas Town and thinking, well, I could never make that work because that’s too ambitious, to, within a few weeks, having created my own version of Gas Town. I’ve experimented with agent team sizes ranging from about six agents to about twenty-five agents. I found that six to ten is kind of the right number that I can manage.

And I’ve then set up a team of AI agents. I’ve got a Kanban with tasks, and the agents understand their workflow. My AGENTS.md file is the entry point. All my agents, of course, start there with their AGENTS.md file. And when they start working, the agents have some standing state, which gives them a name and a role. And then they know how to work together on the team. And my Kanban has a state in terms of where we are in the sprint flow.

And so depending on whether we’re in sprint planning, sprint execution, or sprint closing, different roles have different responsibilities. And that’s been the process that I’ve now been driving for three months and probably five sprints, based on where I am in my business cycle. Sometimes my two-week sprints get stretched out to three weeks, as I’m the bottleneck. I’m always the bottleneck. But that’s then been my third phase, the last couple of months. And this is a long way of coming around to: what does it take to max out, to get throttled on either a daily or a weekly basis, on a Max 20x plan?

Start of sprint is when it happens. So it’s after we’ve done sprint planning, after we’ve done task refinement, and when I flip the switch. I’ve got two agile process managers. You could think of them as scrum masters, but I call them agile process managers. And I have a lead agile process manager, Xavier. And when I say to Xavier,

I check with him, I’m like, does it look like we’ve got everything planned out for the sprint? We’ve queued up the tasks for the sprint, they’ve all been refined, we’ve figured out deduplication and task sequencing for dependencies, and confirmed that we’ve gone through all of those gates. I mean, there are many things that happen in that initial sprint planning phase, but when Xavier confirms that we’re good to go, I say, okay, let’s set the sprint active.

The sprint’s active. I then have five software engineering agents. And when I’ve got five software engineering agents and two QA agents, they then start pulling tasks off the backlog, or off the on-deck. It goes backlog to on-deck, and on-deck is kind of this sprint-specific backlog. And they’re just humming along. And right now I’m running off of a 16 gig laptop.

Working with my director of engineering, we’ve got a server-based version of the system using Paperclip, and on our next sprint we’re gonna migrate the whole process to run off of that server. That then allows us to have multiple people working on this, as opposed to just different human engineers picking up tasks and then running their own local variations of this for individual blocks of work.
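The sprint-start flow described here, where agents pull on-deck tasks that have already been sequenced for dependencies, can be sketched roughly as follows. This is a minimal illustration only: the `Task` class, the status names, and `claim_next` are hypothetical and are not taken from Ian's actual setup, which lives in GitHub Projects.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    task_id: str
    blocked_by: set[str] = field(default_factory=set)  # ids that must merge first
    status: str = "on_deck"  # hypothetical states: on_deck, in_progress, merged


def ready_tasks(tasks: list[Task]) -> list[Task]:
    """Return on-deck tasks whose blockers have all been merged."""
    merged = {t.task_id for t in tasks if t.status == "merged"}
    return [t for t in tasks if t.status == "on_deck" and t.blocked_by <= merged]


def claim_next(tasks: list[Task], free_agents: int) -> list[Task]:
    """Hand at most one ready task to each free agent and mark it in progress."""
    claimed = ready_tasks(tasks)[:free_agents]
    for task in claimed:
        task.status = "in_progress"
    return claimed


# Illustrative data only: T3 waits on T1, and T4 waits on T3.
tasks = [Task("T1"), Task("T2"), Task("T3", {"T1"}), Task("T4", {"T3"})]
first_wave = [t.task_id for t in claim_next(tasks, free_agents=5)]
print(first_wave)

tasks[0].status = "merged"  # T1's PR is merged, which unblocks T3
second_wave = [t.task_id for t in claim_next(tasks, free_agents=5)]
print(second_wave)
```

The point of the sketch is that the number of agents that can work in parallel is bounded by how many tasks are unblocked, not by how many agents are available.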

15:36 Dan

Is Paperclip a managed service?

15:38 Ian Stokes-Rees

Yeah, think of Paperclip as a variation on something like OpenClaw or Hermes, targeted as kind of an agent management framework. It’s got a back end and a front-end UI. So that’s what Paperclip is. And that start-of-sprint window, where there’s a backlog of about forty or fifty tasks for this sprint, and maybe only a third of them, not more than half of them,

15:44 Dan

Okay.

16:06 Ian Stokes-Rees

are blocked by other tasks getting merged, that’s the moment at which I can have six, seven (for the young ones out there) coding agents burning tokens pretty fast. And it’s under those circumstances, now a couple of times, that it’s been the window at the start of sprint. Now, interestingly, that first day one, day two, maybe into day three at the start of sprint, that’s where I get throttled on day one, throttled on day two, and then somewhere on day three I hit my weekly limits, which has occurred now twice.

It hasn’t been a crisis, because there’s so much work that requires human-in-the-loop review that I’m then able to push through other aspects of the workflow. It’s probably a moment where I should also mention that a key part of this is that these are all being driven through terminal sessions. These are being throttled because I’ve got an, air quotes, unlimited (unlimited up till you hit your throttle) fixed-price plan through the Max 20x. And I’m not trying to bypass any of the controls on that. I do pay for API tokens for some of my CI processes that can only go through an API token. But those are

17:14 Dan

Yeah.

So you have your five hour window and weekly

17:35 Ian Stokes-Rees

more like the GitHub Actions side. So that’s the story of how and where you hit your limits. I’ve got colleagues who do it through other paths and means, where they’ve got their personal chief of staff who’s doing meeting-minute reviews and email processing. I was just talking to a colleague last night who was telling me that they have a path that regularly has them hitting their limits. And I was surprised, given that they don’t have the same extent of coding agents and round-the-clock work that I’ve been running. You know, I’ve had over a hundred hours of agent working hours and billions of tokens in a week, which is quite a few tokens. At API rates, it’s getting into five figures, if I was paying at API rates.
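The "five figures at API rates" comparison is simple arithmetic once token counts are split by kind. The sketch below shows the shape of that calculation only: the per-million-token prices and the weekly token counts are made-up placeholders, not published rates and not Ian's actual numbers.

```python
# Hypothetical USD prices per million tokens; placeholders, not real published rates.
ASSUMED_PRICE_PER_MTOK = {"input": 5.0, "output": 25.0, "cache_read": 0.5}


def api_cost_usd(tokens_by_kind: dict[str, int], price_per_mtok: dict[str, float]) -> float:
    """Sum the pay-as-you-go cost across token kinds."""
    return sum(
        count / 1_000_000 * price_per_mtok[kind]
        for kind, count in tokens_by_kind.items()
    )


# Illustrative week: most of the volume is cached context being re-read.
week = {
    "input": 400_000_000,
    "output": 200_000_000,
    "cache_read": 8_000_000_000,
}
cost = api_cost_usd(week, ASSUMED_PRICE_PER_MTOK)
print(f"Estimated cost at assumed API rates: ${cost:,.0f}")
```

Under these assumed inputs the total lands in five figures, and the split across token kinds shows how strongly the estimate depends on how much of the volume is cached reads versus fresh input and output.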

18:23 Dan

Yeah.

And yeah, I wonder if they’re generating very long documents and maybe the ratio of cached tokens to output is much higher. Something like that.

18:40 Ian Stokes-Rees

Yeah, it could be.

That could be the case. So certainly after the second time I hit my daily limits, I put more effort into token management, caching strategies, using Caveman and cavem for reducing prompt input and prompt output. So it’s kind of those shim layers. Yes, I don’t do the ultra level, I do the normal level of

19:06 Dan

Is that the one that makes it talk like a caveman to reduce token usage? ⁓

19:15 Ian Stokes-Rees

which is still quite reasonable. It’s just very terse. You know, relationally, I preferred it prior to having Caveman enabled, but I have seen the token burn reductions from a more terse conversation. So I’ve been running that for the last month.

19:34 Dan

And have you experimented with using LSP or different semantic analysis tools to reduce token usage?

19:47 Ian Stokes-Rees

Yeah, I have only just begun the journey in that space. The main strategies that I’ve employed so far have been templating processes, so that agents, when they execute repeatable processes, don’t figure it out from scratch. Instead they’ve got a standardized routine. And, that’s exactly what I was gonna say next, building skills. And so the skills then set up

20:07 Dan

So will you put that in a skill or something like that?

20:15 Ian Stokes-Rees

a fixed process flow, and agents then make a decision about how and when they’re gonna leverage a skill. And then behind the skills there’s a catalog of scripts that are run, and the scripts are parameterized but otherwise completely deterministic for executing a particular sequence of operations. So those have been some of the efforts that I’ve put in place.

If you watch the logs, you see that there are times where the agents will experiment and try things multiple times. They tried a thing and it didn’t work properly, they tried another thing, still didn’t work properly, and on the third attempt they got it right. I’m fine with that happening the first time, maybe the second time, but when that happens, you know, five times a day every day or more... And I’ve seen that kind of pattern occur, because I’m looking in quite a lot of detail at the logging output of the agent behavior (skimming it, I should say), but I see these patterns of failed operations. I then have a conversation.
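Spotting the "tried it three times before it worked" pattern can be automated once log events are reduced to an operation name and a success flag. The sketch below assumes exactly that simplified event shape; the `operation` and `ok` keys, the sample operation names, and the threshold of five are all hypothetical, loosely echoing the "five times a day" remark rather than any real agent log format.

```python
from collections import Counter


def repeated_failures(events: list[dict], threshold: int = 5) -> dict[str, int]:
    """Return operations that failed at least `threshold` times.

    Each event is assumed to look like {"operation": str, "ok": bool}.
    """
    failures = Counter(e["operation"] for e in events if not e["ok"])
    return {op: count for op, count in failures.items() if count >= threshold}


# Illustrative day of events: one operation fails repeatedly, another only twice.
events = (
    [{"operation": "run_tests", "ok": False}] * 6
    + [{"operation": "run_tests", "ok": True}]
    + [{"operation": "update_labels", "ok": False}] * 2
)
flagged = repeated_failures(events)
print(flagged)
```

Anything that shows up in `flagged` would be a candidate for a templated process or a deterministic script, so the agent stops rediscovering the working approach each time.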

21:05 Dan

How’d you track that log behavior? Is there any special tooling you use or something you built?

21:13 Ian Stokes-Rees

Yeah, that’s actually a great question. So my second product that we’re gonna be building at PNI is a product to help surface the patterns of behavior that are occurring in the logging output, your session logging output, the inputs and the outputs that you’re getting in these JSON files that are logged by Codex and Gemini and Claude.

The way I’m doing it right now is that I have a set of scripts, and again, my agents have created all of these things. And I work with my agile process manager, where a key... always the agent. Yeah, they’re all agents. I mean, there are people here, but on the engineering side, the only other human engineer... that’s not true. I have two human engineers working with me right now. I have one who is an engineer in Chicago who’s working with me to do

21:53 Dan

The agent, not person.

22:12 Ian Stokes-Rees

user interface validation, human-in-the-loop testing of human workflows. Some of my attempts to have that automated haven’t been sufficiently strong. And so I’m still relying on human-based UI testing. So that’s Windows packaging, release management, and then interface testing. And then on the back-end engineering process, I have an engineer that I’m working with in Spain, and he’s the one who set up the Paperclip system and is working to separate our product code base from the process elements.

Right now, this is one of the challenges: a lot of the time, the agentic engineering systems and processes are embedded in the product code base, and there aren’t right now easy ways to separate those. So I’m working with Santiago to figure out good ways to separate those two pieces so that we have a reusable, portable process distinct from the product that we’re actually building. Furthermore,

a lot of the tooling that’s in place always assumes, about the humans engaging in the engineering process, that there’s one human with one computer account on one machine. I regularly am having... and now I don’t tell it anymore, it’s actually part of our sprint flow. If you pay close attention and you work for a long time with any agents, they will learn things as you say, hey, why did you do that, and could you always do this? And they’ll say, I’ll save that to memory. Well, saving that to memory might be just to Dan’s account. But more often it will not be just to Dan’s account. It’ll be to Dan’s account on a particular computer in a particular user account. For me, my agents don’t run in my user account. I’ve got separate user accounts for separation, and so they don’t have access to any of my things. However, I have an engineer in Chicago and an engineer in Spain. When my agent that I’m interacting with learns something,

the agents of my Chicago engineer and my Spanish engineer also need to have that learning. And so that process to harvest those locally embedded learnings so that they’re surfaced and reusable, this is essential. But right now, the products assume a single human operator on a single computer and a single account. And that’s not how I’m operating. I don’t think that’s how anybody’s gonna be operating a year from now, but still today,

as far as I can tell, that’s how OpenAI and Anthropic think about the interaction patterns. And when you’re trying to do this work at scale (I’m trying to build a business which is gonna be big and scalable, and I hand off all of the engineering work), then these systems and processes need to be separated from an individual human being. I am just, in the moment, the person managing and overseeing a particular agent.

And that’s a concept that doesn’t seem to have surfaced yet in the products as they exist. So that’s all back to: how do I mine this? The next product that I’m building is called Diane AI, and Diane AI is going to be all about surfacing and giving you as an individual visibility and a summary and access to the information that’s in your AI interaction exhaust.

Three months ago, when I was looking to do the accounting work to track my agent token burn, I wanted to know number of messages, I wanted to know token utilization, and I wanted to know wall clock time. My agents, both Codex and Claude Code, said, we can estimate that, but we don’t have that information available. And I said, I don’t believe you. I’m pretty sure that it’s there. And they’re like, well, we don’t know where it is. And I did some research and I found out where it was, and it’s sitting in a particular folder in a particular place. And once I told them, then they’re like, yeah, look at that. There are those numbers.
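The three numbers Ian wanted (message count, token utilization, and wall clock time) can be pulled from session logs with a short script once the log location and record layout are known. The sketch below works on JSON Lines text and assumes a hypothetical record layout with a `timestamp` field and an optional `usage` object holding `input_tokens` and `output_tokens`; the real field names and file locations differ by tool and are not given in the episode.

```python
import json
from datetime import datetime


def summarize_session(jsonl_lines: list[str]) -> dict:
    """Count messages, total tokens, and elapsed seconds for one session log."""
    messages = 0
    tokens = 0
    stamps = []
    for line in jsonl_lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        messages += 1
        usage = record.get("usage", {})  # assumed field; absent on some records
        tokens += usage.get("input_tokens", 0) + usage.get("output_tokens", 0)
        stamps.append(datetime.fromisoformat(record["timestamp"]))
    wall = (max(stamps) - min(stamps)).total_seconds() if stamps else 0.0
    return {"messages": messages, "tokens": tokens, "wall_clock_seconds": wall}


# Illustrative records only, in the assumed layout.
sample = [
    '{"timestamp": "2026-01-05T09:00:00+00:00", "usage": {"input_tokens": 120, "output_tokens": 30}}',
    '{"timestamp": "2026-01-05T09:01:00+00:00"}',
    '{"timestamp": "2026-01-05T09:02:30+00:00", "usage": {"input_tokens": 200, "output_tokens": 50}}',
]
summary = summarize_session(sample)
print(summary)
```

Running the same summary over every session file in a sprint and summing the results gives the per-sprint accounting described above.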

And so now they know where to look, and now I have systems in place to mine that information and capture patterns. And one of the foundational concepts in the charter for my agents is the concept of kaizen: continuous learning, continuous reflection and questioning and learning. And so every sprint has a kaizen accumulator task, and as agents learn things, they add comments to the accumulator task, which at the end of the sprint are then organized and consolidated, and we say, what are we going to do about these things that we learned? I don’t want the process to change in sprint unless we come across some significant process failure. I want the agents to be like, okay, here are our standard operating rules, at least for the sprint. And then at the end of sprint, it’s sprint retrospective, right? This is standard sprint retrospective.

And so our kaizen accumulator task captures the learnings that come in from the sprint: my learnings, but also the agent learnings. When I have conversations with them about how did we miss that bug, what wasn’t specified well, why aren’t you changing the labels correctly, why wasn’t this tested completely, all of these kinds of things, we learn. It’s a joint learning experience. We accumulate our learnings each sprint, and then we translate those learnings into how we are going to adapt our process flow at the start of the next sprint.

We establish and fix those. Task number one in the next sprint is to decide how we’re gonna update our process flow. And some of those things will be like, hey, we need some new skills to be defined, or we’ve got overlapping skills and agents get confused about whether they should do skill A or skill B to execute this kind of pattern, so we should have a better decision-making rubric for the agents. And so this is the kind of process that we follow.
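The kaizen accumulator is essentially an append-only list of learnings during the sprint that gets grouped at the retrospective. A minimal in-memory sketch is shown below; in Ian's setup the accumulator is a task with comments on the Kanban board, so the `KaizenAccumulator` class, its method names, and the sample notes here are purely illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class KaizenAccumulator:
    sprint: str
    comments: list[tuple[str, str, str]] = field(default_factory=list)

    def add(self, author: str, topic: str, note: str) -> None:
        """Record a learning during the sprint; the process itself is not changed yet."""
        self.comments.append((author, topic, note))

    def consolidate(self) -> dict[str, list[str]]:
        """Group learnings by topic for the end-of-sprint retrospective."""
        grouped = defaultdict(list)
        for author, topic, note in self.comments:
            grouped[topic].append(f"{author}: {note}")
        return dict(grouped)


# Illustrative entries from a human and two agent roles.
acc = KaizenAccumulator(sprint="sprint-05")
acc.add("Ian", "testing", "Bug reached review without a regression test")
acc.add("Xavier", "labels", "Labels were not updated when a task changed state")
acc.add("Anne", "testing", "Acceptance criteria were underspecified")
retro = acc.consolidate()
print(retro)
```

Keeping `add` separate from `consolidate` mirrors the rule that learnings accumulate during the sprint but only turn into process changes at the start of the next one.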

28:15 Dan

And you may have said this before, but do you feel like a lot of this kind of agentic development process engineering is just applying the craft, or job, of software engineering to agents to some degree?

28:37 Ian Stokes-Rees

that’s what I’m doing. To be clear, that a hundred percent that’s what I’m doing. ⁓ I was just having a conversation earlier today with someone that I do not view my agents as an agent swarm. An agent swarm is like a startup where you’ve got a charismatic founder who has a high level vision and idea and hires a bunch of smart people and says, Hey, you guys are all smart, you guys and gals, you’re all smart. Go do smart stuff. Figure it out between yourselves. Let’s just like let’s do something amazing, right? And

I suppose somehow, somewhere that probably works. I haven’t seen that work. I think that engineers and organizations need to have people aligned on what their objectives are. They need to have processes, and those processes help them have more freedom because they’re not stepping on each other’s toes. The constraints around process give people freedom around design and implementation. And I know how to make that work. That’s the main thing. I’ve seen it work.

I’ve made it work. I’ve done that at multiple organizations over many, many years. And you know, there’s a world in which people say agile software engineering is very laissez-faire, and I think it’s the exact opposite. That’s my view, and that’s how I do agile software engineering: it’s hyper disciplined, it’s hyper systematic, and there’s an effort to do task refinement and story point planning.

And then tracking individual and team velocity. It’s all information flow that lets you understand how your team is performing. Having discipline around in-sprint task injection. And I love the fact that my agents push back at me when I’m like, we gotta do this thing, and they’re like, Really? We’re close to the end of the sprint. Does it make sense to do that? We’re kind of out of capacity. They’ll push back, and then

Half the time I’ll say, You’re right, we’ll defer to the next sprint, right? My agents are embracing that concept of disciplined, time-based objectives, as opposed to, hey, it’s an agent swarm, go do stuff. I want another feature. There’s a problem here. And you get kind of wild things happening in different places with as many agents as you can spin up. My agents have specific roles, we have specific objectives, we’ve got ways that we work together, and for me it’s working.

30:58 Dan

So stepping back to a little higher level, before you’re going into a specific sprint planning, like setting higher-level objectives, things like that, how does that part work? Does that still have aspects of the non-agentic way, or is it done with agentic assistance?

31:12 Ian Stokes-Rees

Yeah.

Okay, so what I would say is

I believe we’re in this, effectively, interregnum period. The last 25 years of software engineering, from the dot-com boom on, established predictable, reliable, high-performance ways of generating technology systems. Software systems, cloud-based, whatever, right? We started to figure that out 25 years ago, and we’ve just gotten better and better at doing that. And I think Agile’s been like the

clear winner in that process. Now, over the past, really it’s just been the last year, we’re seeing this world in which we can do agent-centric engineering, design, development. We’re gonna end up in a different place, beyond just: it was people doing a certain process and now it’s the same process just with agents doing it. We’re gonna end up somewhere different. I don’t know where that is. But in this interregnum period, we’re in this space where

I think the best way to do this, if in doubt, is to follow the tried and true established practices, but engage and leverage AI assistants or AI team members as much as possible. What that means, concretely, for me is I sit down and I work with my APM. Xavier is my lead APM, Yasmin is my

secondary, kind of deputy APM. And I talk about where I want to end up at the end of the sprint. And Xavier will give some suggestions as to where we are, what’s achievable, what’s next in the backlog. We’ve got a product roadmap saying here are some of the capabilities we should be able to do. We’ve got some constraints saying we shouldn’t pick more than, I think it

had been 20 and then I bumped it up to 40 backlog tasks, because we always end up finding that the backlog tasks have dependencies that are their own independent tasks. Our end-of-sprint numbers end up being about 100 to 120 tasks that get done, because of the long tail at the end: when we’re testing we find bugs, and they all come up as their own tasks that get solved in sprint. So we’re finishing about a hundred to one hundred and twenty tasks in a sprint, but when we

kick off the sprint, we’re aiming for 40 tasks. We’re trying to pull about 30 of those off the backlog, and then in the refinement process, we’ll end up creating some epics, consolidating some tasks, and maybe seeing that there are some gaps. And this is a conversation I have with my APM. I think the goal down the road will be more human involvement in this, actually. My objective is to bring a human designer on, and then the human designer plus a human go-to-market

sales and marketing lead for my organization, plus the lead engineer, those three individuals, plus myself, four of us. We together will have a conversation. We will leverage AI partners in the conversation, and together we will then figure out what our high-level sprint objectives are. We’ll then establish the tasks that we can bring in to do that work, and then we are going to learn what our capacity actually is

in a two-week window. Our goal is a wall-clock, calendar two-week window of 14-ish days, but we want to be pretty disciplined and have our releases coming out on Mondays, so what can we achieve in that time window? What I’m learning is that I always have a long tail at the end of my sprint when I come to doing the release management and the end-of-sprint tasks. A lot of things are surfaced once it comes time to actually, like, okay,

we pulled them all together and we’re now doing the end-to-end tests. We’re now doing more detailed human-in-the-loop work, actually using the product, and it surfaces issues, raises questions, we identify some bugs. Some of them aren’t bugs, but some of them are like, well, it’s not exactly a bug, but the feature really doesn’t work unless we modify it right now. And so then we pull something else into the sprint, because we have this long tail. What I need to learn, which I haven’t learned yet, is what this will look like when I’ve got more engineers and designers,

human engineers and designers in the system, and how well we’ll be able to better parallelize some of those tasks and actions. So that’s kind of the discovery phase. But right now it is mostly me and Xavier, my agent APM, figuring out what goes into the sprint. We define that at a high level, so all the agents understand: let’s all be aligned on what our high-level two-week sprint goals are,

not a long list of tasks, but here’s what we’re trying to achieve in the sprint, here’s the key functionality we’re trying to get out in the sprint. And that then lets them focus when they decide whether to pull a task in or defer it to the next one. And sometimes we have a feature where we’re like, we’re just not gonna land this feature. There are too many dependencies we’ve identified. Let’s push this feature off to the next sprint.

36:39 Dan

From a task management queue standpoint, how do you manage the tasks and distribution to the different agents? Like are you using GitHub or do you have your own system there?

36:50 Ian Stokes-Rees

Yeah, great question. Yeah, GitHub. All GitHub. So having used everything from note cards on a whiteboard, original XP style, when we all worked in offices together, to

Trello, Jira, and GitHub Projects: for me, GitHub Projects is the perfect level of detail. I don’t need the sophistication and the design-your-own custom project management system that Jira provides. I know it’s essential for larger organizations and bigger projects, but not for what I’m doing, at the level I’m operating at, with the people, the human beings, that I’m working with right now.

This has been the case for a long time. This was the case when I was working at Anaconda. This was the case well through my time at BCG. I used Jira on many projects at BCG, and we shifted to using Jira at Anaconda. And I understood the benefits for the people who wanted and needed those. But I use GitHub Projects, which gives me a Kanban layout. I use milestones for the sprint, so the sprint has a date and it’s got a bunch of tasks tied to it. I use

GitHub attributes to be able to put story points in, and that gives us, not burn down, but burn up is kind of the way it ends up being graphed out in GitHub. And then heavy use of labels: we use labels for tracking more detailed task state. I’ve got about 300 to 400 tasks in the backlog. We pull 30 or so of those into On Deck at the start of a sprint.

And so at the start of sprint, there’s nothing in On Deck. There should be nothing in On Deck, In Progress, Ready, Stalled, or Done. Those are cleared out at the end of sprint, so at the start of sprint they’re still blank. When we kick off the sprint, a bunch of sprint-specific tasks are created for the start-of-sprint and end-of-sprint tasks, and those get queued up into On Deck. Milestones don’t have labels on them, so we kind of hack by

putting text-based labels in the milestone descriptions, which are then mined by things that give us the sprint-level, high-level status information that’s kind of embedded in there. And then the sprint planning process either pulls in or creates tasks into On Deck for this sprint. And then there’s Anne, who is my principal architect.

Anne mostly does work at the very beginning of the sprint and the very end of the sprint, and very little work in the middle of the sprint. That’s when it’s just engineers executing. So at the beginning of the sprint, between my APMs and my principal architect Anne, we work together to figure out the book of work for the sprint. And then once it’s all queued up, there’s a task refinement step, and that we hand off to the software engineers, and the software engineers figure out the plan for all of the tasks.

And then Anne does a review of all of those to make sure there isn’t overlapping functionality that has been individually established by the five different software engineers. You could see how they get a little bit overly ambitious with the implementation, since everybody’s working approximately in the same space for the sprint. We try and de-conflict in advance of execution, just from the plan basis. And then Anne figures out sequencing dependencies of the tasks. And then we say, all right, sprint’s on.
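The sequencing step described here can be pictured as a topological sort over task dependencies. The following is only a minimal sketch under that assumption, not the actual tooling: the task IDs and the `sequence_tasks` helper are hypothetical, and it uses Python's standard-library `graphlib`.

```python
from graphlib import CycleError, TopologicalSorter


def sequence_tasks(depends_on: dict[str, set[str]]) -> list[str]:
    """Order tasks so each one comes after everything it depends on."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes that form the cycle.
        raise ValueError(f"circular dependency: {exc.args[1]}") from exc


# Hypothetical task IDs; each key maps to the set of tasks it depends on.
plan = {
    "task-103": {"task-101", "task-102"},
    "task-102": {"task-101"},
    "task-101": set(),
}
print(sequence_tasks(plan))
```

A circular dependency raises an error at planning time instead of surfacing mid-sprint, which matches the idea of de-conflicting before execution.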

It’s like, pick up work. And I operate on the basis of empowering. It’s again an agile principle. In agile, your scrum master, your product master, your team lead does not say, Dan, do this task. Ian, do this task. No. We need to be in a situation where Dan has a process by which Dan knows here’s where I look for work. Here’s my protocol for what I need to do. The first thing I should do is make sure stuff I already started can be finished.

40:38 Dan

Okay.

41:04 Ian Stokes-Rees

If it’s not finished, then I need to put my hand up during daily stand-up and say, hey, I’m blocked on this because somebody needs to answer the question I asked on this task, right? So the first thing is finish in-process work. After that, it’s look to see whether you had anything that was awaiting a QA review. If the QA reviews are done, go back and address the QA reviews or enter into a conversation. And if all of those are done, then go and look and pull in new work.
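The pickup protocol above is essentially a priority order. Here is a rough sketch of it, assuming a hypothetical `Task` record and made-up state names (`in-progress`, `blocked`, `qa-reviewed`, `on-deck`); the real workflow tracks state with GitHub labels, whose exact names are not given in the conversation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    id: str
    owner: Optional[str]  # agent name, or None if unclaimed
    state: str            # hypothetical state names, see lead-in


def next_action(agent: str, tasks: list[Task]) -> tuple[str, Optional[Task]]:
    """Pick what an agent should do next, in the order Ian describes."""
    mine = [t for t in tasks if t.owner == agent]
    # Priority order over the agent's own tasks: finish started work,
    # raise blockers at stand-up, then address completed QA reviews.
    for state, action in (
        ("in-progress", "continue"),
        ("blocked", "raise-at-standup"),
        ("qa-reviewed", "address-review"),
    ):
        for task in mine:
            if task.state == state:
                return action, task
    # Only when nothing of the agent's own is open: pull unclaimed work.
    for task in tasks:
        if task.owner is None and task.state == "on-deck":
            return "pull", task
    return "idle", None
```

The point of the ordering is that new work is only pulled once nothing already started is waiting on the agent.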

And so we’ve got a whole protocol, and all I do is I start up the agents. I try and restart the agents every day so that clears context, and also because there are some in-sprint new learnings that occur, as well as some updates to the dev environment. I try to restart the agents every day, refresh the environment, and get them working off main. They know to work off of main when they start a new branch, but a Claude session that is already running doesn’t know about those updates.

Claude picks up information from the AGENTS.md file, and if there were any changes, I want it to have the most recent one there when that session actually starts, because that gets loaded into its initial memory.

42:15 Dan

And will you set a separate AGENTS.md file per agent class, essentially?

42:20 Ian Stokes-Rees

Dan, this is a whole other story. I’ll come back to that. Ask me about that in 30 seconds. So all I

say, I give a very simple initial prompt. I know I could even automate this as the startup, but sometimes I don’t want them to get running right away. I start up the agents and I say to the agent, please follow Sprintflow in the role software engineer, or please follow Sprintflow in the role QA engineer, or please follow Sprintflow in the role

test engineer. I’ve got six different roles: Agile process manager, QA engineer, test engineer, software engineer, principal architect, and I think there’s one release manager. That’s it, those are the six roles. And for some of those I start up a couple. For QA engineers and APMs, I start up a couple. For software engineers, I start up, I think, five, maybe six.

And I want to tell them what their role is. They know how to find out their own name. And they also know, if I just said please follow Sprintflow, how to figure out what their default role is, because there’s a role-name mapping. And then all the tasks are tagged. In my ideal world, every one of my agents would have their own GitHub account, but that would be a little bit expensive, and then I’d have to go through a separate process for kind of token process management. Instead they all run under a bot account, so they have to label their work when it goes into GitHub,

either in the description or a comment or by the way they flip labels, as to which agent is owning which task. And the agents just pull work and they just go. And then sometimes they stop, and it’s always a puzzle to me why they stop. They stop and I’m like, why did you stop? And they’re like, Yeah, I shouldn’t have stopped, and then they keep going. But
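The role-name mapping and the ownership tagging under a shared bot account could look something like the sketch below. The six role names come from the conversation; the mapping covers only the three agent names mentioned on the show, and the identifier spellings and the marker format are assumptions made for illustration.

```python
from typing import Optional

# The six roles named in the conversation (identifier spelling is assumed).
ROLES = {
    "agile-process-manager", "principal-architect", "software-engineer",
    "qa-engineer", "test-engineer", "release-manager",
}

# Default role per agent name; only names mentioned on the show are listed.
DEFAULT_ROLE = {
    "Xavier": "agile-process-manager",
    "Yasmin": "agile-process-manager",
    "Anne": "principal-architect",
}


def resolve_role(agent_name: str, requested_role: Optional[str] = None) -> str:
    """Use the role from the prompt if given, else the agent's default."""
    role = requested_role or DEFAULT_ROLE.get(agent_name)
    if role not in ROLES:
        raise ValueError(f"no valid role for {agent_name!r}: {role!r}")
    return role


def ownership_marker(agent_name: str, requested_role: Optional[str] = None) -> str:
    """Hypothetical marker for a task description or comment, so work done
    through one shared bot account can still be attributed to an agent."""
    return f"[agent: {agent_name} | role: {resolve_role(agent_name, requested_role)}]"
```

With a marker like this in each task or PR comment, ownership stays visible even though every action in GitHub appears under the same bot user.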

44:03 Dan

Do you have a daemon process running that’s monitoring if they stop, or is this something

44:09 Ian Stokes-Rees

That prods them? No. No.

Paperclip will give us that. And I know that that was one of the roles in Steve Yegge’s setup. Steve Yegge found this. He had two layers in Gas Town of, I think it was called the watchdog or the watchman or something like that. And it basically went around and it just kicked the agents. It said, Keep working, restart, keep going. And yeah, I still have to do that. Most of the time they keep working, but then occasionally they stop, and then they’re like, should I proceed?

44:28 Dan

Yeah.

44:38 Ian Stokes-Rees

And I’m like, always proceed. Always keep going. There’s no question except should I keep going? And the answer to the should I keep going is always yes. So anyway, yeah. See
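Ian does this prodding by hand today. As a purely illustrative sketch of what a watchdog of the kind he attributes to Gas Town might check, assuming hypothetical last-activity timestamps per agent and an arbitrary 15-minute idle threshold:

```python
from datetime import datetime, timedelta

# The standing answer to "should I proceed?", per the conversation.
NUDGE = "Always proceed. Keep going."


def stalled_agents(
    last_activity: dict[str, datetime],
    now: datetime,
    max_idle: timedelta = timedelta(minutes=15),  # assumed threshold
) -> list[str]:
    """Return the names of agents idle for longer than max_idle."""
    return sorted(
        name for name, seen in last_activity.items() if now - seen > max_idle
    )


def nudges(last_activity: dict[str, datetime], now: datetime) -> dict[str, str]:
    """Map each stalled agent to the message a watchdog would send it."""
    return {name: NUDGE for name in stalled_agents(last_activity, now)}
```

How the timestamps are collected and how the nudge is delivered to a session are left open here; those depend on the orchestration layer.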

44:46 Dan

It’s like, I haven’t taken that from the AGENTS.md yet.

44:51 Ian Stokes-Rees

Yeah.

Yeah, so let me talk briefly about AGENTS.md. I could talk a lot about AGENTS.md. So one of the challenges I have right now, and I actually spent a whole week working with my APM on my AGENTS.md strategy. I wrote the first AGENTS.md. It was about a thousand lines long. I tried to consolidate into a single file the structure that I wanted for an agile process for my agents to follow.

45:21 Dan

And this is for all of the agents or for a single?

45:21 Ian Stokes-Rees

And this was a single one for all of the agents. Yeah. And this was three months ago, yeah, probably three months ago. And then I did a second iteration, which I guided and my APM updated, and it got up to about 1500 lines. Once it got up to 1500 lines and I started seeing some of the issues with my agent behavior, where they were crossing role boundaries and doing surprising actions, I realized, okay,

I need to break this into parts, and then I need to come up with the navigation structure so that a given agent enters at AGENTS.md but then navigates to subdocuments only based on need. I’ve gone through many iterations of it, but the first big effort to do that was about a week of work.

And it expanded my then maybe 1200- to 1500-line AGENTS.md file into about 20 sub-files that were organized with tables of contents at the top of every file, and then pointers and guidance. It wasn’t a network, it was a tree, intentionally a tree, so there are no loops, that agents could then follow to find the information they needed to know how to behave and operate under different circumstances.

And that’s now grown to about 5,000 lines of instruction, probably across about 45 files. So there’s a lot of assets there. I looked at QMD and MkDocs, and I’m using MkDocs as a static site generation system that works well for agents, because agents can navigate it as folders and files on disk in markdown format.
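The "tree, not a network" property is something a script can check. Below is a minimal sketch, assuming the links between documents have already been extracted into a dictionary; the file names other than `AGENTS.md` are hypothetical, and this is not a description of Ian's actual tooling.

```python
def tree_problems(links: dict[str, list[str]], root: str = "AGENTS.md") -> list[str]:
    """Report anything that stops the doc link graph from being a tree."""
    parent: dict[str, str] = {}
    problems: list[str] = []
    stack = [root]
    while stack:
        doc = stack.pop()
        for child in links.get(doc, []):
            if child == root:
                problems.append(f"{doc} links back to the root {root}")
            elif child in parent:
                # A second incoming link means a loop or a shared subtree.
                problems.append(
                    f"{child} is linked from both {parent[child]} and {doc}"
                )
            else:
                parent[child] = doc
                stack.append(child)
    reachable = set(parent) | {root}
    problems += [
        f"{doc} is unreachable from {root}" for doc in links if doc not in reachable
    ]
    return problems


# Hypothetical layout: the root points at index files, which point at leaves.
docs = {
    "AGENTS.md": ["roles/index.md", "process/index.md"],
    "roles/index.md": ["roles/software-engineer.md"],
}
print(tree_problems(docs))
```

An empty result means every document has exactly one path from the root, so an agent following pointers cannot go around in circles.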

47:10 Dan

Interesting.

47:11 Ian Stokes-Rees

Yeah, and so that’s the mechanism by which the agents find information. It needs another pass to be refined further, but it’s working pretty well for me right now. It’s pretty sophisticated, pretty complex. It’s about ninety percent project independent; probably ten percent of the content is still specific to the particular AI Studio product. I need to separate that more, so that we’ve got the process element separated from the product. So the

two are separated.

47:42 Dan

Is this something you treat as a skill, or do you say, here is a directory of documentation?

47:48 Ian Stokes-Rees

Well, yeah, it’s a great question.

It’s ended up in a world where there are many, many layers to the way the agents interoperate. This information is in the structure that flows out of AGENTS.md. My AGENTS.md file now, I think, is eighty lines long. It’s just a detailed annotated table of contents. It has a high-level charter of principles, and then it’s got: here’s where you go next depending on what you’re looking for. And then you’re just

following documents. The AGENTS.md file is kind of weird. Exactly. That’s exactly correct. That is really all tied to process knowledge, which is held in context or can be referenced by agents in a particular role. It points them to role descriptions, although the agents are meant to start with knowing their name and their role, and they automatically pick up AGENTS.md. If they know the name of their role, they then can dereference

48:19 Dan

Progressive disclosure.

48:45 Ian Stokes-Rees

the structure, which says: what’s the job description, what are the processes that I follow in this role, and what are the key skills that I should use. And then there’s a catalog of skills that are available within the system. But then skills are self-documenting, so I think it’s a little bit belt and braces that they’re contained in two places for visibility and awareness. So there is an element of skills. And as I mentioned, the skills then tie to what are probably about 20 or 30 scripts right now.

I can’t recall the exact number of those scripts. Those all have to be refactored into Python modules, because there’s a lot of duplication inside that combination of PowerShell scripts for the Windows build parts and bash scripts for everything else. And then we make use of pre-commit hooks and a bunch of CI capabilities. So we do leverage those on the engineering side.

I think there’s probably a dozen different CI hooks that are running server-side in GitHub, triggered at different points depending on what’s being committed and what actions are being taken. And then pre-commit hooks are important too. That offloads from the agent the responsibility to remember to do a bunch of things. They just happen automatically on a git add or git commit, that kind of thing. So we leverage pre-commit hooks.
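As one example of a check that can be taken off the agent's plate, a hook can parse each staged Python file without executing it, which also catches syntax errors inside blocks that never run. This is a minimal sketch assuming Python sources; the function names are hypothetical and the wiring into a specific hook framework is left out.

```python
import ast
from typing import Optional


def syntax_problem(path: str, source: str) -> Optional[str]:
    """Return a one-line report if the source does not parse, else None."""
    try:
        # Parsing checks the whole file, including code that is never reached.
        ast.parse(source, filename=path)
    except SyntaxError as exc:
        return f"{path}:{exc.lineno}: {exc.msg}"
    return None


def check_files(paths: list[str]) -> list[str]:
    """Collect syntax reports for the given files; empty means all parsed."""
    reports = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            report = syntax_problem(path, handle.read())
        if report is not None:
            reports.append(report)
    return reports
```

A hook would call `check_files` with the staged file names and exit non-zero if the returned list is not empty, so the commit is rejected before any reviewer, human or agent, has to notice the problem.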

49:49 Dan

Okay.

Running formatting and sort imports, all that kind of stuff, yeah.

50:05 Ian Stokes-Rees

Checks, yeah, checks and validations.

Yeah, exactly. Doing checks and validations on the code, and that reports back. That kinda gives it the checklist, like the pre-flight checks, pre-commit kind of thing. So

50:18 Dan

And in CI, are you using Claude as well, or purely through the API?

50:23 Ian Stokes-Rees

Yeah, I’m using both Claude and Codex. Initially they’d been running on some Haiku and GPT mini models. I need to go back and revisit what’s being done there. Those CI reviews are limited in scope to limit token burn, but also in terms of what the objectives are. And they have different objectives depending on what’s been committed.

If you’ve got markdown files being committed versus code files, there are different ways they need to be looked at and considered and evaluated. They provide a first-pass report on whether there’s any code smell, whether there’s any testing gap. Sometimes, and it’s rare now but it was much more frequent in 2025, you just get syntax errors: stuff that was written and then somehow committed while syntactically incorrect, if it’s in an unreached block and you don’t have something that syntax-checks it.

But I think between my pre-commit hooks and things like that, it’s pretty unlikely that some of those are gonna slip through. But they also do a level of code review in terms of intentionality: what is the objective of this code, and does the implementation meet the objective? Is the test coverage where it’s supposed to be? So there’s a bunch of tests and checks. Is the documentation in place, code commenting? There are also process checks:

PRs can’t be merged if the PR description does not follow the template of the information that needs to be recorded, as one example. PRs can’t be merged if certain labels are still present, that kind of thing. So there’s a lot of pieces that have been systematized. I’d like to think that most of it has already been generalized and would be reusable by anyone anywhere.
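A merge gate of this kind reduces to two checks: the description contains the template's sections, and no blocking label remains. The sketch below assumes hypothetical section headings and label names, since the actual template and labels are not spelled out in the conversation.

```python
# Hypothetical template headings and blocking labels, for illustration only.
REQUIRED_SECTIONS = ("## Summary", "## Testing", "## Linked task")
BLOCKING_LABELS = frozenset({"needs-qa", "stalled"})


def merge_blockers(description: str, labels: set[str]) -> list[str]:
    """List the reasons a PR may not be merged; empty means it may."""
    lines = {line.strip() for line in description.splitlines()}
    blockers = [
        f"missing section: {section}"
        for section in REQUIRED_SECTIONS
        if section not in lines
    ]
    blockers += [
        f"blocking label present: {label}"
        for label in sorted(labels & BLOCKING_LABELS)
    ]
    return blockers
```

A CI job could run this against the PR body and label list and fail the check whenever the result is non-empty, which keeps the rule deterministic and costs no tokens.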

52:00 Dan

Okay.

52:14 Ian Stokes-Rees

But how well has that been documented, and has that ever been tested? It’s not been documented very much and it’s never been tested anywhere else. But that’s certainly something that we think is the objective, and we aim to dogfood that with this next product, which is the AI session harvesting for visibility. That’s the next product we’re gonna start working on, I hope later in May; it might not be till June.

52:39 Dan

So it’s exciting stuff.

There’s way more I could ask you about this. I’ll have to have you back another time. A few other things, changing the subject slightly, not too much.

What do you still not trust from these tools?

53:06 Ian Stokes-Rees

That’s such an interesting question. So

I never trust that they’ve done the thing that they’ve said they’ve done. So that’s the first thing that I would say. That’s my starting point: do not trust that what they’ve implemented does what it was supposed to do. The way I describe it right now is that I find that my AI agents are very solid, second quartile engineers, designers,

53:20 Dan

It’s a pretty important one.

53:42 Ian Stokes-Rees

managers. They’re not top decile. I don’t even think they’re top quartile. But in my head, if I think about them as solid second quartile, above the mean but kind of only just above the mean, engineers, designers, and managers that are extremely fast, very pleasant to work with, very professional, and very inexpensive, that puts me in the right mindset for how I work with them.

Now, when you say the things I don’t trust, I will look at all of the work that my agents create. I definitely have a different bar for the standard of review that I put in, because of the cost of making the wrong decision and saying, I think it looks good enough. Well, if it isn’t, I’ve got enough testing and processes in place, especially with the kind of thing I’m trying to build. I’ll miss things for sure. We’d miss things from time to time even if I did a detailed review. We can come back and correct it easily. But the way I’d put it is that

Out of five times that I go and review the work from an agent, three of those times I’ll look at it and I’ll be like, looks fine. No questions, looks fine. Commit. One out of the five times, I’ll have questions, which will lead to a discussion. I’ll ask questions. The agent will justify what it’s done. And at the end of the day, I’ll say, okay, I’m good with that, so proceed. No changes needed. But I would still say, somewhere in the 10 to 20% of the times, let’s say one out of five times,

I will find that there have been important things that have been missed. And I will ask about and challenge the decisions that were made and what was implemented. And the agent will say, Yeah, that’s a good call. We did not do this right. And then there’ll be rework. We’ll rework the PR. Sometimes we just cancel the PR and start over again. So that’s the foundation. My starting point is I kind of don’t trust any of it, and then I review.

And I find that most of the time I’m able to say, okay, looks all right. But two out of five times I have significant questions, and one of those times will end up confirming my intuition that there are problems there. And then the big challenge, I think, for a lot of people in this space: if you’re really in the pure vibe coding space, where you have no clue, the agents are definitely better than you are at doing whatever you’re implementing, and you have no basis to

question or understand what they’re putting back in front of you, you’re accumulating tons of technical debt that you don’t know about and you’re building a big house of cards. Fine for prototypes. I just don’t know how a vibe-coded future looks right now in 2026. Again, back to my comment about an interregnum period: I am confident that this will change. I don’t know if this will change later this year or not for several years, but in the right here and now,

vibe-coded capabilities for very basic things, for websites, for reports, no problem. For prototypes, no problem. For serious systems, you’ve got to work with your agents on design thinking, separation of concerns, testing, documentation, process and procedures to stabilize what you’re implementing. And right now, the agents don’t even think about creating products which are

themselves AI enabled. A year ago, I was building products for people to use the products I’m building. Now I’m building products and I’ve got two target audiences, people and other agents. And I’ve got to build the systems so that the components and capabilities are exposed and are accessible in the right ways for my own agents or other agents to be able to interact and leverage. And we’re all just figuring out how that how that works and what that looks like right now.

But if you don’t have that kind of design thinking mindset and experience to consider what it would mean for an agent to interact with a product, I don’t know how you prompt your vibe-coding agent to create that thing for you. It’s like the meta levels of thinking.

57:51 Dan

Yeah, I think back to, I think this was out of GitHub a long time ago, where they would require every project to have seven scripts, like one to be able to bring the project up, often one that included a console to be able to interact with the system, and

I think some of that applies in an agentic workflow: these are the things you don’t want your agents reinventing every time. You want them to have the operating procedure.

58:20 Ian Stokes-Rees

That’s correct.

Yeah, that’s exactly correct. And we’ve spent a lot of the time here talking about exactly that. We’re talking not a lot about the product I’m building, but more about how I’m building that product. And sophisticated products

are gonna require a sophisticated level of engineering, design, and repeatability. It can all be done by agents, but today, in 2026, there needs to be a level of human engineering oversight, and also product design, management, and so on, that today’s agents are not yet bringing to the table.

59:07 Dan

That’s another slightly different area, but I’ve mainly managed human engineers the past ten years, and managing agents is, I feel, a different cognitive mode than managing people, and certainly than writing code personally. How does that feel

59:33 Ian Stokes-Rees

Yes.

59:37 Dan

over a full day, over a week?

59:41 Ian Stokes-Rees

Yeah. Very good question. So one of my BCG colleagues wrote an article in Harvard Business Review that came out, I think, in early March, called AI Brain Fry. It had a longer, more interesting title, but it had the words AI Brain Fry in it. Matt Cropp, who was one of the people who organized that, is an AI leader at BCG. That is a great article. I read that and I’m like, Yeah, that’s exactly what’s happening to me. The AI agents work so fast and they

need your attention at around the same frequency relative to their work products; it’s just that their work products come out about 10 times as fast. So they need your attention about 10 times as frequently, and that’s really challenging. And they don’t sleep, so they’re always there. Yesterday I heard a new term, dark flow. I’m a big believer in the value of getting into a flow state, that engineers need to be able to operate in an uninterrupted environment.

1:00:19 Dan

Yeah, I would just we don’t see the grave here.

Which you could look at here.

1:00:41 Ian Stokes-Rees

And I’ve found that when working with my AI agents, with the number of things I need to keep in my head, not at the level of coding a capability or looking at data interfaces, APIs, whatever, but at the level of tracking and managing and interacting with what all my AI agents are doing, I cannot have any interruptions. I can’t have pings or notices, I can’t have my phone or an email coming in to interrupt me. When I’m in that agent coordination flow state, I can’t be interrupted.

The thing is, is my agents are always there, always waiting. And knowing that I have this like engineering capacity that could be doing something productive for me if I just like get them unstuck, give them a little more feedback, tell them which tasks to pick up next, if there’s like like some uncertainty, that’s it’s very stressful. And so, ⁓ so it’s been challenging. And I’ve just had to been more try and get more disciplined with my with my time, with my daily cycles of

1:01:28 Dan

Mm.

1:01:37 Ian Stokes-Rees

When do I start work? When do I check email? When do I work with my agents? But this comes back to an earlier comment I made, which is that we all need to get to a point where agents aren’t dependent on us. I need to be able to go for a run, go to sleep, have a meal with my family, and have my agents continue to work. The point is to have my agents continue to be productive. And so the model that I’m aiming for

1:01:54 Dan

Sleep, what?

1:02:06 Ian Stokes-Rees

is to be able to have, at least on a four- if not five-day basis, teams that work four 10-hour shifts, follow the sun, teams that are around the world, where they have a one-hour overlap. And these are human teams, so human beings. The agents are running on a centralized server somewhere. They’re managed by Paperclip. And the idea is that there’s a software factory. Some of this

Jensen Huang will love. He talked about AI factories a bunch in his keynote talk at GTC a month ago, six weeks ago. We’ll have a software factory. The software factory will have assembly lines. The assembly lines will be coordinated groups of agents that are working in the same space on kind of connected pieces of work. And we will then have foremen who come in and manage and oversee those agents. But the foremen

1:03:04 Dan

Yeah, I know. Yeah.

1:03:04 Ian Stokes-Rees

hand off. The foremen leave; the agents,

the robots on the assembly lines, they keep working. It’s not like Ian’s got his agents and Dan’s got his agents. No. My business has agents, and they work in the factory, and they work on assembly lines together to build things that are moving down the assembly line. And the foremen come and go and the assembly line keeps running. And the foremen have an hour of handover around, hey, where are we right now? What are some of the issues that came up? And we then have a handoff

1:03:25 Dan

Wait.

1:03:33 Ian Stokes-Rees

between people. And that’s my goal. And so I’m right now imagining a four-day working week with ten-hour shifts for this overlap. And, you know, if we’re so lucky to expand that, okay, we expand that to, you know, 24/7. But

1:03:49 Dan

Yeah, yeah, yeah.

1:03:54 Ian Stokes-Rees

No humans working 24/7. Because right now, the system is not sustainable for me. I start at like 5 AM most days to get my agents working. I try and work for an hour and a half to two hours. Then I take a break from my agents for a few hours, but I’m really aiming to have them continuing with work so that I then come back and re-engage with them somewhere between 8:30 and 9, and then I pick up wherever they were. But this is in an environment where I’m, you know,

1:03:59 Dan

Yeah.

1:04:25 Ian Stokes-Rees

human-limited at the moment.
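The shift model Ian describes (ten-hour human shifts handed from region to region with a one-hour overlap) can be sketched as a little scheduling arithmetic. This is only an illustration: the team names, the choice of three teams, and every function name below are assumptions for the example, not details from the conversation.

```python
from dataclasses import dataclass

SHIFT_HOURS = 10    # from the conversation: ten-hour shifts
OVERLAP_HOURS = 1   # from the conversation: one-hour handover overlap


@dataclass(frozen=True)
class Shift:
    team: str   # hypothetical team label
    start: int  # hours after the first foreman team starts
    end: int


def follow_the_sun(teams, first_start=0):
    """Stagger shifts so each one begins OVERLAP_HOURS before the previous ends."""
    stagger = SHIFT_HOURS - OVERLAP_HOURS
    return [
        Shift(team, first_start + i * stagger, first_start + i * stagger + SHIFT_HOURS)
        for i, team in enumerate(teams)
    ]


def handover_windows(shifts):
    """Return (outgoing, incoming, window_start, window_end) for each handoff."""
    return [(a.team, b.team, b.start, a.end) for a, b in zip(shifts, shifts[1:])]


def covered_hours(shifts):
    """Hours of continuous human oversight from first start to last end."""
    return shifts[-1].end - shifts[0].start if shifts else 0


# Hypothetical three-region rotation; the number of teams is an assumption.
shifts = follow_the_sun(["Americas", "EMEA", "APAC"])
for outgoing, incoming, start, end in handover_windows(shifts):
    print(f"{outgoing} -> {incoming}: handover from hour {start} to hour {end}")
print("human coverage in hours:", covered_hours(shifts))
```

Under these assumptions each additional team adds nine net hours of oversight, so two teams give 19 hours of continuous human coverage and a third is what takes it past a full 24-hour day, while the agents themselves keep running throughout.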

1:04:26 Dan

Yeah.

Those tokens aren’t gonna use themselves.

1:04:32 Ian Stokes-Rees

Well, I try. Like I said, I mean, that’s what kind of triggered

this whole conversation. My agents have done impressive things. You know, a month ago I had a hundred-PR day. One day, one hundred PRs merged. Like, that was crazy. That was a sixteen-hour work day, maybe more than sixteen hours. So it was a long work day. But it was a hundred PRs merged, and I looked at every one of those, and I pressed the merge button only when it was green, after all the tests ran and, you know, the branch was up to date and the tests were in. That was a big day.

You know, and finding ways in which the agents can coordinate and communicate with each other where they’re not looking to me: they can communicate with the Kanban as the intermediary, the state-holding system, and they know how to hand off in terms of task flow and task state. It’s not perfect, but it’s largely been a successful system for agents working their way through

the work, and that’s gotten me to, you know, billion-token days as well. Which again I think Jensen Huang and others at Anthropic and OpenAI will be happy to hear about. So
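The idea of the Kanban board as the state-holding intermediary, with the human pressing merge only when everything is green, can be sketched as a small state machine. Everything here is hypothetical: the state names, the role names, and the `advance` function are invented for illustration and are not Ian’s actual GitHub Projects setup.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    TODO = "todo"
    IN_PROGRESS = "in_progress"
    IN_REVIEW = "in_review"
    READY_TO_MERGE = "ready_to_merge"
    DONE = "done"


# Hypothetical workflow: current state -> (next state, role allowed to move it).
NEXT = {
    State.TODO: (State.IN_PROGRESS, "engineer"),
    State.IN_PROGRESS: (State.IN_REVIEW, "engineer"),
    State.IN_REVIEW: (State.READY_TO_MERGE, "qa"),
    State.READY_TO_MERGE: (State.DONE, "human"),
}


@dataclass
class Card:
    title: str
    state: State = State.TODO
    checks_green: bool = False
    branch_up_to_date: bool = False
    history: list = field(default_factory=list)


def advance(card, role):
    """Move a card one step; the board, not a person, tells each role what is next."""
    if card.state not in NEXT:
        raise ValueError(f"{card.title!r} is already done")
    next_state, owner = NEXT[card.state]
    if role != owner:
        raise PermissionError(f"{role} cannot move a card out of {card.state.value}")
    # The merge gate: only merge when CI is green and the branch is current.
    if next_state is State.DONE and not (card.checks_green and card.branch_up_to_date):
        raise RuntimeError("merge blocked: checks not green or branch out of date")
    card.history.append((card.state, next_state, role))
    card.state = next_state
    return card


card = Card("Add CSV export")
advance(card, "engineer")  # engineer agent picks the card up
advance(card, "engineer")  # engineer agent hands off to review
advance(card, "qa")        # QA agent marks it ready to merge
card.checks_green = True
card.branch_up_to_date = True
advance(card, "human")     # the human presses merge
print(card.state.value, len(card.history))
```

In this sketch the agents never ask a person what to do next; they read the card’s state and only the role that owns the next transition can move it, which is one way to read “they know how to hand off in terms of task flow and task state.”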

1:05:43 Dan

For sure. Okay. Well, we’re about at time here, but I wanted to give you a chance to give a shout-out on anything related to PNI, if you’re looking to hire folks or anything.

1:05:48 Ian Stokes-Rees

Yeah.

Yeah, well

thanks for asking. I mean, the main thing for me is if people are interested in talking about the concept. On the product side, it’s data, AI, and analytics in an enterprise context. So that’s my primary business focus. However, there’s been a lot of things I’ve shared here in the space of AI-driven engineering, leveraging AI exhaust, and understanding AI processes. This is a secondary focus

that we’re building, that will be a capability in this Diane.ai product that’s still in concept phase right now. If people are excited about that idea and want to talk to me about it some more, I’m always happy to hear from people. And it’s just Ian at python next dot com. So I’m happy to follow up, or you can just, you know, cyberstalk me and you’ll find me pretty easily. So thanks, Dan.

1:06:51 Dan

Yeah, and we’ll have all of your contact info on the episode page as well, so people can find you there. Great. Well, Ian, thanks again for joining us. It’s been great having you. Happy to have you again, and see where this continues to take you. And we’ll sign off here. Thanks again.

1:06:58 Ian Stokes-Rees

Perfect.

Yeah, we’ll see where it goes. It’s an exciting time.

Thanks, Dan.

Bye.